Skip to content

A Prometheus exporter and a REST API server to export metrics of compute units of resource managers like SLURM, Openstack, k8s, _etc_

License

Notifications You must be signed in to change notification settings

mahendrapaipuri/ceems

Repository files navigation

Compute Energy & Emissions Monitoring Stack (CEEMS)

CI/CD ci CircleCI Coverage
Docs docs
Package Release
Meta GitHub License Go Report Card code style

Compute Energy & Emissions Monitoring Stack (CEEMS) (pronounced as kiːms) contains a Prometheus exporter to export metrics of compute instance units and a REST API server that serves the metadata and aggregated metrics of each compute unit. Optionally, it includes a TSDB load balancer that supports basic access control on TSDB so that one user cannot access metrics of another user.

"Compute Unit" in the current context has a wider scope. It can be a batch job in HPC, a VM in cloud, a pod in k8s, etc. The main objective of the repository is to quantify the energy consumed and estimate emissions by each "compute unit". The repository itself does not provide any frontend apps to show dashboards and it is meant to use along with Grafana and Prometheus to show statistics to users.

Install CEEMS

Installation instructions of CEEMS components can be found in docs.

Visualizing metrics with Grafana

CEEMS is meant to be used with Grafana for visualization and below are some of the screenshots few possible metrics.

Time series compute unit CPU metrics

Time series compute unit GPU metrics

List of compute units of user with aggregate metrics

Aggregate usage metrics of a user

Talks and Demos

Contributing

We welcome contributions to this project, we hope to see this project grow and become a useful tool for people who are interested in the energy and carbon footprint of their workloads.

Please feel free to open issues and/or discussions for any potential ideas of improvement.