Common Information Model (CIM): A Shared Semantic Layer for Sustainable Digital Research Infrastructures

Adnan Tahir, Gonçalo Ferreira, Shashikant Ilager

GreenDIGIT Project  |  April 2026

 

The Problem: Data Everywhere, Consistent Understanding Nowhere

Digital research infrastructures (DRIs), including cloud platforms, HTC environments, and IoT and network infrastructure, support a wide variety of scientific workflows and applications, accelerating research. However, their environmental footprint, measured in energy use and carbon emissions, is significant. A key challenge in understanding and optimising these infrastructures is the lack of a standardised information model to collect, store, and interpret the operational metrics they generate, which range from CPU utilisation and memory usage to energy consumption and environmental conditions. This data is critical for monitoring and optimisation, yet it often exists in fragmented forms, with inconsistent naming, structure, and meaning across systems.

Consider a researcher comparing the energy efficiency of a workflow executed across three sites:

        Site A reports PUE-aligned metrics using ISO standards.

        Site B provides raw per-node power consumption logs.

        Site C exposes estimated energy per workload execution via a structured API.

All three datasets are valid. All three describe the same underlying reality. But they do not align. This fragmentation creates a fundamental barrier where systems cannot easily interpret each other’s data. As a result, comparing infrastructures, optimising energy usage, or reproducing experiments becomes far more difficult than it should be.
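
To make the gap concrete, here is a minimal sketch of the manual conversion a researcher would need today to bring the three reports to a common unit. All function names, field names, and numbers are hypothetical and chosen only for illustration; the only external fact relied on is the definition of PUE as total facility energy divided by IT equipment energy.

```python
JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 MJ


def it_energy_from_pue(facility_kwh, pue):
    """Site A style: facility energy plus PUE -> IT energy (PUE = facility / IT)."""
    return facility_kwh / pue


def energy_from_power_samples(watts, interval_s):
    """Site B style: fixed-interval node power samples -> kWh (rectangle rule)."""
    joules = sum(w * interval_s for w in watts)
    return joules / JOULES_PER_KWH


def energy_from_api(payload):
    """Site C style: the API already reports an estimate in kWh."""
    return payload["estimated_energy_kwh"]  # hypothetical field name


# Illustrative values only, not measurements from any real site.
site_a_kwh = it_energy_from_pue(facility_kwh=12.0, pue=1.5)
site_b_kwh = energy_from_power_samples([400.0, 600.0], interval_s=1800.0)
site_c_kwh = energy_from_api({"workload_id": "wf-001", "estimated_energy_kwh": 0.42})

print(site_a_kwh, site_b_kwh, site_c_kwh)
```

Even after conversion to kWh, the three numbers describe different scopes: facility-wide IT load, a single node, and a single workload. Unit conversion alone does not make them comparable, which is the alignment gap described above.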

As digital infrastructures scale and sustainability becomes a central concern, the need for consistent and interpretable data has become unavoidable. Without a shared model, even basic questions, such as how much energy a system consumes, can produce inconsistent answers depending on how metrics are defined. This is not just a technical inconvenience; it creates real limitations in how infrastructures are managed and understood.

In practice, teams run into problems such as:

        Metrics cannot be reliably compared across infrastructures.

        Sustainability indicators vary depending on how they are defined.

        Experiments are difficult to reproduce under equivalent conditions.

        Increasing effort is spent on manual data mapping and interpretation.

The bottom line is straightforward: systems are effective at collecting data, but not at understanding each other’s data. CIM, initiated as part of the GreenDIGIT project, was designed to solve this core problem by creating a shared semantic foundation: a common language for environmental metrics across DRIs.

 

What Already Exists, and What Is Still Missing

The current monitoring ecosystem includes mature tools and standards:

        Monitoring platforms such as Prometheus, Grafana, and OpenTelemetry provide operational visibility.

        Time-series databases like InfluxDB handle high-volume metric storage and querying.

        Semantic frameworks, including Schema.org, JSON-LD, and RO-Crate, support structured data description.

        Standards such as ISO 30134, the JRC Code of Conduct, and SAREF define what should be measured.

Each plays a useful role, but each operates at a different layer. Monitoring tools collect and visualise. Databases store and query. Standards define measurement approaches. Semantic frameworks describe structure. None of them harmonise the meaning of metrics across systems. That cross-system semantic layer is what is missing today.

 

What CIM Does

The Common Information Model (CIM), developed as part of the EU Horizon GreenDIGIT project, addresses this gap. Rather than requiring systems to change how they generate data, CIM standardises how that data is interpreted. It acts as a translation layer so that different systems can speak their own language while still being understood in a common way.

Figure 1 illustrates how raw metrics from a system like DIRAC differ structurally from what the CIM standard produces, showing concretely where inconsistencies arise and what normalisation looks like in practice.

Figure 1. Raw metrics from a DRI system (left) compared to their CIM-normalised representation (right). Field names such as “Energy(kwh)” and “Site” are mapped to structured, namespaced identifiers like gd.energy.consumption.total and gd.environment.emissions.ci.
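
The kind of normalisation Figure 1 depicts can be sketched as a simple field-name lookup. This is an illustration under stated assumptions, not the CIM implementation: the two gd.* identifiers and the raw name “Energy(kwh)” come from the caption, while the raw field name "CI(gCO2/kWh)", the sample values, and the function name are hypothetical.

```python
# Hypothetical lookup table. Only the gd.* identifiers and "Energy(kwh)"
# appear in the Figure 1 caption; the second raw name is an assumption.
NAMESPACE_MAP = {
    "Energy(kwh)": "gd.energy.consumption.total",
    "CI(gCO2/kWh)": "gd.environment.emissions.ci",
}


def normalise_record(raw):
    """Rename known fields to namespaced identifiers; keep the rest aside."""
    mapped, unmapped = {}, {}
    for field, value in raw.items():
        target = NAMESPACE_MAP.get(field)
        if target is None:
            unmapped[field] = value  # preserved so nothing is silently dropped
        else:
            mapped[target] = value
    return {"metrics": mapped, "unmapped": unmapped}


# Illustrative record, not real DIRAC output.
raw_record = {"Site": "SITE-A", "Energy(kwh)": 1.25, "CI(gCO2/kWh)": 310.0}
normalised = normalise_record(raw_record)
print(normalised)
```

Fields without a known mapping are kept in a separate bucket rather than discarded, so they remain available for later classification.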

Figure 2 shows the CIM architecture across its five functional areas: data ingestion from heterogeneous RI sources, governance and policy alignment, analytics and reporting, interoperability via APIs and standards, and user access portals. Together, these feed into collaboration and impact assessment across the infrastructure.

Figure 2. Common Information Model for RI Metrics Unification, covering energy, carbon, sustainability, and efficiency metrics across compute systems, data centres, instruments, and facility management.

CIM’s core capabilities include:

        Unified namespace mapping: different metric names are mapped to a consistent structure.

        Hybrid classification: combines rule-based mappings with semantic inference to handle both known and novel metrics.

        Multi-format ingestion: supports APIs, structured files, logs, and unstructured data.

        Metadata enrichment via JSON-LD / RO-Crate: adds provenance and context to each metric.

        Continuous evolution: new metrics are detected, classified, and integrated automatically.

        Traceability and confidence tracking: every mapping is explainable, making the system auditable.
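
Several of these capabilities can be sketched together: a rule table handles known names, a fallback handles novel ones, and each decision is recorded with its method and confidence before being wrapped in JSON-LD. This is a sketch under assumptions, not the CIM pipeline itself. String similarity from Python's difflib stands in for semantic inference, which the real system may implement quite differently. The rule table, the threshold, the gd: context URL (a placeholder), and the gd:-prefixed property names are all hypothetical.

```python
import json
from difflib import SequenceMatcher

# Hypothetical rule table and identifier catalogue. The two gd.* names come
# from the Figure 1 caption; everything else is illustrative.
RULES = {"energy(kwh)": "gd.energy.consumption.total"}
KNOWN_IDS = ["gd.energy.consumption.total", "gd.environment.emissions.ci"]


def classify(raw_name, threshold=0.6):
    """Map a raw metric name to an identifier and record how it was decided."""
    key = raw_name.strip().lower()
    if key in RULES:  # rule-based path: known metric, full confidence
        return {"source": raw_name, "target": RULES[key],
                "method": "rule", "confidence": 1.0}
    # Fallback path: string similarity as a stand-in for semantic inference.
    score, best = max((SequenceMatcher(None, key, cid).ratio(), cid)
                      for cid in KNOWN_IDS)
    if score >= threshold:
        return {"source": raw_name, "target": best,
                "method": "similarity", "confidence": round(score, 2)}
    return {"source": raw_name, "target": None,
            "method": "unclassified", "confidence": 0.0}


def to_jsonld(mapping, value, unit_text):
    """Wrap a classified metric in a JSON-LD document with mapping provenance."""
    return {
        # The "gd" prefix URL is a placeholder, not a published vocabulary.
        "@context": ["https://schema.org", {"gd": "https://example.org/gd#"}],
        "@type": "PropertyValue",
        "propertyID": mapping["target"],
        "value": value,
        "unitText": unit_text,
        "gd:sourceField": mapping["source"],
        "gd:mappingMethod": mapping["method"],
        "gd:mappingConfidence": mapping["confidence"],
    }


rule_hit = classify("Energy(kwh)")
fuzzy_hit = classify("energy_consumption_total")
miss = classify("xyz")
document = to_jsonld(rule_hit, 1.25, "kWh")
print(json.dumps(document, indent=2))
```

Keeping the method and confidence alongside every mapping is what makes a result auditable: a reviewer can see whether an identifier came from an explicit rule or from an inferred match, and how strong that match was.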

Full technical details are available in GreenDIGIT deliverables D4.2 and D5.2.

 

Why It Matters

For researchers, comparing results across infrastructures no longer requires reworking datasets. Experiments can be reproduced more reliably because the underlying metrics are defined consistently.

For infrastructure operators, monitoring and troubleshooting become more straightforward. Rather than dealing with mismatched terminology, systems can be analysed through a unified lens.

For sustainability and policy teams, energy and emissions metrics can be tracked consistently and aligned with international standards, enabling more credible, data-driven decisions.

More broadly, CIM reduces complexity in data pipelines, makes integrations more robust, and improves adaptability as new metrics and technologies emerge.

 

Tools and Access

CIM is already operational. The following tools, APIs, and dashboards are available:

        CIM API Service

        KPI API Service

        CIM Metrics Dashboard

        Wattnet Dashboard

        CIM Pipeline on GitHub

 

Summary

Most infrastructures are built to collect data efficiently. CIM ensures that data can be understood consistently across systems, sites, and stakeholders. It is a foundational layer for interoperability, reproducibility, and sustainable operations in modern digital research infrastructures. As it matures, CIM can serve as a standard interoperability layer across research infrastructures, support federated sustainability monitoring, and connect operational data with scientific and policy insights.