
Developer observability guide

This guide explains how to extend dpone runtime observability without creating god modules or leaking vendor-specific SDK assumptions into core runtime code.

Design rule

Commands are thin adapters. Business logic lives in dpone.observability.*. Exporter-specific rendering is isolated behind small classes.

classDiagram
    class RuntimeMetricsExtractor {
        +extract(run_report, metrics, labels)
    }
    class MetricPoint {
        +name
        +value
        +labels
        +description
        +unit
    }
    class PrometheusTextRenderer {
        +render(points)
    }
    class OpenTelemetryJsonRenderer {
        +render(points, service_name, namespace, resource_attributes)
    }
    class MetricsArtifactIndexService {
        +build(output_dir, artifacts)
    }
    class RuntimeMetricsExportService {
        +export(output_dir, run_report_path, metrics, labels)
    }
    class observability_cmd {
        +cmd_metrics_export(args, ctx, logger)
    }

    observability_cmd --> RuntimeMetricsExportService
    RuntimeMetricsExportService --> RuntimeMetricsExtractor
    RuntimeMetricsExportService --> PrometheusTextRenderer
    RuntimeMetricsExportService --> OpenTelemetryJsonRenderer
    RuntimeMetricsExportService --> MetricsArtifactIndexService
    RuntimeMetricsExtractor --> MetricPoint
    PrometheusTextRenderer --> MetricPoint
    OpenTelemetryJsonRenderer --> MetricPoint
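The split in the diagram can be sketched in plain Python. This is a minimal illustration, assuming a `MetricPoint` dataclass with the fields shown above. `Renderer`, `PlainTextRenderer`, and `ExportServiceSketch` are hypothetical names, not dpone's actual classes. The point it shows is that the export service only orchestrates, and its renderers are handed to it from outside.

```python
from collections.abc import Iterable
from dataclasses import dataclass, field
from pathlib import Path
from typing import Protocol


@dataclass(frozen=True)
class MetricPoint:
    # Fields mirror the MetricPoint box in the diagram.
    name: str
    value: float
    labels: dict[str, str] = field(default_factory=dict)
    description: str = ""
    unit: str = ""


class Renderer(Protocol):
    # Hypothetical protocol: a renderer turns points into file content.
    def render(self, points: Iterable[MetricPoint]) -> str: ...


class PlainTextRenderer:
    # Hypothetical toy renderer, used only to exercise the service.
    def render(self, points: Iterable[MetricPoint]) -> str:
        return "".join(f"{p.name} {p.value}\n" for p in points)


class ExportServiceSketch:
    """Hypothetical stand-in for RuntimeMetricsExportService.

    It only orchestrates. Renderers are injected, so the service never
    imports a vendor SDK or opens a network connection.
    """

    def __init__(self, renderers: dict[str, Renderer]) -> None:
        self._renderers = renderers

    def export(self, output_dir: Path, points: list[MetricPoint]) -> list[Path]:
        output_dir.mkdir(parents=True, exist_ok=True)
        written: list[Path] = []
        # Sorted file names keep the write order deterministic.
        for filename, renderer in sorted(self._renderers.items()):
            path = output_dir / filename
            path.write_text(renderer.render(points), encoding="utf-8")
            written.append(path)
        return written
```

With this shape, supporting another format means registering one more renderer rather than editing the service.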

Package taxonomy

| Module | Responsibility |
|---|---|
| dpone.observability.metrics | Canonical metric names, labels, units, and extraction from run reports. |
| dpone.observability.prometheus | Prometheus text exposition rendering only. |
| dpone.observability.opentelemetry | OTLP-like JSON rendering only. |
| dpone.observability.artifacts | Per-export checksum index for observability evidence files. |
| dpone.observability.export | Use-case service that writes all artifacts and report files. |
| dpone.commands.observability_cmd | Argparse adapter and stdout formatting only. |
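The role of dpone.observability.artifacts can be illustrated with a small checksum-index builder. This is a sketch under assumptions. The function names and the JSON layout written to metrics_index.json (`artifacts`, `path`, `sha256`, `bytes`) are hypothetical. Only the idea of recording a checksum for every generated file comes from this guide.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path: Path) -> str:
    # Stream the file in chunks so large evidence files are not read into memory at once.
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_checksum_index(output_dir: Path, artifacts: list[Path]) -> Path:
    """Hypothetical sketch of what MetricsArtifactIndexService.build might do.

    The JSON layout is an assumption, not the documented metrics_index.json schema.
    """
    entries = [
        {
            # Relative POSIX paths keep the index portable across machines.
            "path": artifact.relative_to(output_dir).as_posix(),
            "sha256": sha256_file(artifact),
            "bytes": artifact.stat().st_size,
        }
        for artifact in sorted(artifacts)
    ]
    index_path = output_dir / "metrics_index.json"
    # sort_keys and a trailing newline make the file byte-for-byte reproducible.
    index_path.write_text(
        json.dumps({"artifacts": entries}, indent=2, sort_keys=True) + "\n",
        encoding="utf-8",
    )
    return index_path
```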

Do not add observability business logic to ops_cmd.py, run_cmd.py, or the CLI registry.

Adding a new metric source

Add extraction to RuntimeMetricsExtractor when the source is a standard dpone artifact such as:

  • the JSON report produced by dpone run --format json;
  • run registry report;
  • certification suite report;
  • benchmark baseline report.

Keep extraction deterministic and tolerant of missing optional fields.
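A minimal sketch of deterministic, tolerant extraction follows. The report keys (`duration_seconds`, `steps`, `status`) and the metric names are hypothetical and are not the documented dpone run JSON schema. The sketch shows the two rules: a missing optional field skips or defaults a metric instead of raising, and output order does not depend on dict insertion order.

```python
from dataclasses import dataclass, field
from typing import Any, Mapping


@dataclass(frozen=True)
class MetricPoint:
    # Reduced to the fields this sketch needs.
    name: str
    value: float
    labels: dict[str, str] = field(default_factory=dict)


def extract_run_metrics(
    run_report: Mapping[str, Any], labels: Mapping[str, str]
) -> list[MetricPoint]:
    """Hypothetical extraction step; report keys and metric names are assumed."""
    base = dict(sorted(labels.items()))
    points: list[MetricPoint] = []

    # Optional field: emit the metric only when it is present and numeric.
    duration = run_report.get("duration_seconds")
    if isinstance(duration, (int, float)) and not isinstance(duration, bool):
        points.append(MetricPoint("dpone_run_duration_seconds", float(duration), dict(base)))

    # A missing or malformed step list counts as zero steps instead of raising.
    steps = run_report.get("steps")
    step_count = len(steps) if isinstance(steps, list) else 0
    points.append(MetricPoint("dpone_run_steps_total", float(step_count), dict(base)))

    # A missing status becomes an explicit "unknown" label value.
    status = str(run_report.get("status", "unknown"))
    success = 1.0 if status == "success" else 0.0
    points.append(MetricPoint("dpone_run_success", success, {**base, "status": status}))

    # Deterministic order: the same input always yields the same list.
    return sorted(points, key=lambda p: (p.name, sorted(p.labels.items())))
```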

Adding a new export target

Add a renderer class when the target format has its own rules:

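from collections.abc import Iterable
from dpone.observability.metrics import MetricPoint  # assumed module for MetricPoint

class VendorMetricsRenderer:
    def render(self, points: Iterable[MetricPoint]) -> dict[str, object]:
        ...  # pure transformation: no credentials, no network I/O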

Then inject it into a dedicated service or adapter. Do not make RuntimeMetricsExportService depend on vendor credentials or network clients. Network push belongs in a separate optional adapter because local CI must stay credential-free.
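One way to keep network push out of the core is an adapter that owns both the credentials and the transport. This is a hypothetical sketch. `VendorPushAdapter`, the `VENDOR_METRICS_TOKEN` variable name, and the transport signature are assumptions, not part of dpone. When no token is present, as in local CI, the adapter skips the push, and the locally written artifacts remain the evidence.

```python
import json
import os
from typing import Callable, Mapping

# Hypothetical transport signature: (url, body, headers) -> None.
Transport = Callable[[str, bytes, Mapping[str, str]], None]


class VendorPushAdapter:
    """Hypothetical optional adapter: the only place that touches credentials or the network."""

    def __init__(
        self, endpoint: str, transport: Transport, token_env: str = "VENDOR_METRICS_TOKEN"
    ) -> None:
        self._endpoint = endpoint
        self._transport = transport
        self._token_env = token_env

    def push(self, payload: dict[str, object]) -> bool:
        token = os.environ.get(self._token_env)
        if not token:
            # No credentials (for example, local CI): skip the push instead of failing.
            return False
        body = json.dumps(payload, sort_keys=True).encode("utf-8")
        headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
        # The real HTTP client is injected, so tests can pass a fake transport.
        self._transport(self._endpoint, body, headers)
        return True
```

Because the transport is injected, tests can record calls in a list instead of opening a connection.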

Test contract

Every observability change must include:

| Test | Purpose |
|---|---|
| Service test | Confirms files are written and report status is correct. |
| Renderer test | Confirms escaping, labels, and stable output shape. |
| CLI test | Confirms dpone observability metrics-export works through argparse. |
| Artifact index test | Confirms metrics_index.json contains checksums for every generated file. |
| Docs contract test | Confirms user and developer docs mention the command, artifacts, Prometheus, OpenTelemetry, and runbooks. |
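A renderer test along the lines of the table above might look like this. `escape_label_value` and `render_sample` are hypothetical stand-ins for PrometheusTextRenderer. The escaping rules they apply (backslash, double quote, and newline in label values) follow the Prometheus text exposition format. Run the tests with `python -m unittest`.

```python
import unittest


def escape_label_value(value: str) -> str:
    # Prometheus text format: escape backslash first, then double quote and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_sample(name: str, value: float, labels: dict[str, str]) -> str:
    # Hypothetical single-sample renderer; labels are sorted for stable output.
    if not labels:
        return f"{name} {value}"
    body = ",".join(f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items()))
    return f"{name}{{{body}}} {value}"


class RendererContractTest(unittest.TestCase):
    def test_escapes_label_values(self) -> None:
        line = render_sample("dpone_run_success", 1.0, {"job": 'a"b\\c\nd'})
        self.assertEqual(line, 'dpone_run_success{job="a\\"b\\\\c\\nd"} 1.0')

    def test_label_order_is_stable(self) -> None:
        first = render_sample("m", 1.0, {"b": "2", "a": "1"})
        second = render_sample("m", 1.0, {"a": "1", "b": "2"})
        self.assertEqual(first, second)
        self.assertEqual(first, 'm{a="1",b="2"} 1.0')
```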

CI/CD contract

When adding observability to a workflow:

  • write artifacts under test_artifacts/observability/<run>/;
  • upload artifacts with if: always();
  • keep .github/workflows/observability-maturity.yml green when changing metrics contracts;
  • never write raw secrets into labels;
  • keep label cardinality low enough for Prometheus;
  • link the failure path in Failure runbooks if the workflow can fail on observability evidence.
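The two label rules above can be enforced mechanically before export. The following is a hypothetical guard, not a dpone API. The secret-name pattern and the limit of 50 distinct values per label are assumed policy values, and matching on label names is only a heuristic: it does not inspect label values for leaked secrets.

```python
import re
from collections import defaultdict
from collections.abc import Iterable, Mapping

# Hypothetical policy values; tune them to your Prometheus setup.
SECRET_KEY_PATTERN = re.compile(
    r"(secret|token|password|passwd|api[_-]?key|credential)", re.IGNORECASE
)
MAX_DISTINCT_VALUES_PER_LABEL = 50


def check_labels(label_sets: Iterable[Mapping[str, str]]) -> list[str]:
    """Return sorted blocker messages for secret-looking or high-cardinality labels."""
    secret_keys: set[str] = set()
    distinct: defaultdict[str, set[str]] = defaultdict(set)
    for labels in label_sets:
        for key, value in labels.items():
            # Heuristic: flag label names that suggest credentials.
            if SECRET_KEY_PATTERN.search(key):
                secret_keys.add(key)
            distinct[key].add(value)

    problems = [f"label {key!r} looks like a secret and must not be exported" for key in sorted(secret_keys)]
    for key in sorted(distinct):
        count = len(distinct[key])
        if count > MAX_DISTINCT_VALUES_PER_LABEL:
            problems.append(f"label {key!r} has {count} distinct values (limit {MAX_DISTINCT_VALUES_PER_LABEL})")
    return problems
```

A workflow step could fail when the returned list is non-empty. The failure path for such a step would then belong in Failure runbooks.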

Architecture update checklist

Update Architecture when one of these changes:

  • new observability package or module;
  • new supported artifact type;
  • new exported telemetry format;
  • new artifact index field or checksum semantics;
  • new runtime dependency or optional extra;
  • new command group or workflow gate.

User docs checklist

Update Runtime observability and CLI reference when a CLI flag, artifact file name, blocker code, or runbook action changes.