Runtime observability
dpone observability metrics-export converts dpone run evidence into
Prometheus text exposition, OpenTelemetry-compatible JSON, a machine-readable
summary, and a Markdown report.
The exporter is intentionally local-first and dependency-light. It does not require a running collector, Prometheus server, or vendor SDK to produce artifacts. That keeps CI deterministic while still giving operators files they can ship to their observability stack.
Contents
- Quickstart
- How it works
- Prometheus artifact
- OpenTelemetry artifact
- Using custom metrics
- CI artifact pattern
- Runbook
- Developer links
Quickstart
Run a process and save its JSON output. The target directory must already exist, because the shell redirect does not create it:
uv run dpone run examples/postgres_to_mssql.yaml \
  --format json > .dpone/runs/orders/run_report.json
Export observability artifacts:
uv run dpone observability metrics-export \
  --run-report .dpone/runs/orders/run_report.json \
  --output-dir .dpone/observability/orders \
  --label env=local \
  --label pipeline=orders \
  --service-name dpone \
  --namespace dpone.local \
  --format json
Generated files:
| File | Purpose |
|---|---|
| prometheus_metrics.prom | Prometheus text exposition for scraping or pushgateway upload. |
| opentelemetry_metrics.json | OTLP-shaped JSON that can be adapted by collectors or CI uploaders. |
| metrics_index.json | SHA-256 checksum manifest for the files produced by this export. |
| runtime_metrics.json | dpone export report with metric list and artifact paths. |
| runtime_metrics.md | Human-readable summary for run artifacts or pull requests. |
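To check the exported files against metrics_index.json by hand, you can recompute the digests yourself. The sketch below is an illustration only: sha256_of_files is a hypothetical helper, not part of dpone. The manifest's exact layout is not described here, so the helper only computes digests and leaves the comparison to you.

```python
import hashlib
from pathlib import Path


def sha256_of_files(directory: str) -> dict[str, str]:
    """Return {file name: hex SHA-256 digest} for regular files in a directory.

    Hypothetical helper for manual verification; not a dpone API.
    """
    digests = {}
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            # Hash the raw bytes so line-ending changes are detected too.
            digests[path.name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests
```

Calling sha256_of_files(".dpone/observability/orders") after an export gives one digest per file, which you can compare with the values recorded in the manifest.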
How it works
flowchart LR
Run["dpone run --format json"]
Extractor["RuntimeMetricsExtractor"]
Model["MetricPoint[]"]
Prom["PrometheusTextRenderer"]
OTel["OpenTelemetryJsonRenderer"]
Report["RuntimeMetricsExportReport"]
Artifacts[".dpone/observability/<run>/"]
Run --> Extractor
Extractor --> Model
Model --> Prom
Model --> OTel
Prom --> Artifacts
OTel --> Artifacts
Model --> Report
Report --> Artifacts
The exporter reads the standard JSON shape produced by
dpone run. It extracts:
| Run field | Metric |
|---|---|
| result.extracted_rows | dpone_extracted_rows |
| result.inserted_rows | dpone_inserted_rows |
| result.updated_rows | dpone_updated_rows |
| result.final_rows | dpone_final_rows |
| result.duration_seconds | dpone_duration_seconds |
| result.throughput_rows_per_second | dpone_throughput_rows_per_second |
| attempts | dpone_attempts |
| max_attempts | dpone_max_attempts |
| retry_backoff_seconds | dpone_retry_backoff_seconds |
| result.errors / errors / blockers | dpone_error_count |
| warnings | dpone_warning_count |
| passed | dpone_run_passed |
Labels passed with --label are attached to every metric. When a run report is
supplied, the exporter also adds labels such as run_id, process, selector, and
status wherever the report provides them.
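The field-to-metric mapping can be pictured as a small lookup over the run report. The following is a minimal sketch under stated assumptions, not the RuntimeMetricsExtractor implementation: FIELD_TO_METRIC, lookup, and extract_numeric_metrics are hypothetical names. It covers only the plain numeric fields, because the way the error, warning, and passed fields are combined into counts is not spelled out here.

```python
from typing import Any, Optional

# (dotted path in the run report, metric name), taken from the mapping table.
# Count-style metrics and dpone_run_passed are deliberately left out.
FIELD_TO_METRIC = [
    ("result.extracted_rows", "dpone_extracted_rows"),
    ("result.inserted_rows", "dpone_inserted_rows"),
    ("result.updated_rows", "dpone_updated_rows"),
    ("result.final_rows", "dpone_final_rows"),
    ("result.duration_seconds", "dpone_duration_seconds"),
    ("result.throughput_rows_per_second", "dpone_throughput_rows_per_second"),
    ("attempts", "dpone_attempts"),
    ("max_attempts", "dpone_max_attempts"),
    ("retry_backoff_seconds", "dpone_retry_backoff_seconds"),
]


def lookup(report: dict, dotted_path: str) -> Optional[Any]:
    """Walk a dotted path through nested dicts; return None if any key is missing."""
    node: Any = report
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def extract_numeric_metrics(report: dict) -> dict[str, float]:
    """Return {metric name: value} for every numeric field present in the report."""
    metrics = {}
    for path, name in FIELD_TO_METRIC:
        value = lookup(report, path)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            metrics[name] = float(value)
    return metrics
```

Fields that are absent from the report are simply skipped, which matches the "when available" behavior described for labels but is an assumption for metrics.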
Prometheus artifact
Example output:
# HELP dpone_extracted_rows Rows extracted by the dpone run.
# TYPE dpone_extracted_rows gauge
dpone_extracted_rows{env="local",process="orders",run_id="01J...",status="success"} 10000
Use this artifact when you want a simple Prometheus-compatible file that can be:
- uploaded as a CI artifact;
- pushed to a Prometheus Pushgateway by a wrapper job;
- read by a node-local file collector;
- attached to release evidence.
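For readers who want to see how a line like the example output is assembled, here is a minimal sketch of rendering one gauge in the Prometheus text exposition format. It is not the PrometheusTextRenderer source; render_gauge and escape_label_value are hypothetical names, and sorting labels alphabetically is an assumption chosen to match the example output.

```python
def escape_label_value(value: str) -> str:
    """Escape backslash, double quote, and newline, as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name: str, help_text: str, labels: dict[str, str], value: float) -> str:
    """Render HELP, TYPE, and one sample line for a single gauge."""
    # Sort labels so the output is stable from run to run (assumption).
    label_text = ",".join(
        f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
    )
    sample = f"{name}{{{label_text}}} {value}" if label_text else f"{name} {value}"
    return f"# HELP {name} {help_text}\n# TYPE {name} gauge\n{sample}\n"
```

Stable label ordering keeps the file diff-friendly, which matters when the artifact is attached to release evidence or compared across CI runs.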
OpenTelemetry artifact
The OpenTelemetry artifact is an OTLP-shaped JSON payload with:
- service.name;
- service.namespace;
- optional resource attributes passed with --resource-attr key=value;
- metric names, descriptions, units, labels, and gauge points.
The exporter is intentionally not a collector client: it writes the file and stops there. Production deployments can choose their own collector path, and local users are not forced to install optional dependencies.
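To make "OTLP-shaped" concrete, the sketch below builds a payload with one gauge using the field names of the OTLP JSON encoding (resourceMetrics, scopeMetrics, gauge.dataPoints). It is an approximation for orientation, not the OpenTelemetryJsonRenderer output: build_otlp_gauge_payload and to_kv are hypothetical names, the scope name is a placeholder, and the actual file may differ in detail.

```python
def to_kv(attrs: dict[str, str]) -> list[dict]:
    """Convert a flat dict into OTLP key/value attribute entries."""
    return [
        {"key": key, "value": {"stringValue": str(val)}}
        for key, val in sorted(attrs.items())
    ]


def build_otlp_gauge_payload(
    service_name: str,
    service_namespace: str,
    resource_attrs: dict[str, str],
    metric_name: str,
    description: str,
    unit: str,
    labels: dict[str, str],
    value: float,
    time_unix_nano: int,
) -> dict:
    """Build an OTLP-shaped dict holding a single gauge data point."""
    resource = {
        "service.name": service_name,
        "service.namespace": service_namespace,
        **resource_attrs,
    }
    data_point = {
        "attributes": to_kv(labels),
        # OTLP JSON encodes 64-bit integers as strings.
        "timeUnixNano": str(time_unix_nano),
        "asDouble": float(value),
    }
    metric = {
        "name": metric_name,
        "description": description,
        "unit": unit,
        "gauge": {"dataPoints": [data_point]},
    }
    return {
        "resourceMetrics": [
            {
                "resource": {"attributes": to_kv(resource)},
                # Placeholder scope name for this sketch only.
                "scopeMetrics": [{"scope": {"name": "example-sketch"}, "metrics": [metric]}],
            }
        ]
    }
```

Passing the result through json.dumps gives a file a collector adapter can read. Note how resource attributes sit once at the top, while labels repeat on each data point.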
Using custom metrics
Add manual metrics with repeated --metric name=value flags:
uv run dpone observability metrics-export \
  --run-report .dpone/runs/orders/run_report.json \
  --metric throughput_rows_per_second=85000 \
  --metric freshness_lag_seconds=180 \
  --label env=prod \
  --label pipeline=orders
Custom names are normalized to dpone_<name> unless they already start with
dpone_.
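The prefix rule is small enough to state as code. This is a sketch of the documented behavior, not dpone's source: normalize_metric_name and parse_metric_flag are hypothetical names, and parsing the value as a float is an assumption.

```python
def normalize_metric_name(name: str) -> str:
    """Prefix the name with dpone_ unless it already starts with it."""
    return name if name.startswith("dpone_") else f"dpone_{name}"


def parse_metric_flag(flag: str) -> tuple[str, float]:
    """Split a name=value flag into (normalized name, numeric value)."""
    name, separator, raw_value = flag.partition("=")
    if not separator or not name:
        raise ValueError(f"expected name=value, got {flag!r}")
    # float() raises ValueError for empty or non-numeric values (assumed policy).
    return normalize_metric_name(name), float(raw_value)
```

With this rule, freshness_lag_seconds=180 becomes dpone_freshness_lag_seconds, while a name such as dpone_freshness_lag_seconds is left unchanged.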
Add OpenTelemetry resource attributes with repeated --resource-attr key=value
flags. Use resource attributes for stable deployment identity and metric labels
for low-cardinality run dimensions:
uv run dpone observability metrics-export \
  --run-report .dpone/runs/orders/run_report.json \
  --output-dir .dpone/observability/orders \
  --label pipeline=orders \
  --label strategy=incremental_merge \
  --resource-attr deployment.environment=prod \
  --resource-attr service.version=0.7.1
In the Prometheus artifact, label names are sanitized into valid Prometheus
label names. For example, source.system=postgres is rendered as
source_system="postgres".
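One plausible way to sanitize label names is to replace every character outside the Prometheus label alphabet with an underscore. The sketch below reproduces the source.system example. sanitize_label_name is a hypothetical name, and the handling of a leading digit is an assumption; dpone's exact rules may differ.

```python
import re


def sanitize_label_name(name: str) -> str:
    """Map a label name onto the pattern [a-zA-Z_][a-zA-Z0-9_]*."""
    # Replace anything that is not a letter, digit, or underscore.
    cleaned = re.sub(r"[^a-zA-Z0-9_]", "_", name)
    # Label names may not start with a digit; prefix an underscore (assumption).
    if re.match(r"[0-9]", cleaned):
        cleaned = "_" + cleaned
    return cleaned
```

Because distinct inputs such as source.system and source-system collapse to the same output, it is safer to pick underscore-only label names in wrappers to begin with.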
CI artifact pattern
Recommended CI path:
uv run dpone run "$MANIFEST" --format json > test_artifacts/runs/run_report.json
uv run dpone observability metrics-export \
  --run-report test_artifacts/runs/run_report.json \
  --output-dir test_artifacts/observability/current \
  --label ci_run_id="$GITHUB_RUN_ID" \
  --label branch="$GITHUB_REF_NAME" \
  --format json
Upload the whole test_artifacts/observability/current/ directory even when the
job fails. Failed runs then keep their metrics, which gives regression triage
concrete numbers to compare against earlier runs.
The repository includes .github/workflows/observability-maturity.yml as a
credential-free gate that runs weekly and on manual trigger. It runs the
observability tests, exports Prometheus/OpenTelemetry artifacts from a synthetic
run report, evaluates an SLO smoke check, builds an artifact index, and uploads
observability-maturity-report.
Runbook
| Symptom | Likely cause | Action |
|---|---|---|
| metrics.empty blocker | No run report and no --metric flags were provided. | Pass --run-report or at least one --metric name=value. |
| run_report.missing blocker | Path in --run-report does not exist in the current job workspace. | Upload/download the run artifact first, or use an absolute path inside CI. |
| run_report.invalid_json blocker | The input file is not JSON output from dpone run --format json. | Re-run with --format json and redirect only stdout. |
| Prometheus labels look wrong | The wrapper passed inconsistent labels. | Standardize labels in CI: env, pipeline, source, sink, strategy, ci_run_id. |
| OTel collector rejects the artifact | The artifact is OTLP-shaped JSON, not a direct protobuf request. | Use a collector/file adapter or a small upload wrapper owned by your platform. |
| metrics_index.json checksum drift | A file was modified after export or an artifact was regenerated out of order. | Re-run dpone observability metrics-export and upload the full output directory as one immutable artifact. |
| observability-maturity.yml is red | Tests, metrics export, SLO smoke, or artifact indexing failed. | Open observability-maturity-report, inspect runtime_metrics.json, metrics_index.json, and slo_report.json, then reproduce the failing command locally. |
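The first three rows can be caught before the exporter runs. The following preflight sketch returns the blocker name from the table that would most likely apply. It is a wrapper-side illustration under stated assumptions, not dpone's own validation: preflight_blockers is a hypothetical name, and the order of the checks is an assumption.

```python
import json
from pathlib import Path
from typing import Optional


def preflight_blockers(run_report: Optional[str], metric_flags: list[str]) -> list[str]:
    """Return the runbook blocker names that the given inputs would likely trigger."""
    if run_report is None:
        # No report: at least one --metric flag is needed.
        return [] if metric_flags else ["metrics.empty"]
    path = Path(run_report)
    if not path.is_file():
        return ["run_report.missing"]
    try:
        json.loads(path.read_text(encoding="utf-8"))
    except (json.JSONDecodeError, UnicodeDecodeError):
        # Typical cause: log lines were redirected into the file along with the JSON.
        return ["run_report.invalid_json"]
    return []
```

Running such a check in the CI wrapper turns a confusing downstream failure into an early, named one, which is the same goal the runbook rows serve.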