Operational control plane
The operational control plane is the layer around dpone run that turns a
single pipeline execution into a repeatable production operating model. It does
not replace the runtime. It creates plans, evidence, catalogs, deployment
handoff files, recovery guidance, and observability templates that operators can
review and automate.
Contents
- Control-plane flow
- Certification pack
- Live certification
- Runtime recovery plan
- Reconciliation 2.0
- Observability pack
- Deployment profiles
- Object-storage staging evidence
- Catalog publication
- Production runbook
- Developer extension points
Control-plane flow
flowchart TD
Run["dpone run / dpone orchestrate run"]
Registry["dpone ops run-registry"]
Reconcile["dpone ops reconcile"]
Metrics["dpone observability metrics-export"]
Staging["dpone ops staging-evidence"]
Catalog["dpone ops catalog-publish"]
Cert["dpone ops certification-pack"]
Recovery["dpone ops recovery-plan"]
Deploy["dpone ops deploy-render"]
Release["Release or incident evidence"]
Run --> Registry
Run --> Reconcile
Run --> Metrics
Run --> Staging
Registry --> Catalog
Registry --> Cert
Reconcile --> Cert
Metrics --> Cert
Staging --> Cert
Cert --> Release
Recovery --> Release
Deploy --> Release
The design is intentionally artifact-first:
| Principle | Behavior |
|---|---|
| Runtime stays canonical | dpone run and dpone orchestrate run remain the only execution paths. |
| Ops commands are non-invasive by default | Commands plan, validate, and render artifacts unless a dedicated destructive command explicitly requires --yes. |
| Evidence is immutable | JSON and Markdown outputs include checksums or references suitable for release, incident, and certification review. |
| CI-friendly output | Every command supports --format json or writes machine-readable JSON artifacts. |
| No credentials in artifacts | Deployment and catalog artifacts show handoff shape, not inline secrets. |
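To make the "Evidence is immutable" principle concrete, here is a minimal sketch of how an artifact checksum can be recorded and later re-verified using only the Python standard library. The helper names (`artifact_checksum`, `evidence_record`, `verify_record`) and the record fields are hypothetical illustrations, not dpone's actual schema or API.

```python
import hashlib
import tempfile
from pathlib import Path


def artifact_checksum(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def evidence_record(name: str, path: Path) -> dict:
    """Build a hypothetical evidence entry; not dpone's actual schema."""
    return {"name": name, "path": str(path), "sha256": artifact_checksum(path)}


def verify_record(record: dict) -> bool:
    """Re-hash the file and compare it against the recorded checksum."""
    return artifact_checksum(Path(record["path"])) == record["sha256"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        report = Path(tmp) / "reconciliation_report.json"
        report.write_text('{"mismatch_count": 0}', encoding="utf-8")
        record = evidence_record("reconciliation", report)
        print(verify_record(record))  # file unchanged since it was recorded
        report.write_text('{"mismatch_count": 1}', encoding="utf-8")
        print(verify_record(record))  # file modified after it was recorded
```

A reviewer can run the same kind of check against any checksum listed in a pack before trusting the referenced file.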
Certification pack
Use dpone ops certification-pack to aggregate matrix, observability, lineage,
reconciliation, staging, and contract artifacts into one connector certification
2.0 evidence package.
dpone ops certification-pack \
--pack-id postgres-mssql-orders-2026-06-06 \
--output-dir test_artifacts/certification/postgres_mssql_orders \
--artifact certification_report=test_artifacts/integration_matrix/certification_report.json \
--artifact observability=.dpone/observability/orders/runtime_metrics.json \
--artifact reconciliation=.dpone/reconciliation/orders/reconciliation_report.json \
--artifact lineage=.dpone/lineage/orders/openlineage_catalog_event.json \
--require certification_report \
--require observability \
--require reconciliation \
--format json
Generated files:
| File | Purpose |
|---|---|
| connector_certification_pack.json | Machine-readable pack with blockers, coverage, artifact checksums, and item statuses. |
| connector_certification_pack.md | Human-readable release or incident review artifact. |
Coverage is inferred from source -> sink case IDs such as
postgres_to_mssql__incremental_merge. This makes missing connector, sink, or
strategy coverage visible before a release is promoted.
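The coverage inference can be pictured with a small sketch. It assumes case IDs follow the shape `<source>_to_<sink>__<strategy>`, which is inferred from the single example above; the function names and the second case ID in the usage note are hypothetical and do not describe dpone's internal parser.

```python
def parse_case_id(case_id: str) -> tuple[str, str, str]:
    """Split '<source>_to_<sink>__<strategy>' into (source, sink, strategy)."""
    pair, strategy_sep, strategy = case_id.partition("__")
    source, pair_sep, sink = pair.partition("_to_")
    if not (strategy_sep and pair_sep and source and sink and strategy):
        raise ValueError(f"unrecognized case ID: {case_id!r}")
    return source, sink, strategy


def summarize_coverage(case_ids: list[str]) -> dict[str, list[str]]:
    """Collect the distinct sources, sinks, and strategies seen in case IDs."""
    sources, sinks, strategies = set(), set(), set()
    for case_id in case_ids:
        source, sink, strategy = parse_case_id(case_id)
        sources.add(source)
        sinks.add(sink)
        strategies.add(strategy)
    return {
        "sources": sorted(sources),
        "sinks": sorted(sinks),
        "strategies": sorted(strategies),
    }


def missing_cases(case_ids: list[str], expected: list[str]) -> list[str]:
    """Return expected case IDs that have no certification evidence yet."""
    return sorted(set(expected) - set(case_ids))
```

For example, `missing_cases(["postgres_to_mssql__incremental_merge"], expected)` lists every entry of a hypothetical expected matrix, such as `postgres_to_clickhouse__full_refresh`, that still lacks evidence.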
Runbook when the pack is red:
- Open blockers in connector_certification_pack.json.
- Re-run missing or failing required artifacts first.
- Compare coverage.sources, coverage.sinks, and coverage.strategies against the expected certification matrix.
- Do not publish connector badges or production readiness claims while the pack is red.
Live certification
Use live certification when connector, strategy, native fast path, state backend, or source -> sink behavior needs proof against local disposable services or real vendor systems.
Plan a local live run:
dpone ops live-certification-plan \
--profile local_live \
--row-count 25000 \
--output-dir test_artifacts/live_certification/plan \
--format json
Evaluate the combined performance/SLO gate:
dpone ops benchmark-slo-gate \
--metrics-json '{"throughput_rows_per_second":120000,"freshness_lag_seconds":120}' \
--baseline-json '{"throughput_rows_per_second":{"value":100000,"direction":"higher"}}' \
--objectives-json '{"freshness_lag_seconds":{"max":300}}' \
--output-dir test_artifacts/live_certification/benchmark-slo \
--format json
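The three JSON inputs in the command above suggest a simple comparison model, sketched below. This is an assumption about how such a gate could work, not dpone's actual evaluation logic: a baseline entry with direction "higher" passes when the observed metric is at least the baseline value, and an objective with "max" passes when the metric does not exceed it. The "lower" direction, the "min" bound, and the name `evaluate_gate` are hypothetical additions for symmetry.

```python
def evaluate_gate(metrics: dict, baseline: dict, objectives: dict) -> dict:
    """Compare observed metrics against baseline and objective thresholds."""
    failures = []
    for name, spec in baseline.items():
        observed = metrics.get(name)
        if observed is None:
            failures.append(f"{name}: metric missing")
        elif spec["direction"] == "higher" and observed < spec["value"]:
            failures.append(f"{name}: {observed} is below baseline {spec['value']}")
        elif spec["direction"] == "lower" and observed > spec["value"]:
            failures.append(f"{name}: {observed} is above baseline {spec['value']}")
    for name, spec in objectives.items():
        observed = metrics.get(name)
        if observed is None:
            failures.append(f"{name}: metric missing")
            continue
        if "max" in spec and observed > spec["max"]:
            failures.append(f"{name}: {observed} exceeds max {spec['max']}")
        if "min" in spec and observed < spec["min"]:
            failures.append(f"{name}: {observed} is below min {spec['min']}")
    return {"passed": not failures, "failures": failures}


if __name__ == "__main__":
    result = evaluate_gate(
        metrics={"throughput_rows_per_second": 120000, "freshness_lag_seconds": 120},
        baseline={"throughput_rows_per_second": {"value": 100000, "direction": "higher"}},
        objectives={"freshness_lag_seconds": {"max": 300}},
    )
    print(result)
```

Under these assumptions the sample values from the command pass both checks, because the throughput is above the baseline and the freshness lag is under the maximum.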
The local_live profile uses docker/docker-compose.integration.yml for
Postgres, MSSQL, ClickHouse, Kafka, Schema Registry, and MinIO. Use
real_local before minor and major releases; it adds
performance-certification, live-state-reconciliation, and
pre-release-checklist / release-evidence-pack artifacts without requiring
external credentials. The vendor_live profile is manual and requires
configured provider secrets.
Build the release go/no-go evidence pack:
dpone ops release-evidence-pack \
--release v0.7.1 \
--profile real_local \
--artifact certification_pack=test_artifacts/live_certification/certification-pack/connector_certification_pack.json \
--artifact performance_certification=test_artifacts/live_certification/performance-certification/performance_certification.json \
--artifact live_state_reconciliation=test_artifacts/live_certification/live-state-reconciliation/live_state_reconciliation.json \
--artifact evidence_chain=test_artifacts/live_certification/evidence-chain/evidence_chain_index.json \
--format json
Runtime recovery plan
Use dpone ops recovery-plan after an interrupted run or before manually
cleaning local state.
dpone ops recovery-plan \
--state-dir .dpone/orchestration-state \
--lock-dir .dpone/locks \
--load-package-dir .dpone/load-packages \
--output-dir .dpone/recovery/orders \
--format json
The planner reads:
| Input | What it detects |
|---|---|
| Orchestration job state | failed/resumable runs and previous blockers. |
| Lock files | active local concurrency locks that may still represent running jobs. |
| Load packages | started or staged load packages that need commit, rollback, or inspection. |
The planner is non-destructive. It returns actions such as:
dpone orchestrate run --resume-policy resume
inspect_or_cleanup_lock
rollback_or_commit_load_package
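The mapping from detected signals to suggested actions can be sketched as a pure function. The action names come from the list above; the input shapes (`status`, `job_id`, `state`, `package_id`) and the rule that only a resumable job yields a resume action are assumptions made for illustration and are not dpone's actual state format.

```python
RESUME_ACTION = "dpone orchestrate run --resume-policy resume"


def plan_recovery(job_states: list[dict], lock_files: list[str],
                  load_packages: list[dict]) -> list[dict]:
    """Return suggested actions without modifying any state (planning only)."""
    actions = []
    for job in job_states:
        # Assumption: a job marked resumable can be continued instead of restarted.
        if job.get("status") == "resumable":
            actions.append({"action": RESUME_ACTION, "subject": job["job_id"]})
    for lock in lock_files:
        # Every remaining lock is flagged for inspection, never deleted here.
        actions.append({"action": "inspect_or_cleanup_lock", "subject": lock})
    for package in load_packages:
        # Started or staged packages still need a commit or rollback decision.
        if package.get("state") in {"started", "staged"}:
            actions.append({
                "action": "rollback_or_commit_load_package",
                "subject": package["package_id"],
            })
    return actions
```

Because the function only reads its inputs and returns a list, it can be re-run after every manual step, which mirrors the non-destructive behavior described above.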
Runbook:
- Prefer resume over restart when the previous job state is resumable.
- Inspect active locks before deleting them.
- Resolve staged load packages with target-native rollback or commit evidence.
- Re-run the recovery plan after every manual action.
Reconciliation 2.0
Use dpone ops reconcile for bounded row-level source-target validation and a
repair plan.
dpone ops reconcile \
--source-rows-json '[{"id":1,"amount":100},{"id":2,"amount":200}]' \
--target-rows-json '[{"id":1,"amount":100},{"id":2,"amount":250}]' \
--key id \
--compare-columns amount \
--output-dir .dpone/reconciliation/orders \
--format json
The service computes:
| Signal | Meaning |
|---|---|
| missing_count | Source rows not present in target. |
| extra_count | Target rows not present in source or target rows whose source row is physically deleted. |
| mismatch_count | Same key but different compared values. |
| delete_count | Source rows marked deleted through --delete-column. |
| Checksums | Stable source and target comparison hashes. |
| Repair actions | insert_target_row, update_target_row, or delete_target_row. |
Physical deletes are safe by construction: a source delete marker only generates a target delete action when the target still contains that key.
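The signals and repair actions can be illustrated with a small keyed comparison. This is a simplified sketch under stated assumptions, not the dpone.ops.reconciliation implementation: rows are plain dictionaries, a truthy delete column marks a source row as deleted, and extra target rows are proposed as delete_target_row actions. The function name `reconcile_rows` is hypothetical.

```python
def reconcile_rows(source_rows, target_rows, key, compare_columns, delete_column=None):
    """Compare keyed rows and propose repair actions without applying them."""
    source = {row[key]: row for row in source_rows}
    target = {row[key]: row for row in target_rows}
    report = {"missing_count": 0, "extra_count": 0, "mismatch_count": 0,
              "delete_count": 0, "repair_actions": []}
    for row_key, src in source.items():
        tgt = target.get(row_key)
        if delete_column and src.get(delete_column):
            report["delete_count"] += 1
            # Only propose a delete when the target still contains the key.
            if tgt is not None:
                report["repair_actions"].append(
                    {"action": "delete_target_row", "key": row_key})
            continue
        if tgt is None:
            report["missing_count"] += 1
            report["repair_actions"].append(
                {"action": "insert_target_row", "key": row_key})
        elif any(src.get(col) != tgt.get(col) for col in compare_columns):
            report["mismatch_count"] += 1
            report["repair_actions"].append(
                {"action": "update_target_row", "key": row_key})
    for row_key in target:
        if row_key not in source:
            report["extra_count"] += 1
            # Assumption: extra target rows are candidates for deletion.
            report["repair_actions"].append(
                {"action": "delete_target_row", "key": row_key})
    return report


if __name__ == "__main__":
    print(reconcile_rows(
        [{"id": 1, "amount": 100}, {"id": 2, "amount": 200}],
        [{"id": 1, "amount": 100}, {"id": 2, "amount": 250}],
        key="id", compare_columns=["amount"],
    ))
```

With the sample rows from the command above, this sketch reports one mismatch for key 2 and proposes a single update_target_row action.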
Runbook:
- Use staging-first incremental_merge to repair inserts and updates.
- Verify delete policy before applying hard deletes.
- Treat a red reconciliation as a state-commit blocker.
- Attach the reconciliation report to certification packs.
Observability pack
Use dpone ops observability-pack to generate reusable Grafana and Prometheus
templates for runtime metrics.
dpone ops observability-pack \
--output-dir .dpone/observability-pack/orders \
--service-name dpone \
--dashboard-title "dpone orders runtime" \
--alert dpone_run_passed.min=1 \
--alert dpone_error_count.max=0 \
--format json
Generated files:
| File | Purpose |
|---|---|
| grafana_dashboard.json | Provisionable Grafana dashboard with rows, runtime, errors, warnings, and lag panels. |
| prometheus_alerts.yml | Prometheus alert rules for failure and configured thresholds. |
| observability_pack.json | Machine-readable pack report. |
| observability_pack.md | Operator summary and runbook. |
Pair this with Runtime observability, which exports actual Prometheus and OpenTelemetry runtime metrics from a run report.
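The --alert values in the command above follow a metric.bound=threshold shape. The sketch below shows one plausible way to turn such a value into a Prometheus-style firing condition; the interpretation (a min bound fires when the metric drops below the threshold, a max bound fires when it rises above) and the function names are assumptions, and the real rendering in prometheus_alerts.yml may differ.

```python
def parse_alert(spec: str) -> tuple[str, str, float]:
    """Parse 'metric.min=1' or 'metric.max=0' into (metric, bound, threshold)."""
    left, equals, raw_value = spec.partition("=")
    metric, dot, bound = left.rpartition(".")
    if not equals or not dot or not metric or bound not in ("min", "max"):
        raise ValueError(f"unrecognized alert spec: {spec!r}")
    return metric, bound, float(raw_value)


def alert_expression(spec: str) -> str:
    """Build the condition under which the alert should fire."""
    metric, bound, threshold = parse_alert(spec)
    # A minimum is violated when the value is lower; a maximum when it is higher.
    operator = "<" if bound == "min" else ">"
    return f"{metric} {operator} {threshold:g}"


if __name__ == "__main__":
    for spec in ("dpone_run_passed.min=1", "dpone_error_count.max=0"):
        print(alert_expression(spec))
```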
Deployment profiles
Use dpone ops deploy-render to generate scheduler handoff templates while
keeping dpone orchestrate run as the production execution command.
dpone ops deploy-render \
--target k8s-cronjob \
--manifest manifests/orders.yml \
--selector daily_orders \
--image ghcr.io/paulkov/dpone:0.7.1 \
--schedule "0 2 * * *" \
--output-dir .dpone/deploy/orders \
--format json
Supported targets:
| Target | Artifact |
|---|---|
| docker-compose | docker-compose.yml |
| k8s-cronjob | k8s-cronjob.yml |
| airflow | airflow_dag.py |
| dagster | dagster_asset.py |
Runbook:
- Review generated commands before deploying.
- Inject credentials through platform secrets.
- Keep lock/state directories on durable storage when jobs can overlap.
- Attach the rendered profile to release or environment-promotion evidence.
Object-storage staging evidence
Use dpone ops staging-evidence after staging files through
Object storage staging.
dpone ops staging-evidence \
--manifest .dpone/staging/orders/object_storage_manifest.json \
--sink clickhouse \
--target-table landing.orders \
--output-dir .dpone/staging-evidence/orders \
--format json
The evidence report validates:
| Check | Purpose |
|---|---|
| Object list is present | Prevents empty staging manifests from being promoted. |
| sha256 length | Ensures every staged object has checksum evidence. |
| size_bytes positive | Catches empty or failed uploads. |
| Native load hint | Shows the target-specific loading shape for operator review. |
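The first three checks in the table can be sketched as a small validator. The `sha256` and `size_bytes` field names come from the table; the top-level `objects` list, the per-object `key` field, the extra hex-character check, and the function name `validate_manifest` are assumptions for illustration, since the actual manifest schema is not shown here.

```python
import string


def validate_manifest(manifest: dict) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    objects = manifest.get("objects") or []
    if not objects:
        problems.append("object list is empty or missing")
    for index, obj in enumerate(objects):
        label = obj.get("key", f"object[{index}]")
        checksum = str(obj.get("sha256", ""))
        # A SHA-256 digest in hex form is exactly 64 characters long.
        if len(checksum) != 64 or any(ch not in string.hexdigits for ch in checksum):
            problems.append(f"{label}: sha256 must be 64 hex characters")
        size = obj.get("size_bytes")
        # Reject booleans explicitly because bool is a subclass of int.
        if not isinstance(size, int) or isinstance(size, bool) or size <= 0:
            problems.append(f"{label}: size_bytes must be a positive integer")
    return problems


if __name__ == "__main__":
    sample = {"objects": [{"key": "orders/part-0001.parquet",
                           "sha256": "a" * 64, "size_bytes": 1024}]}
    print(validate_manifest(sample))
```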
Runbook:
- Verify object count and checksums before native target load.
- Keep staging objects immutable until target load and reconciliation pass.
- Use cleanup only after load evidence and source state commit are recorded.
Catalog publication
Use dpone ops catalog-publish to produce catalog handoff payloads for
OpenLineage, dbt, and DataHub-compatible ingestion.
dpone ops catalog-publish \
--run-registry-entry .dpone/run-registry/<run_id>__run_registry.json \
--namespace dpone.local \
--input postgres=public.orders \
--output mssql=landing.orders \
--output-dir .dpone/catalog/orders \
--format json
Generated files:
| File | Purpose |
|---|---|
| openlineage_catalog_event.json | OpenLineage-compatible catalog event. |
| dbt_sources.yml | dbt source definitions for downstream projects. |
| datahub_mcp.json | DataHub MCP-like dataset payload. |
| catalog_publication.json | Machine-readable dpone publication report. |
| catalog_publication.md | Human-readable operator summary. |
Runbook:
- Publish OpenLineage to the same collector used for runtime lineage.
- Review generated dbt sources before committing them to the analytics project.
- Feed DataHub payloads through the platform-owned ingestion job.
- Keep run_id stable across run registry, lineage, and catalog artifacts.
Production runbook
Recommended high-confidence release path:
dpone orchestrate run \
--manifest manifests/orders.yml \
--selector daily_orders \
--resume-policy resume \
--format json > .dpone/runs/orders/run_result.json
dpone ops run-registry \
--run-result .dpone/runs/orders/run_result.json \
--output-dir .dpone/run-registry \
--format json
dpone observability metrics-export \
--run-report .dpone/runs/orders/run_result.json \
--output-dir .dpone/observability/orders \
--format json
dpone ops reconcile \
--source-rows-json "$SOURCE_SAMPLE" \
--target-rows-json "$TARGET_SAMPLE" \
--key id \
--output-dir .dpone/reconciliation/orders \
--format json
dpone ops certification-pack \
--pack-id orders-release \
--artifact certification_report=test_artifacts/integration_matrix/certification_report.json \
--artifact observability=.dpone/observability/orders/runtime_metrics.json \
--artifact reconciliation=.dpone/reconciliation/orders/reconciliation_report.json \
--require certification_report \
--require observability \
--require reconciliation \
--output-dir .dpone/certification-pack/orders \
--format json
If anything is red:
- Run dpone ops recovery-plan.
- Inspect the red artifact and follow its runbook.
- Re-run the focused command.
- Regenerate certification-pack.
- Promote only when all required artifacts are green.
Developer extension points
| Need | Module |
|---|---|
| Add a new evidence item type | dpone.ops.certification_pack |
| Add recovery signals | dpone.ops.recovery |
| Extend row comparison or checksums | dpone.ops.reconciliation |
| Add dashboards or alert templates | dpone.ops.observability_pack |
| Add a deployment target | dpone.ops.deploy_profiles |
| Add target-native staging hints | dpone.ops.staging_evidence |
| Add a catalog payload | dpone.ops.catalog_publish |
Design constraints:
- Keep CLI adapters thin.
- Keep services dependency-light and deterministic.
- Avoid target writes in planning services.
- Prefer JSON+Markdown artifacts for every operator workflow.
- Add tests for service behavior, CLI output, docs links, and runbook coverage.