Operational control plane

The operational control plane is the layer around dpone run that turns a single pipeline execution into a repeatable production operating model. It does not replace the runtime. It creates plans, evidence, catalogs, deployment handoff files, recovery guidance, and observability templates that operators can review and automate.

Control-plane flow

flowchart TD
    Run["dpone run / dpone orchestrate run"]
    Registry["dpone ops run-registry"]
    Reconcile["dpone ops reconcile"]
    Metrics["dpone observability metrics-export"]
    Staging["dpone ops staging-evidence"]
    Catalog["dpone ops catalog-publish"]
    Cert["dpone ops certification-pack"]
    Recovery["dpone ops recovery-plan"]
    Deploy["dpone ops deploy-render"]
    Release["Release or incident evidence"]

    Run --> Registry
    Run --> Reconcile
    Run --> Metrics
    Run --> Staging
    Registry --> Catalog
    Registry --> Cert
    Reconcile --> Cert
    Metrics --> Cert
    Staging --> Cert
    Cert --> Release
    Recovery --> Release
    Deploy --> Release

The design is intentionally artifact-first:

| Principle | Behavior |
| --- | --- |
| Runtime stays canonical | dpone run and dpone orchestrate run remain the only execution paths. |
| Ops commands are non-invasive by default | Commands plan, validate, and render artifacts unless a dedicated destructive command explicitly requires --yes. |
| Evidence is immutable | JSON and Markdown outputs include checksums or references suitable for release, incident, and certification review. |
| CI-friendly output | Every command supports --format json or writes machine-readable JSON artifacts. |
| No credentials in artifacts | Deployment and catalog artifacts show handoff shape, not inline secrets. |
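
The checksum principle can be illustrated with a short Python sketch. It is not dpone's implementation: the function names and the entry fields (name, path, size_bytes, sha256) are assumptions chosen for illustration, and only the standard library is used.

```python
import hashlib
from pathlib import Path


def artifact_checksum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # iter() with a sentinel stops at the first empty read (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def artifact_entry(name, path):
    """Build a reviewable reference to an artifact (hypothetical field names)."""
    file_path = Path(path)
    return {
        "name": name,
        "path": str(file_path),
        "size_bytes": file_path.stat().st_size,
        "sha256": artifact_checksum(file_path),
    }
```

A reviewer can recompute the digest later and compare it with the recorded value to confirm that the evidence file has not changed since it was produced.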

Certification pack

Use dpone ops certification-pack to aggregate matrix, observability, lineage, reconciliation, staging, and contract artifacts into one connector certification 2.0 evidence package.

dpone ops certification-pack \
  --pack-id postgres-mssql-orders-2026-06-06 \
  --output-dir test_artifacts/certification/postgres_mssql_orders \
  --artifact certification_report=test_artifacts/integration_matrix/certification_report.json \
  --artifact observability=.dpone/observability/orders/runtime_metrics.json \
  --artifact reconciliation=.dpone/reconciliation/orders/reconciliation_report.json \
  --artifact lineage=.dpone/lineage/orders/openlineage_catalog_event.json \
  --require certification_report \
  --require observability \
  --require reconciliation \
  --format json

Generated files:

| File | Purpose |
| --- | --- |
| connector_certification_pack.json | Machine-readable pack with blockers, coverage, artifact checksums, and item statuses. |
| connector_certification_pack.md | Human-readable release or incident review artifact. |

Coverage is inferred from source -> sink case IDs such as postgres_to_mssql__incremental_merge. This makes missing connector, sink, or strategy coverage visible before a release is promoted.
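
As a rough illustration of how coverage could be derived from case IDs, the Python sketch below splits each ID on the `<source>_to_<sink>__<strategy>` pattern. That grammar is an assumption generalized from the single example above, and infer_coverage is a hypothetical helper, not a dpone function.

```python
def infer_coverage(case_ids):
    """Collect sources, sinks, and strategies from matrix case IDs."""
    sources, sinks, strategies = set(), set(), set()
    for case_id in case_ids:
        # Assumed grammar: "<source>_to_<sink>__<strategy>".
        pair, double_underscore, strategy = case_id.partition("__")
        source, to_marker, sink = pair.partition("_to_")
        if not (double_underscore and to_marker and source and sink and strategy):
            raise ValueError(f"unrecognized case ID: {case_id!r}")
        sources.add(source)
        sinks.add(sink)
        strategies.add(strategy)
    return {
        "sources": sorted(sources),
        "sinks": sorted(sinks),
        "strategies": sorted(strategies),
    }


if __name__ == "__main__":
    print(infer_coverage(["postgres_to_mssql__incremental_merge"]))
```

Rejecting IDs that do not match the pattern keeps a malformed case name from silently disappearing from the coverage summary.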

Runbook when the pack is red:

  1. Open blockers in connector_certification_pack.json.
  2. Re-run missing or failing required artifacts first.
  3. Compare coverage.sources, coverage.sinks, and coverage.strategies against the expected certification matrix.
  4. Do not publish connector badges or production readiness claims while the pack is red.
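
Steps 1 and 3 can be scripted in CI. The following Python sketch assumes the pack JSON has a top-level blockers list and a coverage object with sources, sinks, and strategies lists. Those keys are named in this section, but their exact shape is an assumption, and pack_gaps is a hypothetical helper.

```python
import json
from pathlib import Path


def pack_gaps(pack, expected):
    """Return blockers plus expected coverage values the pack does not list."""
    coverage = pack.get("coverage", {})
    missing = {}
    for dimension, wanted in expected.items():
        absent = sorted(set(wanted) - set(coverage.get(dimension, [])))
        if absent:
            missing[dimension] = absent
    return {"blockers": list(pack.get("blockers", [])), "missing_coverage": missing}


def load_pack(path):
    """Read connector_certification_pack.json from disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

A pipeline step could fail the build whenever either blockers or missing_coverage is non-empty, which enforces step 4 mechanically.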

Live certification

Use Live certification when connector, strategy, native fast path, state backend, or source -> sink behavior needs proof against local disposable services or real vendor systems.

Plan a local live run:

dpone ops live-certification-plan \
  --profile local_live \
  --row-count 25000 \
  --output-dir test_artifacts/live_certification/plan \
  --format json

Evaluate the combined performance/SLO gate:

dpone ops benchmark-slo-gate \
  --metrics-json '{"throughput_rows_per_second":120000,"freshness_lag_seconds":120}' \
  --baseline-json '{"throughput_rows_per_second":{"value":100000,"direction":"higher"}}' \
  --objectives-json '{"freshness_lag_seconds":{"max":300}}' \
  --output-dir test_artifacts/live_certification/benchmark-slo \
  --format json
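
To make the three JSON inputs concrete, here is a minimal Python sketch of one way such a gate could be evaluated. The comparison rules are assumptions: the "lower" direction and the "min" bound are not shown in the command above, and the real gate may apply tolerances this sketch ignores.

```python
def evaluate_gate(metrics, baseline, objectives):
    """Compare observed metrics with baseline values and SLO bounds."""
    failures = []
    for name, spec in baseline.items():
        observed = metrics.get(name)
        direction = spec.get("direction", "higher")
        if observed is None:
            failures.append(f"{name}: metric missing")
        elif direction == "higher" and observed < spec["value"]:
            failures.append(f"{name}: {observed} is below baseline {spec['value']}")
        elif direction == "lower" and observed > spec["value"]:
            failures.append(f"{name}: {observed} is above baseline {spec['value']}")
    for name, bounds in objectives.items():
        observed = metrics.get(name)
        if observed is None:
            failures.append(f"{name}: metric missing")
            continue
        if "max" in bounds and observed > bounds["max"]:
            failures.append(f"{name}: {observed} exceeds max {bounds['max']}")
        if "min" in bounds and observed < bounds["min"]:
            failures.append(f"{name}: {observed} is under min {bounds['min']}")
    return {"passed": not failures, "failures": failures}
```

With the values from the command above, throughput of 120000 meets the 100000 baseline and a lag of 120 seconds is within the 300-second objective, so this sketch reports a pass.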

The local_live profile uses docker/docker-compose.integration.yml for Postgres, MSSQL, ClickHouse, Kafka, Schema Registry, and MinIO. Use real_local before minor and major releases; it adds performance-certification, live-state-reconciliation, and pre-release-checklist / release-evidence-pack artifacts without requiring external credentials. The vendor_live profile is manual and requires configured provider secrets.

Build the release go/no-go evidence pack:

dpone ops release-evidence-pack \
  --release v0.7.1 \
  --profile real_local \
  --artifact certification_pack=test_artifacts/live_certification/certification-pack/connector_certification_pack.json \
  --artifact performance_certification=test_artifacts/live_certification/performance-certification/performance_certification.json \
  --artifact live_state_reconciliation=test_artifacts/live_certification/live-state-reconciliation/live_state_reconciliation.json \
  --artifact evidence_chain=test_artifacts/live_certification/evidence-chain/evidence_chain_index.json \
  --format json

Runtime recovery plan

Use dpone ops recovery-plan after an interrupted run or before manually cleaning local state.

dpone ops recovery-plan \
  --state-dir .dpone/orchestration-state \
  --lock-dir .dpone/locks \
  --load-package-dir .dpone/load-packages \
  --output-dir .dpone/recovery/orders \
  --format json

The planner reads:

| Input | What it detects |
| --- | --- |
| Orchestration job state | Failed or resumable runs and previous blockers. |
| Lock files | Active local concurrency locks that may still represent running jobs. |
| Load packages | Started or staged load packages that need commit, rollback, or inspection. |

The planner is non-destructive. It returns actions such as:

dpone orchestrate run --resume-policy resume
inspect_or_cleanup_lock
rollback_or_commit_load_package

Runbook:

  1. Prefer resume over restart when the previous job state is resumable.
  2. Inspect active locks before deleting them.
  3. Resolve staged load packages with target-native rollback or commit evidence.
  4. Re-run the recovery plan after every manual action.
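
The planner's read-only behavior can be pictured with the Python sketch below. The action strings come from this section. The record fields (status, job_id, active, path, package_id) are hypothetical, since the on-disk state format is not described here.

```python
def plan_recovery(jobs, locks, load_packages):
    """Map detected state to suggested actions without modifying anything."""
    actions = []
    for job in jobs:
        # Resume is suggested only when the job state says it is resumable.
        if job.get("status") == "resumable":
            actions.append({
                "action": "dpone orchestrate run --resume-policy resume",
                "target": job["job_id"],
            })
    for lock in locks:
        if lock.get("active"):
            actions.append({"action": "inspect_or_cleanup_lock", "target": lock["path"]})
    for package in load_packages:
        if package.get("status") in ("started", "staged"):
            actions.append({
                "action": "rollback_or_commit_load_package",
                "target": package["package_id"],
            })
    return actions
```

Because the function only returns a list, running it again after each manual action is cheap and shows which items remain open.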

Reconciliation 2.0

Use dpone ops reconcile for bounded row-level source-target validation and a repair plan.

dpone ops reconcile \
  --source-rows-json '[{"id":1,"amount":100},{"id":2,"amount":200}]' \
  --target-rows-json '[{"id":1,"amount":100},{"id":2,"amount":250}]' \
  --key id \
  --compare-columns amount \
  --output-dir .dpone/reconciliation/orders \
  --format json

The service computes:

| Signal | Meaning |
| --- | --- |
| missing_count | Source rows not present in target. |
| extra_count | Target rows not present in source or target rows whose source row is physically deleted. |
| mismatch_count | Same key but different compared values. |
| delete_count | Source rows marked deleted through --delete-column. |
| Checksums | Stable source and target comparison hashes. |
| Repair actions | insert_target_row, update_target_row, or delete_target_row. |

Physical deletes are safe by construction: a source delete marker only generates a target delete action when the target still contains that key.
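
The signals and the delete rule can be sketched in a few lines of Python. This is an illustration under assumptions, not the dpone service: checksums are omitted, extra rows are counted but produce no action (delete policy is left to the operator), and the result keys simply reuse the signal names above.

```python
def reconcile(source_rows, target_rows, key, compare_columns, delete_column=None):
    """Compare rows by key and propose repair actions."""
    source = {row[key]: row for row in source_rows}
    target = {row[key]: row for row in target_rows}
    counts = {"missing_count": 0, "extra_count": 0, "mismatch_count": 0, "delete_count": 0}
    actions = []
    for row_key, row in source.items():
        if delete_column and row.get(delete_column):
            counts["delete_count"] += 1
            # A delete marker only yields an action if the target still has the key.
            if row_key in target:
                actions.append({"action": "delete_target_row", "key": row_key})
        elif row_key not in target:
            counts["missing_count"] += 1
            actions.append({"action": "insert_target_row", "key": row_key})
        elif any(row.get(col) != target[row_key].get(col) for col in compare_columns):
            counts["mismatch_count"] += 1
            actions.append({"action": "update_target_row", "key": row_key})
    # Target keys absent from the source are counted, not acted on.
    counts["extra_count"] = sum(1 for row_key in target if row_key not in source)
    return {**counts, "actions": actions}
```

For the rows in the command above, key 1 matches and key 2 differs on amount (200 versus 250), so the sketch reports one mismatch and a single update_target_row action for key 2.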

Runbook:

  1. Use staging-first incremental_merge to repair inserts and updates.
  2. Verify delete policy before applying hard deletes.
  3. Treat a red reconciliation report as a state-commit blocker.
  4. Attach the reconciliation report to certification packs.

Observability pack

Use dpone ops observability-pack to generate reusable Grafana and Prometheus templates for runtime metrics.

dpone ops observability-pack \
  --output-dir .dpone/observability-pack/orders \
  --service-name dpone \
  --dashboard-title "dpone orders runtime" \
  --alert dpone_run_passed.min=1 \
  --alert dpone_error_count.max=0 \
  --format json

Generated files:

| File | Purpose |
| --- | --- |
| grafana_dashboard.json | Provisionable Grafana dashboard with rows, runtime, errors, warnings, and lag panels. |
| prometheus_alerts.yml | Prometheus alert rules for failure and configured thresholds. |
| observability_pack.json | Machine-readable pack report. |
| observability_pack.md | Operator summary and runbook. |

Pair this with Runtime observability, which exports actual Prometheus and OpenTelemetry runtime metrics from a run report.
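
One plausible reading of the --alert syntax is metric.bound=threshold, where a min bound fires when the metric drops below the threshold and a max bound fires when it rises above. The Python sketch below encodes that reading. It is an assumption about the semantics, and the rules dpone actually writes to prometheus_alerts.yml may differ.

```python
def parse_alert(spec):
    """Turn 'metric.min=1' or 'metric.max=0' into a firing condition."""
    name_and_bound, equals, raw_threshold = spec.partition("=")
    metric, dot, bound = name_and_bound.rpartition(".")
    if not (equals and dot and metric) or bound not in ("min", "max"):
        raise ValueError(f"expected <metric>.<min|max>=<number>, got {spec!r}")
    threshold = float(raw_threshold)
    # Below a minimum or above a maximum counts as a violation.
    operator = "<" if bound == "min" else ">"
    return {
        "metric": metric,
        "bound": bound,
        "threshold": threshold,
        "expr": f"{metric} {operator} {threshold:g}",
    }


if __name__ == "__main__":
    for spec in ("dpone_run_passed.min=1", "dpone_error_count.max=0"):
        print(parse_alert(spec)["expr"])
```

The resulting expr strings are ordinary PromQL comparisons, so they can be reviewed against the generated alert rules by eye.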

Deployment profiles

Use dpone ops deploy-render to generate scheduler handoff templates while keeping dpone orchestrate run as the production execution command.

dpone ops deploy-render \
  --target k8s-cronjob \
  --manifest manifests/orders.yml \
  --selector daily_orders \
  --image ghcr.io/paulkov/dpone:0.7.1 \
  --schedule "0 2 * * *" \
  --output-dir .dpone/deploy/orders \
  --format json

Supported targets:

| Target | Artifact |
| --- | --- |
| docker-compose | docker-compose.yml |
| k8s-cronjob | k8s-cronjob.yml |
| airflow | airflow_dag.py |
| dagster | dagster_asset.py |

Runbook:

  1. Review generated commands before deploying.
  2. Inject credentials through platform secrets.
  3. Keep lock/state directories on durable storage when jobs can overlap.
  4. Attach the rendered profile to release or environment-promotion evidence.

Object-storage staging evidence

Use dpone ops staging-evidence after staging files through Object storage staging.

dpone ops staging-evidence \
  --manifest .dpone/staging/orders/object_storage_manifest.json \
  --sink clickhouse \
  --target-table landing.orders \
  --output-dir .dpone/staging-evidence/orders \
  --format json

The evidence report validates:

| Check | Purpose |
| --- | --- |
| Object list is present | Prevents empty staging manifests from being promoted. |
| sha256 length | Ensures every staged object has checksum evidence. |
| size_bytes positive | Catches empty or failed uploads. |
| Native load hint | Shows the target-specific loading shape for operator review. |

Runbook:

  1. Verify object count and checksums before native target load.
  2. Keep staging objects immutable until target load and reconciliation pass.
  3. Use cleanup only after load evidence and source state commit are recorded.
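
Step 1 can be rehearsed offline with a Python sketch of the first three checks in the table above. The sha256 and size_bytes field names appear in that table. The objects list and the per-object key field are assumptions about the manifest layout.

```python
import re

# A SHA-256 digest rendered as hex is exactly 64 characters long.
SHA256_HEX = re.compile(r"[0-9a-f]{64}")


def validate_manifest(manifest):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    objects = manifest.get("objects") or []
    if not objects:
        problems.append("object list is empty")
    for index, obj in enumerate(objects):
        label = obj.get("key", f"object[{index}]")
        if not SHA256_HEX.fullmatch(str(obj.get("sha256", "")).lower()):
            problems.append(f"{label}: sha256 is not 64 hex characters")
        size = obj.get("size_bytes")
        if not isinstance(size, int) or size <= 0:
            problems.append(f"{label}: size_bytes must be a positive integer")
    return problems
```

Returning every problem instead of stopping at the first one lets an operator fix a bad upload batch in a single pass.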

Catalog publication

Use dpone ops catalog-publish to produce catalog handoff payloads for OpenLineage, dbt, and DataHub-compatible ingestion.

dpone ops catalog-publish \
  --run-registry-entry .dpone/run-registry/<run_id>__run_registry.json \
  --namespace dpone.local \
  --input postgres=public.orders \
  --output mssql=landing.orders \
  --output-dir .dpone/catalog/orders \
  --format json

Generated files:

| File | Purpose |
| --- | --- |
| openlineage_catalog_event.json | OpenLineage-compatible catalog event. |
| dbt_sources.yml | dbt source definitions for downstream projects. |
| datahub_mcp.json | DataHub MCP-like dataset payload. |
| catalog_publication.json | Machine-readable dpone publication report. |
| catalog_publication.md | Human-readable operator summary. |

Runbook:

  1. Publish OpenLineage to the same collector used for runtime lineage.
  2. Review generated dbt sources before committing them to the analytics project.
  3. Feed DataHub payloads through the platform-owned ingestion job.
  4. Keep run_id stable across run registry, lineage, and catalog artifacts.

Production runbook

Recommended high-confidence release path:

dpone orchestrate run \
  --manifest manifests/orders.yml \
  --selector daily_orders \
  --resume-policy resume \
  --format json > .dpone/runs/orders/run_result.json

dpone ops run-registry \
  --run-result .dpone/runs/orders/run_result.json \
  --output-dir .dpone/run-registry \
  --format json

dpone observability metrics-export \
  --run-report .dpone/runs/orders/run_result.json \
  --output-dir .dpone/observability/orders \
  --format json

dpone ops reconcile \
  --source-rows-json "$SOURCE_SAMPLE" \
  --target-rows-json "$TARGET_SAMPLE" \
  --key id \
  --output-dir .dpone/reconciliation/orders \
  --format json

dpone ops certification-pack \
  --pack-id orders-release \
  --artifact certification_report=test_artifacts/integration_matrix/certification_report.json \
  --artifact observability=.dpone/observability/orders/runtime_metrics.json \
  --artifact reconciliation=.dpone/reconciliation/orders/reconciliation_report.json \
  --require certification_report \
  --require observability \
  --require reconciliation \
  --output-dir .dpone/certification-pack/orders \
  --format json

If anything is red:

  1. Run dpone ops recovery-plan.
  2. Inspect the red artifact and follow its runbook.
  3. Re-run the focused command.
  4. Regenerate certification-pack.
  5. Promote only when all required artifacts are green.

Developer extension points

| Need | Module |
| --- | --- |
| Add a new evidence item type | dpone.ops.certification_pack |
| Add recovery signals | dpone.ops.recovery |
| Extend row comparison or checksums | dpone.ops.reconciliation |
| Add dashboards or alert templates | dpone.ops.observability_pack |
| Add a deployment target | dpone.ops.deploy_profiles |
| Add target-native staging hints | dpone.ops.staging_evidence |
| Add a catalog payload | dpone.ops.catalog_publish |

Design constraints:

  1. Keep CLI adapters thin.
  2. Keep services dependency-light and deterministic.
  3. Avoid target writes in planning services.
  4. Prefer JSON+Markdown artifacts for every operator workflow.
  5. Add tests for service behavior, CLI output, docs links, and runbook coverage.