Operations guide
This guide describes how to operate dpone in local, CI, staging, and
production-like environments. Operational controls are exposed through
dpone ops; there is no separate production API or runtime
mode.
Release channels
- GitHub release artifacts are attached to version tags.
- PyPI releases are published under the dpone distribution.
- The release workflow supports PyPI Trusted Publishing when configured on PyPI.
- The release workflow also supports the PYPI_API_TOKEN GitHub secret as a fallback.
Recommended deployment model
- Pin dpone versions in production environments.
- Promote releases through dev, staging, and production.
- Keep provider credentials outside manifests.
- Use environment variables, Airflow connections, Vault-compatible providers, or secret managers.
- Enable structured log collection for every DAG run.
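As an illustration of keeping credentials outside manifests, the sketch below reads a secret from an environment variable and fails fast when it is missing. The helper name `require_secret` and the variable name `ORDERS_DB_PASSWORD` are hypothetical and not part of dpone; the same pattern applies if the value comes from a secret manager instead.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required credential is not present in the environment."""


def require_secret(name, environ=None):
    """Return the value of environment variable `name`, or fail fast.

    `environ` defaults to os.environ; passing a plain dict makes the
    helper easy to test. The helper itself is hypothetical.
    """
    env = os.environ if environ is None else environ
    value = env.get(name, "")
    if not value.strip():
        # Report only the variable name, never the secret value.
        raise MissingSecretError(f"required secret {name!r} is not set")
    return value


# Usage with an explicit mapping (ORDERS_DB_PASSWORD is an assumed name).
password = require_secret("ORDERS_DB_PASSWORD", {"ORDERS_DB_PASSWORD": "example"})
```

Failing before the run starts keeps a missing credential in the "credentials" failure class instead of surfacing later as an opaque source or sink error.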
Runtime health checks
For each production pipeline, track:
- Last successful run timestamp.
- Extracted row count.
- Loaded row count.
- Final row count.
- Error count.
- Retry count.
- Runtime duration.
- Checkpoint age.
- Soft-delete count when reconciliation is enabled.
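The metrics above could be collected into one record per run and checked against thresholds. The following is a minimal sketch, not a dpone API: the `RunHealth` class, the finding codes, and the thresholds are assumptions, and the row-count check assumes a pipeline that is expected to load every extracted row.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class RunHealth:
    """One health snapshot per pipeline run (hypothetical structure)."""
    last_success: datetime
    extracted_rows: int
    loaded_rows: int
    final_rows: int
    error_count: int
    retry_count: int
    runtime_seconds: float
    checkpoint_age: timedelta
    soft_delete_count: int = 0  # only meaningful when reconciliation is enabled


def health_findings(h, now, max_staleness, max_checkpoint_age):
    """Return a list of finding codes; an empty list means no alert."""
    findings = []
    if now - h.last_success > max_staleness:
        findings.append("stale_run")
    if h.checkpoint_age > max_checkpoint_age:
        findings.append("stale_checkpoint")
    if h.error_count > 0:
        findings.append("errors_present")
    # Assumption: every extracted row should be loaded.
    if h.loaded_rows != h.extracted_rows:
        findings.append("row_count_gap")
    return findings


now = datetime(2024, 1, 2, tzinfo=timezone.utc)
snapshot = RunHealth(
    last_success=now - timedelta(hours=30),
    extracted_rows=100,
    loaded_rows=90,
    final_rows=90,
    error_count=0,
    retry_count=1,
    runtime_seconds=12.5,
    checkpoint_age=timedelta(hours=1),
)
findings = health_findings(
    snapshot,
    now,
    max_staleness=timedelta(hours=24),
    max_checkpoint_age=timedelta(hours=6),
)
```

Pipelines that deduplicate or filter rows between extract and load would need a different row-count rule than the equality check used here.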
dpone ops toolbox
Use dpone ops when you need operational evidence or safe
incident controls:
dpone ops certification-run --source postgres --sink mssql --strategy snapshot_diff
dpone ops certification-pack --pack-id orders-release --artifact certification_report=test_artifacts/integration_matrix/certification_report.json
dpone ops recovery-plan --state-dir .dpone/orchestration-state --lock-dir .dpone/locks
dpone ops reconcile --source-rows-json "$SOURCE_SAMPLE" --target-rows-json "$TARGET_SAMPLE" --key id
dpone ops deploy-render --target k8s-cronjob --manifest manifests/orders.yml --selector daily_orders
dpone ops contract-check --rows-json '[{"id":1}]' --contract-json '{"required_columns":["id"]}'
dpone ops marketplace --format md
dpone ops rollback-plan --sink mssql --target landing.orders --load-id 01JLOAD0000000000000000000
The toolbox covers certification artifacts, data contracts, quarantine replay, load package lifecycle, rollback plans, connector marketplace badges, reconciliation, recovery planning, deployment handoff, object-storage staging evidence, catalog publication, and links to performance advice. See the Operational control plane guide for the full operator flow and runbooks.
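When these commands are driven from a script, hand-written JSON inside shell quotes is easy to get wrong. The sketch below builds the argument list for the `dpone ops contract-check` invocation shown above, using only the flags that appear in that example. The helper name `contract_check_argv` is hypothetical, and the command's exit-code behavior is not specified here.

```python
import json


def contract_check_argv(rows, required_columns):
    """Build argv for `dpone ops contract-check` without shell quoting.

    Only the --rows-json and --contract-json flags from the example
    above are used; the helper itself is not part of dpone.
    """
    contract = {"required_columns": list(required_columns)}
    return [
        "dpone", "ops", "contract-check",
        "--rows-json", json.dumps(rows, separators=(",", ":")),
        "--contract-json", json.dumps(contract, separators=(",", ":")),
    ]


argv = contract_check_argv([{"id": 1}], ["id"])
```

The resulting list can be passed to `subprocess.run(argv)` in an environment where dpone is installed; passing a list rather than a single string avoids shell interpretation of the JSON payloads.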
Incident response
When a run fails:
- Capture the manifest path and selector.
- Capture the run ID and task ID.
- Classify the failure as configuration, credentials, source, sink, schema, network, quota, or framework.
- Check whether a checkpoint was written before failure.
- Decide whether rerun is safe based on the connector's idempotency contract.
- Record the failure mode in connector docs if it is new.
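The checklist above can be captured as a structured record so that every incident carries the same fields. This is a hedged sketch: the `IncidentRecord` class, the `next_step` policy, and the example values are assumptions for illustration, and the actual rerun decision depends on what each connector's idempotency contract states.

```python
from dataclasses import dataclass
from enum import Enum


class FailureClass(Enum):
    """The failure categories listed in the checklist above."""
    CONFIGURATION = "configuration"
    CREDENTIALS = "credentials"
    SOURCE = "source"
    SINK = "sink"
    SCHEMA = "schema"
    NETWORK = "network"
    QUOTA = "quota"
    FRAMEWORK = "framework"


@dataclass
class IncidentRecord:
    manifest_path: str
    selector: str
    run_id: str
    task_id: str
    failure_class: FailureClass
    checkpoint_written: bool
    connector_idempotent: bool  # taken from the connector's idempotency contract


# Classes that usually need a change before a rerun can succeed (assumption).
FIX_FIRST = {
    FailureClass.CONFIGURATION,
    FailureClass.CREDENTIALS,
    FailureClass.SCHEMA,
}


def next_step(record):
    """Suggest a next step under a deliberately conservative policy."""
    if not record.connector_idempotent:
        # A blind rerun could duplicate or partially overwrite data.
        return "manual-review"
    if record.failure_class in FIX_FIRST:
        return "fix-then-rerun"
    return "rerun"


# Example values are illustrative only.
record = IncidentRecord(
    manifest_path="manifests/orders.yml",
    selector="daily_orders",
    run_id="example-run",
    task_id="example-task",
    failure_class=FailureClass.NETWORK,
    checkpoint_written=True,
    connector_idempotent=True,
)
step = next_step(record)
```

The `checkpoint_written` field is recorded for the reviewer rather than used in the policy, since its effect on rerun safety varies by connector.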
Upgrade policy
- Patch upgrades should be safe for existing manifests.
- Minor upgrades may change behavior during 0.x, but must document migration steps in CHANGELOG.md.
- Breaking changes must include a compatibility note and migration example.
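A promotion script could use the policy above to decide when CHANGELOG.md needs a manual read before an upgrade. The sketch below is an assumption-laden illustration: it handles only plain MAJOR.MINOR.PATCH version strings (no pre-release or build suffixes), and the function names and version numbers are hypothetical.

```python
def parse_version(version):
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of three ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def upgrade_kind(current, target):
    """Classify an upgrade as 'major', 'minor', or 'patch'."""
    cur, tgt = parse_version(current), parse_version(target)
    if tgt <= cur:
        raise ValueError(f"{target} is not newer than {current}")
    if tgt[0] != cur[0]:
        return "major"
    if tgt[1] != cur[1]:
        return "minor"
    return "patch"


def needs_changelog_review(current, target):
    """Patch upgrades should be safe; anything larger warrants a review."""
    return upgrade_kind(current, target) != "patch"


# Illustrative version numbers only.
kind = upgrade_kind("0.4.2", "0.5.0")
review = needs_changelog_review("0.4.2", "0.5.0")
```

Treating every non-patch upgrade as review-worthy matches the note that minor releases may change behavior during 0.x.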
Secrets
Never commit or paste credentials into manifests, docs, issues, pull requests, or logs. If a credential is exposed:
- Revoke it immediately.
- Rotate dependent systems.
- Run repository and history secret scans.
- Document the incident impact.