Build reliable data pipelines without hiding the machinery.
dpone is a production-oriented Python framework for moving data between databases, APIs, Kafka, and analytical targets with explicit state, staging-first loads, schema evolution, reconciliation, quality gates, and operational artifacts.
What dpone is built for
Staging-first loading
Every database sink uses staging or shadow-table flows before target promotion, so heavy writes are kept away from final tables until commit time.
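To make the staging-then-promote idea concrete, here is a minimal sketch using only the standard-library sqlite3 module. It is not dpone's implementation: the function name load_via_staging, the table names, and the two-column layout are all hypothetical, chosen only to show bulk writes landing in a staging table before a single short promotion transaction touches the target.

```python
import sqlite3


def load_via_staging(conn, rows):
    """Illustrative only: bulk-write to staging, then promote in one transaction."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS target (id INTEGER PRIMARY KEY, name TEXT)"
    )
    # Heavy writes go to a throwaway staging table, not the final table.
    conn.execute("DROP TABLE IF EXISTS staging")
    conn.execute("CREATE TABLE staging (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO staging (id, name) VALUES (?, ?)", rows)
    conn.commit()

    # Promotion: one transaction that commits on success and rolls back on error.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO target (id, name) "
            "SELECT id, name FROM staging"
        )
    conn.execute("DROP TABLE staging")
    return conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
```

If the promotion step fails, the target is left as it was and the staging table can be inspected or discarded, which is the property the paragraph above describes.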
Explicit state
XMin, Kafka offsets, CDC offsets, run state, and source cursors can be persisted in supported state backends instead of disappearing into logs.
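As a rough illustration of what persisting a source cursor buys you, the sketch below stores a high-water mark in a JSON file and uses it to skip rows that were already extracted. FileStateStore, extract_incremental, and the key "orders.cursor" are hypothetical names for this example; dpone's actual state backends and their API are not shown here.

```python
import json
from pathlib import Path


class FileStateStore:
    """Hypothetical state backend: one JSON file holding key/value state."""

    def __init__(self, path):
        self.path = Path(path)

    def load(self, key, default=None):
        if not self.path.exists():
            return default
        return json.loads(self.path.read_text()).get(key, default)

    def save(self, key, value):
        state = json.loads(self.path.read_text()) if self.path.exists() else {}
        state[key] = value
        # Write to a temp file and rename so a crash never leaves a half-written file.
        tmp = self.path.with_suffix(".tmp")
        tmp.write_text(json.dumps(state))
        tmp.replace(self.path)


def extract_incremental(rows, store, key="orders.cursor"):
    """Return rows with an id above the stored cursor, then advance the cursor."""
    last_seen = store.load(key, 0)
    new_rows = [row for row in rows if row["id"] > last_seen]
    if new_rows:
        store.save(key, max(row["id"] for row in new_rows))
    return new_rows
```

Because the cursor lives in the store rather than in process memory or log output, a later run (or a different process) resumes from the same position.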
Schema evolution
Safe additions and widening are automated by default. Breaking type changes can fail fast or route into __dpone__nc__* generated columns.
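The distinction between safe and breaking changes can be sketched as a small planning function. Everything here is an assumption made for illustration: the WIDENING table, the plan_schema_change function, the on_breaking parameter, and the way the generated column name is built from the __dpone__nc__ prefix. dpone's real rules and naming may differ.

```python
# Hypothetical set of (old_type, new_type) pairs treated as lossless widening.
WIDENING = {
    ("int32", "int64"),
    ("float32", "float64"),
    ("date", "timestamp"),
}


def plan_schema_change(current, incoming, on_breaking="route"):
    """Compare two {column: type} mappings and return a list of planned actions."""
    actions = []
    for column, new_type in incoming.items():
        old_type = current.get(column)
        if old_type is None:
            actions.append(("add", column, new_type))      # safe addition
        elif old_type == new_type:
            continue                                        # nothing to do
        elif (old_type, new_type) in WIDENING:
            actions.append(("widen", column, new_type))    # safe widening
        elif on_breaking == "fail":
            raise ValueError(
                f"breaking change on {column}: {old_type} -> {new_type}"
            )
        else:
            # Assumed naming: prefix from the docs plus the original column name.
            actions.append(("route", column, f"__dpone__nc__{column}"))
    return actions
```

Additions and widenings come back as ordinary actions, while a breaking change either raises immediately or is redirected to a separate generated column, depending on on_breaking.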
Operational UX
Doctor, plan, run reports, quality checks, state inspection, connector certification, and performance advice are first-class workflows.
Fast paths
| Need | Start here |
|---|---|
| Install and smoke-test dpone | Installation |
| Run the first manifest | Quickstart |
| Execute from CLI or Python | Running pipelines |
| Try a local database pipeline | First local pipeline |
| Configure database/API/Kafka credentials | Connections and credentials |
| Choose a pipeline combination | Source -> sink matrix |
| Pick append/upsert/replace semantics | Load strategies |
| Split nested JSON into root/child tables | Nested normalization |
| Understand type conversion | Type mapping matrix |
| Control automatic type detection | Type inference |
| Declare explicit column contracts | Schema contracts |
| Tune target DDL, indexes, compression, and storage | Physical design |
| Enforce row contracts and quarantine bad rows | Runtime data contracts |
| Keep contracts safe on streaming/native fast paths | Streaming-safe contracts |
| Apply physical DDL safely | Physical DDL apply |
| Use production-safe defaults | Production profiles |
| Bundle run evidence for certification | Unified run evidence |
| Decide whether a minor/major release can ship | Release evidence |
| Use PostgreSQL as source, sink, or state | PostgreSQL guide |
| Use SQL Server as source, sink, or state | MSSQL guide |
| Use BigQuery as analytical sink or state backend | BigQuery guide |
| Use ClickHouse as analytical source or sink | ClickHouse guide |
| Use bounded Kafka batch source/sink | Kafka guide |
| Use Postgres transaction-ID incremental extraction | Postgres XMin |
| Export Prometheus and OpenTelemetry runtime metrics | Runtime observability |
| Operate certification, recovery, reconciliation, deployment, and catalog evidence | Operational control plane |
| Prove connectors with local-live/real-local/vendor-live gates | Live certification |
| Stage large files through S3/GCS/Azure | Object storage staging |
| Produce SBOM/provenance/signing evidence | Supply-chain evidence |
| Run certification, contracts, quarantine, rollback, and marketplace controls | dpone ops |
| Prepare for production operations | Production readiness |
| Understand CI/CD and release automation | CI/CD |
Install
```shell
pip install dpone
dpone --version
dpone -v
pip install "dpone[postgres,mssql,clickhouse,kafka,gcp,s3,azure,pandas,vault]"
```
Local documentation preview
The GitHub Pages workflow builds the same site with mkdocs build --strict on every docs pull request and deploys from master.