Manifest basics

dpone pipelines are driven by YAML manifests. A manifest says where data comes from, where it goes, how it should be loaded, and where state should be stored.

Minimal shape

```yaml
source:
  type: postgres
  connection_id: postgres_oltp
  connection_type: env
  table:
    schema: public
    name: orders

sink:
  type: mssql
  connection_id: mssql_dwh
  connection_type: env
  table:
    schema: landing
    name: orders
  strategy:
    mode: incremental_append
```
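`dpone manifest validate` is the real check for a manifest. As a rough illustration of what the minimal shape contains, the sketch below takes a manifest that has already been parsed into a Python dict (for example by a YAML loader) and reports which fields from the example are missing. Treating exactly these fields as required is an assumption, and `REQUIRED_PATHS` and `missing_fields` are hypothetical helpers, not part of dpone.

```python
from typing import Any

# Field paths copied from the minimal manifest shape. Treating them as the
# required set is an assumption made for this sketch only.
REQUIRED_PATHS = [
    ("source", "type"),
    ("source", "connection_id"),
    ("source", "connection_type"),
    ("source", "table", "schema"),
    ("source", "table", "name"),
    ("sink", "type"),
    ("sink", "connection_id"),
    ("sink", "connection_type"),
    ("sink", "table", "schema"),
    ("sink", "table", "name"),
    ("sink", "strategy", "mode"),
]


def missing_fields(manifest: dict[str, Any]) -> list[str]:
    """Return the dotted paths from REQUIRED_PATHS that are absent or empty."""
    missing = []
    for path in REQUIRED_PATHS:
        node: Any = manifest
        for key in path:
            # Stop walking as soon as a level is missing or is not a mapping.
            if not isinstance(node, dict) or key not in node:
                node = None
                break
            node = node[key]
        if node in (None, ""):
            missing.append(".".join(path))
    return missing


# Usage: the minimal manifest with the sink strategy left out.
manifest = {
    "source": {"type": "postgres", "connection_id": "postgres_oltp",
               "connection_type": "env",
               "table": {"schema": "public", "name": "orders"}},
    "sink": {"type": "mssql", "connection_id": "mssql_dwh",
             "connection_type": "env",
             "table": {"schema": "landing", "name": "orders"}},
}
print(missing_fields(manifest))  # ['sink.strategy.mode']
```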

Core sections

| Section | Purpose |
| --- | --- |
| `source` | Extract data from a database, API, or Kafka topic. |
| `sink` | Load rows into a database, warehouse, or Kafka topic. |
| `state` | Store incremental offsets, XMin cursors, CDC and Kafka offsets, and run metadata. |
| `quality` | Run checks such as not-null, uniqueness, freshness, and row-count deltas. |
| `observability` | Configure run artifacts, OpenTelemetry, and Prometheus output. |
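The table names the `quality` checks but not their manifest syntax. To show what each kind of check asserts, the following sketch implements them as plain Python functions over in-memory rows. The function names, the thresholds, and the choice to pass a row-count check when there is no previous count are assumptions for illustration; this is neither dpone's implementation nor its configuration format.

```python
from datetime import datetime, timedelta, timezone
from typing import Any

Row = dict[str, Any]


def not_null(rows: list[Row], column: str) -> bool:
    """Pass when no row has a missing or None value in `column`."""
    return all(row.get(column) is not None for row in rows)


def unique(rows: list[Row], column: str) -> bool:
    """Pass when every value of `column` appears at most once (values must be hashable)."""
    values = [row.get(column) for row in rows]
    return len(values) == len(set(values))


def fresh(rows: list[Row], column: str, max_age: timedelta, now: datetime) -> bool:
    """Pass when the newest timestamp in `column` is no older than `max_age`."""
    stamps = [row[column] for row in rows if row.get(column) is not None]
    return bool(stamps) and now - max(stamps) <= max_age


def row_count_delta_ok(previous: int, current: int, max_ratio: float) -> bool:
    """Pass when the row count moved by at most `max_ratio` relative to `previous`."""
    if previous == 0:
        return True  # assumption: no baseline yet, so the check cannot fail
    return abs(current - previous) / previous <= max_ratio


# Usage on two hypothetical order rows.
now = datetime(2024, 1, 2, tzinfo=timezone.utc)
rows = [
    {"order_id": 1, "updated_at": datetime(2024, 1, 1, 23, 0, tzinfo=timezone.utc)},
    {"order_id": 2, "updated_at": datetime(2024, 1, 1, 22, 0, tzinfo=timezone.utc)},
]
results = {
    "not_null(order_id)": not_null(rows, "order_id"),
    "unique(order_id)": unique(rows, "order_id"),
    "fresh(updated_at, 2h)": fresh(rows, "updated_at", timedelta(hours=2), now),
    "row_count_delta(<=50%)": row_count_delta_ok(previous=3, current=len(rows), max_ratio=0.5),
}
print(all(results.values()))  # True
```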

Strategy section

```yaml
sink:
  strategy:
    mode: incremental_merge
    unique_key: order_id
```

Common strategies:

| Strategy | Meaning |
| --- | --- |
| `full_refresh` | Replace the target dataset with the current source snapshot. |
| `incremental_append` | Append new rows without updating existing target rows. |
| `incremental_merge` | Upsert rows by unique key. |
| `replace` | Replace a configured target slice. |
| `partition_replace` | Replace target partitions represented by staged partition values. |
| `xmin` | PostgreSQL XMin incremental extraction. |
| `cdc` | Consume change data capture (CDC) events from the source, when CDC is enabled. |

Read Load strategies for exact semantics and source/sink support.
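To make the target-side differences between the main modes concrete, the sketch below models a target table as a Python list of dicts and applies one extracted batch. It deliberately ignores staging tables, transactions, and deletes, and it does not model `replace`, `xmin`, or `cdc`. For `incremental_append` it assumes the batch already holds only new rows; how dpone decides what is new is not modeled. The function names and the `partition_key` parameter are hypothetical, and Load strategies remains the reference for exact semantics.

```python
from typing import Any

Row = dict[str, Any]


def full_refresh(target: list[Row], batch: list[Row]) -> list[Row]:
    # The target becomes exactly the current source snapshot.
    return list(batch)


def incremental_append(target: list[Row], batch: list[Row]) -> list[Row]:
    # Existing rows are untouched; the batch (assumed to be new rows only) is appended.
    return target + batch


def incremental_merge(target: list[Row], batch: list[Row], unique_key: str) -> list[Row]:
    # Upsert: a batch row replaces the target row with the same key,
    # and rows with unseen keys are inserted.
    merged = {row[unique_key]: row for row in target}
    for row in batch:
        merged[row[unique_key]] = row
    return list(merged.values())


def partition_replace(target: list[Row], batch: list[Row], partition_key: str) -> list[Row]:
    # Drop every target partition whose value appears in the staged batch,
    # then insert the batch. Partitions absent from the batch are kept.
    staged = {row[partition_key] for row in batch}
    kept = [row for row in target if row[partition_key] not in staged]
    return kept + batch


# Usage: order 2 changes status and order 3 is new, both on 2024-01-02.
target = [
    {"order_id": 1, "status": "new", "day": "2024-01-01"},
    {"order_id": 2, "status": "new", "day": "2024-01-02"},
]
batch = [
    {"order_id": 2, "status": "paid", "day": "2024-01-02"},
    {"order_id": 3, "status": "new", "day": "2024-01-02"},
]
print(len(full_refresh(target, batch)))                                     # 2
print(len(incremental_append(target, batch)))                               # 4
print([r["status"] for r in incremental_merge(target, batch, "order_id")])  # ['new', 'paid', 'new']
print([r["order_id"] for r in partition_replace(target, batch, "day")])     # [1, 2, 3]
```

Note that `incremental_append` duplicates order 2 here, which is why mutable rows usually call for `incremental_merge` with a `unique_key`.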

Schema evolution defaults

Schema evolution is enabled by default for safe changes, while breaking changes and column type changes fail the load:

```yaml
sink:
  options:
    schema_evolution:
      enabled: true
      on_breaking: fail
      on_type_change: fail
```

Use `on_type_change: new_column` only when you intentionally want incompatible source values written to a separate `__dpone__nc__<column>` column instead of failing the load.
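The difference between the two `on_type_change` values can be sketched at the level of a single row. Only the `fail` and `new_column` values and the `__dpone__nc__<column>` naming come from this page. Approximating "incompatible" with a Python `isinstance` check against a hypothetical map of target column types, and leaving the original column null for the affected row, are assumptions; dpone's real type-compatibility rules and DDL are not shown.

```python
from typing import Any

NEW_COLUMN_PREFIX = "__dpone__nc__"  # naming pattern described above


def route_row(row: dict[str, Any], target_types: dict[str, type],
              on_type_change: str) -> dict[str, Any]:
    """Apply an on_type_change policy to one row against known target column types.

    A value counts as incompatible when it is not an instance of the target
    column's Python type (a stand-in for database-specific type rules).
    """
    out: dict[str, Any] = {}
    for column, value in row.items():
        expected = target_types.get(column)
        if expected is None or value is None or isinstance(value, expected):
            out[column] = value  # unknown column, null, or compatible value
        elif on_type_change == "fail":
            raise TypeError(
                f"column {column!r}: {type(value).__name__} is not {expected.__name__}"
            )
        elif on_type_change == "new_column":
            out[column] = None  # assumption: original column stays null for this row
            out[NEW_COLUMN_PREFIX + column] = value
        else:
            raise ValueError(f"unsupported on_type_change policy: {on_type_change!r}")
    return out


# Usage: a text amount arrives for a column typed as float.
target_types = {"order_id": int, "amount": float}
row = {"order_id": 7, "amount": "12,50"}
print(route_row(row, target_types, "new_column"))
# {'order_id': 7, 'amount': None, '__dpone__nc__amount': '12,50'}
```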

Batch manifests

Variant C batch manifests let one YAML file describe many related processes with defaults, variables, and overrides.

```yaml
layer_defaults:
  target_schema: landing

vars:
  batch_size: 50000

schemas:
  - source_schema: public
    tables:
      - source_table: orders
      - source_table: customers
```

JSON Schema: `src/dpone/schema/etl-batch-manifest.schema.json`

Deep dive: Variant C manifests
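How defaults, variables, and overrides actually combine is defined by the JSON Schema and the Variant C deep dive. As a rough mental model only, the sketch below expands the batch example (already parsed into a dict) into one flat config per table. The precedence order (`layer_defaults`, then schema-level keys, then table-level keys), the per-table `vars` override, the `target_schema: crm` override on `customers`, and the `expand_batch` name are all hypothetical.

```python
from typing import Any


def expand_batch(manifest: dict[str, Any]) -> list[dict[str, Any]]:
    """Expand a Variant C-style batch manifest into one flat config per table.

    Assumed precedence: layer_defaults < schema entry < table entry.
    """
    defaults = manifest.get("layer_defaults", {})
    variables = manifest.get("vars", {})
    processes = []
    for schema_entry in manifest.get("schemas", []):
        # Schema-level keys (e.g. source_schema) apply to every table beneath them.
        schema_level = {k: v for k, v in schema_entry.items() if k != "tables"}
        for table_entry in schema_entry.get("tables", []):
            process = {**defaults, **schema_level, **table_entry}
            # Every process sees the shared vars; a table-level "vars" mapping,
            # if present, overrides them (hypothetical mechanism).
            process["vars"] = {**variables, **table_entry.get("vars", {})}
            processes.append(process)
    return processes


# Usage: the example above, plus a hypothetical override for customers.
batch_manifest = {
    "layer_defaults": {"target_schema": "landing"},
    "vars": {"batch_size": 50000},
    "schemas": [
        {"source_schema": "public", "tables": [
            {"source_table": "orders"},
            {"source_table": "customers", "target_schema": "crm"},
        ]},
    ],
}
for process in expand_batch(batch_manifest):
    print(process)
# {'target_schema': 'landing', 'source_schema': 'public', 'source_table': 'orders', 'vars': {'batch_size': 50000}}
# {'target_schema': 'crm', 'source_schema': 'public', 'source_table': 'customers', 'vars': {'batch_size': 50000}}
```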

Validation and planning

```bash
dpone manifest validate manifests/orders.yaml
dpone plan manifests/orders.yaml
```

Run `dpone plan` before the first write to inspect the source queries, staging tables, schema evolution DDL, reconciliation behavior, state transitions, and quality gates that the run will use.