Configuration reference

This page is a compact map of the main manifest configuration sections. It is intentionally shorter than the deep-dive docs and links out when a section has complex behavior.

Top-level sections

| Section | Required | Description |
| --- | --- | --- |
| source | yes | Source connector, credentials, source object, and extraction options. |
| sink | yes | Sink connector, credentials, target object, load strategy, and sink options. |
| state | no | Explicit state backend for XMin, CDC, Kafka offsets, and run state. |
| quality | no | Data quality checks and fail/warn behavior. |
| observability | no | Run artifacts, OpenTelemetry, Prometheus, and report settings. |
| performance | no | Performance advisor options and runtime hints. |
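
The required/optional split above can be checked before a run. The following is a minimal sketch, not part of the tool: it assumes the manifest has already been parsed into a Python dict, and the function name and message wording are hypothetical. It deliberately does not reject unlisted keys, because this page covers only the main sections.

```python
# Hypothetical pre-flight check; names and messages are illustrative only.
REQUIRED_SECTIONS = {"source", "sink"}
OPTIONAL_SECTIONS = {"state", "quality", "observability", "performance"}


def check_top_level(manifest: dict) -> list[str]:
    """Return one message per required top-level section that is missing."""
    missing = REQUIRED_SECTIONS - set(manifest)
    return [f"missing required section: {name}" for name in sorted(missing)]


def optional_sections_present(manifest: dict) -> list[str]:
    """List the optional sections from the table that this manifest sets."""
    return sorted(OPTIONAL_SECTIONS & set(manifest))
```

For example, a manifest containing only `source` and `quality` would report `sink` as missing and `quality` as the one optional section in use.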

Source configuration

source:
  type: postgres
  connection_id: postgres_oltp
  connection_type: env
  table:
    schema: public
    name: orders
  options:
    incremental_column: updated_at
    batch_size: 50000

Source types include postgres, mssql, clickhouse, api, and kafka.

Sink configuration

sink:
  type: mssql
  connection_id: mssql_dwh
  connection_type: env
  table:
    schema: landing
    name: orders
  strategy:
    mode: incremental_merge
    unique_key: order_id

Sink types include postgres, mssql, clickhouse, bigquery, and kafka.
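
The source and sink lists differ: `api` appears only as a source and `bigquery` only as a sink. As a rough sketch, a manifest dict could be checked against both lists as follows. This assumes the lists on this page are exhaustive, which the page does not state, and the function name is hypothetical.

```python
# Connector names are copied from this page; treating them as the complete
# set of supported types is an assumption made for illustration.
SOURCE_TYPES = {"postgres", "mssql", "clickhouse", "api", "kafka"}
SINK_TYPES = {"postgres", "mssql", "clickhouse", "bigquery", "kafka"}


def check_connector_types(manifest: dict) -> list[str]:
    """Report source/sink types that are not in the lists above."""
    problems = []
    source_type = manifest.get("source", {}).get("type")
    sink_type = manifest.get("sink", {}).get("type")
    if source_type not in SOURCE_TYPES:
        problems.append(f"unsupported source type: {source_type!r}")
    if sink_type not in SINK_TYPES:
        problems.append(f"unsupported sink type: {sink_type!r}")
    return problems
```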

Strategy configuration

| Mode | Common required fields | Read more |
| --- | --- | --- |
| full_refresh | none | Load strategies |
| incremental_append | source cursor or bounded source | Load strategies |
| incremental_merge | unique_key | Load strategies |
| replace | custom_predicate or replacement scope | Load strategies |
| partition_replace | partition.column and complete partition slice | Load strategies |
| snapshot_diff | unique_key and row hash comparison | Load strategies |
| scd2 | unique_key and SCD2 validity columns | Load strategies |
| cdc_apply | normalized insert/update/delete events | Reconciliation and CDC |
| backfill | chunk definition and resumable state | Load strategies |
| xmin | PostgreSQL source | Postgres XMin |
| cdc | CDC-enabled source | Reconciliation and CDC |
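
Some rows in the table name a concrete key (`unique_key`, `partition.column`), while others describe a condition on the source or the data. The sketch below checks only the concrete keys. It assumes a parsed `strategy` dict and assumes that `partition.column` is nested under `strategy`, which this page does not confirm. The helper names are hypothetical.

```python
# Only modes whose table row names a concrete key are listed here.
# Placing partition.column under the strategy block is an assumption.
REQUIRED_STRATEGY_KEYS = {
    "incremental_merge": ["unique_key"],
    "snapshot_diff": ["unique_key"],
    "scd2": ["unique_key"],
    "partition_replace": ["partition.column"],
}


def lookup(mapping, dotted_key):
    """Resolve a dotted key such as 'partition.column' in nested dicts."""
    current = mapping
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def missing_strategy_keys(strategy: dict) -> list[str]:
    """Return the required keys that are absent or empty for this mode."""
    required = REQUIRED_STRATEGY_KEYS.get(strategy.get("mode"), [])
    return [key for key in required if lookup(strategy, key) in (None, "", [])]
```

With the sink example above, `missing_strategy_keys({"mode": "incremental_merge", "unique_key": "order_id"})` returns an empty list.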

Credentials

connection_type: env

Supported providers are env, params, airflow, and vault. Start with Credentials quickstart, then use Connections and credentials for all fields and runbooks.

Schema evolution

sink:
  options:
    schema_evolution:
      enabled: true
      mode: widening
      on_breaking: fail
      on_type_change: fail

Read Schema evolution before enabling generated new-column mode for incompatible type changes.

Type inference

sink:
  options:
    type_inference:
      enabled: true
      prefer_source_metadata: true
      sample_rows: 10000
      empty_string_is_null: false
      conflict_policy: fail

Read Type inference for precedence, confidence, source metadata, sampled rows, and empty string vs NULL behavior.

Schema contracts

schema_contract:
  enforcement: strict
  columns:
    amount:
      type: decimal
      precision: 18
      scale: 4
      nullable: false

Read Schema contracts for explicit logical column contracts, enforcement modes, and generated variant columns.

Physical design

sink:
  options:
    physical_design:
      apply: online
      indexes:
        primary_key: [order_id]
      storage:
        clickhouse:
          low_cardinality:
            mode: auto

Read Physical design for target DDL controls: partitioning, indexes, storage, compression, ClickHouse LowCardinality, and BigQuery clustering.

Load lineage

sink:
  options:
    lineage:
      enabled: true
      preset: standard
      features:
        quarantine: true

Read Load lineage for canonical __dpone__* columns, run_id, load_id, row identity, audit lifecycle, and migration guidance for legacy meta__* columns.

Quality checks

quality:
  mode: fail
  checks:
    - type: not_null
      columns: [order_id]
    - type: unique
      columns: [order_id]

Read Quality metrics for supported checks and artifact behavior.
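
To make the two check types concrete, here is a small sketch of what `not_null` and `unique` could mean over a batch of rows. It is an illustration under stated assumptions, not the tool's implementation: rows are plain dicts, `unique` is evaluated on the combination of listed columns, `mode: fail` raises on any finding, and any other mode (for example a hypothetical `warn`) returns the findings instead.

```python
from collections import Counter


def run_checks(rows: list[dict], quality: dict) -> list[str]:
    """Evaluate not_null and unique checks; raise if mode is 'fail'."""
    findings = []
    for check in quality.get("checks", []):
        columns = check["columns"]
        if check["type"] == "not_null":
            for column in columns:
                nulls = sum(1 for row in rows if row.get(column) is None)
                if nulls:
                    findings.append(f"not_null({column}): {nulls} null value(s)")
        elif check["type"] == "unique":
            # Count each key tuple; a key seen more than once is a duplicate.
            counts = Counter(tuple(row.get(c) for c in columns) for row in rows)
            duplicates = sum(1 for n in counts.values() if n > 1)
            if duplicates:
                findings.append(
                    f"unique({', '.join(columns)}): {duplicates} duplicated key(s)"
                )
    if findings and quality.get("mode") == "fail":
        raise ValueError("; ".join(findings))
    return findings
```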