Configuration reference
This page is a compact map of the main manifest configuration sections. It is intentionally shorter than the deep-dive docs and links out when a section has complex behavior.
Top-level sections
| Section | Required | Description |
|---|---|---|
| `source` | yes | Source connector, credentials, source object, and extraction options. |
| `sink` | yes | Sink connector, credentials, target object, load strategy, and sink options. |
| `state` | no | Explicit state backend for XMin, CDC, Kafka offsets, and run state. |
| `quality` | no | Data quality checks and fail/warn behavior. |
| `observability` | no | Run artifacts, OpenTelemetry, Prometheus, and report settings. |
| `performance` | no | Performance advisor options and runtime hints. |
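As a rough illustration of the table above, the following sketch checks a parsed manifest for the two required sections and reports any top-level keys the table does not list. The helper name `check_sections` is hypothetical, and because the table covers only the main sections, unlisted keys are reported rather than rejected.

```python
# Section names and required flags are copied from the table above.
SECTIONS = {
    "source": True,
    "sink": True,
    "state": False,
    "quality": False,
    "observability": False,
    "performance": False,
}


def check_sections(manifest: dict) -> dict:
    """Return missing required sections and keys not listed in the table."""
    missing = [name for name, required in SECTIONS.items()
               if required and name not in manifest]
    unlisted = [key for key in manifest if key not in SECTIONS]
    return {"missing": missing, "unlisted": unlisted}


if __name__ == "__main__":
    # A manifest without a sink section is reported as incomplete.
    print(check_sections({"source": {"type": "postgres"}, "quality": {}}))
```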
Source configuration
source:
  type: postgres
  connection_id: postgres_oltp
  connection_type: env
  table:
    schema: public
    name: orders
  options:
    incremental_column: updated_at
    batch_size: 50000
Source types include postgres, mssql, clickhouse, api, and kafka.
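To make `incremental_column` and `batch_size` more concrete, here is a minimal sketch of one common way to read a table incrementally: fetch rows whose cursor column is greater than the last value seen, one batch at a time. It uses an in-memory SQLite table only so that the example runs anywhere. The function `read_batches` and the query shape are illustrative assumptions and do not describe how the connector actually extracts data.

```python
import sqlite3
from typing import Iterator


def read_batches(conn, table, incremental_column, last_value, batch_size) -> Iterator[list]:
    """Yield batches of rows with incremental_column > last_value, oldest first."""
    # Identifiers are interpolated directly; this sketch treats them as trusted.
    query = (
        f"SELECT * FROM {table} "
        f"WHERE {incremental_column} > ? "
        f"ORDER BY {incremental_column} LIMIT ?"
    )
    cursor = conn.cursor()
    cursor.row_factory = sqlite3.Row  # allows access to columns by name
    while True:
        rows = cursor.execute(query, (last_value, batch_size)).fetchall()
        if not rows:
            return
        yield rows
        # Advance the cursor. Assumes cursor values are unique: rows that tie
        # with the last value of a batch would be skipped by the strict ">".
        last_value = rows[-1][incremental_column]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, updated_at TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(i, f"2024-01-{i:02d}") for i in range(1, 6)],
    )
    for batch in read_batches(conn, "orders", "updated_at", "2024-01-02", 2):
        print([tuple(row) for row in batch])
```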
Sink configuration
sink:
  type: mssql
  connection_id: mssql_dwh
  connection_type: env
  table:
    schema: landing
    name: orders
  strategy:
    mode: incremental_merge
    unique_key: order_id
Sink types include postgres, mssql, clickhouse, bigquery, and kafka.
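The `incremental_merge` mode with a `unique_key` can be read as an upsert: incoming rows replace target rows that share the key, and rows with new keys are added. The sketch below shows that idea on in-memory lists of dicts. It is a conceptual illustration only; `merge_rows` is a hypothetical helper, and a real sink would presumably perform the merge inside the target database.

```python
def merge_rows(target: list[dict], batch: list[dict], unique_key: str) -> list[dict]:
    """Upsert batch into target by unique_key, keeping first-seen key order."""
    merged = {row[unique_key]: row for row in target}
    for row in batch:
        merged[row[unique_key]] = row  # update an existing key or insert a new one
    return list(merged.values())


if __name__ == "__main__":
    target = [{"order_id": 1, "status": "new"}, {"order_id": 2, "status": "new"}]
    batch = [{"order_id": 2, "status": "paid"}, {"order_id": 3, "status": "new"}]
    for row in merge_rows(target, batch, "order_id"):
        print(row)
```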
Strategy configuration
| Mode | Common required fields | Read more |
|---|---|---|
| `full_refresh` | none | Load strategies |
| `incremental_append` | source cursor or bounded source | Load strategies |
| `incremental_merge` | `unique_key` | Load strategies |
| `replace` | `custom_predicate` or replacement scope | Load strategies |
| `partition_replace` | `partition.column` and complete partition slice | Load strategies |
| `snapshot_diff` | `unique_key` and row hash comparison | Load strategies |
| `scd2` | `unique_key` and SCD2 validity columns | Load strategies |
| `cdc_apply` | normalized insert/update/delete events | Reconciliation and CDC |
| `backfill` | chunk definition and resumable state | Load strategies |
| `xmin` | PostgreSQL source | Postgres XMin |
| `cdc` | CDC-enabled source | Reconciliation and CDC |
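The table can be turned into a simple pre-flight check. The sketch below covers only the modes whose row names a concrete field (`unique_key` or `partition.column`) and assumes those fields sit directly inside the strategy mapping, with `partition.column` read as a nested key. Both helper names are hypothetical, and requirements such as a source cursor or a replacement scope are not modeled.

```python
# Only modes whose table row names a concrete field are listed here.
REQUIRED_FIELDS = {
    "incremental_merge": ["unique_key"],
    "snapshot_diff": ["unique_key"],
    "scd2": ["unique_key"],
    "partition_replace": ["partition.column"],
}


def lookup(mapping: dict, dotted_path: str):
    """Follow a dotted path such as 'partition.column'; return None if absent."""
    current = mapping
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def missing_strategy_fields(strategy: dict) -> list[str]:
    """List the documented required fields that the strategy block lacks."""
    required = REQUIRED_FIELDS.get(strategy.get("mode"), [])
    return [path for path in required if lookup(strategy, path) is None]


if __name__ == "__main__":
    print(missing_strategy_fields({"mode": "incremental_merge", "unique_key": "order_id"}))
    print(missing_strategy_fields({"mode": "partition_replace"}))
```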
Credentials
Supported providers are env, params, airflow, and vault. Start with Credentials quickstart, then use Connections and credentials for all fields and runbooks.
Schema evolution
sink:
  options:
    schema_evolution:
      enabled: true
      mode: widening
      on_breaking: fail
      on_type_change: fail
Read Schema evolution before enabling generated new-column mode for incompatible type changes.
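To illustrate what a `widening` mode combined with `on_breaking: fail` might mean in practice, the sketch below classifies a column type change as widening or breaking and picks an action. The type ladders, the `classify_change` and `decide` helpers, and the action names `noop` and `alter` are all assumptions made for this example; `on_type_change` is not modeled. The Schema evolution page defines the actual rules.

```python
# Hypothetical widening ladders: a type may widen to any later type in its list.
WIDENING_LADDERS = [
    ["smallint", "integer", "bigint"],
    ["real", "double"],
]


def classify_change(old_type: str, new_type: str) -> str:
    """Return 'none', 'widening', or 'breaking' for a column type change."""
    if old_type == new_type:
        return "none"
    for ladder in WIDENING_LADDERS:
        if old_type in ladder and new_type in ladder:
            if ladder.index(new_type) > ladder.index(old_type):
                return "widening"
    return "breaking"


def decide(old_type: str, new_type: str, on_breaking: str = "fail") -> str:
    """Map a type change to 'noop', 'alter', or the configured on_breaking policy."""
    kind = classify_change(old_type, new_type)
    if kind == "none":
        return "noop"
    if kind == "widening":
        return "alter"
    return on_breaking


if __name__ == "__main__":
    print(decide("integer", "bigint"))   # widening change
    print(decide("bigint", "integer"))   # narrowing counts as breaking
```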
Type inference
sink:
  options:
    type_inference:
      enabled: true
      prefer_source_metadata: true
      sample_rows: 10000
      empty_string_is_null: false
      conflict_policy: fail
Read Type inference for precedence, confidence, source metadata, sampled rows, and empty string vs NULL behavior.
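The interaction between `sample_rows`, `empty_string_is_null`, and `conflict_policy` is easier to see in a toy example. The sketch below infers a type from sampled string values. It is an assumption-laden illustration, not the tool's algorithm: the three type names, the rule that integers and floats are compatible, and the fallback to `string` for any policy other than `fail` are all invented for this example.

```python
from itertools import islice


def _classify(value) -> str:
    """Classify one value as 'integer', 'float', or 'string'."""
    for name, parser in (("integer", int), ("float", float)):
        try:
            parser(value)
            return name
        except (TypeError, ValueError):
            continue
    return "string"


def infer_type(values, sample_rows=10000, empty_string_is_null=False,
               conflict_policy="fail"):
    """Infer a column type from at most sample_rows values; None if no data."""
    seen = set()
    for value in islice(values, sample_rows):
        if value is None or (value == "" and empty_string_is_null):
            continue  # NULLs carry no type information
        seen.add(_classify(value))
    if not seen:
        return None
    if len(seen) == 1:
        return seen.pop()
    if seen == {"integer", "float"}:
        return "float"  # treated as compatible in this sketch
    if conflict_policy == "fail":
        raise ValueError(f"conflicting sampled types: {sorted(seen)}")
    return "string"  # hypothetical fallback for any other policy


if __name__ == "__main__":
    print(infer_type(["1", "", "3"], empty_string_is_null=True))
    try:
        infer_type(["1", "", "3"], empty_string_is_null=False)
    except ValueError as exc:
        print(exc)
```

With `empty_string_is_null` left at `false`, the empty string counts as a string value and conflicts with the integers, so the `fail` policy raises in this sketch.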
Schema contracts
schema_contract:
  enforcement: strict
  columns:
    amount:
      type: decimal
      precision: 18
      scale: 4
      nullable: false
Read Schema contracts for explicit logical column contracts, enforcement modes, and generated variant columns.
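As a worked example of the `amount` contract above, the sketch below tests whether a value fits a non-nullable decimal with precision 18 and scale 4, that is, at most 4 fractional digits and at most 14 integer digits. The helper `violates_contract` is hypothetical, it counts digits as written (so trailing zeros count), and how the `strict` enforcement mode reacts to a violation is left to the Schema contracts page.

```python
from decimal import Decimal


def violates_contract(value, precision=18, scale=4, nullable=False):
    """Return a reason string if value breaks the contract, else None."""
    if value is None:
        return None if nullable else "null not allowed"
    _sign, digits, exponent = Decimal(str(value)).as_tuple()
    if not isinstance(exponent, int):
        return "not a finite number"  # NaN and Infinity have non-integer exponents
    fraction_digits = max(0, -exponent)
    integer_digits = max(0, len(digits) + exponent)
    if fraction_digits > scale:
        return f"more than {scale} fractional digits"
    if integer_digits > precision - scale:
        return f"more than {precision - scale} integer digits"
    return None


if __name__ == "__main__":
    for candidate in ["123.4500", "1.23456", None]:
        print(candidate, "->", violates_contract(candidate))
```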
Physical design
sink:
  options:
    physical_design:
      apply: online
      indexes:
        primary_key: [order_id]
      storage:
        clickhouse:
          low_cardinality:
            mode: auto
Read Physical design for target DDL controls: partitioning, indexes, storage, compression, ClickHouse LowCardinality, and BigQuery clustering.
Load lineage
Read Load lineage for canonical `__dpone__*` columns, `run_id`, `load_id`, row identity, audit lifecycle, and migration guidance for legacy `meta__*` columns.
Quality checks
Read Quality metrics for supported checks and artifact behavior.