Schema contracts
Schema contracts let users define portable logical column types before dpone
renders target-specific DDL. They are the right place for business-critical
columns, sparse columns, financial decimals, timestamps, and fields where
sampled inference is not acceptable.
Basic contract
schema_contract:
  enforcement: strict
  columns:
    amount:
      type: decimal
      precision: 18
      scale: 4
      nullable: false
    updated_at:
      type: timestamp
      timezone: true
      nullable: false
    payload:
      type: json
      nullable: true
Enforcement modes
| Mode | Behavior |
|---|---|
| strict | Fail before or during load when values do not match the contract. |
| coerce | Attempt explicit value conversion before staging. |
| quarantine | Route bad rows to __dpone__quarantine and keep the target clean. |
| warn | Continue best-effort and emit run artifact warnings. |
The production default is strict. For dirty APIs and files, use quarantine so that bad rows are set aside instead of failing the load.
Runtime behavior is implemented by
Runtime data contracts. That layer decides which
rows are safe to stage, which rows go to quarantine, and whether state can
advance after the load.
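
As an illustration of the recommendation for dirty sources, the basic contract can be switched to quarantine by changing only the enforcement key. This is a minimal sketch that reuses the keys from the basic contract; the single column is only there to keep the fragment short.

```yaml
# Sketch: same contract shape as the basic example, with quarantine enforcement.
schema_contract:
  enforcement: quarantine   # bad rows are routed to __dpone__quarantine
  columns:
    amount:
      type: decimal
      precision: 18
      scale: 4
      nullable: false
```

With this setting, rows that violate the contract should land in __dpone__quarantine rather than in the target table, so quarantine artifacts need to be monitored.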
Type conflict policies
type_inference.conflict_policy handles conflicts between source values,
contracts, and target shape:
| Policy | Behavior |
|---|---|
| fail | Stop before target writes. |
| variant_column | Route incompatible values into __dpone__nc__<column>. |
| quarantine | Keep target columns clean and store rejected rows with diagnostics. |
Example:
sink:
  options:
    type_inference:
      conflict_policy: variant_column
    schema_evolution:
      data_type: variant_column
If amount was decimal(18,2) and the source starts sending JSON or text payloads,
dpone creates or reuses __dpone__nc__amount. The existing amount column remains
untouched.
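
For comparison, a stricter setup that stops on the first incompatible type change might look like the fragment below. It is a sketch that sets only conflict_policy; whether schema_evolution.data_type needs a matching value for fail is not stated here, so that key is left out.

```yaml
# Sketch: stop before any target write when a type conflict is detected.
sink:
  options:
    type_inference:
      conflict_policy: fail   # one of: fail, variant_column, quarantine
```

This matches the fail row in the policy table: no __dpone__nc__<column> is created and the target is not written to.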
Target-specific type override
Use logical contracts for portable semantics and physical overrides for one specific sink:
sink:
  options:
    physical_design:
      columns:
        amount:
          target_type:
            mssql: decimal(18,4)
            postgres: numeric(18,4)
            clickhouse: Decimal(18,4)
            bigquery: NUMERIC
The override wins over inference. If it would narrow an existing target column, the online DDL governance layer blocks it unless a safe-window or manual approval flow is configured.
CLI
dpone schema infer --manifest manifests/orders.batch.yaml --format md
dpone schema physical-plan --manifest manifests/orders.batch.yaml --format json
dpone plan manifests/orders.batch.yaml --selector public.orders --format json
Runbook
| Change | Recommended action |
|---|---|
| Add a sparse field | Add it to schema_contract.columns so it appears before first non-null value. |
| Currency amount | Use decimal with explicit precision and scale. |
| Timestamp from API | Set type: timestamp and timezone intentionally. |
| Source type changed incompatibly | Prefer fail; use variant_column only for planned downstream migration. |
| Bad rows should not block ingestion | Use enforcement: quarantine and monitor quarantine artifacts. |
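
The sparse-field and timestamp rows above can be sketched as a contract fragment. The column names refund_amount and shipped_at are hypothetical and chosen only for illustration; the keys are the ones used in the basic contract.

```yaml
# Sketch: declare sparse and timestamp columns up front instead of relying on sampled inference.
schema_contract:
  enforcement: strict
  columns:
    refund_amount:        # hypothetical sparse field, null in most early rows
      type: decimal
      precision: 18
      scale: 4
      nullable: true
    shipped_at:           # hypothetical timestamp arriving from an API
      type: timestamp
      timezone: true      # set deliberately rather than left to inference
      nullable: true
```

Declaring refund_amount this way means the column exists with the intended decimal type before its first non-null value arrives.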