Schema Evolution¶
dpone applies safe schema evolution automatically before every staging-first load. For low-lock DDL governance, lock budgets, ledger artifacts, and expand-contract workflows, see Online schema evolution. The goal is to let routine source drift move forward without breaking production pipelines, while still stopping on changes that can silently corrupt data.
Default runtime policy:
sink:
options:
schema_evolution:
enabled: true
mode: widening
apply_safe: true
on_breaking: fail
on_type_change: fail
new_column_prefix: "__dpone__nc__"
To turn it off for a pipeline:
Use the expanded form in normal manifests:
Runtime flow¶
sequenceDiagram
participant Source
participant Runtime as dpone Runtime
participant Evolution as SchemaEvolutionService
participant Sink
participant Target
Source->>Runtime: ExtractResult(artifact, schema, state)
Runtime->>Sink: get_target_schema(table)
Sink->>Target: introspect current columns
Runtime->>Evolution: compare source schema vs target schema
Evolution-->>Runtime: plan(actions, DDL, column_mapping)
alt safe changes and apply_safe=true
Runtime->>Sink: apply_schema_plan(plan)
Sink->>Target: safe ADD / widen DDL
else breaking change
Runtime-->>Runtime: fail closed before staging
end
Runtime->>Sink: load artifact through staging/shadow
Sink->>Target: finalize set-based commit
Runtime-->>Runtime: persist state only after sink success
What happens before load¶
- The source returns
ExtractResult.schema. - The sink introspects the current target table schema.
SchemaEvolutionServicebuilds a plan withSchemaComparator.- Safe DDL is applied before staging is created.
- If a type-change mapping is configured, source rows are remapped before sink load.
- The normal staging-first strategy continues.
This means schema evolution never writes data directly into the final target table. Data still flows through staging/shadow paths implemented by each sink.
Safe vs breaking changes¶
Safe by default:
| Change | Runtime action |
|---|---|
| New nullable source column | ALTER TABLE ADD COLUMN, then load into staging with the new column |
| Safe numeric/string widening | ALTER/ MODIFY COLUMN where the dialect supports it |
| Existing generated column for type change | Reuse it and route source values there |
| Technical columns | Managed by sink/reconciliation strategy |
Breaking by default:
| Change | Why it stops |
|---|---|
| Source column removed | Could be a real drop or rename; dpone cannot infer intent safely |
| Rename | Looks like drop + add; manual mapping is required |
| Narrowing | Can truncate/overflow data |
| Nullable -> not-null without default | Existing target rows can violate it |
| Incompatible type change | Can corrupt values unless explicitly routed to a generated column |
Source column starting with __dpone__ |
Reserved framework namespace |
Incompatible type changes with generated columns¶
If a source column changes type incompatibly, set:
Example:
| Source | Target before | Target after |
|---|---|---|
amount nvarchar(100) |
amount int |
amount int, __dpone__nc__amount nvarchar(100) |
Runtime behavior:
- The original target column is not overwritten.
__dpone__nc__<original_column_name>is added as nullable.- Source values from
amountare written into__dpone__nc__amount. - Downstream consumers can migrate at their own pace.
- After migration, an operator can manually rename/drop columns during a planned change window.
If the generated column already exists with the same type, dpone reuses it. If it exists with a different type, the load fails.
Framework-generated companion columns¶
__dpone__* remains a reserved framework namespace. Source columns that start
with __dpone__ fail by default unless they are known framework-generated
columns. This prevents user data from accidentally colliding with runtime
metadata.
One important generated family is temporal offset fidelity:
For every offset-aware source column, dpone adds
__dpone__tz_offset_minutes__<column> as a nullable/safe companion column. The
normal schema evolution flow adds that column before staging. If the companion
column already exists with an incompatible type, the load fails before data is
written. If schema_evolution.enabled: false, create the companion column
manually or choose utc_instant, fixed_timezone, or preserve_text.
CLI planning¶
Use the same planner outside runtime:
dpone schema plan \
--source source-columns.json \
--target target-columns.json \
--table landing.orders \
--dialect postgres \
--mode widening \
--on-type-change new_column \
--format json
Supported dialects:
mssqlpostgresclickhousebigquery
Column JSON shape:
[
{"name": "id", "dtype": "bigint", "nullable": false},
{"name": "amount", "dtype": "numeric(18,2)", "nullable": true}
]
Source -> sink matrix¶
Schema evolution is sink-side after extraction, so the same policy works for each source. The target dialect controls DDL and type mapping.
For production implementation details, use the dedicated Source -> sink matrix index. It links to one guide per supported source -> sink combination. Use Type mapping matrix for the cross-system type conversion defaults and caveats.
| Flow | Safe add column | Widening | Incompatible type change | Notes |
|---|---|---|---|---|
| postgres -> mssql | Auto ALTER TABLE ADD |
Auto where MSSQL type widening is supported | Fail or __dpone__nc__* |
Best for high-throughput Postgres COPY -> bcp loads |
| postgres -> postgres | Auto ADD COLUMN |
Auto ALTER COLUMN TYPE |
Fail or __dpone__nc__* |
Prefer explicit maintenance window for narrowing |
| postgres -> clickhouse | Auto ADD COLUMN |
Auto MODIFY COLUMN for safe widening |
Fail or __dpone__nc__* |
Full/replace still use staging/shadow paths |
| postgres -> bigquery | Auto schema field addition | Auto supported ALTER COLUMN SET DATA TYPE |
Fail or __dpone__nc__* |
BigQuery limits some nested/repeated changes |
| mssql -> mssql | Auto ALTER TABLE ADD |
Auto where supported | Fail or __dpone__nc__* |
Same-DB internal query path remains supported |
| mssql -> postgres | Auto ADD COLUMN |
Auto ALTER COLUMN TYPE |
Fail or __dpone__nc__* |
Watch datetime/uniqueidentifier mappings |
| mssql -> clickhouse | Auto ADD COLUMN |
Auto MODIFY COLUMN |
Fail or __dpone__nc__* |
Native TSV/client ingest keeps column order |
| mssql -> bigquery | Auto schema field addition | Auto supported widening | Fail or __dpone__nc__* |
Prefer string-safe landing for sql_variant-like fields |
| clickhouse -> mssql | Auto ALTER TABLE ADD |
Auto where supported | Fail or __dpone__nc__* |
Arrays/nested values should land as string/json unless mapped |
| clickhouse -> postgres | Auto ADD COLUMN |
Auto ALTER COLUMN TYPE |
Fail or __dpone__nc__* |
LowCardinality/Nullable wrappers are normalized by source schema |
| clickhouse -> clickhouse | Auto ADD COLUMN |
Auto MODIFY COLUMN |
Fail or __dpone__nc__* |
No mutations are used for load or reconciliation paths |
| clickhouse -> bigquery | Auto schema field addition | Auto supported widening | Fail or __dpone__nc__* |
Parquet/GCS paths may need explicit nested type handling |
| api -> mssql | Auto ALTER TABLE ADD |
Auto where supported | Fail or __dpone__nc__* |
Keep REST columns explicit for predictable DDL |
| api -> postgres | Auto ADD COLUMN |
Auto ALTER COLUMN TYPE |
Fail or __dpone__nc__* |
JSON payloads can be typed as jsonb or text |
| api -> clickhouse | Auto ADD COLUMN |
Auto MODIFY COLUMN |
Fail or __dpone__nc__* |
Prefer String for highly variable fields |
| api -> bigquery | Auto schema field addition | Auto supported widening | Fail or __dpone__nc__* |
BigQuery JSON is useful for semi-structured payloads |
Example manifests¶
Postgres -> MSSQL with default safe evolution:
source:
type: postgres
connection_id: postgres_source
table: {schema: public, name: orders}
sink:
type: mssql
connection_id: mssql_dwh
table: {schema: landing, name: orders}
strategy: {mode: incremental_merge}
options:
schema_evolution:
enabled: true
mode: widening
MSSQL -> ClickHouse with generated-column type-change handling:
source:
type: mssql
connection_id: mssql_source
table: {schema: dbo, name: orders}
sink:
type: clickhouse
connection_id: clickhouse_dwh
table: {schema: landing, name: orders}
strategy: {mode: incremental_append}
options:
schema_evolution:
on_type_change: new_column
REST/API -> BigQuery in plan-only runtime mode:
source:
type: api
api_type: rest
options:
resource: orders
columns:
- {name: id, type: bigint}
- {name: payload, type: json}
sink:
type: bigquery
connection_id: bq_dwh
table: {schema: landing_api, name: orders}
strategy: {mode: full_refresh}
options:
schema_evolution:
apply_safe: false
With apply_safe: false, runtime fails before loading when safe DDL is needed. Use this in CI/preflight jobs when you want a plan but do not want the runner to mutate schemas.
Runbooks¶
New column appears in source¶
- Confirm the new column is expected.
- Keep
schema_evolution.enabled: true. - Let dpone add it automatically if nullable/default-safe.
- If the column is required/non-null, add it manually with a default or make the source expose it nullable first.
Source type widened¶
- Run
dpone schema planwith the target dialect. - If the plan shows
type_widen, allow runtime to apply it. - For BigQuery and ClickHouse, confirm the target engine supports the planned DDL for that concrete type.
- Re-run the pipeline and check normal row-count reconciliation.
Source type changed incompatibly¶
- Keep default
on_type_change: failif downstream consumers must not see contract drift. - Use
on_type_change: new_columnif you want ingestion to continue safely. - Update downstream models to read
__dpone__nc__<column>. - During a planned migration, backfill/rename/drop manually.
- Remove the generated-column exception only after downstream consumers are migrated.
Column disappeared¶
- Treat it as possible rename/drop.
- Stop the pipeline on the default failure.
- Check source DDL/change request.
- If it is a rename, add a transform/source alias or manual migration.
- If it is a real drop, schedule target cleanup manually.
Nullability changed¶
- Nullable -> not-null is not auto-applied unless a default/backfill is planned.
- Add a default or backfill existing target rows.
- Apply the not-null constraint manually in a maintenance window.
Reserved __dpone__ source column¶
- Rename the source column or alias it in extraction.
- If this is a controlled framework-to-framework transfer, set
allow_reserved_dpone_columns: trueexplicitly. - Document why the exception exists.
Troubleshooting¶
| Symptom | Likely cause | Response |
|---|---|---|
| Load fails with breaking schema evolution changes | Drop/rename/narrowing/type change detected | Inspect plan JSON and follow the matching runbook |
| DDL permission denied | Runner lacks ALTER TABLE or dataset update rights |
Grant least-privilege schema evolution permissions or set apply_safe: false for preflight only |
| ClickHouse DDL fails | Engine/type does not support planned MODIFY COLUMN |
Use generated column mode or manual migration |
| BigQuery schema update fails | BigQuery does not support that exact change | Add a generated column or rebuild through a planned shadow table |
| Existing generated column has wrong type | Prior incompatible change used a different type | Create a manual migration plan; dpone will not guess suffixes |
| Target has framework columns missing from source | Expected for meta__* and __dpone__* |
These are ignored as target-only columns |
Developer notes¶
SchemaComparatoris pure and belongs to readiness/planning.SchemaEvolutionServiceis runtime orchestration and runs beforesink.load().- Sinks expose
get_target_schema(load_config)andapply_schema_plan(load_config, plan). - Load strategies should not implement schema drift logic themselves.
- New sink implementations should add contract tests for DDL rendering, target introspection, and payload column remapping.
Kafka schema evolution notes¶
Kafka sink participates in schema evolution only when Schema Registry is enabled. With schema_registry.auto_register_schemas: true, dpone registers safe evolved value schemas before producing. Without Schema Registry, Kafka target schema evolution is intentionally a no-op; downstream systems receive inferred source schema through the normal ExtractResult.schema contract.