Schema Evolution¶

dpone applies safe schema evolution automatically before every staging-first load. For low-lock DDL governance, lock budgets, ledger artifacts, and expand-contract workflows, see Online schema evolution. For planned renames, use Schema Identity so semantic aliases are resolved before name-based schema evolution. The goal is to let routine source drift move forward without breaking production pipelines, while still stopping on changes that can silently corrupt data.

Default runtime policy:

sink:
  options:
    schema_evolution:
      enabled: true
      mode: widening
      apply_safe: true
      on_breaking: fail
      on_type_change: fail
      new_column_prefix: "__dpone__nc__"

To turn it off for a pipeline:

sink:
  options:
    schema_evolution.enabled: false

Use the expanded form in normal manifests:

sink:
  options:
    schema_evolution:
      enabled: false

Runtime flow¶

sequenceDiagram
    participant Source
    participant Runtime as dpone Runtime
    participant Evolution as SchemaEvolutionService
    participant Sink
    participant Target

    Source->>Runtime: ExtractResult(artifact, schema, state)
    Runtime->>Sink: get_target_schema(table)
    Sink->>Target: introspect current columns
    Runtime->>Evolution: compare source schema vs target schema
    Evolution-->>Runtime: plan(actions, DDL, column_mapping)
    alt safe changes and apply_safe=true
        Runtime->>Sink: apply_schema_plan(plan)
        Sink->>Target: safe ADD / widen DDL
    else breaking change
        Runtime-->>Runtime: fail closed before staging
    end
    Runtime->>Sink: load artifact through staging/shadow
    Sink->>Target: finalize set-based commit
    Runtime-->>Runtime: persist state only after sink success

What happens before load¶

The source returns ExtractResult.schema.
The sink introspects the current target table schema.
SchemaEvolutionService builds a plan with SchemaComparator.
Safe DDL is applied before staging is created.
If a type-change mapping is configured, source rows are remapped before sink load.
The normal staging-first strategy continues.

This means schema evolution never writes data directly into the final target table. Data still flows through staging/shadow paths implemented by each sink.

Safe vs breaking changes¶

Safe by default:

Change	Runtime action
New nullable source column	`ALTER TABLE ADD COLUMN`, then load into staging with the new column
Safe numeric/string widening	`ALTER/ MODIFY COLUMN` where the dialect supports it
Existing generated column for type change	Reuse it and route source values there
Technical columns	Managed by sink/reconciliation strategy

Breaking by default:

Change	Why it stops
Source column removed	Could be a real drop or rename; dpone cannot infer intent safely
Rename without Schema Identity	Looks like drop + add; configure `schema_identity` aliases for planned rename lifecycle
Narrowing	Can truncate/overflow data
Nullable -> not-null without default	Existing target rows can violate it
Incompatible type change	Can corrupt values unless explicitly routed to a generated column
Source column starting with `__dpone__`	Reserved framework namespace

Incompatible type changes with generated columns¶

If a source column changes type incompatibly, set:

sink:
  options:
    schema_evolution:
      on_type_change: new_column

Example:

Source	Target before	Target after
`amount nvarchar(100)`	`amount int`	`amount int`, `__dpone__nc__amount nvarchar(100)`

Runtime behavior:

The original target column is not overwritten.
__dpone__nc__<original_column_name> is added as nullable.
Source values from amount are written into __dpone__nc__amount.
Downstream consumers can migrate at their own pace.
After migration, an operator can manually rename/drop columns during a planned change window.

If the generated column already exists with the same type, dpone reuses it. If it exists with a different type, the load fails.

When a rename and incompatible type change happen together, Schema Identity runs first. The generated column uses the canonical name: __dpone__nc__<canonical_column>.

Framework-generated companion columns¶

__dpone__* remains a reserved framework namespace. Source columns that start with __dpone__ fail by default unless they are known framework-generated columns. This prevents user data from accidentally colliding with runtime metadata.

One important generated family is temporal offset fidelity:

source:
  options:
    type_fidelity:
      temporal:
        offset_timestamp:
          mode: preserve_offset

For every offset-aware source column, dpone adds __dpone__tz_offset_minutes__<column> as a nullable/safe companion column. The normal schema evolution flow adds that column before staging. If the companion column already exists with an incompatible type, the load fails before data is written. If schema_evolution.enabled: false, create the companion column manually or choose utc_instant, fixed_timezone, or preserve_text.

CLI planning¶

Use the same planner outside runtime:

dpone schema plan \
  --source source-columns.json \
  --target target-columns.json \
  --table landing.orders \
  --dialect postgres \
  --mode widening \
  --on-type-change new_column \
  --format json

Supported dialects:

mssql
postgres
clickhouse
bigquery

Column JSON shape:

[
  {"name": "id", "dtype": "bigint", "nullable": false},
  {"name": "amount", "dtype": "numeric(18,2)", "nullable": true}
]

When a plan reports type changes for a cross-system load, run the explain view before changing manifests. It joins the schema comparator with the source -> sink type matrix:

dpone schema explain \
  --source source-columns.json \
  --target target-columns.json \
  --source-system mssql \
  --sink-system clickhouse \
  --format md

Use this when a target column is semantically correct but written in target dialect form. For example, MSSQL int nullable is expected to match ClickHouse Nullable(Int32), and MSSQL nvarchar(510) nullable is expected to match ClickHouse Nullable(String). The explain output shows:

raw comparator equality;
pair-specific expected target type;
whether the current target matches the profile;
the runbook or diagnostic reason.

If matrix_matches_target=true, the issue is usually a missing type profile or stale plan artifact rather than a real breaking schema change. If it is false, follow the expand-contract or explicit schema_contract runbook for the affected column.

Source -> sink matrix¶

Schema evolution is sink-side after extraction, so the same policy works for each source. The target dialect controls DDL and type mapping.

For production implementation details, use the dedicated Source -> sink matrix index. It links to one guide per supported source -> sink combination. Use Type mapping matrix for the cross-system type conversion defaults and caveats.

Flow	Safe add column	Widening	Incompatible type change	Notes
postgres -> mssql	Auto `ALTER TABLE ADD`	Auto where MSSQL type widening is supported	Fail or `__dpone__nc__*`	Best for high-throughput Postgres COPY -> bcp loads
postgres -> postgres	Auto `ADD COLUMN`	Auto `ALTER COLUMN TYPE`	Fail or `__dpone__nc__*`	Prefer explicit maintenance window for narrowing
postgres -> clickhouse	Auto `ADD COLUMN`	Auto `MODIFY COLUMN` for safe widening	Fail or `__dpone__nc__*`	Full/replace still use staging/shadow paths
postgres -> bigquery	Auto schema field addition	Auto supported `ALTER COLUMN SET DATA TYPE`	Fail or `__dpone__nc__*`	BigQuery limits some nested/repeated changes
mssql -> mssql	Auto `ALTER TABLE ADD`	Auto where supported	Fail or `__dpone__nc__*`	Same-DB internal query path remains supported
mssql -> postgres	Auto `ADD COLUMN`	Auto `ALTER COLUMN TYPE`	Fail or `__dpone__nc__*`	Watch datetime/uniqueidentifier mappings
mssql -> clickhouse	Auto `ADD COLUMN`	Auto `MODIFY COLUMN`	Fail or `__dpone__nc__*`	Native TSV/client ingest keeps column order
mssql -> bigquery	Auto schema field addition	Auto supported widening	Fail or `__dpone__nc__*`	Prefer string-safe landing for sql_variant-like fields
clickhouse -> mssql	Auto `ALTER TABLE ADD`	Auto where `clickhouse_to_mssql_landing_v1` proves compatibility	Fail or `__dpone__nc__*`	Arrays/nested/enums/high-precision values require explicit `schema_contract`; event tables need a cursor or partition boundary
clickhouse -> postgres	Auto `ADD COLUMN`	Auto `ALTER COLUMN TYPE`	Fail or `__dpone__nc__*`	LowCardinality/Nullable wrappers are normalized by source schema
clickhouse -> clickhouse	Auto `ADD COLUMN`	Auto `MODIFY COLUMN`	Fail or `__dpone__nc__*`	No mutations are used for load or reconciliation paths
clickhouse -> bigquery	Auto schema field addition	Auto supported widening	Fail or `__dpone__nc__*`	Parquet/GCS paths may need explicit nested type handling
api -> mssql	Auto `ALTER TABLE ADD`	Auto where supported	Fail or `__dpone__nc__*`	Keep REST `columns` explicit for predictable DDL
api -> postgres	Auto `ADD COLUMN`	Auto `ALTER COLUMN TYPE`	Fail or `__dpone__nc__*`	JSON payloads can be typed as `jsonb` or `text`
api -> clickhouse	Auto `ADD COLUMN`	Auto `MODIFY COLUMN`	Fail or `__dpone__nc__*`	Prefer `String` for highly variable fields
api -> bigquery	Auto schema field addition	Auto supported widening	Fail or `__dpone__nc__*`	BigQuery JSON is useful for semi-structured payloads

Example manifests¶

Postgres -> MSSQL with default safe evolution:

source:
  type: postgres
  connection_id: postgres_source
  table: {schema: public, name: orders}

sink:
  type: mssql
  connection_id: mssql_dwh
  table: {schema: landing, name: orders}
  strategy: {mode: incremental_merge}
  options:
    schema_evolution:
      enabled: true
      mode: widening

MSSQL -> ClickHouse with generated-column type-change handling:

source:
  type: mssql
  connection_id: mssql_source
  table: {schema: dbo, name: orders}

sink:
  type: clickhouse
  connection_id: clickhouse_dwh
  table: {schema: landing, name: orders}
  strategy: {mode: incremental_append}
  options:
    schema_evolution:
      on_type_change: new_column

REST/API -> BigQuery in plan-only runtime mode:

source:
  type: api
  api_type: rest
  options:
    resource: orders
    columns:
      - {name: id, type: bigint}
      - {name: payload, type: json}

sink:
  type: bigquery
  connection_id: bq_dwh
  table: {schema: landing_api, name: orders}
  strategy: {mode: full_refresh}
  options:
    schema_evolution:
      apply_safe: false

With apply_safe: false, runtime fails before loading when safe DDL is needed. Use this in CI/preflight jobs when you want a plan but do not want the runner to mutate schemas.

Runbooks¶

New column appears in source¶

Confirm the new column is expected.
Keep schema_evolution.enabled: true.
Let dpone add it automatically if nullable/default-safe.
If the column is required/non-null, add it manually with a default or make the source expose it nullable first.

Source type widened¶

Run dpone schema plan with the target dialect.
If the plan shows type_widen, allow runtime to apply it.
For BigQuery and ClickHouse, confirm the target engine supports the planned DDL for that concrete type.
Re-run the pipeline and check normal row-count reconciliation.

Source type changed incompatibly¶

Keep default on_type_change: fail if downstream consumers must not see contract drift.
Use on_type_change: new_column if you want ingestion to continue safely.
Update downstream models to read __dpone__nc__<column>.
During a planned migration, backfill/rename/drop manually.
Remove the generated-column exception only after downstream consumers are migrated.

Column disappeared¶

Treat it as possible rename/drop.
Stop the pipeline on the default failure.
Check source DDL/change request.
If it is a rename, add a transform/source alias or manual migration.
If it is a real drop, schedule target cleanup manually.

Nullability changed¶

Nullable -> not-null is not auto-applied unless a default/backfill is planned.
Add a default or backfill existing target rows.
Apply the not-null constraint manually in a maintenance window.

Reserved `dpone` source column¶

Rename the source column or alias it in extraction.
If this is a controlled framework-to-framework transfer, set allow_reserved_dpone_columns: true explicitly.
Document why the exception exists.

Troubleshooting¶

Symptom	Likely cause	Response
Load fails with breaking schema evolution changes	Drop/rename/narrowing/type change detected	Inspect plan JSON and follow the matching runbook
DDL permission denied	Runner lacks `ALTER TABLE` or dataset update rights	Grant least-privilege schema evolution permissions or set `apply_safe: false` for preflight only
ClickHouse DDL fails	Engine/type does not support planned `MODIFY COLUMN`	Use generated column mode or manual migration
BigQuery schema update fails	BigQuery does not support that exact change	Add a generated column or rebuild through a planned shadow table
Existing generated column has wrong type	Prior incompatible change used a different type	Create a manual migration plan; dpone will not guess suffixes
Target has framework columns missing from source	Expected for `meta__` and `__dpone__`	These are ignored as target-only columns

Developer notes¶

SchemaComparator is pure and belongs to readiness/planning.
SchemaEvolutionService is runtime orchestration and runs before sink.load().
Sinks expose get_target_schema(load_config) and apply_schema_plan(load_config, plan).
Load strategies should not implement schema drift logic themselves.
New sink implementations should add contract tests for DDL rendering, target introspection, and payload column remapping.

Kafka schema evolution notes¶

Kafka sink participates in schema evolution only when Schema Registry is enabled. With schema_registry.auto_register_schemas: true, dpone registers safe evolved value schemas before producing. Without Schema Registry, Kafka target schema evolution is intentionally a no-op; downstream systems receive inferred source schema through the normal ExtractResult.schema contract.