Load lineage and technical columns¶
dpone uses canonical framework metadata columns to make loads auditable,
retry-safe, and debuggable across database, API, and Kafka pipelines.
The rule is simple: framework-owned columns always use the __dpone__*
namespace. User data should not create columns with this prefix.
Quick start¶
Default lineage is enabled and uses the standard preset plus quarantine
support:
Disable target lineage columns explicitly when you need a strict legacy target shape:
Set defaults once in batch manifests:
defaults:
sink:
options:
lineage:
enabled: true
preset: standard
features:
operations: true
quarantine: true
Presets and features¶
Presets are not mutually exclusive modes. They are starting points. Features can be enabled or disabled on top of a preset.
| Preset | Target columns |
|---|---|
off |
none |
minimal |
__dpone__load_id, __dpone__loaded_at |
standard |
minimal plus __dpone__row_id, __dpone__extracted_at |
debug |
standard plus operation and diagnostics columns |
hierarchical |
standard plus parent/root/list index columns |
| Feature | Adds |
|---|---|
identity |
__dpone__load_id, __dpone__loaded_at |
row_identity |
__dpone__row_id, __dpone__extracted_at |
hierarchy |
__dpone__parent_row_id, __dpone__root_row_id, __dpone__list_index |
operations |
__dpone__op |
diagnostics |
__dpone__meta |
quarantine |
rejected rows go to state quarantine storage, not to the target table |
For dlt-like splitting of nested JSON into root and child tables, see
Nested normalization. The hierarchical preset is
the lineage contract used by that runtime path.
Example: standard lineage with operation and diagnostics columns:
sink:
options:
lineage:
enabled: true
preset: standard
features:
operations: true
diagnostics: true
quarantine: true
Canonical technical columns¶
| Column | Purpose | Type contract |
|---|---|---|
__dpone__loaded_at |
UTC time when the load package wrote the row | timestamp |
__dpone__deleted_at |
UTC logical delete time | nullable timestamp |
__dpone__xmin |
PostgreSQL XMin checkpoint value when materialized | bigint/string by sink |
__dpone__deleted_marker |
Physical delete reconciliation marker | boolean/integer by sink |
__dpone__run_id |
One process execution identifier | ULID string, 26 chars |
__dpone__load_id |
One target load package identifier | ULID string, 26 chars |
__dpone__row_id |
Deterministic row lineage identity | SHA-256 hex, 64 chars |
__dpone__parent_row_id |
Parent row identity for normalized nested data | SHA-256 hex, 64 chars |
__dpone__root_row_id |
Root row identity for normalized nested data | SHA-256 hex, 64 chars |
__dpone__list_index |
Stable list position for normalized nested arrays | integer |
__dpone__op |
Normalized operation marker such as insert/update/delete | string |
__dpone__meta |
Optional row diagnostics | JSON/object |
__dpone__row_hash |
Business row hash for diff/SCD2 strategies | SHA-256 hex, 64 chars |
__dpone__valid_from_at |
SCD2 validity start timestamp | timestamp |
__dpone__valid_to_at |
SCD2 validity end timestamp | nullable timestamp |
__dpone__is_current |
SCD2 current row flag | boolean |
__dpone__extracted_at |
UTC time when the source row was extracted | timestamp |
Legacy names such as meta__load_dtm, meta__update_dtm,
meta__delete_dtm, meta__xmin, and __dpone_deleted_marker are accepted
only through compatibility resolution.
New tables should use canonical names only.
Load audit table¶
State backends can store load package lifecycle records in
etl_state.__dpone__loads.
stateDiagram-v2
[*] --> started
started --> staged: staging loaded
staged --> committed: target finalized
started --> failed: extract/load error
staged --> failed: finalization error
committed --> [*]
failed --> [*]
State advancement happens only after the load audit status reaches
committed.
| Status | Meaning |
|---|---|
started |
Run/load IDs were created and extraction is about to execute |
staged |
Source payload was materialized into staging or producer batch |
committed |
Sink finalization succeeded and state can advance |
failed |
The load package failed; state must not advance |
cancelled |
Reserved for cooperative cancellation flows |
Row identity algorithm¶
__dpone__row_id is deterministic across retries. It does not include
retry-specific values such as __dpone__load_id or __dpone__loaded_at.
flowchart TD
A["Input source row"] --> B{"unique_key configured?"}
B -->|yes| C["Normalize source family, source table, unique_key values"]
B -->|no| D["Normalize source family, source table, source row content, stable row path"]
C --> E["SHA-256 hex"]
D --> E
E --> F["__dpone__row_id"]
Preferred input is:
Fallback input is:
Use a unique_key whenever possible. Fallback identity is deterministic for
stable extract ordering, but a business key is more robust for retries,
backfills, and source-side ordering changes.
Nested hierarchy relationship¶
Nested normalization uses the same row identity service and adds parent/root relationships:
flowchart LR
Root["orders.__dpone__row_id"] --> Customer["orders__customer.__dpone__parent_row_id"]
Root --> Items["orders__items.__dpone__parent_row_id"]
Items --> Discounts["orders__items__discounts.__dpone__parent_row_id"]
Root -. root lineage .-> DiscountsRoot["orders__items__discounts.__dpone__root_row_id"]
Join child rows back to parent rows with:
See Nested normalization for table naming, scalar arrays, configuration, processor behavior, and runbooks.
Strategy interaction¶
| Strategy | Lineage usage |
|---|---|
full_refresh |
Writes one load_id for the load package and row IDs when enabled |
incremental_append |
Uses row IDs for audit/debug and source cursor state for checkpointing |
incremental_merge |
Uses unique_key for target finalization; row IDs remain lineage IDs, not business keys |
snapshot_diff |
Uses __dpone__row_hash to detect changed rows |
scd2 |
Uses __dpone__valid_from_at, __dpone__valid_to_at, __dpone__is_current, and __dpone__row_hash |
cdc_apply |
Uses __dpone__op when operation tracking is enabled |
backfill |
Uses one run_id and multiple load_id values for chunks |
See Load strategies for the execution algorithms.
Implementation map¶
| Concept | Code |
|---|---|
| Technical column catalog | src/dpone/contracts/technical_columns.py |
| Lineage option resolution | src/dpone/runtime/lineage/options.py |
| Row ID and ULID generation | src/dpone/runtime/lineage/identity.py |
| Row enrichment | src/dpone/runtime/lineage/enrichment.py |
| Nested normalization | src/dpone/runtime/normalization/normalizer.py |
| Load lifecycle service | src/dpone/runtime/lineage/audit.py |
| Processor lifecycle integration | src/dpone/runtime/etl/processor.py |
Runbooks¶
Target rejects __dpone__* columns¶
- Confirm the target table allows schema evolution or pre-create the lineage columns manually.
- If this is a strict legacy target, set
sink.options.lineage: false. - Keep run artifacts and
__dpone__loadsenabled through state storage when possible, even if target row lineage is disabled.
Row identity changes unexpectedly¶
- Prefer an explicit
sink.strategy.unique_keyfor merge/diff/SCD2 flows. - Check whether the source extract ordering changed when no
unique_keyis configured. - Compare source row content before enrichment. Retry-specific lineage fields
are intentionally excluded from
__dpone__row_id.
Legacy target already has meta__* columns¶
- Use
technical_columns_naming: legacy_compatfor one deprecation window. - Plan a migration to canonical
__dpone__*names. - Do not create new
meta__*,*_dtm, or*_dttmframework columns.