Skip to content

Postgres -> ClickHouse

This guide is a copy/paste-ready starting point for loading data from Postgres into ClickHouse with dpone.

When to use this path

Use this path when Postgres is the system of record or ingestion boundary and ClickHouse is the landing, warehouse, event-log, or downstream replication target.

Copy/paste manifest

name: postgres_to_clickhouse_example

source:
  type: postgres
  connection_id: postgres_oltp
  connection_type: vault
  table:
    schema: public
    name: orders
  options:
    incremental_column: updated_at
    batch_size: 50000

sink:
  type: clickhouse
  connection_id: clickhouse_analytics
  connection_type: vault
  table:
    database: analytics
    name: orders
  strategy:
    mode: incremental_merge
    unique_key: order_id
    merge_policy: lightweight_delete_insert
    duplicate_policy: fail
  options:
    insert_mode: native
    batch_size: 100000
    staging_engine: MergeTree

state:
  type: postgres
  connection_id: postgres_state
  table:
    schema: etl_state
    name: dpone_state

quality:
  mode: fail
  checks:
    - type: min_rows
      threshold: 1
    - type: source_target_count
      tolerance_pct: 0.1

observability:
  artifacts:
    enabled: true
    path: .dpone/runs/postgres_to_clickhouse

Run it locally:

dpone plan examples/postgres_to_clickhouse.yaml --format md
dpone batch run examples/postgres_to_clickhouse.yaml

Supported load strategies

Strategy Status Notes
full_refresh Supported Uses staging first, then applies the target-specific finalization plan.
incremental_append Supported Uses staging first, then applies the target-specific finalization plan.
incremental_merge Supported Default merge_policy: lightweight_delete_insert; shadow_swap supported; mutation_delete_insert is explicit opt-in and non-recommended.
replace Supported Uses staging first, then applies the target-specific finalization plan.
partition_replace Supported Replaces target partitions represented by staging partition.column; see Load strategies for native/fallback paths.
snapshot_reconciliation Supported Uses staging first, then applies the target-specific finalization plan.
xmin Postgres source only Uses Postgres transaction IDs as the incremental boundary; configure explicitly or omit incremental_column for the default Postgres source behavior.
cdc Source-specific Uses typed CDC offsets and advances state only after sink success.

See Load strategies for the detailed algorithm for each strategy.

Runtime algorithm

flowchart TD
    A["Resolve manifest and registry entries"] --> B["Create Postgres source"]
    B --> C["Plan bounded extract"]
    C --> D["Read through server-side cursor or partitioned range scan"]
    D --> E["Emit ExtractResult with schema and artifact"]
    E --> F["Plan schema evolution"]
    F --> G["Create ClickHouse staging or event batch"]
    G --> H["Load through staging/shadow table insert with atomic swap or append finalization"]
    H --> I["Apply finalization strategy"]
    I --> J["Run quality and reconciliation checks"]
    J --> K["Advance state only after success"]

Strategy behavior

  • full_refresh: extract the selected source boundary, load into staging, and replace the target according to the target's safe finalization path.
  • incremental_append: extract only the incremental boundary and append rows through staging or event production.
  • incremental_merge: load into staging, validate duplicates, then use lightweight_delete_insert by default; shadow_swap and guarded mutation_delete_insert are explicit policies.
  • replace: reload a bounded predicate window through staging and then atomically replace the matching target slice.
  • snapshot_reconciliation: compare the latest source snapshot with the target key set and apply configured physical-delete or soft-delete behavior through staging-first plans.
  • partition_replace: extract a complete partition slice, load it into staging, and replace only partitions represented by partition.column.

Schema evolution and type mapping

Schema evolution is enabled by default and runs before the staging/final load path:

  1. Read source schema from ExtractResult.schema.
  2. Introspect the ClickHouse target schema.
  3. Apply safe additions and widening operations.
  4. Fail breaking changes by default.
  5. If configured, route incompatible type changes to __dpone__nc__<column>.

Use Schema evolution and Type mapping matrix when adding columns or changing source types.

Runbook

  1. Start with dpone doctor --profile local and fix missing extras or native clients.
  2. Run dpone plan <manifest> --format md and review source boundary, staging path, schema evolution, state, and quality checks.
  3. Run a small bounded window first.
  4. Inspect the run artifact under .dpone/runs/postgres_to_clickhouse.
  5. For incremental jobs, verify state before enabling a schedule.
  6. For delete-aware jobs, run reconciliation in report-only mode before enabling physical deletes.
  7. Promote the manifest through GitOps after the plan and artifact are reviewed.

Type contracts and physical design

This flow supports the shared dpone type-governance stack:

  • Type inference for source metadata, sampled profiling, confidence, and empty string vs NULL behavior.
  • Schema contracts for explicit logical column types, enforcement modes, and __dpone__nc__* variant columns.
  • Physical design for target-specific DDL such as concrete SQL types, indexes, partitioning, compression, ClickHouse LowCardinality, and BigQuery clustering.

Use dpone schema infer --manifest ... and dpone schema physical-plan --manifest ... before enabling new table DDL in production.