Skip to content

MSSQL -> ClickHouse

This guide is a copy/paste-ready starting point for loading data from MSSQL into ClickHouse with dpone.

When to use this path

Use this path when MSSQL is the system of record or ingestion boundary and ClickHouse is the landing, warehouse, event-log, or downstream replication target.

Copy/paste manifest

name: mssql_to_clickhouse_example

source:
  type: mssql
  connection_id: mssql_dwh
  connection_type: vault
  table:
    schema: dbo
    name: orders
  options:
    incremental_column: updated_at
    batch_size: 50000
    export_mode: bcp

sink:
  type: clickhouse
  connection_id: clickhouse_analytics
  connection_type: vault
  table:
    database: analytics
    name: orders
  strategy:
    mode: incremental_merge
    unique_key: order_id
    merge_policy: lightweight_delete_insert
    duplicate_policy: fail
  options:
    insert_mode: native
    batch_size: 100000
    staging_engine: MergeTree

state:
  type: postgres
  connection_id: postgres_state
  table:
    schema: etl_state
    name: dpone_state

quality:
  mode: fail
  checks:
    - type: min_rows
      threshold: 1
    - type: source_target_count
      tolerance_pct: 0.1

observability:
  artifacts:
    enabled: true
    path: .dpone/runs/mssql_to_clickhouse

Run it locally:

dpone plan examples/mssql_to_clickhouse.yaml --format md
dpone batch run examples/mssql_to_clickhouse.yaml

Supported load strategies

Strategy Status Notes
full_refresh Supported Uses staging first, then applies the target-specific finalization plan.
incremental_append Supported Uses staging first, then applies the target-specific finalization plan.
incremental_merge Supported Default merge_policy: lightweight_delete_insert; shadow_swap supported; mutation_delete_insert is explicit opt-in and non-recommended.
replace Supported Uses staging first, then applies the target-specific finalization plan.
partition_replace Supported Replaces target partitions represented by staging partition.column; see Load strategies for native/fallback paths.
snapshot_reconciliation Supported Uses staging first, then applies the target-specific finalization plan.
cdc Source-specific Uses typed CDC offsets and advances state only after sink success.

See Load strategies for the detailed algorithm for each strategy.

Runtime algorithm

flowchart TD
    A["Resolve manifest and registry entries"] --> B["Create MSSQL source"]
    B --> C["Plan bounded extract"]
    C --> D["Read through BCP queryout or pyodbc streaming cursor"]
    D --> E["Emit ExtractResult with schema and artifact"]
    E --> F["Plan schema evolution"]
    F --> G["Create ClickHouse staging or event batch"]
    G --> H["Load through staging/shadow table insert with atomic swap or append finalization"]
    H --> I["Apply finalization strategy"]
    I --> J["Run quality and reconciliation checks"]
    J --> K["Advance state only after success"]

Strategy behavior

  • full_refresh: extract the selected source boundary, load into staging, and replace the target according to the target's safe finalization path.
  • incremental_append: extract only the incremental boundary and append rows through staging or event production.
  • incremental_merge: load into staging, validate duplicates, then use lightweight_delete_insert by default; shadow_swap and guarded mutation_delete_insert are explicit policies.
  • replace: reload a bounded predicate window through staging and then atomically replace the matching target slice.
  • snapshot_reconciliation: compare the latest source snapshot with the target key set and apply configured physical-delete or soft-delete behavior through staging-first plans.
  • partition_replace: extract a complete partition slice, load it into staging, and replace only partitions represented by partition.column.

Schema evolution and type mapping

Schema evolution is enabled by default and runs before the staging/final load path:

  1. Read source schema from ExtractResult.schema.
  2. Introspect the ClickHouse target schema.
  3. Apply safe additions and widening operations.
  4. Fail breaking changes by default.
  5. If configured, route incompatible type changes to __dpone__nc__<column>.

Use Schema evolution and Type mapping matrix when adding columns or changing source types.

Exact type preservation

MSSQL -> ClickHouse uses a lossless-first type profile for native transfers. The goal is to avoid the classic analytical-load failure mode where exact MSSQL values arrive but are stored as approximate ClickHouse values.

MSSQL type Default ClickHouse type Fidelity
decimal(p,s), numeric(p,s) Decimal(p,s) Exact precision and scale, capped to ClickHouse decimal limits.
money Decimal(19,4) Exact fixed scale.
smallmoney Decimal(10,4) Exact fixed scale.
bigint, int, smallint, tinyint Int64, Int32, Int16, UInt8 Exact integer range mapping.
bit Bool Exact boolean mapping.
uniqueidentifier UUID Exact UUID mapping.
datetime2(p) DateTime64(p) Preserves configured fractional precision.
datetimeoffset(p) configurable Default utc_instant lands DateTime64(p, 'UTC'); fixed_timezone, preserve_offset and preserve_text are available through type_fidelity.temporal.offset_timestamp.
date Date Exact date mapping.
time(p) String Conservative text landing because ClickHouse has no standalone time type.
binary, varbinary, rowversion String Requires explicit hex/base64 codec policy for byte-readable contracts.
nvarchar, varchar, xml, deprecated text types String Text-compatible landing.

dpone plan includes a type_fidelity section when MSSQL source schema metadata is available. Review that section before enabling a new table in production.

Binary and time codec policy

Use source.options.type_fidelity when source tables contain byte or standalone time columns and you need the transfer to be certifiably lossless:

source:
  type: mssql
  options:
    type_fidelity:
      binary_encoding: hex
      time_encoding: seconds_since_midnight

Offset timestamp examples:

source:
  options:
    type_fidelity:
      temporal:
        offset_timestamp:
          mode: utc_instant
          timezone: UTC
source:
  options:
    type_fidelity:
      temporal:
        offset_timestamp:
          mode: fixed_timezone
          timezone: Europe/Moscow
source:
  options:
    type_fidelity:
      temporal:
        offset_timestamp:
          mode: preserve_offset
source:
  options:
    type_fidelity:
      temporal:
        offset_timestamp:
          mode: preserve_text

preserve_offset stores the UTC instant in the original target column and adds __dpone__tz_offset_minutes__<column> as Int16. MSSQL datetimeoffset stores an offset such as +03:00, not an IANA timezone name such as Europe/Moscow; use fixed_timezone for business-view conversion and preserve_offset when the source offset must survive.

Policy values:

Option Values Default Effect
binary_encoding none, hex, base64 none hex exports MSSQL binary/varbinary/rowversion as deterministic hexadecimal text for ClickHouse String columns. none keeps binary columns non-lossless in the plan.
time_encoding string, seconds_since_midnight string string lands MSSQL time(p) as text. seconds_since_midnight exports it as UInt32 seconds for faster numeric filtering.
temporal.offset_timestamp.mode utc_instant, fixed_timezone, preserve_offset, preserve_text utc_instant Controls MSSQL datetimeoffset landing semantics.
temporal.offset_timestamp.timezone IANA timezone, SQL Server timezone alias, or fixed offset UTC Used by fixed_timezone; use Europe/Moscow for Moscow business time.
temporal.offset_timestamp.malformed fail, preserve_text fail Stops on malformed offset timestamps by default; use preserve_text only for raw landing/audit tables.
temporal.offset_timestamp.columns object keyed by column name {} Per-column overrides for tables where only some timestamp columns need preserve_offset, fixed_timezone, or preserve_text.

Run dpone plan --format json and check:

  • type_fidelity.policy matches the manifest.
  • binary columns are lossless: true only when an explicit byte encoding is configured.
  • time(p) becomes UInt32 only with time_encoding: seconds_since_midnight.

If a table needs sub-second standalone time precision, keep time_encoding: string or define an explicit target type in Physical design. For per-column temporal policies and malformed-value handling, see Temporal fidelity.

Typed reconciliation

For release certification and high-risk schema changes, use typed_hash in the local certification tool:

python tools/mssql_clickhouse_fault_injection.py \
  --rows 10000 \
  --reconciliation-profile typed_hash \
  --binary-encoding hex \
  --time-encoding seconds_since_midnight \
  --clickhouse-bulk-mode http

typed_hash compares rows through canonical typed serialization using the source schema. It treats Decimal('1.14'), ClickHouse integer JSON representations for whole decimals, benign floating JSON renderings such as 1.1400000000000001, hex-encoded bytes and configured standalone time values as the same typed value when the source contract says they are equivalent. It still preserves NULL vs empty string, integer values, timestamps and text boundaries.

Use profiles as follows:

Profile Use case
count_only Smoke checks where only row presence matters.
count_and_checksum Fast default for regular local certification.
sample_hash Codec/escaping changes where representative row content must match.
typed_hash Production type-fidelity certification for MSSQL -> ClickHouse.
full_partition_hash Bounded release slices where full partition hashing cost is acceptable.

Runbook:

  1. If typed_hash fails but counts pass, inspect the failed partition's source and target typed hashes in the artifact.
  2. Check dpone plan type_fidelity.columns for approximate or non-lossless decisions.
  3. For decimals, ensure the target table uses Decimal(p,s), not Float64.
  4. For binary/time columns, choose an explicit physical design or codec policy before certifying as lossless.
  5. Re-run typed_hash after DDL or codec changes.

Wide type-fidelity certification

Use the wide-type harness before changing MSSQL -> ClickHouse codecs, partitioning, DDL type mapping, or native bulk settings. It creates a disposable MSSQL source table with at least 120 columns and exercises the same runtime path as production:

wide MSSQL source -> bcp queryout partition files -> ClickHouse HTTP TSV load -> typed reconciliation evidence

The fixture intentionally includes:

  • exact numerics: decimal, numeric, money, smallmoney;
  • identifiers and timestamps: uniqueidentifier, datetime2, datetimeoffset, date, time;
  • bytes: binary, varbinary, rowversion;
  • text edge cases: Unicode, nvarchar(max), empty string, NULL;
  • sparse generated columns across integers, decimals, text, dates, times, binary and booleans.

Smoke certification:

DPONE_IT_MSSQL_HOST=127.0.0.1 \
DPONE_IT_MSSQL_PORT=51433 \
DPONE_IT_MSSQL_DATABASE=dpone \
DPONE_IT_MSSQL_USER=sa \
DPONE_IT_MSSQL_PASSWORD='Dp0ne.Strong.Pw.2026!' \
DPONE_IT_MSSQL_BCP_PATH=/opt/homebrew/bin/bcp \
DPONE_IT_CH_HOST=127.0.0.1 \
DPONE_IT_CH_PORT=59000 \
DPONE_IT_CH_HTTP_PORT=58123 \
DPONE_IT_CH_DATABASE=dpone_it \
DPONE_IT_CH_USER=default \
DPONE_IT_CH_PASSWORD=dpone \
uv run python tools/mssql_clickhouse_wide_type_certification.py \
  --rows 10000 \
  --column-count 120 \
  --typed-hash-rows 10000 \
  --prepare-chunk-size 10000 \
  --target-rows-per-partition 2500 \
  --export-workers 2 \
  --load-workers 2 \
  --bcp-packet-size 32767 \
  --output-dir test_artifacts/live_certification/benchmarks/wide_type_10k_2026_06_09

Latest local evidence captured on 2026-06-09:

Profile Rows Columns Result Evidence
wide_type_10k_2026_06_09 10,000 120 passed: count, duplicate and typed hash test_artifacts/live_certification/benchmarks/wide_type_10k_2026_06_09/mssql_clickhouse_wide_type_certification.json
wide_type_1m_2026_06_09 1,000,000 120 transfer evidence: export completed and target load produced 1,000,000 rows with no duplicates; source-side reconciliation connection failed after load test_artifacts/live_certification/benchmarks/wide_type_1m_2026_06_09/mssql_clickhouse_wide_type_certification.json
wide_type_1m_reconciliation_2026_06_09 1,000,000 120 passed: source count, target count, duplicate and typed hash sample after MSSQL recovered test_artifacts/live_certification/benchmarks/wide_type_1m_reconciliation_2026_06_09/mssql_clickhouse_wide_type_certification.json
wide_type_1m_repeat_real_2026_06_09 1,000,000 120 passed repeat: source count, target count, duplicate check and typed hash; confirms export is the bottleneck test_artifacts/live_certification/benchmarks/wide_type_1m_repeat_real_2026_06_09/mssql_clickhouse_wide_type_certification.json

The 10k evidence recorded source_count=10000, target_count=10000, duplicate_count=0, typed_hash_passed=true, artifact_bytes=16564348, prepare_seconds=1.404, export_seconds=6.851, load_seconds=0.411 and rows_per_second=735.15.

The 1M conservative profile used a previously prepared MSSQL source table, target_rows_per_partition=100000, export_workers=2 and load_workers=1. It exported 1000000 rows to ten partition files, produced artifact_bytes=1729328238, loaded target_count=1000000 rows into ClickHouse with duplicate_count=0, and then hit a local MSSQL connection failure during immediate source-side reconciliation. After MSSQL recovered, wide_type_1m_reconciliation_2026_06_09 verified source_count=1000000, target_count=1000000, duplicate_count=0 and typed_hash_passed=true for the first 10000 ordered rows.

The repeat 1M run used the same prepared MSSQL source, explicit MSSQL and ClickHouse connection parameters, target_rows_per_partition=100000, export_workers=2 and load_workers=1. It completed with source_count=1000000, target_count=1000000, duplicate_count=0 and typed_hash_passed=true. It reproduced the slow export phase: export_seconds=695.105 versus the earlier 681.932, while ClickHouse load took only 19.415 seconds. Live probes showed MSSQL around 198-204% CPU during export and ClickHouse mostly idle; during load, ClickHouse rose above 200% CPU and MSSQL dropped to idle. The conclusion is that this local profile is dominated by MSSQL-side bcp queryout projection/encoding for the 120-column type-fidelity fixture, not by ClickHouse ingest. See test_artifacts/live_certification/benchmarks/wide_type_1m_repeat_real_2026_06_09/mssql_clickhouse_export_diagnosis.md.

Scale evidence profiles:

uv run python tools/mssql_clickhouse_wide_type_certification.py \
  --rows 1000000 \
  --column-count 120 \
  --typed-hash-rows 10000 \
  --prepare-chunk-size 100000 \
  --target-rows-per-partition 100000 \
  --export-workers 2 \
  --load-workers 1 \
  --bcp-packet-size 32767 \
  --output-dir test_artifacts/live_certification/benchmarks/wide_type_1m_2026_06_09

uv run python tools/mssql_clickhouse_wide_type_certification.py \
  --rows 10000000 \
  --column-count 120 \
  --typed-hash-rows 10000 \
  --prepare-chunk-size 100000 \
  --target-rows-per-partition 1000000 \
  --export-workers 8 \
  --load-workers 8 \
  --bcp-packet-size 32767 \
  --output-dir test_artifacts/live_certification/benchmarks/wide_type_10m_2026_06_09

For 1M and 10M runs, the full row count and duplicate checks cover the complete dataset, while typed_hash intentionally samples the first --typed-hash-rows ordered rows to keep certification cost bounded. Run a full typed hash only on smaller bounded slices.

The harness emits MSSQL_WIDE_PREPARE_SOURCE_*, MSSQL_PARTITIONED_BCP_QUERYOUT_* and final evidence events. If a local Docker SQL Server becomes unresponsive during 1M/10M fixture preparation, restart the local integration stack and retry with a smaller --prepare-chunk-size before changing transfer settings. Treat fixture preparation time separately from native transfer time: evidence includes prepare_seconds, export_seconds and load_seconds.

Do not treat the 10M profile as an interactive developer check on a laptop Docker stack. It is a manual/nightly certification profile that needs dedicated disk, memory and stable SQL Server/ClickHouse containers. On local Docker, promote from 10k -> 1M conservative -> 10M only when the previous profile finishes with passed=true.

Important correctness guardrails covered by the harness:

  • partition bounds are computed from the original source rowset, not from the encoded bcp projection view;
  • datetimeoffset is normalized through an ODBC-safe UTC projection for typed hash verification;
  • SQL Server timestamp/rowversion is treated as binary, never as time;
  • ClickHouse staging decode is type-policy aware, so time(7) encoded as seconds_since_midnight is not passed through text sentinel decoding;
  • binary, varbinary and rowversion are exported as deterministic hex text before ClickHouse TSV ingestion.

Expected artifacts:

Artifact Purpose
mssql_clickhouse_wide_type_certification.json Machine-readable pass/fail, counts, duplicate count, typed hashes and throughput.
mssql_clickhouse_wide_type_certification.md Human-readable release evidence.

Runbook when wide certification fails:

  1. If counts differ, inspect the source export partitions before looking at type mapping.
  2. If duplicate count is non-zero, check partitioning.column, finalizer policy and target cleanup.
  3. If typed_hash differs, compare the source and target hash fields and then inspect the first failing ordered rows.
  4. If binary columns differ, confirm source.options.type_fidelity.binary_encoding: hex is active.
  5. If time(p) differs, confirm whether the contract expects text fidelity or seconds_since_midnight.
  6. If performance regresses, compare export_seconds, load_seconds, artifact_bytes and rows_per_second between artifacts.
  7. If bcp fails with I/O error while writing BCP data-file, reduce export_workers, reduce target_rows_per_partition, set partition_tmp_dir to a disk with enough free space, and rerun with --skip-source-prepare if the source fixture already exists.
  8. If ClickHouse returns BrokenPipe or the native port becomes unavailable, reduce load_workers, keep wait_for_async_insert=1, and inspect ClickHouse container memory/disk pressure before increasing parallelism.
  9. If source-side reconciliation fails after a successful load, rerun reconciliation with --skip-source-prepare --skip-transfer after MSSQL is healthy again; this avoids repeating the large transfer.

Runbook

  1. Start with dpone doctor --profile local and fix missing extras or native clients.
  2. Run dpone plan <manifest> --format md and review source boundary, staging path, schema evolution, state, and quality checks.
  3. Run a small bounded window first.
  4. Inspect the run artifact under .dpone/runs/mssql_to_clickhouse.
  5. For incremental jobs, verify state before enabling a schedule.
  6. For delete-aware jobs, run reconciliation in report-only mode before enabling physical deletes.
  7. Promote the manifest through GitOps after the plan and artifact are reviewed.

Type contracts and physical design

This flow supports the shared dpone type-governance stack:

  • Type inference for source metadata, sampled profiling, confidence, and empty string vs NULL behavior.
  • Schema contracts for explicit logical column types, enforcement modes, and __dpone__nc__* variant columns.
  • Physical design for target-specific DDL such as concrete SQL types, indexes, partitioning, compression, ClickHouse LowCardinality, and BigQuery clustering.

Use dpone schema infer --manifest ... and dpone schema physical-plan --manifest ... before enabling new table DDL in production.

Production native transfer fast path

MSSQL -> ClickHouse has a first-class native transfer path for large tables:

MSSQL SELECT partitions -> bcp queryout TSV files -> ClickHouse staging INSERT FORMAT TabSeparated -> target finalizer -> quality/state commit

This path keeps the normal dpone execution flow. It does not introduce a separate source/sink hierarchy. The reusable planning layer explains the transfer, while the existing MSSQL source, file artifacts, ClickHouse sink, lineage/state, quality checks and run artifacts still own execution.

Copy-paste manifest

source:
  type: mssql
  connection_id: mssql_oltp
  connection_type: env
  table:
    schema: dbo
    name: orders
  options:
    native_transfer:
      optimizer_profile: high_throughput_safe
    extract_mode: bcp_queryout
    mssql_export_mode: bcp
    read_consistency: read_committed_snapshot_if_enabled_else_read_committed
    type_fidelity:
      binary_encoding: hex
      time_encoding: seconds_since_midnight
    bulk:
      mode: bcp
      bcp:
        batch_size: 100000
        packet_size: 65535
        timeout_seconds: 3600
    partitioning:
      strategy: auto
      column: order_id
      bounds: auto
      target_rows_per_partition: 1000000
      max_partitions: 64
      export_workers: 8
      load_workers: 8

sink:
  type: clickhouse
  connection_id: clickhouse_dwh
  connection_type: env
  table:
    schema: analytics
    name: orders
  strategy:
    mode: incremental_merge
    unique_key: [order_id]
    merge_policy: lightweight_delete_insert
  options:
    native_transfer:
      optimizer_profile: high_throughput_safe
    clickhouse_bulk:
      mode: auto
      insert_settings:
        async_insert: 1
        wait_for_async_insert: 1
        max_insert_block_size: 1000000
        input_format_parallel_parsing: 1
    idempotency:
      enabled: true

Fast path selection algorithm

flowchart TD
  Start["Read manifest"] --> Source["Resolve MSSQL export capability"]
  Source --> Export{"bcp queryout available?"}
  Export -->|yes| Bcp["Use bcp queryout"]
  Export -->|no| Pyodbc["Fallback to pyodbc streaming"]
  Bcp --> Sink["Resolve ClickHouse ingest capability"]
  Pyodbc --> Sink
  Sink --> Http{"HTTP bulk configured?"}
  Http -->|yes| ChHttp["INSERT FORMAT TabSeparated over HTTP"]
  Http -->|no| Client{"clickhouse-client available?"}
  Client -->|yes| ChClient["clickhouse-client INSERT FORMAT TabSeparated"]
  Client -->|no| Python["Fallback Python parsing path"]
  ChHttp --> Finalizer["Staging-first finalizer"]
  ChClient --> Finalizer
  Python --> Finalizer
  Finalizer --> Quality["Quality checks and reconciliation"]
  Quality --> State["Advance state only after success"]

Partition planning

partitioning.bounds: auto executes one MSSQL metadata query over the selected source query:

SELECT
  MIN([order_id]) AS dpone_min_value,
  MAX([order_id]) AS dpone_max_value,
  COUNT_BIG(1) AS dpone_row_count
FROM (<source query>) AS dpone_bounds;

Numeric partition columns use half-open ranges: [lower, upper) for all partitions except the final partition, which includes the upper bound. Datetime partition columns use time windows with the same boundary semantics.

If no safe partition column is configured, dpone uses one partition and emits a warning. It does not guess a key, because a wrong partition key can produce duplicates, gaps, or source overload.

ClickHouse insert settings

clickhouse_bulk.insert_settings are passed to the native ClickHouse HTTP/client insert path. If async_insert: 1 is configured and wait_for_async_insert is omitted, dpone sets wait_for_async_insert: 1 for safer batch semantics.

Useful settings for large MSSQL -> ClickHouse transfers:

Setting Typical value Why it helps
async_insert 1 Lets ClickHouse buffer large inserts efficiently.
wait_for_async_insert 1 Keeps dpone correctness simple by waiting for acceptance.
max_insert_block_size 1000000 Reduces per-block overhead for wide batches.
input_format_parallel_parsing 1 Speeds TabSeparated parsing on ClickHouse side.
max_threads target-specific Caps ClickHouse CPU usage when concurrent jobs run.

Optimizer profile

Use native_transfer.optimizer_profile: high_throughput_safe as the first production baseline before hand-tuning individual knobs:

source:
  type: mssql
  options:
    native_transfer:
      optimizer_profile: high_throughput_safe
    bulk:
      mode: bcp

sink:
  type: clickhouse
  options:
    native_transfer:
      optimizer_profile: high_throughput_safe
    clickhouse_bulk:
      mode: http

The profile is intentionally conservative: it improves native throughput while keeping correctness-friendly settings such as waiting for ClickHouse async insert acceptance.

Area Profile default Why
MSSQL bulk.bcp.batch_size 250000 Reduces bcp commit/control overhead for large extracts.
MSSQL bulk.bcp.packet_size 65535 Uses a large network packet for bcp transfer.
MSSQL bulk.bcp.timeout_seconds 3600 Avoids short-running command timeouts on large partitions.
ClickHouse HTTP chunk_size 4194304 Streams larger HTTP chunks without loading the full file into memory.
ClickHouse async_insert 1 Lets ClickHouse buffer native inserts efficiently.
ClickHouse wait_for_async_insert 1 Keeps dpone acceptance semantics simple and safe.
ClickHouse input_format_parallel_parsing 1 Enables parallel TabSeparated parsing where supported.
ClickHouse max_insert_block_size 1000000 Reduces per-block overhead for large batches.

Explicit user settings always win over the profile. For example, this keeps the profile but lowers only the bcp packet size:

source:
  type: mssql
  options:
    native_transfer:
      optimizer_profile: high_throughput_safe
    bulk:
      mode: bcp
      bcp:
        packet_size: 32768

Use explicit overrides after benchmark evidence shows which phase is the bottleneck.

Lossless NULL and empty string handling

MSSQL bcp queryout in character mode can serialize an empty string as a NUL byte. That is unsafe for ClickHouse TabSeparated loads because an empty string, a NUL character and NULL must stay distinguishable.

dpone uses a staging-first, lossless codec for the native MSSQL -> ClickHouse path:

  1. MSSQL source projection renders NULL as ClickHouse \N.
  2. Text empty strings are encoded as the framework sentinel __dpone__tsv__empty.
  3. Real values that start with the sentinel namespace are escaped as __dpone__tsv__prefix....
  4. Raw TSV is loaded into a ClickHouse staging table.
  5. Finalization inserts from staging through a decode projection, so the target table receives the original empty string, original sentinel-like values and real NULL values.
flowchart LR
  M["MSSQL row"] --> P["bcp-safe SELECT projection"]
  P --> F["Raw TSV artifact"]
  F --> S["ClickHouse raw staging"]
  S --> D["Decode projection"]
  D --> T["Final target table"]

The raw staging table is an implementation detail and may contain sentinel strings. User-facing target tables must not contain __dpone__tsv__empty unless that exact value existed in the source and was escaped by the codec.

MSSQL source nullability metadata is preserved through INFORMATION_SCHEMA.COLUMNS.IS_NULLABLE. Nullable source columns are mapped to nullable ClickHouse target columns where needed, so a source NULL decimal remains NULL instead of becoming 0.

Idempotency and resume model

Every partition can be identified by a deterministic SHA-256 transfer partition id built from:

source table + target table + strategy + query hash + schema hash + partition bounds

A partition can be skipped only when a previous checkpoint is committed and both query hash and schema hash still match. Failed export, staging load, finalization or quality checks do not advance source state.

ClickHouse insert_deduplication_token is useful as an additional target-native guard, but dpone does not rely on it as the only correctness mechanism.

Runtime resume in dpone run

dpone run now uses the same partition checkpoint contract as the live certification tooling when a checkpoint store is available. The runtime hook sits between extraction and sink load:

  1. MSSQL source exports partitioned bcp queryout files and attaches stable partition metadata: bounds, query hash, schema hash and transfer partition id.
  2. The native-transfer runtime service compares those partitions with the checkpoint store.
  3. Already committed partitions with matching query/schema hash and matching artifact checksum are skipped.
  4. Retry partitions are loaded through the normal ClickHouse staging/finalizer path.
  5. Checkpoints are moved to committed only after sink.load() succeeds.
  6. Source state is persisted only after checkpoints, finalizer and normal processor lifecycle all succeed.

For full_refresh, dpone is deliberately conservative: a fully committed transfer can be skipped as a no-op, but mixed skip/retry plans still load a complete snapshot because shadow-swap finalization requires all rows. For incremental_append, incremental_merge and partition_replace, retry-only partition loading is allowed because those strategies are staging-first and idempotent by key or partition.

Production state-backed checkpoint stores are enabled through the existing state backend. No separate state system is introduced:

state:
  type: mssql
  connection_id: mssql_dwh
  connection_type: vault
  table:
    schema: etl_state
  partition_checkpoint_table:
    schema: etl_state
    name: dpone_partition_checkpoints

source:
  type: mssql
  options:
    partitioning:
      column: order_id
      bounds: auto
      target_rows_per_partition: 1000000
      export_workers: 8
      load_workers: 8

sink:
  type: clickhouse
  options:
    native_transfer:
      require_artifact_checksum: true
    runtime_evidence:
      output_dir: .dpone/runs/mssql_to_clickhouse

Failure/resume certification

Production certification for this path must prove three failure stages:

Failure stage Expected behavior before retry Expected behavior after retry
after_export Exported partitions are not treated as committed; source state is not advanced. The retry loads/finalizes the exported partitions or re-exports them safely, then commits matching checkpoints.
during_load Partially loaded partitions are not skipped unless their checkpoint is already committed with matching query/schema hashes. The retry produces the expected target count with no duplicate keys.
before_finalizer Finalized-but-uncommitted work is not allowed to advance source state. The retry runs finalization/reconciliation again and commits state only after success.

Certification evidence uses these invariants:

  • state_committed_before_retry must be false;
  • only committed checkpoints with matching query_hash and schema_hash may be skipped;
  • all partitions after retry must be committed;
  • actual_rows_after_retry == expected_rows;
  • duplicate_rows_after_retry == 0;
  • state_committed_after_retry must be true.

The reusable implementation lives in src/dpone/strategy_intelligence/resume_certification.py. It is connector-free and can be reused by future native transfer paths.

Local live fault-injection workflow:

PYTHONUNBUFFERED=1 uv run python tools/mssql_clickhouse_fault_injection.py \
  --rows 10000 \
  --batch-size 5000 \
  --bcp-path /opt/homebrew/bin/bcp \
  --optimizer-profile high_throughput_safe \
  --partition-column id \
  --lower-bound 1 \
  --upper-bound 10000 \
  --num-partitions 4 \
  --export-workers 2 \
  --load-workers 2 \
  --clickhouse-bulk-mode http \
  --clickhouse-http-host 127.0.0.1 \
  --clickhouse-http-port 58123 \
  --output-dir test_artifacts/live_certification/benchmarks/native_fault_injection_latest \
  --json-output test_artifacts/live_certification/benchmarks/native_fault_injection_latest/summary.json

The tool executes real MSSQL export and ClickHouse retry loads against the local Docker stack, then evaluates the controlled checkpoint boundary for after_export, during_load, and before_finalizer. It is deliberately artifact-first: the certification service stays connector-free, while the tool is the local-live adapter.

The tool also writes an append-only partition checkpoint store:

test_artifacts/live_certification/benchmarks/native_fault_injection_latest/partition_checkpoints.jsonl

This JSONL store is the local/CI implementation of the production checkpoint contract. Database-backed state stores should implement the same lifecycle:

planned -> exported -> loaded -> finalized -> committed

Only committed checkpoints with matching query_hash, schema_hash, partition bounds and artifact diagnostics are safe skip candidates. Failed or partial checkpoints are evidence, not permission to advance source state.

For production state backends, the same checkpoint contract is available through SQL-backed adapters:

  • MSSQLCheckpointDialect for MSSQL state tables;
  • PostgresCheckpointDialect for Postgres state tables;
  • SqlPartitionCheckpointStore as the small adapter over an existing SQL connector.

These adapters do not introduce a new state system. They persist the same PartitionCheckpoint payload in an existing state database table and expose the same PartitionCheckpointStore protocol as the local JSONL evidence store.

For every retry, dpone can build a checkpoint-driven resume plan:

{
  "summary": {"skip": 0, "retry": 4},
  "decisions": [
    {
      "action": "retry",
      "reason": "latest_status_loaded",
      "checkpoint_status": "loaded"
    }
  ]
}

skip is allowed only for committed checkpoints with matching hashes. exported, loaded, finalized, missing checkpoints and hash mismatches are retried. This keeps the runtime contract conservative: partial evidence is never treated as committed work.

Artifact checksums can be required as an additional hard guardrail. When enabled, a committed checkpoint without diagnostics.artifact_sha256 is treated as retryable, not skippable. This prevents stale or incomplete file artifacts from being trusted only because the status says committed.

Partition correctness certification

After a retry succeeds, the local-live certification tool verifies data correctness per partition, not only global target count:

{
  "partition_correctness_passed": true,
  "partition_correctness": {
    "partition_count": 4,
    "failed_partition_count": 0,
    "total_source_rows": 10000,
    "total_target_rows": 10000
  }
}

The reusable implementation lives in src/dpone/strategy_intelligence/partition_correctness.py. Concrete adapters provide source/target observations; the service compares row counts and deterministic checksums. For the local MSSQL -> ClickHouse benchmark table the checksum is based on partition row count plus stable numeric sums (id, customer_id). Production adapters can provide stronger hashes over wider payload projections when needed.

Adaptive partitioning v2

Adaptive partitioning is an explainable next-run tuning layer. It does not silently change the current run. Instead, dpone records partition observations and recommends safer partition sizes for the next execution.

source:
  type: mssql
  options:
    partitioning:
      strategy: auto
      column: order_id
      bounds: auto
      target_rows_per_partition: 1000000
      max_partitions: 128
      export_workers: 8
      load_workers: 8
      adaptive:
        enabled: true
        max_skew_ratio: 3.0
        retry_split_factor: 2
        min_rows_per_partition: 1000

Runtime observations can be fed back into planning:

source:
  options:
    partitioning:
      adaptive:
        enabled: true
        observations:
          - partition_id: p0
            row_count: 1000000
            bytes_count: 240000000
            duration_seconds: 12.4
            status: committed
          - partition_id: p1
            row_count: 9000000
            bytes_count: 2160000000
            duration_seconds: 98.1
            status: committed
          - partition_id: p2
            row_count: 1000000
            retry_count: 1
            status: failed

Planning behavior:

Signal Recommendation
No observations Keep configured partition size and collect observability.
row_count / min_nonzero_row_count > max_skew_ratio Mark partition as split_skewed_partition.
status = failed or retry_count > 0 Mark partition as split_retry_partition.
Split required Reduce target_rows_per_partition, bounded by min_rows_per_partition.

This mirrors the useful part of Spark JDBC partition tuning, but keeps dpone stateful and resumable: recommendations are visible in dpone plan, while the actual run still uses deterministic partition bounds, checkpoint hashes, and staging-first finalization.

Reconciliation profiles

Partition correctness is configurable by profile:

Profile Checks Use case
count_only Per-partition row counts Fast smoke or very large partitions where checksum is too expensive.
count_and_checksum Count plus deterministic aggregate checksum Default production balance for benchmark/certification runs.
sample_hash Count, aggregate checksum, and ordered sample hash Catches value corruption with bounded extra cost.
full_partition_hash Count plus full ordered partition hash Strongest correctness proof; use for certification slices or smaller partitions.

Fault-injection certification supports the same profiles:

uv run python tools/mssql_clickhouse_fault_injection.py \
  --rows 10000 \
  --reconciliation-profile sample_hash \
  --output-dir test_artifacts/live_certification/benchmarks/native_fault_injection_latest

Runbook:

  • use count_and_checksum for regular local/live certification;
  • use sample_hash before changing codecs, escaping, bcp options, or ClickHouse insert format settings;
  • use full_partition_hash on bounded certification slices when preparing a release or validating a migration;
  • if a profile fails, inspect the failed partition id, compare the source and target hash fields, then rerun only the failed partition with a smaller target_rows_per_partition.

Runtime report evidence

The same local-live tool writes one runtime report per failure stage:

native_transfer_runtime_mssql_clickhouse_<stage>_<timestamp>.json
native_transfer_runtime_mssql_clickhouse_<stage>_<timestamp>.md

The runtime report schema is dpone.native_transfer.runtime_report.v1 and contains:

  • checkpoint summary;
  • resume skip/retry decisions;
  • load result summary;
  • overall pass/fail flag.

Normal dpone run writes this report when runtime_evidence.output_dir is configured. The live certification harness enriches the same evidence family with partition correctness observations and source/target row checks.

Full dpone run two-pass certification

Use this harness when you need to prove the complete user-facing execution path, not only lower-level source/sink strategy objects:

DPONE_IT_MSSQL_HOST=127.0.0.1 \
DPONE_IT_MSSQL_PORT=51433 \
DPONE_IT_MSSQL_DATABASE=master \
DPONE_IT_MSSQL_USER=sa \
DPONE_IT_MSSQL_PASSWORD='Dp0ne.Strong.Pw.2026!' \
DPONE_IT_MSSQL_TRUST_SERVER_CERTIFICATE=yes \
DPONE_IT_MSSQL_BCP_PATH=/opt/homebrew/bin/bcp \
DPONE_IT_CH_HOST=127.0.0.1 \
DPONE_IT_CH_PORT=59000 \
DPONE_IT_CH_HTTP_PORT=58123 \
DPONE_IT_CH_DATABASE=dpone_it \
DPONE_IT_CH_USER=default \
DPONE_IT_CH_PASSWORD=dpone \
uv run python tools/mssql_clickhouse_run_certification.py \
  --rows 10000 \
  --target-rows-per-partition 2500 \
  --export-workers 4 \
  --load-workers 4 \
  --output-dir test_artifacts/live_certification/benchmarks/native_dpone_run_2026_06_09

The harness deliberately stays thin and uses normal dpone execution:

  • creates a disposable MSSQL source table and ClickHouse target table;
  • writes a normal single-process manifest with connection_type: params;
  • enables state.type: mssql and partition_checkpoint_table;
  • runs the manifest twice through dpone.api.run, the Python equivalent of dpone run;
  • validates that the second run is a no-op skip over already committed SQL-backed partition checkpoints.

Expected artifacts:

Artifact Purpose
mssql_to_clickhouse_native_run_cert.yml Exact generated manifest used by dpone run.
runtime/native_transfer_runtime_dpone-native-transfer-first.json First run runtime report with retry/load summary.
runtime/native_transfer_runtime_dpone-native-transfer-second.json Second run runtime report; expected skip = committed partitions and inserted_rows = 0.
native_transfer_run_certification.json Machine-readable certification summary.
native_transfer_run_certification.md Human-readable evidence for release notes or audit review.

If the second run is not a no-op, check these first:

  • checkpoint table points to the same MSSQL state connection used by the first run;
  • native_transfer.require_artifact_checksum is still enabled;
  • source query, schema, target table, strategy and partition bounds did not change;
  • ClickHouse finalizer completed before state was committed.

Example evidence shape:

{
  "schema_version": "dpone.native_transfer.resume_evidence.v1",
  "passed": true,
  "safe_to_resume": true,
  "skipped_partition_count": 1,
  "retry_partition_count": 3,
  "checks": [
    {"name": "state_not_committed_before_retry", "passed": true},
    {"name": "all_after_retry_partitions_committed", "passed": true},
    {"name": "target_count_matches_expected", "passed": true},
    {"name": "no_duplicate_rows_after_retry", "passed": true},
    {"name": "state_committed_after_success", "passed": true}
  ]
}

Runbook when certification fails:

  1. If state_not_committed_before_retry is red, stop the pipeline and inspect the state backend before rerunning.
  2. If all_after_retry_partitions_committed is red, inspect checkpoint hashes and do not skip non-committed partitions.
  3. If target_count_matches_expected is red, compare staging and final target counts before advancing state.
  4. If no_duplicate_rows_after_retry is red, verify unique_key, finalizer policy and ClickHouse staging cleanup.
  5. If state_committed_after_success is red, treat the run as incomplete even if target counts look correct.

Finalizers

Strategy ClickHouse finalizer
full_refresh staging/shadow swap
incremental_append staging-first append
incremental_merge lightweight_delete_insert by default
partition_replace ALTER TABLE target REPLACE PARTITION ... FROM staging
cdc_apply CDC events applied through staging-aware ClickHouse finalization

Tuning runbook

Use this order when optimizing a slow MSSQL -> ClickHouse run:

  1. Run dpone plan --explain-strategy --format text <manifest> and confirm native_fast_path is mssql_bcp_queryout_to_clickhouse_direct_tsv.
  2. Enable native_transfer.optimizer_profile: high_throughput_safe on the source and sink.
  3. Configure partitioning.column on a numeric or datetime column with good distribution.
  4. Start with target_rows_per_partition: 1000000, max_partitions: 64, export_workers: 4..8.
  5. Set load_workers to the same value as export workers only if ClickHouse has enough CPU and IO headroom.
  6. Prefer HTTP bulk when ClickHouse HTTP endpoint is reachable from the runner; otherwise configure clickhouse-client.
  7. Keep wait_for_async_insert: 1 until benchmark evidence proves a looser setting is safe for your SLA.
  8. Watch run artifacts for export time, ingest time, finalization time, rows/sec, MB/sec and ClickHouse parts count.
  9. Override individual profile values only after comparing tuning matrix artifacts from the same runner.

Troubleshooting

Symptom Likely cause Action
partitioning.bounds=auto requires a bounds resolver Generic planner used without MSSQL runtime resolver Run through normal dpone source strategy, not standalone partitioner calls.
Very small files Too many partitions or low target_rows_per_partition Increase target_rows_per_partition or lower max_partitions.
MSSQL CPU pressure Too many concurrent bcp queryout workers Lower export_workers, use off-peak windows, or choose a better partition key.
ClickHouse many small parts Load workers too high or blocks too small Increase block size and reduce load_workers.
Duplicate rows after retry Target finalizer or checkpoint contract was bypassed Use staging-first strategies and do not manually insert artifacts into target.
Empty string/NULL confusion Unsafe delimited format settings Use dpone bulk text codec defaults and do not override NULL handling without tests.

Industrial comparison

System Pattern dpone behavior
Spark JDBC partitionColumn, bounds and partition count Same split model, plus bcp queryout, durable files and ClickHouse-native ingest.
Pentaho Partitioned transformations GitOps YAML and deterministic retry boundaries instead of visual-only jobs.
Informatica Bulk source partitioning and native loaders Native extract/load with explicit governance and run artifacts.
dlt Incremental state and resumable packages Partition checkpoint model and deterministic reruns.
Airbyte State after destination acceptance Source state advances only after ClickHouse finalizer and quality checks.
Fivetran Durable connector checkpoints Per-partition evidence, committed status and hash matching.