MSSQL -> ClickHouse¶
This guide is a copy/paste-ready starting point for loading data from MSSQL into ClickHouse with dpone.
When to use this path¶
Use this path when MSSQL is the system of record or ingestion boundary and ClickHouse is the landing, warehouse, event-log, or downstream replication target.
Copy/paste manifest¶
name: mssql_to_clickhouse_example
source:
type: mssql
connection_id: mssql_dwh
connection_type: vault
table:
schema: dbo
name: orders
options:
incremental_column: updated_at
batch_size: 50000
export_mode: bcp
sink:
type: clickhouse
connection_id: clickhouse_analytics
connection_type: vault
table:
database: analytics
name: orders
strategy:
mode: incremental_merge
unique_key: order_id
merge_policy: lightweight_delete_insert
duplicate_policy: fail
options:
insert_mode: native
batch_size: 100000
staging_engine: MergeTree
state:
type: postgres
connection_id: postgres_state
table:
schema: etl_state
name: dpone_state
quality:
mode: fail
checks:
- type: min_rows
threshold: 1
- type: source_target_count
tolerance_pct: 0.1
observability:
artifacts:
enabled: true
path: .dpone/runs/mssql_to_clickhouse
Run it locally:
dpone plan examples/mssql_to_clickhouse.yaml --format md
dpone batch run examples/mssql_to_clickhouse.yaml
Supported load strategies¶
| Strategy | Status | Notes |
|---|---|---|
full_refresh |
Supported | Uses staging first, then applies the target-specific finalization plan. |
incremental_append |
Supported | Uses staging first, then applies the target-specific finalization plan. |
incremental_merge |
Supported | Default merge_policy: lightweight_delete_insert; shadow_swap supported; mutation_delete_insert is explicit opt-in and non-recommended. |
replace |
Supported | Uses staging first, then applies the target-specific finalization plan. |
partition_replace |
Supported | Replaces target partitions represented by staging partition.column; see Load strategies for native/fallback paths. |
snapshot_reconciliation |
Supported | Uses staging first, then applies the target-specific finalization plan. |
cdc |
Source-specific | Uses typed CDC offsets and advances state only after sink success. |
See Load strategies for the detailed algorithm for each strategy.
Runtime algorithm¶
flowchart TD
A["Resolve manifest and registry entries"] --> B["Create MSSQL source"]
B --> C["Plan bounded extract"]
C --> D["Read through BCP queryout or pyodbc streaming cursor"]
D --> E["Emit ExtractResult with schema and artifact"]
E --> F["Plan schema evolution"]
F --> G["Create ClickHouse staging or event batch"]
G --> H["Load through staging/shadow table insert with atomic swap or append finalization"]
H --> I["Apply finalization strategy"]
I --> J["Run quality and reconciliation checks"]
J --> K["Advance state only after success"]
Strategy behavior¶
full_refresh: extract the selected source boundary, load into staging, and replace the target according to the target's safe finalization path.incremental_append: extract only the incremental boundary and append rows through staging or event production.incremental_merge: load into staging, validate duplicates, then uselightweight_delete_insertby default;shadow_swapand guardedmutation_delete_insertare explicit policies.replace: reload a bounded predicate window through staging and then atomically replace the matching target slice.snapshot_reconciliation: compare the latest source snapshot with the target key set and apply configured physical-delete or soft-delete behavior through staging-first plans.partition_replace: extract a complete partition slice, load it into staging, and replace only partitions represented bypartition.column.
Schema evolution and type mapping¶
Schema evolution is enabled by default and runs before the staging/final load path:
- Read source schema from
ExtractResult.schema. - Introspect the ClickHouse target schema.
- Apply safe additions and widening operations.
- Fail breaking changes by default.
- If configured, route incompatible type changes to
__dpone__nc__<column>.
Use Schema evolution and Type mapping matrix when adding columns or changing source types.
Exact type preservation¶
MSSQL -> ClickHouse uses a lossless-first type profile for native transfers. The goal is to avoid the classic analytical-load failure mode where exact MSSQL values arrive but are stored as approximate ClickHouse values.
| MSSQL type | Default ClickHouse type | Fidelity |
|---|---|---|
decimal(p,s), numeric(p,s) |
Decimal(p,s) |
Exact precision and scale, capped to ClickHouse decimal limits. |
money |
Decimal(19,4) |
Exact fixed scale. |
smallmoney |
Decimal(10,4) |
Exact fixed scale. |
bigint, int, smallint, tinyint |
Int64, Int32, Int16, UInt8 |
Exact integer range mapping. |
bit |
Bool |
Exact boolean mapping. |
uniqueidentifier |
UUID |
Exact UUID mapping. |
datetime2(p) |
DateTime64(p) |
Preserves configured fractional precision. |
datetimeoffset(p) |
configurable | Default utc_instant lands DateTime64(p, 'UTC'); fixed_timezone, preserve_offset and preserve_text are available through type_fidelity.temporal.offset_timestamp. |
date |
Date |
Exact date mapping. |
time(p) |
String |
Conservative text landing because ClickHouse has no standalone time type. |
binary, varbinary, rowversion |
String |
Requires explicit hex/base64 codec policy for byte-readable contracts. |
nvarchar, varchar, xml, deprecated text types |
String |
Text-compatible landing. |
dpone plan includes a type_fidelity section when MSSQL source schema metadata is available. Review that section before enabling a new table in production.
Binary and time codec policy¶
Use source.options.type_fidelity when source tables contain byte or standalone
time columns and you need the transfer to be certifiably lossless:
source:
type: mssql
options:
type_fidelity:
binary_encoding: hex
time_encoding: seconds_since_midnight
Offset timestamp examples:
source:
options:
type_fidelity:
temporal:
offset_timestamp:
mode: fixed_timezone
timezone: Europe/Moscow
preserve_offset stores the UTC instant in the original target column and adds __dpone__tz_offset_minutes__<column> as Int16. MSSQL datetimeoffset stores an offset such as +03:00, not an IANA timezone name such as Europe/Moscow; use fixed_timezone for business-view conversion and preserve_offset when the source offset must survive.
Policy values:
| Option | Values | Default | Effect |
|---|---|---|---|
binary_encoding |
none, hex, base64 |
none |
hex exports MSSQL binary/varbinary/rowversion as deterministic hexadecimal text for ClickHouse String columns. none keeps binary columns non-lossless in the plan. |
time_encoding |
string, seconds_since_midnight |
string |
string lands MSSQL time(p) as text. seconds_since_midnight exports it as UInt32 seconds for faster numeric filtering. |
temporal.offset_timestamp.mode |
utc_instant, fixed_timezone, preserve_offset, preserve_text |
utc_instant |
Controls MSSQL datetimeoffset landing semantics. |
temporal.offset_timestamp.timezone |
IANA timezone, SQL Server timezone alias, or fixed offset | UTC |
Used by fixed_timezone; use Europe/Moscow for Moscow business time. |
temporal.offset_timestamp.malformed |
fail, preserve_text |
fail |
Stops on malformed offset timestamps by default; use preserve_text only for raw landing/audit tables. |
temporal.offset_timestamp.columns |
object keyed by column name | {} |
Per-column overrides for tables where only some timestamp columns need preserve_offset, fixed_timezone, or preserve_text. |
Run dpone plan --format json and check:
type_fidelity.policymatches the manifest.- binary columns are
lossless: trueonly when an explicit byte encoding is configured. time(p)becomesUInt32only withtime_encoding: seconds_since_midnight.
If a table needs sub-second standalone time precision, keep time_encoding:
string or define an explicit target type in Physical design.
For per-column temporal policies and malformed-value handling, see
Temporal fidelity.
Typed reconciliation¶
For release certification and high-risk schema changes, use typed_hash in the local certification tool:
python tools/mssql_clickhouse_fault_injection.py \
--rows 10000 \
--reconciliation-profile typed_hash \
--binary-encoding hex \
--time-encoding seconds_since_midnight \
--clickhouse-bulk-mode http
typed_hash compares rows through canonical typed serialization using the
source schema. It treats Decimal('1.14'), ClickHouse integer JSON
representations for whole decimals, benign floating JSON renderings such as
1.1400000000000001, hex-encoded bytes and configured standalone time values
as the same typed value when the source contract says they are equivalent. It
still preserves NULL vs empty string, integer values, timestamps and text
boundaries.
Use profiles as follows:
| Profile | Use case |
|---|---|
count_only |
Smoke checks where only row presence matters. |
count_and_checksum |
Fast default for regular local certification. |
sample_hash |
Codec/escaping changes where representative row content must match. |
typed_hash |
Production type-fidelity certification for MSSQL -> ClickHouse. |
full_partition_hash |
Bounded release slices where full partition hashing cost is acceptable. |
Runbook:
- If
typed_hashfails but counts pass, inspect the failed partition's source and target typed hashes in the artifact. - Check
dpone plantype_fidelity.columnsfor approximate or non-lossless decisions. - For decimals, ensure the target table uses
Decimal(p,s), notFloat64. - For binary/time columns, choose an explicit physical design or codec policy before certifying as lossless.
- Re-run
typed_hashafter DDL or codec changes.
Wide type-fidelity certification¶
Use the wide-type harness before changing MSSQL -> ClickHouse codecs, partitioning, DDL type mapping, or native bulk settings. It creates a disposable MSSQL source table with at least 120 columns and exercises the same runtime path as production:
wide MSSQL source -> bcp queryout partition files -> ClickHouse HTTP TSV load -> typed reconciliation evidence
The fixture intentionally includes:
- exact numerics:
decimal,numeric,money,smallmoney; - identifiers and timestamps:
uniqueidentifier,datetime2,datetimeoffset,date,time; - bytes:
binary,varbinary,rowversion; - text edge cases: Unicode,
nvarchar(max), empty string,NULL; - sparse generated columns across integers, decimals, text, dates, times, binary and booleans.
Smoke certification:
DPONE_IT_MSSQL_HOST=127.0.0.1 \
DPONE_IT_MSSQL_PORT=51433 \
DPONE_IT_MSSQL_DATABASE=dpone \
DPONE_IT_MSSQL_USER=sa \
DPONE_IT_MSSQL_PASSWORD='Dp0ne.Strong.Pw.2026!' \
DPONE_IT_MSSQL_BCP_PATH=/opt/homebrew/bin/bcp \
DPONE_IT_CH_HOST=127.0.0.1 \
DPONE_IT_CH_PORT=59000 \
DPONE_IT_CH_HTTP_PORT=58123 \
DPONE_IT_CH_DATABASE=dpone_it \
DPONE_IT_CH_USER=default \
DPONE_IT_CH_PASSWORD=dpone \
uv run python tools/mssql_clickhouse_wide_type_certification.py \
--rows 10000 \
--column-count 120 \
--typed-hash-rows 10000 \
--prepare-chunk-size 10000 \
--target-rows-per-partition 2500 \
--export-workers 2 \
--load-workers 2 \
--bcp-packet-size 32767 \
--output-dir test_artifacts/live_certification/benchmarks/wide_type_10k_2026_06_09
Latest local evidence captured on 2026-06-09:
| Profile | Rows | Columns | Result | Evidence |
|---|---|---|---|---|
wide_type_10k_2026_06_09 |
10,000 | 120 | passed: count, duplicate and typed hash | test_artifacts/live_certification/benchmarks/wide_type_10k_2026_06_09/mssql_clickhouse_wide_type_certification.json |
wide_type_1m_2026_06_09 |
1,000,000 | 120 | transfer evidence: export completed and target load produced 1,000,000 rows with no duplicates; source-side reconciliation connection failed after load | test_artifacts/live_certification/benchmarks/wide_type_1m_2026_06_09/mssql_clickhouse_wide_type_certification.json |
wide_type_1m_reconciliation_2026_06_09 |
1,000,000 | 120 | passed: source count, target count, duplicate and typed hash sample after MSSQL recovered | test_artifacts/live_certification/benchmarks/wide_type_1m_reconciliation_2026_06_09/mssql_clickhouse_wide_type_certification.json |
wide_type_1m_repeat_real_2026_06_09 |
1,000,000 | 120 | passed repeat: source count, target count, duplicate check and typed hash; confirms export is the bottleneck | test_artifacts/live_certification/benchmarks/wide_type_1m_repeat_real_2026_06_09/mssql_clickhouse_wide_type_certification.json |
The 10k evidence recorded source_count=10000, target_count=10000,
duplicate_count=0, typed_hash_passed=true, artifact_bytes=16564348,
prepare_seconds=1.404, export_seconds=6.851, load_seconds=0.411 and
rows_per_second=735.15.
The 1M conservative profile used a previously prepared MSSQL source table,
target_rows_per_partition=100000, export_workers=2 and load_workers=1.
It exported 1000000 rows to ten partition files, produced
artifact_bytes=1729328238, loaded target_count=1000000 rows into
ClickHouse with duplicate_count=0, and then hit a local MSSQL connection
failure during immediate source-side reconciliation. After MSSQL recovered,
wide_type_1m_reconciliation_2026_06_09 verified source_count=1000000,
target_count=1000000, duplicate_count=0 and typed_hash_passed=true for
the first 10000 ordered rows.
The repeat 1M run used the same prepared MSSQL source, explicit MSSQL and
ClickHouse connection parameters, target_rows_per_partition=100000,
export_workers=2 and load_workers=1. It completed with
source_count=1000000, target_count=1000000, duplicate_count=0 and
typed_hash_passed=true. It reproduced the slow export phase:
export_seconds=695.105 versus the earlier 681.932, while ClickHouse load
took only 19.415 seconds. Live probes showed MSSQL around 198-204% CPU
during export and ClickHouse mostly idle; during load, ClickHouse rose above
200% CPU and MSSQL dropped to idle. The conclusion is that this local
profile is dominated by MSSQL-side bcp queryout projection/encoding for the
120-column type-fidelity fixture, not by ClickHouse ingest. See
test_artifacts/live_certification/benchmarks/wide_type_1m_repeat_real_2026_06_09/mssql_clickhouse_export_diagnosis.md.
Scale evidence profiles:
uv run python tools/mssql_clickhouse_wide_type_certification.py \
--rows 1000000 \
--column-count 120 \
--typed-hash-rows 10000 \
--prepare-chunk-size 100000 \
--target-rows-per-partition 100000 \
--export-workers 2 \
--load-workers 1 \
--bcp-packet-size 32767 \
--output-dir test_artifacts/live_certification/benchmarks/wide_type_1m_2026_06_09
uv run python tools/mssql_clickhouse_wide_type_certification.py \
--rows 10000000 \
--column-count 120 \
--typed-hash-rows 10000 \
--prepare-chunk-size 100000 \
--target-rows-per-partition 1000000 \
--export-workers 8 \
--load-workers 8 \
--bcp-packet-size 32767 \
--output-dir test_artifacts/live_certification/benchmarks/wide_type_10m_2026_06_09
For 1M and 10M runs, the full row count and duplicate checks cover the complete
dataset, while typed_hash intentionally samples the first --typed-hash-rows
ordered rows to keep certification cost bounded. Run a full typed hash only on
smaller bounded slices.
The harness emits MSSQL_WIDE_PREPARE_SOURCE_*,
MSSQL_PARTITIONED_BCP_QUERYOUT_* and final evidence events. If a local Docker
SQL Server becomes unresponsive during 1M/10M fixture preparation, restart the
local integration stack and retry with a smaller --prepare-chunk-size before
changing transfer settings. Treat fixture preparation time separately from
native transfer time: evidence includes prepare_seconds, export_seconds and
load_seconds.
Do not treat the 10M profile as an interactive developer check on a laptop
Docker stack. It is a manual/nightly certification profile that needs dedicated
disk, memory and stable SQL Server/ClickHouse containers. On local Docker,
promote from 10k -> 1M conservative -> 10M only when the previous profile
finishes with passed=true.
Important correctness guardrails covered by the harness:
- partition bounds are computed from the original source rowset, not from the encoded bcp projection view;
datetimeoffsetis normalized through an ODBC-safe UTC projection for typed hash verification;- SQL Server
timestamp/rowversionis treated as binary, never astime; - ClickHouse staging decode is type-policy aware, so
time(7)encoded asseconds_since_midnightis not passed through text sentinel decoding; binary,varbinaryandrowversionare exported as deterministic hex text before ClickHouse TSV ingestion.
Expected artifacts:
| Artifact | Purpose |
|---|---|
mssql_clickhouse_wide_type_certification.json |
Machine-readable pass/fail, counts, duplicate count, typed hashes and throughput. |
mssql_clickhouse_wide_type_certification.md |
Human-readable release evidence. |
Runbook when wide certification fails:
- If counts differ, inspect the source export partitions before looking at type mapping.
- If duplicate count is non-zero, check
partitioning.column, finalizer policy and target cleanup. - If
typed_hashdiffers, compare the source and target hash fields and then inspect the first failing ordered rows. - If binary columns differ, confirm
source.options.type_fidelity.binary_encoding: hexis active. - If
time(p)differs, confirm whether the contract expects text fidelity orseconds_since_midnight. - If performance regresses, compare
export_seconds,load_seconds,artifact_bytesandrows_per_secondbetween artifacts. - If
bcpfails withI/O error while writing BCP data-file, reduceexport_workers, reducetarget_rows_per_partition, setpartition_tmp_dirto a disk with enough free space, and rerun with--skip-source-prepareif the source fixture already exists. - If ClickHouse returns
BrokenPipeor the native port becomes unavailable, reduceload_workers, keepwait_for_async_insert=1, and inspect ClickHouse container memory/disk pressure before increasing parallelism. - If source-side reconciliation fails after a successful load, rerun
reconciliation with
--skip-source-prepare --skip-transferafter MSSQL is healthy again; this avoids repeating the large transfer.
Runbook¶
- Start with
dpone doctor --profile localand fix missing extras or native clients. - Run
dpone plan <manifest> --format mdand review source boundary, staging path, schema evolution, state, and quality checks. - Run a small bounded window first.
- Inspect the run artifact under
.dpone/runs/mssql_to_clickhouse. - For incremental jobs, verify state before enabling a schedule.
- For delete-aware jobs, run reconciliation in report-only mode before enabling physical deletes.
- Promote the manifest through GitOps after the plan and artifact are reviewed.
Cross-links¶
- Source -> sink matrix
- Load strategies
- Schema evolution
- Type mapping matrix
- Reconciliation and CDC
- Performance guide
Type contracts and physical design¶
This flow supports the shared dpone type-governance stack:
- Type inference for source metadata, sampled profiling, confidence, and empty string vs
NULLbehavior. - Schema contracts for explicit logical column types, enforcement modes, and
__dpone__nc__*variant columns. - Physical design for target-specific DDL such as concrete SQL types, indexes, partitioning, compression, ClickHouse
LowCardinality, and BigQuery clustering.
Use dpone schema infer --manifest ... and dpone schema physical-plan --manifest ... before enabling new table DDL in production.
Production native transfer fast path¶
MSSQL -> ClickHouse has a first-class native transfer path for large tables:
MSSQL SELECT partitions -> bcp queryout TSV files -> ClickHouse staging INSERT FORMAT TabSeparated -> target finalizer -> quality/state commit
This path keeps the normal dpone execution flow. It does not introduce a separate source/sink hierarchy. The reusable planning layer explains the transfer, while the existing MSSQL source, file artifacts, ClickHouse sink, lineage/state, quality checks and run artifacts still own execution.
Copy-paste manifest¶
source:
type: mssql
connection_id: mssql_oltp
connection_type: env
table:
schema: dbo
name: orders
options:
native_transfer:
optimizer_profile: high_throughput_safe
extract_mode: bcp_queryout
mssql_export_mode: bcp
read_consistency: read_committed_snapshot_if_enabled_else_read_committed
type_fidelity:
binary_encoding: hex
time_encoding: seconds_since_midnight
bulk:
mode: bcp
bcp:
batch_size: 100000
packet_size: 65535
timeout_seconds: 3600
partitioning:
strategy: auto
column: order_id
bounds: auto
target_rows_per_partition: 1000000
max_partitions: 64
export_workers: 8
load_workers: 8
sink:
type: clickhouse
connection_id: clickhouse_dwh
connection_type: env
table:
schema: analytics
name: orders
strategy:
mode: incremental_merge
unique_key: [order_id]
merge_policy: lightweight_delete_insert
options:
native_transfer:
optimizer_profile: high_throughput_safe
clickhouse_bulk:
mode: auto
insert_settings:
async_insert: 1
wait_for_async_insert: 1
max_insert_block_size: 1000000
input_format_parallel_parsing: 1
idempotency:
enabled: true
Fast path selection algorithm¶
flowchart TD
Start["Read manifest"] --> Source["Resolve MSSQL export capability"]
Source --> Export{"bcp queryout available?"}
Export -->|yes| Bcp["Use bcp queryout"]
Export -->|no| Pyodbc["Fallback to pyodbc streaming"]
Bcp --> Sink["Resolve ClickHouse ingest capability"]
Pyodbc --> Sink
Sink --> Http{"HTTP bulk configured?"}
Http -->|yes| ChHttp["INSERT FORMAT TabSeparated over HTTP"]
Http -->|no| Client{"clickhouse-client available?"}
Client -->|yes| ChClient["clickhouse-client INSERT FORMAT TabSeparated"]
Client -->|no| Python["Fallback Python parsing path"]
ChHttp --> Finalizer["Staging-first finalizer"]
ChClient --> Finalizer
Python --> Finalizer
Finalizer --> Quality["Quality checks and reconciliation"]
Quality --> State["Advance state only after success"]
Partition planning¶
partitioning.bounds: auto executes one MSSQL metadata query over the selected source query:
SELECT
MIN([order_id]) AS dpone_min_value,
MAX([order_id]) AS dpone_max_value,
COUNT_BIG(1) AS dpone_row_count
FROM (<source query>) AS dpone_bounds;
Numeric partition columns use half-open ranges: [lower, upper) for all partitions except the final partition, which includes the upper bound. Datetime partition columns use time windows with the same boundary semantics.
If no safe partition column is configured, dpone uses one partition and emits a warning. It does not guess a key, because a wrong partition key can produce duplicates, gaps, or source overload.
ClickHouse insert settings¶
clickhouse_bulk.insert_settings are passed to the native ClickHouse HTTP/client insert path. If async_insert: 1 is configured and wait_for_async_insert is omitted, dpone sets wait_for_async_insert: 1 for safer batch semantics.
Useful settings for large MSSQL -> ClickHouse transfers:
| Setting | Typical value | Why it helps |
|---|---|---|
async_insert |
1 |
Lets ClickHouse buffer large inserts efficiently. |
wait_for_async_insert |
1 |
Keeps dpone correctness simple by waiting for acceptance. |
max_insert_block_size |
1000000 |
Reduces per-block overhead for wide batches. |
input_format_parallel_parsing |
1 |
Speeds TabSeparated parsing on ClickHouse side. |
max_threads |
target-specific | Caps ClickHouse CPU usage when concurrent jobs run. |
Optimizer profile¶
Use native_transfer.optimizer_profile: high_throughput_safe as the first
production baseline before hand-tuning individual knobs:
source:
type: mssql
options:
native_transfer:
optimizer_profile: high_throughput_safe
bulk:
mode: bcp
sink:
type: clickhouse
options:
native_transfer:
optimizer_profile: high_throughput_safe
clickhouse_bulk:
mode: http
The profile is intentionally conservative: it improves native throughput while keeping correctness-friendly settings such as waiting for ClickHouse async insert acceptance.
| Area | Profile default | Why |
|---|---|---|
MSSQL bulk.bcp.batch_size |
250000 |
Reduces bcp commit/control overhead for large extracts. |
MSSQL bulk.bcp.packet_size |
65535 |
Uses a large network packet for bcp transfer. |
MSSQL bulk.bcp.timeout_seconds |
3600 |
Avoids short-running command timeouts on large partitions. |
ClickHouse HTTP chunk_size |
4194304 |
Streams larger HTTP chunks without loading the full file into memory. |
ClickHouse async_insert |
1 |
Lets ClickHouse buffer native inserts efficiently. |
ClickHouse wait_for_async_insert |
1 |
Keeps dpone acceptance semantics simple and safe. |
ClickHouse input_format_parallel_parsing |
1 |
Enables parallel TabSeparated parsing where supported. |
ClickHouse max_insert_block_size |
1000000 |
Reduces per-block overhead for large batches. |
Explicit user settings always win over the profile. For example, this keeps the profile but lowers only the bcp packet size:
source:
type: mssql
options:
native_transfer:
optimizer_profile: high_throughput_safe
bulk:
mode: bcp
bcp:
packet_size: 32768
Use explicit overrides after benchmark evidence shows which phase is the bottleneck.
Lossless NULL and empty string handling¶
MSSQL bcp queryout in character mode can serialize an empty string as a NUL byte. That is unsafe for ClickHouse TabSeparated loads because an empty string, a NUL character and NULL must stay distinguishable.
dpone uses a staging-first, lossless codec for the native MSSQL -> ClickHouse path:
- MSSQL source projection renders
NULLas ClickHouse\N. - Text empty strings are encoded as the framework sentinel
__dpone__tsv__empty. - Real values that start with the sentinel namespace are escaped as
__dpone__tsv__prefix.... - Raw TSV is loaded into a ClickHouse staging table.
- Finalization inserts from staging through a decode projection, so the target table receives the original empty string, original sentinel-like values and real
NULLvalues.
flowchart LR
M["MSSQL row"] --> P["bcp-safe SELECT projection"]
P --> F["Raw TSV artifact"]
F --> S["ClickHouse raw staging"]
S --> D["Decode projection"]
D --> T["Final target table"]
The raw staging table is an implementation detail and may contain sentinel strings. User-facing target tables must not contain __dpone__tsv__empty unless that exact value existed in the source and was escaped by the codec.
MSSQL source nullability metadata is preserved through INFORMATION_SCHEMA.COLUMNS.IS_NULLABLE. Nullable source columns are mapped to nullable ClickHouse target columns where needed, so a source NULL decimal remains NULL instead of becoming 0.
Idempotency and resume model¶
Every partition can be identified by a deterministic SHA-256 transfer partition id built from:
A partition can be skipped only when a previous checkpoint is committed and both query hash and schema hash still match. Failed export, staging load, finalization or quality checks do not advance source state.
ClickHouse insert_deduplication_token is useful as an additional target-native guard, but dpone does not rely on it as the only correctness mechanism.
Runtime resume in dpone run¶
dpone run now uses the same partition checkpoint contract as the live
certification tooling when a checkpoint store is available. The runtime hook
sits between extraction and sink load:
- MSSQL source exports partitioned
bcp queryoutfiles and attaches stable partition metadata: bounds, query hash, schema hash and transfer partition id. - The native-transfer runtime service compares those partitions with the checkpoint store.
- Already
committedpartitions with matching query/schema hash and matching artifact checksum are skipped. - Retry partitions are loaded through the normal ClickHouse staging/finalizer path.
- Checkpoints are moved to
committedonly aftersink.load()succeeds. - Source state is persisted only after checkpoints, finalizer and normal processor lifecycle all succeed.
For full_refresh, dpone is deliberately conservative: a fully committed
transfer can be skipped as a no-op, but mixed skip/retry plans still load a
complete snapshot because shadow-swap finalization requires all rows. For
incremental_append, incremental_merge and partition_replace, retry-only
partition loading is allowed because those strategies are staging-first and
idempotent by key or partition.
Production state-backed checkpoint stores are enabled through the existing state backend. No separate state system is introduced:
state:
type: mssql
connection_id: mssql_dwh
connection_type: vault
table:
schema: etl_state
partition_checkpoint_table:
schema: etl_state
name: dpone_partition_checkpoints
source:
type: mssql
options:
partitioning:
column: order_id
bounds: auto
target_rows_per_partition: 1000000
export_workers: 8
load_workers: 8
sink:
type: clickhouse
options:
native_transfer:
require_artifact_checksum: true
runtime_evidence:
output_dir: .dpone/runs/mssql_to_clickhouse
Failure/resume certification¶
Production certification for this path must prove three failure stages:
| Failure stage | Expected behavior before retry | Expected behavior after retry |
|---|---|---|
after_export |
Exported partitions are not treated as committed; source state is not advanced. | The retry loads/finalizes the exported partitions or re-exports them safely, then commits matching checkpoints. |
during_load |
Partially loaded partitions are not skipped unless their checkpoint is already committed with matching query/schema hashes. |
The retry produces the expected target count with no duplicate keys. |
before_finalizer |
Finalized-but-uncommitted work is not allowed to advance source state. | The retry runs finalization/reconciliation again and commits state only after success. |
Certification evidence uses these invariants:
state_committed_before_retrymust befalse;- only
committedcheckpoints with matchingquery_hashandschema_hashmay be skipped; - all partitions after retry must be
committed; actual_rows_after_retry == expected_rows;duplicate_rows_after_retry == 0;state_committed_after_retrymust betrue.
The reusable implementation lives in
src/dpone/strategy_intelligence/resume_certification.py.
It is connector-free and can be reused by future native transfer paths.
Local live fault-injection workflow:
PYTHONUNBUFFERED=1 uv run python tools/mssql_clickhouse_fault_injection.py \
--rows 10000 \
--batch-size 5000 \
--bcp-path /opt/homebrew/bin/bcp \
--optimizer-profile high_throughput_safe \
--partition-column id \
--lower-bound 1 \
--upper-bound 10000 \
--num-partitions 4 \
--export-workers 2 \
--load-workers 2 \
--clickhouse-bulk-mode http \
--clickhouse-http-host 127.0.0.1 \
--clickhouse-http-port 58123 \
--output-dir test_artifacts/live_certification/benchmarks/native_fault_injection_latest \
--json-output test_artifacts/live_certification/benchmarks/native_fault_injection_latest/summary.json
The tool executes real MSSQL export and ClickHouse retry loads against the
local Docker stack, then evaluates the controlled checkpoint boundary for
after_export, during_load, and before_finalizer. It is deliberately
artifact-first: the certification service stays connector-free, while the tool
is the local-live adapter.
The tool also writes an append-only partition checkpoint store:
test_artifacts/live_certification/benchmarks/native_fault_injection_latest/partition_checkpoints.jsonl
This JSONL store is the local/CI implementation of the production checkpoint contract. Database-backed state stores should implement the same lifecycle:
Only committed checkpoints with matching query_hash, schema_hash,
partition bounds and artifact diagnostics are safe skip candidates. Failed or
partial checkpoints are evidence, not permission to advance source state.
For production state backends, the same checkpoint contract is available through SQL-backed adapters:
MSSQLCheckpointDialectfor MSSQL state tables;PostgresCheckpointDialectfor Postgres state tables;SqlPartitionCheckpointStoreas the small adapter over an existing SQL connector.
These adapters do not introduce a new state system. They persist the same
PartitionCheckpoint payload in an existing state database table and expose the
same PartitionCheckpointStore protocol as the local JSONL evidence store.
For every retry, dpone can build a checkpoint-driven resume plan:
{
"summary": {"skip": 0, "retry": 4},
"decisions": [
{
"action": "retry",
"reason": "latest_status_loaded",
"checkpoint_status": "loaded"
}
]
}
skip is allowed only for committed checkpoints with matching hashes.
exported, loaded, finalized, missing checkpoints and hash mismatches are
retried. This keeps the runtime contract conservative: partial evidence is
never treated as committed work.
Artifact checksums can be required as an additional hard guardrail. When
enabled, a committed checkpoint without diagnostics.artifact_sha256 is treated
as retryable, not skippable. This prevents stale or incomplete file artifacts
from being trusted only because the status says committed.
Partition correctness certification¶
After a retry succeeds, the local-live certification tool verifies data correctness per partition, not only global target count:
{
"partition_correctness_passed": true,
"partition_correctness": {
"partition_count": 4,
"failed_partition_count": 0,
"total_source_rows": 10000,
"total_target_rows": 10000
}
}
The reusable implementation lives in
src/dpone/strategy_intelligence/partition_correctness.py.
Concrete adapters provide source/target observations; the service compares
row counts and deterministic checksums. For the local MSSQL -> ClickHouse
benchmark table the checksum is based on partition row count plus stable
numeric sums (id, customer_id). Production adapters can provide stronger
hashes over wider payload projections when needed.
Adaptive partitioning v2¶
Adaptive partitioning is an explainable next-run tuning layer. It does not silently change the current run. Instead, dpone records partition observations and recommends safer partition sizes for the next execution.
source:
type: mssql
options:
partitioning:
strategy: auto
column: order_id
bounds: auto
target_rows_per_partition: 1000000
max_partitions: 128
export_workers: 8
load_workers: 8
adaptive:
enabled: true
max_skew_ratio: 3.0
retry_split_factor: 2
min_rows_per_partition: 1000
Runtime observations can be fed back into planning:
source:
options:
partitioning:
adaptive:
enabled: true
observations:
- partition_id: p0
row_count: 1000000
bytes_count: 240000000
duration_seconds: 12.4
status: committed
- partition_id: p1
row_count: 9000000
bytes_count: 2160000000
duration_seconds: 98.1
status: committed
- partition_id: p2
row_count: 1000000
retry_count: 1
status: failed
Planning behavior:
| Signal | Recommendation |
|---|---|
| No observations | Keep configured partition size and collect observability. |
row_count / min_nonzero_row_count > max_skew_ratio |
Mark partition as split_skewed_partition. |
status = failed or retry_count > 0 |
Mark partition as split_retry_partition. |
| Split required | Reduce target_rows_per_partition, bounded by min_rows_per_partition. |
This mirrors the useful part of Spark JDBC partition tuning, but keeps dpone
stateful and resumable: recommendations are visible in dpone plan, while the
actual run still uses deterministic partition bounds, checkpoint hashes, and
staging-first finalization.
Reconciliation profiles¶
Partition correctness is configurable by profile:
| Profile | Checks | Use case |
|---|---|---|
count_only |
Per-partition row counts | Fast smoke or very large partitions where checksum is too expensive. |
count_and_checksum |
Count plus deterministic aggregate checksum | Default production balance for benchmark/certification runs. |
sample_hash |
Count, aggregate checksum, and ordered sample hash | Catches value corruption with bounded extra cost. |
full_partition_hash |
Count plus full ordered partition hash | Strongest correctness proof; use for certification slices or smaller partitions. |
Fault-injection certification supports the same profiles:
uv run python tools/mssql_clickhouse_fault_injection.py \
--rows 10000 \
--reconciliation-profile sample_hash \
--output-dir test_artifacts/live_certification/benchmarks/native_fault_injection_latest
Runbook:
- use
count_and_checksumfor regular local/live certification; - use
sample_hashbefore changing codecs, escaping, bcp options, or ClickHouse insert format settings; - use
full_partition_hashon bounded certification slices when preparing a release or validating a migration; - if a profile fails, inspect the failed partition id, compare the source and
target hash fields, then rerun only the failed partition with a smaller
target_rows_per_partition.
Runtime report evidence¶
The same local-live tool writes one runtime report per failure stage:
native_transfer_runtime_mssql_clickhouse_<stage>_<timestamp>.json
native_transfer_runtime_mssql_clickhouse_<stage>_<timestamp>.md
The runtime report schema is dpone.native_transfer.runtime_report.v1 and
contains:
- checkpoint summary;
- resume skip/retry decisions;
- load result summary;
- overall pass/fail flag.
Normal dpone run writes this report when runtime_evidence.output_dir is
configured. The live certification harness enriches the same evidence family
with partition correctness observations and source/target row checks.
Full dpone run two-pass certification¶
Use this harness when you need to prove the complete user-facing execution path, not only lower-level source/sink strategy objects:
DPONE_IT_MSSQL_HOST=127.0.0.1 \
DPONE_IT_MSSQL_PORT=51433 \
DPONE_IT_MSSQL_DATABASE=master \
DPONE_IT_MSSQL_USER=sa \
DPONE_IT_MSSQL_PASSWORD='Dp0ne.Strong.Pw.2026!' \
DPONE_IT_MSSQL_TRUST_SERVER_CERTIFICATE=yes \
DPONE_IT_MSSQL_BCP_PATH=/opt/homebrew/bin/bcp \
DPONE_IT_CH_HOST=127.0.0.1 \
DPONE_IT_CH_PORT=59000 \
DPONE_IT_CH_HTTP_PORT=58123 \
DPONE_IT_CH_DATABASE=dpone_it \
DPONE_IT_CH_USER=default \
DPONE_IT_CH_PASSWORD=dpone \
uv run python tools/mssql_clickhouse_run_certification.py \
--rows 10000 \
--target-rows-per-partition 2500 \
--export-workers 4 \
--load-workers 4 \
--output-dir test_artifacts/live_certification/benchmarks/native_dpone_run_2026_06_09
The harness deliberately stays thin and uses normal dpone execution:
- creates a disposable MSSQL source table and ClickHouse target table;
- writes a normal single-process manifest with
connection_type: params; - enables
state.type: mssqlandpartition_checkpoint_table; - runs the manifest twice through
dpone.api.run, the Python equivalent ofdpone run; - validates that the second run is a no-op skip over already committed SQL-backed partition checkpoints.
Expected artifacts:
| Artifact | Purpose |
|---|---|
mssql_to_clickhouse_native_run_cert.yml |
Exact generated manifest used by dpone run. |
runtime/native_transfer_runtime_dpone-native-transfer-first.json |
First run runtime report with retry/load summary. |
runtime/native_transfer_runtime_dpone-native-transfer-second.json |
Second run runtime report; expected skip = committed partitions and inserted_rows = 0. |
native_transfer_run_certification.json |
Machine-readable certification summary. |
native_transfer_run_certification.md |
Human-readable evidence for release notes or audit review. |
If the second run is not a no-op, check these first:
- checkpoint table points to the same MSSQL state connection used by the first run;
native_transfer.require_artifact_checksumis still enabled;- source query, schema, target table, strategy and partition bounds did not change;
- ClickHouse finalizer completed before state was committed.
Example evidence shape:
{
"schema_version": "dpone.native_transfer.resume_evidence.v1",
"passed": true,
"safe_to_resume": true,
"skipped_partition_count": 1,
"retry_partition_count": 3,
"checks": [
{"name": "state_not_committed_before_retry", "passed": true},
{"name": "all_after_retry_partitions_committed", "passed": true},
{"name": "target_count_matches_expected", "passed": true},
{"name": "no_duplicate_rows_after_retry", "passed": true},
{"name": "state_committed_after_success", "passed": true}
]
}
Runbook when certification fails:
- If
state_not_committed_before_retryis red, stop the pipeline and inspect the state backend before rerunning. - If
all_after_retry_partitions_committedis red, inspect checkpoint hashes and do not skip non-committed partitions. - If
target_count_matches_expectedis red, compare staging and final target counts before advancing state. - If
no_duplicate_rows_after_retryis red, verifyunique_key, finalizer policy and ClickHouse staging cleanup. - If
state_committed_after_successis red, treat the run as incomplete even if target counts look correct.
Finalizers¶
| Strategy | ClickHouse finalizer |
|---|---|
full_refresh |
staging/shadow swap |
incremental_append |
staging-first append |
incremental_merge |
lightweight_delete_insert by default |
partition_replace |
ALTER TABLE target REPLACE PARTITION ... FROM staging |
cdc_apply |
CDC events applied through staging-aware ClickHouse finalization |
Tuning runbook¶
Use this order when optimizing a slow MSSQL -> ClickHouse run:
- Run
dpone plan --explain-strategy --format text <manifest>and confirmnative_fast_pathismssql_bcp_queryout_to_clickhouse_direct_tsv. - Enable
native_transfer.optimizer_profile: high_throughput_safeon the source and sink. - Configure
partitioning.columnon a numeric or datetime column with good distribution. - Start with
target_rows_per_partition: 1000000,max_partitions: 64,export_workers: 4..8. - Set
load_workersto the same value as export workers only if ClickHouse has enough CPU and IO headroom. - Prefer HTTP bulk when ClickHouse HTTP endpoint is reachable from the runner; otherwise configure
clickhouse-client. - Keep
wait_for_async_insert: 1until benchmark evidence proves a looser setting is safe for your SLA. - Watch run artifacts for export time, ingest time, finalization time, rows/sec, MB/sec and ClickHouse parts count.
- Override individual profile values only after comparing tuning matrix artifacts from the same runner.
Troubleshooting¶
| Symptom | Likely cause | Action |
|---|---|---|
partitioning.bounds=auto requires a bounds resolver |
Generic planner used without MSSQL runtime resolver | Run through normal dpone source strategy, not standalone partitioner calls. |
| Very small files | Too many partitions or low target_rows_per_partition |
Increase target_rows_per_partition or lower max_partitions. |
| MSSQL CPU pressure | Too many concurrent bcp queryout workers |
Lower export_workers, use off-peak windows, or choose a better partition key. |
| ClickHouse many small parts | Load workers too high or blocks too small | Increase block size and reduce load_workers. |
| Duplicate rows after retry | Target finalizer or checkpoint contract was bypassed | Use staging-first strategies and do not manually insert artifacts into target. |
| Empty string/NULL confusion | Unsafe delimited format settings | Use dpone bulk text codec defaults and do not override NULL handling without tests. |
Industrial comparison¶
| System | Pattern | dpone behavior |
|---|---|---|
| Spark JDBC | partitionColumn, bounds and partition count |
Same split model, plus bcp queryout, durable files and ClickHouse-native ingest. |
| Pentaho | Partitioned transformations | GitOps YAML and deterministic retry boundaries instead of visual-only jobs. |
| Informatica | Bulk source partitioning and native loaders | Native extract/load with explicit governance and run artifacts. |
| dlt | Incremental state and resumable packages | Partition checkpoint model and deterministic reruns. |
| Airbyte | State after destination acceptance | Source state advances only after ClickHouse finalizer and quality checks. |
| Fivetran | Durable connector checkpoints | Per-partition evidence, committed status and hash matching. |