Postgres -> MSSQL¶
This guide is a copy/paste-ready starting point for loading data from Postgres into MSSQL with dpone.
When to use this path¶
Use this path when Postgres is the system of record or ingestion boundary and MSSQL is the landing, warehouse, event-log, or downstream replication target.
Copy/paste manifest¶
name: postgres_to_mssql_example
source:
type: postgres
connection_id: postgres_oltp
connection_type: vault
table:
schema: public
name: orders
options:
incremental_column: updated_at
batch_size: 50000
export_format: mssql-delimited
compress_export: false
partitioning:
strategy: auto
column: order_id
bounds: auto
target_rows_per_partition: 1000000
max_partitions: 64
export_workers: 4
load_workers: 4
sink:
type: mssql
connection_id: mssql_dwh
connection_type: vault
table:
schema: dbo
name: orders
strategy:
mode: incremental_merge
unique_key: order_id
merge_policy: delete_insert
duplicate_policy: fail
options:
bulk:
mode: bcp
bcp:
batch_size: 100000
packet_size: 65535
field_terminator: "\t"
row_terminator: "\n"
state:
type: mssql
connection_id: mssql_dwh
table:
schema: etl_state
name: dpone_state
quality:
mode: fail
checks:
- type: min_rows
threshold: 1
- type: source_target_count
tolerance_pct: 0.1
observability:
artifacts:
enabled: true
path: .dpone/runs/postgres_to_mssql
Run it locally:
Supported load strategies¶
| Strategy | Status | Notes |
|---|---|---|
full_refresh |
Supported | Uses staging first, then applies the target-specific finalization plan. |
incremental_append |
Supported | Uses staging first, then applies the target-specific finalization plan. |
incremental_merge |
Supported | Default merge_policy: delete_insert; shadow_swap is available for DB targets. |
replace |
Supported | Uses staging first, then applies the target-specific finalization plan. |
partition_replace |
Supported | Replaces target partitions represented by staging partition.column; see Load strategies for native/fallback paths. |
snapshot_reconciliation |
Supported | Uses staging first, then applies the target-specific finalization plan. |
xmin |
Postgres source only | Uses Postgres transaction IDs as the incremental boundary; configure explicitly or omit incremental_column for the default Postgres source behavior. |
cdc |
Source-specific | Uses typed CDC offsets and advances state only after sink success. |
See Load strategies for the detailed algorithm for each strategy.
Runtime algorithm¶
flowchart TD
A["Resolve manifest and registry entries"] --> B["Create Postgres source"]
B --> C["Profile partition bounds when bounds=auto"]
C --> D["Build Spark-like range partitions"]
D --> E["COPY each partition TO STDOUT"]
E --> F["Apply BulkTextCodec projection for text safety"]
F --> G["Emit mssql-delimited file artifacts"]
G --> H["Create MSSQL staging table"]
H --> I["Load artifacts into staging with bcp"]
I --> J["Finalize with delete_insert, shadow_swap, replace, or partition_replace"]
J --> K["Run quality and reconciliation checks"]
K --> L["Commit state only after target success"]
Native fast path¶
The preferred high-throughput path is:
- PostgreSQL builds a bounded
SELECTfor the configured strategy. - If
partitioning.bounds: auto, dpone runsMIN,MAX, andCOUNTover that bounded query. RangePartitionercreates deterministic half-open ranges.- Each partition is exported through PostgreSQL
COPY (...) TO STDOUT. - Text-like columns are projected through
BulkTextCodecbefore COPY, soNULL, empty strings, tabs, newlines, and control characters remain distinguishable for SQL Serverbcp. - MSSQL sink loads each artifact into staging through
bcp. - Finalization is set-based and staging-first.
Lossless bulk text contract¶
Postgres COPY and SQL Server bcp are both extremely fast, but plain
delimiter files are not safe enough by themselves. dpone uses a bulk text
codec when export_format: mssql-delimited is selected:
| Source value | File representation | Target final value |
|---|---|---|
NULL |
empty bcp field | NULL |
| empty string | framework marker | empty string |
| tab/newline/control char | escaped framework marker sequence | original text |
| normal text | original text | original text |
The codec metadata is attached to the file artifact and carried into the MSSQL
staging/finalization step. If a raw delimited artifact with text columns does
not include codec metadata, the MSSQL sink fails closed unless
allow_unsafe_raw_mssql_bulk_files: true is explicitly configured.
This is the important production guarantee: NULL and '' are never silently
collapsed into the same value on the default Postgres -> MSSQL fast path.
Canonical tuning knobs:
| Knob | Path | Meaning |
|---|---|---|
| Source export workers | source.options.partitioning.export_workers |
Parallel PostgreSQL COPY workers. |
| Target load workers | source.options.partitioning.load_workers |
Parallel artifact load workers used by the sink. |
| Partition size | source.options.partitioning.target_rows_per_partition |
Auto-calculated partition count target. |
| MSSQL bulk mode | sink.options.bulk.mode |
bcp for native SQL Server bulk load. |
| bcp batch size | sink.options.bulk.bcp.batch_size |
Rows per bcp transaction batch. |
| bcp packet size | sink.options.bulk.bcp.packet_size |
SQL Server bulk network packet size. |
The runtime consumes the same canonical bulk.bcp.* settings that dpone plan
shows. Legacy flat aliases are accepted only as migration aliases and are
reported as warnings in plan output; keep new manifests on the canonical nested
shape and use config aliases migration only
when upgrading older automation.
Partition checkpoints and certification evidence¶
Partitioned Postgres -> MSSQL transfers attach deterministic metadata to every file artifact:
| Field | Purpose |
|---|---|
partition_bounds |
The exact half-open source range exported into the file. |
query_hash |
Detects whether the source boundary changed between retries. |
schema_hash |
Detects schema drift between retries. |
transfer_partition_id |
Stable SHA-256 partition identity used for resume and evidence. |
When native transfer checkpointing is enabled, already committed partitions can be skipped on retry only when the query hash, schema hash, artifact checksum and partition identity still match. State advances only after staging load, finalization and quality checks succeed.
For release evidence, run the MSSQL benchmark/certification harnesses from
Performance guide. The recommended local profiles are
10k, 1m and 10m rows with wide sparse data and text edge cases.
When running through GitHub Actions, dispatch
.github/workflows/live-certification.yml with
run_native_benchmark_suite=true. The workflow writes both machine-readable
summary.json and human-readable postgres_mssql_native_benchmark_summary.md
artifacts under test_artifacts/live_certification/benchmarks/.
Strategy behavior¶
full_refresh: extract the selected source boundary, load into staging, and replace the target according to the target's safe finalization path.incremental_append: extract only the incremental boundary and append rows through staging or event production.incremental_merge: load into staging, validate duplicates, then usedelete_insertby default;shadow_swapis available where table swaps are supported.replace: reload a bounded predicate window through staging and then atomically replace the matching target slice.snapshot_reconciliation: compare the latest source snapshot with the target key set and apply configured physical-delete or soft-delete behavior through staging-first plans.partition_replace: extract a complete partition slice, load it into staging, and replace only partitions represented bypartition.column.
Schema evolution and type mapping¶
Schema evolution is enabled by default and runs before the staging/final load path:
- Read source schema from
ExtractResult.schema. - Introspect the MSSQL target schema.
- Apply safe additions and widening operations.
- Fail breaking changes by default.
- If configured, route incompatible type changes to
__dpone__nc__<column>.
Use Schema evolution and Type mapping matrix when adding columns or changing source types.
Runbook¶
- Start with
dpone doctor --profile localand fix missing extras or native clients. - Run
dpone plan <manifest> --format mdand review source boundary, staging path, schema evolution, state, and quality checks. - Run a small bounded window first.
- Inspect the run artifact under
.dpone/runs/postgres_to_mssql. - For incremental jobs, verify state before enabling a schedule.
- For delete-aware jobs, run reconciliation in report-only mode before enabling physical deletes.
- Promote the manifest through GitOps after the plan and artifact are reviewed.
Cross-links¶
- Source -> sink matrix
- Load strategies
- Schema evolution
- Type mapping matrix
- Reconciliation and CDC
- Performance guide
Type contracts and physical design¶
This flow supports the shared dpone type-governance stack:
- Type inference for source metadata, sampled profiling, confidence, and empty string vs
NULLbehavior. - Schema contracts for explicit logical column types, enforcement modes, and
__dpone__nc__*variant columns. - Physical design for target-specific DDL such as concrete SQL types, indexes, partitioning, compression, ClickHouse
LowCardinality, and BigQuery clustering.
Use dpone schema infer --manifest ... and dpone schema physical-plan --manifest ... before enabling new table DDL in production.