Type Mapping Matrix¶
This document defines the default type conversion policy for dpone source -> sink flows. It complements the per-flow guides in Source -> sink matrix.
Table of contents¶
- Principles
- Logical target defaults
- Source-specific caveats
- PostgreSQL caveats
- MSSQL / SQL Server caveats
- ClickHouse caveats
- Generic REST API caveats
- Kafka caveats
- Per-flow docs
Principles¶
- Prefer lossless native types when source and sink share the same semantics.
- Prefer string/JSON landing for vendor-specific, semi-structured, spatial, hierarchy, variant, enum/domain/composite, array/range, and protobuf-only values unless a manifest mapper is explicit.
- Never auto-apply narrowing. Narrowing is a breaking schema evolution change.
- Preserve numeric precision and scale. If the sink cannot represent the source precision, fail or land as string/generated column.
- Preserve timestamps intentionally: choose UTC instant, local wall-clock, or string fidelity in the manifest.
- Use
__dpone__nc__<column>only for explicit incompatible type-change continuation. - Explicit schema contracts override inferred logical types.
- Target-specific physical design overrides render concrete sink DDL types.
Logical target defaults¶
| Logical type | MSSQL | PostgreSQL | ClickHouse | BigQuery | Kafka |
|---|---|---|---|---|---|
| boolean | bit | boolean | UInt8/Bool | BOOL | boolean/schema field |
| small integer | smallint/int | smallint/integer | Int16/Int32 | INT64 | int32/int64 |
| big integer | bigint | bigint | Int64 | INT64 | int64 |
| unsigned big integer | decimal(20,0) | numeric(20,0) | UInt64 | NUMERIC/STRING | decimal/string |
| decimal(p,s) | decimal(p,s) | numeric(p,s) | Decimal(p,s) | NUMERIC/BIGNUMERIC | decimal/string |
| float/double | float | double precision | Float64 | FLOAT64 | double |
| short string | nvarchar(n) | varchar/text | String | STRING | string |
| long text | nvarchar(max) | text | String | STRING | string |
| binary | varbinary(max) | bytea | String/base64 | BYTES | bytes/base64 |
| date | date | date | Date/Date32 | DATE | string/logical date |
| timestamp | datetime2 | timestamp/timestamptz | DateTime64 | TIMESTAMP/DATETIME | timestamp logical/string |
| JSON object | nvarchar(max) | jsonb | String/JSON | JSON/STRING | JSON object/value schema |
| array/list | nvarchar(max) JSON | array/jsonb | Array/String | REPEATED/JSON/STRING | array/schema field |
| map/object | nvarchar(max) JSON | jsonb | Map/String | JSON/RECORD/STRING | map/schema field |
| UUID/GUID | uniqueidentifier | uuid | UUID/String | STRING | string/logical uuid |
| enum/domain | nvarchar/text | native/text | String/Enum | STRING | string/enum schema |
| spatial/hierarchy | nvarchar(max)/varbinary(max) | text/extension type | String | GEOGRAPHY/STRING | string/bytes |
Source-specific caveats¶
PostgreSQL¶
- Arrays, ranges, multiranges, enums, domains, composites, geometric types, bit/varbit, tsvector/tsquery, pg_lsn, oid/reg* and extension/custom types require explicit mapping outside PostgreSQL targets.
jsonbshould land as target JSON when the target has a first-class JSON type, otherwise string JSON.numericprecision/scale must be preserved or the load fails.
MSSQL / SQL Server¶
rowversion/timestampis a binary concurrency token, not a business timestamp.sql_variant,hierarchyid,geometry,geography, deprecatedtext/ntext/image, and CLR/user-defined types require explicit mapping.uniqueidentifiermaps to native UUID/GUID where available.- MSSQL -> ClickHouse uses the
mssql_to_clickhouse_lossless_v1profile:decimal(p,s)andnumeric(p,s)map toDecimal(p,s),moneymaps toDecimal(19,4),smallmoneymaps toDecimal(10,4),datetime2(p)maps toDateTime64(p), anduniqueidentifiermaps toUUID. datetimeoffset(p)usestype_fidelity.temporal.offset_timestamp: defaultutc_instantmaps toDateTime64(p, 'UTC'); usefixed_timezonefor BI local time,preserve_offsetto add__dpone__tz_offset_minutes__<column>, orpreserve_textfor raw audit text.binary,varbinary, androwversionneedsource.options.type_fidelity.binary_encoding: hexorbase64if downstream consumers require byte-readable exactness.time(p)maps toStringby default. Usesource.options.type_fidelity.time_encoding: seconds_since_midnightto land it asUInt32seconds when sub-second precision is not required.- Use ODBC Driver 18 and bcp UTF-8/codepage settings for reliable Unicode movement.
ClickHouse¶
Nullable(T)maps to nullable target columns;LowCardinality(T)is treated asTfor most sinks.Array,Map,Nested,Tuple,AggregateFunction, andSimpleAggregateFunctionneed explicit landing policy outside ClickHouse.- Avoid mutation-based correction for target tables; prefer staging/shadow replacement or tombstone/sign modeling.
Generic REST API¶
- Explicit
columnsbeat sampled inference. - Volatile objects/arrays should land as JSON/string until the contract stabilizes.
- Date/time parsing must be configured because API formats often drift.
Kafka¶
- Schema Registry gives the strongest type contract for Avro, JSON Schema, and Protobuf.
- Without Schema Registry, JSON payloads are inferred from samples and should land conservatively.
- Tombstones and
op=deleterequire explicit delete handling.
Per-flow docs¶
See Source -> sink matrix for links to every source -> sink guide.
Related docs¶
- Schema evolution explains how type widening and incompatible type changes are handled at runtime.
- Type inference explains source metadata, samples, confidence, and fallback behavior.
- Schema contracts explains explicit logical column contracts and enforcement modes.
- Physical design explains concrete target DDL overrides, indexes, partitions, and storage hints.
- Load strategies documents which write strategies are supported per sink and source -> sink pair.
- Source -> sink matrix links every supported flow to a dedicated implementation guide.