Skip to content

Type Mapping Matrix

This document defines the default type conversion policy for dpone source -> sink flows. It complements the per-flow guides in Source -> sink matrix.

Table of contents

Principles

  • Prefer lossless native types when source and sink share the same semantics.
  • Prefer string/JSON landing for vendor-specific, semi-structured, spatial, hierarchy, variant, enum/domain/composite, array/range, and protobuf-only values unless a manifest mapper is explicit.
  • Never auto-apply narrowing. Narrowing is a breaking schema evolution change.
  • Preserve numeric precision and scale. If the sink cannot represent the source precision, fail or land as string/generated column.
  • Preserve timestamps intentionally: choose UTC instant, local wall-clock, or string fidelity in the manifest.
  • Use __dpone__nc__<column> only for explicit incompatible type-change continuation.
  • Explicit schema contracts override inferred logical types.
  • Target-specific physical design overrides render concrete sink DDL types.

Logical target defaults

Logical type MSSQL PostgreSQL ClickHouse BigQuery Kafka
boolean bit boolean UInt8/Bool BOOL boolean/schema field
small integer smallint/int smallint/integer Int16/Int32 INT64 int32/int64
big integer bigint bigint Int64 INT64 int64
unsigned big integer decimal(20,0) numeric(20,0) UInt64 NUMERIC/STRING decimal/string
decimal(p,s) decimal(p,s) numeric(p,s) Decimal(p,s) NUMERIC/BIGNUMERIC decimal/string
float/double float double precision Float64 FLOAT64 double
short string nvarchar(n) varchar/text String STRING string
long text nvarchar(max) text String STRING string
binary varbinary(max) bytea String/base64 BYTES bytes/base64
date date date Date/Date32 DATE string/logical date
timestamp datetime2 timestamp/timestamptz DateTime64 TIMESTAMP/DATETIME timestamp logical/string
JSON object nvarchar(max) jsonb String/JSON JSON/STRING JSON object/value schema
array/list nvarchar(max) JSON array/jsonb Array/String REPEATED/JSON/STRING array/schema field
map/object nvarchar(max) JSON jsonb Map/String JSON/RECORD/STRING map/schema field
UUID/GUID uniqueidentifier uuid UUID/String STRING string/logical uuid
enum/domain nvarchar/text native/text String/Enum STRING string/enum schema
spatial/hierarchy nvarchar(max)/varbinary(max) text/extension type String GEOGRAPHY/STRING string/bytes

Source-specific caveats

PostgreSQL

  • Arrays, ranges, multiranges, enums, domains, composites, geometric types, bit/varbit, tsvector/tsquery, pg_lsn, oid/reg* and extension/custom types require explicit mapping outside PostgreSQL targets.
  • jsonb should land as target JSON when the target has a first-class JSON type, otherwise string JSON.
  • numeric precision/scale must be preserved or the load fails.

MSSQL / SQL Server

  • rowversion/timestamp is a binary concurrency token, not a business timestamp.
  • sql_variant, hierarchyid, geometry, geography, deprecated text/ntext/image, and CLR/user-defined types require explicit mapping.
  • uniqueidentifier maps to native UUID/GUID where available.
  • MSSQL -> ClickHouse uses the mssql_to_clickhouse_lossless_v1 profile: decimal(p,s) and numeric(p,s) map to Decimal(p,s), money maps to Decimal(19,4), smallmoney maps to Decimal(10,4), datetime2(p) maps to DateTime64(p), and uniqueidentifier maps to UUID.
  • datetimeoffset(p) uses type_fidelity.temporal.offset_timestamp: default utc_instant maps to DateTime64(p, 'UTC'); use fixed_timezone for BI local time, preserve_offset to add __dpone__tz_offset_minutes__<column>, or preserve_text for raw audit text.
  • binary, varbinary, and rowversion need source.options.type_fidelity.binary_encoding: hex or base64 if downstream consumers require byte-readable exactness.
  • time(p) maps to String by default. Use source.options.type_fidelity.time_encoding: seconds_since_midnight to land it as UInt32 seconds when sub-second precision is not required.
  • Use ODBC Driver 18 and bcp UTF-8/codepage settings for reliable Unicode movement.

ClickHouse

  • Nullable(T) maps to nullable target columns; LowCardinality(T) is treated as T for most sinks.
  • Array, Map, Nested, Tuple, AggregateFunction, and SimpleAggregateFunction need explicit landing policy outside ClickHouse.
  • Avoid mutation-based correction for target tables; prefer staging/shadow replacement or tombstone/sign modeling.

Generic REST API

  • Explicit columns beat sampled inference.
  • Volatile objects/arrays should land as JSON/string until the contract stabilizes.
  • Date/time parsing must be configured because API formats often drift.

Kafka

  • Schema Registry gives the strongest type contract for Avro, JSON Schema, and Protobuf.
  • Without Schema Registry, JSON payloads are inferred from samples and should land conservatively.
  • Tombstones and op=delete require explicit delete handling.

Per-flow docs

See Source -> sink matrix for links to every source -> sink guide.

  • Schema evolution explains how type widening and incompatible type changes are handled at runtime.
  • Type inference explains source metadata, samples, confidence, and fallback behavior.
  • Schema contracts explains explicit logical column contracts and enforcement modes.
  • Physical design explains concrete target DDL overrides, indexes, partitions, and storage hints.
  • Load strategies documents which write strategies are supported per sink and source -> sink pair.
  • Source -> sink matrix links every supported flow to a dedicated implementation guide.