
Manual Source -> Sink Integration Matrix

This runbook defines the manual integration category for validating every supported source -> sink pair across every public load strategy.

The normal CI gate remains fast. The full matrix is intentionally manual because it may start database containers, Kafka, Schema Registry, and heavier type/strategy datasets.

Credentials are not required for mock_contract or mock_local. They are required only for vendor_live cases that exercise real external systems such as BigQuery, managed APIs, or managed Kafka.

What the matrix covers

The canonical matrix lives in dpone.integration_matrix.

Sources:

  • PostgreSQL
  • MSSQL / SQL Server
  • ClickHouse
  • Generic REST API
  • Kafka bounded batch topic

Sinks:

  • MSSQL / SQL Server
  • PostgreSQL
  • ClickHouse
  • BigQuery
  • Kafka topic

Load strategies:

  • full_refresh
  • incremental_append
  • incremental_merge
  • replace
  • partition_replace
  • snapshot_diff
  • scd2
  • backfill
  • Postgres-only xmin
  • Postgres/MSSQL cdc

That gives 25 source -> sink pairs and 200 strategy cases: 100 common base cases (25 pairs x the 4 strategies full_refresh, incremental_append, incremental_merge, and replace), 25 snapshot_diff cases, 60 DB-target partition_replace/scd2/backfill cases, 5 Postgres xmin cases, and 10 Postgres/MSSQL cdc cases.
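The breakdown can be checked with a few lines of arithmetic. This is only a sketch: the short identifiers `rest` and `kafka` are illustrative, and it assumes "DB-target" means every sink except the Kafka topic, which is the reading that makes the 60 come out.

```python
# Sketch: recompute the matrix size from the source, sink, and strategy lists.
SOURCES = ["postgres", "mssql", "clickhouse", "rest", "kafka"]   # "rest"/"kafka" ids are illustrative
SINKS = ["mssql", "postgres", "clickhouse", "bigquery", "kafka"]

# Assumption: a "DB target" is any sink other than the Kafka topic.
DB_SINKS = [s for s in SINKS if s != "kafka"]

COMMON = ["full_refresh", "incremental_append", "incremental_merge", "replace"]
DB_TARGET_ONLY = ["partition_replace", "scd2", "backfill"]

pairs = len(SOURCES) * len(SINKS)
counts = {
    "common": pairs * len(COMMON),                                   # every pair x 4 strategies
    "snapshot_diff": pairs,                                          # every pair x 1 strategy
    "db_target": len(SOURCES) * len(DB_SINKS) * len(DB_TARGET_ONLY), # DB sinks only
    "xmin": 1 * len(SINKS),                                          # Postgres source only
    "cdc": 2 * len(SINKS),                                           # Postgres and MSSQL sources
}
total = sum(counts.values())
print(pairs, total)  # 25 200
```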

Manual CI workflow

GitHub Actions workflow:

.github/workflows/integration-matrix.yml

Trigger it from GitHub Actions with Run workflow. It is not attached to push or pull_request.

Run modes:

Run mode | External credentials | Behavior
mock_contract | no | Run all 200 source -> sink x strategy contract cases without starting services: 100 common base cases plus snapshot_diff, DB-target partition_replace/scd2/backfill, Postgres xmin, and Postgres/MSSQL cdc.
mock_local | no | Start local services and run local/mock-capable cases. BigQuery target cases remain documented-contract only.
vendor_live | yes | Use caller-provided real managed/vendor services.

Useful filters:

source_filter=postgres,mssql
sink_filter=mssql,clickhouse
strategy_filter=incremental_merge,replace
case_id_filter=postgres_to_mssql__incremental_merge

Use * to run all cases.
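One way to picture how the filters narrow the matrix is the sketch below. It is not the workflow's actual implementation: the `parse_filter` and `is_selected` helpers and the case dictionary shape are hypothetical, and it assumes that filters are comma-separated, that `*` matches everything, and that all four filters must match for a case to run.

```python
from typing import Optional, Set


def parse_filter(raw: str) -> Optional[Set[str]]:
    """Return None for '*' (match everything), else the set of listed values."""
    raw = raw.strip()
    if raw == "*":
        return None
    return {part.strip() for part in raw.split(",") if part.strip()}


def is_selected(case: dict, source_filter: str = "*", sink_filter: str = "*",
                strategy_filter: str = "*", case_id_filter: str = "*") -> bool:
    """Hypothetical selection rule: a case runs only if every filter accepts it."""
    checks = [
        (source_filter, case["source"]),
        (sink_filter, case["sink"]),
        (strategy_filter, case["strategy"]),
        (case_id_filter, case["case_id"]),
    ]
    for raw, value in checks:
        allowed = parse_filter(raw)
        if allowed is not None and value not in allowed:
            return False
    return True


case = {
    "source": "postgres",
    "sink": "mssql",
    "strategy": "incremental_merge",
    "case_id": "postgres_to_mssql__incremental_merge",
}
print(is_selected(case, source_filter="postgres,mssql", sink_filter="mssql,clickhouse"))  # True
print(is_selected(case, strategy_filter="replace"))  # False
```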

Local command

Run the manual preflight matrix locally:

DPONE_RUN_INTEGRATION=1 \
DPONE_RUN_INTEGRATION_MATRIX=1 \
uv run pytest -m integration_matrix tests/integration/matrix -q

Run a focused case:

DPONE_RUN_INTEGRATION=1 \
DPONE_RUN_INTEGRATION_MATRIX=1 \
DPONE_MATRIX_CASE_ID=postgres_to_mssql__incremental_merge \
uv run pytest -m integration_matrix tests/integration/matrix -q

Write case artifacts:

DPONE_RUN_INTEGRATION=1 \
DPONE_RUN_INTEGRATION_MATRIX=1 \
DPONE_MATRIX_ARTIFACT_DIR=test_artifacts/integration_matrix \
uv run pytest -m integration_matrix tests/integration/matrix -q

Preflight, mock-local, and live execution

The first matrix layer is a credential-free preflight:

  • each source -> sink guide exists;
  • each case has install extras;
  • each case has required live profiles;
  • each case renders a minimal manifest fragment;
  • each case writes a JSON artifact when DPONE_MATRIX_ARTIFACT_DIR is set.
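The last check, writing an artifact only when DPONE_MATRIX_ARTIFACT_DIR is set, could look roughly like the following. The `write_case_artifact` helper, the one-file-per-case layout, and the payload fields are assumptions made for illustration; only the environment variable name comes from this runbook.

```python
import json
import os
from pathlib import Path
from typing import Optional


def write_case_artifact(case_id: str, payload: dict) -> Optional[Path]:
    """Write <artifact_dir>/<case_id>.json, or do nothing if the variable is unset."""
    artifact_dir = os.environ.get("DPONE_MATRIX_ARTIFACT_DIR")
    if not artifact_dir:
        return None  # artifacts are opt-in
    out_dir = Path(artifact_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{case_id}.json"
    # sort_keys keeps the output stable so artifacts diff cleanly between runs
    path.write_text(json.dumps(payload, indent=2, sort_keys=True), encoding="utf-8")
    return path


# Hypothetical payload: field names are illustrative, not the real artifact schema.
example_payload = {
    "case_id": "postgres_to_mssql__incremental_merge",
    "source": "postgres",
    "sink": "mssql",
    "strategy": "incremental_merge",
}
```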

The second layer is mock_local: it uses disposable local services and deterministic mock data. The matrix behavior model defaults to 10,000 source rows, 20% changed rows, 5% physical deletes, and 120 sparse wide columns per sampled row. This layer should cover the Postgres, MSSQL, ClickHouse, Kafka, and REST mock paths without external credentials.
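To make the default behavior model concrete, the sketch below turns the stated percentages into row counts. The `MatrixBehaviorModel` class and its field names are hypothetical; only the four default values are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatrixBehaviorModel:
    """Hypothetical container for the documented mock-data defaults."""
    source_rows: int = 10_000
    changed_ratio: float = 0.20   # 20% changed rows
    delete_ratio: float = 0.05    # 5% physical deletes
    wide_columns: int = 120       # sparse wide columns per sampled row

    @property
    def changed_rows(self) -> int:
        return round(self.source_rows * self.changed_ratio)

    @property
    def deleted_rows(self) -> int:
        return round(self.source_rows * self.delete_ratio)


model = MatrixBehaviorModel()
print(model.changed_rows, model.deleted_rows)  # 2000 500
```

With the defaults, a merge-style strategy would be expected to see about 2,000 changed rows and 500 deleted rows per run.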

Live database/vendor execution is the third layer and reuses the same case ids. This keeps the public matrix complete while allowing teams to run only the systems they can provision in a given CI environment.

Required profiles

Profile | Meaning
postgres_live | PostgreSQL source/sink/state endpoint is available.
mssql_live | SQL Server endpoint, ODBC Driver 18, and bcp are available.
clickhouse_live | ClickHouse native/HTTP endpoint is available.
bigquery_live | GCP credentials and target dataset are available.
kafka_live | Kafka broker and Schema Registry are available.
rest_mock | Generic REST mock source is available.
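As an illustration of how a case might map to the profiles it needs, the sketch below takes the union of the source's and the sink's profile. The system identifiers, the `PROFILE_BY_SYSTEM` mapping, and the `required_profiles` helper are hypothetical; the real matrix may require additional profiles, for example a state endpoint.

```python
# Hypothetical mapping from a system id to the profile it needs.
PROFILE_BY_SYSTEM = {
    "postgres": "postgres_live",
    "mssql": "mssql_live",
    "clickhouse": "clickhouse_live",
    "bigquery": "bigquery_live",
    "kafka": "kafka_live",
    "rest": "rest_mock",
}


def required_profiles(source: str, sink: str) -> list:
    """Assumed rule: a case needs the profiles of both of its endpoints."""
    return sorted({PROFILE_BY_SYSTEM[source], PROFILE_BY_SYSTEM[sink]})


print(required_profiles("postgres", "mssql"))   # ['mssql_live', 'postgres_live']
print(required_profiles("rest", "bigquery"))    # ['bigquery_live', 'rest_mock']
print(required_profiles("kafka", "kafka"))      # ['kafka_live']
```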

Relationship to other gates