Skip to content

Strategy Intelligence

Strategy Intelligence is the dpone control-plane layer for choosing, explaining, repairing, certifying, and optimizing load strategies before a pipeline writes data.

It does not replace explicit manifests. It helps users move from "I picked a strategy by hand" to "dpone explained the safest and fastest target-native plan for this source -> sink pair".

What it covers

Capability User value CLI/API
Strategy Advisor Selects full_refresh, append, merge, partition_replace, cdc_apply, or other strategies from manifest hints. dpone strategy advise
Explainable plan Shows why a strategy was selected and which safety gates apply. dpone strategy advise --format md
Adaptive batching Suggests initial batch size and parallel workers from row estimates and partition hints. included in decision payload
Repair / resume UX Produces deterministic repair commands for failed runs. included in repair_commands
Certification matrix Classifies source -> sink -> strategy support as contract, manual live gate, or not supported. Python API v1
Native fast paths Documents target-native bulk commands and fallbacks. included in decision payload
Runtime-native auto compile Compiles strategy.mode: auto into a concrete LoadStrategy before hydration. dpone run, dpone plan
Native preflight Checks whether local fast-path tools are available. dpone strategy preflight
Certification artifacts Writes source -> sink -> strategy evidence matrix. dpone strategy certification-artifact

Strategy Advisor

Run the advisor against a manifest:

dpone strategy advise orders.yaml \
  --estimated-rows 50000000 \
  --changed-percent 0.20 \
  --delete-percent 0.05 \
  --format md

Example output highlights:

strategy: partition_replace
merge_policy: delete_insert
native_fast_path: postgres_copy_to_mssql_bcp
adaptive_batching: true

Runtime-native mode: auto

strategy.mode: auto is compiled before runtime hydration, so sinks receive a concrete LoadStrategy.

sink:
  strategy:
    mode: auto
    unique_key: [order_id]

For a large partitioned Postgres -> MSSQL load, this can compile to:

partition_replace

The decision is saved in LoadConfig.options.strategy_intelligence, so run artifacts and plan output can explain why the strategy was selected.

Use dpone plan --explain-strategy to see the decision payload:

dpone plan orders.yaml --explain-strategy --format json

Native preflight

Check whether the local machine has the tools needed for the recommended fast path:

dpone strategy preflight --source-type postgres --sink-type mssql --format md

With --fail-on-missing, the command exits non-zero when required tools are missing:

dpone strategy preflight --source-type mssql --sink-type clickhouse --fail-on-missing

Auto selection rules v1

The v1 advisor is deliberately conservative:

Manifest/runtime signal Recommended strategy
CDC is available and unique_key is configured cdc_apply
Large partitioned delta on DB sink partition_replace
unique_key is configured on mutable DB/Kafka target incremental_merge
Cursor exists but no unique_key incremental_append
No safe incremental hints full_refresh

Kafka intentionally rejects partition_replace because it is an event log, not a partitioned mutable table target.

Repair / resume UX

When a finalization step fails, Strategy Intelligence returns safe repair steps and actionable commands.

For partition replacement:

dpone resync --run-id RUN_ID --partitions from-run-artifact --plan

Safe runbook:

  1. Inspect the run report and load IDs.
  2. Validate staging files/tables and row counts.
  3. Replay only affected partitions or resume from an idempotent stage.
  4. Run source-target reconciliation.
  5. Commit state only after sink success.

CLI:

dpone strategy repair-plan RUN_ID \
  --source-type postgres \
  --sink-type mssql \
  --strategy partition_replace \
  --failed-stage finalize \
  --partition 2026-01-01 \
  --format md

Certification artifacts

Write the strategy support matrix as JSON and Markdown:

dpone strategy certification-artifact --artifact-dir test_artifacts/strategies

The artifact records support status for source -> sink -> strategy combinations:

  • contract_gate: unit/contract evidence is sufficient for normal CI.
  • manual_live_gate: run the local/live integration matrix before claiming certification.
  • not_supported: fail early with a clear diagnostic.

Native fast paths

Source -> sink Fast path Required tools Fallback
Postgres -> MSSQL postgres_copy_to_mssql_bcp psycopg, bcp pyodbc_fast_executemany
MSSQL -> ClickHouse mssql_bcp_queryout_to_clickhouse_direct_tsv bcp, clickhouse-client python_streaming_batches
Postgres -> ClickHouse postgres_copy_to_clickhouse_http_tsv psycopg, clickhouse-client or curl streaming_rows
MSSQL -> MSSQL mssql_bcp_queryout_to_bcp_import bcp pyodbc_fast_executemany

Mermaid flow

flowchart TD
    A["Manifest hints"] --> B["StrategyContext"]
    B --> C["StrategyAdvisor"]
    C --> D["StrategyDecision"]
    C --> E["NativeFastPathCatalog"]
    D --> F["Adaptive batching"]
    D --> G["Safety gates"]
    D --> H["Repair commands"]
    D --> I["Certification matrix evidence"]

Python API

from dpone.strategy_intelligence.advisor import StrategyAdvisor, StrategyContext

decision = StrategyAdvisor().advise(
    StrategyContext(
        source_type="postgres",
        sink_type="mssql",
        requested_mode="auto",
        unique_key=("order_id",),
        estimated_rows=50_000_000,
        changed_percent=0.20,
        delete_percent=0.05,
        partition_column="business_date",
    )
)
print(decision.strategy_mode)

Troubleshooting

Symptom Cause Fix
Advisor returns full_refresh Manifest has no cursor, key, CDC, or partition hints. Add unique_key, incremental_column, CDC config, or partition config.
Advisor rejects partition_replace for Kafka Kafka is not a mutable partitioned table sink. Use keyed incremental_merge event upserts.
Native fast path says fallback Required local tool is missing. Run dpone doctor and install the target-specific client.
Repair plan is not auto-resumable Failure happened during finalization. Replay staged partitions or run resync --plan before state commit.

Live preflight

dpone strategy preflight checks local tools such as bcp and clickhouse-client. The live execution layer also exposes a probe contract for target-specific readiness checks:

  • ODBC/driver availability;
  • staging schema accessibility;
  • target write permissions;
  • target lock risk before finalization.

The default OSS probe is plan-only and safe. Live adapters can inject concrete probes for MSSQL, Postgres, ClickHouse, or BigQuery without changing CLI code.

Adaptive runtime batching

mode: auto now stores an adaptive batch plan in LoadConfig.options.strategy_intelligence. When a user has not explicitly set batch_size, the initial adaptive batch size becomes LoadConfig.batch_size.

Runtime observations can be fed into AdaptiveBatchController:

from dpone.strategy_intelligence import AdaptiveBatchController, BatchObservation

controller = AdaptiveBatchController(initial_batch_size=50000, min_batch_size=10000, max_batch_size=200000)
adjustment = controller.observe(BatchObservation(rows=50000, duration_seconds=1.0, target_backpressure=0.05))
print(adjustment.next_batch_size)

Rules are intentionally simple in v1:

  • increase batch size when throughput is healthy and backpressure is low;
  • decrease batch size when target backpressure is high;
  • always stay within configured min/max bounds.

Resync and resume commands

Repair commands are now first-class top-level CLI commands. They are safe by default: without --yes, they write a replay artifact and do not mark replay as executed.

Plan a partition resync:

dpone resync \
  --run-id RUN_ID \
  --source-type postgres \
  --sink-type mssql \
  --strategy partition_replace \
  --partition 2026-01-01 \
  --format md

Approve execution contract:

dpone resync \
  --run-id RUN_ID \
  --source-type postgres \
  --sink-type mssql \
  --strategy partition_replace \
  --partition 2026-01-01 \
  --yes

Plan a resume from an idempotent failed stage:

dpone resume RUN_ID \
  --from-stage stage \
  --source-type postgres \
  --sink-type mssql \
  --strategy incremental_merge

Execution semantics in v1:

  • mode: plan: write replay artifact only;
  • mode: execute: mark replay artifact as approved/executed after --yes;
  • live database mutation remains an adapter-level next step, not implicit CLI behavior.

Replay adapters

Replay adapters turn an approved replay artifact into a sink-specific execution sequence. They are still dependency-injected: the default OSS backend records an artifact-only execution contract, while live deployments can provide database or Kafka backends.

Database replay sequence:

sequenceDiagram
    participant CLI as dpone resync/resume --yes
    participant Service as ReplayExecutionService
    participant Adapter as DbReplayAdapter
    participant Backend as ReplayBackend
    participant State as State backend

    CLI->>Service: ReplayExecutionRequest
    Service->>Adapter: execute(request)
    Adapter->>Backend: validate staging
    Adapter->>Backend: execute finalizer
    Adapter->>Backend: reconcile source and target
    Adapter->>State: commit state
    Service-->>CLI: artifact with operations

Kafka replay sequence:

sequenceDiagram
    participant CLI as dpone resume --yes
    participant Adapter as KafkaReplayAdapter
    participant Backend as ReplayBackend
    participant Kafka as Kafka producer
    participant State as State backend

    CLI->>Adapter: execute(request)
    Adapter->>Backend: validate staging
    Adapter->>Kafka: produce replay events
    Adapter->>Backend: reconcile produced events
    Adapter->>State: commit state

Safety rules:

  1. Always validate staging before finalization or event production.
  2. Always reconcile before state commit.
  3. Never commit state when reconciliation fails.
  4. Keep Kafka replay append/event-log only; do not emulate partition replacement.
  5. Record operations in the replay artifact for auditability.

In artifact-only mode, --yes marks the replay contract as executed but does not mutate external systems. Live mutation requires an injected ReplayBackend from a deployment/runtime adapter.

Live replay backends

Live replay backends are the sink-specific implementation layer behind approved dpone resync --yes and dpone resume --yes requests. They use injected clients, so dpone core does not create DB/Kafka connections inside CLI commands.

Supported backend contracts:

Sink Backend Finalization path
MSSQL MssqlReplayBackend partition switch for partition_replace, delete+insert fallback for merge-like replay.
Postgres PostgresReplayBackend partition attach/detach contract for partition_replace, delete+insert fallback.
ClickHouse ClickHouseReplayBackend native ALTER TABLE ... REPLACE PARTITION ... FROM staging.
BigQuery BigQueryReplayBackend partition-scoped delete+insert contract.
Kafka KafkaLiveReplayBackend replay event production, no mutable target operations.

All live backends follow the same safety invariant:

flowchart TD
    A["validate staging"] --> B["execute finalizer or produce replay events"]
    B --> C["reconcile"]
    C --> D{"reconciliation passed?"}
    D -- yes --> E["commit state"]
    D -- no --> F["fail without state commit"]

The default package still ships an artifact-only backend for safe OSS use. Deployments can inject SQL/Kafka clients that satisfy the backend protocols.

Runtime replay clients

Runtime replay clients bridge the control-plane replay contracts to the existing runtime connector and credential layer.

Class Role
ReplayBackendConnection Declarative reference to sink type, connection id, target table, staging schema, and credentials source
RuntimeSqlReplayClient Wraps runtime SQL connectors and exposes the narrow exists, scalar, execute replay port
RuntimeKafkaReplayClient Wraps KafkaConnector.create_producer() and exposes produce, flush, scalar
RuntimeReplayBackendFactory Builds MSSQL/Postgres/ClickHouse/BigQuery/Kafka replay backends through existing BaseFactory credential resolution

Example:

from dpone.strategy_intelligence import ReplayBackendConnection, RuntimeReplayBackendFactory

backend = RuntimeReplayBackendFactory().build(
    ReplayBackendConnection(
        sink_type="mssql",
        connection_id="mssql_dwh",
        target_schema="dbo",
        target_table="orders",
        staging_schema="staging",
        credentials_source="env",
    )
)

The factory does not import pyodbc, psycopg, clickhouse-driver, or confluent-kafka directly. Those SDKs stay behind runtime connectors and are loaded only when a live connector is actually built.

Native transfer plan details

For MSSQL -> ClickHouse, strategy intelligence includes an optional native_transfer_plan inside the decision payload. dpone plan --explain-strategy renders the selected export method, ingest method, finalizer and partitioning strategy.

Example decision fields:

{
  "native_fast_path": "mssql_bcp_queryout_to_clickhouse_direct_tsv",
  "native_transfer_plan": {
    "export_method": "bcp_queryout",
    "ingest_method": "clickhouse_direct_tsv",
    "finalizer": "lightweight_delete_insert",
    "retry_boundaries": ["export", "load_staging", "finalize"]
  }
}