Developer object storage guide
This guide documents the object storage staging architecture for contributors.
Design rule
Object storage is a reusable infrastructure boundary. Do not couple sources or sinks directly to boto3, google-cloud-storage, or azure-storage-blob.
Use ObjectStorageClient implementations and inject clients for testing.
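As a rough illustration of this rule, the sketch below shows a sink that depends only on a `put_file`-shaped interface and receives its client from the caller. `PutsFiles`, `InMemoryClient`, `RowSink`, and the plain-string destinations are hypothetical names invented for this example; the real ObjectStorageClient protocol and its argument types may differ.

```python
import tempfile
from pathlib import Path
from typing import Dict, Protocol


class PutsFiles(Protocol):
    """Hypothetical, narrowed view of ObjectStorageClient for this sketch."""

    def put_file(self, local_path: Path, destination: str) -> None: ...


class InMemoryClient:
    """Hypothetical test double: keeps uploaded bytes in a dict."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def put_file(self, local_path: Path, destination: str) -> None:
        self.objects[destination] = Path(local_path).read_bytes()


class RowSink:
    """Hypothetical sink: it never imports boto3, GCS, or Azure SDKs."""

    def __init__(self, client: PutsFiles, prefix: str) -> None:
        self._client = client
        self._prefix = prefix.rstrip("/")

    def write(self, name: str, payload: bytes) -> str:
        destination = f"{self._prefix}/{name}"
        with tempfile.TemporaryDirectory() as tmp:
            # Write the payload locally, then hand the file to the client.
            local_path = Path(tmp) / name
            local_path.write_bytes(payload)
            self._client.put_file(local_path, destination)
        return destination


client = InMemoryClient()
sink = RowSink(client, "s3://example-bucket/staging/")
uploaded = sink.write("rows.csv", b"id,name\n1,a\n")
```

Because the sink only sees the injected client, swapping the in-memory double for a cloud adapter should not require touching the sink itself.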
Module taxonomy
| Module | Responsibility |
|---|---|
| dpone.storage.models | URI parsing and uploaded object metadata. |
| dpone.storage.protocols | ObjectStorageClient protocol. |
| dpone.storage.local | Local filesystem emulator for CI and unit tests. |
| dpone.storage.adapters | Lazy S3/GCS/Azure SDK adapters. |
| dpone.staging.object_storage | Staging plan, manifest, service, and cleanup policy. |
Class map
```mermaid
classDiagram
    class ObjectStorageUri {
        +provider
        +bucket
        +key
        +account
        +parse(value)
        +child(parts)
    }
    class ObjectStorageClient {
        <<protocol>>
        +put_file(local_path, destination)
        +get_file(source, local_path)
        +delete_prefix(prefix)
        +exists(uri)
    }
    class LocalObjectStorageClient
    class S3ObjectStorageClient
    class GCSObjectStorageClient
    class AzureBlobObjectStorageClient
    class ObjectStorageStagingPlan
    class ObjectStorageStagingManifest
    class ObjectStorageStagingService {
        +stage_files(plan, files)
        +cleanup(manifest)
    }
    ObjectStorageClient <|.. LocalObjectStorageClient
    ObjectStorageClient <|.. S3ObjectStorageClient
    ObjectStorageClient <|.. GCSObjectStorageClient
    ObjectStorageClient <|.. AzureBlobObjectStorageClient
    ObjectStorageStagingService --> ObjectStorageClient
    ObjectStorageStagingService --> ObjectStorageStagingPlan
    ObjectStorageStagingService --> ObjectStorageStagingManifest
    ObjectStorageStagingPlan --> ObjectStorageUri
```
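To make the diagram more concrete, here is a minimal Python sketch of how the URI value object and the client protocol could be expressed. It is not the dpone source: the field and method names come from the diagram, but the `<scheme>://<bucket>/<key>` URI form, the variadic `child(*parts)` signature, the return types, and the omission of Azure account handling are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Protocol, runtime_checkable
from urllib.parse import urlparse


@dataclass(frozen=True)
class ObjectStorageUri:
    """Illustrative stand-in for the class in dpone.storage.models."""

    provider: str
    bucket: str
    key: str
    account: Optional[str] = None  # assumed to be relevant only for Azure

    @classmethod
    def parse(cls, value: str) -> ObjectStorageUri:
        # Assumption: URIs look like "<scheme>://<bucket>/<key>".
        parsed = urlparse(value)
        if not parsed.scheme or not parsed.netloc:
            raise ValueError(f"not an object storage URI: {value!r}")
        return cls(
            provider=parsed.scheme,
            bucket=parsed.netloc,
            key=parsed.path.lstrip("/"),
        )

    def child(self, *parts: str) -> ObjectStorageUri:
        # Append extra key segments beneath this URI's key.
        segments = [self.key, *parts]
        key = "/".join(s.strip("/") for s in segments if s)
        return ObjectStorageUri(self.provider, self.bucket, key, self.account)


@runtime_checkable
class ObjectStorageClient(Protocol):
    """Structural interface: any class with these methods satisfies it."""

    def put_file(self, local_path: Path, destination: ObjectStorageUri) -> object: ...

    def get_file(self, source: ObjectStorageUri, local_path: Path) -> object: ...

    def delete_prefix(self, prefix: ObjectStorageUri) -> object: ...

    def exists(self, uri: ObjectStorageUri) -> bool: ...


uri = ObjectStorageUri.parse("s3://example-bucket/staging/run-1")
part = uri.child("part-0.parquet")
```

Using a `Protocol` rather than a base class means the local emulator and the cloud adapters need no shared ancestor, which matches the realization arrows in the diagram.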
Dependency injection
Cloud adapters must accept an already-created SDK client:
```python
S3ObjectStorageClient(client=fake_or_boto3_client)
GCSObjectStorageClient(client=fake_or_google_storage_client)
AzureBlobObjectStorageClient(service_client=fake_or_blob_service_client)
```
This keeps tests credential-free and prevents optional SDK imports at package import time.
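The following sketch shows one way an adapter could honor both properties. `SketchS3Client` and `FakeS3` are hypothetical and much simpler than the real S3ObjectStorageClient; the lazy `boto3` fallback inside `_sdk()` is an assumption about how deferred imports might be done, and it is never reached when a client is injected. The only SDK call relied on is boto3's S3 `upload_file(Filename, Bucket, Key)`.

```python
from pathlib import Path
from typing import Any, List, Optional, Tuple


class SketchS3Client:
    """Hypothetical adapter; not the real S3ObjectStorageClient."""

    def __init__(self, client: Optional[Any] = None) -> None:
        # The SDK client is injected; nothing is imported at module import time.
        self._client = client

    def _sdk(self) -> Any:
        if self._client is None:
            # Assumption: import boto3 lazily, only on first real use.
            import boto3

            self._client = boto3.client("s3")
        return self._client

    def put_file(self, local_path: Path, bucket: str, key: str) -> None:
        # Delegate to the SDK client's upload_file(Filename, Bucket, Key).
        self._sdk().upload_file(str(local_path), bucket, key)


class FakeS3:
    """Records upload calls instead of talking to AWS."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str, str]] = []

    def upload_file(self, Filename: str, Bucket: str, Key: str) -> None:
        self.calls.append((Filename, Bucket, Key))


fake = FakeS3()
adapter = SketchS3Client(client=fake)
adapter.put_file(Path("part-0.parquet"), "example-bucket", "staging/part-0.parquet")
```

With the fake injected, the test needs neither credentials nor an installed `boto3`.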
Test requirements
Every object storage change must include:
- URI parser tests for all supported URI forms;
- local adapter upload/download/delete tests;
- staging manifest checksum and round-trip tests;
- lazy adapter tests with fake clients;
- docs contract tests for user docs, developer docs, extras, and architecture.
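As an example of the checksum and round-trip requirement, a test might look roughly like the sketch below. `SketchManifest`, `SketchManifestEntry`, their fields, and the JSON layout are hypothetical stand-ins; the real ObjectStorageStagingManifest may serialize differently.

```python
import hashlib
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import List


@dataclass(frozen=True)
class SketchManifestEntry:
    """Hypothetical record of one staged object."""

    uri: str
    size_bytes: int
    sha256: str


@dataclass(frozen=True)
class SketchManifest:
    """Hypothetical manifest with a JSON round trip."""

    entries: List[SketchManifestEntry]

    def to_json(self) -> str:
        payload = {"entries": [asdict(e) for e in self.entries]}
        return json.dumps(payload, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "SketchManifest":
        raw = json.loads(text)
        return cls(entries=[SketchManifestEntry(**e) for e in raw["entries"]])


def sha256_of(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def test_manifest_checksum_and_round_trip() -> None:
    content = b"id\n1\n"
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "part-0.csv"
        path.write_bytes(content)
        entry = SketchManifestEntry(
            uri="file:///staging/part-0.csv",
            size_bytes=path.stat().st_size,
            sha256=sha256_of(path),
        )
    manifest = SketchManifest(entries=[entry])
    # The recorded checksum matches an independent hash of the same bytes.
    assert entry.sha256 == hashlib.sha256(content).hexdigest()
    assert entry.size_bytes == len(content)
    # Serializing and parsing yields an equal manifest.
    assert SketchManifest.from_json(manifest.to_json()) == manifest


test_manifest_checksum_and_round_trip()
```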
Future integration hooks
Target loaders should consume ObjectStorageStagingManifest instead of
provider-specific dictionaries. Provider-specific native loaders can then map
the manifest to:
- BigQuery GCS load jobs;
- ClickHouse S3 table functions;
- MSSQL BULK INSERT from object storage when available in the target platform;
- PostgreSQL extension-backed imports where explicitly configured.
Heavy target writes must still use staging-first finalization.
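A native loader for the ClickHouse case could, for instance, turn staged objects into `INSERT ... SELECT ... FROM s3(...)` statements that fill a staging table. This is only a sketch under several assumptions: `StagedObject` and `clickhouse_load_statements` are hypothetical, the virtual-hosted-style HTTPS URL presumes an AWS bucket readable without inline credentials, identifier quoting is left out, and the finalization step that promotes the staging table is not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StagedObject:
    """Hypothetical manifest entry: one uploaded S3 object."""

    bucket: str
    key: str


def clickhouse_load_statements(
    objects: List[StagedObject],
    staging_table: str,
    fmt: str = "Parquet",
) -> List[str]:
    """Build one INSERT per staged object, targeting a staging table."""
    statements = []
    for obj in objects:
        # Assumption: virtual-hosted-style URL; credentials are not handled here.
        url = f"https://{obj.bucket}.s3.amazonaws.com/{obj.key}"
        if "'" in url or "'" in fmt:
            # Refuse values that would break out of the SQL string literal.
            raise ValueError(f"unsafe value for SQL literal: {url!r}")
        statements.append(
            f"INSERT INTO {staging_table} SELECT * FROM s3('{url}', '{fmt}')"
        )
    return statements


statements = clickhouse_load_statements(
    [StagedObject("example-bucket", "staging/run-1/part-0.parquet")],
    staging_table="events_staging",
)
```

Loading into `events_staging` rather than the final table keeps the sketch in line with the staging-first rule above.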