Source landing semantics

Some sources, especially API and SaaS connectors, declare how each namespace’s data must land in the warehouse. Skippr calls this declaration a namespace contract. Contracts are separate from the column types discovered by skippr discover.

Namespace contracts

For every table (namespace) a source can emit, the source plugin publishes a contract that includes:

| Field | Meaning |
| --- | --- |
| Primary key | Logical row identity (business dimensions plus identifiers such as property_id and date) |
| Partition key | Columns that define a physical slice for partition-scoped writes (often date for daily reports) |
| Write policy | How the configured data sink must apply each batch |
| Refresh window | Optional number of days to re-fetch before the checkpoint (for APIs that revise past days) |
| Semantics | Descriptor such as mutable_report (informational; does not change engine behavior by itself) |
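
To make the fields above concrete, here is a minimal sketch of a contract as a plain data structure. The class names, attribute names, and the ga4_daily_report namespace are hypothetical and chosen for illustration; they are not Skippr's actual plugin API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class WritePolicy(Enum):
    """The four write policies named in this page."""
    APPEND = "append"
    MERGE_BY_KEY = "merge_by_key"
    REPLACE_PARTITION = "replace_partition"
    REPLACE_TABLE = "replace_table"


@dataclass(frozen=True)
class NamespaceContract:
    """Hypothetical container mirroring the contract fields listed above."""
    namespace: str
    primary_key: Tuple[str, ...]          # logical row identity
    partition_key: Tuple[str, ...]        # physical slice for partition-scoped writes
    write_policy: WritePolicy             # how the sink must apply each batch
    refresh_window_days: Optional[int] = None  # optional lookback
    semantics: Optional[str] = None       # informational descriptor only


# Illustrative contract for a mutable daily report (namespace name is made up).
ga4_daily = NamespaceContract(
    namespace="ga4_daily_report",
    primary_key=("property_id", "date"),
    partition_key=("date",),
    write_policy=WritePolicy.REPLACE_PARTITION,
    refresh_window_days=3,
    semantics="mutable_report",
)
```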

The host validates contracts when a source starts and checks that your pipeline’s data sink supports every declared write policy.
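
The startup check described above can be sketched as a comparison between declared policies and a sink's capabilities. This is an illustration under assumed names (unsupported_namespaces, validate_pipeline, and the example namespaces are hypothetical), not the host's real validation code.

```python
from typing import Dict, List, Set


def unsupported_namespaces(declared: Dict[str, str], sink_supports: Set[str]) -> List[str]:
    """Return the namespaces whose declared write policy the sink cannot honor."""
    return sorted(ns for ns, policy in declared.items() if policy not in sink_supports)


def validate_pipeline(declared: Dict[str, str], sink_supports: Set[str]) -> None:
    """Raise before any data moves if the sink cannot satisfy every contract."""
    bad = unsupported_namespaces(declared, sink_supports)
    if bad:
        raise ValueError(f"sink cannot honor write policy for: {', '.join(bad)}")


# Hypothetical pipeline: one append-only namespace and one mutable report.
declared_policies = {"events": "append", "daily_report": "replace_partition"}

# A sink that supports partition replacement passes validation.
validate_pipeline(declared_policies, {"append", "replace_partition"})

# An append-only sink would be rejected for the mutable report namespace.
rejected = unsupported_namespaces(declared_policies, {"append"})
```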

Write policies

| Policy | Use when | Typical sources |
| --- | --- | --- |
| append | Rows are immutable or append-only | Event streams, webhooks |
| merge_by_key | You need current state by business key | Entity snapshots (sink must support merge) |
| replace_partition | A partition can be fully rewritten when numbers change | GA4 daily reports, other mutable report APIs |
| replace_table | Small, bounded full snapshots | Reference dumps |
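
The difference between the four policies is easiest to see on a small in-memory table. The sketch below models a table as a list of dicts; apply_batch and its parameters are hypothetical names, and a real sink would apply the same ideas to files or catalog tables rather than Python lists.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def apply_batch(
    table: List[Row],
    batch: List[Row],
    policy: str,
    primary_key: Sequence[str] = (),
    partition_key: Sequence[str] = (),
) -> List[Row]:
    """Return the table contents after applying one batch under the given policy."""
    if policy == "append":
        # Keep everything; new rows are simply added.
        return table + batch
    if policy == "replace_table":
        # The batch is the full snapshot.
        return list(batch)
    if policy == "merge_by_key":
        # Last write wins per primary key.
        merged = {tuple(r[k] for k in primary_key): r for r in table}
        for r in batch:
            merged[tuple(r[k] for k in primary_key)] = r
        return list(merged.values())
    if policy == "replace_partition":
        # Drop every partition the batch touches, then write the batch.
        touched = {tuple(r[k] for k in partition_key) for r in batch}
        kept = [r for r in table if tuple(r[k] for k in partition_key) not in touched]
        return kept + batch
    raise ValueError(f"unknown write policy: {policy}")


existing = [
    {"date": "2024-01-14", "sessions": 10},
    {"date": "2024-01-15", "sessions": 20},
]
revised = [{"date": "2024-01-15", "sessions": 25}]

appended = apply_batch(existing, revised, "append")
replaced = apply_batch(existing, revised, "replace_partition", partition_key=("date",))
```

With append, the table ends up holding both the old and the revised row for 2024-01-15; with replace_partition, only the revised row remains for that date.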

Mutable reports: APIs like GA4 can change metrics for dates you already synced. Appending the revised rows would duplicate or conflict with the values already stored. replace_partition tells the sink to drop each partition that matches the batch’s partition key values (for example date=2024-01-15) and then write the new Parquet files in its place.
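
As a rough illustration of the drop-then-write behavior, the sketch below replaces Hive-style partition directories (date=2024-01-15) on the local filesystem. It is an assumption-laden stand-in: it writes JSON rather than Parquet to stay within the standard library, and the function name, file name, and directory layout are hypothetical rather than what any Skippr sink actually does.

```python
import json
import shutil
import tempfile
from pathlib import Path
from typing import Dict, List


def replace_partition(table_root: Path, partition_col: str, rows: List[Dict[str, object]]) -> None:
    """Delete and rewrite every partition directory that the batch touches."""
    # Group the batch by partition value.
    by_value: Dict[str, List[Dict[str, object]]] = {}
    for row in rows:
        by_value.setdefault(str(row[partition_col]), []).append(row)

    for value, part_rows in by_value.items():
        part_dir = table_root / f"{partition_col}={value}"
        # Drop whatever was stored for this partition, then write fresh data.
        shutil.rmtree(part_dir, ignore_errors=True)
        part_dir.mkdir(parents=True)
        (part_dir / "part-0000.json").write_text(json.dumps(part_rows))


root = Path(tempfile.mkdtemp())
# First sync, then a later run in which the API revised the same date.
replace_partition(root, "date", [{"date": "2024-01-15", "sessions": 20}])
replace_partition(root, "date", [{"date": "2024-01-15", "sessions": 25}])

stored = json.loads((root / "date=2024-01-15" / "part-0000.json").read_text())
```

After the second call, the partition holds only the revised row, so repeated runs over the same date do not accumulate duplicates.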

Lookback / refresh window: The source re-pulls the last N days on each run so corrected API values overwrite the same partitions. Checkpoints record progress (for example last completed date); they do not by themselves prove the warehouse is correct without the matching write policy and lookback.
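
The date arithmetic behind a refresh window can be sketched as follows. This assumes the checkpoint is the last completed date and that the window covers the N days ending at the checkpoint (inclusive); the exact boundary convention is an assumption, and dates_to_fetch is a hypothetical helper, not a Skippr function.

```python
from datetime import date, timedelta
from typing import List


def dates_to_fetch(checkpoint: date, through: date, lookback_days: int) -> List[date]:
    """Dates to pull on this run: the lookback window plus any new days.

    Assumes `checkpoint` is the last completed date and the window is the
    `lookback_days` days ending at the checkpoint, inclusive.
    """
    if lookback_days < 0:
        raise ValueError("lookback_days must be non-negative")
    start = checkpoint + timedelta(days=1) - timedelta(days=lookback_days)
    days = (through - start).days + 1
    return [start + timedelta(days=i) for i in range(max(days, 0))]


# Checkpoint at 2024-01-15 with a 3-day window, fetching through 2024-01-16:
# the run re-pulls 13, 14, and 15, then fetches the new day 16.
window = dates_to_fetch(date(2024, 1, 15), date(2024, 1, 16), lookback_days=3)
```

Each date returned here would be written with replace_partition, so the re-pulled days overwrite their existing partitions instead of adding rows.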

Destination pairing

| Data sink | replace_partition | Notes |
| --- | --- | --- |
| Athena (S3 + Glue) | Supported | Deletes the contract partition prefix in S3, then writes new files; Glue partition keys follow the contract |
| Iceberg | Supported | Native partition replace |
| Append-only sinks | Not suitable for mutable report sources | Pipeline validation fails if the sink cannot honor the contract |

Pair mutable report sources with Athena or Iceberg and run skippr discover after the first sync so Glue/Iceberg schemas include contract partition columns.

Discover vs contracts

| Layer | Controls |
| --- | --- |
| Namespace contract | How batches land (append vs replace partition, partition columns) |
| Arrow / discovered schema | Column names and types in bronze |

Contract partition keys do not add columns by themselves; dimensions such as date must appear in the ingested records. Discovery and schema sinks align catalog DDL with both the contract and observed data.
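
Because the contract does not add columns, a source author may want to confirm that every record actually carries the partition key columns. The check below is a hypothetical sketch of that idea; missing_partition_columns is an illustrative name, not part of Skippr.

```python
from typing import Dict, Iterable, List, Sequence


def missing_partition_columns(
    records: Iterable[Dict[str, object]], partition_key: Sequence[str]
) -> List[str]:
    """Return partition key columns that are absent from at least one record."""
    missing = set()
    for record in records:
        missing.update(col for col in partition_key if col not in record)
    return sorted(missing)


good = [{"property_id": "123", "date": "2024-01-15", "sessions": 25}]
bad = [{"property_id": "123", "sessions": 25}]  # no date dimension requested

ok_result = missing_partition_columns(good, ("date",))
bad_result = missing_partition_columns(bad, ("date",))
```

An empty result means the records can be partitioned as the contract declares; a non-empty result names the dimensions that the ingested data still needs to include.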

Example: Google Analytics 4

GA4 curated daily namespaces use replace_partition on date, with lookback_days driving the refresh window. See Google Analytics (GA4) and Athena.

For plugin authors, see the maintainer guide API / SaaS source plugins in the skipprd repository.