Source landing semantics

Some sources, especially API and SaaS connectors, declare how each namespace’s data must land in the warehouse. Skippr calls this a namespace contract. Contracts are separate from the column types discovered by `skippr discover`.

Namespace contracts

For every table (namespace) a source can emit, the source plugin publishes a contract that includes:

Field	Meaning
Primary key	Logical row identity (business dimensions plus identifiers such as `property_id` and `date`)
Partition key	Columns that define a physical slice for partition-scoped writes (often `date` for daily reports)
Write policy	How the configured data sink must apply each batch
Refresh window	Optional number of days to re-fetch before the checkpoint (for APIs that revise past days)
Semantics	Descriptor such as `mutable_report` (informational; does not change engine behavior by itself)
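
As a rough illustration, a contract can be pictured as a small immutable record holding the fields above. This is a minimal Python sketch, not Skippr's plugin API. The class, field, and namespace names are hypothetical, and the rule that `replace_partition` needs a non-empty partition key is an assumption made for the sketch.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class WritePolicy(Enum):
    # Hypothetical enum mirroring the write policy names used on this page.
    APPEND = "append"
    MERGE_BY_KEY = "merge_by_key"
    REPLACE_PARTITION = "replace_partition"
    REPLACE_TABLE = "replace_table"


@dataclass(frozen=True)
class NamespaceContract:
    # Hypothetical shape of a namespace contract; not Skippr's real type.
    namespace: str
    primary_key: Tuple[str, ...]       # logical row identity
    partition_key: Tuple[str, ...]     # columns defining a physical slice
    write_policy: WritePolicy
    refresh_window_days: Optional[int] = None  # None means no lookback
    semantics: Optional[str] = None            # informational only

    def __post_init__(self) -> None:
        if self.refresh_window_days is not None and self.refresh_window_days < 0:
            raise ValueError("refresh_window_days must be non-negative")
        # Assumption: replacing a partition needs at least one partition column.
        if self.write_policy is WritePolicy.REPLACE_PARTITION and not self.partition_key:
            raise ValueError("replace_partition requires a partition key")


# Hypothetical daily report namespace.
daily_report = NamespaceContract(
    namespace="daily_report",
    primary_key=("property_id", "date"),
    partition_key=("date",),
    write_policy=WritePolicy.REPLACE_PARTITION,
    refresh_window_days=3,
    semantics="mutable_report",
)
```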

The host validates contracts when a source starts and checks that your pipeline’s data sink supports every declared write policy.
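
Because the check happens when the source starts, an incompatible pipeline fails before any batch is written. A minimal sketch of such a check, assuming the sink advertises a set of supported policy names; `validate_sink` and `ContractError` are hypothetical, not Skippr's host API.

```python
from typing import Iterable, Mapping


class ContractError(Exception):
    """Hypothetical error: the sink cannot apply a declared write policy."""


def validate_sink(declared: Mapping[str, str], sink_supports: Iterable[str]) -> None:
    """Check each namespace's write policy against the sink's supported policies.

    declared maps namespace name -> write policy name.
    """
    supported = set(sink_supports)
    unsupported = sorted(
        (ns, policy) for ns, policy in declared.items() if policy not in supported
    )
    if unsupported:
        details = ", ".join(f"{ns} ({policy})" for ns, policy in unsupported)
        raise ContractError(f"sink cannot honor write policy for: {details}")


declared = {"events": "append", "daily_report": "replace_partition"}
validate_sink(declared, {"append", "replace_partition"})  # passes silently
try:
    validate_sink(declared, {"append"})  # e.g. an append-only sink
except ContractError as err:
    print(err)  # sink cannot honor write policy for: daily_report (replace_partition)
```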

Write policies

Policy	Use when	Typical sources
`append`	Rows are immutable or append-only	Event streams, webhooks
`merge_by_key`	You need current state by business key	Entity snapshots (sink must support merge)
`replace_partition`	A partition can be fully rewritten when numbers change	GA4 daily reports, other mutable report APIs
`replace_table`	Small, bounded full snapshots	Reference dumps
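
The differences between the four policies are easiest to see on rows in memory. This is a minimal sketch that assumes a table is just a Python list of dicts; `apply_batch` and its signature are hypothetical, and real sinks apply these semantics to files or warehouse tables.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def _key(row: Row, cols: Sequence[str]) -> Tuple[Any, ...]:
    return tuple(row[c] for c in cols)


def apply_batch(
    table: List[Row],
    batch: List[Row],
    policy: str,
    primary_key: Sequence[str] = (),
    partition_key: Sequence[str] = (),
) -> List[Row]:
    """Return the table contents after applying one batch under a write policy."""
    if policy == "append":
        # Existing rows are never touched.
        return table + batch
    if policy == "merge_by_key":
        # Upsert: a batch row replaces any existing row with the same key.
        merged = {_key(r, primary_key): r for r in table}
        for r in batch:
            merged[_key(r, primary_key)] = r
        return list(merged.values())
    if policy == "replace_partition":
        # Drop all existing rows in partitions the batch touches, then add the batch.
        touched = {_key(r, partition_key) for r in batch}
        kept = [r for r in table if _key(r, partition_key) not in touched]
        return kept + batch
    if policy == "replace_table":
        # The batch is the complete new snapshot.
        return list(batch)
    raise ValueError(f"unknown write policy: {policy}")


old = [{"date": "2024-01-15", "sessions": 100}]
revised = [{"date": "2024-01-15", "sessions": 120}]
print(len(apply_batch(old, revised, "append")))  # 2: the revised day is duplicated
print(apply_batch(old, revised, "replace_partition", partition_key=["date"]))
# [{'date': '2024-01-15', 'sessions': 120}]
```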

Mutable reports: APIs like GA4 can revise metrics for dates you have already synced. Appending the re-fetched rows would duplicate or conflict with the old values. `replace_partition` tells the sink to drop the existing data in each partition the batch touches (for example `date=2024-01-15`) and then write the new Parquet files.
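
To know what to drop, a sink needs the distinct partition key values present in the batch; partitions the batch does not touch are left alone. The sketch assumes Hive-style `col=value` path segments, matching the `date=2024-01-15` example, and the helper names are hypothetical.

```python
from typing import Any, Iterable, List, Mapping, Sequence


def partition_path(row: Mapping[str, Any], partition_key: Sequence[str]) -> str:
    # Hive-style "col=value" segments joined by "/", e.g. "date=2024-01-15".
    return "/".join(f"{col}={row[col]}" for col in partition_key)


def partitions_to_replace(
    batch: Iterable[Mapping[str, Any]], partition_key: Sequence[str]
) -> List[str]:
    """Distinct partitions touched by the batch; each one is dropped and rewritten."""
    return sorted({partition_path(row, partition_key) for row in batch})


batch = [
    {"property_id": "p1", "date": "2024-01-15", "sessions": 120},
    {"property_id": "p2", "date": "2024-01-15", "sessions": 80},
    {"property_id": "p1", "date": "2024-01-16", "sessions": 95},
]
print(partitions_to_replace(batch, ["date"]))  # ['date=2024-01-15', 'date=2024-01-16']
```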

Lookback / refresh window: On each run, the source re-pulls the last N days so that corrected API values overwrite the same partitions. Checkpoints record progress (for example, the last completed date), but on their own they do not guarantee the warehouse holds corrected values; that also requires the matching write policy and lookback.
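
The sketch below computes which days a run would request. It assumes the checkpoint is the last completed date and that the N-day window includes that date; both the inclusive counting and the function name `fetch_window` are assumptions, not a description of any specific source.

```python
from datetime import date, timedelta
from typing import List


def fetch_window(last_completed: date, lookback_days: int, end: date) -> List[date]:
    """Days to request: the last `lookback_days` completed days (re-fetched so
    revised values overwrite their partitions) plus new days through `end`."""
    if lookback_days < 0:
        raise ValueError("lookback_days must be non-negative")
    # With lookback_days == 0 this starts the day after the checkpoint.
    start = last_completed - timedelta(days=lookback_days - 1)
    # An empty list if end falls before start.
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


days = fetch_window(date(2024, 1, 15), lookback_days=3, end=date(2024, 1, 17))
print([d.isoformat() for d in days])
# ['2024-01-13', '2024-01-14', '2024-01-15', '2024-01-16', '2024-01-17']
```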

Destination pairing

Data sink	`replace_partition`	Notes
Athena (S3 + Glue)	Supported	Deletes the contract partition prefix in S3, then writes new files; Glue partition keys follow the contract
Iceberg	Supported	Native partition replace
Append-only sinks	Not suitable for mutable report sources	Pipeline validation fails if the sink cannot honor the contract
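
The Athena row can be pictured with a dict standing in for an S3 bucket: delete every object under the partition prefix, then write the new files. This is a hypothetical sketch, not Skippr's Athena sink; a real sink would list and delete objects through the S3 API, and the key layout shown is an assumption.

```python
from typing import Dict


def replace_partition_prefix(
    store: Dict[str, bytes],
    table_prefix: str,
    partition: str,
    new_files: Dict[str, bytes],
) -> None:
    """Delete every object under table_prefix/partition/, then write new_files there."""
    # The trailing slash keeps "date=2024-01-1" from matching "date=2024-01-15".
    prefix = f"{table_prefix}/{partition}/"
    # Copy the matching keys first so the dict is not mutated while iterating.
    for key in [k for k in store if k.startswith(prefix)]:
        del store[key]
    for name, body in new_files.items():
        store[prefix + name] = body


store = {
    "bronze/daily_report/date=2024-01-14/part-0000.parquet": b"jan14",
    "bronze/daily_report/date=2024-01-15/part-0000.parquet": b"stale",
}
replace_partition_prefix(
    store, "bronze/daily_report", "date=2024-01-15", {"part-0001.parquet": b"revised"}
)
# 2024-01-14 keeps its file; 2024-01-15 now holds only part-0001.parquet.
print(sorted(store))
```

In this sketch the delete and the write are separate steps, so a reader querying between them would see the partition as empty.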

Pair mutable report sources with Athena or Iceberg and run `skippr discover` after the first sync so Glue/Iceberg schemas include contract partition columns.

Discover vs contracts

Layer	Controls
Namespace contract	How batches land (append vs replace partition, partition columns)
Arrow / discovered schema	Column names and types in bronze

Contract partition keys do not add columns by themselves; dimensions such as `date` must appear in the ingested records. Discovery and schema sinks align catalog DDL with both the contract and the observed data.
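
Because a contract only names partition columns, a pipeline could check that every ingested record actually carries them. A hypothetical sketch; the function name is illustrative, and treating a `None` value as missing is an assumption.

```python
from typing import Any, List, Mapping, Sequence, Tuple


def missing_partition_columns(
    records: Sequence[Mapping[str, Any]], partition_key: Sequence[str]
) -> List[Tuple[int, str]]:
    """Return (record index, column) for each partition column a record lacks."""
    return [
        (i, col)
        for i, rec in enumerate(records)
        for col in partition_key
        if rec.get(col) is None  # absent or None both count as missing here
    ]


records = [
    {"property_id": "p1", "date": "2024-01-15", "sessions": 120},
    {"property_id": "p2", "sessions": 80},  # no date dimension
]
print(missing_partition_columns(records, ["date"]))  # [(1, 'date')]
```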

Example: Google Analytics 4

GA4 curated daily namespaces use `replace_partition` on `date`, with `lookback_days` driving the refresh window. See Google Analytics (GA4) and Athena.

Plugin authors should see the maintainer guide “API / SaaS source plugins” in the skipprd repository.
