
Core Concepts

A deeper look at what Skippr does under the hood and why it works the way it does.

Bronze / Silver / Gold

Skippr organises data into three tiers inside your warehouse, following the medallion architecture pattern used by most modern data teams:

Tier   | What lives here                                          | Who creates it
Bronze | Raw extracted data, exactly as it appeared in the source | skippr extract-and-load
Silver | Cleaned, typed, and renamed staging models               | dbt (AI-assisted)
Gold   | Business-ready marts and aggregations                    | dbt (AI-assisted, then yours to extend)

Each tier gets its own schema (e.g. RAW, project_silver, project_gold), keeping raw ingestion cleanly separated from transformed layers. You can query any tier directly.
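The tier-to-schema naming above can be sketched as a simple mapping. This is illustrative only: the exact schema names (RAW, project_silver, project_gold) follow the examples in this section and may be configurable in practice.

```python
# Sketch of the tier -> schema naming convention shown above.
# The names here mirror the examples (RAW, project_silver, project_gold)
# and are assumptions, not a guaranteed contract.
def tier_schema(project, tier):
    return {
        "bronze": "RAW",                 # raw ingestion, shared across projects
        "silver": f"{project}_silver",   # cleaned staging models
        "gold": f"{project}_gold",       # business-ready marts
    }[tier]

print([tier_schema("shop", t) for t in ("bronze", "silver", "gold")])
# ['RAW', 'shop_silver', 'shop_gold']
```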

Schema Discovery and Evolution

During the Discover phase, skippr reads source metadata to learn table names, column names, and data types. No manual DDL, schema registry, or YAML mapping is required for the first run.

How discovery works per source type:

  • Databases -- read catalog metadata such as table and column definitions.
  • Object stores -- sample structured files and infer column names and types.
  • Streams and messaging -- infer structure from incoming records.
  • HTTP and network inputs -- infer structure from payload shape.
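For the sampling-based cases (object stores, streams, payloads), inference boils down to unioning keys across sampled records and picking the narrowest type that fits every observed value. A minimal sketch, with hypothetical helper names and a deliberately small type lattice:

```python
# Minimal sketch of schema inference from sampled records.
# Function names and the type lattice are illustrative assumptions,
# not Skippr's actual discovery API.
def infer_type(values):
    """Pick the narrowest type that fits every observed non-null value."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return "string"  # nothing observed: fall back to string
    if all(isinstance(v, bool) for v in non_null):
        return "boolean"
    if all(isinstance(v, int) and not isinstance(v, bool) for v in non_null):
        return "bigint"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_null):
        return "double"  # mixed int/float widens to double
    return "string"

def discover_schema(records):
    """Union of keys across sampled records, each mapped to an inferred type."""
    columns = {}
    for rec in records:
        for key, value in rec.items():
            columns.setdefault(key, []).append(value)
    return {key: infer_type(vals) for key, vals in columns.items()}

sample = [
    {"id": 1, "amount": 9.99, "note": None},
    {"id": 2, "amount": 5, "active": True},
]
print(discover_schema(sample))
# {'id': 'bigint', 'amount': 'double', 'note': 'string', 'active': 'boolean'}
```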

Schema handling is deterministic. When source data changes shape, Skippr prefers additive evolution over silent destructive rewrites:

  • compatible new fields are added
  • nested structures are preserved as structured fields where possible
  • incompatible type shifts are surfaced as explicit schema evolution instead of pretending the old column still means the same thing
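The additive-evolution rules above can be sketched as a merge of two schemas. This is a simplified model, assuming schemas as flat name-to-type dicts; `evolve()` is a hypothetical helper, not Skippr's interface:

```python
# Sketch of additive schema evolution: new columns are added, while an
# existing column whose type changes is surfaced as an explicit conflict
# rather than silently rewritten. Simplified assumption: schemas are
# flat name -> type dicts.
def evolve(current, incoming):
    """Return (evolved schema, list of incompatible type shifts)."""
    evolved = dict(current)
    conflicts = []
    for column, new_type in incoming.items():
        old_type = evolved.get(column)
        if old_type is None:
            evolved[column] = new_type  # compatible new field: add it
        elif old_type != new_type:
            # surface the shift explicitly; keep the old column meaning intact
            conflicts.append((column, old_type, new_type))
    return evolved, conflicts

current = {"id": "bigint", "amount": "double"}
incoming = {"id": "string", "amount": "double", "status": "string"}
evolved, conflicts = evolve(current, incoming)
print(evolved)    # status added; id keeps its old type pending explicit evolution
print(conflicts)  # [('id', 'bigint', 'string')]
```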

Deterministic vs AI-Assisted Work

The important trust boundary is that ingestion correctness is deterministic and reviewable.

Deterministic responsibilities

  • schema discovery and destination mapping
  • type reconciliation and evolution handling
  • incremental checkpoints and replay behavior
  • CDC reconciliation logic such as business keys, order tokens, and tombstones

AI-assisted responsibilities

  • dbt model and test scaffolding
  • naming and staging-model structure
  • descriptive metadata and model documentation

By default, only schema metadata is sent to the model. Data samples are optional and off by default.
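That boundary is easy to picture as the shape of the payload handed to the model: column names and types by default, with row samples included only on explicit opt-in. A sketch with illustrative names (this is not Skippr's actual request format):

```python
# Sketch of the metadata-only trust boundary: what reaches the model is
# table and column metadata, never row values unless sampling is opted in.
# Function and field names are illustrative assumptions.
def build_model_context(table_name, schema, rows=None, include_samples=False):
    context = {
        "table": table_name,
        "columns": [{"name": n, "type": t} for n, t in schema.items()],
    }
    if include_samples and rows:
        context["samples"] = rows[:5]  # opt-in only; off by default
    return context

ctx = build_model_context(
    "orders", {"id": "bigint", "amount": "double"}, rows=[{"id": 1, "amount": 9.99}]
)
print("samples" in ctx)  # False: row data stays out unless explicitly enabled
```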

Incremental Sync and CDC Correctness

Skippr makes two related but distinct correctness guarantees:

  • Incremental sync -- the extract-and-load engine tracks source progress so reruns only process new or changed data.
  • CDC final-state reconciliation -- supported CDC source and destination pairs converge on the correct final table state using business keys, order tokens, and tombstones.

For batch and incremental sync, progress is only advanced when the corresponding load has been durably committed. For CDC, the committed change batch is the authority for resume and replay. See CDC Guarantees for the exact contract.
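The CDC side of this contract can be sketched as a last-writer-wins merge keyed on the business key: a change only applies if its order token is newer than what the state already holds, and deletes are recorded as tombstones. Because stale tokens are skipped, replaying a committed batch converges on the same final state. Names and event shapes here are illustrative assumptions, not Skippr's wire format:

```python
# Sketch of CDC final-state reconciliation with business keys, order tokens,
# and tombstones. Replay-safe: a change only wins if its order token is newer.
def apply_changes(state, changes):
    """state maps business key -> (order_token, row or None); None = tombstone."""
    for change in changes:
        key, token = change["key"], change["token"]
        current = state.get(key)
        if current is not None and current[0] >= token:
            continue  # stale or duplicate change: skipping makes replay safe
        if change.get("deleted"):
            state[key] = (token, None)  # tombstone remembers the delete
        else:
            state[key] = (token, change["row"])
    return state

batch = [
    {"key": 1, "token": 10, "row": {"id": 1, "v": "a"}},
    {"key": 1, "token": 12, "row": {"id": 1, "v": "b"}},
    {"key": 2, "token": 11, "row": {"id": 2, "v": "x"}},
    {"key": 2, "token": 13, "deleted": True},
]
state = apply_changes({}, batch)
state = apply_changes(state, batch)  # replaying the committed batch: no change
final = {k: row for k, (_, row) in state.items() if row is not None}
print(final)  # {1: {'id': 1, 'v': 'b'}}; key 2 ends as a tombstone
```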

Data Privacy and Trust Boundary

Row-level data only ever exists in two places: the machine running skippr, and your destination.

  • Source data is read locally and written directly to the destination.
  • AI modeling uses metadata by default. Data samples are optional and off by default.
  • Skippr's cloud path handles authentication and control-plane services, not row-level source or warehouse data.
  • Credentials live in environment variables, never in config files.
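Keeping credentials in environment variables means config files can be committed without secrets. A trivial sketch of the pattern; the variable name is illustrative, not one Skippr documents:

```python
# Sketch of env-var credential loading, keeping secrets out of config files.
# The variable name is an illustrative assumption, not a documented one.
import os

def load_credential(name):
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(f"missing required environment variable {name}")
    return value

os.environ["EXAMPLE_WAREHOUSE_PASSWORD"] = "s3cret"  # normally set by your shell or CI
print(load_credential("EXAMPLE_WAREHOUSE_PASSWORD"))
```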

dbt Integration

Skippr generates a standard, fully functional dbt project:

  • dbt_project.yml and profiles.yml -- auto-configured for your warehouse
  • models/schema.yml -- source definitions pointing at bronze tables
  • models/staging/stg_*.sql -- silver models with type casting and renaming
  • packages.yml -- any required dbt packages

After the pipeline runs, the project is yours. Add tests, snapshots, custom gold models, or plug it into your existing dbt CI/CD -- it's standard dbt, nothing proprietary.

Supported Connectors

Sources

Category      | Source                 | Kind identifier
Databases     | Microsoft SQL Server   | mssql
              | MySQL                  | mysql
              | PostgreSQL             | postgres
              | Amazon Redshift        | redshift
              | MongoDB                | mongodb
              | Amazon DynamoDB        | dynamodb
              | ClickHouse             | clickhouse_source
              | MotherDuck             | motherduck_source
Object Stores | Amazon S3              | s3
              | SFTP                   | sftp
              | Delta Lake             | delta_lake
Streaming     | Kafka                  | kafka
              | Amazon SQS             | sqs
              | Amazon Kinesis         | kinesis
              | AMQP (RabbitMQ)        | amqp
              | Amazon SNS             | sns
              | Amazon EventBridge     | eventbridge
              | MQTT                   | mqtt
              | WebSocket              | websocket
HTTP          | HTTP Client            | http_client
              | HTTP Server            | http_server
Other         | Socket (TCP/UDP/Unix)  | socket
              | StatsD                 | statsd
              | Local File             | file
              | Stdin                  | stdin

Destinations

Category      | Destination            | Kind identifier
Warehouses    | Snowflake              | snowflake
              | Google BigQuery        | bigquery
              | PostgreSQL             | postgres
              | AWS Athena (S3 + Glue) | athena
              | Databricks             | databricks
              | Azure Synapse          | synapse
              | Amazon Redshift        | redshift
              | ClickHouse             | clickhouse
              | MotherDuck             | motherduck
Cloud Storage | Google Cloud Storage   | gcs
              | Azure Blob Storage     | azure_blob
              | SFTP                   | sftp
Messaging     | AMQP (RabbitMQ)        | amqp
Other         | Local File             | file
              | Stdout                 | stdout