Core Concepts
A deeper look at what Skippr does under the hood and why it works the way it does.
Bronze / Silver / Gold
Skippr organises data into three tiers inside your warehouse, following the medallion architecture pattern used by most modern data teams:
| Tier | What lives here | Who creates it |
|---|---|---|
| Bronze | Raw extracted data, exactly as it appeared in the source | skippr extract-and-load |
| Silver | Cleaned, typed, and renamed staging models | dbt (AI-assisted) |
| Gold | Business-ready marts and aggregations | dbt (AI-assisted, then yours to extend) |
Each tier gets its own schema (e.g. RAW, project_silver, project_gold), keeping raw ingestion cleanly separated from transformed layers. You can query any tier directly.
Schema Discovery and Evolution
During the Discover phase, skippr reads source metadata to learn table names, column names, and data types. No manual DDL, schema registry, or YAML mapping is required for the first run.
How discovery works per source type:
- Databases -- read catalog metadata such as table and column definitions.
- Object stores -- sample structured files and infer column names and types.
- Streams and messaging -- infer structure from incoming records.
- HTTP and network inputs -- infer structure from payload shape.
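As a rough sketch of the object-store and stream cases, inferring column names and types from a sample of structured records might look like this (function and widening rules here are illustrative, not Skippr's internals):

```python
# Illustrative sketch of sample-based type inference; not Skippr's actual code.

def infer_schema(records: list[dict]) -> dict[str, str]:
    """Infer a column -> type mapping from a sample of structured records."""
    schema: dict[str, str] = {}
    for record in records:
        for column, value in record.items():
            inferred = type(value).__name__  # e.g. 'int', 'float', 'str'
            if column not in schema:
                schema[column] = inferred
            elif schema[column] != inferred:
                # Widen on conflict rather than guessing: int + float -> float,
                # anything else falls back to string.
                if {schema[column], inferred} == {"int", "float"}:
                    schema[column] = "float"
                else:
                    schema[column] = "str"
    return schema

sample = [
    {"id": 1, "amount": 10},
    {"id": 2, "amount": 10.5},
    {"id": 3, "amount": 12, "note": "late"},
]
print(infer_schema(sample))  # {'id': 'int', 'amount': 'float', 'note': 'str'}
```

Note that columns absent from some records (like note above) are still discovered, which is why sampling more than one record matters.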
Schema handling is deterministic. When source data changes shape, Skippr prefers additive evolution over silent destructive rewrites:
- compatible new fields are added
- nested structures are preserved as structured fields where possible
- incompatible type shifts are surfaced as explicit schema evolution instead of pretending the old column still means the same thing
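These rules can be sketched as a merge over two column-to-type mappings; this is a hypothetical illustration of the additive-evolution policy, not Skippr's implementation:

```python
# Hypothetical sketch of additive schema evolution; not Skippr's actual code.

def evolve(current: dict[str, str], incoming: dict[str, str]):
    """Merge an incoming source schema into the current destination schema.

    Returns (new_schema, added_columns, incompatible_shifts).
    """
    new_schema = dict(current)
    added, incompatible = [], []
    for column, dtype in incoming.items():
        if column not in new_schema:
            new_schema[column] = dtype  # compatible new field: add it
            added.append(column)
        elif new_schema[column] != dtype:
            # Surface the shift explicitly; never silently rewrite the column.
            incompatible.append((column, new_schema[column], dtype))
    return new_schema, added, incompatible

current = {"id": "int", "email": "str"}
incoming = {"id": "str", "email": "str", "signup_ts": "timestamp"}
schema, added, shifts = evolve(current, incoming)
# added  -> ['signup_ts']           (new field, applied additively)
# shifts -> [('id', 'int', 'str')]  (surfaced, not silently rewritten)
```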
Deterministic vs AI-Assisted Work
The important trust boundary: ingestion correctness is handled deterministically and is reviewable, while AI assistance is confined to modeling and documentation.
Deterministic responsibilities
- schema discovery and destination mapping
- type reconciliation and evolution handling
- incremental checkpoints and replay behavior
- CDC reconciliation logic such as business keys, order tokens, and tombstones
AI-assisted responsibilities
- dbt model and test scaffolding
- naming and staging-model structure
- descriptive metadata and model documentation
By default, only schema metadata is sent to the model. Data samples are optional and off by default.
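In practice that boundary means the context built for the model contains column names and types, with row samples only behind an explicit opt-in. A minimal sketch (the payload shape and function name are hypothetical, not Skippr's wire format):

```python
# Illustrative only: what "schema metadata, not data" means in practice.

def modeling_payload(table: str, schema: dict[str, str],
                     rows: list[dict], include_samples: bool = False) -> dict:
    """Build the context sent to the model; row data is opt-in, off by default."""
    payload = {"table": table, "columns": schema}
    if include_samples:  # explicit opt-in required
        payload["samples"] = rows[:5]
    return payload

rows = [{"id": 1, "email": "a@example.com"}]
payload = modeling_payload("users", {"id": "int", "email": "str"}, rows)
assert "samples" not in payload  # no row-level data leaves by default
```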
Incremental Sync and CDC Correctness
Skippr has two related but different correctness stories:
- Incremental sync -- the extract-and-load engine tracks source progress so reruns only process new or changed data.
- CDC final-state reconciliation -- supported CDC source and destination pairs converge on the correct final table state using business keys, order tokens, and tombstones.
For batch and incremental sync, progress is only advanced when the corresponding load has been durably committed. For CDC, the committed change batch is the authority for resume and replay. See CDC Guarantees for the exact contract.
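As a toy sketch of how business keys, order tokens, and tombstones converge on a final state, and why replaying a committed batch is safe (field names and shapes here are illustrative; see CDC Guarantees for the actual contract):

```python
# Sketch of CDC final-state reconciliation; not Skippr's actual contract.

def apply_changes(table: dict, changes: list[dict]) -> dict:
    """Fold a change batch into final state.

    table maps business key -> (order_token, row); row is None for a
    tombstoned (deleted) key. A change only wins if its token is newer,
    so replaying an already-applied batch leaves the state unchanged.
    """
    for change in changes:
        key, token = change["key"], change["order_token"]
        current = table.get(key)
        if current is not None and token <= current[0]:
            continue  # stale or replayed change: final state is already newer
        # Tombstones are kept as (token, None) so late-arriving older
        # updates cannot resurrect a deleted row.
        table[key] = (token, None if change.get("tombstone") else change["row"])
    return table

state: dict = {}
batch = [
    {"key": "u1", "order_token": 1, "row": {"name": "Ada"}},
    {"key": "u1", "order_token": 2, "row": {"name": "Ada L."}},
    {"key": "u2", "order_token": 1, "row": {"name": "Bob"}},
    {"key": "u2", "order_token": 3, "tombstone": True},
]
apply_changes(state, batch)
apply_changes(state, batch)  # replay is a no-op: the state has converged
# state -> u1 kept at token 2, u2 tombstoned at token 3
```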
Data Privacy and Trust Boundary
Row-level data only ever exists in two places: the machine running skippr, and your destination.
- Source data is read locally and written directly to the destination.
- AI modeling uses metadata by default. Data samples are optional and off by default.
- Skippr's cloud path handles authentication and control-plane services, not row-level source or warehouse data.
- Credentials live in environment variables, never in config files.
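A credential lookup in this model reads only from the environment and fails loudly when a variable is missing. A minimal sketch (the variable name SKIPPR_DEMO_TOKEN is invented for this example, not a documented setting):

```python
# Minimal sketch: credentials come from environment variables, never config files.
import os

def credential(name: str) -> str:
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(f"{name} is not set in the environment")
    return value

os.environ.setdefault("SKIPPR_DEMO_TOKEN", "example-token")  # for this demo only
token = credential("SKIPPR_DEMO_TOKEN")
```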
dbt Integration
Skippr generates a standard, fully functional dbt project:
- dbt_project.yml and profiles.yml -- auto-configured for your warehouse
- models/schema.yml -- source definitions pointing at bronze tables
- models/staging/stg_*.sql -- silver models with type casting and renaming
- packages.yml -- any required dbt packages
After the pipeline runs, the project is yours. Add tests, snapshots, custom gold models, or plug it into your existing dbt CI/CD -- it's standard dbt, nothing proprietary.
Supported connectors
Sources
| Category | Source | Kind identifier |
|---|---|---|
| Databases | Microsoft SQL Server | mssql |
| | MySQL | mysql |
| | PostgreSQL | postgres |
| | Amazon Redshift | redshift |
| | MongoDB | mongodb |
| | Amazon DynamoDB | dynamodb |
| | ClickHouse | clickhouse_source |
| | MotherDuck | motherduck_source |
| Object Stores | Amazon S3 | s3 |
| | SFTP | sftp |
| | Delta Lake | delta_lake |
| Streaming | Kafka | kafka |
| | Amazon SQS | sqs |
| | Amazon Kinesis | kinesis |
| | AMQP (RabbitMQ) | amqp |
| | Amazon SNS | sns |
| | Amazon EventBridge | eventbridge |
| | MQTT | mqtt |
| | WebSocket | websocket |
| HTTP | HTTP Client | http_client |
| | HTTP Server | http_server |
| Other | Socket (TCP/UDP/Unix) | socket |
| | StatsD | statsd |
| | Local File | file |
| | Stdin | stdin |
Destinations
| Category | Destination | Kind identifier |
|---|---|---|
| Warehouses | Snowflake | snowflake |
| | Google BigQuery | bigquery |
| | PostgreSQL | postgres |
| | AWS Athena (S3 + Glue) | athena |
| | Databricks | databricks |
| | Azure Synapse | synapse |
| | Amazon Redshift | redshift |
| | ClickHouse | clickhouse |
| | MotherDuck | motherduck |
| Cloud Storage | Google Cloud Storage | gcs |
| | Azure Blob Storage | azure_blob |
| | SFTP | sftp |
| Messaging | AMQP (RabbitMQ) | amqp |
| Other | Local File | file |
| | Stdout | stdout |
