# How It Works
When you run `skippr run`, the CLI moves through a short public pipeline model: discover the source, sync raw data, draft dbt assets, and validate the result.
## The pipeline

    Your Source
     │
     │ discover ── inspect metadata and shape the destination
     ▼
    Discover
     │
     │ sync ── move raw rows into bronze tables
     ▼
    Bronze Tables
     │
     │ model ── draft silver/gold dbt assets
     ▼
    Reviewable dbt Project
     │
     │ validate ── compile and run against the destination
     ▼
    Silver and Gold Models

Each phase runs automatically. You see real-time progress in the terminal UI (or structured logs in CI).
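The phase ordering above can be sketched as a simple orchestration loop. This is an illustrative model only, not skippr's actual internals; all function and field names here are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class PipelineState:
    """Accumulates artifacts as each phase completes (illustrative only)."""
    metadata: dict = field(default_factory=dict)
    bronze_tables: list = field(default_factory=list)
    dbt_models: list = field(default_factory=list)
    validated: bool = False

def discover(state):  # inspect source metadata
    state.metadata = {"orders": {"id": "integer", "total": "numeric"}}

def sync(state):      # move raw rows into bronze tables
    state.bronze_tables = [f"bronze_{t}" for t in state.metadata]

def model(state):     # draft silver/gold dbt assets
    state.dbt_models = [f"stg_{t}" for t in state.metadata]

def validate(state):  # compile and run against the destination
    state.validated = bool(state.dbt_models)

state = PipelineState()
for phase in (discover, sync, model, validate):
    phase(state)  # each phase runs automatically, in order
```

Each phase consumes what the previous one produced, which is why the steps always run in this order.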
## How this maps to the CLI phases
| Public step | CLI phases |
|---|---|
| Discover | Discover |
| Sync | Sync, Verify |
| Model | Plan, Author |
| Validate | Validate, Review |
## What happens at each step
- Discover -- reads source metadata such as table names, column names, and types. Destination mapping is determined here, using deterministic logic rather than model output.
- Sync -- extracts rows and files from the source and writes them into bronze tables in your destination.
- Model -- drafts a dbt project with source definitions, staging models, and business-facing models for review.
- Validate -- runs the generated dbt project against the destination to confirm that the output materialises cleanly.
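To illustrate what "deterministic logic rather than model output" means in the Discover step, destination mapping can be thought of as a fixed lookup table. The type names below are hypothetical, not skippr's actual mapping rules:

```python
# Hypothetical source-type -> warehouse-type lookup. The point is that
# the mapping is a deterministic table, not the output of an AI model,
# so the same source always produces the same destination shape.
TYPE_MAP = {
    "integer": "BIGINT",
    "text": "VARCHAR",
    "timestamp": "TIMESTAMP",
    "boolean": "BOOLEAN",
}

def map_column(source_type: str) -> str:
    # Fall back to VARCHAR for unrecognised types (a conservative choice)
    return TYPE_MAP.get(source_type, "VARCHAR")
```

Under this model, `map_column("integer")` always yields `BIGINT`, and unknown types degrade safely to `VARCHAR`.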
## Incremental by default
Re-running `skippr run` on an existing project doesn't start from scratch:
- Data sync -- offsets are tracked internally. Only new and changed rows are extracted and loaded.
- dbt models -- existing models are preserved. The agent updates or adds new models as the source evolves.
This means you can run the same pipeline on a schedule and it behaves like a proper incremental ETL -- no custom state management required.
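The offset tracking described above follows the familiar high-water-mark pattern. The sketch below shows the idea with an in-memory offset store; skippr tracks offsets internally and this is not its actual mechanism:

```python
def sync_table(table: str, rows: list[dict], offsets: dict) -> list[dict]:
    """Extract only rows past the stored high-water mark for this table."""
    last = offsets.get(table, 0)
    new_rows = [r for r in rows if r["id"] > last]
    if new_rows:
        # Advance the high-water mark so the next run skips these rows
        offsets[table] = max(r["id"] for r in new_rows)
    return new_rows

offsets: dict = {}
# First run: nothing tracked yet, so both rows are extracted
first = sync_table("orders", [{"id": 1}, {"id": 2}], offsets)
# Second run: only the row past the stored offset is extracted
second = sync_table("orders", [{"id": 1}, {"id": 2}, {"id": 3}], offsets)
```

Because the offset persists between runs, a scheduled re-run only pays for the new and changed rows.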
## Data privacy
Row-level data only ever exists in two places: the machine running skippr, and your warehouse.
- Source data is read locally and written directly to the warehouse API (Snowflake REST, BigQuery API, Postgres wire protocol, etc.). It is never sent to Skippr or any third party.
- AI modeling uses only metadata (table names, column names, types) by default. Data samples can optionally be sent to improve model quality but are off by default.
- The Skippr cloud path handles authentication and control-plane services. It receives metadata needed to operate the service, not row-level source or warehouse data.
- Credentials live in environment variables, never in config files.
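To make the metadata-versus-data boundary concrete, the information shared for AI modeling can be pictured like this. The field names are hypothetical, not skippr's actual wire format:

```python
table = {
    "name": "orders",
    "columns": [
        {"name": "id", "type": "integer"},
        {"name": "email", "type": "text"},
    ],
    # Row-level data: stays on the machine running skippr and in
    # your warehouse; never part of the default modeling payload.
    "rows": [{"id": 1, "email": "a@example.com"}],
}

def modeling_payload(table: dict, include_samples: bool = False) -> dict:
    """Share only schema metadata by default; samples are strictly opt-in."""
    payload = {"name": table["name"], "columns": table["columns"]}
    if include_samples:  # off by default
        payload["samples"] = table["rows"][:5]
    return payload
```

By default the payload contains only the table name and column definitions; rows appear only when sampling is explicitly enabled.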
## Output structure
The pipeline creates schemas in your warehouse using the project name:
| Tier | Schema name | Contents |
|---|---|---|
| Bronze | `<warehouse_schema>` (e.g. `RAW`) | Raw extracted data |
| Silver | `<project>_silver` (e.g. `my_project_silver`) | Staged, cleaned, typed |
| Gold | `<project>_gold` (e.g. `my_project_gold`) | Business-ready models |
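The naming convention in the table above amounts to a simple derivation from the project name, sketched here (the function is illustrative, not part of skippr's API):

```python
def schema_names(project: str, warehouse_schema: str = "RAW") -> dict:
    """Derive the three-tier schema names from the project name.

    The bronze schema comes from the warehouse configuration; silver
    and gold are suffixed forms of the project name.
    """
    return {
        "bronze": warehouse_schema,
        "silver": f"{project}_silver",
        "gold": f"{project}_gold",
    }
```

For a project named `my_project`, this yields `RAW`, `my_project_silver`, and `my_project_gold`, matching the table above.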
## What you can inspect
- Bronze, silver, and gold objects in your destination
- The generated dbt project in your working directory
- Connector guides covering auth, permissions, network requirements, and troubleshooting
