Config File
Skippr uses one project config file: skippr.yml. The same full engine schema is used by skippr and skipprd.
Example
skippr:
  workspace: mssql_migration
pipelines:
  mssql_to_snowflake:
    data_source: data_sources.mssql
    data_sink: data_sinks.snowflake
    cdc:
      business_key_columns: [id]
data_sources:
  mssql:
    Mssql:
      connection_string: ${MSSQL_CONNECTION_STRING}
      tables: ["dbo.customers", "dbo.orders"]
  postgres_cdc:
    Postgres:
      connection_string: ${POSTGRES_CONNECTION_STRING}
      tables: ["public.orders"]
      cdc_mode: snapshot_then_cdc
data_sinks:
  snowflake:
    Snowflake:
      account: ${SNOWFLAKE_ACCOUNT}
      user: ${SNOWFLAKE_USER}
      database: ANALYTICS
      schema: RAW
      warehouse: COMPUTE_WH
      role: ACCOUNTADMIN
      private_key_path: ${SNOWFLAKE_PRIVATE_KEY_PATH}
schema_sinks: {}
runtime_plugins: {}
Top-Level Sections
| Section | Purpose |
|---|---|
| project | Top-level project id used for hosted React scope (project_id) and related paths (including Lance for doc vectors). |
| skippr | Workspace and internal skipprd extract/load defaults |
| pipelines | Named pipelines: ELT (data_source / data_sink, …) and doc-vector ingest (vector_source pointing at vector_sources, optional chunk fields) |
| data_sources | Source plugin configuration keyed by name |
| data_sinks | Destination plugin configuration keyed by name |
| schema_sinks | Catalog/schema plugin configuration keyed by name |
| runtime_plugins | Optional explicit runtime plugin manifest paths |
| vector_sources | Named file trees for skippr vector ingest-docs (declarative root, include, optional exclude / extensions / chunk overrides). No warehouse required. |
Storage Settings
skippr.skipprd_el_storage_mode is an internal development/testing setting that controls where skipprd extract/load state is stored (local or s3). It does not control dbt project storage, React thread logs, or vector storage for skippr model; authenticated modeling runs use the storage credentials returned by the Skippr API.
The equivalent environment variable for direct skipprd runs is SKIPPRD_EL_STORAGE_MODE.
Pipelines
Each pipeline references registry entries by section-qualified name:
pipelines:
  ingest_orders:
    data_source: data_sources.postgres
    data_sink: data_sinks.iceberg
Use skippr discover --pipeline ingest_orders to persist metadata, then skippr sync --pipeline ingest_orders --once to load data. If metadata is missing, sync runs discovery automatically before loading. Run skippr model --pipeline ingest_orders after sync when you are ready to generate and validate dbt assets.
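To make the section-qualified references concrete, here is a minimal Python sketch of how a value such as data_sources.postgres could be looked up in the parsed config. The resolve_ref helper is hypothetical and is not part of Skippr; the config is written as a plain dict standing in for the parsed skippr.yml, and the error handling is an assumption rather than documented Skippr behavior.

```python
# Hypothetical helper, not part of Skippr. It shows how a section-qualified
# reference such as "data_sources.postgres" maps onto the parsed config.


def resolve_ref(config: dict, ref: str) -> dict:
    """Return the registry entry named by a 'section.name' reference."""
    section, sep, name = ref.partition(".")
    if not sep or not name:
        raise ValueError(f"expected 'section.name', got {ref!r}")
    try:
        return config[section][name]
    except KeyError:
        raise KeyError(f"{ref!r} does not match any entry in the config") from None


# Plain-dict stand-in for the pipeline example above plus matching registry entries.
config = {
    "pipelines": {
        "ingest_orders": {
            "data_source": "data_sources.postgres",
            "data_sink": "data_sinks.iceberg",
        }
    },
    "data_sources": {"postgres": {"Postgres": {"tables": ["public.orders"]}}},
    "data_sinks": {"iceberg": {"Iceberg": {"table_namespace": "analytics"}}},
}

pipeline = config["pipelines"]["ingest_orders"]
source = resolve_ref(config, pipeline["data_source"])
sink = resolve_ref(config, pipeline["data_sink"])
```

A check like this can catch a pipeline that points at a missing data_sources or data_sinks entry before any CLI command is run.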
Documentation vectors (vector_sources)
For marketing docs, internal knowledge bases, or other text-first corpora, define vector_sources plus a pipelines entry (conventionally named vector_ingest) with vector_source set to the key you want from vector_sources. Then run skippr vector ingest-docs (optional --pipeline if you use a different name). That path uses the same Skippr authentication and tenant S3 layout as modeling, but does not require data_sinks or a warehouse. See the vector CLI reference for flags and examples.
pipelines:
  vector_ingest:
    vector_source: web_docs
    chunk_chars: 1200
    chunk_overlap: 120
vector_sources:
  web_docs:
    root: docs
    include: ["**/*.md"]
Plugin Entries
Plugin sections use the plugin name as the single key under each named entry:
data_sources:
  postgres:
    Postgres:
      connection_string: ${POSTGRES_CONNECTION_STRING}
      tables: ["public.orders"]
Destination entries follow the same shape:
data_sinks:
  iceberg:
    Iceberg:
      table_namespace: analytics
      table_location_prefix: s3://my-bucket/warehouse
      catalog:
        type: glue
        warehouse: s3://my-bucket/warehouse
        database: analytics
        region: us-east-1
CDC-capable source plugins use cdc_mode to choose how source reads begin:
| Value | Behavior |
|---|---|
| snapshot | Bounded snapshot only. |
| snapshot_then_cdc | Full initial snapshot, then native CDC stream. |
| cdc_only | Native CDC stream only, with no initial snapshot. |
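The entry shape and the cdc_mode values described in this section can be checked with a few lines of Python. This is a sketch under stated assumptions: unpack_plugin_entry is a hypothetical helper, not a Skippr API, and treating a missing cdc_mode as valid is inferred only from the examples on this page, which omit it for some sources.

```python
# Hypothetical validation helper, not part of Skippr.

# Allowed values, copied from the cdc_mode table above.
CDC_MODES = {"snapshot", "snapshot_then_cdc", "cdc_only"}


def unpack_plugin_entry(name: str, entry: dict) -> tuple:
    """Split a named entry into (plugin_name, settings) and check cdc_mode."""
    if len(entry) != 1:
        raise ValueError(
            f"{name}: expected exactly one plugin key, found {sorted(entry)}"
        )
    # The single key is the plugin name; its value holds the plugin settings.
    (plugin, settings), = entry.items()
    mode = settings.get("cdc_mode")
    # Assumption: cdc_mode is optional, so only a present value is checked.
    if mode is not None and mode not in CDC_MODES:
        raise ValueError(f"{name}: unknown cdc_mode {mode!r}")
    return plugin, settings


plugin, settings = unpack_plugin_entry(
    "postgres_cdc",
    {"Postgres": {"tables": ["public.orders"], "cdc_mode": "snapshot_then_cdc"}},
)
```

Here plugin is the plugin name key (Postgres) and settings is the mapping beneath it.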
Modeling Settings
skippr model --pipeline <name> derives modeling provider settings from the selected pipeline's data_sink entry and built-in defaults for catalog, dbt, vector storage, and LLM settings. Do not add react:, top-level providers:, top-level dbt:, or skippr.tenant to CLI-owned skippr.yml files. Tenant identity comes from authenticated Skippr credentials. By default, model resumes the latest modeling thread for the pipeline; use skippr model --pipeline <name> --no-resume to start a fresh thread.
Environment Variables
Use ${ENV_VAR} syntax for secrets and deployment-specific values:
data_sources:
  mssql:
    Mssql:
      connection_string: ${MSSQL_CONNECTION_STRING}
Keep secure values in the environment or your secret manager, not in skippr.yml.
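As an illustration of ${ENV_VAR} substitution, the following Python sketch expands placeholders in a single config value. It is not Skippr's implementation: expand_env is a hypothetical helper, and both the accepted variable-name characters and the choice to fail on an unset variable are assumptions, since this page does not specify how Skippr handles missing variables, defaults, or escaping.

```python
import re

# Matches ${NAME} placeholders. The allowed name characters are an assumption.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: str, env: dict) -> str:
    """Replace each ${NAME} in value with env[NAME]; raise if NAME is unset."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(substitute, value)


# Sample environment with a made-up value; pass os.environ in real use.
sample_env = {"MSSQL_CONNECTION_STRING": "Server=db;Database=sales"}
expanded = expand_env("${MSSQL_CONNECTION_STRING}", sample_env)
```

Failing loudly on an unset variable is a deliberate choice in this sketch, so that a missing secret surfaces before a connection attempt rather than as an empty string.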
