Config File

Skippr uses a single project config file, `skippr.yml`. Both `skippr` and `skipprd` read it using the same full engine schema.

Example

```yaml
skippr:
  workspace: mssql_migration

pipelines:
  mssql_to_snowflake:
    data_source: data_sources.mssql
    data_sink: data_sinks.snowflake
    cdc:
      business_key_columns: [id]

data_sources:
  mssql:
    Mssql:
      connection_string: ${MSSQL_CONNECTION_STRING}
      tables: ["dbo.customers", "dbo.orders"]
  postgres_cdc:
    Postgres:
      connection_string: ${POSTGRES_CONNECTION_STRING}
      tables: ["public.orders"]
      cdc_mode: snapshot_then_cdc

data_sinks:
  snowflake:
    Snowflake:
      account: ${SNOWFLAKE_ACCOUNT}
      user: ${SNOWFLAKE_USER}
      database: ANALYTICS
      schema: RAW
      warehouse: COMPUTE_WH
      role: ACCOUNTADMIN
      private_key_path: ${SNOWFLAKE_PRIVATE_KEY_PATH}

schema_sinks: {}
runtime_plugins: {}
```

Top-Level Sections

| Section | Purpose |
| --- | --- |
| `project` | Top-level project ID used for the hosted React scope (`project_id`) and related paths (including Lance for doc vectors). |
| `skippr` | Workspace and internal `skipprd` extract/load defaults. |
| `pipelines` | Named pipelines: ELT (`data_source` / `data_sink`, …) and doc-vector ingest (`vector_source` pointing at `vector_sources`, optional chunk fields). |
| `data_sources` | Source plugin configuration keyed by name. |
| `data_sinks` | Destination plugin configuration keyed by name. |
| `schema_sinks` | Catalog/schema plugin configuration keyed by name. |
| `runtime_plugins` | Optional explicit runtime plugin manifest paths. |
| `vector_sources` | Named file trees for `skippr vector ingest-docs` (declarative `root`, `include`, optional `exclude` / `extensions` / chunk overrides). No warehouse required. |

Storage Settings

`skippr.skipprd_el_storage_mode` is an internal development and testing setting. It controls where `skipprd` stores extract/load state (`local` or `s3`). It does not control dbt project storage, React thread logs, or vector storage for `skippr model`. Authenticated modeling runs use the storage credentials returned by the Skippr API.

For direct `skipprd` runs, the equivalent environment variable is `SKIPPRD_EL_STORAGE_MODE`.

Pipelines

Each pipeline references registry entries by section-qualified name:

```yaml
pipelines:
  ingest_orders:
    data_source: data_sources.postgres
    data_sink: data_sinks.iceberg
```

Use `skippr discover --pipeline ingest_orders` to persist metadata, then `skippr sync --pipeline ingest_orders --once` to load data. If metadata is missing, `sync` runs discovery automatically before loading. When you are ready to generate and validate dbt assets, run `skippr model --pipeline ingest_orders` after the sync.
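The following minimal Python sketch makes the section-qualified lookup concrete. It assumes `skippr.yml` has already been parsed into plain dictionaries, for example with a YAML loader. The helper names `resolve_reference` and `resolve_pipeline` are hypothetical and not part of Skippr, and the CLI's own resolution logic and error messages may differ.

```python
# Hypothetical helpers, not part of Skippr: they show how a
# section-qualified name such as "data_sources.postgres" maps onto
# skippr.yml once it has been parsed into plain dicts.


def resolve_reference(config: dict, ref: str) -> dict:
    """Return the registry entry named by '<section>.<name>'."""
    section, sep, name = ref.partition(".")
    if not sep or not section or not name:
        raise ValueError(f"expected '<section>.<name>', got {ref!r}")
    entries = config.get(section) or {}
    if name not in entries:
        raise KeyError(f"{ref!r} does not match an entry in skippr.yml")
    return entries[name]


def resolve_pipeline(config: dict, pipeline: str) -> dict:
    """Resolve the data_source and data_sink references of one ELT pipeline."""
    spec = config["pipelines"][pipeline]
    return {
        field: resolve_reference(config, spec[field])
        for field in ("data_source", "data_sink")
        if field in spec
    }


# Parsed form of the ingest_orders example (plugin settings trimmed).
config = {
    "pipelines": {
        "ingest_orders": {
            "data_source": "data_sources.postgres",
            "data_sink": "data_sinks.iceberg",
        }
    },
    "data_sources": {"postgres": {"Postgres": {"tables": ["public.orders"]}}},
    "data_sinks": {"iceberg": {"Iceberg": {"table_namespace": "analytics"}}},
}

resolved = resolve_pipeline(config, "ingest_orders")
print(sorted(resolved))  # ['data_sink', 'data_source']
```

Failing fast on a dangling reference, as this sketch does, surfaces a typo like `data_sources.postgress` before any discovery or sync work starts.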

Documentation vectors (`vector_sources`)

For marketing docs, internal knowledge bases, or other text-first corpora, define `vector_sources` plus a `pipelines` entry (conventionally named `vector_ingest`) whose `vector_source` is set to the key you want from `vector_sources`. Then run `skippr vector ingest-docs`, adding `--pipeline` if you use a different pipeline name. This path uses the same Skippr authentication and tenant S3 layout as modeling, but it does not require `data_sinks` or a warehouse. See the vector CLI reference for flags and examples.

```yaml
pipelines:
  vector_ingest:
    vector_source: web_docs
    chunk_chars: 1200
    chunk_overlap: 120

vector_sources:
  web_docs:
    root: docs
    include: ["**/*.md"]
```

Plugin Entries

Plugin sections use the plugin name as the single key under each named entry:

```yaml
data_sources:
  postgres:
    Postgres:
      connection_string: ${POSTGRES_CONNECTION_STRING}
      tables: ["public.orders"]
```

Destination entries follow the same shape:

```yaml
data_sinks:
  iceberg:
    Iceberg:
      table_namespace: analytics
      table_location_prefix: s3://my-bucket/warehouse
      catalog:
        type: glue
        warehouse: s3://my-bucket/warehouse
        database: analytics
        region: us-east-1
```
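The single-key shape means a reader of the config can split each named entry into a plugin name and its settings. Here is a small Python sketch of that split over an already-parsed config. `split_plugin_entry` is a hypothetical helper, not a Skippr API, and Skippr's own validation messages may differ.

```python
# Hypothetical helper, not a Skippr API: each named entry under
# data_sources / data_sinks holds exactly one key, the plugin name
# (e.g. "Postgres", "Iceberg"), whose value is that plugin's settings.


def split_plugin_entry(entry_name: str, entry: dict) -> tuple[str, dict]:
    """Return (plugin_name, settings) for one named plugin entry."""
    if not isinstance(entry, dict) or len(entry) != 1:
        raise ValueError(
            f"entry {entry_name!r} must contain exactly one plugin key"
        )
    plugin, settings = next(iter(entry.items()))
    return plugin, settings or {}


# Parsed form of the iceberg destination example (catalog settings trimmed).
data_sinks = {
    "iceberg": {
        "Iceberg": {
            "table_namespace": "analytics",
            "table_location_prefix": "s3://my-bucket/warehouse",
            "catalog": {"type": "glue", "region": "us-east-1"},
        }
    }
}

plugin, settings = split_plugin_entry("iceberg", data_sinks["iceberg"])
print(plugin, settings["catalog"]["type"])  # Iceberg glue
```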

CDC-capable source plugins use `cdc_mode` to choose how source reads begin:

| Value | Behavior |
| --- | --- |
| `snapshot` | Bounded snapshot only. |
| `snapshot_then_cdc` | Full initial snapshot, then native CDC stream. |
| `cdc_only` | Native CDC stream only, with no initial snapshot. |
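The table reduces to two independent questions per mode. Does a snapshot run first, and does a native CDC stream follow? This Python sketch encodes that reading. `CdcMode` and `read_phases` are hypothetical names; only the three mode strings come from the table. The sketch does not model what happens when `cdc_mode` is omitted.

```python
# Sketch of the cdc_mode table as code. The enum and function names
# are hypothetical; only the three mode strings come from Skippr.
from enum import Enum


class CdcMode(Enum):
    SNAPSHOT = "snapshot"                    # bounded snapshot only
    SNAPSHOT_THEN_CDC = "snapshot_then_cdc"  # full snapshot, then native CDC
    CDC_ONLY = "cdc_only"                    # native CDC, no initial snapshot


def read_phases(mode: str) -> list[str]:
    """List the read phases a source runs for a given cdc_mode value."""
    m = CdcMode(mode)  # raises ValueError for unknown values
    phases = []
    if m in (CdcMode.SNAPSHOT, CdcMode.SNAPSHOT_THEN_CDC):
        phases.append("snapshot")
    if m in (CdcMode.SNAPSHOT_THEN_CDC, CdcMode.CDC_ONLY):
        phases.append("cdc_stream")
    return phases


for mode in ("snapshot", "snapshot_then_cdc", "cdc_only"):
    print(mode, read_phases(mode))
# snapshot ['snapshot']
# snapshot_then_cdc ['snapshot', 'cdc_stream']
# cdc_only ['cdc_stream']
```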

Modeling Settings

`skippr model --pipeline <name>` derives modeling provider settings from the selected pipeline's `data_sink` entry, plus built-in defaults for catalog, dbt, vector storage, and LLM settings. Do not add `react:`, top-level `providers:`, top-level `dbt:`, or `skippr.tenant` to CLI-owned `skippr.yml` files. Tenant identity comes from authenticated Skippr credentials.

By default, `skippr model` resumes the latest modeling thread for the pipeline. To start a fresh thread, run `skippr model --pipeline <name> --no-resume`.
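A pre-commit style check can catch keys that should not appear in a CLI-owned file. The following Python lint is hypothetical and is not a Skippr command. It works on an already-parsed config and assumes `react:` is checked at the top level, like `providers:` and `dbt:`.

```python
# Hypothetical lint, not a Skippr command: it flags keys that must not
# appear in CLI-owned skippr.yml files.
# Assumption: `react:` is checked at the top level, like providers/dbt.

FORBIDDEN_TOP_LEVEL = ("react", "providers", "dbt")


def cli_owned_violations(config: dict) -> list[str]:
    """Return a message for each disallowed key found in a parsed skippr.yml."""
    problems = [
        f"top-level '{key}:' is not allowed"
        for key in FORBIDDEN_TOP_LEVEL
        if key in config
    ]
    skippr = config.get("skippr")
    if isinstance(skippr, dict) and "tenant" in skippr:
        problems.append(
            "'skippr.tenant' is not allowed; tenant identity comes from Skippr credentials"
        )
    return problems


config = {
    "skippr": {"workspace": "mssql_migration", "tenant": "acme"},
    "dbt": {},
}
for message in cli_owned_violations(config):
    print(message)
```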

Environment Variables

Use `${ENV_VAR}` syntax for secrets and deployment-specific values:

```yaml
data_sources:
  mssql:
    Mssql:
      connection_string: ${MSSQL_CONNECTION_STRING}
```

Keep secret values in the environment or your secret manager, not in `skippr.yml`.
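Here is a minimal Python sketch of `${ENV_VAR}` expansion over an already-parsed config. The function name `expand_env` is hypothetical. The sketch assumes an unset variable should be an error; how Skippr actually handles unset variables is not specified here.

```python
# Hypothetical sketch of ${ENV_VAR} expansion for string values in a
# parsed config. Assumption: an unset variable raises an error; Skippr's
# actual handling of missing variables may differ.
import os
import re

_ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value, environ=None):
    """Recursively replace ${NAME} in strings nested in dicts and lists."""
    env = os.environ if environ is None else environ
    if isinstance(value, str):
        def lookup(match):
            name = match.group(1)
            if name not in env:
                raise KeyError(f"environment variable {name} is not set")
            return env[name]
        return _ENV_REF.sub(lookup, value)
    if isinstance(value, dict):
        return {k: expand_env(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [expand_env(v, env) for v in value]
    return value  # numbers, booleans, None pass through unchanged


raw = {"data_sources": {"mssql": {"Mssql": {"connection_string": "${MSSQL_CONNECTION_STRING}"}}}}
resolved = expand_env(raw, {"MSSQL_CONNECTION_STRING": "Server=db.example.com;Database=app"})
print(resolved["data_sources"]["mssql"]["Mssql"]["connection_string"])
```

This sketch expands values after parsing rather than on the raw file text. As a result, a secret containing YAML-significant characters such as `:` or `#` cannot change the structure of the config.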
