skippr vector
The skippr vector command group handles vector store workflows that do not go through skippr model. Today the only subcommand is ingest-docs, which walks the declarative file sets defined in skippr.yml, chunks the text, calls the hosted embed API, and upserts the resulting vectors into tenant Lance storage on S3 (the same keyspace layout as the data-engineer suite).
Public read copies of those vectors (for example marketing-site knowledge) are a separate sync step from your tenant prefix to the public vectors bucket; that is typically done in CI with a dedicated publish role, not by this CLI command alone.
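The chunking step can be pictured as a sliding character window. The sketch below is only an illustration of that idea, assuming fixed-size windows measured in characters with a fixed overlap; the function name chunk_text and its exact boundary behavior are hypothetical and are not taken from the skippr implementation.

```python
def chunk_text(text: str, chunk_chars: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of at most chunk_chars characters.

    Each window starts chunk_chars - chunk_overlap characters after the
    previous one, so neighbouring chunks share chunk_overlap characters.
    Hypothetical helper: the real CLI may split on different boundaries.
    """
    if chunk_chars <= 0:
        raise ValueError("chunk_chars must be positive")
    if not 0 <= chunk_overlap < chunk_chars:
        raise ValueError("chunk_overlap must satisfy 0 <= overlap < chunk_chars")

    step = chunk_chars - chunk_overlap
    chunks: list[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_chars])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_chars >= len(text):
            break
        start += step
    return chunks
```

A larger overlap keeps more shared context between neighbouring chunks at the cost of more chunks to embed, which is the trade-off the --chunk-chars and --chunk-overlap settings control.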
Subcommands
| Subcommand | Purpose |
|---|---|
| skippr vector ingest-docs | Chunk, embed, and upsert documentation (or similar text files) into Lance under your tenant bucket. |
Usage
skippr [--config <path>] [--log [level]] vector ingest-docs \
[--pipeline <name>] \
[--vector-source <key>] \
[--src-path <dir>] \
[--chunk-chars <n>] \
[--chunk-overlap <n>] \
[--include-glob <pattern>]... \
[--exclude-glob <pattern>]... \
[--dry-run] \
[--output text|json]
Configuration (skippr.yml)
Discovery rules live entirely in YAML — the binary does not hard-code paths or extensions.
- vector_sources: map of named sources. Each entry includes at least root (directory relative to the config file) and include (non-empty list of globs relative to root). Optional: exclude, extensions, per-source chunk_chars / chunk_overlap.
- pipelines.<name>: for ingest-docs, use a mapping with required vector_source (a key under vector_sources) and optional chunk_chars / chunk_overlap. The default pipeline name is vector_ingest (override with --pipeline).
- project (top-level): becomes the React project_id in the Lance URI.
- skippr.workspace: workspace segment in the Lance path (same convention as the engine config).
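To make the root / include / exclude / extensions rules concrete, here is a rough sketch of how such a file set could be resolved. It assumes Python's pathlib glob dialect and extensions written with a leading dot (for example .md); both are assumptions for illustration, and the helper name discover_files is hypothetical. The actual matching rules of the CLI may differ.

```python
from pathlib import Path


def discover_files(root, include, exclude=(), extensions=None):
    """Return sorted files under root that match any include glob,
    match no exclude glob, and (optionally) have an allowed suffix.

    Hypothetical helper; globs are interpreted relative to root.
    """
    root = Path(root)
    # Collect every path hit by an exclude pattern first.
    excluded = {path for pattern in exclude for path in root.glob(pattern)}

    found = set()
    for pattern in include:
        for path in root.glob(pattern):
            if not path.is_file() or path in excluded:
                continue
            if extensions and path.suffix not in extensions:
                continue
            found.add(path)
    return sorted(found)
```

Under the same assumptions, globs passed with --include-glob or --exclude-glob would simply be appended to the YAML lists before calling the helper, for example discover_files(root, yaml_include + cli_include, yaml_exclude + cli_exclude).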
There is no data_sink or warehouse requirement on this path; you still need Skippr authentication and a positive balance for embeddings.
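As a rough illustration of the configuration shape described above, the sketch below mirrors a skippr.yml as a plain Python dict and checks the documented requirements: each source has root and a non-empty include, and the selected pipeline names an existing source. All concrete values (acme-docs, docs, default, 1200) are made up, the nesting of workspace under skippr is inferred from the dotted name, and validate_config is a hypothetical helper, not part of the CLI.

```python
# Dict mirror of a hypothetical skippr.yml; every value here is illustrative.
EXAMPLE_CONFIG = {
    "project": "acme-docs",
    "skippr": {"workspace": "default"},  # nesting assumed from "skippr.workspace"
    "vector_sources": {
        "docs": {
            "root": "docs",
            "include": ["**/*.md"],
            "exclude": ["drafts/*.md"],
        },
    },
    "pipelines": {
        "vector_ingest": {"vector_source": "docs", "chunk_chars": 1200},
    },
}


def validate_config(config: dict, pipeline: str = "vector_ingest") -> list[str]:
    """Return a list of problems; an empty list means the documented
    requirements for ingest-docs are met."""
    errors: list[str] = []
    sources = config.get("vector_sources") or {}

    for key, source in sources.items():
        if not source.get("root"):
            errors.append(f"vector_sources.{key}: root is required")
        if not source.get("include"):
            errors.append(f"vector_sources.{key}: include must be a non-empty list")

    block = (config.get("pipelines") or {}).get(pipeline)
    if block is None:
        errors.append(f"pipelines.{pipeline}: not defined")
    elif block.get("vector_source") not in sources:
        errors.append(
            f"pipelines.{pipeline}.vector_source: must match a vector_sources key"
        )
    return errors
```

Running the real command with --dry-run remains the authoritative check; this sketch only restates the rules listed above.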
Flags
| Flag | Description |
|---|---|
| --pipeline <name> | Which pipelines.<name> block supplies vector_source (default: vector_ingest). |
| --vector-source <key> | Override pipelines.<name>.vector_source for this run (must match a vector_sources key). |
| --src-path <dir> | Override the scan root for this run (default: root from the selected source). |
| --chunk-chars <n> | Override chunk size (characters). Precedence: CLI → pipeline YAML → vector_sources entry → built-in default. |
| --chunk-overlap <n> | Override overlap between chunks. |
| --include-glob <pattern> | Extra include glob (repeatable); merged after YAML includes. |
| --exclude-glob <pattern> | Extra exclude glob (repeatable); merged after YAML excludes. |
| --dry-run | Resolve files and count chunks only; no embed or Lance writes. |
| --output json | Structured summary (dry run or post-ingest metadata including resolved Lance prefix). |
| --output text | Human-readable progress. |
Global flags: --config, --log (same as other commands).
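The precedence stated for --chunk-chars (CLI, then pipeline YAML, then the vector_sources entry, then the built-in default) amounts to a first-set-value-wins lookup. The sketch below shows that rule in isolation; resolve_setting is a hypothetical helper, and the default of 1000 is a placeholder because the actual built-in default is not documented here.

```python
from typing import Optional

# Placeholder only: the real built-in default is not stated in this page.
ASSUMED_DEFAULT_CHUNK_CHARS = 1000


def resolve_setting(
    cli: Optional[int],
    pipeline: Optional[int],
    source: Optional[int],
    default: int,
) -> int:
    """Return the first value that is set, checking the CLI flag, then the
    pipeline block, then the vector_sources entry, then the default."""
    for candidate in (cli, pipeline, source):
        if candidate is not None:
            return candidate
    return default


# Example: no CLI flag, pipeline sets 1200, source sets 800 -> pipeline wins.
chunk_chars = resolve_setting(None, 1200, 800, ASSUMED_DEFAULT_CHUNK_CHARS)
```

The table spells out this order only for chunk size; applying the same order to chunk overlap is an assumption.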
Authentication
Same as skippr model: skippr user login, SKIPPR_API_KEY in CI, and /auth/credentials for tenant S3 + LLM. Optional API fields knowledge_credentials and public_vectors_bucket apply to reading published public vectors in apps, not to ingest-docs writes (ingest uses the primary tenant credentials).
GitHub Actions
This repository ships .github/workflows/docs-vector-ingest.yml, which:
- Checks out the repo.
- Installs the CLI with the same public one-liner as Install: curl -fsSL https://install.skippr.io/install.sh | sh.
- Runs bash scripts/vector-ingest-docs.sh (which invokes skippr vector ingest-docs against ./skippr.yml).
Add a repository secret SKIPPR_API_KEY for an account that has already accepted the current EULA (run skippr user login interactively once if needed). The workflow runs on workflow_dispatch and on pushes to main / master that touch docs/, skippr.yml, or the ingest script.
See also
- Config file: vector_sources and doc-vector settings under pipelines.
- skippr model: warehouse-backed modeling (separate from doc vector ingest).
