Data Source Plugin Name
Configuration - DATA_SOURCE_PLUGIN_NAME¶
Description¶
Determines the data source plugin Skippr uses for data ingestion.
Default Value¶
No default value. This configuration must be specified explicitly.
Example Values¶
"stdin"
: Skippr ingests data from the standard input stream.

"s3"
: Skippr ingests data from an S3 bucket.

"s3_inventory"
: Skippr ingests data from an S3 inventory.

Any unsupported value will result in a console message: "Plugin {plugin_name} not supported".
Detailed Description¶
The DATA_SOURCE_PLUGIN_NAME configuration tells Skippr which source to ingest data from. Depending on its value, Skippr loads the corresponding data source plugin for ingestion.
If the configuration is set to "stdin", data is read from the standard input stream. This allows you to pipe data from other processes directly into Skippr for processing.
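For example, piping data into Skippr might look like the following. This is an illustrative sketch: the `skippr` binary name, the `events.json` file, and configuring via an environment variable are assumptions, not confirmed usage.

```shell
# Select the stdin data source plugin (env-var style configuration assumed).
export DATA_SOURCE_PLUGIN_NAME="stdin"

# Pipe another process's output directly into Skippr (hypothetical invocation).
cat events.json | skippr
```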
If the value is set to "s3", Skippr connects to an Amazon S3 bucket as the data source and reads and ingests the files stored in the specified bucket.
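A minimal S3 setup might look like the sketch below. The credential variables are the standard AWS SDK environment variables, but the bucket setting (`DATA_SOURCE_S3_BUCKET`) is a hypothetical placeholder; consult the S3 plugin's own configuration reference for the exact names it expects.

```shell
# Select the S3 data source plugin.
export DATA_SOURCE_PLUGIN_NAME="s3"

# Standard AWS SDK credential environment variables.
export AWS_ACCESS_KEY_ID="<your-access-key>"
export AWS_SECRET_ACCESS_KEY="<your-secret-key>"
export AWS_REGION="us-east-1"

# Hypothetical bucket setting; the real variable name may differ.
export DATA_SOURCE_S3_BUCKET="my-ingest-bucket"
```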
For "s3_inventory", Skippr uses an S3 inventory as the data source. An S3 inventory provides a scheduled alternative to listing all objects in an S3 bucket; Skippr can parse this inventory to determine what data is available for ingestion.
If an unsupported value is supplied, Skippr prints a console message, "Plugin {plugin_name} not supported", indicating that the plugin is not available.
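The dispatch behavior can be sketched as a simple case analysis. This is not Skippr's actual source, only an illustration of the selection logic implied by the documented error message; the `select_plugin` function name is invented for the example.

```shell
# Illustrative sketch of plugin selection (not Skippr's real implementation).
select_plugin() {
  case "$1" in
    stdin|s3|s3_inventory)
      # A supported value selects the matching data source plugin.
      echo "Using data source plugin: $1"
      ;;
    *)
      # Any other value produces the documented console message.
      echo "Plugin $1 not supported"
      ;;
  esac
}

select_plugin "stdin"   # prints: Using data source plugin: stdin
select_plugin "kafka"   # prints: Plugin kafka not supported
```

Note that, per the considerations below, an unsupported value does not halt execution; it simply means no data gets ingested.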
Considerations¶
- The correct plugin must be installed and configured for Skippr to ingest data successfully. For example, the "s3" and "s3_inventory" plugins require appropriate AWS credentials and bucket names.
- If ingesting data from "stdin", ensure that the data is properly formatted and compatible with the data parsers used in your Skippr pipeline.
- Using "s3_inventory" as a source can be beneficial for large buckets where listing all objects is time- and resource-intensive.
- Unsupported values will not halt program execution but will result in no data being ingested, which can affect subsequent steps in the data processing pipeline.
- Be aware of any possible rate limits or access constraints on your data source to prevent potential issues during data ingestion.