Ingesting data into a vector index

After creating a vector index, you can either ingest pre-generated (raw) vector data directly or convert source data to embeddings during ingestion.

Comparison of ingestion methods

The following table compares the two ingestion methods.

| Ingestion method | Data format | Ingest pipeline | Vector generation | Additional fields |
| --- | --- | --- | --- | --- |
| Raw vector ingestion | Pre-generated vectors | Not required | External | Optional metadata |
| Converting data to embeddings during ingestion | Text or image data | Required | Internal (during ingestion) | Original data + embeddings |

Raw vector ingestion

When working with raw vectors or embeddings generated outside of OpenSearch, you directly ingest vector data into the knn_vector field. No pipeline is required because the vectors are already generated:

PUT /my-raw-vector-index/_doc/1
{
  "my_vector": [0.1, 0.2, 0.3],
  "metadata": "Optional additional information"
}
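
This example assumes that my-raw-vector-index already maps my_vector as a knn_vector field whose dimension matches the ingested vectors (3 in this case). A minimal mapping sketch is shown next; the exact settings, such as the engine and space type, depend on how you created the index:

PUT /my-raw-vector-index
{
  "settings": {
    "index.knn": true
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 3
      }
    }
  }
}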

You can also use the Bulk API to ingest multiple vectors efficiently:

PUT /_bulk
{"index": {"_index": "my-raw-vector-index", "_id": 1}}
{"my_vector": [0.1, 0.2, 0.3], "metadata": "First item"}
{"index": {"_index": "my-raw-vector-index", "_id": 2}}
{"my_vector": [0.2, 0.3, 0.4], "metadata": "Second item"}

Converting data to embeddings during ingestion

After you have configured an ingest pipeline that automatically generates embeddings, you can ingest text data directly into your index:

PUT /my-ai-search-index/_doc/1
{
  "input_text": "Example: AI search description"
}

The pipeline automatically generates and stores the embeddings in the output_embedding field.
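
Such a pipeline is typically defined with a text_embedding processor and attached to the index as its default pipeline so that it runs on every indexing request. The following is a minimal sketch; the pipeline name is illustrative, and the model_id must refer to a model you have already deployed:

PUT /_ingest/pipeline/my-embedding-pipeline
{
  "description": "Generates embeddings from input_text",
  "processors": [
    {
      "text_embedding": {
        "model_id": "<your deployed model ID>",
        "field_map": {
          "input_text": "output_embedding"
        }
      }
    }
  ]
}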

You can also use the Bulk API to ingest multiple documents efficiently:

PUT /_bulk
{"index": {"_index": "my-ai-search-index", "_id": 1}}
{"input_text": "Example AI search description"}
{"index": {"_index": "my-ai-search-index", "_id": 2}}
{"input_text": "Bulk API operation description"}

Working with sparse vectors

OpenSearch also supports sparse vectors. For more information, see Neural sparse search.

Text chunking

For information about splitting large documents into smaller passages before generating embeddings during dense or sparse AI search, see Text chunking.
