Ingesting data into a vector index

After creating a vector index, you can either ingest pre-generated (raw) vector data directly or convert source data to embeddings during ingestion.

Comparison of ingestion methods

The following table compares the two ingestion methods.

| Ingestion method | Data format | Ingest pipeline | Vector generation | Additional fields |
| --- | --- | --- | --- | --- |
| Raw vector ingestion | Pre-generated vectors | Not required | External | Optional metadata |
| Converting data to embeddings during ingestion | Text or image data | Required | Internal (during ingestion) | Original data + embeddings |

Raw vector ingestion

When working with raw vectors or embeddings generated outside of OpenSearch, you directly ingest vector data into the knn_vector field. No pipeline is required because the vectors are already generated:

PUT /my-raw-vector-index/_doc/1
{
  "my_vector": [0.1, 0.2, 0.3],
  "metadata": "Optional additional information"
}
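
This example assumes that my-raw-vector-index already maps my_vector as a knn_vector field whose dimension matches the ingested vectors (3 in this case). A minimal mapping sketch is shown next; the exact settings, such as the engine and space type, depend on how you created the index:

PUT /my-raw-vector-index
{
  "settings": {
    "index.knn": true
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 3
      }
    }
  }
}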

You can also use the Bulk API to ingest multiple vectors efficiently:

PUT /_bulk
{"index": {"_index": "my-raw-vector-index", "_id": 1}}
{"my_vector": [0.1, 0.2, 0.3], "metadata": "First item"}
{"index": {"_index": "my-raw-vector-index", "_id": 2}}
{"my_vector": [0.2, 0.3, 0.4], "metadata": "Second item"}

Converting data to embeddings during ingestion

After you have configured an ingest pipeline that automatically generates embeddings, you can ingest text data directly into your index:

PUT /my-ai-search-index/_doc/1
{
  "input_text": "Example: AI search description"
}

The pipeline automatically generates and stores the embeddings in the output_embedding field.
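
Such a pipeline is typically defined with a text_embedding processor and attached to the index as its default pipeline so that it runs on every indexing request. The following is a minimal sketch; the pipeline name is illustrative, and the model_id must refer to a model you have already deployed:

PUT /_ingest/pipeline/my-embedding-pipeline
{
  "description": "Generates embeddings from input_text",
  "processors": [
    {
      "text_embedding": {
        "model_id": "<your deployed model ID>",
        "field_map": {
          "input_text": "output_embedding"
        }
      }
    }
  ]
}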

You can also use the Bulk API to ingest multiple documents efficiently:

PUT /_bulk
{"index": {"_index": "my-ai-search-index", "_id": 1}}
{"input_text": "Example AI search description"}
{"index": {"_index": "my-ai-search-index", "_id": 2}}
{"input_text": "Bulk API operation description"}

Working with sparse vectors

OpenSearch also supports sparse vectors. For more information, see Neural sparse search.

Text chunking

For information about splitting large documents into smaller passages before generating embeddings during dense or sparse AI search, see Text chunking.
