PG Vectorize API Overview

pg_vectorize provides tools for two closely related tasks: vector search and retrieval-augmented generation (RAG), with APIs dedicated to each. Vector search is an important component of RAG, and the RAG APIs depend on the vector search APIs, so it can be helpful to think of the vector search APIs as the lower-level of the two. Relative to Postgres's own APIs, however, both sets of vectorize APIs are very high level.

Importing Pre-existing Embeddings

If you have already computed embeddings for your data using a compatible model, you can import these directly into pg_vectorize using the vectorize.import_embeddings function:

SELECT vectorize.import_embeddings(
    job_name => 'my_search_project',
    src_table => 'my_source_table',
    src_primary_key => 'id',
    src_embeddings_col => 'embeddings'
);

This function:

  • Imports pre-computed embeddings without recomputation
  • Supports both join and append table methods
  • Automatically validates embedding dimensions
  • Cleans up any pending realtime jobs

The embeddings must match the dimensions expected by the model specified when creating the project with vectorize.table().
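
Before importing, it can be worth checking the dimensions yourself. The query below is a minimal sketch, assuming the embeddings are stored in a pgvector `vector` column (so that pgvector's `vector_dims` function applies) and using the illustrative `product_embeddings` table and `embedding_vector` column names. It is not part of the pg_vectorize API.

```sql
-- Assumption: embedding_vector is a pgvector "vector" column.
-- Table and column names are illustrative; substitute your own.
SELECT
    vector_dims(embedding_vector) AS dims,
    count(*)                      AS n_rows
FROM product_embeddings
GROUP BY vector_dims(embedding_vector)
ORDER BY dims;
```

A single result row means every embedding has the same dimension. For sentence-transformers/all-MiniLM-L6-v2, that dimension should be 384. More than one row indicates mixed dimensions in the source table.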

Parameters

  • job_name: The name of your pg_vectorize project (created via vectorize.table())
  • src_table: The table containing your pre-computed embeddings
  • src_primary_key: The primary key column in your source table
  • src_embeddings_col: The column containing the vector embeddings

Example

-- First create a vectorize project
SELECT vectorize.table(
    job_name => 'product_search',
    relation => 'products',
    primary_key => 'id',
    columns => ARRAY['description'],
    transformer => 'sentence-transformers/all-MiniLM-L6-v2'
);

-- Then import pre-existing embeddings
SELECT vectorize.import_embeddings(
    job_name => 'product_search',
    src_table => 'product_embeddings',
    src_primary_key => 'product_id',
    src_embeddings_col => 'embedding_vector'
);
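
The source does not say what happens to rows that have no matching pre-computed embedding, so it may help to look for them before importing. A minimal sketch in plain SQL, assuming `products.id` and `product_embeddings.product_id` identify the same records as in the example above:

```sql
-- Rows in products that have no pre-computed embedding.
-- Assumes product_embeddings.product_id refers to products.id.
SELECT p.id
FROM products AS p
LEFT JOIN product_embeddings AS e
       ON e.product_id = p.id
WHERE e.product_id IS NULL
ORDER BY p.id;
```

An empty result means every product has an embedding to import.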

Creating a Table from Existing Embeddings

If you have pre-computed embeddings and want to create a new vectorize table from them, use vectorize.table_from():

SELECT vectorize.table_from(
    relation => 'products',
    columns => ARRAY['description'],
    job_name => 'product_search',
    primary_key => 'id',
    src_table => 'product_embeddings',
    src_primary_key => 'product_id',
    src_embeddings_col => 'embedding_vector',
    transformer => 'sentence-transformers/all-MiniLM-L6-v2',
    schedule => 'realtime'
);

This function:

  1. Creates the vectorize table structure
  2. Imports your pre-computed embeddings
  3. Sets up the specified update schedule (realtime triggers or cron job)

The embeddings must match the dimensions of the specified transformer model.

Parameters

  • relation: The table to create or modify
  • columns: Array of columns to generate embeddings from
  • job_name: Name for this vectorize project
  • primary_key: Primary key column in your table
  • src_table: Table containing your pre-computed embeddings
  • src_primary_key: Primary key column in your source table
  • src_embeddings_col: Column containing the vector embeddings
  • schema: Schema name (default: 'public')
  • update_col: Column tracking updates (default: 'last_updated_at')
  • index_dist_type: Index type (default: 'pgv_hnsw_cosine')
  • transformer: Model to use (default: 'sentence-transformers/all-MiniLM-L6-v2')
  • table_method: How to store embeddings (default: 'join')
  • schedule: Update schedule - 'realtime', 'manual', or cron expression (default: '* * * * *')

This approach ensures your pre-computed embeddings are properly imported before any automatic updates are enabled.
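
As a variation on the call shown earlier, the sketch below uses only the parameters listed above, but stores embeddings with the 'append' table method and updates on a cron schedule instead of realtime triggers. The job name and the cron expression are hypothetical, and the expression assumes the standard five-field cron format. It also assumes the `products` table has a `last_updated_at` column to serve as `update_col`.

```sql
-- Sketch only: job name and schedule are illustrative values.
SELECT vectorize.table_from(
    relation           => 'products',
    columns            => ARRAY['description'],
    job_name           => 'product_search_append',  -- hypothetical job name
    primary_key        => 'id',
    src_table          => 'product_embeddings',
    src_primary_key    => 'product_id',
    src_embeddings_col => 'embedding_vector',
    transformer        => 'sentence-transformers/all-MiniLM-L6-v2',
    table_method       => 'append',                 -- documented alternative to 'join'
    update_col         => 'last_updated_at',        -- documented default, shown explicitly
    schedule           => '*/5 * * * *'             -- assumed cron syntax: every 5 minutes
);
```

Choosing 'manual' for `schedule` instead would leave updates entirely under your control after the initial import.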