PGVectorStore
PGVectorStore is an implementation of a LangChain vector store using Postgres as the backend.
This notebook goes over how to use the PGVectorStore API.
The code lives in an integration package called langchain-postgres.
Setup
This package requires a PostgreSQL database with the pgvector
extension.
You can run the following command to spin up a container for a pgvector-enabled Postgres instance:
docker run --name pgvector-container -e POSTGRES_USER=langchain -e POSTGRES_PASSWORD=langchain -e POSTGRES_DB=langchain -p 6024:5432 -d pgvector/pgvector:pg16
Install
Install the integration library, langchain-postgres.
%pip install --upgrade --quiet langchain-postgres
# This notebook also requires the following dependencies
%pip install --upgrade --quiet langchain-core langchain-cohere sqlalchemy
Set your Postgres values
Set your Postgres values to test the functionality in this notebook against a Postgres instance.
# @title Set your values or use the defaults to connect to Docker { display-mode: "form" }
POSTGRES_USER = "langchain" # @param {type: "string"}
POSTGRES_PASSWORD = "langchain" # @param {type: "string"}
POSTGRES_HOST = "localhost" # @param {type: "string"}
POSTGRES_PORT = "6024" # @param {type: "string"}
POSTGRES_DB = "langchain" # @param {type: "string"}
TABLE_NAME = "vectorstore" # @param {type: "string"}
VECTOR_SIZE = 1024 # @param {type: "integer"}
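Optionally, you can sanity-check that the instance is reachable with these values before continuing. This quick sketch connects with the asyncpg driver directly (an assumption: asyncpg must be installed; the postgresql+asyncpg connection string used later needs it as well).
# Optional sanity check: confirm the Postgres instance is reachable.
# Assumes the asyncpg driver is installed (it is also needed for the
# postgresql+asyncpg connection string used below).
import asyncpg
conn = await asyncpg.connect(
    user=POSTGRES_USER,
    password=POSTGRES_PASSWORD,
    host=POSTGRES_HOST,
    port=int(POSTGRES_PORT),
    database=POSTGRES_DB,
)
print(await conn.fetchval("SELECT version()"))
await conn.close()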
Initialization
PGEngine Connection Pool
One of the requirements and arguments to establish PostgreSQL as a vector store is a PGEngine object. The PGEngine configures a shared connection pool to your Postgres database. This is an industry best practice to manage the number of connections and to reduce latency through cached database connections.
PGVectorStore can be used with the asyncpg and psycopg3 drivers.
To create a PGEngine using PGEngine.from_connection_string() you need to provide:
url: Connection string using the postgresql+asyncpg driver.
Note: This tutorial demonstrates the async interface. All async methods have corresponding sync methods.
# See docker command above to launch a Postgres instance with pgvector enabled.
CONNECTION_STRING = (
f"postgresql+asyncpg://{POSTGRES_USER}:{POSTGRES_PASSWORD}@{POSTGRES_HOST}"
f":{POSTGRES_PORT}/{POSTGRES_DB}"
)
# To use psycopg3 driver, set your connection string to `postgresql+psycopg://`
from langchain_postgres import PGEngine
pg_engine = PGEngine.from_connection_string(url=CONNECTION_STRING)
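If you want the psycopg3 driver mentioned in the comment above, only the URL scheme changes. A minimal sketch, assuming the psycopg package is installed:
# Sketch: psycopg3 variant of the connection string (assumes `psycopg` is installed).
# PSYCOPG_CONNECTION_STRING = (
#     f"postgresql+psycopg://{POSTGRES_USER}:{POSTGRES_PASSWORD}@{POSTGRES_HOST}"
#     f":{POSTGRES_PORT}/{POSTGRES_DB}"
# )
# pg_engine = PGEngine.from_connection_string(url=PSYCOPG_CONNECTION_STRING)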
To create a PGEngine using PGEngine.from_engine() you need to provide:
engine: An object of AsyncEngine.
from sqlalchemy.ext.asyncio import create_async_engine
# Create an SQLAlchemy Async Engine
engine = create_async_engine(
CONNECTION_STRING,
)
pg_engine = PGEngine.from_engine(engine=engine)
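Because PGEngine.from_engine() accepts any SQLAlchemy AsyncEngine, you can also tune the underlying connection pool yourself. A brief sketch; the keyword arguments below are standard SQLAlchemy pool options, not langchain-postgres parameters:
# Sketch: tune the SQLAlchemy connection pool before wrapping it in PGEngine.
tuned_engine = create_async_engine(
    CONNECTION_STRING,
    pool_size=5,  # connections kept open in the pool
    max_overflow=10,  # extra connections allowed under load
    pool_recycle=1800,  # recycle connections after 30 minutes
)
pg_engine = PGEngine.from_engine(engine=tuned_engine)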
Initialize a table
The PGVectorStore class requires a database table. The PGEngine has a helper method, ainit_vectorstore_table(), that creates a table with the proper schema for you.
See Create a custom Vector Store or Create a Vector Store using existing table for customizing the schema.
await pg_engine.ainit_vectorstore_table(
table_name=TABLE_NAME,
vector_size=VECTOR_SIZE,
)
Optional Tip: 💡
You can also specify a schema name by passing schema_name wherever you pass table_name. For example:
SCHEMA_NAME = "my_schema"
await pg_engine.ainit_vectorstore_table(
table_name=TABLE_NAME,
vector_size=768,
schema_name=SCHEMA_NAME, # Default: "public"
)
Create an embedding class instance
You can use any LangChain embeddings model.
from langchain_cohere import CohereEmbeddings
embedding = CohereEmbeddings(model="embed-english-v3.0")
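Any other LangChain Embeddings implementation can be swapped in the same way. The commented sketch below assumes langchain-openai is installed and OPENAI_API_KEY is set; if you change models, keep VECTOR_SIZE in sync with the model's embedding dimension.
# Sketch: swap in a different embeddings provider (assumes `langchain-openai`
# is installed and OPENAI_API_KEY is set). Keep VECTOR_SIZE equal to the
# model's embedding dimension.
# from langchain_openai import OpenAIEmbeddings
# embedding = OpenAIEmbeddings(model="text-embedding-3-small")  # 1536 dimensions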
Initialize a default PGVectorStore
Use the default table schema to connect to the vectorstore.
See Create a custom Vector Store or Create a Vector Store using existing table for customizing the schema.
from langchain_postgres import PGVectorStore
store = await PGVectorStore.create(
engine=pg_engine,
table_name=TABLE_NAME,
# schema_name=SCHEMA_NAME,
embedding_service=embedding,
)
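Per the note under Initialization, the async calls above have sync counterparts. The commented sketch below shows the equivalent sync setup; the names init_vectorstore_table and create_sync are assumptions based on the a-prefix naming convention, so check the langchain-postgres API reference before using them.
# Rough sketch of the sync setup path (method names are assumptions;
# verify against the langchain-postgres API reference).
# pg_engine.init_vectorstore_table(
#     table_name=TABLE_NAME,
#     vector_size=VECTOR_SIZE,
# )
# store = PGVectorStore.create_sync(
#     engine=pg_engine,
#     table_name=TABLE_NAME,
#     embedding_service=embedding,
# )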
Manage vector store
Add documents
Add documents to the vector store. Metadata is stored in a JSON column; see Create a custom Vector Store to store metadata in columns that can be used for filters.
import uuid
from langchain_core.documents import Document
docs = [
Document(
id=str(uuid.uuid4()),
page_content="Red Apple",
metadata={"description": "red", "content": "1", "category": "fruit"},
),
Document(
id=str(uuid.uuid4()),
page_content="Banana Cavendish",
metadata={"description": "yellow", "content": "2", "category": "fruit"},
),
Document(
id=str(uuid.uuid4()),
page_content="Orange Navel",
metadata={"description": "orange", "content": "3", "category": "fruit"},
),
]
await store.aadd_documents(docs)
Add texts
Add texts directly to the vector store when the data is not structured as Document objects.
import uuid
all_texts = ["Apples and oranges", "Cars and airplanes", "Pineapple", "Train", "Banana"]
metadatas = [{"len": len(t)} for t in all_texts]
ids = [str(uuid.uuid4()) for _ in all_texts]
await store.aadd_texts(all_texts, metadatas=metadatas, ids=ids)
Delete documents
Documents can be deleted using ids.
await store.adelete([ids[1]])
Query vector store
Search for documents
Use a natural language query to search for similar documents.
query = "I'd like a fruit."
docs = await store.asimilarity_search(query)
print(docs)
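To see how relevant each result is, you can also request relevance scores. A short sketch using the standard with-score variant of similarity search:
docs_with_scores = await store.asimilarity_search_with_score(query, k=2)
for doc, score in docs_with_scores:
    print(doc.page_content, score)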
Search for documents by vector
Search for similar documents using a vector embedding.
query_vector = embedding.embed_query(query)
docs = await store.asimilarity_search_by_vector(query_vector, k=2)
print(docs)
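As noted in the Initialization section, each async method has a sync counterpart. For example, the same search can be run synchronously:
# Sync counterpart of the async search above.
docs = store.similarity_search(query, k=2)
print(docs)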