Automating Postgres and pgvector Setup with Docker

For managing databases in development environments, Docker offers simplicity and isolation. In this post, we’ll explore how to set up a Docker container that runs a PostgreSQL database with the pgvector extension, which adds a vector data type and similarity search to the database.
Why Use pgvector with PostgreSQL?
The pgvector extension adds a vector column type and distance operators (L2 distance, inner product, and cosine distance) directly to PostgreSQL, which supports machine learning and other applications that need similarity search over embeddings. With pgvector inside PostgreSQL, those applications can keep their vectors alongside the rest of their data.
Setting Up the Environment
We use Docker to create a contained environment that requires minimal configuration and is easily replicable across machines. Here’s how you can set it up step by step.
1. Docker Compose Configuration
First, create a docker-compose.yml file. This file defines the PostgreSQL service and includes the configuration needed for the pgvector extension.
version: "3.9"
services:
  pgvector-db:
    env_file:
      - ./postgres-pgvector/.env
    build:
      context: .
      dockerfile: postgres.Dockerfile
    container_name: postgres-pgvector
    ports:
      - "5454:5432"
    volumes:
      - db_data:/var/lib/postgresql/data
      - ./postgres/vector_extension.sql:/docker-entrypoint-initdb.d/0-vector_extension.sql
    networks:
      - default
volumes:
  db_data:
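The compose file reads its environment from ./postgres-pgvector/.env, which is not shown in this post. A minimal sketch of creating it follows, assuming the standard variables the official postgres image reads (POSTGRES_USER, POSTGRES_PASSWORD, POSTGRES_DB). The values are placeholders chosen for illustration, not the author's settings.

```shell
# Create the env file that docker-compose.yml points to.
# All values below are placeholders (assumptions); pick your own.
mkdir -p postgres-pgvector
cat > postgres-pgvector/.env <<'EOF'
POSTGRES_USER=vector_user
POSTGRES_PASSWORD=change-me
POSTGRES_DB=vector_db
EOF
```

Keeping credentials in a separate env file lets you leave them out of version control while still committing docker-compose.yml.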
2. Dockerfile for PostgreSQL and pgvector
Next, create a postgres.Dockerfile. This Dockerfile extends the official PostgreSQL 14.1 image, installs the necessary build packages, clones the pgvector repository, builds it, and installs it into PostgreSQL.
# Extend the official PostgreSQL 14.1 image
FROM postgres:14.1

# Install the dependencies needed to build pgvector
RUN apt-get update && apt-get install -y \
    build-essential \
    postgresql-server-dev-14 \
    git \
    clang-11 \
    llvm-11 \
    ca-certificates \
    && update-ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# Set clang as the default compiler
ENV CC=clang-11
ENV CXX=clang++-11

# Clone and build pgvector
WORKDIR /tmp
RUN git clone https://github.com/pgvector/pgvector.git
WORKDIR /tmp/pgvector
RUN make
RUN make install

# No shared_preload_libraries entry is needed: pgvector is enabled per database
# with CREATE EXTENSION vector. Its library is named "vector", so preloading
# 'pgvector' would prevent the server from starting.
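Before wiring the image into Compose, it can help to confirm that the build actually installed the extension files. The sketch below assumes postgres.Dockerfile sits in the current directory; the image tag pgvector-local and the function names are hypothetical, chosen only for this example.

```shell
# Hypothetical tag for a throwaway local image.
IMAGE_TAG="pgvector-local"

# Build the image from postgres.Dockerfile in the current directory.
build_image() {
    docker build -f postgres.Dockerfile -t "$IMAGE_TAG" .
}

# List the pgvector files installed into PostgreSQL's extension directory.
# A successful build should show vector.control and vector--*.sql files.
list_extension_files() {
    docker run --rm "$IMAGE_TAG" \
        sh -c 'ls "$(pg_config --sharedir)/extension" | grep "^vector"'
}
```

Running build_image followed by list_extension_files gives a quick check; an empty listing suggests that make install did not complete.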
3. Initializing the Vector Extension
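To enable the pgvector extension, add a SQL script. In docker-compose.yml, this script (./postgres/vector_extension.sql) is mounted into /docker-entrypoint-initdb.d. The official image runs the scripts in that directory only while it initializes an empty data directory, that is, the first time the container starts with a fresh db_data volume.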
-- Create the 'vector' extension in the database set in the docker-compose.yml
CREATE EXTENSION IF NOT EXISTS vector;
We can run this without creating a database or supplying connection details because the base image creates the user and database on first start, using the POSTGRES_* variables from the env file referenced in docker-compose.yml.
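Now, connect with your preferred SQL client. The host is localhost, the port is 5454 (mapped to the container's 5432 in docker-compose.yml), and the user, password, and database are the ones set in the env file.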
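To check the setup from a terminal instead of a GUI client, a few small shell helpers can start the stack, confirm the extension exists, and run a first similarity query. This is a sketch under stated assumptions: the Docker Compose v2 plugin (docker compose) is installed, and the user and database names below are placeholders that must match your own env file. The function names are hypothetical.

```shell
# Assumed values: they must match your postgres-pgvector/.env.
DB_USER="vector_user"            # POSTGRES_USER (placeholder)
DB_NAME="vector_db"              # POSTGRES_DB (placeholder)
CONTAINER="postgres-pgvector"    # container_name in docker-compose.yml
HOST_PORT="5454"                 # host side of the "5454:5432" mapping

# Build the image and start the service in the background.
start_db() {
    docker compose up -d --build
}

# Block until the server accepts TCP connections inside the container.
# The temporary server used during first-time initialization listens on the
# Unix socket only, so this waits until the init scripts have finished.
wait_for_db() {
    until docker exec "$CONTAINER" pg_isready -h 127.0.0.1 -U "$DB_USER" -d "$DB_NAME" >/dev/null 2>&1; do
        sleep 1
    done
}

# Run one SQL statement with psql inside the container.
run_sql() {
    docker exec -i "$CONTAINER" psql -U "$DB_USER" -d "$DB_NAME" -c "$1"
}

# Confirm that the init script created the extension.
check_extension() {
    run_sql "SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';"
}

# Store two 3-dimensional vectors and rank them by L2 distance (<->).
demo_query() {
    run_sql "CREATE TABLE IF NOT EXISTS items (id bigserial PRIMARY KEY, embedding vector(3));"
    run_sql "INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');"
    run_sql "SELECT id, embedding FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;"
}

# Print a connection URL for SQL clients on the host (password omitted).
db_url() {
    printf 'postgresql://%s@localhost:%s/%s\n' "$DB_USER" "$HOST_PORT" "$DB_NAME"
}
```

After loading these definitions into a shell, start_db && wait_for_db && check_extension should return one row for the vector extension. If it returns none, the db_data volume probably existed before the init script was added, because init scripts are skipped once the data directory has been initialized.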
Conclusion
Using Docker to set up PostgreSQL with pgvector provides a smooth, repeatable process. It isolates the environment from your local setup, ensuring consistency across different machines. This approach minimizes potential conflicts during development.
By adding pgvector, developers gain powerful vector operations inside their databases, which is especially useful for machine learning applications.
The post Automating Postgres and pgvector Setup with Docker appeared first on Alpesh Kumar.