Automating Postgres and pgvector Setup with Docker

Docker makes development databases simple to run and keeps them isolated from the host machine. In this post, we’ll set up a Docker container that runs PostgreSQL with the pgvector extension, which adds a vector data type and similarity search to the database.

Why Use pgvector with PostgreSQL?

The pgvector extension adds a vector column type and distance operators to PostgreSQL, so embeddings can be stored and searched alongside the rest of your data. This suits machine learning applications, such as similarity search over high-dimensional vectors, without running a separate vector store.
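To make "vector operations" concrete, here is a minimal pure-Python sketch of the three distance measures behind pgvector's `<->`, `<#>` and `<=>` operators. The function names and sample data are made up for illustration. pgvector computes these inside the database, so this is a reference for the arithmetic rather than something you would run in production.

```python
import math

def l2_distance(a, b):
    """Euclidean distance, the quantity pgvector's <-> operator returns."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def negative_inner_product(a, b):
    """Inner product multiplied by -1, as returned by pgvector's <#> operator."""
    return -sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    """1 minus cosine similarity, as returned by pgvector's <=> operator."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Hypothetical rows: an id mapped to a 2-dimensional embedding.
rows = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [3.0, 4.0]}
query = [1.0, 0.0]

# Roughly what "ORDER BY embedding <-> query" does: sort ids by L2 distance.
nearest = sorted(rows, key=lambda row_id: l2_distance(rows[row_id], query))
print(nearest)
```

In SQL, the same ordering would be an `ORDER BY` on the distance operator, with the work done by PostgreSQL.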

Setting Up the Environment

We use Docker to create a contained environment that requires minimal configuration and is easily replicable across machines. Here’s how you can set it up step by step.

1. Docker Compose Configuration

First, create a docker-compose.yml file. This file defines the PostgreSQL service and includes the configurations needed for the pgvector extension.

version: "3.9"

services:
  pgvector-db:
    env_file:
      - ./postgres-pgvector/.env
    build:
      context: .
      dockerfile: postgres.Dockerfile
    container_name: postgres-pgvector
    ports:
      - "5454:5432"
    volumes:
      - db_data:/var/lib/postgresql/data
      - ./postgres/vector_extension.sql:/docker-entrypoint-initdb.d/0-vector_extension.sql
    networks:
      - default

volumes:
  db_data:

2. Dockerfile for PostgreSQL and pgvector

Next, create a postgres.Dockerfile. This Dockerfile extends the official PostgreSQL 14.1 image with pgvector: it installs the necessary build packages, clones the pgvector repository, builds the extension, and installs it into PostgreSQL.

# Extend the official PostgreSQL 14.1 image
FROM postgres:14.1

# Install dependencies to build pgvector
RUN apt-get update && apt-get install -y \
    build-essential \
    postgresql-server-dev-14 \
    git \
    clang-11 \
    llvm-11 \
    ca-certificates \
    && update-ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# Set clang as the default compiler
ENV CC=clang-11
ENV CXX=clang++-11

# Clone and build pgvector
WORKDIR /tmp
RUN git clone https://github.com/pgvector/pgvector.git

WORKDIR /tmp/pgvector
RUN make
RUN make install

# No postgresql.conf change is needed: pgvector does not have to be listed in
# shared_preload_libraries. It is enabled per database with CREATE EXTENSION vector.

3. Initializing the Vector Extension

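To initialize the pgvector extension, include a SQL script. The docker-compose.yml mounts this script into /docker-entrypoint-initdb.d, and the image's entrypoint runs it the first time the container starts, while the data directory is still empty. Because db_data is a named volume, the script does not run again on later starts.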

-- Create the 'vector' extension in the default database (POSTGRES_DB from the .env file)
CREATE EXTENSION IF NOT EXISTS vector;

We can run this without creating a database or supplying connection details because the official postgres image creates the database and user from the POSTGRES_* environment variables, which the docker-compose.yml loads from the .env file.
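The post does not show the .env file itself, so the sketch below illustrates how the official postgres image resolves its POSTGRES_USER, POSTGRES_PASSWORD and POSTGRES_DB variables. The sample values and the helper names `parse_env` and `effective_settings` are assumptions for illustration. The parser handles only plain KEY=VALUE lines, not the full syntax Docker accepts.

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings

def effective_settings(env):
    """Apply the defaults documented for the official postgres image."""
    # The image refuses to initialize without a password
    # (unless POSTGRES_HOST_AUTH_METHOD=trust is set, which this sketch ignores).
    if not env.get("POSTGRES_PASSWORD"):
        raise ValueError("POSTGRES_PASSWORD must be set")
    user = env.get("POSTGRES_USER", "postgres")   # defaults to "postgres"
    database = env.get("POSTGRES_DB", user)       # defaults to the user name
    return {"user": user, "password": env["POSTGRES_PASSWORD"], "database": database}

# Hypothetical .env contents, for illustration only.
SAMPLE_ENV = """\
# credentials for the pgvector-db service
POSTGRES_USER=vector_user
POSTGRES_PASSWORD=change-me
"""

resolved = effective_settings(parse_env(SAMPLE_ENV))
print(resolved)
```

Because POSTGRES_DB is omitted in the sample, the database name falls back to the user name. The init script would then create the extension in that database.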

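Now, start the container and connect with your preferred SQL client. The database is published on host port 5454 (mapped to port 5432 inside the container), and the credentials are the ones defined in the .env file.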
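Many SQL clients accept a single connection URL. The sketch below builds one for this setup. Port 5454 comes from the compose file's port mapping. The user, password and database values are placeholders, and `connection_url` is a hypothetical helper. Percent-encoding the credentials keeps characters such as @ or / from breaking the URL.

```python
from urllib.parse import quote

def connection_url(user, password, database, host="localhost", port=5454):
    """Build a postgresql:// URL, percent-encoding the user-supplied parts."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )

# Placeholder credentials; use the values from your own .env file.
url = connection_url("vector_user", "p@ss/word", "vector_db")
print(url)

# A query to run once connected, to confirm the extension was created.
CHECK_EXTENSION_SQL = "SELECT extversion FROM pg_extension WHERE extname = 'vector';"
```

If the check query returns a row, the init script ran and the extension is available in that database.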

Conclusion

Using Docker to set up PostgreSQL with pgvector provides a smooth, repeatable process. It isolates the environment from your local setup, ensuring consistency across different machines. This approach minimizes potential conflicts during development.

By adding pgvector, developers gain powerful vector operations inside their databases, which is especially useful for machine learning applications.

The post Automating Postgres and pgvector Setup with Docker appeared first on Alpesh Kumar.
