
Vector Databases Explained: Pinecone, Chroma, and Beyond

Tags: vector-db, rag, tutorial

Every other AI demo I've seen this year follows the same pattern. Take some data, embed it into vectors, store it somewhere, then retrieve the relevant pieces when a user asks a question. This is the core of Retrieval-Augmented Generation, and the "store it somewhere" part is a vector database. It has quietly become one of the most important pieces of infrastructure in the modern AI stack.

I've been building RAG systems for the past few months, both for my own projects and while helping students in CS3000 think about how to structure their final projects. I have opinions now. Let me share them.

How vector search works: raw data is embedded into vectors, indexed for fast lookup, then queried using approximate nearest neighbor algorithms.

What Are Vector Embeddings?

Before we talk about databases, we need to talk about embeddings. An embedding is a way to represent data, usually text, as a dense vector of floating-point numbers. A sentence like "the cat sat on the mat" might become a list of 1536 numbers that encode its semantic meaning.

The key property is that semantically similar inputs produce vectors that are close together in vector space. "The cat sat on the mat" and "A kitten rested on the rug" will have similar embeddings, even though they share almost no words. This is what makes semantic search possible.
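To make "close together in vector space" concrete, here's a toy sketch using cosine similarity (covered properly in the next section). The three-dimensional vectors are invented for illustration; real embeddings have hundreds or thousands of dimensions and come from a model, not by hand:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Invented toy "embeddings" -- a real model would produce these from text.
cat_mat = [0.8, 0.1, 0.3]       # "the cat sat on the mat"
kitten_rug = [0.7, 0.2, 0.3]    # "a kitten rested on the rug"
stock_report = [0.1, 0.9, 0.2]  # "quarterly earnings rose 5%"

print(cosine_similarity(cat_mat, kitten_rug))    # high: similar meaning
print(cosine_similarity(cat_mat, stock_report))  # lower: unrelated topic
```

The two sentences about cats score much closer to each other than either does to the finance sentence, despite sharing no vocabulary with each other.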

You generate embeddings using a model. OpenAI's text-embedding-ada-002 is the most popular right now, but open-source alternatives from Hugging Face (like all-MiniLM-L6-v2 from Sentence Transformers) work well too, especially if you want to avoid API costs.

How Similarity Search Works

Once you have vectors, you need to find the ones most similar to a query vector. The two most common distance metrics are:

Cosine similarity measures the angle between two vectors. It ranges from -1 to 1, where 1 means identical direction. It's the default for most text applications because it handles varying document lengths well.

Dot product is faster to compute and works well when your vectors are normalized. In practice, for normalized vectors, cosine similarity and dot product give the same ranking.
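That ranking equivalence is easy to verify in a few lines of pure Python (the vectors below are invented toy values): once everything is normalized to unit length, cosine similarity literally is the dot product, so sorting by either metric produces the same order.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

query = normalize([0.2, 0.9, 0.4])
docs = {
    "doc_a": normalize([0.1, 0.8, 0.5]),
    "doc_b": normalize([0.9, 0.1, 0.1]),
    "doc_c": normalize([0.3, 0.7, 0.2]),
}

# For unit vectors, cosine similarity == dot product, so ranking by
# the cheaper dot product gives exactly the cosine ranking.
ranking = sorted(docs, key=lambda d: dot(query, docs[d]), reverse=True)
print(ranking)  # → ['doc_a', 'doc_c', 'doc_b']
```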

The challenge is doing this search efficiently. With a million documents, comparing your query against every single vector is far too slow. Vector databases use approximate nearest neighbor (ANN) algorithms like HNSW or IVF to make this fast, trading a tiny amount of accuracy for massive speed improvements.
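For intuition, here is the exact brute-force search that ANN indexes approximate, as a naive pure-Python sketch (toy vectors, invented values). It compares the query against every stored vector, which is why it doesn't scale:

```python
import math

def top_k_exact(query, vectors, k=2):
    """Exact nearest-neighbor search: score the query against EVERY
    stored vector, O(N * d) per query. This is the cost that ANN
    structures like HNSW avoid by navigating a graph of neighbors
    instead of scanning the whole collection."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)

    scored = sorted(vectors.items(), key=lambda kv: cos(query, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

store = {
    "doc1": [0.9, 0.1, 0.2],
    "doc2": [0.1, 0.9, 0.3],
    "doc3": [0.8, 0.2, 0.1],
}
print(top_k_exact([1.0, 0.0, 0.1], store))  # → ['doc1', 'doc3']
```

An ANN index returns (almost always) the same top-k answers, but in roughly logarithmic rather than linear time per query.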

The Contenders

Pinecone

The managed option. You don't run any infrastructure. You send vectors to their API, they handle indexing, scaling, and retrieval. It's fast, reliable, and the developer experience is polished.

Best for: Production applications where you don't want to manage infrastructure. Startups that need to move fast. Teams without dedicated ML ops.

Drawback: It's a paid service, and costs can add up with large datasets. You're also locked into their ecosystem.

Chroma

The open-source, developer-friendly option. It runs locally, stores data in SQLite by default, and has the simplest API of any vector database I've used.

import chromadb
from chromadb.utils import embedding_functions

# In-memory client; use chromadb.PersistentClient(path=...) to persist to disk
client = chromadb.Client()

# Local embedding model via sentence-transformers -- no API key or cloud costs
ef = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"
)

collection = client.create_collection("my_docs", embedding_function=ef)

# Documents are embedded automatically on insert
collection.add(
    documents=["Vector databases store embeddings", "RAG uses retrieval for generation"],
    ids=["doc1", "doc2"],
)

results = collection.query(query_texts=["How do vector databases work?"], n_results=1)
print(results["documents"])

Best for: Prototyping, local development, small to medium datasets. Students and researchers who want to understand the concepts without cloud costs.

Drawback: Not designed for large-scale production workloads out of the box.

Weaviate

A more feature-rich open-source option. It supports hybrid search (combining vector and keyword search), built-in vectorization modules, and GraphQL-based querying. It's heavier than Chroma but more capable.
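To see why hybrid search matters, here is one simple way to blend the two signals. The linear weighting below is my own illustration of the idea, not Weaviate's actual fusion algorithm (Weaviate's ranked and relative-score fusion methods differ in detail), and all the scores are invented:

```python
def hybrid_score(vector_score, keyword_score, alpha=0.7):
    """Blend a vector-similarity score with a keyword (e.g. BM25) score.
    alpha=1.0 is pure vector search; alpha=0.0 is pure keyword search.
    A simple linear blend -- real engines use more careful fusion."""
    return alpha * vector_score + (1 - alpha) * keyword_score

# A doc with an exact keyword match but weaker semantic similarity
# can outrank a semantically closer doc once alpha is lowered.
semantic_doc = hybrid_score(vector_score=0.9, keyword_score=0.2, alpha=0.5)
keyword_doc = hybrid_score(vector_score=0.4, keyword_score=0.95, alpha=0.5)
print(semantic_doc, keyword_doc)
```

This is exactly the failure mode hybrid search fixes: pure vector search can miss documents containing a rare exact term (a product ID, an error code) that keyword search finds trivially.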

Best for: Applications that need hybrid search or more complex query patterns. Teams that want open-source but need production-grade features.

Qdrant

Written in Rust, focused on performance. It has excellent filtering capabilities, meaning you can combine vector similarity with metadata filters efficiently. The API is clean and well-documented.
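The semantics of filtered vector search look like the naive sketch below: keep only points whose metadata matches, then rank the survivors by similarity. The point names and payloads are invented. Qdrant's value is doing this inside the index rather than as a post-filter, so a restrictive filter doesn't starve the result set or force a full scan:

```python
import math

def filtered_search(query, points, metadata_filter, k=2):
    """Naive filter-then-rank: drop points whose payload doesn't match
    the filter, then rank survivors by cosine similarity. Shows the
    semantics only; Qdrant evaluates filters during index traversal."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)

    candidates = [
        (pid, vec) for pid, (vec, payload) in points.items()
        if all(payload.get(key) == val for key, val in metadata_filter.items())
    ]
    ranked = sorted(candidates, key=lambda c: cos(query, c[1]), reverse=True)
    return [pid for pid, _ in ranked[:k]]

points = {
    "p1": ([0.9, 0.1], {"lang": "en"}),
    "p2": ([0.8, 0.2], {"lang": "de"}),
    "p3": ([0.2, 0.9], {"lang": "en"}),
}
# p2 is semantically close to the query but excluded by the filter.
print(filtered_search([1.0, 0.0], points, {"lang": "en"}))  # → ['p1', 'p3']
```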

Best for: Performance-sensitive applications. Use cases where you need rich filtering alongside similarity search.

Which One Should You Pick?

For learning and prototyping, start with Chroma. It takes five minutes to set up and teaches you the core concepts without any infrastructure overhead. For production, the choice depends on your constraints. If you want managed simplicity, Pinecone. If you want open-source with production features, Weaviate or Qdrant. I go much deeper on the production trade-offs in my Pinecone vs Qdrant vs Weaviate decision framework, and if you go the self-hosted route, my guide to self-hosting Qdrant covers the full path from Docker Compose to production.

The Bigger Picture

Here's what I keep telling my students: vector databases are becoming as fundamental as relational databases. Not a replacement for them, but a necessary complement. Every application that uses AI for search, recommendations, or generation will need some form of vector storage and retrieval.

A few years ago, knowing SQL was non-negotiable for data work. I think understanding embeddings and vector search is heading in the same direction. The specific tools will evolve, Pinecone might dominate or get displaced, Chroma might scale up or stay niche. But the underlying concept of storing and retrieving semantic representations is here to stay. Build your intuition for it now.