LangChain from Scratch: Building Your First LLM App
A few of my students asked me this summer what the fastest way to build something useful with an LLM is. Not just calling the API and printing a response, but actually building an application that does something interesting. My answer right now is LangChain.
That comes with caveats. LangChain's abstractions can be frustrating. The documentation moves fast and sometimes contradicts itself. But the core patterns it teaches (loading documents, embedding them, retrieving relevant context, feeding it to an LLM) are genuinely the patterns the industry is converging on. So let's build something.
What We're Building
A simple document Q&A app. You give it a PDF or text file, ask questions about it in natural language, and it gives you answers grounded in the document's content. This is the classic RAG (Retrieval-Augmented Generation) pattern, and it's behind most of the "chat with your data" products you've seen launched this year.
Step 1: Load Your Documents
LangChain has document loaders for just about everything. PDFs, web pages, CSVs, Notion databases. For simplicity, we'll use a text file.
```python
from langchain.document_loaders import TextLoader

loader = TextLoader("research_paper.txt")
documents = loader.load()
```

Each document comes back as an object with `page_content` and `metadata`. Simple enough.
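If you want a feel for that shape without running the loader, here's a toy dataclass standing in for LangChain's own document type (the field names match; the class itself is just for illustration):

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    # Minimal stand-in for a loaded document:
    # the text itself plus provenance metadata.
    page_content: str
    metadata: dict = field(default_factory=dict)

doc = Doc(page_content="Abstract: We study...",
          metadata={"source": "research_paper.txt"})
```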
Step 2: Split the Text into Chunks
LLMs have context windows. You can't dump an entire 50-page paper into a prompt. Instead, you split the document into smaller chunks and only retrieve the relevant ones for each question.
```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(documents)
```

The `chunk_overlap` parameter is important. It ensures that if a relevant passage sits right at the boundary between two chunks, you don't lose context. I've found 200 characters of overlap (the splitter measures characters by default, not tokens) to be a good default for most documents. For a deeper look at how different splitting approaches affect retrieval quality, see my post on chunking strategies that actually matter for RAG.
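To see why overlap matters, here's a toy fixed-size chunker (not LangChain's actual splitter, which recursively splits on separators like paragraphs and sentences) that illustrates the mechanic:

```python
def chunk_with_overlap(text, chunk_size, overlap):
    # Each new chunk backs up `overlap` characters, so text sitting
    # at a boundary appears in both neighboring chunks.
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = "".join(chr(ord("A") + i % 26) for i in range(50))
chunks = chunk_with_overlap(text, chunk_size=20, overlap=5)
# The tail of each chunk repeats as the head of the next.
```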
Step 3: Create Embeddings and Store Them
This is where the magic happens. You convert each text chunk into a vector embedding (a dense numerical representation that captures semantic meaning) and store it in a vector database. We'll use Chroma because it's lightweight and runs locally.
```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(chunks, embeddings)
```

Behind the scenes, each chunk gets sent to OpenAI's embedding model, converted to a 1536-dimensional vector, and stored in Chroma's local index. When you ask a question later, your question gets embedded the same way and Chroma finds the chunks with the most similar vectors.
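"Most similar vectors" typically means highest cosine similarity. Here's a hand-rolled version of that comparison on toy 3-dimensional vectors (Chroma does this for you, with approximate nearest-neighbor indexing on top):

```python
import math

def cosine_similarity(a, b):
    # Similarity between two embedding vectors: 1.0 means same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" (real ones have 1536 dimensions).
question = [1.0, 0.2, 0.0]
chunk_about_same_topic = [0.9, 0.3, 0.1]
chunk_about_other_topic = [0.0, 0.1, 1.0]
```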
Step 4: Build the Retrieval Chain
Now we connect everything. Create a retriever from the vector store, hook it up to an LLM, and let LangChain handle the orchestration.
```python
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3})
)
response = qa_chain.run("What are the main findings of this paper?")
print(response)
```

The `chain_type="stuff"` means it takes the retrieved chunks and stuffs them all into the prompt. For longer documents, you might want `map_reduce` or `refine`, which process chunks in stages. But for most use cases, `stuff` works fine.
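The difference between the two strategies is easiest to see as sketch-grade Python (with `summarize` standing in for an LLM call; this is the shape of the idea, not LangChain's implementation):

```python
def stuff(chunks):
    # "stuff": one LLM call, with every retrieved chunk pasted into the prompt.
    return "\n\n".join(chunks)

def map_reduce(chunks, summarize):
    # "map_reduce": one LLM call per chunk (map), then one call to merge (reduce).
    partial = [summarize(c) for c in chunks]
    return "\n".join(partial)
```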
The Honest Take on LangChain
I want to be upfront about something. LangChain is both helpful and over-engineered. The simple chain we just built? You could write it in about 30 lines of raw Python with the OpenAI API and a basic vector search. LangChain's value isn't in the simple cases. It's in the complex ones, where you need agents that use tools (like function calling in production), chains that branch conditionally, or memory that persists across conversations.
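For the curious, here's what that raw version looks like, stripped to its skeleton. The `embed` function is a deliberately crude offline stand-in (a letter-count vector) so the sketch runs without an API key; in a real app you'd call an embedding endpoint and send the final prompt to a chat model:

```python
import math

def embed(text):
    # Stand-in for an embedding API call: a bag-of-letters vector.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / ((math.sqrt(sum(x * x for x in a)) or 1.0) *
                  (math.sqrt(sum(y * y for y in b)) or 1.0))

def retrieve(question, chunks, k=3):
    # Rank chunks by similarity to the question, keep the top k.
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def build_prompt(question, chunks):
    context = "\n\n".join(retrieve(question, chunks))
    # In the real version, this prompt goes to the chat completions API.
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
```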
The abstractions also change frequently. Code from three months ago might not work today. That's the cost of building on a library that's evolving as fast as the ecosystem it serves.
Where This Is All Heading
What I find most interesting is that the pattern we just built (load, split, embed, retrieve, generate) is becoming a standard. It doesn't matter if you use LangChain, LlamaIndex, or roll your own. The architecture is the same. The LLM app development stack is stabilizing around a few core abstractions: document loaders, text splitters, embedding models, vector stores, and orchestration chains.
That's a sign of a maturing ecosystem. A year ago, everyone was experimenting. Now, best practices are forming. If you're learning to build with LLMs, understanding these patterns matters more than mastering any specific framework. The frameworks will change. The patterns are here to stay.
Related Posts
A Beginner's Guide to RAG: Making LLMs Actually Useful
LLMs hallucinate because they don't know your data. Retrieval-Augmented Generation fixes that. Here's how it works and how to build one.
Benchmarking TurboQuant+ KV Cache Compression on Apple Silicon
I tested TurboQuant+ KV cache compression across 1.5B, 7B, and 14B models on an M4 MacBook Air. The speed gains are real, but there are sharp cliffs you need to know about.
Custom Commands and Slash Commands: Building Your Own Claude Code CLI
Slash commands turn Claude Code into a personalized CLI. A markdown file becomes a reusable workflow you invoke with a single slash. Here's how to build them.