
NotebookLM from the Terminal: Querying Your Docs with Claude Code

claude-code · tools · tutorial

NotebookLM is one of the best products Google has shipped in years, and almost nobody talks about it. It's buried under the noise of Gemini announcements and AI Studio launches and whatever the latest rebrand is. But the actual product, the thing you can go use right now, is genuinely excellent for a specific and important use case: asking questions about your own documents and getting answers that are grounded exclusively in those documents.

I've been using it for months. Research papers, project documentation, meeting transcripts, course materials. Upload them, ask questions, get answers with citations. No hallucination. No "based on my training data" hedging. Just answers from your documents, with inline references pointing to exactly where the information came from.

The problem is the workflow. Every time I want to query my documents, I have to leave whatever I'm doing, open a browser, navigate to notebooklm.google.com, find the right notebook, type my question, wait for the response, read it, and then context-switch back to whatever I was actually working on. If I'm deep in a terminal session debugging something and I need to check project documentation, that context switch is expensive. Not just in time. In focus.

So I built a Claude Code skill that does all of this from the terminal. You type a natural language question, Claude Code figures out which notebook to query, opens a browser session behind the scenes, navigates to NotebookLM, asks your question, extracts Gemini's response with full citations, and returns it right there in your terminal. No browser tabs. No context switches. No leaving your flow.

The full query flow from terminal to NotebookLM and back. Claude Code orchestrates browser automation to extract source-grounded answers with citations.

What NotebookLM Actually Does

For readers who haven't used it: NotebookLM is a document-grounded question-answering system built on Gemini. You create "notebooks" by uploading sources. PDFs, Google Docs, websites, YouTube videos, plain text. Each notebook is its own knowledge base.

When you ask a question, Gemini answers exclusively from the documents in that notebook. This is the critical distinction. Most LLMs draw on their entire training corpus when they answer questions. They might be right. They might be confidently wrong. You can't tell without verifying, because there's no citation trail.

NotebookLM flips this. Every claim in the response comes with a citation pointing to a specific source in your notebook. If you ask "what was the decision on the database schema?" and the answer isn't in your uploaded documents, it tells you. It doesn't guess. It doesn't synthesize an answer from general knowledge. It says the information isn't available in your sources.

This is source-grounded answering, not knowledge-grounded answering. The difference matters enormously when you're working with specific documents where accuracy is non-negotiable. Legal documents. Research papers. Technical specifications. Meeting notes where you need to know what was actually said, not what an LLM thinks might have been said.

I've tested it against my own RAG pipelines, and honestly, for the use case of "I have 10-50 documents and I want to ask questions about them," NotebookLM's retrieval quality is remarkably good. Better than most quick RAG setups I've built, which says something about the engineering Google put into it.

The Skill Architecture

The NotebookLM skill is a Claude Code skill backed by browser automation. Here's how it works.

Playwright MCP is the engine. NotebookLM doesn't have a public API. Google hasn't exposed one, and I doubt they will anytime soon. So the only way to interact with it programmatically is through the browser. The Playwright MCP server gives Claude Code the ability to control a real Chromium browser: navigate to URLs, click elements, fill input fields, read page content. It's the same tool that powers browser-based testing, repurposed for AI-driven automation.

Authentication is persistent. The skill uses a browser profile that maintains your Google login session. You authenticate once, and subsequent queries reuse that session. No re-entering credentials every time you want to ask a question. The browser profile lives locally on your machine and persists between Claude Code sessions.
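The persistent-profile mechanism can be sketched with the Playwright Python API. (The skill itself drives the browser through the Playwright MCP server rather than this library directly, and the profile path below is a made-up example.)

```python
from pathlib import Path

# Hypothetical location for the skill's saved browser profile.
PROFILE_DIR = Path.home() / ".notebooklm-skill" / "chrome-profile"

def open_authenticated_page(url: str):
    """Open a page in a Chromium context that reuses a saved login.

    launch_persistent_context writes cookies and localStorage into
    PROFILE_DIR, so a one-time manual Google sign-in survives across
    runs. Playwright is imported lazily so the sketch can be read
    without a browser installed.
    """
    from playwright.sync_api import sync_playwright
    p = sync_playwright().start()
    ctx = p.chromium.launch_persistent_context(str(PROFILE_DIR), headless=True)
    page = ctx.new_page()
    page.goto(url)
    return p, ctx, page
```

The first run requires signing in by hand with `headless=False`; after that, the profile directory carries the session.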

The notebook library is the brain. This is a local JSON registry of all your NotebookLM notebooks. Each entry has a name, a description, a list of topics, and the notebook URL. When you ask a question, the skill doesn't need you to specify which notebook to query. It reads your question, looks at the library, and selects the most relevant notebook based on the topic match. You ask "what were the action items from last week's standup?" and it knows to query your meeting notes notebook, not your research papers notebook.
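The registry and the routing step can be sketched in a few lines. In the real skill the model itself picks the notebook; the keyword-overlap scoring below is a cheap stand-in, and every name, topic, and URL is invented for illustration.

```python
# A minimal notebook registry, shaped like the skill's local JSON library.
LIBRARY = [
    {
        "name": "meeting-notes",
        "description": "Transcripts and notes from standups and sprint planning.",
        "topics": ["standup", "action items", "sprint", "decisions"],
        "url": "https://notebooklm.google.com/notebook/abc123",
    },
    {
        "name": "voice-ai-papers",
        "description": "Research papers on voice synthesis and latency.",
        "topics": ["voice", "synthesis", "latency", "papers"],
        "url": "https://notebooklm.google.com/notebook/def456",
    },
]

def select_notebook(question: str, library: list) -> dict:
    """Pick the notebook whose topic tags best overlap the question."""
    words = {w.strip("?.,!'\"") for w in question.lower().split()}
    def score(entry):
        return sum(1 for t in entry["topics"]
                   if any(w in t or t in w for w in words))
    return max(library, key=score)
```

With this registry, "what were the action items from last week's standup?" routes to `meeting-notes` without the user ever naming a notebook.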

Each query is a fresh browser session. When you ask a question, the skill opens a new browser instance with your authenticated profile, navigates to the selected notebook's URL, types your question into the NotebookLM chat input, waits for Gemini to generate a response, extracts the full answer including citations, and closes the browser. Fresh sessions are more reliable than trying to maintain a long-running browser instance. NotebookLM's UI can get into weird states if you leave it open too long, and a clean session avoids all of that.
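The per-query round trip reduces to a fill-wait-extract loop. A sketch, assuming an already-authenticated Playwright `page`; the CSS selectors are guesses for illustration, since NotebookLM's real DOM differs and changes over time:

```python
def query_notebook(page, question: str, timeout_ms: int = 60_000) -> str:
    """One query round trip on an already-open NotebookLM page.

    `page` is a Playwright Page. Both selectors below are assumptions,
    not NotebookLM's actual markup.
    """
    chat_input = 'textarea[aria-label="Query box"]'    # assumed selector
    response_area = ".chat-message.model-response"     # assumed selector

    page.fill(chat_input, question)
    page.keyboard.press("Enter")
    # Block until Gemini has rendered a response before scraping it.
    page.wait_for_selector(response_area, timeout=timeout_ms)
    return page.inner_text(response_area)
```

The `wait_for_selector` call is what absorbs most of the 15-30 second round trip; everything else is near-instant.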

The response comes back structured. You don't just get raw text. The skill extracts the answer, the citations with source references, and any caveats Gemini includes about information availability. This gets formatted cleanly in your terminal, so you can read it, copy relevant parts, and keep working.
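The post-processing step can be illustrated with a small parser. The `[n]` citation markers and the caveat phrasing here are assumptions about the scraped text; the real skill reads citation chips out of the DOM rather than regexing plain text.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class NotebookAnswer:
    text: str
    citations: list = field(default_factory=list)  # source numbers referenced
    caveat: Optional[str] = None  # e.g. "The sources do not mention ..."

def parse_answer(raw: str) -> NotebookAnswer:
    """Split a scraped response into answer text, citation numbers,
    and any availability caveat (marker formats are assumed)."""
    citations = sorted({int(n) for n in re.findall(r"\[(\d+)\]", raw)})
    m = re.search(r"(The sources do not (?:mention|contain)[^.]*\.)", raw)
    return NotebookAnswer(text=raw, citations=citations,
                          caveat=m.group(1) if m else None)
```

Keeping citations and caveats as separate fields is what lets the terminal output render them distinctly instead of as one undifferentiated blob.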

The whole round trip, from question to answer, takes roughly 15-30 seconds depending on the complexity of the query and how much Gemini needs to retrieve. Not instant. But fast enough that it's dramatically better than the manual browser workflow, especially when you factor in the context-switch cost.

Smart Discovery: Self-Describing Notebooks

The notebook library is great once it's populated. But populating it manually is tedious. Every notebook would need a name, a description, and a list of topics. For someone with a dozen notebooks, that's a lot of metadata to write by hand.

So I built a discovery feature that lets notebooks describe themselves.

Here's how it works. When you want to add a new notebook to the library, you give the skill the notebook URL. That's it. Just the URL. The skill then runs a meta-query: it opens the notebook and asks Gemini "What is this notebook about? What topics does it cover? Summarize the contents."

Gemini, which has access to every document in that notebook, responds with a comprehensive description. The skill takes that response and uses it to automatically populate the library entry: name, description, and topic tags. All derived from the actual contents of the notebook, described by the model that has read every document in it.
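The library-entry construction can be sketched as a pure function over the discovery response. In the real skill the model produces the tags directly; the naive keyword pass below is a stand-in, and the prompt wording is taken from the description above.

```python
import re

DISCOVERY_PROMPT = (
    "What is this notebook about? What topics does it cover? "
    "Summarize the contents."
)

def build_library_entry(url: str, description: str) -> dict:
    """Turn Gemini's free-text self-description into a registry entry.

    Topic extraction here is a crude length filter for illustration only.
    """
    words = re.findall(r"[a-z][a-z-]+", description.lower())
    seen, topics = set(), []
    for w in words:
        if len(w) > 6 and w not in seen:
            seen.add(w)
            topics.append(w)
    name = url.rstrip("/").rsplit("/", 1)[-1]
    return {"name": name, "url": url,
            "description": description.strip(), "topics": topics[:8]}
```

One discovery query, one new entry, and the library stays in sync with whatever is actually in the notebook.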

This is the kind of thing that feels obvious in retrospect but was satisfying to build. Instead of me writing metadata about my documents, I let the system that has actually read all the documents write the metadata. The descriptions are consistently more thorough and accurate than what I would have written manually, because Gemini has actually processed every page of every uploaded document. I've skimmed most of them at best.

The practical result is that adding a new notebook to the system takes about 30 seconds. Paste the URL, wait for the discovery query, review the auto-generated metadata, done. The library stays current with minimal effort.

When This Beats RAG

Let me be direct about this, because I've built enough RAG systems to have strong opinions.

RAG, retrieval-augmented generation, is the standard approach for querying your own documents with an LLM. The pipeline looks like this: take your documents, chunk them into pieces, generate embeddings for each chunk, store those embeddings in a vector database, and when a question comes in, embed the question, find the most similar chunks, stuff them into the LLM's context, and generate an answer.

It works. I've shipped RAG systems in production. But the engineering overhead is substantial.

You need to choose a chunking strategy. Chunk too small and you lose context. Chunk too large and your retrieval gets noisy. You need to choose an embedding model. You need to set up and maintain a vector database. You need to tune retrieval parameters: how many chunks to retrieve, what similarity threshold to use, whether to use hybrid search. You need to handle document updates, re-indexing, and metadata filtering. And after all of that, you still might get retrieval failures where the relevant information exists in your documents but the vector search doesn't surface it.
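To make the overhead concrete, here is the chunk-embed-retrieve loop in miniature. The "embedding" is a bag-of-words counter so the sketch runs without a model; a real pipeline swaps in a learned embedding model and a vector store, which is exactly where the tuning burden lives.

```python
import math
from collections import Counter

def chunk(text: str, size: int = 40, overlap: int = 10) -> list:
    """Fixed-size word chunks with overlap: the simplest chunking strategy."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words vector, just to make retrieval runnable."""
    return Counter(w.strip("?.,!") for w in text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, chunks: list, k: int = 2) -> list:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

Every parameter here (`size`, `overlap`, `k`, the similarity function) is a knob you have to tune and maintain yourself. NotebookLM makes all of those choices for you.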

NotebookLM does all of this for you. You upload documents. You ask questions. Google handles the chunking, the retrieval, the grounding, the citation extraction. The retrieval quality is excellent. The citation system is reliable. And you didn't have to write a single line of infrastructure code.

For personal knowledge bases and small document collections, this is a dramatically simpler solution. I have a notebook with about 40 research papers on voice AI. Building a proper RAG pipeline for those papers would take me half a day of setup, plus ongoing maintenance every time I add a new paper. With NotebookLM, I upload the PDF and start asking questions. Done.

RAG wins in specific scenarios. Production systems that need to serve thousands of users. Custom retrieval logic that goes beyond simple semantic similarity. Integration with existing databases and pipelines. Fine-grained control over what gets retrieved and how it's presented. Latency-sensitive applications where you can't afford browser automation overhead.

But for the use case of "I, a single person, want to ask questions about my documents and get accurate, cited answers," NotebookLM is better. It's not even close. The retrieval quality is at least as good as most RAG setups, the citation system is better than anything I've built myself, and the total engineering effort is zero.

The skill I built doesn't change this calculus. It just removes the last remaining friction: having to leave the terminal to use it.

How I Actually Use This

Let me walk through real use cases, because theoretical value means nothing if it doesn't translate to daily utility.

Project documentation. I have notebooks for major projects I'm working on. When I'm deep in a coding session and I need to check a design decision or API specification, I query the notebook without leaving my terminal. "What was the agreed-upon schema for the user events table?" comes back with the exact specification, cited from the design doc. No tab switching. No hunting through Confluence or Google Docs.

Research. I read a lot of papers. Keeping track of what each paper said about a specific topic across 30-40 papers is genuinely hard. I have a notebook per research area, and I can ask cross-paper questions. "Which papers discuss sub-200ms latency for voice synthesis, and what approaches do they use?" This returns a synthesized answer with citations to specific papers. It's like having a research assistant who has actually read everything and can cross-reference on demand.

Meeting notes. I upload meeting transcripts into a notebook. "What were the action items from the February sprint planning?" gives me a bulleted list with citations pointing to the exact moments in the transcript. This alone has saved me from the "I thought we agreed to X" conversations that happen when memories differ.

Course materials. When I was TAing, I had notebooks with lecture slides, readings, and assignment specs. Students would ask questions, and I could instantly verify whether the answer was in the course materials before responding. Now I use it for my own learning. Upload the materials for whatever I'm studying, and use conversational querying as a study method. It's faster than flipping through slides and more reliable than my memory.

The common thread across all of these: the documents already exist somewhere. I'm not creating new content for the system. I'm making existing content queryable. And by running it from the terminal, I'm making it queryable without leaving my primary work environment.

Why Playwright Makes This Possible

This skill exists because of the Playwright MCP server, and I think the pattern it represents is worth discussing separately from the specific NotebookLM use case.

Playwright gives Claude Code the ability to control a real web browser. Navigate to pages, click buttons, fill forms, extract content, handle authentication flows. It's the same technology that powers browser testing frameworks, exposed as an MCP server that Claude Code can call like any other tool.

NotebookLM doesn't have an API. Lots of valuable web applications don't have APIs. But they all have web interfaces. Playwright bridges that gap. If a human can use it in a browser, Claude Code can automate it through Playwright.

This pattern, using browser automation to create programmatic access where APIs don't exist, is broadly applicable. I've used variations of it for other services that lack proper APIs. It's not as clean as a direct API call. It's slower, more fragile, and more dependent on UI stability. But it works, and "works with some caveats" beats "impossible because there's no API" every single time.

The key insight is that Playwright MCP turns every website into a potential tool for Claude Code. NotebookLM is one instantiation of that. But the same approach could work for any web application where you need programmatic access and the vendor hasn't provided an API.

The Honest Limitations

I'm not going to pretend this is a perfect solution. There are real trade-offs.

Speed. A query takes 15-30 seconds. A direct API call to a RAG system takes under a second. If you need sub-second latency, browser automation isn't the answer. For my use case, interactive querying during a work session, 20 seconds is fine. I ask the question, keep reading code while it runs, and the answer appears when it's ready.

UI fragility. Google updates NotebookLM's interface periodically. When they change CSS selectors or restructure the page layout, the automation breaks. I've had to update the skill's selectors three times in the past two months. Each fix takes 10-15 minutes once you identify the broken selector, but it's maintenance that a proper API would eliminate entirely.
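One way to soften this (a pattern I'd reach for, not necessarily what the skill does today) is to keep old selectors around as ordered fallbacks, so a UI change degrades gracefully instead of breaking outright. The selector strings are illustrative guesses:

```python
from typing import Optional

# Ordered candidates for the chat input; all three strings are made up.
CHAT_INPUT_CANDIDATES = [
    'textarea[aria-label="Query box"]',   # current guess
    'textarea[placeholder*="Ask"]',       # older layout
    "form textarea",                      # last-resort structural match
]

def first_working_selector(probe, candidates: list) -> Optional[str]:
    """Return the first selector that matches something on the page.

    `probe` is any callable returning True when a selector matches
    (e.g. a wrapper around page.query_selector); injecting it keeps
    this logic testable without a browser.
    """
    for sel in candidates:
        if probe(sel):
            return sel
    return None
```

When the primary selector dies, the fix becomes "append the new one to the top of the list" instead of an outage.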

Authentication quirks. Google's authentication system is sophisticated and occasionally hostile to automation. Session tokens expire. CAPTCHA challenges appear. Two-factor authentication prompts can interrupt the flow. The persistent browser profile handles most of this, but roughly once every few weeks I need to manually re-authenticate. A minor annoyance, not a dealbreaker.

No batch queries. Each question is a separate browser session. If you want to ask ten questions about the same notebook, that's ten browser sessions, each taking 15-30 seconds. A direct integration would let you batch queries. Browser automation doesn't. For single questions during a work session, this doesn't matter. For bulk analysis, it's a limitation.

Dependent on Google. If Google deprecates NotebookLM, changes it fundamentally, or puts it behind a paywall, the skill breaks. Building on someone else's free product always carries this risk. For now, NotebookLM appears to be a strategic product for Google, not an experiment likely to be killed. But this is Google we're talking about, so the risk is nonzero.

These are real limitations. I use the skill daily despite them because the value proposition, source-grounded answers from my documents without leaving the terminal, is strong enough to justify the trade-offs. Your calculus might be different depending on your tolerance for latency and maintenance.

The Friction Argument

Here's the thing I keep coming back to. The best tools meet you where you already are.

If you're a developer, you're in the terminal for hours at a time. Every time you leave the terminal to open a browser, find a page, and interact with a web application, you're paying a context-switch tax. The tax isn't just the seconds it takes. It's the mental overhead of leaving your current task, doing something else, and then re-establishing your focus on the original task.

This skill eliminates that tax for one specific use case: asking your documents questions. It's not trying to replace NotebookLM. The web interface is still better for browsing notebooks, managing sources, and exploring documents interactively. But for the targeted query, the "I need a specific piece of information from my documentation right now" moment, the terminal integration is strictly better.

I think this is the design principle worth generalizing. Don't build replacements for existing tools. Build bridges that let you access those tools from wherever you're already working. It's the same philosophy behind my lifestyle automation skills: the terminal is where I already am, so that's where the interface should be. NotebookLM is excellent. The browser workflow is fine. But the terminal workflow is better.

Zero hallucination, full citations, and you never have to leave your terminal. That's the pitch. That's what this skill delivers. And honestly, for anyone who works with documents and lives in the terminal, it's hard to go back to the manual workflow once you've experienced the alternative.