Intro
I keep most of my personal notes in Notion. Project ideas, reading notes, snippets, half-finished thoughts. The problem with growing notes is that finding the right one becomes harder than writing a new one. Notion’s built-in search is fine for exact words, but it doesn’t help when I half-remember an idea and can’t recall the exact phrasing.
I had been wanting to play with RAG for a while, so this seemed like a good excuse. The idea was simple: ask Claude things like “what did I write about X?” or “summarise my notes on project Y” and get answers grounded in the actual content of my pages, not the model’s guesses.
A few constraints I set for myself:
- Everything local. Notes shouldn’t leave my machine during indexing. No third-party embedding API.
- No custom UI. Claude Code already gives me a great chat interface. Building another one made no sense.
- MCP-first. Expose the search as an MCP tool, let Claude Code do the talking.
How it works
The pipeline is small and boring, which is exactly what I wanted.
Claude Code ──stdio MCP──▶ notion-rag (Python)
                                │
            ┌───────────────────┼───────────────────┐
            ▼                   ▼                   ▼
         Ollama              Qdrant            Notion API
      (nomic-embed)      (vector store)       (live pages)
Indexing (scripts/ingest.py) does this:
- For each root page ID, fetch the page and all its blocks via the Notion API.
- Recursively descend into child_page blocks; every subpage becomes its own document.
- Skip child_database blocks (I deliberately don’t index databases, only pages).
- Convert blocks to markdown, chunk with RecursiveCharacterTextSplitter (1000 chars, 200 overlap).
- Embed each chunk locally with nomic-embed-text through Ollama (768-dim).
- Upsert into a Qdrant collection with metadata: page_id, page_title, page_url, last_edited_time.
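The walk-and-chunk part of these steps can be sketched roughly as below. This is a minimal, stdlib-only illustration, not the actual scripts/ingest.py: the in-memory PAGES dict stands in for Notion API responses, and chunk_text is a naive fixed-window splitter standing in for RecursiveCharacterTextSplitter, which additionally prefers paragraph and sentence boundaries. The names PAGES, collect_documents, and chunk_text are hypothetical.

```python
from typing import Iterator

# Hypothetical stand-in for Notion API responses: page_id -> (title, blocks).
PAGES = {
    "root": ("Root", [
        {"type": "paragraph", "text": "Top-level note."},
        {"type": "child_page", "id": "sub"},
        {"type": "child_database", "id": "db"},  # skipped on purpose
    ]),
    "sub": ("Subpage", [{"type": "paragraph", "text": "Nested note."}]),
}


def collect_documents(page_id: str) -> Iterator[tuple[str, str, str]]:
    """Yield (page_id, title, markdown) for a page and every subpage below it."""
    title, blocks = PAGES[page_id]
    lines, children = [], []
    for block in blocks:
        if block["type"] == "child_page":
            children.append(block["id"])  # each subpage becomes its own document
        elif block["type"] == "child_database":
            continue  # databases are not indexed
        else:
            lines.append(block["text"])
    yield page_id, title, "\n\n".join(lines)
    for child_id in children:
        yield from collect_documents(child_id)


def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Naive fixed-window chunking; consecutive chunks share `overlap` characters."""
    if not text:
        return []
    step = size - overlap  # assumes overlap < size
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


for page_id, title, markdown in collect_documents("root"):
    print(page_id, title, len(chunk_text(markdown)))
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.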
The reindex is idempotent. Before inserting new chunks for a page, all old chunks for that page_id
are deleted. So I can re-run the script whenever I want without ending up with duplicates.
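The delete-then-insert pattern is what makes re-runs safe. A minimal sketch, with a plain list standing in for the Qdrant collection (against a real collection this would be a delete filtered on page_id followed by an upsert); reindex_page and the point layout are hypothetical:

```python
import uuid


def reindex_page(store: list[dict], page_id: str, chunks: list[str]) -> None:
    """Replace all stored chunks for page_id with the given chunks."""
    # Step 1: drop every old chunk that belongs to this page.
    store[:] = [point for point in store if point["page_id"] != page_id]
    # Step 2: insert the fresh chunks, each with a new point ID.
    for index, text in enumerate(chunks):
        store.append({
            "id": str(uuid.uuid4()),
            "page_id": page_id,
            "chunk_index": index,
            "text": text,
        })


store: list[dict] = []
reindex_page(store, "page-a", ["first", "second"])
reindex_page(store, "page-b", ["other"])
reindex_page(store, "page-a", ["first", "second"])  # re-run: no duplicates
print(len(store))  # 3
```

Deleting first also covers the case where a page shrinks: if it used to produce five chunks and now produces two, nothing stale is left behind.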
On the query side, the MCP server exposes two tools:
- search_notion(query, k): embeds the query with the same model, runs qdrant.query_points, returns the top-k chunks as markdown with links back to Notion.
- get_full_page(page_id): bypasses Qdrant and fetches the page fresh from the Notion API. Claude calls it when a chunk is too short to answer the question.
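What search_notion hands back to Claude can be sketched as follows. This is an illustrative stand-in, not the server code: a brute-force cosine similarity over an in-memory list replaces qdrant.query_points, the vectors are toy 3-dimensional values rather than 768-dim embeddings, the URLs are placeholders, and cosine, top_k, and format_hits are hypothetical helper names.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], points: list[dict], k: int = 5) -> list[dict]:
    """Return the k points whose vectors are most similar to the query."""
    ranked = sorted(points, key=lambda p: cosine(query_vec, p["vector"]), reverse=True)
    return ranked[:k]


def format_hits(hits: list[dict]) -> str:
    """Render hits as markdown, each with a link back to its source page."""
    return "\n\n".join(
        f"### [{h['page_title']}]({h['page_url']})\n{h['text']}" for h in hits
    )


points = [
    {"vector": [1.0, 0.0, 0.0], "page_title": "RAG ideas",
     "page_url": "https://example.com/rag", "text": "Try local embeddings."},
    {"vector": [0.0, 1.0, 0.0], "page_title": "Groceries",
     "page_url": "https://example.com/food", "text": "Buy coffee."},
]
print(format_hits(top_k([0.9, 0.1, 0.0], points, k=1)))
```

Returning the page link and title alongside each chunk is what lets Claude cite the source and, when needed, follow up with get_full_page.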
That’s it. No reranking, no hybrid search, no auto-discovery, no cron job. Just enough to be useful.
A few lessons
Local embeddings are fine for personal use. I was worried nomic-embed-text would be too weak,
but for my volume of notes the recall is good enough that I don’t notice. If I needed better Polish
support I’d swap it for bge-m3, but I haven’t felt the need yet.
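For reference, a local embedding call is just an HTTP request to the Ollama server. A sketch assuming Ollama's default address (http://localhost:11434) and its /api/embeddings endpoint; build_embed_request and embed are hypothetical helper names, and a client library would work just as well:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/embeddings"  # assumed default address


def build_embed_request(text: str, model: str = "nomic-embed-text") -> urllib.request.Request:
    """Build the POST request that asks Ollama to embed one piece of text."""
    body = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text: str, model: str = "nomic-embed-text") -> list[float]:
    """Send the request and return the embedding vector (needs a running Ollama)."""
    with urllib.request.urlopen(build_embed_request(text, model)) as response:
        return json.loads(response.read())["embedding"]
```

Swapping models is then a one-argument change, though a model with a different vector size means recreating the collection and reindexing everything.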
Idempotent reindex beats incremental. I briefly considered tracking last_edited_time to only
reindex changed pages, but it’s premature. A full reindex of all my notes takes under a minute and I
run it manually when I remember to. YAGNI until it isn’t.
Skipping databases was the right call. My actual notes live in regular pages with subpages. Notion databases in my workspace are mostly trackers and tables — structured data that doesn’t make sense to chunk and embed. Trying to index everything would have added complexity for no gain.
MCP over a custom UI. This is the part I’m happiest with. The whole “frontend” is one JSON entry
in ~/.claude.json pointing at the Python module, and Claude Code handles the rest. No auth, no web
server, no React app I’d have to maintain. The boundary is small and the value is immediate.
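For a sense of scale, the registration amounts to something like the entry built below. A sketch only: the mcpServers layout with command and args follows Claude Code's stdio server configuration, but the server name and the module path notion_rag.server are placeholders.

```python
import json

# Placeholder values: the server name and module path are illustrative.
entry = {
    "mcpServers": {
        "notion-rag": {
            "type": "stdio",
            "command": "python",
            "args": ["-m", "notion_rag.server"],
        }
    }
}

print(json.dumps(entry, indent=2))
```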
Claude knows when to drill down. Watching Claude decide on its own to call get_full_page after
search_notion returned a short chunk was satisfying. The two-tool design fell out naturally — I
didn’t have to engineer prompts for it.
What I deliberately left out
This isn’t a finished product, it’s a tool I use. The things I considered and skipped:
- Incremental reindex by last_edited_time
- A reranker layer (bge-reranker-v2-m3 would be the obvious pick)
- Hybrid search (BM25 + dense, Qdrant supports it)
- Metadata filters in the MCP tool (search_recent, search_in_section)
- Auto-discovery of pages instead of a hand-maintained list
- A cache for get_full_page
- Other sources: PDFs, markdown from disk, Git repos
All of these would be reasonable, but none of them solves a problem I currently have.
Is this better than Notion’s official MCP?
Honestly, for most people, no. Notion has its own MCP connector that does semantic search over your workspace, and it’s a faster path to “ask Claude about my notes”. I kept this project because:
- I have full control over the pipeline if I want to add reranking or hybrid search later.
- Embedding and vector search run entirely on my machine.
- I can extend it to non-Notion sources with the same code.
- I learned a lot building it.
The last point is probably the real reason. Reading about RAG is one thing; wiring up Ollama + Qdrant + the Notion API + MCP yourself makes the trade-offs concrete in a way blog posts can’t.
Closing
The repo lives at ~/CodeWork/ai/personal-notion-rag on my machine. It’s about 400 lines of Python
across five files, and it does exactly one thing. That’s the kind of project I enjoy most — small enough
to hold in my head, useful enough to keep using.
If I extend it next, it’ll probably be to add markdown files from disk as a second source. But only when I actually need it.