PaperCortex: Adding a Brain to Your Document Archive

Paperless-ngx is great at storing documents. It's terrible at understanding them. PaperCortex fixes that.

Rene Fichtmueller / 2026-04-05 / ~1 min read

I have a Paperless-ngx instance with thousands of documents. Invoices, contracts, receipts, technical specs, tax records. Paperless does OCR, stores them, lets me tag them. It's excellent at what it does.

But it doesn't understand anything. Search is keyword-based. If I search for "hotel expenses" and the receipt says "Marriott Bonn — Accommodation", Paperless won't find it. There's no semantic understanding. No automatic classification. No financial data extraction.

I built PaperCortex because I was spending hours every month manually tagging documents and extracting numbers from receipts for expense reports.

// what it does

// papercortex capabilities

semantic search	find by meaning, not keywords
auto-classification	type, category, correspondent, dates
receipt extraction	vendor, amounts, tax, line items
bank statement matching	fuzzy match receipts to transactions
DATEV export	German tax standard format
natural language queries	"How much on travel in Q1?"
MCP server	5 tools for Claude Code integration

// semantic search changes everything

Search for "accommodation costs Germany" and PaperCortex will find your Marriott receipt from Bonn, the Airbnb invoice from Munich, and the hotel booking confirmation from Berlin, even if none of them contains the word "accommodation". It compares the meaning of the query with the meaning of each document instead of matching strings.

This runs on local embeddings via Ollama (nomic-embed-text). The vectors are stored in SQLite with HNSW indexing. No cloud. No API costs. Your documents never leave your machine.
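To make "find by meaning" concrete, here is a minimal sketch of the ranking step, not PaperCortex's actual code. It assumes each document already has an embedding vector and scores them against the query vector with cosine similarity. The names (EmbeddedDoc, rankBySimilarity) and the 3-dimensional vectors are made up for illustration, and the brute-force scan stands in for the HNSW index, which would return approximately the same neighbours without touching every row.

```typescript
// Toy sketch: rank documents by cosine similarity to a query vector.
// Real vectors would come from the local embedding model; these
// 3-dimensional ones are invented for the example.

interface EmbeddedDoc {
  id: string;
  vector: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat it as unrelated.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force scan over all documents (hypothetical helper).
function rankBySimilarity(
  query: number[],
  docs: EmbeddedDoc[],
  topK: number
): { id: string; score: number }[] {
  return docs
    .map((d) => ({ id: d.id, score: cosineSimilarity(query, d.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

const docs: EmbeddedDoc[] = [
  { id: "marriott-bonn-receipt", vector: [0.9, 0.1, 0.0] },
  { id: "airbnb-munich-invoice", vector: [0.8, 0.2, 0.1] },
  { id: "office-chair-invoice", vector: [0.0, 0.1, 0.9] },
];

// Stand-in for the embedding of "accommodation costs Germany".
const queryVector = [1.0, 0.0, 0.0];
const results = rankBySimilarity(queryVector, docs, 2);
console.log(results.map((r) => r.id));
```

The two lodging documents rank first because their vectors point in nearly the same direction as the query, regardless of which words they contain.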

// receipt intelligence

This is the feature that saves me the most time. Drop in a receipt (scanned, photographed, PDF, doesn't matter) and PaperCortex extracts the vendor name, date, total amount, tax rate, and individual line items. It handles multi-page receipts and multiple currencies, and it copes with the German "Bewirtungsbeleg" format that makes accountants cry.

The extracted data feeds into bank statement matching, which uses fuzzy matching with a confidence score for each match and finds unmatched transactions automatically. If you're German and your tax advisor wants DATEV format, the export is one click, with SKR03/SKR04 mapping included.
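As a rough illustration of receipt-to-transaction matching with a confidence score, the sketch below combines three signals: exact amount, date proximity, and vendor-name overlap. Everything in it is an assumption made for the example, including the field names, the 0.5/0.2/0.3 weights, the 7-day window, and the 0.7 threshold. It is not taken from PaperCortex.

```typescript
// Hypothetical shapes for an extracted receipt and a bank transaction.
interface Receipt { vendor: string; date: string; total: number; }
interface Transaction { description: string; date: string; amount: number; }

// Lowercase and split on anything that is not a-z or 0-9.
// Note: this naive tokenizer also splits on umlauts.
function tokenize(s: string): string[] {
  return s.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 1);
}

// Share of vendor tokens that also appear in the bank description (0..1).
function vendorOverlap(vendor: string, description: string): number {
  const v = tokenize(vendor);
  const d = tokenize(description);
  if (v.length === 0) return 0;
  let hits = 0;
  for (let i = 0; i < v.length; i++) {
    if (d.indexOf(v[i]) !== -1) hits++;
  }
  return hits / v.length;
}

function daysBetween(a: string, b: string): number {
  return Math.abs(Date.parse(a) - Date.parse(b)) / (24 * 60 * 60 * 1000);
}

// Weighted confidence in [0, 1]; the weights are illustrative only.
function matchConfidence(r: Receipt, t: Transaction): number {
  const amountScore = Math.abs(r.total - Math.abs(t.amount)) < 0.005 ? 1 : 0;
  const dateScore = Math.max(0, 1 - daysBetween(r.date, t.date) / 7);
  const nameScore = vendorOverlap(r.vendor, t.description);
  return 0.5 * amountScore + 0.2 * dateScore + 0.3 * nameScore;
}

// Best transaction at or above the threshold, or null if none qualifies.
function bestMatch(
  r: Receipt,
  txs: Transaction[],
  threshold: number = 0.7
): { tx: Transaction; confidence: number } | null {
  let best: { tx: Transaction; confidence: number } | null = null;
  for (const tx of txs) {
    const confidence = matchConfidence(r, tx);
    if (confidence >= threshold && (best === null || confidence > best.confidence)) {
      best = { tx, confidence };
    }
  }
  return best;
}

const receipt: Receipt = { vendor: "Marriott Bonn", date: "2026-03-14", total: 189.0 };
const transactions: Transaction[] = [
  { description: "REWE SAGT DANKE", date: "2026-03-14", amount: -23.4 },
  { description: "MARRIOTT HOTEL BONN DE", date: "2026-03-16", amount: -189.0 },
];

const match = bestMatch(receipt, transactions);
console.log(match ? match.tx.description : "no match");
```

A transaction for which no receipt clears the threshold would be reported as unmatched. A real matcher would also need to deal with currency conversion and partial payments, which this sketch ignores.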

// mcp server for claude code

PaperCortex exposes five tools via the Model Context Protocol:

search — semantic document search
classify — auto-classify a document
extract — pull structured data from a document
query — natural language questions about your archive
export — generate DATEV or CSV exports

This means I can ask Claude: "How much did we spend on office supplies in March?" and it queries my actual document archive to give me a real answer with source references.
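For readers who have not seen MCP tools before, here is a sketch of how five tools like these could be described to a client. An MCP tool listing carries a name, a description, and a JSON Schema for its input. The tool names and descriptions come from the list above, while the argument names (query, limit, documentId, question, format) are guesses for illustration and may not match what PaperCortex actually accepts.

```typescript
// Minimal shape of an MCP tool descriptor: name, description, and a
// JSON Schema describing the input object.
interface ToolDescriptor {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, { type: string; description?: string }>;
    required?: string[];
  };
}

// Argument names below are hypothetical.
const tools: ToolDescriptor[] = [
  {
    name: "search",
    description: "semantic document search",
    inputSchema: {
      type: "object",
      properties: { query: { type: "string" }, limit: { type: "number" } },
      required: ["query"],
    },
  },
  {
    name: "classify",
    description: "auto-classify a document",
    inputSchema: {
      type: "object",
      properties: { documentId: { type: "number" } },
      required: ["documentId"],
    },
  },
  {
    name: "extract",
    description: "pull structured data from a document",
    inputSchema: {
      type: "object",
      properties: { documentId: { type: "number" } },
      required: ["documentId"],
    },
  },
  {
    name: "query",
    description: "natural language questions about your archive",
    inputSchema: {
      type: "object",
      properties: { question: { type: "string" } },
      required: ["question"],
    },
  },
  {
    name: "export",
    description: "generate DATEV or CSV exports",
    inputSchema: {
      type: "object",
      properties: { format: { type: "string", description: "e.g. datev or csv" } },
      required: ["format"],
    },
  },
];

// Returns the required arguments that a call is missing (hypothetical helper).
function missingArguments(toolName: string, args: Record<string, unknown>): string[] {
  let tool: ToolDescriptor | undefined;
  for (const t of tools) {
    if (t.name === toolName) tool = t;
  }
  if (!tool) throw new Error("unknown tool: " + toolName);
  return (tool.inputSchema.required || []).filter((key) => args[key] === undefined);
}

console.log(missingArguments("search", { limit: 5 }));
```

The schema is what lets the client know that "How much did we spend on office supplies in March?" should be passed as a string argument to the query tool rather than sent as free text.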

100% local. No cloud. No subscriptions. Available on GitHub, MIT licensed.