Knowledge RAG System Setup

Date: 2026-02-01
Status: ✅ Operational
LLM Provider: Claude CLI (Max plan - unlimited)

Architecture

Paperless-ngx (OCR/Index) → Document Export → ChromaDB (GPU Embeddings) → Query API
                                                                            ↓
                                                                      Claude CLI
                                                                   (Max plan LLM)

Pipeline Changes (2026-02-01)

Removed: Direct Anthropic API (credits-based)
Added: Claude CLI (Max plan - unlimited) as LLM provider

Benefits:

  • Unlimited usage via Max subscription
  • Consistent quality across all domains
  • No API key/credits management
  • Unified CLI for all knowledge systems (PatentBot, RAG, domains)
  • GPU dedicated to embeddings only

Components

1. Paperless-ngx (Document Management)

  • Location: Docker container paperless
  • Web UI: http://localhost:8000
  • Consume folder: /home/shdwdev/docker-services/paperless/consume/
  • API Token: 291bc34a562298e93b720ae08e0e02988b0dfe00

Docker compose: /home/shdwdev/docker-services/paperless/docker-compose.yml

Key features:

  • OCR processing with Tesseract
  • Full-text search
  • Tag-based organization
  • REST API for document retrieval
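
The REST API can be exercised from Python with nothing beyond the standard library. A minimal sketch, assuming token auth against the `/api/documents/` endpoint (helper names are illustrative, not from knowledge_rag.py):

```python
import json
import os
import urllib.request


def paperless_request(path, base_url=None, token=None):
    """Build an authenticated request for the Paperless-ngx REST API."""
    base_url = base_url or os.environ.get("PAPERLESS_URL", "http://localhost:8000")
    token = token or os.environ.get("PAPERLESS_TOKEN", "")
    return urllib.request.Request(
        f"{base_url}{path}",
        headers={"Authorization": f"Token {token}"},
    )


def list_documents():
    """Fetch the first page of indexed documents."""
    req = paperless_request("/api/documents/")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["results"]
```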

2. Knowledge RAG System

  • Location: /home/shdwdev/projects/knowledge-rag/
  • Python venv: ./venv/ (5.2GB with PyTorch CUDA)
  • ChromaDB: ./chroma_db/ (4.5GB+ with 721K+ chunks)
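
Documents are split into chunks before embedding. The actual splitter lives in knowledge_rag.py; the sketch below is a generic overlapping word-window version (window and overlap sizes are illustrative assumptions):

```python
def chunk_text(text, size=200, overlap=40):
    """Split text into overlapping word windows for embedding.

    size/overlap are counted in words here; production pipelines
    often count model tokens instead.
    """
    words = text.split()
    if not words:
        return []
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break
    return chunks
```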

Environment (.env):

PAPERLESS_URL=http://localhost:8000
PAPERLESS_TOKEN=291bc34a562298e93b720ae08e0e02988b0dfe00
CHROMA_PATH=./chroma_db

3. Claude CLI (LLM Provider)

  • Provider: Claude CLI (Max plan subscription)
  • Location: ~/.local/bin/claude
  • Use Cases: RAG queries, entity extraction, gap detection, reasoning
  • Note: No API key needed - uses Max subscription auth
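
Routing a RAG query through the CLI amounts to a plain subprocess call. A sketch, assuming the CLI's non-interactive print mode (`claude -p`); the prompt builder is illustrative, not the one in knowledge_rag.py:

```python
import subprocess


def build_rag_prompt(question, chunks):
    """Assemble a grounded prompt from retrieved chunks."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, 1))
    return (
        "Answer using only the context below. Cite chunk numbers.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


def ask_claude(prompt):
    """Send the prompt to the Claude CLI in non-interactive mode."""
    result = subprocess.run(
        ["claude", "-p", prompt],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```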

4. GPU Acceleration (Embeddings Only)

  • GPU: NVIDIA GeForce RTX 2060 SUPER (8GB VRAM)
  • CUDA: 13.0
  • Driver: 580.119.02
  • PyTorch: 2.5.1+cu121
  • Embedding Model: sentence-transformers/all-MiniLM-L6-v2 (CUDA)
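
The embedding step is a batched encode on the CUDA device; retrieval then ranks chunks by cosine similarity. A sketch (the `embed` call assumes sentence-transformers is installed; the pure `cosine` helper works anywhere):

```python
import math


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def embed(texts, device="cuda"):
    """Encode texts with the model named above, on the GPU by default."""
    # Lazy import so cosine() is usable without the CUDA stack installed.
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2", device=device)
    return model.encode(texts, batch_size=64, show_progress_bar=False)
```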

Usage

Sync Documents from Paperless

cd /home/shdwdev/projects/knowledge-rag
source venv/bin/activate
export $(cat .env | xargs)
python3 knowledge_rag.py sync

List Indexed Documents

python3 knowledge_rag.py list

Search Knowledge Base

python3 knowledge_rag.py search "OSINT social media techniques"

RAG Query

python3 knowledge_rag.py query "What are the best OSINT techniques for social media?"

Python API

from knowledge_rag import KnowledgeBase
 
kb = KnowledgeBase()
kb.sync_from_paperless()  # Sync new documents
 
# Search only (no LLM)
results = kb.search("Greek mystery rites", n_results=5)
 
# Full RAG with Claude
answer = kb.query("What herbs were used in ancient Greek rituals?")
print(answer["answer"])
print(answer["sources"])

Current Corpus (as of 2026-02-01)

Total chunks: 721,275+

Corpus            Chunks
science          554,335
engineering       87,746
tech              43,067
math              26,311
knowledge_base     7,127
greek              1,135
esoteric             846
quantum_biology      555
biohacking           153

Performance

Operation                 GPU (RTX 2060)
Embed 1000 docs           ~2 minutes
Search query              ~200ms
RAG query (Claude CLI)    ~2-5 seconds
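
These latencies can be reproduced with a small timing wrapper (helper name is illustrative):

```python
import time


def timed(fn, *args, **kwargs):
    """Return a call's result and its wall-clock latency in milliseconds."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms
```

Usage: `results, ms = timed(kb.search, "Greek mystery rites", n_results=5)`.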

Maintenance

Re-sync after new Paperless documents

python3 knowledge_rag.py sync

(Only indexes new/modified documents)

Clear and rebuild index

rm -rf ./chroma_db
python3 knowledge_rag.py sync

Troubleshooting

GPU not detected

python3 -c "import torch; print(torch.cuda.is_available())"
# Should print: True

Claude CLI not found

which claude  # Should show ~/.local/bin/claude
claude --version  # Verify working

If missing, install via: npm install -g @anthropic-ai/claude-code

Paperless API 403 Forbidden

Regenerate token:

docker exec paperless python3 manage.py shell -c "
from django.contrib.auth.models import User
from rest_framework.authtoken.models import Token
user = User.objects.filter(is_superuser=True).first()
Token.objects.filter(user=user).delete()
token = Token.objects.create(user=user)
print(token.key)"