Knowledge RAG System Setup
Date: 2026-02-01
Status: ✅ Operational
LLM Provider: Claude CLI (Max plan - unlimited)
Architecture
Paperless-ngx (OCR/Index) → Document Export → ChromaDB (GPU Embeddings) → Query API → Claude CLI (Max plan LLM)
Pipeline Changes (2026-02-01)
Removed: Direct Anthropic API (credits-based)
Added: Claude CLI (Max plan - unlimited) as LLM provider
Benefits:
- Unlimited usage via Max subscription
- Consistent quality across all domains
- No API key/credits management
- Unified CLI for all knowledge systems (PatentBot, RAG, domains)
- GPU dedicated to embeddings only
Components
1. Paperless-ngx (Document Management)
- Location: Docker container paperless
- Web UI: http://localhost:8000
- Consume folder: /home/shdwdev/docker-services/paperless/consume/
- API Token: 291bc34a562298e93b720ae08e0e02988b0dfe00
Docker compose: /home/shdwdev/docker-services/paperless/docker-compose.yml
Key features:
- OCR processing with Tesseract
- Full-text search
- Tag-based organization
- REST API for document retrieval
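The REST API can be exercised directly from Python. The following is a minimal sketch, assuming the standard Paperless-ngx `/api/documents/` endpoint with `Authorization: Token ...` headers and paginated JSON responses (`results`, `next`). The helper names `documents_url`, `build_request`, and `iter_documents` are hypothetical and are not taken from `knowledge_rag.py`.

```python
import json
import urllib.request


def documents_url(base_url: str) -> str:
    """Return the document listing endpoint for a Paperless-ngx base URL."""
    return base_url.rstrip("/") + "/api/documents/"


def build_request(url: str, token: str) -> urllib.request.Request:
    """Build a GET request carrying the Paperless API token."""
    return urllib.request.Request(url, headers={"Authorization": f"Token {token}"})


def iter_documents(base_url: str, token: str):
    """Yield document records one by one, following the `next` page links."""
    url = documents_url(base_url)
    while url:
        with urllib.request.urlopen(build_request(url, token), timeout=30) as response:
            page = json.load(response)
        yield from page.get("results", [])
        url = page.get("next")  # None on the last page, which ends the loop
```

Calling `iter_documents("http://localhost:8000", token)` with a valid token should yield dictionaries that include fields such as `id` and `title`.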
2. Knowledge RAG System
- Location: /home/shdwdev/projects/knowledge-rag/
- Python venv: ./venv/ (5.2GB with PyTorch CUDA)
- ChromaDB: ./chroma_db/ (4.5GB+ with 721K+ chunks)
Environment (.env):
PAPERLESS_URL=http://localhost:8000
PAPERLESS_TOKEN=291bc34a562298e93b720ae08e0e02988b0dfe00
CHROMA_PATH=./chroma_db
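When the tool is driven from Python rather than a shell, the same `.env` file can be read without exporting it first. This is a minimal sketch, assuming the file contains only simple `KEY=VALUE` lines like the ones above. `parse_env` and `load_env` are hypothetical helpers, not functions from `knowledge_rag.py`.

```python
import os


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env(path: str = ".env") -> dict[str, str]:
    """Read the file and copy its values into os.environ, keeping existing ones."""
    with open(path, encoding="utf-8") as handle:
        values = parse_env(handle.read())
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values
```

Using `setdefault` means a variable already exported in the shell takes precedence over the file, which keeps one-off overrides simple.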
3. Claude CLI (LLM Provider)
- Provider: Claude CLI (Max plan subscription)
- Location: ~/.local/bin/claude
- Use Cases: RAG queries, entity extraction, gap detection, reasoning
- Note: No API key needed - uses Max subscription auth
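For the RAG use case, one plausible way to hand retrieved text to the CLI is to assemble a prompt and run `claude -p` (print mode) as a subprocess. The sketch below assumes that approach. The prompt wording and the names `build_prompt` and `ask_claude` are hypothetical, and the actual `knowledge_rag.py` implementation may differ.

```python
import subprocess


def build_prompt(question: str, chunks: list[str]) -> str:
    """Combine retrieved chunks and the question into one prompt string."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the numbered context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


def ask_claude(prompt: str, timeout: int = 120) -> str:
    """Run the Claude CLI in non-interactive print mode and return its stdout."""
    result = subprocess.run(
        ["claude", "-p", prompt],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()
```

Numbering the chunks lets the model refer back to a specific source, which makes it easier to map an answer to the documents it came from.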
4. GPU Acceleration (Embeddings Only)
- GPU: NVIDIA GeForce RTX 2060 SUPER (8GB VRAM)
- CUDA: 13.0
- Driver: 580.119.02
- PyTorch: 2.5.1+cu121
- Embedding Model: sentence-transformers/all-MiniLM-L6-v2 (CUDA)
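The embedding step can be sketched as follows, assuming documents are split into overlapping character windows before being embedded and stored. The chunk size, overlap, collection name, and the helpers `chunk_text`, `pick_device`, and `index_document` are all assumptions for illustration. Only the model name and the `./chroma_db` path come from this document.

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into character windows of `size` that overlap by `overlap`."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    step = size - overlap
    return [text[start:start + size] for start in range(0, len(text), step)]


def pick_device() -> str:
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def index_document(doc_id: str, text: str, chroma_path: str = "./chroma_db") -> int:
    """Embed one document's chunks and store them; returns the chunk count."""
    # Imported lazily so the pure helpers above work without the heavy packages.
    import chromadb
    from sentence_transformers import SentenceTransformer

    chunks = chunk_text(text)
    if not chunks:
        return 0

    model = SentenceTransformer(
        "sentence-transformers/all-MiniLM-L6-v2", device=pick_device()
    )
    embeddings = model.encode(chunks, batch_size=64).tolist()

    client = chromadb.PersistentClient(path=chroma_path)
    # Collection name is an assumption borrowed from the corpus table.
    collection = client.get_or_create_collection("knowledge_base")
    collection.upsert(
        ids=[f"{doc_id}-{i}" for i in range(len(chunks))],
        documents=chunks,
        embeddings=embeddings,
    )
    return len(chunks)
```

Falling back to the CPU keeps the script usable when the GPU is unavailable, at the cost of slower embedding.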
Usage
Sync Documents from Paperless
cd /home/shdwdev/projects/knowledge-rag
source venv/bin/activate
export $(cat .env | xargs)
python3 knowledge_rag.py sync
List Indexed Documents
python3 knowledge_rag.py list
Search Knowledge Base
python3 knowledge_rag.py search "OSINT social media techniques"
RAG Query
python3 knowledge_rag.py query "What are the best OSINT techniques for social media?"
Python API
from knowledge_rag import KnowledgeBase
kb = KnowledgeBase()
kb.sync_from_paperless() # Sync new documents
# Search only (no LLM)
results = kb.search("Greek mystery rites", n_results=5)
# Full RAG with Claude
answer = kb.query("What herbs were used in ancient Greek rituals?")
print(answer["answer"])
print(answer["sources"])
Current Corpus (as of 2026-02-01)
Total chunks: 721,275+
| Corpus | Chunks |
|---|---|
| science | 554,335 |
| engineering | 87,746 |
| tech | 43,067 |
| math | 26,311 |
| knowledge_base | 7,127 |
| greek | 1,135 |
| esoteric | 846 |
| quantum_biology | 555 |
| biohacking | 153 |
Performance
| Operation | Typical time (RTX 2060 SUPER) |
|---|---|
| Embed 1000 docs | ~2 minutes |
| Search query | ~200ms |
| RAG query (Claude CLI) | ~2-5 seconds |
Maintenance
Re-sync after new Paperless documents
python3 knowledge_rag.py sync
(Only indexes new/modified documents)
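One way such an incremental sync could decide what to index is to remember each document's `modified` timestamp from the Paperless API and skip documents whose timestamp has not changed. This is a sketch under that assumption and not a description of how `knowledge_rag.py` actually tracks state. The state file format and the helpers `select_changed`, `load_state`, and `save_state` are hypothetical.

```python
import json
from pathlib import Path


def select_changed(documents: list[dict], seen: dict[str, str]) -> list[dict]:
    """Return documents that are new or whose `modified` stamp differs from `seen`."""
    return [doc for doc in documents if seen.get(str(doc["id"])) != doc["modified"]]


def load_state(path: Path) -> dict[str, str]:
    """Load the id -> modified mapping, or an empty mapping on the first run."""
    return json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}


def save_state(path: Path, seen: dict[str, str], indexed: list[dict]) -> None:
    """Record the documents that were just indexed and write the mapping back."""
    seen.update({str(doc["id"]): doc["modified"] for doc in indexed})
    path.write_text(json.dumps(seen, indent=2), encoding="utf-8")
```

Document IDs are stored as strings because JSON object keys are always strings, so the lookup in `select_changed` converts them the same way.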
Clear and rebuild index
rm -rf ./chroma_db
python3 knowledge_rag.py sync
Troubleshooting
GPU not detected
python3 -c "import torch; print(torch.cuda.is_available())"
# Should print: True
Claude CLI not found
which claude # Should show ~/.local/bin/claude
claude --version # Verify working
If missing, install via: npm install -g @anthropic-ai/claude-code
Paperless API 403 Forbidden
Regenerate token:
docker exec paperless python3 manage.py shell -c "
from django.contrib.auth.models import User
from rest_framework.authtoken.models import Token
user = User.objects.filter(is_superuser=True).first()
Token.objects.filter(user=user).delete()
token = Token.objects.create(user=user)
print(token.key)"