Tech Corpus Pipeline

Strategy for processing tech library assets into specialized RAG corpora.

📁 Source Assets

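| Category | Files | Size | Location / Notes |
|---|---|---|---|
| Tech (root) | 757 | 15 GB | `/mnt/storage/books_organized/Tech/` |
| Tech/AI-ML | ~50 | ~2 GB | `/mnt/storage/books_organized/Tech/AI-ML/` (dedicated AI/ML subfolder) |
| Tech/Security | ~30 | ~1 GB | `/mnt/storage/books_organized/Tech/Security/` (dedicated security subfolder) |
| Tech/Programming | ~100 | ~3 GB | Languages, frameworks |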
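Several of the counts and sizes above are approximate. The sketch below is one minimal way to recompute them. It assumes the books are regular files and that the Programming subfolder lives at `/mnt/storage/books_organized/Tech/Programming`; the table only names the category, so that path is an assumption. The table does not say whether the 757 figure includes the subfolders, so the sketch reports the Tech root both top-level only and recursively. `inventory` and `human` are hypothetical helpers, not part of patentbot.py.

```python
from __future__ import annotations

from pathlib import Path

TECH = Path("/mnt/storage/books_organized/Tech")
# Assumed subfolder names; the exact Programming path is not stated in the note.
SUBFOLDERS = ("AI-ML", "Security", "Programming")


def inventory(folder: Path, recursive: bool = True) -> tuple[int, int]:
    """Return (file_count, total_bytes) for regular files under `folder`."""
    if not folder.is_dir():
        return 0, 0
    pattern = "**/*" if recursive else "*"
    files = [p for p in folder.glob(pattern) if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


def human(nbytes: float) -> str:
    """Format a byte count in binary units (1G = 1024**3 bytes)."""
    for unit in ("B", "K", "M", "G"):
        if nbytes < 1024:
            return f"{nbytes:.1f}{unit}"
        nbytes /= 1024
    return f"{nbytes:.1f}T"


if __name__ == "__main__":
    for label, recursive in (("top level", False), ("recursive", True)):
        count, size = inventory(TECH, recursive)
        print(f"Tech ({label}): {count} files, {human(size)}")
    for sub in SUBFOLDERS:
        count, size = inventory(TECH / sub)
        print(f"Tech/{sub}: {count} files, {human(size)}")
```

If the top-level figure comes out near 757, the root row excludes the subfolders. If the recursive figure does, it includes them.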

🎯 Corpus Strategy

1. `osint_corpus` Enhancement

Source: OSINT-specific books (already indexed)
Add: Hacking/OSINT crossover titles from the Tech root
Books to add:
- Practical Approach to Open Source Intelligence Vol 1 & 2
- Grey Area: Dark Web Data Collection and OSINT
- Hacking Web Intelligence
- Digital Forensics for Enterprises Beyond Kali Linux

2. `ai_corpus` (NEW) — Dreadbot Self-Improvement

Purpose: Reference material for Dreadbot's self-improvement upgrades
Source: `/mnt/storage/books_organized/Tech/AI-ML/` plus AI books from the Tech root
Key texts:
- Building AI Agents with LLMs, RAG, and Knowledge Graphs
- Building Agentic AI Systems
- Agentic AI: Theories and Practices
- Building LLM Agents with RAG, Knowledge Graphs and Reflection
- LLMOps: Managing Large Language Models in Production
- Generative AI with LangChain
- Knowledge Graphs and LLMs in Action
- Large Language Models: The Hard Parts

3. `security_corpus` — NTS Security Library

Purpose: NTS security consulting reference
Source: `/mnt/storage/books_organized/Tech/Security/` plus security books from the Tech root
Key texts:
- CompTIA Security+ guides
- CISM/CISA study guides
- Metasploit: The Penetration Tester’s Guide
- Santos: Redefining Hacking (comprehensive)
- Infrastructure Attack Strategies for Ethical Hacking
- Pentesting Active Directory
- Vulnerability Assessment and Penetration Testing (VAPT)
- Offensive Security Using Python
- Cryptography Algorithms

4. `tech_corpus` — General Tech Reference

Purpose: NTS infrastructure knowledge base
Source: Remaining Tech books not assigned to another corpus
Focus areas:
- Cloud (AWS, Azure, GCP)
- DevOps/Kubernetes
- Programming best practices
- Networking
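The Source lines above imply a routing rule:

- The two dedicated subfolders map directly to their corpora.
- AI and security books in the Tech root are picked out by title.
- Everything else falls through to `tech_corpus`.

The sketch below is a minimal version of that rule, using hypothetical names (`route`, `KEYWORD_RULES`). The keyword patterns are assumptions: they are case-insensitive, match whole words, and check security before AI. Under them, titles such as "Pentesting Active Directory" or "Cryptography Algorithms" land in `security_corpus`. Files in other subfolders, such as Tech/Programming, go to `tech_corpus` without a keyword check.

```python
from __future__ import annotations

import re
from pathlib import PurePosixPath

TECH_ROOT = PurePosixPath("/mnt/storage/books_organized/Tech")

# Dedicated subfolders map straight to a corpus (from the Source lines above).
FOLDER_CORPUS = {"AI-ML": "ai_corpus", "Security": "security_corpus"}

# Hypothetical title rules for books sitting directly in the Tech root.
# Assumptions: case-insensitive, whole-word matches, security checked first.
KEYWORD_RULES = [
    ("security_corpus", re.compile(
        r"\b(security|hacking|pentest\w*|penetration|cism|cisa|comptia"
        r"|forensic\w*|cryptograph\w*)\b", re.IGNORECASE)),
    ("ai_corpus", re.compile(
        r"\b(ai|llm\w*|agent\w*|machine learning|deep learning|neural"
        r"|large language models?)\b", re.IGNORECASE)),
]


def route(path: str) -> str:
    """Pick a corpus for one book; `path` must lie under TECH_ROOT."""
    rel = PurePosixPath(path).relative_to(TECH_ROOT)
    if len(rel.parts) > 1:
        # Inside a subfolder: dedicated folders map directly, the rest is general tech.
        return FOLDER_CORPUS.get(rel.parts[0], "tech_corpus")
    # Root-level file: turn "_", "-" and "." into spaces so \b sees word boundaries.
    title = re.sub(r"[_\-.]+", " ", rel.stem)
    for corpus, pattern in KEYWORD_RULES:
        if pattern.search(title):
            return corpus
    return "tech_corpus"
```

The OSINT crossover titles are not handled here. Under these rules, "Hacking Web Intelligence" would route to `security_corpus`, so those titles need to be picked out by name first.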

📋 Ingest Commands

```bash
cd ~/projects/knowledge-rag
source venv/bin/activate

# AI Corpus (Dreadbot upgrades)
python patentbot.py ingest \
  --corpus ai_corpus \
  --source /mnt/storage/books_organized/Tech/AI-ML/ \
  --filter "AI|LLM|Agent|Machine Learning|Deep Learning|Neural"

# Security Corpus (NTS)
python patentbot.py ingest \
  --corpus security_corpus \
  --source /mnt/storage/books_organized/Tech/Security/ \
  --filter "Security|Hacking|Penetration|CISM|CompTIA|Forensic"

# Tech General (NTS infrastructure)
python patentbot.py ingest \
  --corpus tech_corpus \
  --source /mnt/storage/books_organized/Tech/ \
  --exclude "AI-ML|Security"
```
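The `--filter` patterns look like regular-expression alternations. This note does not document how patentbot.py applies them. The sketch below assumes it runs something like Python's `re.search` against each title or filename, and checks which key texts listed under Corpus Strategy each pattern would drop. `unmatched`, `FILTERS`, and `KEY_TEXTS` are hypothetical names. The list entries are descriptive titles, not actual filenames.

```python
from __future__ import annotations

import re

# --filter patterns copied verbatim from the ingest commands above.
FILTERS = {
    "ai_corpus": "AI|LLM|Agent|Machine Learning|Deep Learning|Neural",
    "security_corpus": "Security|Hacking|Penetration|CISM|CompTIA|Forensic",
}

# Key texts as listed under Corpus Strategy (descriptive titles, not filenames).
KEY_TEXTS = {
    "ai_corpus": [
        "Building AI Agents with LLMs, RAG, and Knowledge Graphs",
        "Building Agentic AI Systems",
        "Agentic AI: Theories and Practices",
        "Building LLM Agents with RAG, Knowledge Graphs and Reflection",
        "LLMOps: Managing Large Language Models in Production",
        "Generative AI with LangChain",
        "Knowledge Graphs and LLMs in Action",
        "Large Language Models: The Hard Parts",
    ],
    "security_corpus": [
        "CompTIA Security+ guides",
        "CISM/CISA study guides",
        "Metasploit: The Penetration Tester's Guide",
        "Santos: Redefining Hacking",
        "Infrastructure Attack Strategies for Ethical Hacking",
        "Pentesting Active Directory",
        "Vulnerability Assessment and Penetration Testing (VAPT)",
        "Offensive Security Using Python",
        "Cryptography Algorithms",
    ],
}


def unmatched(pattern: str, titles: list[str], flags: int = 0) -> list[str]:
    """Titles the pattern would drop, assuming re.search semantics per title."""
    rx = re.compile(pattern, flags)
    return [t for t in titles if not rx.search(t)]


if __name__ == "__main__":
    for corpus, pattern in FILTERS.items():
        print(corpus, "drops:", unmatched(pattern, KEY_TEXTS[corpus]))
```

Under that assumption, the AI pattern misses "Large Language Models: The Hard Parts". The security pattern misses "Pentesting Active Directory" and "Cryptography Algorithms", with or without `re.IGNORECASE`.

Switching to case-insensitive matching also lets the bare `AI` alternative match inside ordinary words such as "Maintaining". Adding explicit terms (for example `Pentest|Cryptograph|Large Language Model`) is probably safer than loosening case.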

🔗 NTS Integration

| Corpus | NTS Service | Use Case |
|---|---|---|
| `security_corpus` | Security Consulting | Pentest methodology, compliance |
| `ai_corpus` | AI Consulting | Implementation guidance |
| `tech_corpus` | Infrastructure | Cloud/DevOps best practices |
| `osint_corpus` | OSINT Services | Investigation techniques |

🤖 Dreadbot Self-Improvement Loop

1. Query `ai_corpus` for patterns
2. Identify gaps
3. Read the source material
4. Update `AGENTS.md`/`TOOLS.md`
5. Implement improvements
6. Log to `memory/`

Self-reference queries:

- “How to improve RAG retrieval accuracy”
- “Best practices for agentic AI systems”
- “LLM prompt engineering techniques”
- “Knowledge graph integration patterns”
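The first part of the loop (query, identify gaps, log) lends itself to automation. Reading the material and updating AGENTS.md/TOOLS.md stay judgment steps. The sketch below is a minimal version built on three assumptions:

- A hypothetical `retrieve(corpus, query)` hook returns matching passages. This note does not document patentbot.py's real query interface.
- Results go to a daily log file under `memory/` named by date, an assumed convention.
- A query returning fewer than `min_hits` passages counts as a gap.

```python
from __future__ import annotations

import datetime as dt
from pathlib import Path
from typing import Callable

# Self-reference queries from the list above.
SELF_REFERENCE_QUERIES = [
    "How to improve RAG retrieval accuracy",
    "Best practices for agentic AI systems",
    "LLM prompt engineering techniques",
    "Knowledge graph integration patterns",
]


def run_self_review(
    retrieve: Callable[[str, str], list[str]],  # hypothetical (corpus, query) -> passages hook
    memory_dir: Path,
    min_hits: int = 1,
    today: dt.date | None = None,
) -> Path:
    """Query ai_corpus with each self-reference query, flag gaps, and log to memory/.

    Assumptions: one Markdown log per day at memory/YYYY-MM-DD.md, and a query
    returning fewer than `min_hits` passages counts as a gap.
    """
    today = today or dt.date.today()
    memory_dir.mkdir(parents=True, exist_ok=True)
    log_path = memory_dir / f"{today.isoformat()}.md"
    lines = [f"## Self-review {today.isoformat()}"]
    for query in SELF_REFERENCE_QUERIES:
        passages = retrieve("ai_corpus", query)
        status = "ok" if len(passages) >= min_hits else "GAP"
        lines.append(f"- [{status}] {query} (hits: {len(passages)})")
    # Append so repeated runs on the same day keep earlier entries.
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n")
    return log_path
```

Lines marked `[GAP]` are the natural input to the manual steps: read the relevant source material, then update AGENTS.md/TOOLS.md.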

📊 Priority Queue

✅ physicists_corpus — Currently indexing
🔜 ai_corpus — Dreadbot upgrades (CRITICAL)
🔜 security_corpus — NTS launch prep
📋 tech_corpus — General reference
📋 osint_corpus expansion

Build the knowledge. Upgrade the bot. Secure the clients.
