PatentBot
An AI-powered system that ingests engineering/scientific literature, builds domain expertise, and identifies opportunities for novel patents by finding gaps, combinations, and unexplored intersections in the knowledge space.
Vision
Turn a book library into a patent factory.
Most patents come from combining existing ideas in novel ways. PatentBot:
- Ingests technical literature (PDFs, papers, textbooks)
- Builds structured knowledge graphs per domain
- Identifies "white space": unexplored combinations and gaps
- Generates patent-worthy invention disclosures
Architecture
┌─────────────────────────────────────────────────────────────┐
│                       KNOWLEDGE LAYER                       │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐     │
│  │ Engineering  │   │   Physics/   │   │  Materials   │     │
│  │   Corpus     │   │   Quantum    │   │   Science    │     │
│  │ (1000+ PDFs) │   │   Corpus     │   │   Corpus     │     │
│  └──────┬───────┘   └──────┬───────┘   └──────┬───────┘     │
│         │                  │                  │             │
│         └──────────────────┼──────────────────┘             │
│                            ▼                                │
│                   ┌────────────────┐                        │
│                   │  RAG Pipeline  │                        │
│                   │  (Embeddings)  │                        │
│                   └────────┬───────┘                        │
│                            │                                │
└────────────────────────────┼────────────────────────────────┘
                             ▼
┌─────────────────────────────────────────────────────────────┐
│                       ANALYSIS LAYER                        │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐     │
│  │  Knowledge   │   │     Gap      │   │  Prior Art   │     │
│  │    Graph     │──►│  Detection   │──►│    Search    │     │
│  │  Extractor   │   │    Engine    │   │    Agent     │     │
│  └──────────────┘   └──────────────┘   └──────────────┘     │
│                            │                                │
└────────────────────────────┼────────────────────────────────┘
                             ▼
┌─────────────────────────────────────────────────────────────┐
│                        OUTPUT LAYER                         │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌──────────────────────────────────────────────────────┐   │
│  │             Patent Disclosure Generator              │   │
│  │                                                      │   │
│  │  • Title & Abstract                                  │   │
│  │  • Problem Statement                                 │   │
│  │  • Novel Solution                                    │   │
│  │  • Claims (Independent + Dependent)                  │   │
│  │  • Prior Art References                              │   │
│  │  • Figures/Diagrams (conceptual)                     │   │
│  └──────────────────────────────────────────────────────┘   │
│                                                             │
└─────────────────────────────────────────────────────────────┘
Core Components
1. Corpus Ingestion
- Input: PDF/EPUB technical books, papers, patents
- Processing: OCR → chunking → embeddings
- Storage: Vector DB (local ChromaDB or Qdrant)
- Metadata: Domain tags, publication date, citation graph
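The chunking step could look roughly like the sketch below. It is a minimal, standard-library illustration only: the function name `chunk_text`, the word-based windowing, and the default sizes are assumptions, not settings taken from this project.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into word-based chunks; each chunk repeats the last
    `overlap` words of the previous one so context is not cut mid-thought.

    Hypothetical helper: sizes are illustrative, not tuned values.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_text(sample, chunk_size=4, overlap=1):
        print(chunk)
```

Overlapping windows trade some storage for better retrieval of passages that straddle a chunk boundary; each chunk would then be embedded and written to the vector store together with its metadata.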
2. Knowledge Graph Extraction
- Extract entities: materials, processes, properties, applications
- Map relationships: "X enables Y", "A improves B", "C replaces D"
- Build domain ontology automatically from corpus
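As a toy illustration of the relationship mapping above, the sketch below pulls "X enables Y" style triples out of sentences with a regular expression and groups them by subject. The names `extract_triples` and `build_graph` are hypothetical, and a real extractor would more likely use an LLM or an NER model than a fixed pattern.

```python
import re
from collections import defaultdict

# Only the three relation verbs named in this section are recognised.
RELATION_PATTERN = re.compile(
    r"^(?P<subject>.+?)\s+(?P<relation>enables|improves|replaces)\s+(?P<object>.+?)\.?$",
    re.IGNORECASE,
)


def extract_triples(sentences):
    """Return (subject, relation, object) tuples for sentences that match."""
    triples = []
    for sentence in sentences:
        match = RELATION_PATTERN.match(sentence.strip())
        if match:
            triples.append((
                match["subject"].lower(),
                match["relation"].lower(),
                match["object"].lower(),
            ))
    return triples


def build_graph(triples):
    """Group triples into an adjacency mapping: subject -> [(relation, object)]."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return dict(graph)


if __name__ == "__main__":
    sentences = [
        "Graphene coating improves heat dissipation.",
        "Laser sintering enables porous titanium implants",
        "The weather was nice.",  # no relation verb, so it is skipped
    ]
    print(build_graph(extract_triples(sentences)))
```

The same adjacency structure could later be loaded into NetworkX or Neo4j without changing the triple format.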
3. Gap Detection Engine
- Combination gaps: A+B exists, B+C exists, but A+C doesn't
- Property gaps: Material X has properties P, Q but no one has tested R
- Application gaps: Technique T used in domain D1 but not D2
- Temporal gaps: Old idea + new capability = new invention
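The combination-gap rule (A+B exists, B+C exists, A+C does not) amounts to finding open triads in a co-occurrence graph. The sketch below is one way to express that, assuming the known combinations arrive as a list of concept pairs; `find_combination_gaps` and the sample concepts are illustrative, not part of the project.

```python
from collections import defaultdict
from itertools import combinations


def find_combination_gaps(known_pairs):
    """Return {(a, c): [bridging concepts]} for every pair that shares a
    neighbour but has no recorded combination of its own."""
    neighbors = defaultdict(set)
    for a, b in known_pairs:
        neighbors[a].add(b)
        neighbors[b].add(a)

    gaps = defaultdict(set)
    for bridge, linked in neighbors.items():
        # Every two concepts linked to the same bridge are a candidate pair.
        for a, c in combinations(sorted(linked), 2):
            if c not in neighbors[a]:
                gaps[(a, c)].add(bridge)
    return {pair: sorted(bridges) for pair, bridges in sorted(gaps.items())}


if __name__ == "__main__":
    known = [
        ("aerogel", "graphene"),
        ("graphene", "supercapacitor"),
        ("supercapacitor", "ionic liquid"),
        ("aerogel", "supercapacitor"),
    ]
    for pair, bridges in find_combination_gaps(known).items():
        print(pair, "via", bridges)
```

Keeping the bridging concepts alongside each gap gives a reviewer the reason a pairing was suggested, which is useful when ranking candidates before any prior art search.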
4. Prior Art Agent
- Search USPTO, Google Patents, arXiv
- Validate novelty before generating disclosure
- Find closest prior art for differentiation
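Once search results are in hand, finding the closest prior art is a ranking problem. The sketch below stands in for that step with plain Jaccard overlap on word sets; it does not call USPTO, Google Patents, or arXiv, and the reference IDs, the `looks_novel` threshold, and all function names are assumptions made for illustration. An embedding-based similarity would likely replace Jaccard in practice.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_prior_art(candidate, references):
    """Return [(score, ref_id)] sorted from most to least similar."""
    candidate_tokens = tokenize(candidate)
    scored = [
        (jaccard(candidate_tokens, tokenize(text)), ref_id)
        for ref_id, text in references.items()
    ]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


def looks_novel(candidate, references, threshold=0.5):
    """Crude screen: novel if no reference reaches the similarity threshold."""
    ranked = rank_prior_art(candidate, references)
    return not ranked or ranked[0][0] < threshold


if __name__ == "__main__":
    references = {  # hypothetical abstracts keyed by made-up IDs
        "REF-A": "graphene aerogel electrode for supercapacitor",
        "REF-B": "steel bridge cable inspection drone",
    }
    idea = "graphene aerogel electrode with ionic liquid"
    print(rank_prior_art(idea, references))
    print(looks_novel(idea, references))
```

A screen like this only narrows the list; the top-ranked references still need to be read to decide how the disclosure differs from them.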
5. Disclosure Generator
- LLM-powered patent drafting
- Structured output: claims, abstract, description
- Human review workflow
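The structured output could be modelled as a small record that mirrors the sections listed in the architecture diagram, with a flag for the human review step. This is a sketch under assumptions: the `Disclosure` class, its field names, and the `ready_to_file` rule are hypothetical, not an existing schema.

```python
from dataclasses import dataclass, field


@dataclass
class Disclosure:
    """Hypothetical container for one generated invention disclosure."""
    title: str
    abstract: str
    problem_statement: str
    novel_solution: str
    independent_claims: list[str]
    dependent_claims: list[str] = field(default_factory=list)
    prior_art_references: list[str] = field(default_factory=list)
    reviewed_by_human: bool = False

    def missing_sections(self) -> list[str]:
        """Names of required sections that are still empty."""
        required = {
            "title": self.title,
            "abstract": self.abstract,
            "problem_statement": self.problem_statement,
            "novel_solution": self.novel_solution,
            "independent_claims": self.independent_claims,
        }
        return [name for name, value in required.items() if not value]

    def ready_to_file(self) -> bool:
        """Complete drafts still need human sign-off before filing."""
        return self.reviewed_by_human and not self.missing_sections()


if __name__ == "__main__":
    draft = Disclosure(
        title="Example title",
        abstract="",
        problem_statement="Example problem",
        novel_solution="Example solution",
        independent_claims=["1. A method comprising ..."],
    )
    print(draft.missing_sections(), draft.ready_to_file())
```

Validating the LLM output against a fixed record like this makes it easy to reject incomplete drafts before they reach a reviewer.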
Sub-Projects
🏺 Pharmakon Mining
Cross-axis patent discovery from Ancient Greek pharmacology. Mining 2,500-year-old pharmaceutical knowledge (theriac, PGM, Dioscorides) and mapping to modern science for novel patentable formulations.
Target Domains (Phase 1)
| Domain | Corpus Size | Patent Potential |
|---|---|---|
| Engineering | 1,000+ books | High: mechanical, electrical, systems |
| Quantum Tech | 150+ books | Very High: emerging field |
| Materials Science | TBD | High: nanotechnology, composites |
| Biohacking/Biotech | ~50 books | Medium: some regulatory hurdles |
| Ancient Greek | 1,135 chunks | Very High: unexplored pharmakon |
Tech Stack
| Component | Technology |
|---|---|
| Embeddings | sentence-transformers/all-MiniLM-L6-v2 (CUDA) |
| Vector Store | ChromaDB (721K+ chunks) |
| Knowledge Graph | Neo4j or NetworkX |
| LLM | Claude CLI (Max plan - unlimited) |
| Prior Art Search | USPTO API, Google Patents API |
| Orchestration | Python + custom pipeline |
Development Phases
Phase 1: Foundation (MVP)
- Set up RAG pipeline with engineering corpus
- Basic entity extraction (materials, processes)
- Simple gap detection: "What combinations don't exist?"
- Manual patent drafting from insights
Phase 2: Intelligence
- Knowledge graph construction
- Prior art search integration
- Automated novelty scoring
- Disclosure template generation
Phase 3: Automation
- Full disclosure generator
- Patent attorney review workflow
- Filing assistance
- Portfolio management
Success Metrics
- Disclosures generated: Target 10+/month after Phase 2
- Novelty rate: >50% pass prior art check
- Filing rate: 1-2 provisional patents/quarter
- Time to disclosure: <4 hours from gap identification
Revenue Model
- Own patents: File provisional, license or sell
- Service: Generate disclosures for clients ($500-2000/disclosure)
- SaaS: Patent discovery platform for R&D teams
Risks & Mitigations
| Risk | Mitigation |
|---|---|
| Patent quality | Human expert review before filing |
| Prior art miss | Multiple search sources, conservative claims |
| Corpus gaps | Continuous expansion, paper scraping |
| LLM hallucination | RAG grounding, citation requirements |
Related Projects
- Knowledge RAG System β Foundation
- Book Library β Source corpus
- NTS β Potential service offering
Next Steps
- Build engineering corpus in RAG system
- Test entity extraction on 10 sample books
- Prototype gap detection algorithm
- Generate first 5 invention concepts manually
- Validate novelty with prior art search
"The best way to predict the future is to invent it." - Alan Kay