PatentBot

An AI-powered system that ingests engineering/scientific literature, builds domain expertise, and identifies opportunities for novel patents by finding gaps, combinations, and unexplored intersections in the knowledge space.


Vision

Turn a book library into a patent factory.

Many patents come from combining existing ideas in novel ways. PatentBot:

  1. Ingests technical literature (PDFs, papers, textbooks)
  2. Builds structured knowledge graphs per domain
  3. Identifies “white space”: unexplored combinations and gaps
  4. Generates patent-worthy invention disclosures

Architecture

┌────────────────────────────────────────────────────────────┐
│                     KNOWLEDGE LAYER                        │
├────────────────────────────────────────────────────────────┤
│                                                            │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐  │
│  │ Engineering  │    │   Physics/   │    │  Materials   │  │
│  │   Corpus     │    │   Quantum    │    │   Science    │  │
│  │ (1000+ PDFs) │    │   Corpus     │    │   Corpus     │  │
│  └──────┬───────┘    └──────┬───────┘    └──────┬───────┘  │
│         │                   │                   │          │
│         └───────────────────┼───────────────────┘          │
│                             ▼                              │
│                    ┌────────────────┐                      │
│                    │  RAG Pipeline  │                      │
│                    │  (Embeddings)  │                      │
│                    └────────┬───────┘                      │
│                             │                              │
└─────────────────────────────┼──────────────────────────────┘
                              ▼
┌────────────────────────────────────────────────────────────┐
│                     ANALYSIS LAYER                         │
├────────────────────────────────────────────────────────────┤
│                                                            │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐  │
│  │  Knowledge   │    │    Gap       │    │   Prior Art  │  │
│  │    Graph     │◄──►│  Detection   │◄──►│    Search    │  │
│  │  Extractor   │    │   Engine     │    │    Agent     │  │
│  └──────────────┘    └──────────────┘    └──────────────┘  │
│                             │                              │
└─────────────────────────────┼──────────────────────────────┘
                              ▼
┌────────────────────────────────────────────────────────────┐
│                     OUTPUT LAYER                           │
├────────────────────────────────────────────────────────────┤
│                                                            │
│  ┌──────────────────────────────────────────────────────┐  │
│  │              Patent Disclosure Generator             │  │
│  │                                                      │  │
│  │  • Title & Abstract                                  │  │
│  │  • Problem Statement                                 │  │
│  │  • Novel Solution                                    │  │
│  │  • Claims (Independent + Dependent)                  │  │
│  │  • Prior Art References                              │  │
│  │  • Figures/Diagrams (conceptual)                     │  │
│  └──────────────────────────────────────────────────────┘  │
│                                                            │
└────────────────────────────────────────────────────────────┘

Core Components

1. Corpus Ingestion

  • Input: PDF/EPUB technical books, papers, patents
  • Processing: OCR → chunking → embeddings
  • Storage: Vector DB (local ChromaDB or Qdrant)
  • Metadata: Domain tags, publication date, citation graph
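
The chunking step in the processing chain above could look roughly like the following. This is a minimal sketch using only the standard library: the `Chunk` record, the `chunk_text` function, and the window sizes are illustrative assumptions, and OCR, embedding, and vector-store writes are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """One retrievable unit of text plus a little of the metadata listed above."""
    text: str
    source: str
    index: int
    domain_tags: tuple[str, ...] = ()


def chunk_text(text, source, size=200, overlap=40, domain_tags=()):
    """Split text into overlapping word windows (sizes are assumed defaults)."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + size]
        chunks.append(Chunk(" ".join(window), source, len(chunks), tuple(domain_tags)))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

The overlap keeps a sentence that straddles a window boundary retrievable from either side. Each `Chunk.text` would then be passed to the embedding model and stored with its metadata.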

2. Knowledge Graph Extraction

  • Extract entities: materials, processes, properties, applications
  • Map relationships: “X enables Y”, “A improves B”, “C replaces D”
  • Build domain ontology automatically from corpus
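
As a rough illustration of the relationship mapping above, the sketch below pulls “X enables Y”-style statements out of sentences with a regular expression and stores them as subject-relation-object triples. The pattern, the `extract_triples` and `build_graph` names, and the plain-dict graph are assumptions for illustration; a real extractor would likely need an NLP model or an LLM rather than regex matching.

```python
import re
from collections import defaultdict

# Relation verbs taken from the examples above; the list is illustrative.
RELATIONS = ("enables", "improves", "replaces")
PATTERN = re.compile(
    r"^\s*(?P<subj>.+?)\s+(?P<rel>" + "|".join(RELATIONS) + r")\s+(?P<obj>.+?)\s*\.?\s*$",
    re.IGNORECASE,
)


def extract_triples(sentences):
    """Return (subject, relation, object) for each sentence matching the pattern."""
    triples = []
    for sentence in sentences:
        match = PATTERN.match(sentence)
        if match:
            triples.append(
                (match["subj"].lower(), match["rel"].lower(), match["obj"].lower())
            )
    return triples


def build_graph(triples):
    """Adjacency map: subject -> list of (relation, object) edges."""
    graph = defaultdict(list)
    for subj, rel, obj in triples:
        graph[subj].append((rel, obj))
    return dict(graph)
```

The same triples could be loaded into Neo4j or NetworkX later; a dict is used here only to keep the sketch dependency-free.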

3. Gap Detection Engine

  • Combination gaps: A+B exists, B+C exists, but A+C doesn’t
  • Property gaps: Material X has properties P,Q but no one’s tested R
  • Application gaps: Technique T used in domain D1 but not D2
  • Temporal gaps: Old idea + new capability = new invention
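
Two of the gap types above reduce to simple set operations once co-occurrence data exists. The sketch below is one possible reading, not a settled algorithm: `combination_gaps` finds pairs (A, C) that share a partner B but were never seen together, and `application_gaps` lists domains where a technique has no recorded use. Both function names and the input shapes are assumptions.

```python
from itertools import combinations


def combination_gaps(known_pairs):
    """Find (A, C) pairs that share a partner B but were never combined directly."""
    known = {frozenset(pair) for pair in known_pairs}
    partners = {}
    for a, b in known_pairs:
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)
    gaps = {}
    for bridge, linked in partners.items():
        for a, c in combinations(sorted(linked), 2):
            if frozenset((a, c)) not in known:
                gaps.setdefault((a, c), set()).add(bridge)
    return gaps  # {(A, C): {bridging concepts B}}


def application_gaps(usage, domains):
    """For each technique, list the target domains with no recorded use.

    usage maps technique -> set of domains where it already appears.
    """
    wanted = set(domains)
    return {
        technique: sorted(wanted - used)
        for technique, used in usage.items()
        if wanted - used
    }
```

A missing pair only means the corpus never mentions the combination, so every candidate still has to go through the prior art check before it counts as a gap.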

4. Prior Art Agent

  • Search USPTO, Google Patents, arXiv
  • Validate novelty before generating disclosure
  • Find closest prior art for differentiation
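
The novelty check and closest-prior-art ranking above can be prototyped before any search API is wired in. The sketch below assumes prior-art abstracts have already been retrieved into a dict; it makes no calls to USPTO, Google Patents, or arXiv. Jaccard token overlap is a crude stand-in for embedding similarity, and the 0.5 threshold, function names, and record ids are arbitrary assumptions.

```python
import re


def tokens(text):
    """Lowercased word set; a crude stand-in for embedding-based similarity."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Token-set overlap in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def closest_prior_art(candidate, prior_art, top_k=3):
    """Rank prior-art records {id: abstract} by similarity to the candidate."""
    scored = [(jaccard(candidate, abstract), ref) for ref, abstract in prior_art.items()]
    return sorted(scored, reverse=True)[:top_k]


def looks_novel(candidate, prior_art, threshold=0.5):
    """True when every retrieved record scores below the (assumed) threshold."""
    ranked = closest_prior_art(candidate, prior_art, top_k=1)
    return not ranked or ranked[0][0] < threshold
```

A low score here only says the retrieved records differ from the candidate; it cannot show that no prior art exists, which is why the closest matches are returned for human differentiation.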

5. Disclosure Generator

  • LLM-powered patent drafting
  • Structured output: claims, abstract, description
  • Human review workflow
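
One way to pin down the structured output is a small record type that the LLM drafting step fills in and a reviewer signs off on. The `Disclosure` class below is a hypothetical shape, not a fixed schema: field names, the claim wording template, and the plain-text rendering are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Disclosure:
    """Fields mirror the sections named for the disclosure generator."""
    title: str
    abstract: str
    problem: str
    solution: str
    independent_claims: list = field(default_factory=list)
    dependent_claims: list = field(default_factory=list)  # (parent_number, text)
    prior_art: list = field(default_factory=list)
    reviewed_by: str = ""  # stays empty until a human signs off

    def render(self):
        """Plain-text draft; claims are numbered independent-first."""
        lines = [
            f"TITLE: {self.title}",
            f"ABSTRACT: {self.abstract}",
            f"PROBLEM: {self.problem}",
            f"SOLUTION: {self.solution}",
            "CLAIMS:",
        ]
        number = 0
        for number, text in enumerate(self.independent_claims, start=1):
            lines.append(f"  {number}. {text}")
        for offset, (parent, text) in enumerate(self.dependent_claims, start=number + 1):
            lines.append(f"  {offset}. The invention of claim {parent}, wherein {text}")
        lines.append("PRIOR ART: " + ("; ".join(self.prior_art) or "none recorded"))
        status = f"reviewed by {self.reviewed_by}" if self.reviewed_by else "awaiting human review"
        lines.append("STATUS: " + status)
        return "\n".join(lines)
```

Keeping `reviewed_by` on the record makes the human review step explicit: a draft with an empty reviewer is never treated as ready to file.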

Sub-Projects

🏺 Pharmakon Mining

Cross-axis patent discovery from Ancient Greek pharmacology: mining pharmaceutical knowledge up to 2,500 years old (theriac, PGM, Dioscorides) and mapping it to modern science to find novel patentable formulations.


Target Domains (Phase 1)

| Domain | Corpus Size | Patent Potential |
|---|---|---|
| Engineering | 1,000+ books | High (mechanical, electrical, systems) |
| Quantum Tech | 150+ books | Very High (emerging field) |
| Materials Science | TBD | High (nanotechnology, composites) |
| Biohacking/Biotech | ~50 books | Medium (some regulatory hurdles) |
| Ancient Greek | 1,135 chunks | Very High (unexplored pharmakon) |

Tech Stack

| Component | Technology |
|---|---|
| Embeddings | sentence-transformers/all-MiniLM-L6-v2 (CUDA) |
| Vector Store | ChromaDB (721K+ chunks) |
| Knowledge Graph | Neo4j or NetworkX |
| LLM | Claude CLI (Max plan - unlimited) |
| Prior Art Search | USPTO API, Google Patents API |
| Orchestration | Python + custom pipeline |

Development Phases

Phase 1: Foundation (MVP)

  • Set up RAG pipeline with engineering corpus
  • Basic entity extraction (materials, processes)
  • Simple gap detection: “What combinations don’t exist?”
  • Manual patent drafting from insights

Phase 2: Intelligence

  • Knowledge graph construction
  • Prior art search integration
  • Automated novelty scoring
  • Disclosure template generation

Phase 3: Automation

  • Full disclosure generator
  • Patent attorney review workflow
  • Filing assistance
  • Portfolio management

Success Metrics

  • Disclosures generated: Target 10+/month after Phase 2
  • Novelty rate: >50% pass prior art check
  • Filing rate: 1-2 provisional patent applications/quarter
  • Time to disclosure: <4 hours from gap identification

Revenue Model

  1. Own patents: file provisional, license or sell
  2. Service: generate disclosures for clients ($500-$2,000/disclosure)
  3. SaaS: patent discovery platform for R&D teams

Risks & Mitigations

| Risk | Mitigation |
|---|---|
| Patent quality | Human expert review before filing |
| Prior art miss | Multiple search sources, conservative claims |
| Corpus gaps | Continuous expansion, paper scraping |
| LLM hallucination | RAG grounding, citation requirements |


Next Steps

  1. Build engineering corpus in RAG system
  2. Test entity extraction on 10 sample books
  3. Prototype gap detection algorithm
  4. Generate first 5 invention concepts manually
  5. Validate novelty with prior art search

“The best way to predict the future is to invent it.” - Alan Kay