Medical Corpus Staging

Staging plan for medical texts that feed the cross-domain lexicon and the Pharmakon Miner pipeline. This corpus is the critical link for mapping ancient pharmacological concepts to modern ones.


📊 Corpus Overview

| Source | Files | Est. Chunks | Priority |
|---|---|---|---|
| /Medicine/ | 171 | ~200k | 🔴 Critical |
| /Botany/ | 43 | ~50k | 🔴 Critical |
| /Biohacking/ | 7 | ~10k | 🟡 High |
| /History/Ancient/ (pharma) | 10+ | ~15k | 🔴 Critical |
| /Science/Biology-* | TBD | ~50k | 🟡 High |

Total estimated: ~325k chunks → medical_corpus
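The ~325k total is the sum of the per-source estimates in the table. A minimal sketch that keeps those estimates in one place and recomputes both the total and the share covered by the three folders passed to the ingest command below. The names `CHUNK_ESTIMATES`, `INGEST_SOURCES`, `total_chunks` and `subtotal` are illustrative, not part of the pipeline.

```python
# Per-source chunk estimates copied from the Corpus Overview table
# ("~200k" written as 200_000). Names are illustrative only.
CHUNK_ESTIMATES = {
    "/Medicine/": 200_000,
    "/Botany/": 50_000,
    "/Biohacking/": 10_000,
    "/History/Ancient/ (pharma)": 15_000,
    "/Science/Biology-*": 50_000,
}

# Folders passed as --source in the Pipeline Integration ingest command
INGEST_SOURCES = ["/Medicine/", "/Botany/", "/Biohacking/"]


def total_chunks(estimates: dict[str, int]) -> int:
    """Sum all per-source estimates."""
    return sum(estimates.values())


def subtotal(estimates: dict[str, int], sources: list[str]) -> int:
    """Sum the estimates for a subset of sources."""
    return sum(estimates[s] for s in sources)


if __name__ == "__main__":
    print(total_chunks(CHUNK_ESTIMATES))              # 325000
    print(subtotal(CHUNK_ESTIMATES, INGEST_SOURCES))  # 260000
```

Recomputing from one structure keeps the headline figure consistent with the table if a row changes, and makes the ~65k gap between the ingest command and the full target visible.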


🗂️ Source Breakdown

Medicine (171 files)

Location: /mnt/storage/books_organized/Medicine/

Pharmacology & Drug Science

| Text | Focus |
|---|---|
| Comprehensive Toxicology 4ed | Toxicology reference |
| Lehne’s Pharmacology for Nursing Care 12ed | Drug mechanisms |
| Drug Metabolism and Pharmacokinetics | ADME processes |
| Cobert’s Manual of Drug Safety | Pharmacovigilance |
| Neuroimmune Pharmacology 3ed | Neuro-drug interactions |

Biochemistry & Physiology

| Text | Focus |
|---|---|
| Vander’s Human Physiology 16ed | Systems physiology |
| Paul’s Fundamental Immunology 8ed | Immune mechanisms |
| Textbook of Biochemistry (Lal) | Biochem foundations |
| Ross & Wilson Anatomy & Physiology 14ed | Structure-function |
| Gray’s Anatomy for Students 5ed | Anatomical reference |

Clinical Medicine

| Text | Focus |
|---|---|
| Oxford Handbook of Clinical Medicine 11ed | Clinical reference |
| CURRENT Medical Diagnosis & Treatment | Diagnostic guide |
| Harrison’s (if present) | Internal medicine |
| Miller’s Anesthesia 10ed (2-vol) | Anesthesiology |

Specialized

| Text | Focus |
|---|---|
| Photodynamic Therapy in Dermatology | Light therapy |
| Dark Matters - Circadian Rhythms | Chronobiology |
| Regenerative Medicine handbooks | Stem cells, tissue engineering |
| Brain-Gut Connection | Gut-brain axis |

Botany (43 files)

Location: /mnt/storage/books_organized/Botany/

Herbal Medicine

| Text | Focus |
|---|---|
| Alchemy of Herbal Medicine Vol 1-2 | Traditional herbalism |
| Native American Herbal Remedies Encyclopedia | Indigenous medicine |
| Little Encyclopedia of Herbal Medicine | Quick reference |
| Herbal Medicines Survival Guide | Practical applications |
| Encyclopedia of Rare Drug Plants | Uncommon botanicals |

Ethnobotany & Psychedelics

| Text | Focus |
|---|---|
| Psilocybin Mushroom guides (10+) | Cultivation, identification |
| Hallucinogenic Plants Field Guide | Psychoactive botany |
| DMT Entities Illustrated Guide | Phenomenology |
| Ayurveda (DK) | Indian traditional medicine |

Plant Science

| Text | Focus |
|---|---|
| Chemistry of Plant-derived Natural Products | Phytochemistry |
| Encyclopedia of Cultivated Plants | Botanical reference |
| Plant Genetic Resources textbook | Genetics |

Biohacking (7 files)

Location: /mnt/storage/books_organized/Biohacking/

  • Biohacker’s Handbook - Sleep
  • Grindhouse Wetware
  • Kill Zombie Cells
  • Biohacking tech/kits guides

History/Ancient - Pharma-relevant

Location: /mnt/storage/books_organized/History/Ancient/

| Text | Focus |
|---|---|
| The Chemical Muse (Hillman) | Drugs in antiquity |
| Mushrooms, Myth & Mithras (Ruck) | Entheogens in religion |
| Greek Magical Papyri | Ritual pharmacology |
| Hermetica | Hermetic medicine |

🔗 Pharmakon Bridge Points

Greek→Modern Mapping Targets

| Ancient Concept | Modern Mapping | Corpus Source |
|---|---|---|
| φάρμακον (drug/poison) | Pharmacology, toxicology | Medicine, Botany |
| θηριακή (theriac) | Polypharmacy, mithridatism | History/Ancient |
| κυκεών (kykeon) | Ergot alkaloids, entheogens | Botany (psilocybin) |
| βοτάνη (herb) | Phytochemistry, botanicals | Botany |
| χυμός (humor) | Biochemistry, homeostasis | Medicine |
| θεραπεία (therapy) | Clinical medicine | Medicine |
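The mapping table can be turned into the `greek_lexicon` lookup that `enrich_medical_chunk` consumes, which reads the `greek`, `translit` and `context` fields. A minimal sketch, assuming the lexicon is keyed by lowercased modern terms; `BRIDGE_TABLE` and `build_lexicon` are hypothetical names, and `context` holds the table's short gloss as a placeholder for richer ancient-usage text.

```python
# Hypothetical source rows for greek_lexicon, taken from the
# Greek→Modern Mapping Targets table:
# (greek, transliteration, gloss, modern mappings)
BRIDGE_TABLE = [
    ("φάρμακον", "pharmakon", "drug/poison", ["pharmacology", "toxicology"]),
    ("θηριακή", "thēriakē", "theriac", ["polypharmacy", "mithridatism"]),
    ("κυκεών", "kykeōn", "kykeon", ["ergot alkaloids", "entheogens"]),
    ("βοτάνη", "botanē", "herb", ["phytochemistry", "botanicals"]),
    ("χυμός", "chymos", "humor", ["biochemistry", "homeostasis"]),
    ("θεραπεία", "therapeia", "therapy", ["clinical medicine"]),
]


def build_lexicon(rows):
    """Index each Greek entry under every modern term it maps to.

    Keys are lowercased modern terms; values use the field names that
    enrich_medical_chunk reads. A modern term listed under two Greek
    entries would keep only the later one.
    """
    lexicon = {}
    for greek, translit, gloss, modern_terms in rows:
        for modern in modern_terms:
            lexicon[modern.lower()] = {
                "greek": greek,
                "translit": translit,
                "context": gloss,  # placeholder for ancient-usage text
            }
    return lexicon


GREEK_LEXICON = build_lexicon(BRIDGE_TABLE)
```

Keying by the modern term matches the direction of lookup in enrichment: terms are extracted from modern medical text and resolved back to their Greek source concept.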

Cross-Domain Connections

greek_corpus ──────┬──────► medical_corpus
                   │
                   ├──────► science_corpus (mechanisms)
                   │
                   └──────► physicists_corpus (quantum)

Key bridges:

  • Ancient plant remedies → Modern phytochemistry
  • Humoral theory → Biochemical homeostasis
  • Ritual medicine → Psychopharmacology
  • Theriac compounds → Polypharmacy principles
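The diagram implies a fan-out: a query that starts from greek_corpus is sent to three target corpora, and their results have to be merged into one list. The source doesn't say how that merge works. A minimal sketch using reciprocal rank fusion (RRF), swapped in here as one reasonable option because it combines rankings without assuming that raw similarity scores from separately built indexes are comparable. `FANOUT_TARGETS` and `rrf_merge` are hypothetical names.

```python
from collections import defaultdict

# Fan-out targets from the Cross-Domain Connections diagram.
FANOUT_TARGETS = ("medical_corpus", "science_corpus", "physicists_corpus")


def rrf_merge(ranked_by_corpus, k=60, top_n=10):
    """Merge per-corpus rankings with reciprocal rank fusion.

    ranked_by_corpus maps a corpus name to its result ids, best first.
    Each hit scores 1 / (k + rank); scores are summed per (corpus, doc_id)
    and the fused list is returned best first. Corpora outside
    FANOUT_TARGETS are ignored.
    """
    scores = defaultdict(float)
    for corpus in FANOUT_TARGETS:
        for rank, doc_id in enumerate(ranked_by_corpus.get(corpus, []), start=1):
            scores[(corpus, doc_id)] += 1.0 / (k + rank)
    # sorted() is stable, so equal scores keep FANOUT_TARGETS order
    fused = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(corpus, doc_id, score) for (corpus, doc_id), score in fused[:top_n]]


if __name__ == "__main__":
    hits = rrf_merge({
        "medical_corpus": ["m1", "m2"],
        "science_corpus": ["s1"],
    })
    print([doc for _, doc, _ in hits])  # ['m1', 's1', 'm2']
```

Because ids are keyed per corpus, RRF here effectively interleaves results by rank; k = 60 is the constant commonly used with RRF and is an assumption, not a tuned value.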

🛠️ Pipeline Integration

Ingest Command

cd ~/projects/knowledge-rag
source venv/bin/activate
 
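# Create medical corpus (Medicine, Botany and Biohacking only; the History/Ancient pharma texts and Science/Biology-* are not passed here)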
python patentbot.py ingest \
  --corpus medical_corpus \
  --source /mnt/storage/books_organized/Medicine/ \
  --source /mnt/storage/books_organized/Botany/ \
  --source /mnt/storage/books_organized/Biohacking/

Entity Extraction Focus

MEDICAL_ENTITY_TYPES = [
    "drugs",           # Pharmaceuticals, compounds
    "mechanisms",      # Pathways, receptors, enzymes
    "conditions",      # Diseases, syndromes
    "plants",          # Botanical sources
    "compounds",       # Chemical structures
    "therapies",       # Treatment modalities
    "anatomy",         # Body systems, organs
]
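Extraction output has to be constrained to this schema and deduplicated before it can be counted against targets such as "Unique drug entities 5,000+". A minimal sketch, assuming the extractor emits `(text, type)` pairs; `group_entities` is a hypothetical helper, and lowercase plus whitespace normalization is a deliberately simple notion of "unique".

```python
# Copied from MEDICAL_ENTITY_TYPES above so this block runs on its own.
MEDICAL_ENTITY_TYPES = [
    "drugs", "mechanisms", "conditions", "plants",
    "compounds", "therapies", "anatomy",
]


def group_entities(raw_entities):
    """Group (text, type) pairs into one set per known entity type.

    Pairs whose type is not in MEDICAL_ENTITY_TYPES are dropped.
    Text is whitespace-collapsed and lowercased, so "Morphine" and
    "morphine " count as one entity.
    """
    grouped = {etype: set() for etype in MEDICAL_ENTITY_TYPES}
    for text, etype in raw_entities:
        if etype not in grouped:
            continue  # outside the schema
        norm = " ".join(text.split()).lower()
        if norm:
            grouped[etype].add(norm)
    return grouped
```

For example, `len(group_entities(pairs)["drugs"])` gives the unique-drug count to compare with the target. Synonyms and brand/generic names would still count separately, which a real pipeline may need to resolve.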

Pharmakon Enrichment

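def enrich_medical_chunk(chunk, greek_lexicon):
    """Add Greek term mappings to a medical chunk.

    extract_medical_terms is the pipeline's term extractor (assumed to
    yield modern-term strings); chunk.metadata is assumed to be a dict.
    """
    # Accumulate every match; assigning inside the loop would keep only the last one
    mappings = []

    # Find Greek equivalents for each extracted modern term
    for term in extract_medical_terms(chunk):
        if greek_match := greek_lexicon.get(term):
            mappings.append({
                "modern_term": term,
                "term": greek_match["greek"],
                "transliteration": greek_match["translit"],
                "ancient_usage": greek_match["context"],
            })

    if mappings:
        chunk.metadata["greek_mapping"] = mappings

    return chunk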
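`enrich_medical_chunk` depends on `extract_medical_terms`, whose implementation isn't shown. One hypothetical way to do the matching step at the text level is a naive n-gram lookup against the lexicon's keys, so multi-word terms such as "ergot alkaloids" are found. `match_lexicon_terms` is an illustrative name, not the pipeline's function, and it only handles ASCII word tokens, which suits lowercased English modern terms.

```python
import re

# ASCII word tokens, keeping hyphenated words (e.g. "mu-opioid") together
WORD = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def match_lexicon_terms(text, vocabulary, max_words=3):
    """Return vocabulary terms found in text, longer n-grams first.

    Hypothetical helper: lowercases the text, tokenizes it, and checks
    every 1..max_words-gram against vocabulary (any container supporting
    `in`, e.g. a greek_lexicon dict). Each term is reported once.
    """
    tokens = WORD.findall(text.lower())
    found, seen = [], set()
    for n in range(max_words, 0, -1):
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            if gram in vocabulary and gram not in seen:
                seen.add(gram)
                found.append(gram)
    return found


if __name__ == "__main__":
    vocab = {"ergot alkaloids", "toxicology", "homeostasis"}
    print(match_lexicon_terms("Ergot alkaloids in clinical toxicology.", vocab))
    # ['ergot alkaloids', 'toxicology']
```

Scanning longer n-grams first puts multi-word terms ahead of single words in the output; it does not suppress a shorter term that also appears inside a longer one.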

📋 Staging Checklist

Phase 1: Inventory

  • Identify Medicine folder (171 files)
  • Identify Botany folder (43 files)
  • Identify Biohacking folder (7 files)
  • Map pharma-relevant History/Ancient texts
  • Check Science/Biology-* folders
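The inventory steps amount to counting book files per source folder, including the not-yet-sized Science/Biology-* folders. A minimal sketch using `pathlib`, assuming books are `.pdf` or `.epub` files (the real folders may hold other formats, so counts can differ from the 171/43/7 figures above). `inventory` and `BOOK_EXTENSIONS` are hypothetical names.

```python
from pathlib import Path

ROOT = Path("/mnt/storage/books_organized")
BOOK_EXTENSIONS = {".pdf", ".epub"}  # assumed book formats


def inventory(root, patterns, extensions=BOOK_EXTENSIONS):
    """Count book files under each folder matching a glob pattern.

    Patterns are relative to root and may contain wildcards
    (e.g. "Science/Biology-*"). A pattern with no matching folder
    maps to None so missing or renamed folders stand out.
    """
    counts = {}
    for pattern in patterns:
        dirs = [p for p in sorted(root.glob(pattern)) if p.is_dir()]
        if not dirs:
            counts[pattern] = None
            continue
        for d in dirs:
            counts[str(d.relative_to(root))] = sum(
                1 for f in d.rglob("*")
                if f.is_file() and f.suffix.lower() in extensions
            )
    return counts


if __name__ == "__main__":
    print(inventory(ROOT, ["Medicine", "Botany", "Biohacking",
                           "History/Ancient", "Science/Biology-*"]))
```

History/Ancient is counted whole here; picking out only the pharma-relevant texts remains a manual step.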

Phase 2: Organization

  • Create symlink or staging folder
  • Remove duplicates
  • Verify PDF text extraction works
  • Test sample ingest
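The first two organization steps can be combined: build one staging folder of symlinks from several sources and skip files whose content has already been staged. A minimal sketch, assuming duplicates are byte-identical copies (different editions or rescans of the same book would not be caught) and that `.pdf`/`.epub` are the relevant formats; `stage` and `file_digest` are hypothetical names.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in 1 MiB blocks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def stage(sources, staging_dir, extensions=(".pdf", ".epub")):
    """Symlink unique book files from several source folders into one.

    Returns (linked, duplicates): the symlinks in staging_dir and the
    source paths skipped because identical content was already staged.
    """
    staging_dir = Path(staging_dir)
    staging_dir.mkdir(parents=True, exist_ok=True)
    seen = set()
    linked, duplicates = [], []
    for src in sources:
        for path in sorted(Path(src).rglob("*")):
            if not path.is_file() or path.suffix.lower() not in extensions:
                continue
            digest = file_digest(path)
            if digest in seen:
                duplicates.append(path)
                continue
            seen.add(digest)
            # Digest prefix avoids clashes between same-named files
            link = staging_dir / f"{digest[:12]}_{path.name}"
            if not (link.is_symlink() or link.exists()):
                link.symlink_to(path.resolve())
            linked.append(link)
    return linked, duplicates
```

The staging folder could then be passed as a single `--source` to the ingest command, which would also give the pharma-relevant History/Ancient texts a place to go without ingesting the whole History folder.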

Phase 3: Ingest

  • Run full ingest to medical_corpus
  • Entity extraction pass
  • Domain classification
  • Quality check
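For the quality check (and the earlier "Verify PDF text extraction works" step), a cheap first pass is to flag chunks whose text looks like failed extraction. A minimal sketch with heuristic thresholds that are assumptions to be tuned on a sample ingest; `extraction_quality` is a hypothetical name.

```python
import re


def extraction_quality(text, min_chars=200, min_alpha_ratio=0.6,
                       max_replacement_ratio=0.01):
    """Return a list of problem labels for a chunk (empty list = passes).

    Heuristics, all computed on the text with whitespace removed:
    - "too_short": very little text, e.g. a scanned page with no OCR
    - "low_alpha_ratio": few letters, e.g. garbled glyphs or number tables
    - "replacement_chars": many U+FFFD characters from encoding failures
    """
    stripped = re.sub(r"\s+", "", text)
    if len(stripped) < min_chars:
        return ["too_short"]  # ratios are meaningless on near-empty text
    problems = []
    alpha = sum(ch.isalpha() for ch in stripped)  # counts Greek letters too
    if alpha / len(stripped) < min_alpha_ratio:
        problems.append("low_alpha_ratio")
    if stripped.count("\ufffd") / len(stripped) > max_replacement_ratio:
        problems.append("replacement_chars")
    return problems
```

Dosage tables and reference lists are legitimately number-heavy, so a "low_alpha_ratio" flag is better treated as a prompt for manual review than as grounds for automatic removal.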

Phase 4: Integration

  • Connect to cross-domain lexicon
  • Build Greek→Medical mappings
  • Test Pharmakon Miner queries
  • Validate cross-corpus search

📊 Target Metrics

| Metric | Target |
|---|---|
| Total chunks | ~325k |
| Unique drug entities | 5,000+ |
| Plant compounds mapped | 1,000+ |
| Greek→Modern bridges | 500+ |
| Mechanism coverage | 2,000+ pathways |
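The targets mix two kinds of thresholds: "N+" values are minimums, while "~325k" is an estimate. A minimal sketch of a check that reports each metric as ok, below, off or missing, assuming a ±10% band around the chunk estimate (the band is an assumption, not from the plan). `TARGETS` and `check_targets` are hypothetical names.

```python
# Targets from the Target Metrics table
TARGETS = {
    "total_chunks": 325_000,
    "unique_drug_entities": 5_000,
    "plant_compounds_mapped": 1_000,
    "greek_modern_bridges": 500,
    "mechanism_pathways": 2_000,
}


def check_targets(observed, targets=TARGETS, tolerance=0.10):
    """Compare observed counts with targets.

    Returns {metric: (observed, target, status)}. "N+" targets pass when
    observed >= target; total_chunks passes when within +/- tolerance
    of the estimate. Metrics absent from observed are "missing".
    """
    report = {}
    for name, target in targets.items():
        value = observed.get(name)
        if value is None:
            status = "missing"
        elif name == "total_chunks":
            status = "ok" if abs(value - target) <= tolerance * target else "off"
        else:
            status = "ok" if value >= target else "below"
        report[name] = (value, target, status)
    return report
```

For example, `check_targets({"total_chunks": 300_000, "unique_drug_entities": 4_200})` marks the chunk total ok (within 10% of 325k), drug entities below, and the other three metrics missing.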


Medical corpus: the modern anchor for ancient pharmacological knowledge.