Cross-Domain Lexicon System

A custom lexicon built by extracting entities from several corpora. Recursive refinement then enriches each entry over successive passes, so that expert-level domain knowledge accumulates in the lexicon.


🎯 Purpose

Traditional lexicons are static. Ours grows through:

  1. Multi-corpus extraction: terms emerge from actual usage
  2. Cross-domain mapping: same term, different meanings
  3. Recursive refinement: each pass adds context
  4. Expert system capture: implicit knowledge made explicit

🏗️ Architecture

┌─────────────────────────────────────────────────────────────────┐
│                  CROSS-DOMAIN LEXICON PIPELINE                  │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐         │
│  │ greek_   │  │ science_ │  │physicists│  │ osint_   │         │
│  │ corpus   │  │ corpus   │  │ _corpus  │  │ corpus   │         │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘  └────┬─────┘         │
│       │             │             │             │               │
│       └─────────────┴──────┬──────┴─────────────┘               │
│                            ▼                                    │
│               ┌─────────────────────────┐                       │
│               │   ENTITY EXTRACTION     │                       │
│               │   (Pass 1: Raw terms)   │                       │
│               └────────────┬────────────┘                       │
│                            ▼                                    │
│               ┌─────────────────────────┐                       │
│               │   CROSS-REFERENCE       │                       │
│               │   (Find term overlaps)  │                       │
│               └────────────┬────────────┘                       │
│                            ▼                                    │
│               ┌─────────────────────────┐                       │
│               │   RECURSIVE REFINEMENT  │◄──────┐               │
│               │   (Pass N: Add context) │       │               │
│               └────────────┬────────────┘       │               │
│                            │                    │               │
│                            ├────────────────────┘               │
│                            ▼          (iterate)                 │
│               ┌─────────────────────────┐                       │
│               │   MASTER LEXICON        │                       │
│               │   (ChromaDB + JSON)     │                       │
│               └─────────────────────────┘                       │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

📊 Lexicon Schema

Term Entry Structure

{
  "term": "mitochondria",
  "canonical": "mitochondrion",
  "domains": {
    "quantum_biology": {
      "frequency": 4550,
      "context": "quantum coherence, electron transport, biophoton emission",
      "related_terms": ["ATP", "ETC", "CCO", "heteroplasmy"],
      "sample_chunks": ["chunk_id_1", "chunk_id_2"]
    },
    "greek_medical": {
      "frequency": 0,
      "greek_equivalent": null,
      "note": "Ancient Greeks had no microscopy; humoral model instead"
    },
    "physicists": {
      "frequency": 45,
      "context": "cellular energy, thermodynamics",
      "related_terms": ["entropy", "dissipative structures"]
    }
  },
  "cross_domain_notes": "Bridge between ancient humoral theory (χυμός) and modern bioenergetics",
  "refinement_passes": 3,
  "confidence": 0.92,
  "last_updated": "2026-02-02"
}
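One way to work with entries of this shape in Python is to read them into small dataclasses. The sketch below is an illustration only: the class names `DomainUsage` and `TermEntry` and the helper `parse_term_entry` are hypothetical, and only some of the fields shown in the entry above are modeled.

```python
from dataclasses import dataclass, field


@dataclass
class DomainUsage:
    """Per-domain statistics for one term (fields mirror the JSON above)."""
    frequency: int = 0
    context: str = ""
    related_terms: list = field(default_factory=list)


@dataclass
class TermEntry:
    term: str
    canonical: str
    domains: dict  # domain name -> DomainUsage
    confidence: float = 0.0
    refinement_passes: int = 0


def parse_term_entry(raw: dict) -> TermEntry:
    """Build a TermEntry from a decoded JSON object, ignoring unknown keys."""
    domains = {
        name: DomainUsage(
            frequency=usage.get("frequency", 0),
            context=usage.get("context", ""),
            related_terms=list(usage.get("related_terms", [])),
        )
        for name, usage in raw.get("domains", {}).items()
    }
    confidence = float(raw.get("confidence", 0.0))
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    return TermEntry(
        term=raw["term"],
        canonical=raw.get("canonical", raw["term"]),
        domains=domains,
        confidence=confidence,
        refinement_passes=int(raw.get("refinement_passes", 0)),
    )


entry = parse_term_entry({
    "term": "mitochondria",
    "canonical": "mitochondrion",
    "domains": {"physicists": {"frequency": 45, "context": "cellular energy, thermodynamics"}},
    "confidence": 0.92,
})
print(entry.domains["physicists"].frequency)
```

Rejecting a confidence outside [0, 1] at load time is a design choice of this sketch, not something the schema itself requires.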

Cross-Reference Entry

{
  "mapping_id": "pharmakon_mitochondria_001",
  "greek_term": "φάρμακον",
  "modern_terms": ["drug", "medicine", "pharmaceutical", "toxin"],
  "mechanism_bridge": {
    "ancient_concept": "substance that heals or harms",
    "modern_mechanism": "mitochondrial modulation, receptor binding",
    "quantum_angle": "electron transport chain interference"
  },
  "corpus_evidence": {
    "greek_corpus": ["chunk_123", "chunk_456"],
    "science_corpus": ["chunk_789"],
    "physicists_corpus": ["chunk_012"]
  },
  "confidence": 0.78,
  "refinement_history": [
    {"pass": 1, "date": "2026-02-01", "added": "basic mapping"},
    {"pass": 2, "date": "2026-02-02", "added": "mechanism bridge"},
    {"pass": 3, "date": "2026-02-02", "added": "quantum angle"}
  ]
}
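The `refinement_history` list grows by one record per pass. A minimal sketch of that bookkeeping follows; the helper name `record_refinement` is hypothetical, and the new confidence value in the usage lines is made up for illustration.

```python
from datetime import date


def record_refinement(mapping: dict, added: str, confidence: float, on: date) -> dict:
    """Append one refinement_history record and update the confidence."""
    history = mapping.setdefault("refinement_history", [])
    # Number passes consecutively, starting at 1 for a fresh mapping.
    next_pass = history[-1]["pass"] + 1 if history else 1
    history.append({"pass": next_pass, "date": on.isoformat(), "added": added})
    mapping["confidence"] = confidence
    return mapping


mapping = {
    "mapping_id": "pharmakon_mitochondria_001",
    "confidence": 0.78,
    "refinement_history": [
        {"pass": 1, "date": "2026-02-01", "added": "basic mapping"},
    ],
}
record_refinement(mapping, "mechanism bridge", 0.8, date(2026, 2, 2))
print(mapping["refinement_history"][-1])
```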

🔄 Recursive Refinement Algorithm

Pass 1: Raw Extraction

def pass_1_extract(corpus: str) -> dict:
    """Extract raw entities from corpus"""
    entities = {}
    for chunk in get_all_chunks(corpus):
        extracted = llm_extract_entities(chunk.text)
        for entity in extracted:
            if entity not in entities:
                entities[entity] = {
                    "frequency": 0,
                    "chunks": [],
                    "contexts": []
                }
            entities[entity]["frequency"] += 1
            entities[entity]["chunks"].append(chunk.id)
            entities[entity]["contexts"].append(chunk.text[:200])
    return entities
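`pass_1_extract` relies on two helpers that are not defined here, `get_all_chunks` and `llm_extract_entities`. To exercise the counting logic without a vector store or an LLM, they could be stubbed as below. Everything in this sketch is an assumption: the in-memory `TOY_CORPORA`, the `Chunk` type, the vocabulary-lookup extractor and the reduced `count_entities` function are stand-ins, not the project's implementation.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    id: str
    text: str


# Hypothetical in-memory corpus; the real pipeline would read from ChromaDB.
TOY_CORPORA = {
    "science_corpus": [
        Chunk("chunk_1", "Mitochondria drive ATP synthesis."),
        Chunk("chunk_2", "Deuterium slows ATP production in mitochondria."),
    ]
}

# Stand-in for the LLM: a fixed vocabulary matched case-insensitively.
VOCABULARY = {"mitochondria", "atp", "deuterium"}


def get_all_chunks(corpus: str) -> list:
    return TOY_CORPORA.get(corpus, [])


def llm_extract_entities(text: str) -> list:
    """Return each known term at most once per chunk, in sorted order."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(words & VOCABULARY)


def count_entities(corpus: str) -> dict:
    """Same bookkeeping as pass_1_extract, reduced to frequencies and chunk ids."""
    entities = {}
    for chunk in get_all_chunks(corpus):
        for entity in llm_extract_entities(chunk.text):
            record = entities.setdefault(entity, {"frequency": 0, "chunks": []})
            record["frequency"] += 1
            record["chunks"].append(chunk.id)
    return entities


print(count_entities("science_corpus"))
```

Because the stub returns each term at most once per chunk, `frequency` here counts chunks that mention a term rather than total mentions.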

Pass 2: Cross-Domain Mapping

def pass_2_crossmap(all_entities: dict) -> dict:
    """Find same/similar terms across domains"""
    mappings = {}

    for corpus_a, entities_a in all_entities.items():
        for corpus_b, entities_b in all_entities.items():
            # Visit each unordered pair of corpora once
            if corpus_a >= corpus_b:
                continue

            # Exact matches
            overlap = set(entities_a.keys()) & set(entities_b.keys())

            # Semantic matches (embedding similarity)
            for term_a in entities_a:
                similar = find_similar_terms(term_a, entities_b, threshold=0.85)
                overlap.update(similar)

            for term in overlap:
                # Merge into any existing entry so that a term shared by
                # three or more corpora keeps every domain, not only the
                # last pair visited
                entry = mappings.setdefault(term, {"domains": [], "frequencies": {}})
                for corpus, entities in ((corpus_a, entities_a), (corpus_b, entities_b)):
                    if corpus not in entry["domains"]:
                        entry["domains"].append(corpus)
                    entry["frequencies"][corpus] = entities.get(term, {}).get("frequency", 0)

    return mappings
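`find_similar_terms` is called above but not defined. The sketch below shows the shape such a helper could take: it scores each candidate by cosine similarity and keeps those at or above the threshold. The character-trigram vector is a toy stand-in for a real embedding model, chosen only so the example runs without dependencies; it catches spelling variants but cannot link a Greek term to its English counterpart.

```python
import math
from collections import Counter


def trigram_vector(term: str) -> Counter:
    """Toy 'embedding': counts of character trigrams in the padded, lowercased term."""
    padded = f"  {term.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def find_similar_terms(term: str, candidates, threshold: float = 0.85) -> set:
    """Return the candidate terms whose similarity to `term` meets the threshold."""
    source = trigram_vector(term)
    return {c for c in candidates if cosine(source, trigram_vector(c)) >= threshold}


# Candidates are passed as a dict, as in pass_2_crossmap; only the keys are used.
print(find_similar_terms("mitochondria", {"mitochondrial": {}, "entropy": {}}, threshold=0.85))
```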

Pass 3+: Contextual Enrichment

def pass_n_enrich(term: str, current_entry: dict, n: int) -> dict:
    """Recursively enrich term with deeper context"""
    
    # Get all chunks containing term
    chunks = get_chunks_containing(term)
    
    # Extract co-occurring entities
    cooccurrence = extract_cooccurrence(chunks)
    
    # Ask LLM for deeper analysis
    enrichment_prompt = f"""
    Term: {term}
    Current understanding: {current_entry}
    Sample contexts: {chunks[:5]}
    Co-occurring terms: {cooccurrence[:20]}
    
    Provide:
    1. Refined definition incorporating all domains
    2. Key relationships not yet captured
    3. Cross-domain bridges (how ancient/modern concepts connect)
    4. Confidence assessment
    """
    
    enriched = llm_analyze(enrichment_prompt)
    
    current_entry["refinement_passes"] = n
    current_entry["cross_domain_notes"] = enriched["bridges"]
    # Entries produced by pass 2 have no top-level "related_terms" yet
    current_entry.setdefault("related_terms", []).extend(enriched["relationships"])
    current_entry["confidence"] = enriched["confidence"]
    
    return current_entry
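The enrichment step slices `cooccurrence[:20]`, so `extract_cooccurrence` needs to return an ordered sequence. A minimal sketch under stated assumptions follows: chunks are plain strings, a known `vocabulary` is supplied (the call above passes only `chunks`), and matching is naive substring search, so short terms can match inside longer words.

```python
from collections import Counter


def extract_cooccurrence(chunks, vocabulary, exclude=()):
    """Count how many chunks mention each vocabulary term; most frequent first."""
    counts = Counter()
    skipped = {term.lower() for term in exclude}
    for text in chunks:
        lowered = text.lower()
        for term in vocabulary:
            if term.lower() not in skipped and term.lower() in lowered:
                counts[term] += 1
    # Sort by descending count, then alphabetically, so the result is deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


chunks = [
    "Mitochondria couple electron transport to ATP synthesis.",
    "Heteroplasmy alters ATP output of mitochondria.",
]
print(extract_cooccurrence(chunks, ["ATP", "heteroplasmy", "mitochondria"], exclude=["mitochondria"]))
```

Excluding the term being enriched keeps it from dominating its own co-occurrence list.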

Iteration Controller

def recursive_refinement(lexicon: dict, max_passes: int = 5) -> dict:
    """Iterate until convergence or max passes"""
    
    for n in range(1, max_passes + 1):
        changes = 0
        
        for term, entry in lexicon.items():
            old_confidence = entry.get("confidence", 0)
            
            enriched = pass_n_enrich(term, entry, n)
            
            # Check if significantly changed
            if abs(enriched["confidence"] - old_confidence) > 0.05:
                changes += 1
            
            lexicon[term] = enriched
        
        print(f"Pass {n}: {changes} terms updated")
        
        # Convergence check
        if changes < len(lexicon) * 0.01:  # <1% changed
            print(f"Converged at pass {n}")
            break
    
    return lexicon
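The stopping rule in the controller can be checked in isolation. The sketch below restates it as a standalone function over two confidence snapshots; the name `has_converged` and the snapshot dictionaries are hypothetical, while the 0.05 and 1% thresholds are the ones used above.

```python
def has_converged(old: dict, new: dict, delta: float = 0.05, tolerance: float = 0.01) -> bool:
    """True when fewer than `tolerance` of the terms moved by more than `delta`."""
    changed = sum(1 for term in new if abs(new[term] - old.get(term, 0.0)) > delta)
    return changed < len(new) * tolerance


before = {f"term_{i}": 0.70 for i in range(200)}
after = dict(before)
after["term_0"] = 0.80  # one of 200 terms moved by 0.10, i.e. 0.5% of the lexicon

print(has_converged(before, after))
```

Two properties of the strict `<` comparison are worth knowing: with fewer than 100 terms, only a pass with zero changes counts as converged, and an empty lexicon never converges, so the loop runs to `max_passes`.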

🏷️ Domain-Specific Lexicons

Quantum Biology Lexicon

| Term | Frequency | Key Context |
| --- | --- | --- |
| mitochondria | 4,550 | Quantum coherence, biophotons |
| melanin | 3,822 | Quantum antenna, evolution |
| deuterium | 2,349 | Kinetic isotope effect |
| heteroplasmy | 403 | mtDNA mutation load |
| proton tunneling | 392 | Grotthuss mechanism |

Greek Medical Lexicon

| Greek | Transliteration | Modern Mapping |
| --- | --- | --- |
| φάρμακον | pharmakon | drug/toxin (dose-dependent) |
| χυμός | chymos | biochemical milieu |
| θεραπεία | therapeia | therapeutic intervention |
| κρᾶσις | krasis | homeostatic balance |
| δύναμις | dynamis | bioactive potency |

Physics Lexicon

| Term | Frequency | Cross-Domain Bridge |
| --- | --- | --- |
| coherence | 1,200+ | QBio: biological coherence |
| tunneling | 800+ | QBio: enzyme catalysis |
| entropy | 600+ | QBio: negentropy of life |
| field | 2,000+ | QBio: biofield, nnEMF |

🔗 Expert System Capture

Implicit Knowledge Extraction

The lexicon captures expert-level knowledge by:

  1. Co-occurrence patterns: what experts mention together
  2. Contextual usage: how terms are actually used
  3. Cross-domain bridges: connections only experts see
  4. Confidence gradients: which mappings are solid vs speculative
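The confidence gradient in the last item can be made concrete by bucketing scores into coarse tiers. The cut-offs below (0.85 and 0.70) and the tier names are illustrative assumptions, not values defined by this project; the three sample scores are the ones that appear in this document's examples.

```python
def confidence_tier(confidence: float, solid: float = 0.85, plausible: float = 0.70) -> str:
    """Map a confidence score in [0, 1] to a coarse label (cut-offs are illustrative)."""
    if confidence >= solid:
        return "solid"
    if confidence >= plausible:
        return "plausible"
    return "speculative"


def group_by_tier(entries: dict) -> dict:
    """Group entry names by tier; `entries` maps a name to its confidence."""
    tiers = {"solid": [], "plausible": [], "speculative": []}
    for name, confidence in sorted(entries.items()):
        tiers[confidence_tier(confidence)].append(name)
    return tiers


print(group_by_tier({"mitochondria": 0.92, "pharmakon": 0.78, "theriac": 0.72}))
```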

Example: Theriac → Modern Pharmacology

Ancient term: θηριακή (thēriakē) - "beast medicine"
├── Greek corpus context: antidote to venomous bites, 60+ ingredients
├── Cross-reference: opium, viper flesh, botanical compounds
├── Modern mapping:
│   ├── Polypharmacy (multi-compound formulation)
│   ├── Mithridatism (graduated poison tolerance)
│   └── Hormesis (low-dose stimulation)
├── Quantum biology angle:
│   ├── Mitochondrial hormesis
│   └── Adaptive stress response
└── Confidence: 0.72 (solid historical, speculative mechanism)

📋 Implementation TODO

Phase 1: Foundation

  • Build lexicon ChromaDB collection
  • Pass 1 extraction on all corpora
  • JSON export for inspection

Phase 2: Cross-Mapping

  • Implement semantic similarity matching
  • Greek→English term mapping
  • Build cross-reference index

Phase 3: Recursive Refinement

  • LLM enrichment pipeline
  • Convergence detection
  • Confidence scoring

Phase 4: Expert System

  • Query interface for lexicon
  • Integration with Pharmakon Miner
  • Novel connection discovery

🛠️ Tools

lexicon_builder.py

#!/usr/bin/env python3
"""Build cross-domain lexicon from corpora"""

import json

class LexiconBuilder:
    def __init__(self, chroma_client, corpora: list):
        self.client = chroma_client
        self.corpora = corpora
        self.lexicon = {}
    
    def build(self, max_passes: int = 5):
        # Pass 1: Extract from all corpora
        for corpus in self.corpora:
            self.lexicon[corpus] = self.extract_entities(corpus)
        
        # Pass 2: Cross-map
        self.cross_mappings = self.build_cross_map()
        
        # Pass 3+: Recursive refinement
        self.master_lexicon = self.recursive_refine(max_passes)
        
        return self.master_lexicon
    
    def export(self, path: str):
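        # Write UTF-8 and keep Greek terms readable instead of \uXXXX escapes
        with open(path, 'w', encoding='utf-8') as f:
            json.dump(self.master_lexicon, f, indent=2, ensure_ascii=False)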
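Once `export` has written the master lexicon, the JSON can be read back for inspection. The sketch below assumes the exported file maps each term to an entry shaped like the Term Entry Structure above; `terms_in_domain` is a hypothetical helper, and the frequency given for φάρμακον is a made-up sample value.

```python
import json
import os
import tempfile


def terms_in_domain(lexicon: dict, domain: str, min_frequency: int = 1) -> list:
    """Terms whose entry records at least `min_frequency` hits in `domain`."""
    hits = []
    for term, entry in lexicon.items():
        frequency = entry.get("domains", {}).get(domain, {}).get("frequency", 0)
        if frequency >= min_frequency:
            hits.append((term, frequency))
    # Highest frequency first, ties broken alphabetically.
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))


# Toy lexicon; the second frequency is invented for the example.
lexicon = {
    "mitochondria": {"domains": {"quantum_biology": {"frequency": 4550},
                                 "greek_medical": {"frequency": 0}}},
    "φάρμακον": {"domains": {"greek_medical": {"frequency": 12}}},
}

# Round-trip through a temporary file to mimic export followed by a later load.
path = os.path.join(tempfile.mkdtemp(), "lexicon.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(lexicon, f, indent=2, ensure_ascii=False)
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)

print(terms_in_domain(loaded, "greek_medical"))
```

A term with a recorded frequency of 0 in a domain, such as mitochondria in `greek_medical`, is treated as absent from that domain.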

📊 Metrics

| Metric | Target | Current |
| --- | --- | --- |
| Unique terms | 10,000+ | TBD |
| Cross-domain mappings | 2,000+ | TBD |
| Greek→Modern bridges | 500+ | TBD |
| Avg confidence | >0.75 | TBD |
| Refinement passes | 3-5 | TBD |
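Some of these figures could be computed directly from the master lexicon once it exists. The sketch below is one possible reading, not a definition: `lexicon_metrics` is a hypothetical helper, it assumes entries shaped like the Term Entry Structure, and it counts a term as cross-domain when at least two domains record a non-zero frequency. The second sample entry's confidence is made up.

```python
def lexicon_metrics(lexicon: dict) -> dict:
    """Summarize a term -> entry mapping along the lines of the metrics table."""
    confidences = [e["confidence"] for e in lexicon.values() if "confidence" in e]
    cross_domain = 0
    for entry in lexicon.values():
        attested = [d for d in entry.get("domains", {}).values() if d.get("frequency", 0) > 0]
        if len(attested) >= 2:
            cross_domain += 1
    return {
        "unique_terms": len(lexicon),
        "cross_domain_terms": cross_domain,
        "avg_confidence": sum(confidences) / len(confidences) if confidences else None,
    }


sample = {
    "mitochondria": {
        "confidence": 0.92,
        "domains": {
            "quantum_biology": {"frequency": 4550},
            "greek_medical": {"frequency": 0},
            "physicists": {"frequency": 45},
        },
    },
    "heteroplasmy": {
        "confidence": 0.5,  # made-up value for the example
        "domains": {"quantum_biology": {"frequency": 403}},
    },
}
print(lexicon_metrics(sample))
```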


Recursive refinement: each pass makes the lexicon smarter.