Local GPU Job Queue

⚠️ ARCHIVED (2026-02-01): Queue system deprecated. LLM via Claude CLI (Max plan). GPU for embeddings only.

Historical Overview

A SQLite-backed job queue for local Ollama inference with GPU acceleration, now superseded by Claude CLI.

Current Architecture

All LLM processing now routes through:

  • Claude CLI (Max plan) - unlimited usage; handles all LLM tasks (see the sketch after this list)
  • **

GPU Still Used For

  • Embeddings: sentence-transformers/all-MiniLM-L6-v2 (CUDA accelerated)
  • Vector operations: ChromaDB similarity search
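
A minimal sketch of this surviving GPU path, assuming a CUDA-capable device and the `sentence-transformers` and `chromadb` packages; the collection name, IDs, and sample documents are placeholders.

```python
# Sketch: CUDA-accelerated embeddings with all-MiniLM-L6-v2 feeding a
# ChromaDB similarity search. Use device="cpu" if no CUDA device is available.
import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2", device="cuda")

docs = [
    "The job queue stored pending Ollama requests in SQLite.",
    "Embeddings are still computed locally on the GPU.",
]

# Encode documents on the GPU; encode() returns a NumPy array.
embeddings = model.encode(docs)

# In-memory client for the sketch; a real setup would use chromadb.PersistentClient.
client = chromadb.Client()
collection = client.get_or_create_collection("archived_docs")  # hypothetical name
collection.add(ids=["doc-0", "doc-1"], documents=docs, embeddings=embeddings.tolist())

# Similarity search: embed the query with the same model, then query the collection.
query_embedding = model.encode(["Where do embeddings run now?"])
results = collection.query(query_embeddings=query_embedding.tolist(), n_results=1)
print(results["documents"][0][0])
```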

See Also