Distributed Compute Options

Options for pooling compute resources between pop-os (16GB) and ShadowMaster (14GB, Windows).

Available Resources

Host         | RAM  | GPU                | OS      | Notes
-------------|------|--------------------|---------|------------------------
pop-os       | 16GB | RTX 2060 Super 8GB | Linux   | Main workstation
ShadowMaster | 14GB | Unknown            | Windows | DJ bot/streaming
Proxmox      | 16GB | None               | Linux   | Hypervisor, ~10GB idle

1. Remote ChromaDB Server

Effort: Low | Benefit: Medium

Run ChromaDB as a server on one machine and query it from the others:

# On pop-os (has GPU for embeddings)
pip install chromadb
chroma run --path /mnt/storage/knowledge-rag/chroma_db --host 0.0.0.0 --port 8001

# From any machine (Python)
import chromadb
client = chromadb.HttpClient(host="192.168.1.156", port=8001)
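Once the server is running, a quick smoke test from another machine could look like the following; the collection name "knowledge" is an assumption, not something configured above:

# Sketch: round-trip test against the remote server ("knowledge" is an assumed name)
import chromadb

client = chromadb.HttpClient(host="192.168.1.156", port=8001)
collection = client.get_or_create_collection("knowledge")
collection.add(ids=["doc-1"], documents=["hello from the remote store"])
results = collection.query(query_texts=["hello"], n_results=1)
print(results["documents"])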

✅ Centralizes storage
✅ Network queries don't load the full DB into RAM
❌ Doesn't add RAM, just shares access

2. Ray Cluster (Distributed Processing)

Effort: Medium | Benefit: High

Pool CPU/RAM across machines for parallel processing:

# On pop-os (head node)
pip install ray
ray start --head --port=6379

# On ShadowMaster (worker)
pip install ray
ray start --address='192.168.1.156:6379'

# Then, in Python on any node in the cluster:
import ray
ray.init(address="auto")

@ray.remote
def process_chunk(chunk):
    # Runs on whichever node has free CPU/RAM
    enriched_chunk = {"text": chunk, "enriched": True}  # placeholder enrichment
    return enriched_chunk

# Distribute work across the cluster
chunks = ["chunk-1", "chunk-2", "chunk-3"]  # example input
futures = [process_chunk.remote(c) for c in chunks]
results = ray.get(futures)
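Before trusting the pool, it's worth confirming that ShadowMaster actually joined. A minimal check from any node, assuming only the cluster set up above:

# Sketch: confirm the combined pool from any node
import ray

ray.init(address="auto")
print(ray.cluster_resources())  # aggregate CPU and memory across all nodes
print(len(ray.nodes()))         # should be 2 once ShadowMaster has joined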

✅ Combines RAM pools (~30GB total)
✅ Parallel processing
❌ Requires Python on Windows
❌ Network overhead for data transfer

3. Dask Distributed

Effort: Medium | Benefit: High

Similar to Ray, but better suited to data-heavy workloads:

# On pop-os (scheduler, plus an optional local worker)
pip install dask distributed
dask scheduler --host 0.0.0.0 --port 8786
dask worker 192.168.1.156:8786

# On ShadowMaster (worker)
pip install dask distributed
dask worker 192.168.1.156:8786
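The commands above only start the cluster; work is submitted through the Python client. A minimal sketch, assuming the scheduler address above and a placeholder process_chunk:

# Sketch: submit work from any machine (process_chunk is a placeholder)
from dask.distributed import Client

client = Client("tcp://192.168.1.156:8786")

def process_chunk(chunk):
    return {"text": chunk, "enriched": True}  # placeholder enrichment

futures = client.map(process_chunk, ["chunk-1", "chunk-2", "chunk-3"])
results = client.gather(futures)
print(results)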

✅ Better pandas/numpy integration
✅ Web dashboard for monitoring
❌ Same cross-platform challenges

Complex Options (Future)

4. Network Block Device (NBD) RAM Disk

Export RAM as network storage - very complex, high latency.

5. Kubernetes / Docker Swarm

Overkill for 2 machines, but scalable if adding more hosts.

Recommendation

For your use case (ChromaDB bulk operations):

  1. Short-term: Keep using batched processing (already implemented)
  2. Medium-term: Set up a Ray cluster if you need parallel enrichment
  3. Skip: Remote RAM pooling - latency makes it worse than batching

The batched approach we just implemented handles 1M+ chunks fine on 16GB. Adding ShadowMaster's RAM would only help if you're doing parallel LLM calls or need to load multiple collections simultaneously.
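For reference, the batched pattern is roughly the following; the collection name, batch size, and chunk shape are illustrative, since the real implementation lives in the project code:

# Sketch of the batched-insert pattern (names and batch size are illustrative)
import chromadb

client = chromadb.PersistentClient(path="/mnt/storage/knowledge-rag/chroma_db")
collection = client.get_or_create_collection("knowledge")

chunks = [{"id": f"chunk-{i}", "text": f"example text {i}"} for i in range(10_000)]

BATCH_SIZE = 5000  # bounds peak memory regardless of total chunk count
for i in range(0, len(chunks), BATCH_SIZE):
    batch = chunks[i:i + BATCH_SIZE]
    collection.add(
        ids=[c["id"] for c in batch],
        documents=[c["text"] for c in batch],
    )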