Every LLM starts with amnesia
Claude Code doesn't know what you discussed in ChatGPT. Copilot doesn't know the architecture decisions you made in Claude. You re-explain the same context dozens of times per week. Simple top-k vector retrieval doesn't fix this. It returns fragments without structure, ranking, or provenance.
A memory layer that follows you across tools
Loom watches your work across LLM tools, builds a knowledge graph from it, and compiles the right context into any AI tool at query time. It replaces "paste your context" with an always-on memory layer spanning Claude Code, Codex CLI, ChatGPT, GitHub Copilot, and anything else that speaks MCP.
Two pipelines, one database
PostgreSQL 16 is the single system of record. No external vector store. No graph database. pgvector handles embeddings, recursive CTEs handle graph traversal, pgAudit handles compliance. Two strictly separated pipelines share the database but never share runtime.
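A minimal sketch of what "graph traversal in PostgreSQL" means in practice: a bounded walk over an edge table via a recursive CTE, issued from Rust with sqlx. The table and column names (`entity_edges`, `source_id`, `target_id`) are illustrative assumptions, not Loom's actual schema.

```rust
use sqlx::PgPool;

/// Walk the knowledge graph outward from one entity with a recursive CTE,
/// bounded by depth, entirely inside PostgreSQL. No external graph database.
async fn related_entity_ids(
    pool: &PgPool,
    root_id: i64,
    max_depth: i32,
) -> Result<Vec<i64>, sqlx::Error> {
    let rows: Vec<(i64,)> = sqlx::query_as(
        r#"
        WITH RECURSIVE neighborhood AS (
            SELECT e.target_id, 1 AS depth
            FROM entity_edges e
            WHERE e.source_id = $1
            UNION
            SELECT e.target_id, n.depth + 1
            FROM entity_edges e
            JOIN neighborhood n ON e.source_id = n.target_id
            WHERE n.depth < $2
        )
        SELECT target_id FROM neighborhood
        "#,
    )
    .bind(root_id)
    .bind(max_depth)
    .fetch_all(pool)
    .await?;

    Ok(rows.into_iter().map(|(id,)| id).collect())
}
```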
Online pipeline
- Intent classification (primary + secondary class)
- Namespace resolution
- Parallel retrieval profiles (1-3 per query, results merged)
- Memory weight modifiers per task class
- Rank on four dimensions: relevance, recency, stability, provenance (weighting sketched after this list)
- Compile context package
- Full audit trace
Offline pipeline
- Ingest episode + SHA-256 dedup (sketched after this list)
- Extract entities (structured prompt)
- Three-pass entity resolution
- Extract facts against predicate registry
- Link facts to source episodes
- Resolve supersession
- Compute derived ranking state
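The dedup step referenced above is essentially content-addressed insertion. A sketch, assuming a `sha2` dependency and an `episodes` table with a unique `content_hash` column (both names are illustrative):

```rust
use sha2::{Digest, Sha256};
use sqlx::PgPool;

enum IngestStatus {
    Accepted,
    Duplicate,
}

async fn ingest_episode(pool: &PgPool, raw: &str) -> Result<IngestStatus, sqlx::Error> {
    // Content-addressed identity: identical transcripts hash to the same key.
    let digest = Sha256::digest(raw.as_bytes());
    let hash: String = digest.iter().map(|b| format!("{:02x}", b)).collect();

    // ON CONFLICT DO NOTHING turns a re-import into a no-op instead of an error.
    let inserted = sqlx::query(
        "INSERT INTO episodes (content_hash, body) VALUES ($1, $2)
         ON CONFLICT (content_hash) DO NOTHING",
    )
    .bind(&hash)
    .bind(raw)
    .execute(pool)
    .await?
    .rows_affected();

    Ok(if inserted == 1 {
        IngestStatus::Accepted
    } else {
        IngestStatus::Duplicate
    })
}
```

The `rows_affected` count maps directly onto the accepted/duplicate statuses that loom_learn reports.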
Three types, strict authority
Facts are never more authoritative than their source episodes. Procedures are candidate patterns until promoted. Two tiers in the MVP: Hot (always injected, with a configurable budget per namespace) and Warm (retrieved per query). New facts always start warm. Namespace isolation is absolute.
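In data-model terms, those rules reduce to a few invariants. The sketch below is illustrative only; field and type names are assumptions, not Loom's actual schema.

```rust
/// Memory tiers in the MVP: hot memories are always injected up to a
/// configurable per-namespace budget; warm memories are retrieved per query.
enum Tier {
    Hot,
    Warm,
}

struct NamespaceConfig {
    name: String,
    /// Configurable budget for always-injected hot memories.
    hot_budget: u32,
}

struct Fact {
    namespace: String,
    tier: Tier,
    /// Facts stay anchored to the episodes they were extracted from,
    /// so they can never outrank their own evidence.
    source_episode_ids: Vec<i64>,
}

impl Fact {
    /// New facts always enter the warm tier; promotion to hot is a later,
    /// explicit step.
    fn new(namespace: &str, source_episode_ids: Vec<i64>) -> Self {
        Fact {
            namespace: namespace.to_string(),
            tier: Tier::Warm,
            source_episode_ids,
        }
    }
}
```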
Episodic
Raw interaction records. Immutable evidence. Strongest audit anchor. Primary for debug and compliance tasks.
Semantic
Extracted facts and entity relationships. Derived from episodes. Revisable. Primary for architecture tasks.
Procedural
Inferred behavioral patterns. Most provisional. Requires 3+ episodes across 7+ days and 0.8+ confidence to promote.
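The promotion rule is mechanical enough to state as a check. The thresholds come from the text above; the input shape (episode timestamps plus a confidence score) is an assumption.

```rust
use std::time::{Duration, SystemTime};

/// A procedural pattern is only promoted once it is supported by at least
/// three episodes spanning at least seven days, with confidence >= 0.8.
fn can_promote(episode_times: &[SystemTime], confidence: f64) -> bool {
    if episode_times.len() < 3 || confidence < 0.8 {
        return false;
    }
    match (episode_times.iter().min(), episode_times.iter().max()) {
        (Some(earliest), Some(latest)) => {
            let span = latest.duration_since(*earliest).unwrap_or(Duration::ZERO);
            span >= Duration::from_secs(7 * 24 * 60 * 60)
        }
        _ => false,
    }
}
```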
Rust engine, local inference, zero cloud dependency
- Engine: Rust with tokio, axum, and sqlx. Compile-time SQL checking. ~20 MB Docker image.
- Dashboard: TypeScript. Static files served by Caddy.
- Database: PostgreSQL 16 with pgvector and pgAudit. Single system of record.
- Inference: Gemma 4 26B MoE for extraction, Gemma 4 E4B for classification, nomic-embed-text for embeddings.
- Importers: run-once scripts that parse Claude.ai, ChatGPT, and Codex CLI exports.
- Deployment: Caddy reverse proxy with TLS. Five containers total.
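For a sense of the engine's shape, here is a minimal tokio + axum + sqlx skeleton with a single health route, assuming axum 0.7-style APIs. The route, port, and environment variable are illustrative, not Loom's actual configuration.

```rust
use axum::{extract::State, routing::get, Router};
use sqlx::postgres::PgPoolOptions;
use sqlx::PgPool;

/// A trivial database round-trip proves both the runtime and PostgreSQL are up.
async fn healthz(State(pool): State<PgPool>) -> &'static str {
    match sqlx::query("SELECT 1").execute(&pool).await {
        Ok(_) => "ok",
        Err(_) => "degraded",
    }
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let pool = PgPoolOptions::new()
        .max_connections(5)
        .connect(&std::env::var("DATABASE_URL")?)
        .await?;

    let app = Router::new().route("/healthz", get(healthz)).with_state(pool);

    let listener = tokio::net::TcpListener::bind("0.0.0.0:8080").await?;
    axum::serve(listener, app).await?;
    Ok(())
}
```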
Three tools. Not five.
- loom_think: Compiles a context package for a query. Fires automatically before complex tasks. The primary integration point.
- loom_learn: Ingests a new episode from any source. Returns an accepted, duplicate, or queued status.
- loom_recall: Returns raw search results without compilation. For when you want to browse, not compile.
Primary integration: Claude Code via native MCP. Secondary: manual REST ingestion. Codex CLI follows after validation.
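Seen from the client side, the three tools reduce to three request/response shapes. The structs below are an illustrative sketch using serde; the field names are assumptions, not Loom's actual MCP schema.

```rust
use serde::{Deserialize, Serialize};

/// loom_think: compile a context package for a query.
#[derive(Deserialize)]
struct ThinkRequest {
    query: String,
    namespace: String,
}

#[derive(Serialize)]
struct ContextPackage {
    compiled_context: String,
    /// Per-candidate trace entries so every included memory is auditable.
    trace: Vec<String>,
}

/// loom_learn: ingest a new episode from any source.
#[derive(Deserialize)]
struct LearnRequest {
    source: String,
    content: String,
}

#[derive(Serialize)]
enum LearnStatus {
    Accepted,
    Duplicate,
    Queued,
}

/// loom_recall: raw search results, no compilation.
#[derive(Deserialize)]
struct RecallRequest {
    query: String,
    limit: u32,
}
```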
Fail either gate: simplify or kill
Week 4: Extraction
50-episode sample, human-annotated. Blocks all further work if thresholds aren't met.
Fact precision ≥ 0.75
Predicate consistency ≥ 0.85
Week 8: Compilation
Compiled context (C) must beat raw retrieval (B) across 10 benchmark tasks.
Token reduction ≥ 30%
Zero task success regression
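Both gates are simple threshold checks. A sketch, with the thresholds taken from the plan above and the metric-struct shapes assumed for illustration:

```rust
struct ExtractionMetrics {
    /// Measured on the 50-episode human-annotated sample.
    fact_precision: f64,
    predicate_consistency: f64,
}

struct CompilationMetrics {
    /// Token reduction of compiled context (C) versus raw retrieval (B).
    token_reduction: f64,
    /// Task success of C minus task success of B across the 10 benchmark tasks.
    task_success_delta: f64,
}

fn extraction_gate_passes(m: &ExtractionMetrics) -> bool {
    m.fact_precision >= 0.75 && m.predicate_consistency >= 0.85
}

fn compilation_gate_passes(m: &CompilationMetrics) -> bool {
    // At least 30% fewer tokens, and no regression in task success.
    m.token_reduction >= 0.30 && m.task_success_delta >= 0.0
}
```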
Full observability into the memory system
- Pipeline health monitoring
- Compilation trace viewer with per-candidate score breakdowns
- Knowledge graph explorer
- Entity conflict review queue
- Predicate candidate review with pack browsing
- Retrieval quality metrics: precision, latency percentiles, classification confidence
- Extraction quality comparison across model versions
- A/B/C benchmark comparison views
Core thesis: a memory compiler with explicit ranking, graph traversal, and audit logging produces better context packages than simple top-k vector retrieval. The MVP exists to prove or disprove that thesis.
12-week build · PostgreSQL-native · Local inference · MCP-first