Every LLM starts with amnesia
Claude Code doesn't know what you discussed in ChatGPT. Copilot doesn't know the architecture decisions you made in Claude. You re-explain the same context dozens of times per week. Simple top-k vector retrieval doesn't fix this. It returns fragments without structure, ranking, or provenance.
A memory layer that follows you across five surfaces
Loom watches your work across LLM tools, builds a knowledge graph from it, and compiles the right context into any AI tool at query time. It replaces "paste your context" with an always-on memory layer spanning Claude Desktop, Claude Code, ChatGPT Desktop, GitHub Copilot, and M365 Copilot. All five are first-class clients with shipped integration guides, discipline templates, and, where the vendor publishes an export, a bootstrap parser for it.
Two pipelines, one database, one telemetry sampler
PostgreSQL 17 is the single system of record. No external vector store. No graph database. pgvector handles embeddings, recursive CTEs handle graph traversal, and pgAudit (optional) handles compliance logging. Two strictly separated pipelines share the database but never share a runtime. A 1 Hz telemetry sampler runs alongside both, streaming live status to the dashboard via SSE.
Online pipeline
- Intent classification (primary + secondary class)
- Namespace resolution
- Parallel retrieval profiles via tokio::join!
- Memory weight modifiers per task class
- Rank on four dimensions: relevance, recency, stability, provenance
- Compile context package (XML for Claude, JSON for others)
- Full audit trace
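To make the ranking step concrete, here is a minimal Python sketch (the engine itself is Rust) of a weighted score over the four dimensions, scaled by a per-task-class memory weight modifier. The task classes, weights, and modifier values are hypothetical; the only grounded part is that episodic memory leads for debug tasks and semantic memory leads for architecture tasks.

```python
from dataclasses import dataclass

# Hypothetical per-task-class weights over the four ranking dimensions (each row sums to 1).
DIMENSION_WEIGHTS = {
    "architecture": {"relevance": 0.4, "recency": 0.1, "stability": 0.3, "provenance": 0.2},
    "debug": {"relevance": 0.4, "recency": 0.3, "stability": 0.1, "provenance": 0.2},
}

# Hypothetical memory weight modifiers: boost the memory type each task class leans on.
TYPE_MODIFIERS = {
    "architecture": {"semantic": 1.2, "episodic": 1.0, "procedural": 0.8},
    "debug": {"episodic": 1.2, "semantic": 1.0, "procedural": 0.8},
}


@dataclass
class Candidate:
    memory_id: str
    memory_type: str  # "episodic", "semantic", or "procedural"
    relevance: float  # every dimension is assumed pre-normalized to [0, 1]
    recency: float
    stability: float
    provenance: float


def score(c: Candidate, task_class: str) -> float:
    """Weighted sum of the four dimensions, scaled by the task-class modifier."""
    w = DIMENSION_WEIGHTS[task_class]
    base = (
        w["relevance"] * c.relevance
        + w["recency"] * c.recency
        + w["stability"] * c.stability
        + w["provenance"] * c.provenance
    )
    return base * TYPE_MODIFIERS[task_class].get(c.memory_type, 1.0)


def rank(candidates: list[Candidate], task_class: str, k: int = 5) -> list[Candidate]:
    """Return the top-k candidates for this task class, best first."""
    return sorted(candidates, key=lambda c: score(c, task_class), reverse=True)[:k]
```

A weighted sum is the simplest combiner that keeps each dimension's contribution visible in an audit trace; the real ranker may combine the dimensions differently.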
Offline pipeline
- Ingest episode + SHA-256 dedup
- Embed (768d via nomic-embed-text)
- Extract entities (schema-constrained output via Ollama response_format)
- Three-pass entity resolution
- Extract facts against pack-aware predicate registry
- Link facts to source episodes
- Resolve supersession with bounded retries on failure
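SHA-256 dedup at ingest can be sketched in a few lines of Python. This assumes the hash covers the namespace plus whitespace-normalized content, and uses a hypothetical in-memory EpisodeStore in place of the episodes table; in PostgreSQL, a unique index on a digest column would do the same job.

```python
import hashlib


def episode_digest(namespace: str, content: str) -> str:
    """SHA-256 over namespace + whitespace-normalized content (normalization is an assumption)."""
    normalized = " ".join(content.split())
    return hashlib.sha256(f"{namespace}\n{normalized}".encode("utf-8")).hexdigest()


class EpisodeStore:
    """Hypothetical in-memory stand-in for the episodes table."""

    def __init__(self) -> None:
        self._ids_by_digest: dict[str, int] = {}

    def ingest(self, namespace: str, content: str) -> tuple[str, int]:
        digest = episode_digest(namespace, content)
        if digest in self._ids_by_digest:
            # Same content already stored in this namespace: report the existing id.
            return "duplicate", self._ids_by_digest[digest]
        episode_id = len(self._ids_by_digest) + 1
        self._ids_by_digest[digest] = episode_id
        return "accepted", episode_id
```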
Telemetry sampler
In-process ring buffer. Host CPU/memory, Ollama state, per-stage p50 latency, queue counters, recent failures. Streams to the Runtime dashboard at 1 Hz over SSE. No new tables, no time-series store.
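A per-stage ring buffer with a p50 readout, plus the Server-Sent Events framing for one 1 Hz tick, might look like this Python sketch. StageLatency, snapshot, sse_event, and the default capacity are hypothetical names and values; only the SSE `data:` line format follows the standard.

```python
import json
from collections import deque
from statistics import median
from typing import Optional


class StageLatency:
    """Fixed-capacity ring buffer of recent latencies (ms) for one pipeline stage."""

    def __init__(self, capacity: int = 256) -> None:
        # A deque with maxlen drops the oldest sample once it is full.
        self.samples = deque(maxlen=capacity)

    def record(self, ms: float) -> None:
        self.samples.append(ms)

    def p50(self) -> Optional[float]:
        return median(self.samples) if self.samples else None


def snapshot(stages: dict) -> dict:
    """One sampler tick: the current p50 for every stage."""
    return {name: stage.p50() for name, stage in stages.items()}


def sse_event(payload: dict) -> str:
    """Frame a payload as a single Server-Sent Events message."""
    return f"data: {json.dumps(payload)}\n\n"
```

Because the buffer is bounded and lives in process memory, nothing here needs a table or a time-series store.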
Three modes, one invariant
Every episode entering Loom carries an ingestion_mode that determines how the four-dimension ranker weights it. The three valid modes are the only paths in. The taxonomy enforces a single invariant: LLM-generated content cannot become canonical memory.
- user_authored_seed (weight 0.8): the loom-seed CLI POSTs markdown you wrote.
- vendor_import (weight 0.6): bootstrap parsers POST vendor export excerpts.
- live_mcp_capture (weight 1.0): MCP loom_learn or the PostSession hook.
There is no fourth mode for LLM reconstructions. The MCP server hardcodes live_mcp_capture at the entry point, so clients cannot forge the mode.
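The invariant is easiest to see in code. This Python sketch assumes the numbers listed with each mode are per-mode ranker weights and uses hypothetical names (IngestionMode, parse_mode, mcp_learn_handler): the mode is a closed enum with exactly three members, and the MCP entry point overwrites whatever mode the client sent.

```python
from enum import Enum


class IngestionMode(Enum):
    USER_AUTHORED_SEED = "user_authored_seed"
    VENDOR_IMPORT = "vendor_import"
    LIVE_MCP_CAPTURE = "live_mcp_capture"


# Assumption: these are the per-mode ranker weights.
MODE_WEIGHT = {
    IngestionMode.USER_AUTHORED_SEED: 0.8,
    IngestionMode.VENDOR_IMPORT: 0.6,
    IngestionMode.LIVE_MCP_CAPTURE: 1.0,
}


def parse_mode(raw: str) -> IngestionMode:
    """Closed set: anything outside the three modes raises ValueError."""
    return IngestionMode(raw)


def mcp_learn_handler(body: dict) -> dict:
    """Hypothetical MCP entry point: drop any client-supplied mode, then hardcode live capture."""
    episode = {k: v for k, v in body.items() if k != "ingestion_mode"}
    episode["ingestion_mode"] = IngestionMode.LIVE_MCP_CAPTURE
    return episode
```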
Three types, strict authority
Facts are never more authoritative than their source episodes. Procedures remain candidate patterns until promoted. Memory lives in two tiers: Hot (always injected, with a configurable budget per namespace) and Warm (retrieved per query). New facts always start warm. Namespace isolation is absolute.
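One plausible way to honor a per-namespace hot budget is a greedy fill over hot facts in priority order, shown here as a Python sketch. The Fact shape, the token accounting, and the greedy strategy are assumptions for illustration; the grounded rules are that new facts start warm and that other namespaces are never eligible.

```python
from dataclasses import dataclass


@dataclass
class Fact:
    namespace: str
    text: str
    tokens: int  # hypothetical precomputed token count
    tier: str = "warm"  # new facts always start warm


def hot_context(facts: list[Fact], namespace: str, budget_tokens: int) -> list[Fact]:
    """Greedy fill: walk facts in priority order, keep hot ones that fit the budget."""
    picked: list[Fact] = []
    used = 0
    for fact in facts:
        if fact.namespace != namespace or fact.tier != "hot":
            continue  # other namespaces and warm facts are never injected here
        if used + fact.tokens > budget_tokens:
            continue  # would overflow; a smaller later fact may still fit
        picked.append(fact)
        used += fact.tokens
    return picked
```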
Episodic
Raw interaction records. Immutable evidence. Strongest audit anchor. Primary for debug and compliance tasks.
Semantic
Extracted facts and entity relationships. Derived from episodes. Revisable. Primary for architecture tasks.
Procedural
Inferred behavioral patterns. Most provisional. Requires 3+ episodes across 7+ days and 0.8+ confidence to promote.
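The promotion rule reduces to a small predicate. This Python sketch assumes "3+ episodes across 7+ days" means at least three distinct supporting episodes whose earliest and latest timestamps are at least seven days apart; can_promote and its input shape are hypothetical.

```python
from datetime import datetime, timedelta

MIN_EPISODES = 3
MIN_SPAN = timedelta(days=7)
MIN_CONFIDENCE = 0.8


def can_promote(support: dict[str, datetime], confidence: float) -> bool:
    """support maps each supporting episode id to its timestamp (hypothetical shape).

    Assumed reading of the rule: at least 3 distinct episodes, earliest and latest
    at least 7 days apart, and confidence of at least 0.8.
    """
    if len(support) < MIN_EPISODES or confidence < MIN_CONFIDENCE:
        return False
    times = sorted(support.values())
    return times[-1] - times[0] >= MIN_SPAN
```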
Rust engine, local inference, zero cloud dependency
Engine: tokio, axum 0.8, and sqlx 0.8 with compile-time SQL checking, with ring as the rustls crypto provider. Ships as a single static binary on debian:bookworm-slim.
Dashboard: TypeScript 6, 13 pages, built into a static volume served by Caddy.
Database: PostgreSQL 17 with pgvector; pgAudit optional. Single system of record.
Inference: gemma4:26b on a discrete GPU, qwen2.5:14b on iGPU/APU/CPU, gemma4:e4b when memory is tight. gemma4:e4b also handles classification; nomic-embed-text handles embeddings.
Bootstrap: per-client vendor export parsers for Claude Code, Claude.ai/Desktop, ChatGPT (Data Controls export), and M365 Copilot (Purview audit). GitHub Copilot has a stub only, since GitHub publishes no export.
Proxy: Caddy reverse proxy with automatic TLS. Five containers total.
Three tools, two transports
Loom exposes its three tools on two HTTP surfaces. Both require a bearer token.
POST /mcp is the MCP JSON-RPC 2.0 dispatcher: the endpoint real MCP clients call after registering Loom as an MCP server. The per-tool REST endpoints (/mcp/loom_learn, /mcp/loom_think, /mcp/loom_recall) remain mounted for direct curl, integration tests, and callers that don't want to speak the JSON-RPC wire protocol. Both surfaces share the same handler code, and clients cannot override ingestion_mode through either.
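For reference, a JSON-RPC 2.0 `tools/call` message to POST /mcp could be built like this with Python's standard library. The method name and params shape follow the MCP specification; the host and the `query` argument name for loom_think are assumptions, and a real MCP client sends an `initialize` request before any tool call.

```python
import json
import urllib.request


def build_think_request(base_url: str, token: str, query: str, request_id: int = 1) -> urllib.request.Request:
    """Build (but do not send) a JSON-RPC 2.0 tools/call request for loom_think."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        # The "query" argument name is an assumption about loom_think's input schema.
        "params": {"name": "loom_think", "arguments": {"query": query}},
    }
    return urllib.request.Request(
        f"{base_url}/mcp",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # both surfaces require a bearer token
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )
```

Passing the request to urllib.request.urlopen sends it. The per-tool REST endpoints presumably accept the tool arguments without the JSON-RPC envelope.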
- loom_think: Compiles a context package for a query. Fires automatically before complex tasks. The primary integration point.
- loom_learn: Ingests a new episode. Returns accepted, duplicate, or queued status.
- loom_recall: Returns raw search results without compilation. For when you want to browse, not compile.
Three honest gates, held in production
Extraction quality
Each 50-episode sample is checked against the thresholds below. The Metrics page reports the current rolling values.
Fact precision ≥ 0.75
Predicate consistency ≥ 0.85
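A gate check over one reviewed sample could be computed as in this Python sketch. The label format is hypothetical: each extracted fact is marked correct or not, and marked as using its registry predicate consistently or not. Loom's actual scoring rubric is not described here.

```python
FACT_PRECISION_MIN = 0.75
PREDICATE_CONSISTENCY_MIN = 0.85


def extraction_gate(labels: list[dict]) -> dict:
    """labels: one reviewer label per extracted fact in a 50-episode sample.

    Hypothetical label shape: {"correct": bool, "predicate_ok": bool}.
    """
    if not labels:
        raise ValueError("empty sample")
    n = len(labels)
    precision = sum(label["correct"] for label in labels) / n
    consistency = sum(label["predicate_ok"] for label in labels) / n
    return {
        "fact_precision": precision,
        "predicate_consistency": consistency,
        "passed": precision >= FACT_PRECISION_MIN and consistency >= PREDICATE_CONSISTENCY_MIN,
    }
```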
Compilation quality
An A/B/C benchmark runs across 10 tasks, measuring compiled context (C) against raw retrieval (B) and a no-memory baseline (A).
Token reduction
Task success
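The two compilation metrics could be summarized from benchmark runs as in this Python sketch. The definitions are assumptions: task success as the fraction of tasks passed per arm, and token reduction as 1 - tokens(C) / tokens(B), so compiled context is compared against raw retrieval.

```python
from dataclasses import dataclass


@dataclass
class Run:
    arm: str  # "A" = no memory, "B" = raw retrieval, "C" = compiled context
    task: str
    context_tokens: int
    success: bool


def summarize(runs: list[Run]) -> dict:
    """Per-arm task success rate, plus token reduction of C relative to B (assumed definition)."""
    by_arm: dict[str, list[Run]] = {}
    for run in runs:
        by_arm.setdefault(run.arm, []).append(run)
    success = {arm: sum(r.success for r in rs) / len(rs) for arm, rs in by_arm.items()}
    b_tokens = sum(r.context_tokens for r in by_arm["B"])
    c_tokens = sum(r.context_tokens for r in by_arm["C"])
    return {"task_success": success, "token_reduction_vs_b": 1 - c_tokens / b_tokens}
```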
Pipeline reliability
The Runtime page reports the live state. Failed episodes surface for operator triage rather than retrying forever.
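Bounded retries that end in a triage list, rather than an endless retry loop, can be sketched in Python as follows. MAX_ATTEMPTS and process_with_retries are hypothetical, and a production worker would likely add backoff between attempts.

```python
from typing import Any, Callable, Optional

MAX_ATTEMPTS = 3  # hypothetical retry bound


def process_with_retries(
    episode_id: int,
    stage: Callable[[int], Any],
    failed: list[dict],
) -> Optional[Any]:
    """Try a pipeline stage a bounded number of times, then hand the episode to triage."""
    last_error: Optional[Exception] = None
    for _ in range(MAX_ATTEMPTS):
        try:
            return stage(episode_id)
        except Exception as exc:  # any stage error counts as one failed attempt
            last_error = exc
    # Out of attempts: surface for an operator instead of looping forever.
    failed.append({"episode_id": episode_id, "attempts": MAX_ATTEMPTS, "error": repr(last_error)})
    return None
```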
Thirteen pages, one purpose
Full inspectability of the memory system. The dashboard is how you trust the compiler.
The thesis: a memory compiler with explicit ranking, graph traversal, and audit logging produces better context packages than simple top-k vector retrieval. The dashboard is where that thesis is proven or disproven on a continuous basis.
Read more in Weaving Memory, the practitioner journal about building, using, and learning from Loom.
PostgreSQL-native · Local inference · MCP-first