Project

Loom

A PostgreSQL-native memory compiler for AI workflows. Evidence-grounded memory, strict scoping, shallow graph, inspectable context assembly.

"Weaving threads of knowledge into fabric."

Every LLM starts with amnesia

Claude Code doesn't know what you discussed in ChatGPT. Copilot doesn't know the architecture decisions you made in Claude. You re-explain the same context dozens of times per week. Simple top-k vector retrieval doesn't fix this. It returns fragments without structure, ranking, or provenance.

A memory layer that follows you across five surfaces

Loom watches your work across LLM tools, builds a knowledge graph from it, and compiles the right context into any AI tool at query time. It replaces "paste your context" with an always-on memory layer spanning Claude Desktop, Claude Code, ChatGPT Desktop, GitHub Copilot, and M365 Copilot. All five are first-class clients with shipped integration guides, discipline templates, and, where the vendor publishes an export format, a bootstrap parser for it.

You work normally → Loom learns → You ask any LLM → Loom compiles + injects

Two pipelines, one database, one telemetry sampler

PostgreSQL 17 is the single system of record. No external vector store. No graph database. pgvector handles embeddings, recursive CTEs handle graph traversal, pgAudit handles compliance. Two strictly separated pipelines share the database but never share runtime. A 1 Hz telemetry sampler runs alongside both, streaming live status to the dashboard via SSE.
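To make "recursive CTEs handle graph traversal" concrete, here is a minimal sketch of a bounded-hop neighborhood query. It runs against an in-memory SQLite database purely so the example is self-contained; the `WITH RECURSIVE` form is standard SQL that PostgreSQL also accepts, but the `facts` table, its columns, and the sample rows are hypothetical and not Loom's actual schema.

```python
import sqlite3

# Hypothetical schema: one row per (subject, predicate, object) fact.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE facts (subject TEXT, predicate TEXT, object TEXT);
INSERT INTO facts VALUES
  ('loom', 'uses', 'postgresql'),
  ('postgresql', 'has_extension', 'pgvector'),
  ('pgvector', 'stores', 'embeddings');
""")

# The depth bound keeps the traversal shallow and guarantees termination.
NEIGHBORHOOD_SQL = """
WITH RECURSIVE hops(entity, depth) AS (
    SELECT ?, 0
    UNION
    SELECT f.object, h.depth + 1
    FROM facts f
    JOIN hops h ON f.subject = h.entity
    WHERE h.depth < ?
)
SELECT entity, MIN(depth) AS hop
FROM hops
GROUP BY entity
ORDER BY hop, entity;
"""

def neighborhood(start, max_hops):
    """Return (entity, hop) pairs reachable from start within max_hops."""
    return conn.execute(NEIGHBORHOOD_SQL, (start, max_hops)).fetchall()
```

With `max_hops` set to 2, the walk from `loom` stops at `pgvector` and never reaches `embeddings`, which is the kind of shallow-graph behavior the design describes.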

Online pipeline

  1. Intent classification (primary + secondary class)
  2. Namespace resolution
  3. Parallel retrieval profiles via tokio::join!
  4. Memory weight modifiers per task class
  5. Rank on 4 dimensions: relevance, recency, stability, provenance
  6. Compile context package (XML for Claude, JSON for others)
  7. Full audit trace
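Steps 4 and 5 can be pictured as a weighted score over the four ranking dimensions. The sketch below is an illustration only: the weighted-sum formula, the weight values, and the `Candidate` type are assumptions, and Loom's actual ranker may combine the dimensions differently.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    memory_id: str
    relevance: float   # each dimension assumed normalized to [0, 1]
    recency: float
    stability: float
    provenance: float

# Hypothetical per-task-class weights; each row sums to 1.0.
WEIGHTS = {
    "debug":        {"relevance": 0.4, "recency": 0.3, "stability": 0.1, "provenance": 0.2},
    "architecture": {"relevance": 0.4, "recency": 0.1, "stability": 0.3, "provenance": 0.2},
}

def score(candidate, task_class):
    """Weighted sum of the four dimensions for the given task class."""
    w = WEIGHTS[task_class]
    return (w["relevance"] * candidate.relevance
            + w["recency"] * candidate.recency
            + w["stability"] * candidate.stability
            + w["provenance"] * candidate.provenance)

def rank(candidates, task_class):
    """Highest-scoring candidates first."""
    return sorted(candidates, key=lambda c: score(c, task_class), reverse=True)
```

Under these assumed weights, a recent but unstable memory outranks an old stable one for a debug task, and the order flips for an architecture task.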

Offline pipeline

  1. Ingest episode + SHA-256 dedup
  2. Embed (768d via nomic-embed-text)
  3. Extract entities (schema-constrained output via Ollama response_format)
  4. Three-pass entity resolution
  5. Extract facts against pack-aware predicate registry
  6. Link facts to source episodes
  7. Resolve supersession with bounded retries on failure
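Step 1 of the offline pipeline, SHA-256 dedup, might look roughly like the following. This is a sketch under assumptions: the in-memory dictionary stands in for a database lookup, the `EpisodeStore` name is hypothetical, and whether Loom normalizes content before hashing is not stated, so the raw text is hashed as-is.

```python
import hashlib

class EpisodeStore:
    """Toy stand-in for the episode table, keyed by content hash."""

    def __init__(self):
        self._by_hash = {}

    def ingest(self, content):
        # Identical content always yields the same digest, so a repeat
        # submission is detected without comparing full texts.
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if digest in self._by_hash:
            return "duplicate"
        self._by_hash[digest] = content
        return "accepted"
```

The return values mirror the accepted and duplicate statuses that loom_learn reports.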

Telemetry sampler

In-process ring buffer. Host CPU/memory, Ollama state, per-stage p50 latency, queue counters, recent failures. Streams to the Runtime dashboard at 1 Hz over SSE. No new tables, no time-series store.
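An in-process ring buffer with a per-stage p50 can be sketched in a few lines. The class name, the capacity, and the stage names are assumptions for illustration; the real sampler lives in the Rust engine.

```python
from collections import deque
from statistics import median

class StageLatencyBuffer:
    """Fixed-size ring buffer of latency samples per pipeline stage."""

    def __init__(self, capacity=300):
        self._capacity = capacity
        self._samples = {}

    def record(self, stage, latency_ms):
        # deque(maxlen=...) silently evicts the oldest sample when full.
        buf = self._samples.setdefault(stage, deque(maxlen=self._capacity))
        buf.append(latency_ms)

    def p50(self, stage):
        """Median latency for a stage, or None if nothing was recorded."""
        samples = self._samples.get(stage)
        return median(samples) if samples else None
```

Because old samples fall out of the buffer on their own, memory stays bounded and no table or time-series store is needed.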

Three modes, one invariant

Every episode entering Loom carries an ingestion_mode that determines how the four-dimension ranker weights it. The three valid modes are the only paths in. The taxonomy enforces a single invariant: LLM-generated content cannot become canonical memory.

Mode               | How it enters                                 | Provenance coefficient
user_authored_seed | The loom-seed CLI POSTs markdown you wrote    | 0.8
vendor_import      | Bootstrap parsers POST vendor export excerpts | 0.6
live_mcp_capture   | MCP loom_learn or PostSession hook            | 1.0

A fourth mode for LLM reconstructions does not exist. The MCP server hardcodes live_mcp_capture at the entry point — clients cannot forge the mode.
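The closed taxonomy and the hardcoded entry point could be modeled as follows. This is a sketch, not Loom's code: the enum, the coefficient map, and the `handle_mcp_learn` function are hypothetical names, and only the mode strings and coefficients come from the table above.

```python
from enum import Enum

class IngestionMode(Enum):
    USER_AUTHORED_SEED = "user_authored_seed"
    VENDOR_IMPORT = "vendor_import"
    LIVE_MCP_CAPTURE = "live_mcp_capture"

PROVENANCE_COEFFICIENT = {
    IngestionMode.USER_AUTHORED_SEED: 0.8,
    IngestionMode.VENDOR_IMPORT: 0.6,
    IngestionMode.LIVE_MCP_CAPTURE: 1.0,
}

def handle_mcp_learn(payload):
    """Hypothetical MCP entry point: the mode is set server-side."""
    # Drop whatever mode the client sent, then stamp the only mode
    # this entry point is allowed to produce.
    episode = {k: v for k, v in payload.items() if k != "ingestion_mode"}
    episode["ingestion_mode"] = IngestionMode.LIVE_MCP_CAPTURE.value
    return episode
```

Because the enum has exactly three members, a value such as `"llm_reconstruction"` fails to parse at all rather than being accepted with a low weight.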

Three types, strict authority

Loom stores three memory types: episodic, semantic, and procedural. Facts are never more authoritative than their source episodes. Procedures are candidate patterns until promoted. Memory is served from two tiers: Hot (always injected, with a configurable budget per namespace) and Warm (retrieved per query). New facts always start warm. Namespace isolation is absolute.

Episodic

Raw interaction records. Immutable evidence. Strongest audit anchor. Primary for debug and compliance tasks.

Semantic

Extracted facts and entity relationships. Derived from episodes. Revisable. Primary for architecture tasks.

Procedural

Inferred behavioral patterns. Most provisional. Requires 3+ episodes across 7+ days and 0.8+ confidence to promote.

Authority: Episodes > Facts > Procedures
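The promotion rule for procedures (3+ episodes across 7+ days and 0.8+ confidence) can be expressed as a small predicate. The sketch assumes "across 7+ days" means the span between the earliest and latest supporting episode; the type and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class ProcedureCandidate:
    episode_dates: tuple   # one date per supporting episode
    confidence: float

def is_promotable(candidate, min_episodes=3, min_span_days=7, min_confidence=0.8):
    """True only when all three promotion thresholds are met."""
    if len(candidate.episode_dates) < min_episodes:
        return False
    # Assumption: the day span is measured first episode to last episode.
    span_days = (max(candidate.episode_dates) - min(candidate.episode_dates)).days
    return span_days >= min_span_days and candidate.confidence >= min_confidence
```

Requiring both a minimum count and a minimum time span keeps a burst of similar episodes in a single session from being promoted as a lasting pattern.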

Rust engine, local inference, zero cloud dependency

Engine Rust

tokio, axum 0.8, sqlx 0.8. Compile-time SQL checking. ring as the rustls crypto provider. Single static binary on debian:bookworm-slim.

Dashboard React 19 + Vite 8

TypeScript 6. 13 pages. Built into a static volume served by Caddy.

Database PostgreSQL 17

pgvector, pgAudit optional. Single system of record.

LLM inference Ollama (local)

gemma4:26b for discrete GPU, qwen2.5:14b for iGPU/APU/CPU, gemma4:e4b for tight memory. gemma4:e4b for classification. nomic-embed-text for embeddings.

Bootstrap Python

Per-client vendor export parsers for Claude Code, Claude.ai/Desktop, ChatGPT Data Controls, M365 Copilot Purview audit. GitHub Copilot stub (no export published).

Deployment Docker Compose

Caddy reverse proxy + automatic TLS. Five containers total.

Three tools, two transports

Loom exposes its three tools on two HTTP surfaces. Both require a bearer token. POST /mcp is the MCP JSON-RPC 2.0 dispatcher — what real MCP clients hit after registering Loom as an MCP server. The per-tool REST endpoints (/mcp/loom_learn, /mcp/loom_think, /mcp/loom_recall) remain mounted for direct curl, integration tests, and callers that don't want to speak the JSON-RPC wire protocol. Both surfaces share the same handler code. Clients cannot override ingestion_mode through either.

loom_think

Compiles a context package for a query. Fires automatically before complex tasks. The primary integration point.

loom_learn

Ingests a new episode. Returns accepted, duplicate, or queued status.

loom_recall

Returns raw search results without compilation. For when you want to browse, not compile.
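For callers curious what a request to POST /mcp looks like on the wire, the sketch below builds (but does not send) a JSON-RPC 2.0 `tools/call` request for loom_think. The `tools/call` method and the `name`/`arguments` shape follow the MCP specification; the host name and the `query` and `namespace` argument names are assumptions, since the tool's input schema is not given here. Real MCP clients typically perform an initialize handshake first, which is omitted.

```python
import json
import urllib.request

def build_think_request(base_url, token, query, namespace, request_id=1):
    """Build a POST /mcp request that invokes loom_think via JSON-RPC 2.0."""
    body = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "loom_think",
            # Hypothetical argument names; check the tool's real schema.
            "arguments": {"query": query, "namespace": namespace},
        },
    }
    return urllib.request.Request(
        url=f"{base_url}/mcp",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it. Note that the payload contains no ingestion_mode field, consistent with clients being unable to set it.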

Three honest gates, held in production

Extraction quality

50-episode samples held against thresholds. The Metrics page reports current rolling values.

Entity precision ≥ 0.80
Fact precision ≥ 0.75
Predicate consistency ≥ 0.85
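Checking a sample against these three thresholds is simple enough to sketch. The metric keys and the function name are hypothetical; the threshold values are the ones listed above.

```python
# Minimum acceptable value per extraction-quality metric.
THRESHOLDS = {
    "entity_precision": 0.80,
    "fact_precision": 0.75,
    "predicate_consistency": 0.85,
}

def failing_gates(measured):
    """Return the sorted names of metrics below threshold.

    A metric missing from `measured` counts as failing, so an
    incomplete sample can never pass by omission.
    """
    return sorted(
        name
        for name, minimum in THRESHOLDS.items()
        if measured.get(name, 0.0) < minimum
    )
```

An empty result means the sample holds all three gates. A value exactly at the threshold passes, matching the "≥" in the list.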

Compilation quality

A/B/C benchmark runs across 10 tasks. Compiled context (C) measured against raw retrieval (B) and no-memory baseline (A).

Precision improvement
Token reduction
Task success

Pipeline reliability

The Runtime page reports the live state. Failed episodes surface for operator triage rather than retrying forever.

Thirteen pages, one purpose

Full inspectability into the memory system. The dashboard is how you trust the compiler.

Runtime — SSE-driven live status. Host CPU/memory, Ollama model and GPU/CPU badge, per-stage p50 latency, ingestion queue counters, recent failures with bulk-requeue.
Pipeline Health — Episode counts by source and namespace, entity counts by type, pending queue depth, failed-episode count, model config.
Compilations — Paginated loom_think history with drill-down to per-candidate score breakdowns.
Entities — Search and detail view with 1–2 hop neighborhood graph, tier pills, salience bars.
Predicates — Custom predicate candidate queue with map-to-canonical and promote-to-pack actions. Pack browser with usage heatmap.
Conflicts — Entity resolution conflict queue. Merge, keep separate, split.
Metrics — Retrieval precision over time, latency percentiles, classification confidence, extraction quality, hot-tier utilization.
Benchmarks — A/B/C condition runs, per-task precision and latency, winner card.
Parser Health — Per-bootstrap-parser episode counts, parser versions, freshness pills.
Ingestion Mode Distribution — Per-namespace Mode 1/2/3 breakdown with seed-only warning list.

The thesis: a memory compiler with explicit ranking, graph traversal, and audit logging produces better context packages than simple top-k vector retrieval. The dashboard is where that thesis is proven or disproven on a continuous basis.

PostgreSQL-native · Local inference · MCP-first