Project Loom - PostgreSQL-Native Memory Compiler for AI Workflows | Technical Anxiety

The problem

Every LLM starts with amnesia

Claude Code doesn't know what you discussed in ChatGPT. Copilot doesn't know the architecture decisions you made in Claude. You re-explain the same context dozens of times per week. Simple top-k vector retrieval doesn't fix this. It returns fragments without structure, ranking, or provenance.

What Loom is

A memory layer that follows you across five surfaces

Loom watches your work across LLM tools, builds a knowledge graph from it, and compiles the right context into any AI tool at query time. It replaces "paste your context" with an always-on memory layer spanning Claude Desktop, Claude Code, ChatGPT Desktop, GitHub Copilot, and M365 Copilot. All five are first-class clients with shipped integration guides, discipline templates, and, where the vendor publishes an export format, a bootstrap parser for vendor exports.

You work normally → Loom learns → You ask any LLM → Loom compiles + injects

Architecture

Two pipelines, one database, one telemetry sampler

PostgreSQL 17 is the single system of record. No external vector store. No graph database. pgvector handles embeddings, recursive CTEs handle graph traversal, pgAudit handles compliance. Two strictly separated pipelines share the database but never share runtime. A 1 Hz telemetry sampler runs alongside both, streaming live status to the dashboard via SSE.
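As a rough illustration of how a recursive CTE can stand in for a graph database, the sketch below walks a bounded neighborhood around one entity. It uses Python's built-in sqlite3 only as a runnable stand-in for PostgreSQL (both support WITH RECURSIVE, though placeholder syntax differs); the edges table, its columns, and the sample rows are hypothetical and not Loom's actual schema.

```python
import sqlite3

def neighborhood(conn, start, max_hops):
    """Return {entity: fewest hops from start} for entities within max_hops."""
    rows = conn.execute(
        """
        WITH RECURSIVE walk(entity, hops) AS (
            SELECT ?, 0                      -- seed row: the starting entity
            UNION                            -- UNION de-duplicates, so cycles terminate
            SELECT e.dst, w.hops + 1
            FROM walk AS w
            JOIN edges AS e ON e.src = w.entity
            WHERE w.hops < ?                 -- hop bound keeps traversal finite
        )
        SELECT entity, MIN(hops) FROM walk GROUP BY entity
        """,
        (start, max_hops),
    ).fetchall()
    return dict(rows)

# Hypothetical sample graph: loom -> postgres -> pgvector -> hnsw
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?)",
    [("loom", "postgres"), ("postgres", "pgvector"), ("pgvector", "hnsw")],
)
two_hop = neighborhood(conn, "loom", 2)
```

With a bound of 2, the walk reaches postgres and pgvector but stops before hnsw, which is the shape of query a 1 to 2 hop neighborhood view would need.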

Online pipeline

Intent classification (primary + secondary class)
Namespace resolution
Parallel retrieval profiles via tokio::join!
Memory weight modifiers per task class
Rank on 4 dimensions: relevance, recency, stability, provenance
Compile context package (XML for Claude, JSON for others)
Full audit trace
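The ranking step can be pictured as a weighted combination of the four dimensions. The sketch below is one possible reading, not Loom's ranker: the Candidate type, the linear scoring form, and the weight values are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    """Hypothetical retrieval candidate with each dimension scaled to [0, 1]."""
    fact_id: str
    relevance: float
    recency: float
    stability: float
    provenance: float

# Assumed weights; the source does not publish the real values.
WEIGHTS = {"relevance": 0.4, "recency": 0.2, "stability": 0.2, "provenance": 0.2}

def score(candidate, weights=WEIGHTS):
    """Weighted sum over the four ranking dimensions."""
    return sum(w * getattr(candidate, dim) for dim, w in weights.items())

def rank(candidates, weights=WEIGHTS):
    """Order candidates from highest to lowest combined score."""
    return sorted(candidates, key=lambda c: score(c, weights), reverse=True)
```

Per-task-class memory weight modifiers could then be modelled as passing a different weights mapping for each task class.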

Offline pipeline

Ingest episode + SHA-256 dedup
Embed (768d via nomic-embed-text)
Extract entities (schema-constrained output via Ollama response_format)
Three-pass entity resolution
Extract facts against pack-aware predicate registry
Link facts to source episodes
Resolve supersession with bounded retries on failure

Embedding inputs are capped at 16K characters, and extraction output is constrained to a JSON Schema via Ollama's response_format, so neither stage can produce poison pills in routine operation. See ADR-011.
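A minimal sketch of the first two offline steps, assuming an in-memory set in place of the database: SHA-256 content hashing for dedup and a hard character cap before embedding. The EpisodeStore class is hypothetical, and treating "16K" as 16,000 characters is an assumption (it could equally mean 16,384).

```python
import hashlib

MAX_EMBED_CHARS = 16_000  # assumed reading of "16K characters"

def content_hash(text):
    """Hex SHA-256 digest of the episode text, used as the dedup key."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def bound_for_embedding(text, limit=MAX_EMBED_CHARS):
    """Truncate text so the embedding model never sees unbounded input."""
    return text[:limit]

class EpisodeStore:
    """Hypothetical stand-in for the episodes table; keeps hashes in memory."""

    def __init__(self):
        self._seen = set()

    def ingest(self, text):
        digest = content_hash(text)
        if digest in self._seen:
            return "duplicate"
        self._seen.add(digest)
        return "accepted"
```

Hashing the full text before truncation means two long episodes that differ only after the cap are still treated as distinct; whether Loom hashes before or after bounding is not stated in the source.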

Consolidation phase

Runs nightly per namespace. Two steps in sequence:

Consolidation. Queries entities with ≥ 5 stable, non-superseded facts older than 48 hours (capped at 20 entities per run). For each cluster, calls the offline LLM with a schema-constrained prompt to produce a single coherent summary paragraph with a coverage map — every claim must cite a source fact UUID. A hallucination guard cross-checks every cited UUID against the source cluster; any unrecognized UUID causes the summary to be rejected rather than stored. Accepted summaries are written to loom_summaries with full provenance: source fact UUIDs, synthesis model, prompt version, and an invalidated_at timestamp that is stamped when any source fact is superseded.
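The hallucination guard described above reduces to a set-membership check. The sketch below assumes the coverage map is a mapping from each claim to the fact UUIDs it cites; that shape, the function name, and the decision to reject an empty map are illustrative assumptions.

```python
def summary_is_acceptable(coverage_map, cluster_fact_ids):
    """Accept a summary only if every claim cites known source facts.

    coverage_map: {claim text: [cited fact UUIDs]} (assumed shape)
    cluster_fact_ids: UUIDs of the facts the summary was synthesized from
    """
    cluster = set(cluster_fact_ids)
    if not coverage_map:
        return False  # assumption: a summary with no claims is not stored
    for cited in coverage_map.values():
        if not cited:
            return False  # a claim without a citation fails the guard
        if any(fact_id not in cluster for fact_id in cited):
            return False  # an unrecognized UUID rejects the whole summary
    return True
```

Rejecting the whole summary on a single bad citation mirrors the stated behavior: the summary is dropped rather than stored with the offending claim removed.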

Pruning. Soft-deletes stale procedures (default 90-day TTL since last match), auto-resolves resolution conflicts older than 60 days, and removes summaries whose source facts were superseded more than 30 days ago. TTLs are configurable per namespace.
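The three pruning rules share one shape: compare an age against a per-kind TTL. A minimal sketch under that assumption follows; the kind names, the DEFAULT_TTLS mapping, and the is_expired helper are hypothetical, while the day counts come from the defaults stated above.

```python
from datetime import datetime, timedelta, timezone

# Defaults from the text; a per-namespace override would pass its own mapping.
DEFAULT_TTLS = {
    "procedure": timedelta(days=90),  # measured since last match
    "conflict": timedelta(days=60),   # measured since the conflict was opened
    "summary": timedelta(days=30),    # measured since source facts were superseded
}

def is_expired(kind, reference_time, now, ttls=DEFAULT_TTLS):
    """True when the item is older than the TTL configured for its kind."""
    return now - reference_time > ttls[kind]
```

The caller decides what expiry means for each kind: soft-delete for procedures, auto-resolution for conflicts, removal for summaries.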

All consolidation activity is logged to loom_consolidation_log. The Consolidation dashboard page surfaces run history, KPIs, and a manual trigger. See ADR-012.

Telemetry sampler

In-process ring buffers hold 300 data points per metric (a 5-minute window at 1 Hz). Host CPU and memory are sampled via the sysinfo crate. Ollama model state and GPU/CPU compute placement are queried from Ollama's /api/ps every 5 seconds. Per-stage p50 latency and queue counters are queried from PostgreSQL every 5 seconds, with no schema changes and no new tables. Because the browser EventSource API cannot send Authorization headers, the SSE endpoint accepts a ?token= query parameter as a fallback; that route is TLS-only and read-only. No Prometheus, no external monitoring stack. See ADR-010.
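A fixed-capacity ring buffer of this kind is straightforward to sketch. The engine itself is written in Rust, so the Python below only illustrates the data structure; the MetricRing name and its methods are hypothetical.

```python
from collections import deque

class MetricRing:
    """Fixed-size history for one metric; the oldest sample is evicted first."""

    def __init__(self, capacity=300):  # 300 points = 5 minutes at 1 Hz
        self._points = deque(maxlen=capacity)

    def push(self, value):
        """Record one sample; deque(maxlen=...) drops the oldest when full."""
        self._points.append(value)

    def snapshot(self):
        """Copy of the current window, oldest first, e.g. for a sparkline."""
        return list(self._points)
```

Because the buffer lives in process memory, its contents are lost on restart, which matches the note elsewhere on the page that sparkline history resets when the engine restarts.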

Ingestion model

Three modes, one invariant

Every episode entering Loom carries an ingestion_mode that determines how the four-dimension ranker weights it. The three valid modes are the only paths in. The taxonomy enforces a single invariant: LLM-generated content cannot become canonical memory.

| Mode | How it enters | Provenance coefficient |
| --- | --- | --- |
| user_authored_seed | The loom-seed CLI POSTs markdown you wrote | 0.8 |
| vendor_import | Bootstrap parsers POST vendor export excerpts | 0.6 |
| live_mcp_capture | MCP loom_learn or PostSession hook | 1.0 |

A fourth mode for LLM reconstructions does not exist. The MCP server hardcodes live_mcp_capture at the entry point — clients cannot forge the mode.
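One way to picture the invariant: the set of modes is closed, and the MCP entry point overwrites whatever mode a client sends. The sketch below is an illustration in Python rather than the engine's Rust; the stamp_mcp_episode function and the payload shape are hypothetical, while the mode names and coefficients come from the table above.

```python
from enum import Enum

class IngestionMode(Enum):
    USER_AUTHORED_SEED = "user_authored_seed"
    VENDOR_IMPORT = "vendor_import"
    LIVE_MCP_CAPTURE = "live_mcp_capture"

PROVENANCE_COEFFICIENT = {
    IngestionMode.USER_AUTHORED_SEED: 0.8,
    IngestionMode.VENDOR_IMPORT: 0.6,
    IngestionMode.LIVE_MCP_CAPTURE: 1.0,
}

def stamp_mcp_episode(payload):
    """Build the stored episode for an MCP call, ignoring any client-sent mode."""
    episode = {k: v for k, v in payload.items() if k != "ingestion_mode"}
    episode["ingestion_mode"] = IngestionMode.LIVE_MCP_CAPTURE.value
    return episode
```

Because the enum has no member for LLM reconstructions, any attempt to parse such a value fails at the boundary instead of reaching the ranker.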

Memory model

Three types, strict authority

Facts are never more authoritative than their source episodes. Procedures are candidate patterns until promoted. Two tiers: Hot (always injected, configurable budget per namespace) and Warm (retrieved per-query). New facts always start warm. Namespace isolation is absolute.

Episodic

Raw interaction records. Immutable evidence. Strongest audit anchor. Primary for debug and compliance tasks.

Semantic

Extracted facts and entity relationships. Derived from episodes. Revisable. Primary for architecture tasks.

Procedural

Inferred behavioral patterns. Most provisional. Requires 3+ episodes across 7+ days and 0.8+ confidence to promote.

Authority: Episodes > Facts > Procedures
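The promotion rule for procedures can be written as a small predicate. The sketch below assumes "across 7+ days" means at least seven days between the first and last supporting episode; that interpretation, the function name, and the parameter names are assumptions.

```python
from datetime import date

def eligible_for_promotion(
    episode_dates,
    confidence,
    min_episodes=3,
    min_span_days=7,
    min_confidence=0.8,
):
    """Check the three promotion thresholds for a candidate procedure."""
    if len(episode_dates) < min_episodes:
        return False
    span_days = (max(episode_dates) - min(episode_dates)).days
    return span_days >= min_span_days and confidence >= min_confidence
```

Requiring both a count and a time span guards against a pattern that appears three times in a single afternoon being promoted as a stable habit.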

Technology stack

Rust engine, local inference, zero cloud dependency

Engine: Rust

tokio, axum 0.8, sqlx 0.8. Compile-time SQL checking. ring as the rustls crypto provider. Single static binary on debian:bookworm-slim. Telemetry: sysinfo (host resource sampling), async-stream + tokio-stream (SSE feed).

Dashboard: React 19 + Vite 8

TypeScript 6. 13 pages. Built into a static volume served by Caddy.

Database: PostgreSQL 17

pgvector; pgAudit optional. Single system of record.

LLM inference: Ollama (local)

Extraction: gemma4:26b for discrete GPU, qwen2.5:14b for iGPU/APU/CPU, gemma4:e4b for tight memory. Classification: gemma4:e4b. Embeddings: nomic-embed-text.

Bootstrap: Python

Per-client vendor export parsers for Claude Code, Claude.ai/Desktop, ChatGPT Data Controls, M365 Copilot Purview audit. GitHub Copilot stub (no export published).

Deployment: Docker Compose

Caddy reverse proxy + automatic TLS. Five containers total.

MCP interface

Three tools, two transports

Loom exposes its three tools on two HTTP surfaces. Both require a bearer token. POST /mcp is the MCP JSON-RPC 2.0 dispatcher — what real MCP clients hit after registering Loom as an MCP server. The per-tool REST endpoints (/mcp/loom_learn, /mcp/loom_think, /mcp/loom_recall) remain mounted for direct curl, integration tests, and callers that don't want to speak the JSON-RPC wire protocol. Both surfaces share the same handler code. Clients cannot override ingestion_mode through either.
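For callers curious what the JSON-RPC surface looks like on the wire, the sketch below builds a request body and headers without sending anything. The tools/call method and its name/arguments parameters follow the MCP specification; the "query" argument name and the token value are hypothetical, since the source does not list each tool's parameters.

```python
import json

def tools_call_request(request_id, tool, arguments):
    """JSON-RPC 2.0 envelope for invoking one MCP tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }

def auth_headers(token):
    """Bearer-token headers, as both HTTP surfaces require a token."""
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }

# Hypothetical call: the "query" argument name is an assumption.
body = json.dumps(tools_call_request(1, "loom_think", {"query": "auth design"}))
```

The same tool name would map to the REST path /mcp/loom_think for callers that skip the JSON-RPC envelope.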

loom_think

Compiles a context package for a query. Fires automatically before complex tasks. The primary integration point.

loom_learn

Ingests a new episode. Returns accepted, duplicate, or queued status.

loom_recall

Returns raw search results without compilation. For when you want to browse, not compile.

What gets measured

Three honest gates, held in production

Extraction quality

50-episode samples held against thresholds. The Metrics page reports current rolling values.

Entity precision ≥ 0.80
Fact precision ≥ 0.75
Predicate consistency ≥ 0.85
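Checking a sample against these gates is a simple threshold comparison. In the sketch below the threshold values come from the list above; the metric key names and the failing_gates helper are hypothetical.

```python
THRESHOLDS = {
    "entity_precision": 0.80,
    "fact_precision": 0.75,
    "predicate_consistency": 0.85,
}

def precision(correct, total):
    """Fraction of extracted items judged correct; 0.0 for an empty sample."""
    return correct / total if total else 0.0

def failing_gates(metrics, thresholds=THRESHOLDS):
    """Names of gates whose measured value is below its threshold."""
    return [
        name
        for name, floor in thresholds.items()
        if metrics.get(name, 0.0) < floor
    ]
```

A missing metric is treated as a failure here, on the assumption that an unmeasured gate should not count as held.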

Compilation quality

A/B/C benchmark runs across 10 tasks. Compiled context (C) measured against raw retrieval (B) and no-memory baseline (A).

Precision improvement
Token reduction
Task success

Pipeline reliability

The Runtime page reports the live state. Failed episodes surface for operator triage rather than retrying forever.
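Bounded retries with exponential backoff, ending in a terminal failed state, might look like the following sketch. The attempt count, base delay, and doubling factor are assumptions, as is the run_with_backoff name; the source only states that attempts are bounded and that failures surface for triage.

```python
import time

def run_with_backoff(task, max_attempts=4, base_seconds=1.0, sleep=time.sleep):
    """Run task, retrying on exceptions with doubling delays.

    Returns ("ok", result) on success, or ("failed", last_error) once the
    attempt budget is spent, so the caller can surface it for triage.
    """
    last_error = None
    for attempt in range(max_attempts):
        try:
            return ("ok", task())
        except Exception as exc:  # broad by design: any failure consumes an attempt
            last_error = exc
            if attempt < max_attempts - 1:
                sleep(base_seconds * 2 ** attempt)  # 1s, 2s, 4s, ...
    return ("failed", last_error)
```

Passing sleep as a parameter lets tests substitute a recorder, so the schedule can be checked without waiting.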

Operational dashboard

Thirteen pages, one purpose

Full inspectability into the memory system. The dashboard is how you trust the compiler.

Runtime — SSE-driven live status at 1 Hz. Dense btop-inspired layout: host CPU/memory bar gauges with 5-minute sparklines, Ollama model name with GPU/CPU compute badge, per-stage p50 pipeline latency (classify → retrieve → rank → compile) with latency sparkline, ingestion queue counters (active / pending / failed), recent extraction failure tail with episode ID, source, and error. Ring buffers hold 300 data points per metric. Sparkline history resets on engine restart — acceptable for single-operator local infrastructure.

Pipeline Health — Episode counts by source and namespace, entity counts by type, pending queue depth, failed-episode count, model config.

Compilations — Paginated loom_think history with drill-down to per-candidate score breakdowns.

Entities — Search and detail view with 1–2 hop neighborhood graph, tier pills, salience bars.

Predicates — Custom predicate candidate queue with map-to-canonical and promote-to-pack actions. Pack browser with usage heatmap.

Conflicts — Entity resolution conflict queue. Merge, keep separate, split.

Metrics — Retrieval precision over time, latency percentiles, classification confidence, extraction quality, hot-tier utilization.

Benchmarks — A/B/C condition runs, per-task precision and latency, winner card.

Parser Health — Per-bootstrap-parser episode counts, parser versions, freshness pills.

Ingestion Mode Distribution — Per-namespace Mode 1/2/3 breakdown with seed-only warning list.

Consolidation — Active and invalidated summary counts, nightly run history (type, status, duration, per-phase counters), and a "Run now" button that triggers an immediate consolidation + pruning cycle for the selected namespace.

Architecture decision records

Non-obvious design choices are documented as ADRs with zero-padded three-digit numbering under docs/adr/ in the repository.

| ADR | Decision |
| --- | --- |
| ADR-001 | PostgreSQL as single system of record: no external vector store, no graph database |
| ADR-004 | Three-mode ingestion taxonomy: user_authored_seed, vendor_import, live_mcp_capture. llm_reconstruction does not exist |
| ADR-007 | Episode retry backoff: bounded attempts with exponential backoff; permanently failed episodes surface for operator triage rather than retrying forever |
| ADR-008 | MCP JSON-RPC 2.0 dispatcher at POST /mcp alongside per-tool REST endpoints |
| ADR-009 | Extraction model selection by hardware tier: qwen2.5:14b for iGPU/APU, gemma4:26b for discrete GPU, gemma4:e4b for tight memory |
| ADR-010 | Streaming telemetry via SSE + in-process ring buffers: SSE over WebSockets; ring buffers over persisted time-series; sysinfo over NVML; Ollama /api/ps over direct GPU access |
| ADR-011 | Bounded inputs + constrained outputs: 16K character embed cap; response_format: json_schema on all extraction calls |
| ADR-012 | Memory consolidation + active forgetting: nightly synthesis of stable fact clusters into provenance-traced summaries; TTL-based pruning of stale procedures, conflicts, and invalidated summaries |

The ADR log is where rejected alternatives live. The choices that were not made are as important as the ones that were.

The thesis: a memory compiler with explicit ranking, graph traversal, and audit logging produces better context packages than simple top-k vector retrieval. The dashboard is where that thesis is proven or disproven on a continuous basis.

Read more in Weaving Memory, the practitioner journal of building, using, and learning from Loom.

PostgreSQL-native · Local inference · MCP-first