Honcho's dream cycle: how an AI memory system teaches itself
Dan Billings — 2026-06-01
After a few weeks of using Hermes Agent with Honcho memory, something unexpected happened. The agent started knowing things I hadn't told it in that session. Not from context — the conversation was new. Not from retrieval of past transcripts — Honcho doesn't store those verbatim. From something closer to what you'd call accumulated understanding.
This post is about the piece of Honcho that makes that possible: the dream cycle. The previous post covered how to get Honcho running and the deriver basics. This one goes into what the deriver is actually doing and what happens after it's been running long enough to have something to work with.
What the deriver does
Before dreaming there's the deriver — a background worker that runs continuously while Honcho is up. Every time a new message comes in, the deriver eventually processes it. It sends batches of messages to the LLM and asks: what can we extract from this? What does it tell us about the user?
The output is observations — structured facts stored in Postgres with a pgvector embedding attached. "Dan prefers concise responses." "Dan is working on a home ML cluster." "Dan's RTX 4090 is running Qwen3.6-27B at --parallel 2 --ctx-size 131072."
There's an important tuning detail here. The deriver processes messages in token-capped batches (DERIVER_REPRESENTATION_BATCH_MAX_TOKENS=8192 in our setup). With DERIVER_FLUSH_ENABLED=true — the upstream default — it would send everything at once: 100+ messages in a single LLM call that overflows context and returns empty JSON. The fix is DERIVER_FLUSH_ENABLED=false, which lets the batch cap do its job. With a 64k-per-slot context window and 8k-token batches, the deriver extracts 5-10 observations per batch and works through the queue reliably. Small detail, complete difference in behavior.
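To make the batch cap concrete, here is a minimal sketch of token-capped batching. It is not Honcho's deriver code: the function name batch_by_tokens and the whitespace word count standing in for a real tokenizer are assumptions made for illustration. Only the 8192 default comes from the setup described above.

```python
from typing import Callable, Iterable, Iterator, List


def batch_by_tokens(
    messages: Iterable[str],
    max_tokens: int = 8192,
    # Assumption: a whitespace word count stands in for a real tokenizer.
    count_tokens: Callable[[str], int] = lambda text: len(text.split()),
) -> Iterator[List[str]]:
    """Yield consecutive batches whose combined token count stays within max_tokens."""
    batch: List[str] = []
    used = 0
    for message in messages:
        cost = count_tokens(message)
        # Close the current batch if adding this message would exceed the cap.
        if batch and used + cost > max_tokens:
            yield batch
            batch, used = [], 0
        # A single oversized message still goes out alone rather than being dropped.
        batch.append(message)
        used += cost
    if batch:
        yield batch
```

With a cap like this, a backlog of 100+ messages turns into a series of small LLM calls instead of one call that overflows the context window.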
The dream cycle
The dream cycle is different from the deriver. The deriver extracts — it takes messages and converts them into observations. The dream cycle reasons — it takes the accumulated observations and asks deeper questions.
It fires only when two conditions are both met: at least 50 new explicit observations have accumulated since the last dream, and at least 8 hours have passed since it. The 50-observation threshold means the first dream cycle doesn't fire for weeks on a new install, because the reasoning needs enough material to work with. The 8-hour minimum interval keeps it from running so often that it interferes with the deriver.
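The trigger rule is simple enough to write down. This is a sketch of the logic as described here, not Honcho's scheduler. The name should_dream and the constants are hypothetical, and treating "never dreamed before" as satisfying the time condition is my assumption.

```python
from datetime import datetime, timedelta
from typing import Optional

MIN_NEW_OBSERVATIONS = 50           # threshold described in this post
MIN_INTERVAL = timedelta(hours=8)   # minimum gap between dreams


def should_dream(
    new_observations: int,
    last_dream_at: Optional[datetime],
    now: datetime,
) -> bool:
    """Return True only when both the count and the time condition hold."""
    enough_material = new_observations >= MIN_NEW_OBSERVATIONS
    # Assumption: with no previous dream, the time condition counts as met.
    enough_time = last_dream_at is None or now - last_dream_at >= MIN_INTERVAL
    return enough_material and enough_time
```

Because the two checks are combined with a logical AND, a burst of 200 observations in one evening still waits out the interval, and a quiet week never triggers a dream at all.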
When it fires, it runs two specialist agents: deduction and induction.
Deduction: the detective
The deduction agent is a tool-using LLM with a specific set of tools:
- get_recent_observations: what's been learned recently
- search_memory: semantic search over stored documents
- search_messages: raw conversation content
- create_observations_deductive: store a new deductive conclusion
- delete_observations: prune stale or contradicted facts
Its job is logical implication. From separate observations like "Dan runs Arch Linux on danarch," "Dan runs Ubuntu on danwin (WSL2)," and "Dan runs macOS on dans-mac-mini," it can derive: "Dan operates a heterogeneous home ML cluster." None of those individual observations say that; the deduction agent reads across them.
It also handles knowledge updates. When a session produces "Qwen3.6 running on 4090 at --parallel 4 --ctx-size 65536" and a later session produces "updated to --parallel 2 --ctx-size 131072," those two observations contradict each other about the current state. Deduction creates an updated observation with the new values and deletes the stale one. The memory doesn't accumulate obsolete facts; it stays current.
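The net effect of a knowledge update can be illustrated with a toy model. The real deduction agent is an LLM that decides what contradicts what and then calls its tools; the sketch below replaces that judgment with a hypothetical subject key and session number, purely to show the "newest wins, stale one is deleted" outcome. The Observation fields and the reconcile function are my own invention.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Observation:
    id: int
    subject: str   # hypothetical key for "what this fact is about"
    value: str
    session: int   # higher means later


def reconcile(observations: List[Observation]) -> Tuple[List[Observation], List[int]]:
    """Keep the latest observation per subject; return (kept, deleted_ids)."""
    latest: Dict[str, Observation] = {}
    for obs in observations:
        current = latest.get(obs.subject)
        if current is None or obs.session > current.session:
            latest[obs.subject] = obs
    kept = sorted(latest.values(), key=lambda o: o.id)
    kept_ids = {o.id for o in kept}
    deleted = [o.id for o in observations if o.id not in kept_ids]
    return kept, deleted


observations = [
    Observation(1, "qwen3.6 flags on 4090", "--parallel 4 --ctx-size 65536", session=1),
    Observation(2, "danarch OS", "Arch Linux", session=1),
    Observation(3, "qwen3.6 flags on 4090", "--parallel 2 --ctx-size 131072", session=2),
]
kept, deleted = reconcile(observations)
```

Here the session-1 flags end up in the deleted list while the unrelated danarch fact is untouched, which is the shape of cleanup the deduction pass is described as doing.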
Contradictions that genuinely can't be reconciled get flagged rather than silently resolved. The deduction agent notes that both facts exist and that they conflict; at dialectic time, the synthesis LLM decides how to handle the conflict. If the contradiction is persistent and wrong, there's a manual escape hatch via the Honcho API.
Induction: the pattern recognizer
The induction agent doesn't look for logical implications. It looks for regularities across observations — generalizations that aren't stated in any individual observation.
The tool set is similar, but the output is different: inductive observations, which are conclusions about patterns rather than facts. These are the ones that make the agent feel like it understands you rather than just knowing things about you.
Examples of what induction over real observations might produce:
- Multiple observations about Iron types, 2-value enums, no booleans, assumes at refinement boundaries → induction: "Dan has a strong preference for compile-time correctness over runtime validation. He uses the type system as the primary constraint mechanism."
- Multiple observations about correcting agents mid-task, explicit preferences for being told when the approach is wrong → induction: "Dan prefers early course correction over completed work that requires fundamental revision. He wants to be a co-pilot, not a consumer."
- Multiple observations about specific hardware, models, flags, VRAM accounting → induction: "Dan treats the inference stack as a system he reasons about from first principles, not a product he configures by trial and error."
None of those generalizations appear verbatim in any individual observation. They're inferred from the pattern. And they're the context that makes Hermes feel different from a stateless agent: not just knowing that you use Iron types, but understanding why you use Iron types.
Surprisal sampling: what the dream focuses on
There's an optional pre-filter called surprisal sampling that's currently disabled in our setup (we're still accumulating toward the ~200-observation threshold where it becomes useful). It's worth understanding because it changes what the dream cycle pays attention to.
Without surprisal sampling, the dream specialists would see everything. The problem: if 80% of observations are about infrastructure, the dream cycle keeps generating more infrastructure conclusions. Not because those are the most useful, but because that's what dominates the frequency distribution.
Surprisal sampling builds a kd-tree over all observation embeddings in pgvector. For each observation, it computes geometric surprisal: how far is this observation from its k nearest neighbors? Observations that cluster tightly score low surprisal — they're well-covered, adding another one about them doesn't help. Observations that are isolated in embedding space score high surprisal — they were mentioned once, they touch a topic nothing else touches, they're the underexplored edges of what the system knows.
The top 10% most surprising observations get passed as hints to the specialists. The dream focuses on the sparse regions of the embedding space rather than the dense center. Without it, you get more infrastructure generalizations. With it, you get the single observation about how you handle VRAM contention decisions, which leads to an induction about how you reason about hardware constraints.
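A rough sketch of the scoring step, under stated assumptions: I don't know the exact formula Honcho uses, so this takes "surprisal" to be the mean Euclidean distance to the k nearest neighbors, and it uses a brute-force neighbor search instead of a kd-tree to stay dependency-free. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def knn_surprisal(embeddings: Sequence[Sequence[float]], k: int = 3) -> List[float]:
    """Mean Euclidean distance from each point to its k nearest neighbors."""
    scores = []
    for i, point in enumerate(embeddings):
        # Brute-force stand-in for a kd-tree query: distances to every other point.
        distances = sorted(
            math.dist(point, other)
            for j, other in enumerate(embeddings)
            if j != i
        )
        nearest = distances[:k]
        scores.append(sum(nearest) / len(nearest) if nearest else 0.0)
    return scores


def most_surprising(
    embeddings: Sequence[Sequence[float]], fraction: float = 0.10, k: int = 3
) -> List[int]:
    """Indices of the top `fraction` of points by surprisal, most surprising first."""
    scores = knn_surprisal(embeddings, k)
    if not scores:
        return []
    count = max(1, math.ceil(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:count]
```

Tightly clustered points get small scores because their neighbors are close. A lone point far from everything else gets a large score and lands in the top slice, which is the behavior the hints rely on.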
The peer card
Alongside observations, the deduction agent maintains what Honcho calls the peer card — a separate store for stable identity markers. Not episodic facts or behavioral patterns; durable identity.
The format uses explicit prefixes:
- IDENTITY: name, kind
- ATTRIBUTE: stable properties (location, role)
- RELATIONSHIP: connections to things (machines, projects, teams)
- INSTRUCTION: explicit preferences the user has stated
Behavioral tendencies go in observations, not the card. The card is what persists when observations are pruned, updated, or contradicted. It's the stable core that survives the dream cycle's cleanup operations.
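To show how a prefixed format like this can be consumed, here is a small parser sketch. The four prefixes are the ones listed above. Everything else is an assumption: that entries are one per line as "PREFIX: text", and the name parse_peer_card. Honcho's actual storage format may differ.

```python
from typing import Dict, Iterable, List

PREFIXES = ("IDENTITY", "ATTRIBUTE", "RELATIONSHIP", "INSTRUCTION")


def parse_peer_card(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group 'PREFIX: text' lines by prefix, ignoring anything unrecognized."""
    card: Dict[str, List[str]] = {prefix: [] for prefix in PREFIXES}
    for line in lines:
        # Split at the first colon only, so the entry text may contain colons.
        prefix, sep, body = line.partition(":")
        prefix = prefix.strip().upper()
        if sep and prefix in card and body.strip():
            card[prefix].append(body.strip())
    return card
```

Lines without a known prefix are skipped, which matches the rule that behavioral tendencies belong in observations rather than on the card.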
The prewarm problem
The lag before the first dream (50 observations, minimum) creates an obvious problem: what does the dialectic return at session start before any dreaming has happened?
Honcho's answer is the dialectic prewarm. At session startup, before the first turn, Hermes issues a query: "Summarize what you know about this user. Focus on preferences, current projects, and working style." This hits the existing observations synchronously — whatever the deriver has extracted so far — and primes the session context.
For the first few sessions with a new install, the prewarm is thin. By the tenth session, the deriver has extracted a couple dozen observations and the prewarm is useful. After 50 observations and a dream cycle, the prewarm pulls from both raw observations and the deduced/induced conclusions, which is substantially better.
The lag matters for deep inductive generalizations. It doesn't matter much for basic retrieval. The deriver can extract "Dan runs a home ML cluster" from session one; the induction agent can't conclude "Dan treats infrastructure decisions as hardware optimization problems" until there's enough material to see the pattern.
What it looks like when it's working
This is early — we're a few weeks in and the first dream cycle hasn't fired yet on this install (the 50-observation threshold takes time to hit). But the deriver has been extracting observations, and the dialectic prewarm already does useful things:
The agent knows which model is running where without being told. It knows the tool preferences — Iron types, no booleans, Free Monad for DSL composition — well enough to apply them without reminders. It knows the machine topology: danarch on RTX 3070 running Honcho and Qwen3-8B, danwin on RTX 4090 for Qwen3.6-27B, dans-mac-mini running Hermes.
After a dream cycle fires, the expectation is that these individual facts get synthesized into generalizations. Less "Dan uses Iron types" and more "Dan's design philosophy prioritizes correctness proofs at type level." That's the difference between memory and understanding.
The playbook
The Honcho setup is managed via Ansible in this repo. The danarch playbook installs pgvector from source, sets up the Postgres database, clones the Honcho repo, writes the .env configuration, pulls the nomic-embed model, and installs the three systemd units (API server, deriver worker, embedding server). The deriver batch config — DERIVER_FLUSH_ENABLED=false, DERIVER_REPRESENTATION_BATCH_MAX_TOKENS=8192 — lives in the playbook defaults.
The deriver is currently pointed at danwin's Qwen3.6-27B for observation extraction. When the 5090 arrives and the 4090 gets dedicated to background work, the deriver URL changes to the 4090 endpoint, and extraction should get faster because that GPU will no longer be shared with interactive work.
The dream cycle hasn't fired yet on this install. I'll update with concrete examples — what deduction produced, what the inductive generalizations look like — once the threshold is hit.