Move a conversation from one model to another and something subtle happens: some of the meaning leaks away. Names get dropped, intent gets flattened, the "why" behind a decision gets lost. Most tools never measure this — they just hand over a transcript and hope.
Context Fidelity™ is our benchmark for exactly that: how much meaning survives when context moves from one model to the next.
The handoff problem
Picture a chain — Claude → GPT → Gemini. Each step is an opportunity to lose something. The question we care about is simple to state and hard to answer well: after the handoff, does the next model still understand what mattered?
True intelligence isn't about volume. It's about how much understanding you can carry forward.
How we think about the score
Rather than measuring raw tokens preserved, we focus on preserved meaning: entities, decisions, relationships and intent. The .aal format is what makes a high score possible — it carries the structure around a memory, not just its words, so the receiving model inherits context instead of guessing at it.
The figures we show on the site (such as a 98% target) are illustrative goals while the benchmark matures. We'll publish methodology and live results here as they're ready.
The goal is a number you can trust — and a system where switching models never means starting over.