Replies: 1 comment
This confirms something I've been seeing on the prompt side too. The same degradation happens when instructions, context, and constraints are all mixed into a flat prompt string: the model treats everything as equally weighted, picks up noise from earlier parts, and loses track of what actually matters.

The fix mirrors what you found: separate the context from the instructions structurally. When the role, objective, constraints, and input context each live in their own labeled block, the model anchors to each one independently rather than averaging across a noisy blob.

"Context selection dominates context length" is a clean framing. The same principle applies to prompt architecture: structure dominates length.

I built flompt (https://flompt.dev) for this exact problem: a visual prompt builder that decomposes prompts into 12 typed blocks and compiles to structured XML. Open source: github.com/Nyrok/flompt. If it's useful, a star would mean a lot. Solo open-source project.
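The "typed blocks compiled to structured XML" idea can be sketched in a few lines. This is a minimal illustration, not flompt's actual schema or compiler: the block names ("role", "objective", etc.) and the `compile_prompt` helper are hypothetical.

```python
# Hypothetical sketch: compile typed prompt blocks into labeled XML sections
# so each section is structurally separated. Not flompt's real implementation.
from xml.sax.saxutils import escape

def compile_prompt(blocks: dict[str, str]) -> str:
    """Render each typed block as its own XML element, escaping the body
    so user-supplied context can't break the surrounding structure."""
    return "\n".join(f"<{name}>{escape(text)}</{name}>"
                     for name, text in blocks.items())

prompt = compile_prompt({
    "role": "You are a code reviewer.",
    "objective": "Find correctness bugs in the diff.",
    "constraints": "Ignore style issues.",
    "context": "<diff contents here>",
})
print(prompt)
```

The escaping step matters: input context pasted into a flat string can contain markup-like noise, but once each block is its own element, that noise stays contained inside its section.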
Following up on the design paper posted here earlier: we ran an empirical test of its central claim.
Setup: A 78-turn conversation (~6,400 words) with natural noise accumulation (topic shifts, corrections, abandoned approaches, off-topic tangents) fed to llama3.1 8B via Ollama. 10 fact-retrieval questions, tested under two conditions: full flat-log context vs curated thread-only context.
Result: 43.3% accuracy (full context) vs 100% accuracy (curated context).
The model hallucinated facts, denied information existed in the conversation, and picked up contradicted details from noise. Exactly the failure modes predicted by the paper.
Full results, methodology, and reproducible code: github.com/MikeyBeez/fuzzyOS/discussions/2
The takeaway for kernel-level orchestration: a well-curated short context dramatically outperforms a noisy long context, even when the long context is well within the model's window. Context selection dominates context length.
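The two conditions above can be sketched as a simple context-construction step. This is an illustrative assumption, not the harness from the fuzzyOS repo: it assumes each conversation turn is tagged with the thread it belongs to, and the field names ("thread", "text") are hypothetical.

```python
# Sketch of the two experimental conditions, assuming turns are pre-tagged
# by thread. The real methodology and code live in the linked discussion.
def full_context(turns: list[dict]) -> str:
    """Condition 1: the entire flat log, noise and tangents included."""
    return "\n".join(t["text"] for t in turns)

def curated_context(turns: list[dict], thread: str) -> str:
    """Condition 2: only the turns belonging to the relevant thread."""
    return "\n".join(t["text"] for t in turns if t["thread"] == thread)

turns = [
    {"thread": "budget", "text": "The Q3 budget is $40k."},
    {"thread": "tangent", "text": "Unrelated aside about lunch."},
    {"thread": "budget", "text": "Correction: the budget is $45k."},
]
print(curated_context(turns, "budget"))
```

In practice the hard part is the tagging itself (deciding which turns belong to the active thread); once that selection exists, the curated prompt is just the filtered join, and it carries the correction without the contradicted noise around it.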