Replies: 1 comment
This confirms something I've been seeing on the prompt side too. The same degradation happens when instructions, context, and constraints are all mixed into a flat prompt string: the model treats everything as equally weighted, picks up noise from earlier parts, and loses track of what actually matters.

The fix mirrors what you found: separate the context from the instructions structurally. When the role, objective, constraints, and input context each live in their own labeled block, the model anchors to each one independently rather than averaging across a noisy blob.

"Context selection dominates context length" is a clean framing. The same principle applies to prompt architecture: structure dominates length.

I built flompt (https://flompt.dev) for this exact problem: a visual prompt builder that decomposes prompts into 12 typed blocks and compiles to structured XML. Open source: github.com/Nyrok/flompt. If it's useful, a star would mean a lot. Solo open-source project.
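The "typed blocks compiled to structured XML" idea can be sketched in a few lines. This is a minimal illustration, not flompt's actual schema or compiler: the block names ("role", "objective", etc.) and the `compile_prompt` helper are hypothetical.

```python
# Hypothetical sketch: compile typed prompt blocks into labeled XML sections
# so each section is structurally separated. Not flompt's real implementation.
from xml.sax.saxutils import escape

def compile_prompt(blocks: dict[str, str]) -> str:
    """Render each typed block as its own XML element, escaping the body
    so user-supplied context can't break the surrounding structure."""
    return "\n".join(f"<{name}>{escape(text)}</{name}>"
                     for name, text in blocks.items())

prompt = compile_prompt({
    "role": "You are a code reviewer.",
    "objective": "Find correctness bugs in the diff.",
    "constraints": "Ignore style issues.",
    "context": "<diff contents here>",
})
print(prompt)
```

The escaping step matters: input context pasted into a flat string can contain markup-like noise, but once each block is its own element, that noise stays contained inside its section.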
Following up on the design paper posted here earlier: we ran an empirical test of its central claim.
Setup: A 78-turn conversation (~6,400 words) with natural noise accumulation (topic shifts, corrections, abandoned approaches, off-topic tangents) fed to llama3.1 8B via Ollama. 10 fact-retrieval questions, tested under two conditions: full flat-log context vs curated thread-only context.
Result: 43.3% accuracy (full context) vs 100% accuracy (curated context).
The model hallucinated facts, denied information existed in the conversation, and picked up contradicted details from noise. Exactly the failure modes predicted by the paper.
Full results, methodology, and reproducible code: github.com/MikeyBeez/fuzzyOS/discussions/2
The takeaway for kernel-level orchestration: a well-curated short context dramatically outperforms a noisy long context, even when the long context is well within the model's window. Context selection dominates context length.
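The two conditions above can be sketched as a simple context-construction step. This is an illustrative assumption, not the harness from the fuzzyOS repo: it assumes each conversation turn is tagged with the thread it belongs to, and the field names ("thread", "text") are hypothetical.

```python
# Sketch of the two experimental conditions, assuming turns are pre-tagged
# by thread. The real methodology and code live in the linked discussion.
def full_context(turns: list[dict]) -> str:
    """Condition 1: the entire flat log, noise and tangents included."""
    return "\n".join(t["text"] for t in turns)

def curated_context(turns: list[dict], thread: str) -> str:
    """Condition 2: only the turns belonging to the relevant thread."""
    return "\n".join(t["text"] for t in turns if t["thread"] == thread)

turns = [
    {"thread": "budget", "text": "The Q3 budget is $40k."},
    {"thread": "tangent", "text": "Unrelated aside about lunch."},
    {"thread": "budget", "text": "Correction: the budget is $45k."},
]
print(curated_context(turns, "budget"))
```

In practice the hard part is the tagging itself (deciding which turns belong to the active thread); once that selection exists, the curated prompt is just the filtered join, and it carries the correction without the contradicted noise around it.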