Documentation index for the gossip-rs workspace. This guide covers the distributed
coordination layer, scanner engine, scheduler, and supporting infrastructure.
| Document | Focus | Key Concepts |
|----------|-------|--------------|
| architecture-overview.md | C4-style component diagram | CLI, Engine, Pipeline, Memory, Data Structures |
| architecture.md | Data flow | Walker → Reader → Scanner → Output, transform worklist |
| data-types.md | Class diagrams | Key type relationships across crates |
| pipeline-flow.md | Pipeline execution flow | Discovery, executor model, backpressure |
| pipeline-state-machine.md | State transitions & termination | Executor termination, scan_local states, worker tasks |
| git-scanning.md | End-to-end Git scanning pipeline | Pipeline stages, persistence contract, ODB-blob mode |
| git-pack-execution.md | Git packfile internals | Pack parsing, delta resolution, blob introduction, caching |
| git-object-store.md | Git object storage layer | OID indexing, pack/loose unification, delta resolution |
| commit-walking.md | Commit graph traversal | Two-frontier walk, generation ordering, topo sort |
| runner-orchestration.md | Scan runner lifecycle | Engine adapter, scheduling, finalization |
| tree-diffing.md | Tree diff algorithm | Merge-walk, canonical ordering, caching, streaming |
| spill-and-memory.md | Spill & memory management | External sort, arenas, blob introduction, memory budgets |
| pack-internals.md | Pack file low-level internals | Index lookup, delta chains, inflation, caching, planning |
2. Coordination & Distributed Runtime
| Document | Module | Description |
|----------|--------|-------------|
| detection-engine.md | crates/scanner-engine/ | Multi-stage pattern matching: anchor scan, window building, regex confirmation |
| detection-rules.md | crates/scanner-engine/src/rules/ | Rule anatomy, anchor strategy, two-phase examples |
| engine-vectorscan-prefilter.md | crates/scanner-engine/src/engine/vectorscan_prefilter.rs | Database compilation, pattern types, callback mechanism |
| engine-window-validation.md | crates/scanner-engine/src/engine/window_validate.rs | Gate checks, regex execution, entropy checking |
| Document | Module | Description |
|----------|--------|-------------|
| transform-chain.md | crates/scanner-engine/src/engine/transform.rs | Recursive URL/Base64 decode flow, TimingWheel scheduling |
| engine-transforms.md | crates/scanner-engine/src/engine/transform.rs | URL/Base64 span detection, streaming decode, budget enforcement |
| engine-stream-decode.md | crates/scanner-engine/src/engine/stream_decode.rs | Streaming decode, ring buffer, timing wheel integration |
| engine-decode-state.md | crates/scanner-engine/src/engine/decode_state.rs | Decode step arena, provenance tracking, parent-linked chains |
Engine Internals & Policy
| Document | Module | Description |
|----------|--------|-------------|
| engine-api-types.md | crates/scanner-engine/src/api.rs | Public API types: RuleSpec, FindingRec, Tuning, gates, transforms |
| engine-offline-validation.md | crates/scanner-engine/src/engine/offline_validate.rs | Offline structural validators: CRC32, AWS, GitHub PAT, JWT, Slack |
| regex-to-anchor-extraction.md | crates/scanner-engine/src/regex2anchor.rs | Regex AST → literal anchor extraction for Vectorscan prefiltering |
| engine-internals.md | crates/scanner-engine/src/engine/{scratch,hit_pool,...} | ScanScratch layout, HitAccPool, VsDbCache, compiled rule repr |
| content-policy-and-caching.md | crates/scanner-engine/src/{content_policy,b64_yara,...} | Content type detection, YARA base64 gate, set-associative cache |
| Document | Module | Description |
|----------|--------|-------------|
| scheduler-task-graph.md | crates/scanner-scheduler/src/scheduler/task_graph.rs | Object lifecycle FSM (enumerate → fetch → scan → done) |
| scheduler-engine-abstraction.md | crates/scanner-scheduler/src/scheduler/engine_trait.rs | ScanEngine/EngineScratch/FindingRecord traits |
| scheduler-engine-impl.md | crates/scanner-scheduler/src/scheduler/engine_impl.rs | Real engine adapter, lazy reset, zero-copy extraction |
| Document | Module | Description |
|----------|--------|-------------|
| scheduler-remote-backend.md | crates/scanner-scheduler/src/scheduler/remote.rs | HTTP/object-store backend, retry policies |
| scheduler-local-fs-uring.md | crates/scanner-scheduler/src/scheduler/local_fs_uring.rs | Linux io_uring async I/O, SQE/CQE management |
| scheduler-ts-buffer-pool.md | crates/scanner-scheduler/src/scheduler/ts_buffer_pool.rs | Thread-safe buffer recycling, work-conserving stealing |
| scheduler-device-slots.md | crates/scanner-scheduler/src/scheduler/device_slots.rs | Per-device I/O concurrency limits, backpressure |
| scheduler-global-resource-pool.md | crates/scanner-scheduler/src/scheduler/global_resource_pool.rs | Centralized permits, SLAs, memory management |
| scheduler-executor.md | crates/scanner-scheduler/src/scheduler/executor.rs | Work-stealing CPU executor, task lifecycle, shutdown |
| archive-scanning.md | crates/scanner-scheduler/src/archive/ | Archive parsing (tar/gzip/bzip2/zip), budget enforcement |
6. Memory Management & Formal Verification
| Document | Focus | Key Concepts |
|----------|-------|--------------|
| memory-management.md | Buffer lifecycle & pools | BufferPool, RAII, 8MiB fixed buffers, DecodeSlab, ScanScratch |
| kani-verification.md | Bounded model checking | 80 Kani proofs across 4 crates, Miri, Loom, ASAN |
| Document | Focus | Key Concepts |
|----------|-------|--------------|
| simulation-harness.md | Deterministic simulation | FoundationDB-style, VOPR-inspired, fault injection |
| coordination-testing.md | Coordination test tiers | Isolation, invariant interaction, workflow, randomized |
| counterexample-testing-unification.md | Counterexample-driven testing | Unified approach across subsystems |
| scanner_harness_modes.md | Scanner test modes | Mode 1 (synthetic stress) vs Mode 2 (real ruleset) |
| scanner_test_harness_guide.md | Scanner simulation harness | Corpus replay, random stress, deterministic oracles |
| scheduler_test_harness_guide.md | Scheduler simulation harness | Work-stealing policy checks, deterministic replay |
| git_simulation_harness_guide.md | Git simulation harness | Stage model, fault injection, corpus replay |
| simulation-framework.md | Scanner simulation framework | SimClock, fault injection, mutation testing, minimization |
| scanner-engine-integration-tests.md | Integration test crate | Test binaries, corpora, feature gates, runner instructions |
| Document | Focus |
|----------|-------|
| eval-harness.md | Precision/recall measurement against labeled corpora, regression gating |
8. Consolidation & Parity
9. Shared Infrastructure & Runtime
| Document | Crate | Focus |
|----------|-------|-------|
| gossip-scanner-runtime.md | crates/gossip-scanner-runtime/ | Runtime orchestration: CLI, distributed mode, output sinks |
| source-families.md | workspace boundary guide | Source-family model: ordered content, Git discovery, mirroring, execution |
| gossip-worker.md | crates/gossip-worker/ | Distributed worker binary: CLI, scan dispatch, tracing |
| scanner-rs-cli.md | crates/scanner-rs-cli/ | Standalone CLI binary: argument parsing, output formats |
| shard-algebra.md | crates/gossip-frontier/ | Shard algebra: key encoding, range arithmetic, hint framing |
| gossip-stdx.md | crates/gossip-stdx/ | Shared data structures: ByteSlab, InlineVec, RingBuffer, TimingWheel, etc. |
| gossip-persistence-inmemory.md | crates/gossip-persistence-inmemory/ | In-memory persistence reference backend: done-ledger, findings sink, fault injection |
Reports from benchmark and analysis sessions are stored in findings/.
Chart assets: assets/charts/ (scan-time, cold-warm-ratio, memory-rss, throughput SVGs).
- Vectorscan - Pattern matching library (Hyperscan fork)
- Kani - Rust verification tool
- Criterion - Benchmarking framework