perf(db): add compound indexes to fix query regression from TS migration#632
perf(db): add compound indexes to fix query regression from TS migration#632carlos-alm wants to merge 3 commits intomainfrom
Conversation
…tion Add migration v16 with idx_edges_kind_target and idx_edges_kind_source indexes. The fan-in/fan-out GROUP BY subqueries used by fnDeps, roles, and triage need kind-first indexes to avoid full-table scans on the edges table. Without them, a 32% growth in graph density (from the TypeScript migration) caused a disproportionate 136% query slowdown. Also optimize classifyNodeRoles (both full and incremental paths) to skip parameter and property nodes from the expensive fan-in/fan-out JOINs. These ~4,100 nodes are always classified as dead-leaf by definition and can never have callers or callees. Measured locally: fan-in subquery ~8ms → 0.43ms, roles query 7ms → 5.3ms.
|
Claude finished @carlos-alm's task —— View job Reviewing PR: Performance indexes for query regression fixTodo List:
Phase 1: Root Cause Analysis ✓Problem: TypeScript migration caused 32% graph growth → 136% fnDeps slowdown due to missing compound indexes Phase 2: Configuration Change Detection ✓HIGH-RISK CHANGES: ✅ NONE - No config/linting/build changes Phase 3: Test Integrity ✓RED FLAGS: ✅ NONE - No test modifications, author reports all 2,128 tests pass Phase 4: Code Quality Review ✓MIGRATION v16: Proper compound index ordering, uses Phase 5: Backlog Compliance Assessment ✓Root Cause AnalysisProblem Being Solved: TypeScript migration regression - 32% graph growth caused disproportionate 136% query slowdown due to missing compound indexes for fan-in/fan-out subqueries. Backlog Compliance
Final Recommendation
Excellent work - this is exactly how performance regressions should be addressed: measure the bottleneck, fix the underlying issue with proper database design, and add algorithmic optimizations based on domain knowledge. |
Greptile SummaryThis PR addresses a 136% Confidence Score: 5/5Safe to merge — performance fix is correct and well-tested with no logic regressions. The compound indexes are correctly ordered for the existing query patterns and IF NOT EXISTS makes the migration idempotent. The classifier optimization is logically sound: parameter/property nodes are guaranteed leaf nodes, idsByRole merges pre-classified and classifier-determined dead-leaf ids without duplication, and the transaction-scoped NULL reset covers all affected node kinds. The prior reviewer concern (unused columns in leafRows) was already addressed. The only remaining finding is a cosmetic corrupted comment character — a non-blocking P2 that does not affect correctness or performance. No files require special attention. Important Files Changed
Sequence DiagramsequenceDiagram
participant C as classifyNodeRoles
participant DB as SQLite DB
Note over C,DB: Full path (classifyNodeRolesFull)
C->>DB: SELECT id FROM nodes WHERE kind IN ('parameter','property')
DB-->>C: leafRows (~4,100 nodes)
C->>DB: SELECT id,name,kind,file,fan_in,fan_out FROM nodes LEFT JOIN edges(kind_target idx) fi LEFT JOIN edges(kind_source idx) fo WHERE kind NOT IN ('file','directory','parameter','property')
DB-->>C: rows (callable nodes only)
Note over C: Pre-assign leafRows to dead-leaf
Note over C: classifyRoles(rows) to roleMap
C->>DB: UPDATE nodes SET role=NULL (global reset)
C->>DB: Batch UPDATE role per chunk for all idsByRole entries
Note over C,DB: Incremental path (classifyNodeRolesIncremental)
C->>DB: Expand affected files via neighbour JOIN
C->>DB: SELECT id WHERE kind IN ('parameter','property') AND file IN (affected)
DB-->>C: leafRows (scoped)
C->>DB: SELECT id,... WHERE kind NOT IN (...,'parameter','property') AND file IN (affected) correlated subqueries use idx_edges_kind_target / idx_edges_kind_source
DB-->>C: rows
Note over C: Pre-assign leafRows to dead-leaf
Note over C: classifyRoles(rows, globalMedians) to roleMap
C->>DB: UPDATE nodes SET role=NULL WHERE file IN (affected) AND kind NOT IN ('file','directory')
C->>DB: Batch UPDATE role per chunk for all idsByRole entries
Reviews (2): Last reviewed commit: "fix: remove unused columns from full-pat..." | Re-trigger Greptile |
src/features/structure.ts
Outdated
| const leafRows = db | ||
| .prepare( | ||
| `SELECT n.id, n.name, n.kind, n.file | ||
| FROM nodes n | ||
| WHERE n.kind IN ('parameter', 'property')`, | ||
| ) | ||
| .all() as { id: number; name: string; kind: string; file: string }[]; |
There was a problem hiding this comment.
Unused columns in full-path
leafRows query
name, kind, and file are fetched here but only id is ever accessed in the processing loop below (for (const row of leafRows) leafIds.push(row.id)). The incremental path correctly fetches only SELECT n.id — the full path can match that.
| const leafRows = db | |
| .prepare( | |
| `SELECT n.id, n.name, n.kind, n.file | |
| FROM nodes n | |
| WHERE n.kind IN ('parameter', 'property')`, | |
| ) | |
| .all() as { id: number; name: string; kind: string; file: string }[]; | |
| const leafRows = db | |
| .prepare( | |
| `SELECT n.id | |
| FROM nodes n | |
| WHERE n.kind IN ('parameter', 'property')`, | |
| ) | |
| .all() as { id: number }[]; |
There was a problem hiding this comment.
Fixed in 6c8cc17. The full-path leafRows query now selects only n.id, matching the lean form already used by the incremental path.
There was a problem hiding this comment.
Fixed in 6c8cc17. The full-path leafRows query now selects only n.id, matching the lean form already used by the incremental path.
The full-path leafRows query selected name, kind, and file columns that were never accessed — only id is used. Align with the incremental path which already uses the lean SELECT n.id form.
Summary
idx_edges_kind_targetandidx_edges_kind_sourcecompound indexes on theedgestable. The fan-in/fan-outGROUP BYsubqueries (used by fnDeps, roles, triage) need kind-first indexes to avoid full-table scans. Without them, the 32% graph growth from the TypeScript migration caused a disproportionate 136% fnDeps slowdown.classifyNodeRoles(both full and incremental paths) to skip ~4,100 parameter/property nodes from the expensive fan-in/fan-out JOINs. These leaf nodes are alwaysdead-leafby definition and can never have callers or callees.Measured locally: fan-in subquery ~8ms → 0.43ms, roles query 7ms → 5.3ms, fnDeps path 8.5ms → ~1ms.
Addresses regressions reported in #625, #626, #627.
Test plan
codegraph build --no-incrementalsucceeds, new indexes createdcodegraph statsshows unchanged node/edge counts