perf(db): add compound indexes to fix query regression from TS migration by carlos-alm · Pull Request #632 · optave/codegraph

carlos-alm · 2026-03-26T08:26:09Z

Summary

Add migration v16 with idx_edges_kind_target and idx_edges_kind_source compound indexes on the edges table. The fan-in/fan-out GROUP BY subqueries (used by fnDeps, roles, triage) need kind-first indexes to avoid full-table scans. Without them, the 32% graph growth from the TypeScript migration caused a disproportionate 136% fnDeps slowdown.
Optimize classifyNodeRoles (both full and incremental paths) to skip ~4,100 parameter/property nodes from the expensive fan-in/fan-out JOINs. These leaf nodes are always dead-leaf by definition and can never have callers or callees.

Measured locally: fan-in subquery ~8ms → 0.43ms, roles query 7ms → 5.3ms, fnDeps path 8.5ms → ~1ms.

Addresses regressions reported in #625, #626, #627.

Test plan

Full test suite passes (2,128 tests, 0 failures)
codegraph build --no-incremental succeeds, new indexes created
codegraph stats shows unchanged node/edge counts
Local benchmark confirms query latency improvement

…tion Add migration v16 with idx_edges_kind_target and idx_edges_kind_source indexes. The fan-in/fan-out GROUP BY subqueries used by fnDeps, roles, and triage need kind-first indexes to avoid full-table scans on the edges table. Without them, a 32% growth in graph density (from the TypeScript migration) caused a disproportionate 136% query slowdown. Also optimize classifyNodeRoles (both full and incremental paths) to skip parameter and property nodes from the expensive fan-in/fan-out JOINs. These ~4,100 nodes are always classified as dead-leaf by definition and can never have callers or callees. Measured locally: fan-in subquery ~8ms → 0.43ms, roles query 7ms → 5.3ms.

claude · 2026-03-26T08:26:24Z

Claude finished @carlos-alm's task —— View job

Reviewing PR: Performance indexes for query regression fix

Todo List:

Read and analyze changed files
Phase 1: Root Cause Analysis
Phase 2: Configuration Change Detection
Phase 3: Test Integrity
Phase 4: Code Quality Review
Phase 5: Backlog Compliance Assessment
Provide structured final assessment

Phase 1: Root Cause Analysis ✓

Problem: TypeScript migration caused 32% graph growth → 136% fnDeps slowdown due to missing compound indexes
Solution: Add idx_edges_kind_target/idx_edges_kind_source indexes + skip 4,100 leaf nodes from expensive JOINs
Risk: Low - purely additive, preserves correctness

Phase 2: Configuration Change Detection ✓

HIGH-RISK CHANGES: ✅ NONE - No config/linting/build changes

Phase 3: Test Integrity ✓

RED FLAGS: ✅ NONE - No test modifications, author reports all 2,128 tests pass

Phase 4: Code Quality Review ✓

MIGRATION v16: Proper compound index ordering, uses IF NOT EXISTS, follows patterns
OPTIMIZATION: Correctly identifies parameter/property as structural leaves, safe 4,100-node skip
TECHNICAL: Index order matches access patterns, incremental path preserved, no behavior changes

Phase 5: Backlog Compliance Assessment ✓

Root Cause Analysis

Problem Being Solved: TypeScript migration regression - 32% graph growth caused disproportionate 136% query slowdown due to missing compound indexes for fan-in/fan-out subqueries.
Why This Approach: Database-first solution targeting the actual bottleneck (index scans) plus algorithmic optimization (skip leaf nodes that can't participate in call relationships).
Risk Assessment: Minimal - purely additive schema changes + logically sound optimization that preserves all semantics.

Backlog Compliance

Zero-dep: ✅ - No new runtime dependencies, builds on existing SQLite/tree-sitter infrastructure
Foundation-aligned: ✅ - Pure performance optimization, no cloud APIs, maintains offline-first principle
Problem-fit: 5 - Directly addresses core issue: prevents AI agents from experiencing slow codegraph queries that waste tokens during re-orientation in large codebases
Breaking: No - Schema migrations are backward compatible, no API changes, preserves all existing behavior
Tier: 1 (zero-dep ✅ + foundation-aligned ✅)

Final Recommendation

Rating: ⭐⭐⭐⭐⭐ (5/5)
Action: APPROVE
Reasoning: Textbook performance fix addressing a real regression with proper root cause analysis. Database indexes are correctly ordered for access patterns, optimization correctly identifies structural properties of parameter/property nodes, no correctness risks, comprehensive test validation, and directly improves AI agent experience by preventing query slowdowns.

Excellent work - this is exactly how performance regressions should be addressed: measure the bottleneck, fix the underlying issue with proper database design, and add algorithmic optimizations based on domain knowledge.

greptile-apps · 2026-03-26T08:30:50Z

Greptile Summary

This PR addresses a 136% fnDeps latency regression introduced by the TypeScript-migration graph growth by adding two compound indexes and optimizing the node role classifier.\n\n- src/db/migrations.ts — Migration v16 adds idx_edges_kind_target ON edges(kind, target_id) and idx_edges_kind_source ON edges(kind, source_id). Placing kind first is the right choice: the fan-in/fan-out GROUP BY subqueries (full path) and the correlated WHERE kind = 'calls' AND target_id = n.id subqueries (incremental path) both benefit from a leading kind prefix that eliminates the full-table scans responsible for the regression.\n- src/features/structure.ts — Both classifyNodeRolesFull and classifyNodeRolesIncremental now fast-path parameter/property nodes as dead-leaf without executing the expensive fan-in/fan-out JOINs. The idsByRole map is seeded with these leaf ids before the main classifier loop, and the subsequent loop correctly appends any classifier-determined dead-leaf nodes to the same array rather than creating a duplicate entry. The incremental scoped NULL-reset (WHERE file IN (...) AND kind NOT IN ('file', 'directory')) still covers parameter/property nodes, so they receive their updated role correctly in the same transaction.\n\nThe only non-blocking finding is a corrupted UTF-8 replacement character in a comment on the incremental path — a cosmetic artifact with no runtime impact.

Confidence Score: 5/5

Safe to merge — performance fix is correct and well-tested with no logic regressions.

The compound indexes are correctly ordered for the existing query patterns and IF NOT EXISTS makes the migration idempotent. The classifier optimization is logically sound: parameter/property nodes are guaranteed leaf nodes, idsByRole merges pre-classified and classifier-determined dead-leaf ids without duplication, and the transaction-scoped NULL reset covers all affected node kinds. The prior reviewer concern (unused columns in leafRows) was already addressed. The only remaining finding is a cosmetic corrupted comment character — a non-blocking P2 that does not affect correctness or performance.

No files require special attention.

Important Files Changed

Filename	Overview
src/db/migrations.ts	Adds migration v16 with two compound indexes (kind, target_id) and (kind, source_id) on the edges table; index column order correctly places `kind` first to cover the WHERE/GROUP BY patterns used in both the full-path JOIN subqueries and the incremental correlated subqueries.
src/features/structure.ts	Optimizes both full and incremental classifyNodeRoles paths by short-circuiting ~4,100 parameter/property nodes as dead-leaf without fan-in/fan-out JOINs; logic is correct — idsByRole merges leaf pre-classified ids with classifier-assigned dead-leaf ids safely, and the incremental NULL reset still covers parameter/property nodes. One corrupted UTF-8 comment character at line 600.

Sequence Diagram

sequenceDiagram
    participant C as classifyNodeRoles
    participant DB as SQLite DB

    Note over C,DB: Full path (classifyNodeRolesFull)
    C->>DB: SELECT id FROM nodes WHERE kind IN ('parameter','property')
    DB-->>C: leafRows (~4,100 nodes)
    C->>DB: SELECT id,name,kind,file,fan_in,fan_out FROM nodes LEFT JOIN edges(kind_target idx) fi LEFT JOIN edges(kind_source idx) fo WHERE kind NOT IN ('file','directory','parameter','property')
    DB-->>C: rows (callable nodes only)
    Note over C: Pre-assign leafRows to dead-leaf
    Note over C: classifyRoles(rows) to roleMap
    C->>DB: UPDATE nodes SET role=NULL (global reset)
    C->>DB: Batch UPDATE role per chunk for all idsByRole entries

    Note over C,DB: Incremental path (classifyNodeRolesIncremental)
    C->>DB: Expand affected files via neighbour JOIN
    C->>DB: SELECT id WHERE kind IN ('parameter','property') AND file IN (affected)
    DB-->>C: leafRows (scoped)
    C->>DB: SELECT id,... WHERE kind NOT IN (...,'parameter','property') AND file IN (affected) correlated subqueries use idx_edges_kind_target / idx_edges_kind_source
    DB-->>C: rows
    Note over C: Pre-assign leafRows to dead-leaf
    Note over C: classifyRoles(rows, globalMedians) to roleMap
    C->>DB: UPDATE nodes SET role=NULL WHERE file IN (affected) AND kind NOT IN ('file','directory')
    C->>DB: Batch UPDATE role per chunk for all idsByRole entries

_{Reviews (2): Last reviewed commit: "fix: remove unused columns from full-pat..." | Re-trigger Greptile}

greptile-apps · 2026-03-26T08:30:53Z

src/features/structure.ts

+  const leafRows = db
+    .prepare(
+      `SELECT n.id, n.name, n.kind, n.file
+      FROM nodes n
+      WHERE n.kind IN ('parameter', 'property')`,
+    )
+    .all() as { id: number; name: string; kind: string; file: string }[];


Unused columns in full-path leafRows query

name, kind, and file are fetched here but only id is ever accessed in the processing loop below (for (const row of leafRows) leafIds.push(row.id)). The incremental path correctly fetches only SELECT n.id — the full path can match that.

Suggested change

const leafRows = db

.prepare(

`SELECT n.id, n.name, n.kind, n.file

FROM nodes n

WHERE n.kind IN ('parameter', 'property')`,

)

.all() as { id: number; name: string; kind: string; file: string }[];

const leafRows = db

.prepare(

`SELECT n.id

FROM nodes n

WHERE n.kind IN ('parameter', 'property')`,

)

.all() as { id: number }[];

Fixed in 6c8cc17. The full-path leafRows query now selects only n.id, matching the lean form already used by the incremental path.

The full-path leafRows query selected name, kind, and file columns that were never accessed — only id is used. Align with the incremental path which already uses the lean SELECT n.id form.

carlos-alm · 2026-03-26T08:47:00Z

@greptileai

greptile-apps bot reviewed Mar 26, 2026

View reviewed changes

carlos-alm added 2 commits March 26, 2026 02:37

Merge branch 'main' into perf/benchmark-regression-indexes

9080b1a

fix: remove unused columns from full-path leafRows query (#632)

6c8cc17

The full-path leafRows query selected name, kind, and file columns that were never accessed — only id is used. Align with the incremental path which already uses the lean SELECT n.id form.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(db): add compound indexes to fix query regression from TS migration#632

perf(db): add compound indexes to fix query regression from TS migration#632
carlos-alm wants to merge 3 commits intomainfrom
perf/benchmark-regression-indexes

carlos-alm commented Mar 26, 2026

Uh oh!

claude bot commented Mar 26, 2026 •

edited

Loading

Uh oh!

greptile-apps bot commented Mar 26, 2026 •

edited

Loading

Uh oh!

greptile-apps bot Mar 26, 2026

Uh oh!

carlos-alm Mar 26, 2026

Uh oh!

carlos-alm Mar 26, 2026

Uh oh!

carlos-alm commented Mar 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

carlos-alm commented Mar 26, 2026

Summary

Test plan

Uh oh!

claude bot commented Mar 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviewing PR: Performance indexes for query regression fix

Phase 1: Root Cause Analysis ✓

Phase 2: Configuration Change Detection ✓

Phase 3: Test Integrity ✓

Phase 4: Code Quality Review ✓

Phase 5: Backlog Compliance Assessment ✓

Root Cause Analysis

Backlog Compliance

Final Recommendation

Uh oh!

greptile-apps bot commented Mar 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps bot Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm commented Mar 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claude bot commented Mar 26, 2026 •

edited

Loading

greptile-apps bot commented Mar 26, 2026 •

edited

Loading