Open
Labels
area-System.Text.RegularExpressions, untriaged
Description
Analyzed all 17,434 real-world patterns from Regex_RealWorldPatterns.json across three tracks:
- Track A (Tree analysis): Ran smell-detection on reduced trees; found 6 categories of potential improvements. Confirmed existing reductions (Concat-with-Empty, Single-child-Concat/Alternate) are already fully effective. `FinalReduce()` (PR #125289, "Improve regex optimizer through investigation of regex optimizer passes") correctly simplifies 195 patterns (1.1%).
- Track B (Source generator codegen): Sampled 872 patterns, then analyzed all 17,434. Most codegen "smells" (large code, many gotos) are inherent to pattern complexity; found 639 capture-free patterns that generate unnecessary backtracking infrastructure.
- Track C (Cross-engine comparison with Rust): Compared literal extraction strategies against Rust's `regex-cli`. Discovered .NET misses suffix-based literal search: Rust extracts suffixes that could enable SIMD-accelerated `IndexOf` on 374+ patterns with no usable prefix.
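To make the Track C idea concrete, here is a minimal conceptual sketch in Python. It is not how .NET or Rust implement it. It assumes a pattern of the form "some sub-pattern followed by a required literal suffix", and uses `str.find` as a stand-in for a vectorized `IndexOf`. The helper names `candidate_ends` and `search_with_suffix_prefilter` are hypothetical and exist only for illustration.

```python
import re
from typing import Iterator, Optional


def candidate_ends(text: str, suffix: str) -> Iterator[int]:
    """Yield every offset at which a match could end.

    Any match of `<body><suffix>` must end exactly where an occurrence
    of the literal suffix ends, so a plain substring scan enumerates
    all possible match ends.
    """
    start = text.find(suffix)
    while start != -1:
        yield start + len(suffix)
        start = text.find(suffix, start + 1)


def search_with_suffix_prefilter(
    body: str, suffix: str, text: str
) -> Optional[re.Match]:
    """Search for `body` followed by the literal `suffix`.

    The literal search acts as a cheap gate: if the suffix never occurs,
    the regex engine is not run at all. Otherwise the full pattern is
    searched as usual (this sketch does not narrow the start position).
    """
    if text.find(suffix) == -1:
        return None  # no suffix occurrence, so no match is possible
    full = re.compile(f"(?:{body}){re.escape(suffix)}")
    return full.search(text)
```

For example, `search_with_suffix_prefilter(r"\w+", ".cs", text)` has no usable literal prefix (`\w+` can start with any word character), yet inputs that lack `.cs` are rejected by the substring search alone. A real implementation would presumably go further and use the candidate end offsets to restrict where the engine looks.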
Priority Ranking (impact × feasibility)
| # | Finding | Perf Win | Cost | Risk | Verdict |
|---|---|---|---|---|---|
| 1 | Suffix search | ★★★★★ (5-30x for FindFirstChar on 374+ patterns) | Medium (200-400 lines, new feature) | Moderate | Best ROI — significant win, manageable scope |
| 2 | Shared-prefix bailout | ★☆☆☆☆ (negligible for 80%, modest for ~50) | Trivial (1 line) | Near zero | Just do it — trivial fix for a clear bug |
| 3 | Redundant-Atomic | ☆☆☆☆☆ (zero measurable) | Trivial (4 lines) | Zero | Just do it — free cleanup |
| 4 | Better atomicity | ★★★☆☆ (potentially large but hard to quantify) | Hard (100-300 lines, complex analysis) | High | Defer — high risk, uncertain reward |