Conversation
… sitemap generation logic
…te handling and enhancing high-value page entries
✅ Deploy Preview for tanstack ready!
To edit notification comments on pull requests, go to your Netlify project configuration.
📝 Walkthrough

This pull request implements SEO improvements by adding sitemap and robots.txt generation, canonical URL tag injection, dynamic noindex meta tags, and per-library sitemap configuration flags. New endpoints, SEO utilities, and metadata generation logic are introduced.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
✨ Finishing Touches
📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 3
🧹 Nitpick comments (2)
src/utils/sitemap.ts (1)
104-110: Avoid silently swallowing docs fetch failures.

Right now all fetch errors become `[]`, which can silently remove large sitemap sections without visibility.

Proposed refactor
```diff
-  const docsTree = await fetchRepoDirectoryContents({
-    data: {
-      repo: library.repo,
-      branch,
-      startingPath: docsRoot,
-    },
-  }).catch(() => [])
+  let docsTree: Array<GitHubFileNode> = []
+  try {
+    docsTree = await fetchRepoDirectoryContents({
+      data: {
+        repo: library.repo,
+        branch,
+        startingPath: docsRoot,
+      },
+    })
+  } catch (error) {
+    console.warn('sitemap docs fetch failed', {
+      libraryId: library.id,
+      repo: library.repo,
+      branch,
+      docsRoot,
+      error,
+    })
+  }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/sitemap.ts` around lines 104 - 110, The current assignment to docsTree swallows all errors from fetchRepoDirectoryContents by returning []; change this to surface the failure: wrap the await in a try/catch around fetchRepoDirectoryContents (the call that sets docsTree), and in the catch log the error with context (include repo, branch, startingPath/library.repo/docsRoot) and then rethrow the error (or return a clearly documented fallback if the caller expects it) instead of returning an empty array so failures are visible and debuggable; ensure the log uses your project logger (or console.error if none) and references fetchRepoDirectoryContents and docsTree so the change is easy to locate.

src/utils/seo.ts (1)
51-59: Consider logging a warning when falling back to DEFAULT_SITE_URL in production.

If neither `env.URL` nor `env.SITE_URL` is configured, the function silently falls back to `https://tanstack.com`. While this is safe for the TanStack site, a missing configuration could go unnoticed in different deployment environments. This is a low-priority concern given the site context.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/seo.ts` around lines 51 - 59, The canonicalUrl function silently falls back to DEFAULT_SITE_URL when env.URL and env.SITE_URL are unset; modify canonicalUrl to detect that fallback (e.g., origin was set to DEFAULT_SITE_URL) and emit a warning in production/SSR (use import.meta.env.SSR or your runtime check) including the values of env.URL and env.SITE_URL so missing config is visible; keep behavior unchanged otherwise and avoid noisy logs in non-SSR/dev/test environments.
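A minimal sketch of what that fallback warning could look like — `SiteEnv`, the injectable `warn` callback, and this `canonicalUrl` signature are illustrative stand-ins based on the review's description, not the repo's actual helper:

```typescript
const DEFAULT_SITE_URL = 'https://tanstack.com'

type SiteEnv = { URL?: string; SITE_URL?: string }

// Hypothetical shape of canonicalUrl with the suggested fallback warning.
export function canonicalUrl(
  path: string,
  env: SiteEnv,
  warn: (message: string) => void = console.warn,
): string {
  const origin = env.SITE_URL ?? env.URL ?? DEFAULT_SITE_URL
  if (!env.SITE_URL && !env.URL) {
    // Surface the missing configuration instead of silently defaulting.
    warn(
      `canonicalUrl: neither env.URL nor env.SITE_URL is set; ` +
        `falling back to ${DEFAULT_SITE_URL}`,
    )
  }
  return new URL(path, origin).toString()
}
```

Gating the `warn` call behind an SSR/production check, as the prompt suggests, keeps dev and test logs quiet.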
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/routes/__root.tsx`:
- Around line 180-185: The root layout is injecting a robots noindex meta via
the shouldIndexPath check which can duplicate the same tag emitted by
route-level seo() (e.g., showcase route's seo({ noindex:
loaderData?.hasNonCanonicalSearch })). Remove the root-level robots injection
(the conditional that renders <meta name="robots" content="noindex, nofollow" />
based on shouldIndexPath) and let individual routes (via their seo()
implementations such as the showcase route using
loaderData?.hasNonCanonicalSearch) control noindex behavior to avoid duplicate
meta tags.
In `@src/routes/robots`[.]txt.ts:
- Around line 5-25: Remove the static public/robots.txt file so the dynamic
route handler (Route created by createFileRoute('/robots.txt')) can run; ensure
the code using generateRobotsTxt(getSiteOrigin(request)) and the
setResponseHeader calls (Content-Type, Cache-Control, CDN-Cache-Control) remain
in place so the origin is derived at request-time and proper caching headers are
applied.
In `@src/utils/sitemap.ts`:
- Around line 74-76: The slug filtering allows the bare "index" slug through;
update the conditional that returns null to also exclude a root "index" slug by
checking slug === 'index' in addition to the existing checks (i.e., within the
block that uses the slug variable where the current code reads if (!slug ||
slug.endsWith('/index')) return null). Modify that condition to also
short-circuit on slug === 'index' so both root and nested index pages are
excluded.
---
Nitpick comments:
In `@src/utils/seo.ts`:
- Around line 51-59: The canonicalUrl function silently falls back to
DEFAULT_SITE_URL when env.URL and env.SITE_URL are unset; modify canonicalUrl to
detect that fallback (e.g., origin was set to DEFAULT_SITE_URL) and emit a
warning in production/SSR (use import.meta.env.SSR or your runtime check)
including the values of env.URL and env.SITE_URL so missing config is visible;
keep behavior unchanged otherwise and avoid noisy logs in non-SSR/dev/test
environments.
In `@src/utils/sitemap.ts`:
- Around line 104-110: The current assignment to docsTree swallows all errors
from fetchRepoDirectoryContents by returning []; change this to surface the
failure: wrap the await in a try/catch around fetchRepoDirectoryContents (the
call that sets docsTree), and in the catch log the error with context (include
repo, branch, startingPath/library.repo/docsRoot) and then rethrow the error (or
return a clearly documented fallback if the caller expects it) instead of
returning an empty array so failures are visible and debuggable; ensure the log
uses your project logger (or console.error if none) and references
fetchRepoDirectoryContents and docsTree so the change is easy to locate.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 32963829-e727-4108-95ba-a9d209729d28
📒 Files selected for processing (9)
- src/libraries/libraries.ts
- src/libraries/types.ts
- src/routeTree.gen.ts
- src/routes/__root.tsx
- src/routes/robots[.]txt.ts
- src/routes/showcase/index.tsx
- src/routes/sitemap[.]xml.ts
- src/utils/seo.ts
- src/utils/sitemap.ts
```tsx
{preferredCanonicalPath ? (
  <link rel="canonical" href={canonicalUrl(preferredCanonicalPath)} />
) : null}
{!shouldIndexPath(canonicalPath) ? (
  <meta name="robots" content="noindex, nofollow" />
) : null}
```
Potential duplicate robots meta tags for filtered showcase pages.
The root layout injects <meta name="robots" content="noindex, nofollow"> when shouldIndexPath returns false (line 183-185). However, the showcase route also injects the same meta tag via seo({ noindex: loaderData?.hasNonCanonicalSearch }) in its head config. When both conditions are true, duplicate robots tags will render since TanStack Router's <HeadContent> doesn't deduplicate meta tags by name.
While search engines generally handle duplicates gracefully, this could cause HTML validation warnings and indicates overlapping responsibility.
Consider either:
- Relying solely on the root-level injection for path-based noindex decisions, or
- Removing the root-level injection and letting individual routes handle their own noindex logic
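A sketch of the second option, where a route-level `seo()` helper owns the robots tag — the `seo` signature and `MetaTag` shape here are illustrative, not the repo's actual helper:

```typescript
type MetaTag = { title?: string; name?: string; content?: string }

// Each route builds its own meta list, so the robots tag is emitted at most
// once per page and the root layout never needs to inject a duplicate.
export function seo(opts: { title: string; noindex?: boolean }): MetaTag[] {
  const meta: MetaTag[] = [{ title: opts.title }]
  if (opts.noindex) {
    meta.push({ name: 'robots', content: 'noindex, nofollow' })
  }
  return meta
}
```

A route head config would then read something like `head: ({ loaderData }) => ({ meta: seo({ title: 'Showcase', noindex: loaderData?.hasNonCanonicalSearch }) })`, with no robots injection in the root layout.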
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/routes/__root.tsx` around lines 180 - 185, The root layout is injecting a
robots noindex meta via the shouldIndexPath check which can duplicate the same
tag emitted by route-level seo() (e.g., showcase route's seo({ noindex:
loaderData?.hasNonCanonicalSearch })). Remove the root-level robots injection
(the conditional that renders <meta name="robots" content="noindex, nofollow" />
based on shouldIndexPath) and let individual routes (via their seo()
implementations such as the showcase route using
loaderData?.hasNonCanonicalSearch) control noindex behavior to avoid duplicate
meta tags.
```ts
export const Route = createFileRoute('/robots.txt')({
  server: {
    handlers: {
      GET: async ({ request }: { request: Request }) => {
        const content = generateRobotsTxt(getSiteOrigin(request))

        setResponseHeader('Content-Type', 'text/plain; charset=utf-8')
        setResponseHeader(
          'Cache-Control',
          'public, max-age=300, must-revalidate',
        )
        setResponseHeader(
          'CDN-Cache-Control',
          'max-age=3600, stale-while-revalidate=3600',
        )

        return new Response(content)
      },
    },
  },
})
```
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
set -euo pipefail
echo "== Candidate static files =="
fd -a '^(robots\.txt|sitemap\.xml)$'
echo
echo "== References that may indicate static serving expectations =="
rg -n -C2 'robots\.txt|sitemap\.xml|publicDir|static' -g '!src/routeTree.gen.ts'
```

Repository: TanStack/tanstack.com
Length of output: 31805

🏁 Script executed:

```bash
find . -maxdepth 2 -type f \( -name "vite.config.*" -o -name "start.config.*" -o -name "tsconfig.json" -o -name "package.json" \) -exec head -50 {} +
```

Length of output: 4407

🏁 Script executed:

```bash
rg -n "publicDir|assetDir|public.*robots" --type ts --type json -A 2 -B 2 | head -100
```

Length of output: 47

🏁 Script executed:

```bash
git ls-files | grep -E "(config|vite|start)" | head -20
```

Length of output: 1323

🏁 Script executed:

```bash
find . -maxdepth 2 -name "netlify.toml" -o -name ".netlify.toml" | xargs cat 2>/dev/null
```

Length of output: 1118

🏁 Script executed:

```bash
rg -n "publicDir|public.*assets|robots" vite.config.ts
```

Length of output: 47

🏁 Script executed:

```bash
cat -n src/routes/robots[.]txt.ts
```

Length of output: 1006

🏁 Script executed:

```bash
cat -n public/robots.txt
```

Length of output: 159

🏁 Script executed:

```bash
cat vite.config.ts | head -150
```

Length of output: 3588

🏁 Script executed:

```bash
rg -n "publicDir|public" vite.config.ts -A 3 -B 3
```

Length of output: 47

🏁 Script executed:

```bash
rg -n "generateRobotsTxt\|getSiteOrigin" src/utils/sitemap.ts -A 5 -B 2
```

Length of output: 47

🏁 Script executed:

```bash
cat -n src/utils/sitemap.ts | grep -A 20 "generateRobotsTxt"
```

Length of output: 345

🏁 Script executed:

```bash
cat -n src/utils/sitemap.ts
```

Length of output: 5713
Remove the static public/robots.txt file. It shadows the dynamic route handler, preventing the request-aware origin logic and cache headers from being applied.
The static file hardcodes https://tanstack.com/sitemap.xml, whereas the dynamic handler derives the origin from env.SITE_URL or the request origin, adapting to different deployment environments. The dynamic handler also sets proper cache headers (Cache-Control, CDN-Cache-Control) that the static file lacks. Since Vite copies public/robots.txt to the build output by default and Netlify serves it before reaching the server handler, the dynamic route becomes dead code.
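For illustration, a sketch of the request-aware behavior being defended here — the real `getSiteOrigin` and `generateRobotsTxt` in `src/utils` may differ in details such as the env lookup, and the env parameter shape is an assumption:

```typescript
// Hypothetical env shape; the review says the real code reads env.SITE_URL.
export function getSiteOrigin(
  request: Request,
  env: { SITE_URL?: string } = {},
): string {
  // Prefer explicit configuration, otherwise derive the origin from the
  // incoming request — this is what lets deploy previews emit a correct
  // Sitemap URL instead of a hardcoded https://tanstack.com one.
  return env.SITE_URL ?? new URL(request.url).origin
}

export function generateRobotsTxt(origin: string): string {
  return [
    'User-agent: *',
    'Allow: /',
    '',
    `Sitemap: ${origin}/sitemap.xml`,
    '',
  ].join('\n')
}
```

This is exactly the behavior a static `public/robots.txt` cannot provide, since its contents are fixed at build time.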
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/routes/robots`[.]txt.ts around lines 5 - 25, Remove the static
public/robots.txt file so the dynamic route handler (Route created by
createFileRoute('/robots.txt')) can run; ensure the code using
generateRobotsTxt(getSiteOrigin(request)) and the setResponseHeader calls
(Content-Type, Cache-Control, CDN-Cache-Control) remain in place so the origin
is derived at request-time and proper caching headers are applied.
```ts
if (!slug || slug.endsWith('/index')) {
  return null
}
```
Root index.md is not excluded from docs slugs.
The current check skips `*/index` but still allows `index`, which can produce `/docs/index` entries unintentionally.
Proposed fix

```diff
- if (!slug || slug.endsWith('/index')) {
+ if (!slug || slug === 'index' || slug.endsWith('/index')) {
    return null
  }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```ts
if (!slug || slug === 'index' || slug.endsWith('/index')) {
  return null
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/utils/sitemap.ts` around lines 74 - 76, The slug filtering allows the
bare "index" slug through; update the conditional that returns null to also
exclude a root "index" slug by checking slug === 'index' in addition to the
existing checks (i.e., within the block that uses the slug variable where the
current code reads if (!slug || slug.endsWith('/index')) return null). Modify
that condition to also short-circuit on slug === 'index' so both root and nested
index pages are excluded.
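To see the corrected condition in isolation, a quick sketch — `docsSlugToPath` is a hypothetical name standing in for the surrounding mapping logic in `src/utils/sitemap.ts`:

```typescript
// Maps a docs slug to a sitemap path, excluding both root and nested
// index pages per the suggested fix.
export function docsSlugToPath(
  libraryId: string,
  slug: string,
): string | null {
  if (!slug || slug === 'index' || slug.endsWith('/index')) {
    return null
  }
  return `/${libraryId}/docs/${slug}`
}
```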
This reverts commit 333b238.

Summary
Summary by CodeRabbit