
feat: sitemap / robots.txt #783

Merged
tannerlinsley merged 6 commits into main from sitemap on Mar 26, 2026

Conversation

@LadyBluenotes
Member

@LadyBluenotes LadyBluenotes commented Mar 26, 2026

Summary

  • add an automated SEO-focused sitemap.xml and robots.txt backed by real site/content data instead of manual URL lists
  • centralize canonical and indexing policy so preferred URLs consistently resolve to latest, faceted pages avoid indexing, and private/auth pages are excluded from search
  • tighten sitemap scope to high-value non-doc pages, published blog posts, selected library landing pages, and dynamically discovered shallow docs pages with no frontmatter maintenance burden

Summary by CodeRabbit

  • New Features
    • Added dynamic sitemap and robots.txt generation for improved search engine discovery.
    • Implemented canonical URL management to prevent duplicate content issues.
    • Added intelligent indexing control—pages with non-canonical filters are automatically marked as noindex.
    • Per-library configuration options for sitemap inclusion of landing pages and documentation.
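The per-library configuration mentioned above is only named, not shown. A minimal sketch of what such flags might look like on a library definition (the flag names come from the PR description; the surrounding type shape is an assumption, not the repo's actual `LibrarySlim` type):

```typescript
// Hypothetical sketch: per-library sitemap inclusion flags.
// `includeLandingPage` / `includeTopLevelDocsPages` are named in the PR;
// the LibrarySlim shape here is illustrative only.

type LibrarySitemapConfig = {
  includeLandingPage: boolean
  includeTopLevelDocsPages: boolean
}

type LibrarySlim = {
  id: string
  repo: string
  sitemap?: LibrarySitemapConfig
}

// Example: a library opting its landing page and shallow docs pages
// into the sitemap.
const query: LibrarySlim = {
  id: 'query',
  repo: 'TanStack/query',
  sitemap: {
    includeLandingPage: true,
    includeTopLevelDocsPages: true,
  },
}

// A library with no `sitemap` config would simply be skipped by the generator.
const internal: LibrarySlim = { id: 'internal-tools', repo: 'TanStack/internal' }
```

Making the field optional keeps the change non-breaking for any library entries that never opt in.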

@netlify

netlify bot commented Mar 26, 2026

Deploy Preview for tanstack ready!

  • 🔨 Latest commit: dee7519
  • 🔍 Latest deploy log: https://app.netlify.com/projects/tanstack/deploys/69c48780eeb53f0009b576b9
  • 😎 Deploy Preview: https://deploy-preview-783--tanstack.netlify.app
Lighthouse
1 path audited
Performance: 36 (🔴 down 23 from production)
Accessibility: 90 (no change from production)
Best Practices: 83 (🔴 down 9 from production)
SEO: 97 (no change from production)
PWA: 70 (no change from production)

@coderabbitai

coderabbitai bot commented Mar 26, 2026

📝 Walkthrough

This pull request implements SEO improvements by adding sitemap and robots.txt generation, canonical URL tag injection, dynamic noindex meta tags, and per-library sitemap configuration flags. New endpoints, SEO utilities, and metadata generation logic are introduced.

Changes

Cohort / File(s) Summary
Library Configuration
src/libraries/types.ts, src/libraries/libraries.ts
Added optional sitemap configuration object to LibrarySlim type with includeLandingPage and includeTopLevelDocsPages flags. Updated all major library exports (query, router, start, table, form, db) to include these sitemap settings.
Route Definitions
src/routeTree.gen.ts
Extended generated route tree with new /sitemap.xml and /robots.txt file routes, including corresponding TypeScript route type definitions and module augmentations.
SEO Utilities
src/utils/seo.ts
Added getCanonicalPath, shouldIndexPath, and canonicalUrl functions for canonical path computation, indexability filtering, and URL generation. Refactored seo function parameter type to use new SeoOptions type.
Sitemap Generation
src/utils/sitemap.ts
New module providing getSitemapEntries, generateSitemapXml, and generateRobotsTxt functions. Includes XML escaping, origin normalization, and logic to traverse documentation repositories and aggregate library landing pages, docs, and blog posts.
Sitemap Route Handler
src/routes/sitemap[.]xml.ts
New file route for /sitemap.xml with server-side GET handler that generates sitemap XML with cache-control headers (300s CDN, 3600s stale-while-revalidate).
Robots Route Handler
src/routes/robots[.]txt.ts
New file route for /robots.txt with server-side GET handler that generates robots.txt content with matching cache-control configuration.
Root Layout Updates
src/routes/__root.tsx
Modified ShellComponent to derive current canonical path from router state and conditionally inject <link rel="canonical"> and <meta name="robots" content="noindex, nofollow"> tags based on computed path preferences.
Showcase Route
src/routes/showcase/index.tsx
Added hasNonCanonicalSearch helper to detect non-default search parameters. Updated route loader to return this flag and modified head function to set noindex dynamically based on search canonicality.
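The table above only names the sitemap utilities. As a rough sketch (function names from the walkthrough; the entry shape and bodies are assumptions rather than the PR's actual implementation), XML escaping and urlset rendering could look like:

```typescript
// Hypothetical sketch of generateSitemapXml with XML escaping and origin
// normalization, as described in the walkthrough. The SitemapEntry shape
// and function bodies are illustrative assumptions.

export type SitemapEntry = {
  path: string // site-relative, e.g. '/blog/my-post'
  lastmod?: string // optional ISO 8601 date
}

// Escape the five XML special characters so URLs are safe inside <loc>.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;')
}

export function generateSitemapXml(
  origin: string,
  entries: SitemapEntry[],
): string {
  // Normalize the origin so 'https://example.com/' and
  // 'https://example.com' produce identical <loc> values.
  const base = origin.replace(/\/+$/, '')
  const urls = entries
    .map((entry) => {
      const loc = `<loc>${escapeXml(base + entry.path)}</loc>`
      const lastmod = entry.lastmod
        ? `<lastmod>${entry.lastmod}</lastmod>`
        : ''
      return `  <url>${loc}${lastmod}</url>`
    })
    .join('\n')
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    urls,
    '</urlset>',
  ].join('\n')
}
```

Query strings containing `&` are the usual reason escaping matters here; a single unescaped ampersand makes the whole sitemap invalid XML.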

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 A sitemap spun from whisker to tail,
Canonical paths that shall never fail,
Robots and sitemaps now taking flight,
SEO sparkles burning oh-so-bright! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, which is below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title "feat: sitemap / robots.txt" directly and specifically describes the main changes: adding sitemap and robots.txt functionality to the codebase.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (2)
src/utils/sitemap.ts (1)

104-110: Avoid silently swallowing docs fetch failures.

Right now all fetch errors become [], which can silently remove large sitemap sections without visibility.

Proposed refactor
-  const docsTree = await fetchRepoDirectoryContents({
-    data: {
-      repo: library.repo,
-      branch,
-      startingPath: docsRoot,
-    },
-  }).catch(() => [])
+  let docsTree: Array<GitHubFileNode> = []
+  try {
+    docsTree = await fetchRepoDirectoryContents({
+      data: {
+        repo: library.repo,
+        branch,
+        startingPath: docsRoot,
+      },
+    })
+  } catch (error) {
+    console.warn('sitemap docs fetch failed', {
+      libraryId: library.id,
+      repo: library.repo,
+      branch,
+      docsRoot,
+      error,
+    })
+  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/sitemap.ts` around lines 104 - 110, The current assignment to
docsTree swallows all errors from fetchRepoDirectoryContents by returning [];
change this to surface the failure: wrap the await in a try/catch around
fetchRepoDirectoryContents (the call that sets docsTree), and in the catch log
the error with context (include repo, branch,
startingPath/library.repo/docsRoot) and then rethrow the error (or return a
clearly documented fallback if the caller expects it) instead of returning an
empty array so failures are visible and debuggable; ensure the log uses your
project logger (or console.error if none) and references
fetchRepoDirectoryContents and docsTree so the change is easy to locate.
src/utils/seo.ts (1)

51-59: Consider logging a warning when falling back to DEFAULT_SITE_URL in production.

If neither env.URL nor env.SITE_URL is configured, the function silently falls back to https://tanstack.com. While this is safe for the TanStack site, a missing configuration could go unnoticed in different deployment environments. This is a low-priority concern given the site context.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/seo.ts` around lines 51 - 59, The canonicalUrl function silently
falls back to DEFAULT_SITE_URL when env.URL and env.SITE_URL are unset; modify
canonicalUrl to detect that fallback (e.g., origin was set to DEFAULT_SITE_URL)
and emit a warning in production/SSR (use import.meta.env.SSR or your runtime
check) including the values of env.URL and env.SITE_URL so missing config is
visible; keep behavior unchanged otherwise and avoid noisy logs in
non-SSR/dev/test environments.
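As a sketch of the change this nitpick suggests (the env variable names come from the comment above; the `DEFAULT_SITE_URL` constant and implementation details are assumptions, not the repo's actual code):

```typescript
// Hypothetical sketch: canonicalUrl with a warning when falling back to
// the default origin. env names (URL, SITE_URL) come from the review
// comment; everything else is illustrative.

const DEFAULT_SITE_URL = 'https://tanstack.com'

function resolveSiteOrigin(env: Record<string, string | undefined>): string {
  const configured = env.URL ?? env.SITE_URL
  if (!configured) {
    // Surface missing configuration instead of silently defaulting.
    console.warn(
      `canonicalUrl: neither URL nor SITE_URL is set; falling back to ${DEFAULT_SITE_URL}`,
    )
    return DEFAULT_SITE_URL
  }
  return configured
}

export function canonicalUrl(
  path: string,
  env: Record<string, string | undefined> = process.env,
): string {
  // new URL resolves a site-relative path against the chosen origin.
  return new URL(path, resolveSiteOrigin(env)).href
}
```

Passing the env object as a parameter (rather than reading globals inside) also makes the fallback branch easy to exercise in tests.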

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 32963829-e727-4108-95ba-a9d209729d28

📥 Commits

Reviewing files that changed from the base of the PR and between f5c6166 and dee7519.

📒 Files selected for processing (9)
  • src/libraries/libraries.ts
  • src/libraries/types.ts
  • src/routeTree.gen.ts
  • src/routes/__root.tsx
  • src/routes/robots[.]txt.ts
  • src/routes/showcase/index.tsx
  • src/routes/sitemap[.]xml.ts
  • src/utils/seo.ts
  • src/utils/sitemap.ts

Comment on lines +180 to +185
{preferredCanonicalPath ? (
<link rel="canonical" href={canonicalUrl(preferredCanonicalPath)} />
) : null}
{!shouldIndexPath(canonicalPath) ? (
<meta name="robots" content="noindex, nofollow" />
) : null}

⚠️ Potential issue | 🟡 Minor

Potential duplicate robots meta tags for filtered showcase pages.

The root layout injects <meta name="robots" content="noindex, nofollow"> when shouldIndexPath returns false (line 183-185). However, the showcase route also injects the same meta tag via seo({ noindex: loaderData?.hasNonCanonicalSearch }) in its head config. When both conditions are true, duplicate robots tags will render since TanStack Router's <HeadContent> doesn't deduplicate meta tags by name.

While search engines generally handle duplicates gracefully, this could cause HTML validation warnings and indicates overlapping responsibility.

Consider either:

  1. Relying solely on the root-level injection for path-based noindex decisions, or
  2. Removing the root-level injection and letting individual routes handle their own noindex logic
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/routes/__root.tsx` around lines 180 - 185, The root layout is injecting a
robots noindex meta via the shouldIndexPath check which can duplicate the same
tag emitted by route-level seo() (e.g., showcase route's seo({ noindex:
loaderData?.hasNonCanonicalSearch })). Remove the root-level robots injection
(the conditional that renders <meta name="robots" content="noindex, nofollow" />
based on shouldIndexPath) and let individual routes (via their seo()
implementations such as the showcase route using
loaderData?.hasNonCanonicalSearch) control noindex behavior to avoid duplicate
meta tags.
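For reference, a minimal sketch of the route-level check this comment prefers: a `hasNonCanonicalSearch`-style helper (the helper name comes from the walkthrough; the search param names and defaults are invented for illustration and are not the actual showcase route's schema):

```typescript
// Hypothetical sketch: detect non-default search params so filtered
// showcase variants can be marked noindex. Param names and defaults
// are assumptions for illustration.

type ShowcaseSearch = {
  framework?: string
  sort?: string
}

const DEFAULTS: Required<ShowcaseSearch> = {
  framework: 'all',
  sort: 'popular',
}

export function hasNonCanonicalSearch(search: ShowcaseSearch): boolean {
  // Any param that is present and differs from its default makes the
  // page a non-canonical, filtered variant of the listing.
  return (Object.keys(DEFAULTS) as Array<keyof ShowcaseSearch>).some(
    (key) => search[key] !== undefined && search[key] !== DEFAULTS[key],
  )
}
```

The route's `head` config can then pass this flag to `seo({ noindex: ... })`, keeping the noindex decision in one place per route.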

Comment on lines +5 to +25
export const Route = createFileRoute('/robots.txt')({
server: {
handlers: {
GET: async ({ request }: { request: Request }) => {
const content = generateRobotsTxt(getSiteOrigin(request))

setResponseHeader('Content-Type', 'text/plain; charset=utf-8')
setResponseHeader(
'Cache-Control',
'public, max-age=300, must-revalidate',
)
setResponseHeader(
'CDN-Cache-Control',
'max-age=3600, stale-while-revalidate=3600',
)

return new Response(content)
},
},
},
})

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Candidate static files =="
fd -a '^(robots\.txt|sitemap\.xml)$'

echo
echo "== References that may indicate static serving expectations =="
rg -n -C2 'robots\.txt|sitemap\.xml|publicDir|static' -g '!src/routeTree.gen.ts'

Repository: TanStack/tanstack.com

Length of output: 31805


🏁 Script executed:

find . -maxdepth 2 -type f \( -name "vite.config.*" -o -name "start.config.*" -o -name "tsconfig.json" -o -name "package.json" \) -exec head -50 {} +

Repository: TanStack/tanstack.com

Length of output: 4407


🏁 Script executed:

rg -n "publicDir|assetDir|public.*robots" --type ts --type json -A 2 -B 2 | head -100

Repository: TanStack/tanstack.com

Length of output: 47


🏁 Script executed:

git ls-files | grep -E "(config|vite|start)" | head -20

Repository: TanStack/tanstack.com

Length of output: 1323


🏁 Script executed:

find . -maxdepth 2 -name "netlify.toml" -o -name ".netlify.toml" | xargs cat 2>/dev/null

Repository: TanStack/tanstack.com

Length of output: 1118


🏁 Script executed:

rg -n "publicDir|public.*assets|robots" vite.config.ts

Repository: TanStack/tanstack.com

Length of output: 47


🏁 Script executed:

cat -n src/routes/robots[.]txt.ts

Repository: TanStack/tanstack.com

Length of output: 1006


🏁 Script executed:

cat -n public/robots.txt

Repository: TanStack/tanstack.com

Length of output: 159


🏁 Script executed:

cat vite.config.ts | head -150

Repository: TanStack/tanstack.com

Length of output: 3588


🏁 Script executed:

rg -n "publicDir|public" vite.config.ts -A 3 -B 3

Repository: TanStack/tanstack.com

Length of output: 47


🏁 Script executed:

rg -n "generateRobotsTxt\|getSiteOrigin" src/utils/sitemap.ts -A 5 -B 2

Repository: TanStack/tanstack.com

Length of output: 47


🏁 Script executed:

cat -n src/utils/sitemap.ts | grep -A 20 "generateRobotsTxt"

Repository: TanStack/tanstack.com

Length of output: 345


🏁 Script executed:

cat -n src/utils/sitemap.ts

Repository: TanStack/tanstack.com

Length of output: 5713


Remove the static public/robots.txt file. It shadows the dynamic route handler, preventing the request-aware origin logic and cache headers from being applied.

The static file hardcodes https://tanstack.com/sitemap.xml, whereas the dynamic handler derives the origin from env.SITE_URL or the request origin, adapting to different deployment environments. The dynamic handler also sets proper cache headers (Cache-Control, CDN-Cache-Control) that the static file lacks. Since Vite copies public/robots.txt to the build output by default and Netlify serves it before reaching the server handler, the dynamic route becomes dead code.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/routes/robots`[.]txt.ts around lines 5 - 25, Remove the static
public/robots.txt file so the dynamic route handler (Route created by
createFileRoute('/robots.txt')) can run; ensure the code using
generateRobotsTxt(getSiteOrigin(request)) and the setResponseHeader calls
(Content-Type, Cache-Control, CDN-Cache-Control) remain in place so the origin
is derived at request-time and proper caching headers are applied.

Comment on lines +74 to +76
if (!slug || slug.endsWith('/index')) {
return null
}

⚠️ Potential issue | 🟡 Minor

Root index.md is not excluded from docs slugs.

The current check skips */index but still allows index, which can produce /docs/index entries unintentionally.

Proposed fix
-  if (!slug || slug.endsWith('/index')) {
+  if (!slug || slug === 'index' || slug.endsWith('/index')) {
     return null
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/sitemap.ts` around lines 74 - 76, The slug filtering allows the
bare "index" slug through; update the conditional that returns null to also
exclude a root "index" slug by checking slug === 'index' in addition to the
existing checks (i.e., within the block that uses the slug variable where the
current code reads if (!slug || slug.endsWith('/index')) return null). Modify
that condition to also short-circuit on slug === 'index' so both root and nested
index pages are excluded.

@tannerlinsley tannerlinsley merged commit 333b238 into main Mar 26, 2026
8 checks passed
@tannerlinsley tannerlinsley deleted the sitemap branch March 26, 2026 01:31
LadyBluenotes added a commit that referenced this pull request Mar 26, 2026