
feat: sitemap / robots.txt #783

Merged
tannerlinsley merged 6 commits into main from sitemap on Mar 26, 2026

Conversation

@LadyBluenotes
Member

@LadyBluenotes LadyBluenotes commented Mar 26, 2026

Summary

  • add an automated SEO-focused sitemap.xml and robots.txt backed by real site/content data instead of manual URL lists
  • centralize canonical and indexing policy so preferred URLs consistently resolve to latest, faceted pages avoid indexing, and private/auth pages are excluded from search
  • tighten sitemap scope to high-value non-doc pages, published blog posts, selected library landing pages, and dynamically discovered shallow docs pages with no frontmatter maintenance burden

Summary by CodeRabbit

  • New Features
    • Added dynamic sitemap and robots.txt generation for improved search engine discovery.
    • Implemented canonical URL management to prevent duplicate content issues.
    • Added intelligent indexing control—pages with non-canonical filters are automatically marked as noindex.
    • Per-library configuration options for sitemap inclusion of landing pages and documentation.
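The per-library configuration mentioned above is only named, not shown. A minimal sketch of what such flags might look like on a library definition (the flag names come from the PR description; the surrounding type shape is an assumption, not the repo's actual `LibrarySlim` type):

```typescript
// Hypothetical sketch: per-library sitemap inclusion flags.
// `includeLandingPage` / `includeTopLevelDocsPages` are named in the PR;
// the LibrarySlim shape here is illustrative only.

type LibrarySitemapConfig = {
  includeLandingPage: boolean
  includeTopLevelDocsPages: boolean
}

type LibrarySlim = {
  id: string
  repo: string
  sitemap?: LibrarySitemapConfig
}

// Example: a library opting its landing page and shallow docs pages
// into the sitemap.
const query: LibrarySlim = {
  id: 'query',
  repo: 'TanStack/query',
  sitemap: {
    includeLandingPage: true,
    includeTopLevelDocsPages: true,
  },
}

// A library with no `sitemap` config would simply be skipped by the generator.
const internal: LibrarySlim = { id: 'internal-tools', repo: 'TanStack/internal' }
```

Making the field optional keeps the change non-breaking for any library entries that never opt in.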

@netlify

netlify bot commented Mar 26, 2026

Deploy Preview for tanstack ready!

  • 🔨 Latest commit: dee7519
  • 🔍 Latest deploy log: https://app.netlify.com/projects/tanstack/deploys/69c48780eeb53f0009b576b9
  • 😎 Deploy Preview: https://deploy-preview-783--tanstack.netlify.app
Lighthouse
1 path audited
Performance: 36 (🔴 down 23 from production)
Accessibility: 90 (no change from production)
Best Practices: 83 (🔴 down 9 from production)
SEO: 97 (no change from production)
PWA: 70 (no change from production)

@coderabbitai

coderabbitai bot commented Mar 26, 2026

📝 Walkthrough

This pull request implements SEO improvements by adding sitemap and robots.txt generation, canonical URL tag injection, dynamic noindex meta tags, and per-library sitemap configuration flags. New endpoints, SEO utilities, and metadata generation logic are introduced.

Changes

Cohort / File(s) Summary
Library Configuration
src/libraries/types.ts, src/libraries/libraries.ts
Added optional sitemap configuration object to LibrarySlim type with includeLandingPage and includeTopLevelDocsPages flags. Updated all major library exports (query, router, start, table, form, db) to include these sitemap settings.
Route Definitions
src/routeTree.gen.ts
Extended generated route tree with new /sitemap.xml and /robots.txt file routes, including corresponding TypeScript route type definitions and module augmentations.
SEO Utilities
src/utils/seo.ts
Added getCanonicalPath, shouldIndexPath, and canonicalUrl functions for canonical path computation, indexability filtering, and URL generation. Refactored seo function parameter type to use new SeoOptions type.
Sitemap Generation
src/utils/sitemap.ts
New module providing getSitemapEntries, generateSitemapXml, and generateRobotsTxt functions. Includes XML escaping, origin normalization, and logic to traverse documentation repositories and aggregate library landing pages, docs, and blog posts.
Sitemap Route Handler
src/routes/sitemap[.]xml.ts
New file route for /sitemap.xml with server-side GET handler that generates sitemap XML with cache-control headers (300s CDN, 3600s stale-while-revalidate).
Robots Route Handler
src/routes/robots[.]txt.ts
New file route for /robots.txt with server-side GET handler that generates robots.txt content with matching cache-control configuration.
Root Layout Updates
src/routes/__root.tsx
Modified ShellComponent to derive current canonical path from router state and conditionally inject <link rel="canonical"> and <meta name="robots" content="noindex, nofollow"> tags based on computed path preferences.
Showcase Route
src/routes/showcase/index.tsx
Added hasNonCanonicalSearch helper to detect non-default search parameters. Updated route loader to return this flag and modified head function to set noindex dynamically based on search canonicality.
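The table above only names the sitemap utilities. As a rough sketch (function names from the walkthrough; the entry shape and bodies are assumptions rather than the PR's actual implementation), XML escaping and urlset rendering could look like:

```typescript
// Hypothetical sketch of generateSitemapXml with XML escaping and origin
// normalization, as described in the walkthrough. The SitemapEntry shape
// and function bodies are illustrative assumptions.

export type SitemapEntry = {
  path: string // site-relative, e.g. '/blog/my-post'
  lastmod?: string // optional ISO 8601 date
}

// Escape the five XML special characters so URLs are safe inside <loc>.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;')
}

export function generateSitemapXml(
  origin: string,
  entries: SitemapEntry[],
): string {
  // Normalize the origin so 'https://example.com/' and
  // 'https://example.com' produce identical <loc> values.
  const base = origin.replace(/\/+$/, '')
  const urls = entries
    .map((entry) => {
      const loc = `<loc>${escapeXml(base + entry.path)}</loc>`
      const lastmod = entry.lastmod
        ? `<lastmod>${entry.lastmod}</lastmod>`
        : ''
      return `  <url>${loc}${lastmod}</url>`
    })
    .join('\n')
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    urls,
    '</urlset>',
  ].join('\n')
}
```

Query strings containing `&` are the usual reason escaping matters here; a single unescaped ampersand makes the whole sitemap invalid XML.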

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 A sitemap spun from whisker to tail,
Canonical paths that shall never fail,
Robots and sitemaps now taking flight,
SEO sparkles burning oh-so-bright! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, which is below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title "feat: sitemap / robots.txt" directly and specifically describes the main changes: adding sitemap and robots.txt functionality to the codebase.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (2)
src/utils/sitemap.ts (1)

104-110: Avoid silently swallowing docs fetch failures.

Right now all fetch errors become [], which can silently remove large sitemap sections without visibility.

Proposed refactor
-  const docsTree = await fetchRepoDirectoryContents({
-    data: {
-      repo: library.repo,
-      branch,
-      startingPath: docsRoot,
-    },
-  }).catch(() => [])
+  let docsTree: Array<GitHubFileNode> = []
+  try {
+    docsTree = await fetchRepoDirectoryContents({
+      data: {
+        repo: library.repo,
+        branch,
+        startingPath: docsRoot,
+      },
+    })
+  } catch (error) {
+    console.warn('sitemap docs fetch failed', {
+      libraryId: library.id,
+      repo: library.repo,
+      branch,
+      docsRoot,
+      error,
+    })
+  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/sitemap.ts` around lines 104 - 110, The current assignment to
docsTree swallows all errors from fetchRepoDirectoryContents by returning [];
change this to surface the failure: wrap the await in a try/catch around
fetchRepoDirectoryContents (the call that sets docsTree), and in the catch log
the error with context (include repo, branch,
startingPath/library.repo/docsRoot) and then rethrow the error (or return a
clearly documented fallback if the caller expects it) instead of returning an
empty array so failures are visible and debuggable; ensure the log uses your
project logger (or console.error if none) and references
fetchRepoDirectoryContents and docsTree so the change is easy to locate.
src/utils/seo.ts (1)

51-59: Consider logging a warning when falling back to DEFAULT_SITE_URL in production.

If neither env.URL nor env.SITE_URL is configured, the function silently falls back to https://tanstack.com. While this is safe for the TanStack site, a missing configuration could go unnoticed in different deployment environments. This is a low-priority concern given the site context.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/seo.ts` around lines 51 - 59, The canonicalUrl function silently
falls back to DEFAULT_SITE_URL when env.URL and env.SITE_URL are unset; modify
canonicalUrl to detect that fallback (e.g., origin was set to DEFAULT_SITE_URL)
and emit a warning in production/SSR (use import.meta.env.SSR or your runtime
check) including the values of env.URL and env.SITE_URL so missing config is
visible; keep behavior unchanged otherwise and avoid noisy logs in
non-SSR/dev/test environments.
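As a sketch of the change this nitpick suggests (the env variable names come from the comment above; the `DEFAULT_SITE_URL` constant and implementation details are assumptions, not the repo's actual code):

```typescript
// Hypothetical sketch: canonicalUrl with a warning when falling back to
// the default origin. env names (URL, SITE_URL) come from the review
// comment; everything else is illustrative.

const DEFAULT_SITE_URL = 'https://tanstack.com'

function resolveSiteOrigin(env: Record<string, string | undefined>): string {
  const configured = env.URL ?? env.SITE_URL
  if (!configured) {
    // Surface missing configuration instead of silently defaulting.
    console.warn(
      `canonicalUrl: neither URL nor SITE_URL is set; falling back to ${DEFAULT_SITE_URL}`,
    )
    return DEFAULT_SITE_URL
  }
  return configured
}

export function canonicalUrl(
  path: string,
  env: Record<string, string | undefined> = process.env,
): string {
  // new URL resolves a site-relative path against the chosen origin.
  return new URL(path, resolveSiteOrigin(env)).href
}
```

Passing the env object as a parameter (rather than reading globals inside) also makes the fallback branch easy to exercise in tests.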

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 32963829-e727-4108-95ba-a9d209729d28

📥 Commits

Reviewing files that changed from the base of the PR and between f5c6166 and dee7519.

📒 Files selected for processing (9)
  • src/libraries/libraries.ts
  • src/libraries/types.ts
  • src/routeTree.gen.ts
  • src/routes/__root.tsx
  • src/routes/robots[.]txt.ts
  • src/routes/showcase/index.tsx
  • src/routes/sitemap[.]xml.ts
  • src/utils/seo.ts
  • src/utils/sitemap.ts

Comment on lines +180 to +185
{preferredCanonicalPath ? (
<link rel="canonical" href={canonicalUrl(preferredCanonicalPath)} />
) : null}
{!shouldIndexPath(canonicalPath) ? (
<meta name="robots" content="noindex, nofollow" />
) : null}

⚠️ Potential issue | 🟡 Minor

Potential duplicate robots meta tags for filtered showcase pages.

The root layout injects <meta name="robots" content="noindex, nofollow"> when shouldIndexPath returns false (line 183-185). However, the showcase route also injects the same meta tag via seo({ noindex: loaderData?.hasNonCanonicalSearch }) in its head config. When both conditions are true, duplicate robots tags will render since TanStack Router's <HeadContent> doesn't deduplicate meta tags by name.

While search engines generally handle duplicates gracefully, this could cause HTML validation warnings and indicates overlapping responsibility.

Consider either:

  1. Relying solely on the root-level injection for path-based noindex decisions, or
  2. Removing the root-level injection and letting individual routes handle their own noindex logic
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/routes/__root.tsx` around lines 180 - 185, The root layout is injecting a
robots noindex meta via the shouldIndexPath check which can duplicate the same
tag emitted by route-level seo() (e.g., showcase route's seo({ noindex:
loaderData?.hasNonCanonicalSearch })). Remove the root-level robots injection
(the conditional that renders <meta name="robots" content="noindex, nofollow" />
based on shouldIndexPath) and let individual routes (via their seo()
implementations such as the showcase route using
loaderData?.hasNonCanonicalSearch) control noindex behavior to avoid duplicate
meta tags.
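For reference, a minimal sketch of the route-level check this comment prefers: a `hasNonCanonicalSearch`-style helper (the helper name comes from the walkthrough; the search param names and defaults are invented for illustration and are not the actual showcase route's schema):

```typescript
// Hypothetical sketch: detect non-default search params so filtered
// showcase variants can be marked noindex. Param names and defaults
// are assumptions for illustration.

type ShowcaseSearch = {
  framework?: string
  sort?: string
}

const DEFAULTS: Required<ShowcaseSearch> = {
  framework: 'all',
  sort: 'popular',
}

export function hasNonCanonicalSearch(search: ShowcaseSearch): boolean {
  // Any param that is present and differs from its default makes the
  // page a non-canonical, filtered variant of the listing.
  return (Object.keys(DEFAULTS) as Array<keyof ShowcaseSearch>).some(
    (key) => search[key] !== undefined && search[key] !== DEFAULTS[key],
  )
}
```

The route's `head` config can then pass this flag to `seo({ noindex: ... })`, keeping the noindex decision in one place per route.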

Comment on lines +5 to +25
export const Route = createFileRoute('/robots.txt')({
server: {
handlers: {
GET: async ({ request }: { request: Request }) => {
const content = generateRobotsTxt(getSiteOrigin(request))

setResponseHeader('Content-Type', 'text/plain; charset=utf-8')
setResponseHeader(
'Cache-Control',
'public, max-age=300, must-revalidate',
)
setResponseHeader(
'CDN-Cache-Control',
'max-age=3600, stale-while-revalidate=3600',
)

return new Response(content)
},
},
},
})

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Candidate static files =="
fd -a '^(robots\.txt|sitemap\.xml)$'

echo
echo "== References that may indicate static serving expectations =="
rg -n -C2 'robots\.txt|sitemap\.xml|publicDir|static' -g '!src/routeTree.gen.ts'

Repository: TanStack/tanstack.com

Length of output: 31805


🏁 Script executed:

find . -maxdepth 2 -type f \( -name "vite.config.*" -o -name "start.config.*" -o -name "tsconfig.json" -o -name "package.json" \) -exec head -50 {} +

Repository: TanStack/tanstack.com

Length of output: 4407


🏁 Script executed:

rg -n "publicDir|assetDir|public.*robots" --type ts --type json -A 2 -B 2 | head -100

Repository: TanStack/tanstack.com

Length of output: 47


🏁 Script executed:

git ls-files | grep -E "(config|vite|start)" | head -20

Repository: TanStack/tanstack.com

Length of output: 1323


🏁 Script executed:

find . -maxdepth 2 -name "netlify.toml" -o -name ".netlify.toml" | xargs cat 2>/dev/null

Repository: TanStack/tanstack.com

Length of output: 1118


🏁 Script executed:

rg -n "publicDir|public.*assets|robots" vite.config.ts

Repository: TanStack/tanstack.com

Length of output: 47


🏁 Script executed:

cat -n src/routes/robots[.]txt.ts

Repository: TanStack/tanstack.com

Length of output: 1006


🏁 Script executed:

cat -n public/robots.txt

Repository: TanStack/tanstack.com

Length of output: 159


🏁 Script executed:

cat vite.config.ts | head -150

Repository: TanStack/tanstack.com

Length of output: 3588


🏁 Script executed:

rg -n "publicDir|public" vite.config.ts -A 3 -B 3

Repository: TanStack/tanstack.com

Length of output: 47


🏁 Script executed:

rg -n "generateRobotsTxt\|getSiteOrigin" src/utils/sitemap.ts -A 5 -B 2

Repository: TanStack/tanstack.com

Length of output: 47


🏁 Script executed:

cat -n src/utils/sitemap.ts | grep -A 20 "generateRobotsTxt"

Repository: TanStack/tanstack.com

Length of output: 345


🏁 Script executed:

cat -n src/utils/sitemap.ts

Repository: TanStack/tanstack.com

Length of output: 5713


Remove the static public/robots.txt file. It shadows the dynamic route handler, preventing the request-aware origin logic and cache headers from being applied.

The static file hardcodes https://tanstack.com/sitemap.xml, whereas the dynamic handler derives the origin from env.SITE_URL or the request origin, adapting to different deployment environments. The dynamic handler also sets proper cache headers (Cache-Control, CDN-Cache-Control) that the static file lacks. Since Vite copies public/robots.txt to the build output by default and Netlify serves it before reaching the server handler, the dynamic route becomes dead code.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/routes/robots`[.]txt.ts around lines 5 - 25, Remove the static
public/robots.txt file so the dynamic route handler (Route created by
createFileRoute('/robots.txt')) can run; ensure the code using
generateRobotsTxt(getSiteOrigin(request)) and the setResponseHeader calls
(Content-Type, Cache-Control, CDN-Cache-Control) remain in place so the origin
is derived at request-time and proper caching headers are applied.

Comment on lines +74 to +76
if (!slug || slug.endsWith('/index')) {
return null
}

⚠️ Potential issue | 🟡 Minor

Root index.md is not excluded from docs slugs.

The current check skips */index but still allows index, which can produce /docs/index entries unintentionally.

Proposed fix
-  if (!slug || slug.endsWith('/index')) {
+  if (!slug || slug === 'index' || slug.endsWith('/index')) {
     return null
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/sitemap.ts` around lines 74 - 76, The slug filtering allows the
bare "index" slug through; update the conditional that returns null to also
exclude a root "index" slug by checking slug === 'index' in addition to the
existing checks (i.e., within the block that uses the slug variable where the
current code reads if (!slug || slug.endsWith('/index')) return null). Modify
that condition to also short-circuit on slug === 'index' so both root and nested
index pages are excluded.

@tannerlinsley tannerlinsley merged commit 333b238 into main Mar 26, 2026
8 checks passed
@tannerlinsley tannerlinsley deleted the sitemap branch March 26, 2026 01:31
LadyBluenotes added a commit that referenced this pull request Mar 26, 2026