Selective SEO Indexing Strategy
Overview
Climate Explorer uses selective indexing to let search engines index a curated subset of high-value station/region pages, while keeping the long tail of parameterized URLs noindex,follow to prevent index bloat.
Architecture
URL request → Edge Function (rewrite-meta.ts)
↓
Match against indexed-pages.json?
YES → Allow indexing, set canonical to curated params
NO → Add noindex,follow (default behavior)
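The two branches of this flow can be sketched as a small pure function. This is only an illustration of the decision, not the contents of rewrite-meta.ts; `headTagsFor` and `escapeAttr` are hypothetical names, and the exact tags the real edge function emits are an assumption.

```typescript
// Hypothetical helpers, not taken from rewrite-meta.ts.

// Escape characters that would break an HTML attribute value.
function escapeAttr(value: string): string {
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

// `canonicalUrl` is the curated canonical URL when the request matched the
// allowlist, or null when it did not.
function headTagsFor(canonicalUrl: string | null): string[] {
  if (canonicalUrl === null) {
    // NO branch: keep the page out of the index, but let crawlers follow links.
    return ['<meta name="robots" content="noindex,follow">'];
  }
  // YES branch: no robots restriction, canonical points at the curated params.
  return [`<link rel="canonical" href="${escapeAttr(canonicalUrl)}">`];
}
```

Keeping the decision separate from the response rewriting makes it easy to unit-test without a running edge runtime.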
Key Files
| File | Purpose |
|---|---|
| netlify/edge-functions/indexed-pages.json | Curated allowlist of stations/regions |
| netlify/edge-functions/rewrite-meta.ts | Conditionally applies noindex |
| scripts/normalize-sitemap.mjs | Injects curated URLs into sitemap.xml |
| scripts/check-sitemap.mjs | Validates sitemap structure, allowlist consistency, and optional live URL behavior |
| misc/validate-indexed-pages.mjs | Validates station names against source CSV/RDS files |
indexed-pages.json Format
{
"/dwd": {
"stations": [{ "station": "Berlin-Tempelhof", "landname": "Berlin" }],
"regions": [{ "landname": "Bayern" }]
}
}
Matching Rules
- All params in a curated entry must match the URL (case-insensitive)
- Extra URL params (resolution, start, end) are ignored during matching
- The canonical URL uses exactly the curated params from indexed-pages.json
- For /ghcnh, curated station canonicals are stable station landing pages using station, country, and view=map; date ranges remain user-state URLs, not canonical targets
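The rules above can be illustrated with a short matching function. This is a sketch under assumptions, not the code in rewrite-meta.ts: the type names and `canonicalFor` are hypothetical, the shapes are inferred from the indexed-pages.json example, and the choice to consult region entries only when the URL has no `station` param is an assumption the source does not state.

```typescript
// Shapes inferred from the indexed-pages.json example; names are illustrative.
type CuratedEntry = Record<string, string>;
interface SectionAllowlist {
  stations?: CuratedEntry[];
  regions?: CuratedEntry[];
}
type IndexedPages = Record<string, SectionAllowlist>;

// Returns the canonical URL for a curated page, or null if the URL is not curated.
function canonicalFor(url: URL, pages: IndexedPages): string | null {
  const section = pages[url.pathname];
  if (!section) return null;

  // Assumption: a URL carrying `station` is only compared with station entries.
  const candidates = url.searchParams.has("station")
    ? section.stations ?? []
    : section.regions ?? [];

  for (const entry of candidates) {
    // Every curated param must be present and equal, ignoring case.
    // Params not named in the entry (resolution, start, end, ...) are never inspected.
    const matches = Object.entries(entry).every(([key, value]) => {
      const actual = url.searchParams.get(key);
      return actual !== null && actual.toLowerCase() === value.toLowerCase();
    });
    if (matches) {
      // The canonical carries exactly the curated params, in curated spelling.
      const canonical = new URL(url.pathname, url.origin);
      for (const [key, value] of Object.entries(entry)) {
        canonical.searchParams.set(key, value);
      }
      return canonical.toString();
    }
  }
  return null;
}
```

With the example allowlist, a request for `/dwd?station=berlin-tempelhof&landname=BERLIN&resolution=daily` would resolve to the canonical `/dwd?station=Berlin-Tempelhof&landname=Berlin`, while an uncurated station returns null and falls through to noindex,follow.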
Section Param Reference
| Section | Station param | Region param |
|---|---|---|
| /dwd | station (name) | landname (Bundesland) |
| /meteofrance | station (name) | department (name) |
| /jma | station (DisplayName) | prefecture (PrecName) |
| /eurometeo | station (name) | - |
| /ghcnh | station (name), country, view | - |
| /ghcnm | station (name) | - |
| /imgw | station (name) | - |
Maintenance
Adding/Removing Stations
- Edit netlify/edge-functions/indexed-pages.json
- Validate: node misc/validate-indexed-pages.mjs
- Deploy: the edge function reads the JSON at runtime
Validation
node misc/validate-indexed-pages.mjs
Checks station names against source CSV/RDS files for DWD, JMA, IMGW, Météo-France, EuroMeteo, GHCNh, and GHCNm.
Sitemap Validation Workflow
Use the sitemap checker in two phases:
Before commit, run the fast local validation:
node scripts/check-sitemap.mjs
This checks that:
- sitemap.xml is valid XML
- sitemap URLs are unique and HTTPS
- static sitemap URLs match generated HTML pages
- parameterized sitemap URLs exactly match indexed-pages.json
- <loc> values are XML-safe
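Several of these local checks can be pictured with a short function. This is a simplified sketch, not the logic of scripts/check-sitemap.mjs: `checkSitemap` is a hypothetical name, it extracts `<loc>` values with a regular expression instead of a real XML parser (so it does not prove the file is valid XML), and it skips the comparison of static URLs against generated HTML pages.

```typescript
// Hypothetical checker; returns a list of human-readable problems.
// `curatedUrls` are the full URLs expected from indexed-pages.json (unescaped).
function checkSitemap(xml: string, curatedUrls: string[]): string[] {
  const problems: string[] = [];
  const seen = new Set<string>();
  const locPattern = /<loc>([^<]*)<\/loc>/g;
  // An ampersand that does not start an entity is not XML-safe.
  const bareAmpersand = /&(?!(?:amp|lt|gt|quot|apos|#\d+|#x[0-9a-fA-F]+);)/;

  let match: RegExpExecArray | null;
  while ((match = locPattern.exec(xml)) !== null) {
    const raw = (match[1] ?? "").trim();
    if (bareAmpersand.test(raw)) problems.push(`not XML-safe: ${raw}`);
    const loc = raw.replace(/&amp;/g, "&");
    if (!loc.startsWith("https://")) problems.push(`not HTTPS: ${loc}`);
    if (seen.has(loc)) problems.push(`duplicate: ${loc}`);
    seen.add(loc);
  }

  // Parameterized URLs must match the allowlist exactly, in both directions.
  const curated = new Set(curatedUrls);
  seen.forEach((loc) => {
    if (loc.includes("?") && !curated.has(loc)) problems.push(`not in allowlist: ${loc}`);
  });
  curated.forEach((url) => {
    if (!seen.has(url)) problems.push(`missing from sitemap: ${url}`);
  });
  return problems;
}
```

Returning a list of problems rather than throwing on the first one lets a single run report everything that needs fixing before commit.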
After pushing and waiting for deployment to finish, run the live verification:
node scripts/check-sitemap.mjs --fetch-all
This checks the deployed site behavior for every sitemap URL, including:
- HTTP status
- unexpected redirects
- canonical URL mismatches
- noindex responses on URLs that are intended to be indexable
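The per-URL verdict in the live phase might look roughly like the function below. It is a sketch under assumptions rather than the checker's actual code: `FetchedPage`, `attr`, and `checkIndexable` are hypothetical names, the page is assumed to have been fetched already, and only the robots meta tag is inspected (an `X-Robots-Tag` response header can also carry noindex and is not covered here).

```typescript
// Hypothetical result of fetching one sitemap URL.
interface FetchedPage {
  status: number;   // final HTTP status code
  finalUrl: string; // URL after any redirects were followed
  html: string;     // response body
}

// Read one attribute value out of a single HTML tag string.
function attr(tag: string, name: string): string | null {
  const m = new RegExp(`${name}\\s*=\\s*["']([^"']*)["']`, "i").exec(tag);
  return m ? (m[1] ?? "").replace(/&amp;/g, "&") : null;
}

// Returns the problems found for a URL that is supposed to be indexable.
function checkIndexable(expectedUrl: string, page: FetchedPage): string[] {
  const problems: string[] = [];
  if (page.status !== 200) problems.push(`unexpected status ${page.status}`);
  if (page.finalUrl !== expectedUrl) problems.push(`redirected to ${page.finalUrl}`);

  const robotsTag = /<meta\b[^>]*\bname\s*=\s*["']robots["'][^>]*>/i.exec(page.html);
  const robots = robotsTag ? (attr(robotsTag[0], "content") ?? "") : "";
  if (/\bnoindex\b/i.test(robots)) problems.push("noindex on indexable URL");

  const canonicalTag = /<link\b[^>]*\brel\s*=\s*["']canonical["'][^>]*>/i.exec(page.html);
  const canonical = canonicalTag ? attr(canonicalTag[0], "href") : null;
  if (canonical !== expectedUrl) problems.push(`canonical mismatch: ${canonical ?? "none"}`);
  return problems;
}
```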
For a quicker live smoke test, you can fetch only a sample:
node scripts/check-sitemap.mjs --fetch --sample 6
Sitemap
Curated URLs are auto-injected into sitemap.xml during the Quarto post-render step. No manual editing needed.
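To make the injection step concrete, the sketch below turns an allowlist into sitemap entries with XML-escaped `<loc>` values. It is an illustration only, not the code in scripts/normalize-sitemap.mjs: `sitemapEntries`, `escapeXml`, the type names, and the `origin` parameter are hypothetical, and the real script may emit additional elements per URL.

```typescript
// Shapes inferred from the indexed-pages.json example; names are illustrative.
type AllowlistEntry = Record<string, string>;
type Allowlist = Record<string, { stations?: AllowlistEntry[]; regions?: AllowlistEntry[] }>;

// Escape the five characters that are special in XML text and attributes.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build one <url> element per curated station or region.
// `origin` is the site's base URL, passed in because the source does not state it.
function sitemapEntries(pages: Allowlist, origin: string): string[] {
  const entries: string[] = [];
  for (const [path, section] of Object.entries(pages)) {
    const curated = [...(section.stations ?? []), ...(section.regions ?? [])];
    for (const params of curated) {
      const url = new URL(path, origin);
      for (const [key, value] of Object.entries(params)) {
        url.searchParams.set(key, value);
      }
      // Escaping turns the "&" between query params into "&amp;".
      entries.push(`  <url><loc>${escapeXml(url.toString())}</loc></url>`);
    }
  }
  return entries;
}
```

Generating the URLs from the same allowlist the edge function reads is what keeps the parameterized sitemap URLs and indexed-pages.json in exact agreement.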