Selective SEO Indexing Strategy

Overview

Climate Explorer uses selective indexing: search engines may index a curated subset of high-value station and region pages, while the long tail of parameterized URLs is served with noindex,follow to prevent index bloat.

Architecture

URL request → Edge Function (rewrite-meta.ts)
                    ↓
         Match against indexed-pages.json?
           YES → Allow indexing, set canonical to curated params
           NO  → Add noindex,follow (default behavior)
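
The last step of the flow above, turning a match result into markup, could look roughly like the sketch below. It is not taken from rewrite-meta.ts: the IndexingDecision type, the applyDecision and escapeAttr helpers, and the choice to insert the tag just before </head> are all assumptions made for illustration.

```typescript
// Hypothetical decision shape; the real edge function may model this differently.
type IndexingDecision =
  | { indexable: true; canonical: string }
  | { indexable: false };

// Insert either a canonical link (curated page) or a robots meta tag
// (default) just before </head>. HTML without </head> is returned unchanged.
function applyDecision(html: string, decision: IndexingDecision): string {
  const tag = decision.indexable
    ? `<link rel="canonical" href="${escapeAttr(decision.canonical)}">`
    : `<meta name="robots" content="noindex,follow">`;
  const idx = html.search(/<\/head>/i);
  if (idx === -1) return html;
  return html.slice(0, idx) + tag + html.slice(idx);
}

// Escape characters that would break a double-quoted HTML attribute.
function escapeAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;");
}
```

String insertion keeps the sketch self-contained; a streaming HTML rewriter would be a reasonable alternative inside a real edge function.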

Key Files

File                                        Purpose
netlify/edge-functions/indexed-pages.json   Curated allowlist of stations/regions
netlify/edge-functions/rewrite-meta.ts      Conditionally applies noindex
scripts/normalize-sitemap.mjs               Injects curated URLs into sitemap.xml
scripts/check-sitemap.mjs                   Validates sitemap structure, allowlist consistency, and optional live URL behavior
misc/validate-indexed-pages.mjs             Validates station names against source CSV/RDS files

indexed-pages.json Format

{
  "/dwd": {
    "stations": [{ "station": "Berlin-Tempelhof", "landname": "Berlin" }],
    "regions": [{ "landname": "Bayern" }]
  }
}
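
For readers who prefer types, the JSON shape above might be described as follows. This is a sketch inferred from the single example: the type names, the assumption that all values are strings, and the assumption that stations and regions are both optional are mine, not confirmed by the source files.

```typescript
// Each curated entry maps query parameter names to expected values.
type CuratedEntry = Record<string, string>;

// Assumption: a section may omit either list.
interface SectionAllowlist {
  stations?: CuratedEntry[];
  regions?: CuratedEntry[];
}

// Top-level keys are section paths such as "/dwd".
type IndexedPages = Record<string, SectionAllowlist>;

function isCuratedEntry(value: unknown): value is CuratedEntry {
  return (
    typeof value === "object" &&
    value !== null &&
    !Array.isArray(value) &&
    Object.values(value).every((v) => typeof v === "string")
  );
}

// Structural check only; it does not verify that station names exist.
function isIndexedPages(value: unknown): value is IndexedPages {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return false;
  }
  return Object.entries(value).every(([path, section]) => {
    if (!path.startsWith("/")) return false;
    if (typeof section !== "object" || section === null) return false;
    const { stations = [], regions = [] } = section as Record<string, unknown>;
    return [stations, regions].every(
      (list) => Array.isArray(list) && list.every(isCuratedEntry),
    );
  });
}
```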

Matching Rules

  • All params in a curated entry must match the URL (case-insensitive)
  • Extra URL params (resolution, start, end) are ignored during matching
  • The canonical URL uses exactly the curated params from indexed-pages.json
  • For /ghcnh, curated station canonicals are stable station landing pages using station, country, and view=map; date ranges remain user-state URLs, not canonical targets
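
The first three rules can be sketched as two small functions. This is an illustration under stated assumptions, not the code in rewrite-meta.ts: matchesEntry and canonicalFor are hypothetical names, and "case-insensitive" is assumed to apply to parameter values rather than parameter names.

```typescript
type CuratedEntry = Record<string, string>;

// True when every curated param is present in the URL with an equal value,
// ignoring case. Params that appear only in the URL (resolution, start, end)
// are never inspected, so they cannot prevent a match.
function matchesEntry(url: URL, entry: CuratedEntry): boolean {
  return Object.entries(entry).every(([key, expected]) => {
    const actual = url.searchParams.get(key);
    return actual !== null && actual.toLowerCase() === expected.toLowerCase();
  });
}

// Build the canonical URL from the curated params only, using the curated
// spelling rather than whatever casing the visitor's URL had.
function canonicalFor(url: URL, entry: CuratedEntry): string {
  const canonical = new URL(url.pathname, url.origin);
  for (const [key, value] of Object.entries(entry)) {
    canonical.searchParams.set(key, value);
  }
  return canonical.toString();
}
```

With the /dwd entry from the format example, a request for /dwd?station=berlin-tempelhof&landname=BERLIN&resolution=daily would match, and its canonical would carry only station=Berlin-Tempelhof and landname=Berlin.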

Section Param Reference

Section        Station param                    Region param
/dwd           station (name)                   landname (Bundesland)
/meteofrance   station (name)                   department (name)
/jma           station (DisplayName)            prefecture (PrecName)
/eurometeo     station (name)                   -
/ghcnh         station (name), country, view    -
/ghcnm         station (name)                   -
/imgw          station (name)                   -

Maintenance

Adding/Removing Stations

  1. Edit netlify/edge-functions/indexed-pages.json
  2. Validate: node misc/validate-indexed-pages.mjs
  3. Deploy. The edge function reads the JSON at runtime.

Validation

node misc/validate-indexed-pages.mjs

Checks station names against source CSV/RDS files for DWD, JMA, IMGW, Météo-France, EuroMeteo, GHCNh, and GHCNm.
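
The core of such a check is a set lookup. The sketch below is not the validator script itself: columnValues and findUnknownStations are hypothetical helpers, the CSV parsing assumes no quoted fields, names are compared exactly, and RDS files are not handled at all.

```typescript
// Collect one column from simple comma-separated text (no quoted fields).
function columnValues(csv: string, column: string): Set<string> {
  const lines = csv.trim().split(/\r?\n/);
  const header = (lines[0] ?? "").split(",").map((h) => h.trim());
  const index = header.indexOf(column);
  if (index === -1) throw new Error(`column not found: ${column}`);
  return new Set(
    lines.slice(1).map((row) => (row.split(",")[index] ?? "").trim()),
  );
}

// Return curated names that do not appear in the source data.
function findUnknownStations(curated: string[], known: Set<string>): string[] {
  return curated.filter((name) => !known.has(name));
}
```

An empty result means every curated name was found; any returned name most likely points to a typo in indexed-pages.json or a renamed station in the source data.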

Sitemap Validation Workflow

Use the sitemap checker in two phases:

  1. Before commit, run the fast local validation:

    node scripts/check-sitemap.mjs

    This checks that:

    • sitemap.xml is valid XML
    • sitemap URLs are unique and HTTPS
    • static sitemap URLs match generated HTML pages
    • parameterized sitemap URLs exactly match indexed-pages.json
    • <loc> values are XML-safe

  2. After pushing and waiting for deployment to finish, run the live verification:

    node scripts/check-sitemap.mjs --fetch-all

    This checks the deployed site behavior for every sitemap URL, including:

    • HTTP status
    • unexpected redirects
    • canonical URL mismatches
    • noindex responses on URLs that are intended to be indexable
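
The four live checks can be pictured as one function over a fetched response. This is a sketch, not the logic in check-sitemap.mjs: the LiveResult shape, the regex-based tag lookup, and the assumption that a page's canonical should equal its sitemap URL are all illustrative.

```typescript
// Hypothetical summary of one fetched sitemap URL.
interface LiveResult {
  status: number;              // final HTTP status
  finalUrl: string;            // URL after any redirects were followed
  robotsHeader: string | null; // X-Robots-Tag header value, if present
  html: string;                // response body
}

const CANONICAL = /<link\b[^>]*rel=["']canonical["'][^>]*>/i;
const ROBOTS = /<meta\b[^>]*name=["']robots["'][^>]*>/i;

// Read one attribute from the first tag that matches tagPattern.
function attr(html: string, tagPattern: RegExp, name: string): string | null {
  const tag = tagPattern.exec(html)?.[0];
  if (!tag) return null;
  const match = new RegExp(`${name}=["']([^"']*)["']`, "i").exec(tag);
  return match?.[1]?.replace(/&amp;/g, "&") ?? null;
}

// Return a human-readable list of problems; empty means the URL looks fine.
function findIssues(sitemapUrl: string, r: LiveResult): string[] {
  const issues: string[] = [];
  if (r.status !== 200) issues.push(`status ${r.status}`);
  if (r.finalUrl !== sitemapUrl) issues.push(`redirected to ${r.finalUrl}`);
  const canonical = attr(r.html, CANONICAL, "href");
  if (canonical !== sitemapUrl) {
    issues.push(`canonical is ${canonical ?? "missing"}`);
  }
  const robots = `${attr(r.html, ROBOTS, "content") ?? ""},${r.robotsHeader ?? ""}`;
  if (/noindex/i.test(robots)) issues.push("noindex");
  return issues;
}
```

A LiveResult could be filled from the standard Fetch API: response.status, response.url, response.headers.get("x-robots-tag"), and await response.text().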

For a quicker live smoke test, you can fetch only a sample:

node scripts/check-sitemap.mjs --fetch --sample 6

Sitemap

Curated URLs are auto-injected into sitemap.xml during the Quarto post-render step. No manual editing is needed.
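
As a rough picture of what injection involves, the sketch below turns one curated entry into a sitemap <url> element with an XML-safe <loc>. It is not the code in normalize-sitemap.mjs; sitemapEntry and escapeXml are hypothetical helpers, and example.org stands in for the real site origin.

```typescript
type CuratedEntry = Record<string, string>;

// Escape the five characters that are special in XML text and attributes.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build the URL from the section path plus curated params, then escape it.
// The "&" between query params must become "&amp;" inside <loc>.
function sitemapEntry(origin: string, path: string, entry: CuratedEntry): string {
  const url = new URL(path, origin);
  for (const [key, value] of Object.entries(entry)) {
    url.searchParams.set(key, value);
  }
  return `<url><loc>${escapeXml(url.toString())}</loc></url>`;
}
```

Building the URL from the same curated params that define the canonical is what lets the checker require an exact match between parameterized sitemap URLs and indexed-pages.json.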