feat(seo): canonical, JSON-LD, sitemap directive, richer meta descriptions #533

Draft
tym83 wants to merge 3 commits into main from feat/seo-canonical-jsonld

Conversation

@tym83 tym83 commented May 8, 2026

Summary

Adds the missing technical-SEO scaffolding to cozystack.io. The site is currently
indexable but emits no canonical link, no structured data, and lets five
duplicate doc-version copies (v0, v1.0, v1.1, v1.2, next) compete for the same
ranking signals. As a result, only one keyword shows up in third-party SEO
indexes despite a strong inbound-link profile (kubernetes.io, ripe.net,
opennet.ru, cncf.io, fluxcd.io).

This PR addresses the highest-leverage fixes without changing site
architecture or adding any vendor-specific outbound links.

What changed

  • layouts/robots.txt — adds an explicit Sitemap: directive so Bing,
    Yandex, and other crawlers discover the sitemap reliably (the default
    Hugo-generated robots.txt was empty beyond User-agent: *).
  • layouts/partials/hooks/head-end.html:
    • Emits <link rel="canonical"> pointing to each page's permalink.
    • Emits <meta name="robots" content="noindex, follow"> on legacy doc
      versions (any params.versions[].id other than params.latest_version_id
      or next). Pages stay reachable via direct link; they no longer compete
      with the current version for ranking authority.
    • Inlines JSON-LD Organization (every page) — name, URL, logo,
      description, foundingDate, sameAs (GitHub, CNCF Landscape, Slack,
      Telegram).
    • Inlines JSON-LD WebSite on the homepage, exposing the site's name,
      URL, and description to search engines and rich-result generators.
    • Inlines JSON-LD BlogPosting on single blog posts — eligibility for
      Google Discover and AI Overview citation.
  • hugo.yaml: params.description rewritten to cover the platform's
    core capability surface (VMs, managed databases, S3, GPU) and the CNCF
    Sandbox status.
  • content/en/docs/v1.2/{,applications,virtualization,storage,networking,operations}/_index.md:
    description frontmatter rewritten on each section index to name the
    underlying components (KubeVirt, LINSTOR, Cilium eBPF, VictoriaMetrics,
    Velero, etc.) and concrete services (PostgreSQL, MySQL, Redis, etc.).
    Each description stays under ~155 characters so SERP snippets are not
    truncated.
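
For reviewers who want a concrete picture of the Organization markup listed above, here is a minimal Python sketch of the kind of payload involved. It is not the PR's template (that lives in layouts/partials/hooks/head-end.html and may differ); the logo, description, founding date, and sameAs URLs below are placeholders chosen for illustration, not values from the PR.

```python
import json


def organization_jsonld(name, url, logo, description, founding_date, same_as):
    """Build a schema.org Organization object as a plain dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "logo": logo,
        "description": description,
        "foundingDate": founding_date,
        "sameAs": list(same_as),
    }


def as_script_tag(data):
    """Serialize a JSON-LD dict into an inline <script> element."""
    # Escape "</" so a string value cannot close the script element early.
    payload = json.dumps(data, ensure_ascii=False).replace("</", "<\\/")
    return f'<script type="application/ld+json">{payload}</script>'


# Placeholder values, for illustration only (assumptions, not from the PR).
org = organization_jsonld(
    name="Cozystack",
    url="https://cozystack.io/",
    logo="https://example.invalid/logo.png",
    description="Placeholder description.",
    founding_date="2000-01-01",
    same_as=["https://example.invalid/github", "https://example.invalid/slack"],
)
tag = as_script_tag(org)
```

The same shape applies to the WebSite and BlogPosting objects; only the @type and the field set change.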

Why

  • The site's strongest backlinks (kubernetes.io 44 dofollow, ripe.net 102
    links, opennet.ru 107) are not converting to ranking authority because
    search engines split signals across five duplicate copies of every doc page.
    Marking legacy versions noindex consolidates that authority on the current
    version.
  • Without Organization schema, the project is not modeled as a recognized
    entity in Google's Knowledge Graph and tends not to be cited in AI Overview
    results. Inline JSON-LD is the standard way to fix this and does not
    require any change to the visual design.
  • Section index pages currently emit generic descriptions ("Operational
    guides on the storage subsystem") with no overlap with the terms users
    actually search ("LINSTOR", "Ceph", "block storage"). Replacing them with
    component-specific blurbs improves SERP CTR and entity coverage.

Verification

Reviewers can verify the changes by serving the site locally (make serve) and checking:

  1. curl -s http://localhost:1313/ | grep -E '<link rel="canonical"|application/ld\+json'
    — should show one canonical link, one Organization JSON-LD, one WebSite
    JSON-LD on the homepage.
  2. curl -s http://localhost:1313/docs/v0/applications/postgres/ | grep 'noindex'
    — should show <meta name="robots" content="noindex, follow">.
  3. curl -s http://localhost:1313/docs/v1.2/applications/postgres/ | grep canonical
    — should show canonical to that exact URL.
  4. curl -s http://localhost:1313/blog/2026/04/cozystack-1-3-storage-aware-scheduling-linstor-gui-and-vm-default-images/ | grep BlogPosting
    — should include the BlogPosting JSON-LD.
  5. JSON-LD validity can be confirmed against
    https://validator.schema.org/ or
    https://search.google.com/test/rich-results after deploy.
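
The grep checks above can also be scripted. The following is a rough sketch using only the Python standard library; it parses an HTML string (for example, saved curl output) and collects the canonical link, any robots meta values, and the @type of each JSON-LD block. The class and function names are made up for this example, and it assumes each JSON-LD script holds a single JSON object.

```python
import json
from html.parser import HTMLParser


class SeoTagCollector(HTMLParser):
    """Collect canonical links, robots meta values, and JSON-LD @type values."""

    def __init__(self):
        super().__init__()
        self.canonicals = []
        self.robots = []
        self.jsonld_types = []
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonicals.append(attrs.get("href"))
        elif tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.robots.append(attrs.get("content"))
        elif tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            doc = json.loads("".join(self._buffer))
            # Assumption: one JSON object per script block.
            if isinstance(doc, dict):
                self.jsonld_types.append(doc.get("@type"))


def collect_seo_tags(html):
    """Parse an HTML document and return the populated collector."""
    parser = SeoTagCollector()
    parser.feed(html)
    parser.close()
    return parser


# Hand-written sample standing in for the homepage response.
sample = """<html><head>
<link rel="canonical" href="http://localhost:1313/" />
<script type="application/ld+json">{"@context": "https://schema.org", "@type": "Organization"}</script>
<script type="application/ld+json">{"@context": "https://schema.org", "@type": "WebSite"}</script>
</head><body></body></html>"""
result = collect_seo_tags(sample)
```

For the homepage, the expectation from step 1 would translate to one entry in result.canonicals and the types Organization and WebSite in result.jsonld_types; for a legacy docs page, result.robots should contain "noindex, follow".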

Out of scope (deliberate)

  • BreadcrumbList schema for docs pages — needs careful template logic for
    versioned paths; will be a follow-up.
  • Splitting the sitemap into sitemap-index.xml + per-section sitemaps —
    follow-up.
  • Hugo upgrade from 0.160.1 — follow-up.
  • Performance pass (defer/async on jQuery + DocSearch, image format
    modernization) — follow-up.

Notes

  • The sameAs list intentionally contains only project-owned channels
    (GitHub, CNCF Landscape, Slack, Telegram). Vendor-specific social
    accounts are excluded to keep the schema entity-accurate.
  • Marked as draft pending reviewer eyes on the canonical-noindex split and
    the meta-description copy.

tym83 and others added 3 commits May 8, 2026 23:12
The default Hugo-generated robots.txt only contained `User-agent: *`
with no Sitemap declaration. Search engines could find the sitemap by
direct probe, but Bing and Yandex rely on the directive to discover it
reliably. Adding a Sitemap line makes the canonical sitemap location
explicit to all crawlers.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: tym83 <6355522@gmail.com>
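
A quick way to confirm the effect of this commit is to parse the generated robots.txt and list the sitemaps it declares. The sketch below uses urllib.robotparser from the Python standard library (site_maps() needs Python 3.8 or later). The sitemap URL is an assumption for illustration; the real value depends on the site's baseURL.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_declared(robots_txt):
    """Return the Sitemap URLs declared in a robots.txt body (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


# Before this commit: only the wildcard user-agent line.
before = "User-agent: *\n"
# After: hypothetical sitemap URL, assumed for this example.
after = "User-agent: *\nSitemap: https://cozystack.io/sitemap.xml\n"

before_sitemaps = sitemaps_declared(before)
after_sitemaps = sitemaps_declared(after)
```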
…rsions

The site previously emitted no canonical link tag and no structured
data. Pages across legacy documentation versions (v0, v1.0, v1.1)
duplicate the latest version's content but had no dedup signals, so
search engines split ranking authority across all five copies.

Changes:

- Emit `<link rel="canonical">` pointing to the page's own permalink for
  every page outside legacy doc versions.
- Emit `<meta name="robots" content="noindex, follow">` on legacy
  doc-version pages (any version present in `params.versions[]` whose
  `id` is neither `params.latest_version_id` nor `next`). Pages remain
  reachable for users following links; they no longer compete with the
  current version for ranking.
- Inline JSON-LD `Organization` schema on every page so search engines
  can build a consistent knowledge entity for Cozystack (CNCF Landscape,
  GitHub, Slack, Telegram in `sameAs`).
- Inline JSON-LD `WebSite` on the homepage to expose the site's name,
  URL, and description to AI search and rich result generators.
- Inline JSON-LD `BlogPosting` on single blog posts with title,
  description, dates, author, image, and publisher — required for
  Google Discover eligibility and AI Overview citation.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: tym83 <6355522@gmail.com>
…erage

Section index pages and the site default `params.description` carried
generic blurbs ("Free Cloud Platform based on Kubernetes",
"Operational guides on the storage subsystem") with little keyword
overlap with the actual content. Search snippets from these pages
gave little context to scanning users and missed terms that real
queries use ("KubeVirt", "LINSTOR", "VictoriaMetrics", "managed
PostgreSQL", "Cilium eBPF").

Updates:

- Site default description now covers the platform's main capability
  set (VMs, managed databases, S3, GPU) and its CNCF Sandbox status.
- v1.2 docs root, applications, virtualization, storage, networking,
  and operations section indexes each get descriptions naming the
  underlying components and concrete services they document.

Each description stays under ~155 characters to fit a typical SERP
snippet without truncation.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: tym83 <6355522@gmail.com>
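
The ~155-character budget is easy to check mechanically. Here is a small Python sketch of such a check; the paths and description strings in the sample are invented for the example and are not the copy from this PR, and the limit is the approximate figure quoted above rather than a hard rule from any search engine.

```python
MAX_DESCRIPTION_CHARS = 155  # approximate budget quoted in the commit message


def overlong_descriptions(descriptions, limit=MAX_DESCRIPTION_CHARS):
    """Return {path: length} for descriptions longer than `limit` characters."""
    return {
        path: len(text.strip())
        for path, text in descriptions.items()
        if len(text.strip()) > limit
    }


# Invented sample data (assumption): path -> description front matter value.
sample_descriptions = {
    "docs/v1.2/storage/_index.md": "Short placeholder description.",
    "docs/v1.2/networking/_index.md": "x" * 160,
}
too_long = overlong_descriptions(sample_descriptions)
```

Any path that shows up in too_long would be a candidate for trimming before merge.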
netlify Bot commented May 8, 2026

Deploy Preview for cozystack ready!

Name Link
🔨 Latest commit cb5490b
🔍 Latest deploy log https://app.netlify.com/projects/cozystack/deploys/69fe29d92d95d20008e542a9
😎 Deploy Preview https://deploy-preview-533--cozystack.netlify.app


coderabbitai Bot commented May 8, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request implements SEO enhancements by updating meta descriptions across documentation sections and the site configuration, adding a robots.txt file, and introducing JSON-LD structured data for Organization, WebSite, and BlogPosting types. It also includes logic for managing canonical URLs and noindex tags for legacy documentation versions. Feedback identifies a potential version mismatch in the SEO logic that could lead to the current documentation being incorrectly excluded from search results and suggests providing a fallback for blog post descriptions in the structured data.

Comment on lines +19 to +29
{{- $latestVersion := .Site.Params.latest_version_id | default "v1.2" -}}
{{- $isOldDocsVersion := false -}}
{{- range .Site.Params.versions -}}
{{- if and .id (ne .id $latestVersion) (ne .id "next") -}}
{{- if in $.RelPermalink (printf "/docs/%s/" .id) -}}
{{- $isOldDocsVersion = true -}}
{{- end -}}
{{- end -}}
{{- end -}}
{{- if $isOldDocsVersion }}
<meta name="robots" content="noindex, follow" />

high

There is a discrepancy between the SEO logic and the site configuration. The hugo.yaml file (line 133) defines latest_version_id as v1.3, but this PR applies SEO improvements to the v1.2 documentation. Under the current logic in lines 21-27, all v1.2 pages will be marked with noindex (line 29), which negates the value of the new meta descriptions and JSON-LD for those pages. If v1.2 is intended to be the primary version for search engines, latest_version_id in hugo.yaml should be updated to v1.2.
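
To make the mismatch easier to see, the template's check can be modelled in a few lines of Python. This is a sketch of the logic quoted in the diff excerpt, not the template itself; the versions list below is an assumption (the actual params.versions in hugo.yaml is not shown in this thread).

```python
def is_legacy_docs_page(rel_permalink, versions, latest_version_id="v1.2"):
    """Model of the template check: True if the page path belongs to a listed
    version whose id is neither the latest version nor "next"."""
    for version in versions:
        version_id = version.get("id")
        if not version_id or version_id in (latest_version_id, "next"):
            continue
        # Same substring test as `in $.RelPermalink (printf "/docs/%s/" .id)`.
        if f"/docs/{version_id}/" in rel_permalink:
            return True
    return False


# Assumed versions list, for illustration only.
versions = [
    {"id": "v0"}, {"id": "v1.0"}, {"id": "v1.1"},
    {"id": "v1.2"}, {"id": "v1.3"}, {"id": "next"},
]
page = "/docs/v1.2/applications/postgres/"

noindex_when_latest_is_v1_3 = is_legacy_docs_page(page, versions, "v1.3")
noindex_when_latest_is_v1_2 = is_legacy_docs_page(page, versions, "v1.2")
```

Under these assumptions, the v1.2 page is treated as legacy (and so gets noindex) when latest_version_id is v1.3, and is left indexable when it is v1.2, which is the behavior described in this comment.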

"@context": "https://schema.org",
"@type": "BlogPosting",
"headline": {{ .Title | jsonify }},
"description": {{ .Description | jsonify }},

medium

If a blog post is missing the description field in its frontmatter, the BlogPosting JSON-LD will have an empty description field. Using .Summary as a fallback ensures a valid description is always provided for search engines.

Suggested change
- "description": {{ .Description | jsonify }},
+ "description": {{ .Description | default .Summary | jsonify }},
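
The intent of the fallback can be sketched outside Hugo as well. The Python below is an illustration of the "description, else summary" rule only; the function names are invented for the example, and it models just the headline and description fields rather than the full BlogPosting object emitted by the template.

```python
import json


def blogposting_description(description, summary):
    """Prefer the front matter description; fall back to the summary when empty."""
    return description if description else summary


def blogposting_jsonld(title, description, summary):
    """Serialize a reduced BlogPosting object (headline and description only)."""
    return json.dumps(
        {
            "@context": "https://schema.org",
            "@type": "BlogPosting",
            "headline": title,
            "description": blogposting_description(description, summary),
        }
    )


# A post with no description in its front matter falls back to its summary.
without_description = json.loads(
    blogposting_jsonld("Example post", "", "Auto-generated summary text.")
)
with_description = json.loads(
    blogposting_jsonld("Example post", "Hand-written blurb.", "Auto-generated summary text.")
)
```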
