feat(seo): canonical, JSON-LD, sitemap directive, richer meta descriptions#533
The default Hugo-generated robots.txt only contained `User-agent: *` with no Sitemap declaration. Search engines could find the sitemap by direct probe, but Bing and Yandex rely on the directive to discover it reliably. Adding a Sitemap line makes the canonical sitemap location explicit to all crawlers.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: tym83 <6355522@gmail.com>
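As a rough illustration of what the directive changes for a crawler, the sketch below parses two robots.txt bodies with Python's standard `urllib.robotparser` and lists the sitemaps each one declares. The `example.org` URL and the helper name `declared_sitemaps` are assumptions for illustration, not values from this repository.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt bodies; the sitemap URL is an assumed placeholder.
WITHOUT_DIRECTIVE = "User-agent: *\n"
WITH_DIRECTIVE = "User-agent: *\nSitemap: https://example.org/sitemap.xml\n"


def declared_sitemaps(robots_txt: str) -> list[str]:
    """Return the sitemap URLs a robots.txt body declares (empty if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


print(declared_sitemaps(WITHOUT_DIRECTIVE))
print(declared_sitemaps(WITH_DIRECTIVE))
```

A crawler that relies on the directive gets nothing from the first body and an explicit location from the second.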
…rsions
The site previously emitted no canonical link tag and no structured data. Pages across legacy documentation versions (v0, v1.0, v1.1) duplicate the latest version's content but had no dedup signals, so search engines split ranking authority across all five copies.
Changes:
- Emit `<link rel="canonical">` pointing to the page's own permalink for every page outside legacy doc versions.
- Emit `<meta name="robots" content="noindex, follow">` on legacy doc-version pages (any version present in `params.versions[]` whose `id` is neither `params.latest_version_id` nor `next`). Pages remain reachable for users following links; they no longer compete with the current version for ranking.
- Inline JSON-LD `Organization` schema on every page so search engines can build a consistent knowledge entity for Cozystack (CNCF Landscape, GitHub, Slack, Telegram in `sameAs`).
- Inline JSON-LD `WebSite` on the homepage to expose the site's name, URL, and description to AI search and rich result generators.
- Inline JSON-LD `BlogPosting` on single blog posts with title, description, dates, author, image, and publisher, required for Google Discover eligibility and AI Overview citation.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: tym83 <6355522@gmail.com>
…erage
Section index pages and the site default `params.description` carried
generic blurbs ("Free Cloud Platform based on Kubernetes",
"Operational guides on the storage subsystem") with little keyword
overlap with the actual content. Search snippets from these pages
gave little context to scanning users and missed terms that real
queries use ("KubeVirt", "LINSTOR", "VictoriaMetrics", "managed
PostgreSQL", "Cilium eBPF").
Updates:
- Site default description now covers the platform's main capability
set (VMs, managed databases, S3, GPU) and its CNCF Sandbox status.
- v1.2 docs root, applications, virtualization, storage, networking,
and operations section indexes each get descriptions naming the
underlying components and concrete services they document.
Each description stays under ~155 characters to fit a typical SERP
snippet without truncation.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: tym83 <6355522@gmail.com>
Code Review
This pull request implements SEO enhancements by updating meta descriptions across documentation sections and the site configuration, adding a robots.txt file, and introducing JSON-LD structured data for Organization, WebSite, and BlogPosting types. It also includes logic for managing canonical URLs and noindex tags for legacy documentation versions. Feedback identifies a potential version mismatch in the SEO logic that could lead to the current documentation being incorrectly excluded from search results and suggests providing a fallback for blog post descriptions in the structured data.
    {{- $latestVersion := .Site.Params.latest_version_id | default "v1.2" -}}
    {{- $isOldDocsVersion := false -}}
    {{- range .Site.Params.versions -}}
      {{- if and .id (ne .id $latestVersion) (ne .id "next") -}}
        {{- if in $.RelPermalink (printf "/docs/%s/" .id) -}}
          {{- $isOldDocsVersion = true -}}
        {{- end -}}
      {{- end -}}
    {{- end -}}
    {{- if $isOldDocsVersion }}
    <meta name="robots" content="noindex, follow" />
There is a discrepancy between the SEO logic and the site configuration. The hugo.yaml file (line 133) defines latest_version_id as v1.3, but this PR applies SEO improvements to the v1.2 documentation. Under the current logic in lines 21-27, all v1.2 pages will be marked with noindex (line 29), which negates the value of the new meta descriptions and JSON-LD for those pages. If v1.2 is intended to be the primary version for search engines, latest_version_id in hugo.yaml should be updated to v1.2.
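To make the mismatch concrete, here is a minimal Python sketch that mirrors the template's decision outside Hugo. The function name `is_legacy_docs_page` and the version list are assumptions for illustration; the real list lives in `params.versions`.

```python
def is_legacy_docs_page(rel_permalink: str, versions: list[dict], latest_version_id: str) -> bool:
    """Mirror the template: a page is legacy if its path sits under /docs/<id>/
    for any configured version id other than the latest one or "next"."""
    for version in versions:
        version_id = version.get("id")
        if not version_id or version_id in (latest_version_id, "next"):
            continue
        if f"/docs/{version_id}/" in rel_permalink:
            return True
    return False


# Assumed version list for illustration only.
VERSIONS = [{"id": v} for v in ("v0", "v1.0", "v1.1", "v1.2", "v1.3", "next")]
PAGE = "/docs/v1.2/applications/postgres/"

print(is_legacy_docs_page(PAGE, VERSIONS, "v1.3"))  # True: v1.2 would get noindex
print(is_legacy_docs_page(PAGE, VERSIONS, "v1.2"))  # False: v1.2 stays indexable
```

Under these assumptions, the same v1.2 page flips between indexable and noindex depending only on the configured `latest_version_id`.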
    "@context": "https://schema.org",
    "@type": "BlogPosting",
    "headline": {{ .Title | jsonify }},
    "description": {{ .Description | jsonify }},
If a blog post is missing the description field in its frontmatter, the BlogPosting JSON-LD will have an empty description field. Using .Summary as a fallback ensures a valid description is always provided for search engines.
    - "description": {{ .Description | jsonify }},
    + "description": {{ .Description | default .Summary | jsonify }},
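The effect of the fallback can be sketched in Python, outside Hugo. The function `blog_posting_jsonld` and its inputs are hypothetical. Python's `or` stands in for Hugo's `default`, which likewise treats an empty string as unset.

```python
import json


def blog_posting_jsonld(title: str, description: str, summary: str) -> str:
    """Build a minimal BlogPosting object, using the summary when the
    description is empty (the same idea as `default .Summary`)."""
    data = {
        "@context": "https://schema.org",
        "@type": "BlogPosting",
        "headline": title,
        "description": description or summary,
    }
    return json.dumps(data, ensure_ascii=False)


# Hypothetical post with no description in its frontmatter.
print(blog_posting_jsonld("Release notes", "", "Auto-generated summary text."))
```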
Summary
Adds the missing technical-SEO scaffolding to cozystack.io. The site is currently
indexable but emits no canonical link, no structured data, and lets five
duplicate doc-version copies (v0, v1.0, v1.1, v1.2, next) compete for the same
ranking signals. As a result, only one keyword shows up in third-party SEO
indexes despite a strong inbound-link profile (kubernetes.io, ripe.net,
opennet.ru, cncf.io, fluxcd.io).
This PR addresses the highest-leverage fixes without changing site
architecture or adding any vendor-specific outbound links.
What changed
- `layouts/robots.txt`: adds an explicit `Sitemap:` directive so Bing, Yandex, and other crawlers discover the sitemap reliably (the default Hugo-generated robots.txt was empty beyond `User-agent: *`).
- `layouts/partials/hooks/head-end.html`:
  - `<link rel="canonical">` pointing to each page's permalink.
  - `<meta name="robots" content="noindex, follow">` on legacy doc versions (any `params.versions[].id` other than `params.latest_version_id` or `next`). Pages stay reachable via direct link; they no longer compete with the current version for ranking authority.
  - JSON-LD `Organization` (every page): name, URL, logo, description, foundingDate, `sameAs` (GitHub, CNCF Landscape, Slack, Telegram).
  - JSON-LD `WebSite` on the homepage: eligibility for sitelinks searchbox in branded results.
  - JSON-LD `BlogPosting` on single blog posts: eligibility for Google Discover and AI Overview citation.
- `hugo.yaml`: `params.description` rewritten to cover the platform's core capability surface (VMs, managed databases, S3, GPU) and the CNCF Sandbox status.
- `content/en/docs/v1.2/{,applications,virtualization,storage,networking,operations}/_index.md`: `description` frontmatter rewritten on each section index to name the underlying components (KubeVirt, LINSTOR, Cilium eBPF, VictoriaMetrics, Velero, etc.) and concrete services (PostgreSQL, MySQL, Redis, etc.). Each description stays under ~155 characters so SERP snippets are not truncated.
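The ~155-character budget is easy to check mechanically. The sketch below is one possible helper, not part of this PR; the name `overlong_descriptions` and the sample entries are assumptions, and the limit is the approximate figure cited above rather than a hard rule.

```python
MAX_SNIPPET_CHARS = 155  # approximate limit cited in this PR, not a hard rule


def overlong_descriptions(descriptions: dict[str, str], limit: int = MAX_SNIPPET_CHARS) -> dict[str, int]:
    """Return {path: length} for every description longer than the limit."""
    return {path: len(text) for path, text in descriptions.items() if len(text) > limit}


# Hypothetical frontmatter values keyed by file path.
SAMPLE = {
    "docs/v1.2/storage/_index.md": "Short description that fits.",
    "docs/v1.2/networking/_index.md": "x" * 160,
}
print(overlong_descriptions(SAMPLE))
```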
Why
- … links, opennet.ru 107) are not converting to ranking authority because search engines split signals across five duplicate copies of every doc page. Marking legacy versions noindex consolidates that authority on the current version.
- Without an `Organization` schema, the project is not modeled as a recognized entity in Google's Knowledge Graph and tends not to be cited in AI Overview results. Inline JSON-LD is the standard way to fix this and does not require any change to the visual design.
- … guides on the storage subsystem") with no overlap with the terms users actually search ("LINSTOR", "Ceph", "block storage"). Replacing them with component-specific blurbs improves SERP CTR and entity coverage.
Verification
Reviewer can verify by serving the site locally (`make serve`) and checking:
- `curl -s http://localhost:1313/ | grep -E '<link rel="canonical"|application/ld\+json'`: should show one canonical link, one Organization JSON-LD, one WebSite JSON-LD on the homepage.
- `curl -s http://localhost:1313/docs/v0/applications/postgres/ | grep 'noindex'`: should show `<meta name="robots" content="noindex, follow">`.
- `curl -s http://localhost:1313/docs/v1.2/applications/postgres/ | grep canonical`: should show canonical to that exact URL.
- `curl -s http://localhost:1313/blog/2026/04/cozystack-1-3-storage-aware-scheduling-linstor-gui-and-vm-default-images/ | grep BlogPosting`: should include the BlogPosting JSON-LD.
- … https://validator.schema.org/ or https://search.google.com/test/rich-results after deploy.
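The same checks could be scripted instead of eyeballing `grep` output. The following is a minimal sketch using only Python's standard library; the class `HeadSignals`, the helper `head_signals`, and the sample HTML are assumptions for illustration. It assumes each JSON-LD script holds a single JSON object.

```python
import json
from html.parser import HTMLParser


class HeadSignals(HTMLParser):
    """Collect canonical URL, robots meta content, and JSON-LD @type values."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical = None
        self.robots = None
        self.jsonld_types = []
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "link" and attr.get("rel") == "canonical":
            self.canonical = attr.get("href")
        elif tag == "meta" and attr.get("name") == "robots":
            self.robots = attr.get("content")
        elif tag == "script" and attr.get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            # Assumes the script body is one JSON object, not an array.
            payload = json.loads("".join(self._buffer))
            self.jsonld_types.append(payload.get("@type"))


def head_signals(html: str) -> HeadSignals:
    """Parse an HTML document and return the collected SEO signals."""
    parser = HeadSignals()
    parser.feed(html)
    parser.close()
    return parser


# Hypothetical page head, shaped like the output this PR is expected to emit.
SAMPLE = """<head>
<link rel="canonical" href="http://localhost:1313/docs/v1.2/applications/postgres/" />
<script type="application/ld+json">{"@context": "https://schema.org", "@type": "Organization"}</script>
</head>"""

signals = head_signals(SAMPLE)
print(signals.canonical, signals.robots, signals.jsonld_types)
```

Against a running `make serve`, the HTML string could come from fetching each URL listed above and passing the decoded body to `head_signals`.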
Out of scope (deliberate)
- … versioned paths; will be a follow-up.
- `sitemap-index.xml` + per-section sitemaps: follow-up.
- … modernization): follow-up.
Notes
- `sameAs` list intentionally contains only project-owned channels (GitHub, CNCF Landscape, Slack, Telegram). Vendor-specific social accounts are excluded to keep the schema entity-accurate.
- … the meta-description copy.