Keep recursive news scanning; revert existing articles to flat news/ (URL preservation)#698

Merged
pethers merged 9 commits into main from copilot/organize-news-articles-directory
Mar 1, 2026

Copilot AI commented Mar 1, 2026

  • Move all 935 existing articles from news/2026/02/ back to flat news/ (preserving existing shared URLs)
  • Keep recursive scanning code in vite.config.js, helpers.ts, generate-sitemap.ts, extract-news-metadata.ts
  • Revert news-evening-analysis.test.ts back to flat path.join(NEWS_DIR, file) calls; removed articlePath() helper
  • Fix sitemap log message: articles.size counts language-groups (base slugs), so label is now "news article groups" instead of "news articles"
  • Broaden "should include news articles" test URL regex to accept both flat and date-based paths (consistent with the language-variant test)
  • Normalize relative() output to POSIX separators in extract-news-metadata.ts (fixes Windows backslash issue in JSON output / slug derivation)
  • Sort readdirSync entries before iterating in generate-sitemap.ts scanDir() and sort final ArticleGroup[] by baseSlug for deterministic sitemap output
  • All 38 sitemap tests pass
Original prompt

This section details the original issue you should resolve

<issue_title>Organize news articles into date-based directory structure for scalability</issue_title>
<issue_description>## 📋 Issue Type
Architecture / Performance

🎯 Objective

Organize the growing news article collection (949+ HTML files in a single news/ directory) into a date-based directory structure for better maintainability, build performance, and file system efficiency.

📊 Current State

  • 949 HTML files in a single flat news/ directory
  • File names follow YYYY-MM-DD-type-lang.html pattern
  • Vite discoverNewsArticles() function in vite.config.js reads all 949+ files on every build
  • scanNewsArticles() in scripts/generate-news-indexes/helpers.ts also scans all files
  • As new articles are generated daily (10+ per day across 14 languages), this will grow to 5000+ files within a year

🚀 Desired State

  • Articles organized by year/month: news/2026/02/article.html
  • Vite config updated to discover articles recursively
  • News index generation updated for hierarchical scanning
  • Sitemap and RSS generation updated for new paths
  • URL redirects/compatibility maintained for existing article links
  • Build performance improved through directory-level scanning

🔧 Implementation Approach

  1. Design directory structure: news/{year}/{month}/article.html
  2. Create migration script to move existing 949 articles
  3. Update vite.config.js discoverNewsArticles() for recursive discovery
  4. Update scripts/generate-news-indexes/helpers.ts scanNewsArticles()
  5. Update scripts/generate-sitemap.ts for new paths
  6. Update scripts/generate-rss.ts for new paths
  7. Add URL rewrite rules or redirects for backward compatibility
  8. Update article templates to generate new paths
  9. Verify all internal links still work with linkinator

🤖 Recommended Agent

code-quality-engineer — Architecture refactoring

✅ Acceptance Criteria

  • Articles organized in news/{year}/{month}/ structure
  • npm run build succeeds with new structure
  • All internal links validated with linkinator
  • News indexes correctly link to articles in subdirectories
  • RSS feed URLs correct
  • Sitemap URLs correct
  • Article generation templates produce files in correct directories

📚 References

  • Current structure: news/*.html (949 files)
  • Vite config: vite.config.js (discoverNewsArticles())
  • News scanner: scripts/generate-news-indexes/helpers.ts
  • Architecture: ARCHITECTURE.md

🏷️ Labels

type:architecture, priority:medium, component:news-generation</issue_description>

<agent_instructions>Organize news articles from the flat news/ directory into a date-based structure like news/2026/02/article.html. Update vite.config.js discoverNewsArticles(), scanNewsArticles() in generate-news-indexes/helpers.ts, sitemap and RSS generators.</agent_instructions>

Comments on the Issue (you are @copilot in this section)



…s/2026/02/)

- Move 935 articles from flat news/ to news/2026/02/
- Update vite.config.js discoverNewsArticles() for recursive scanning
- Update scripts/generate-news-indexes/helpers.ts scanNewsArticles() for recursive scanning with relative slug paths
- Update scripts/generate-sitemap.ts getNewsArticles() for recursive scanning
- Update scripts/extract-news-metadata.ts for recursive scanning
- Update tests to match new URL format and use articlePath() helper

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Copilot AI changed the title [WIP] Organize news articles into date-based structure for scalability Organize news articles into date-based directory structure (news/{year}/{month}/) Mar 1, 2026
@github-actions github-actions bot added labels Mar 1, 2026: html-css (HTML/CSS changes); javascript (JavaScript code changes); translation (Translation updates); rtl (RTL language support: Arabic, Hebrew); performance (Performance optimization); testing (Test coverage); refactor (Code refactoring); news (News articles and content generation); size-m (Medium change, 50-250 lines)
github-actions bot commented Mar 1, 2026

🔍 Lighthouse Performance Audit

| Category | Score | Status |
| --- | --- | --- |
| Performance | 85/100 | 🟡 |
| Accessibility | 95/100 | 🟢 |
| Best Practices | 90/100 | 🟢 |
| SEO | 95/100 | 🟢 |

📥 Download full Lighthouse report

Budget Compliance: Performance budgets enforced via budget.json

…sive scan for future files

Existing 935 articles moved back to news/ to preserve shared URLs (LinkedIn, X, etc).
The recursive scanning code in vite.config.js, helpers.ts, generate-sitemap.ts, and
extract-news-metadata.ts is retained so future articles in news/{year}/{month}/ are
automatically discovered alongside the existing flat files.

Tests updated: sitemap URL pattern matches both flat and subdirectory paths;
news-evening-analysis.test.ts reverted to direct path.join(NEWS_DIR, file) calls.

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Copilot AI changed the title Organize news articles into date-based directory structure (news/{year}/{month}/) Keep recursive news scanning; revert existing articles to flat news/ (URL preservation) Mar 1, 2026

Copilot AI left a comment

Pull request overview

Restores URL stability by moving previously relocated news articles back into the flat news/ directory, while keeping the new recursive discovery logic so future articles can live under news/{year}/{month}/ without being missed by builds, sitemap generation, or metadata extraction.

Changes:

  • Keep recursive news-article discovery in Vite build inputs so nested news/YYYY/MM/*.html are included automatically.
  • Keep recursive scanning for sitemap generation and news index generation to support both flat and nested article layouts.
  • Broaden sitemap language URL matching in tests to accept either flat or date-based paths.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Summary per file:

| File | Description |
| --- | --- |
| vite.config.js | Recursively discovers news article HTML files for Rollup/Vite inputs. |
| tests/sitemap-generation.test.ts | Updates language-specific URL matching to allow an optional YYYY/MM/ path segment. |
| scripts/generate-sitemap.ts | Recursively scans news/ when building sitemap article groups and hreflang alternates. |
| scripts/generate-news-indexes/helpers.ts | Recursively collects article files and sets metadata.slug to the relative path (supports nested folders). |
| scripts/extract-news-metadata.ts | Recursively discovers news HTML files and builds the metadata DB using relative paths. |
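
The recursive discovery described in the per-file summary can be sketched roughly as follows. This is a minimal illustration, assuming Node's built-in `fs` and `path` modules; the helper name `collectHtmlFiles` and the `.html` filter are hypothetical and are not taken from the pull request's code.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper (not the PR's implementation): collect every .html file
// under `dir`, descending into subdirectories so that both a flat file such as
// news/a.html and a nested file such as news/2026/03/b.html are found.
function collectHtmlFiles(dir: string): string[] {
  const found: string[] = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      // Recurse into year/month folders (or any other subdirectory).
      found.push(...collectHtmlFiles(fullPath));
    } else if (entry.isFile() && entry.name.endsWith(".html")) {
      found.push(fullPath);
    }
  }
  return found;
}
```

Because the walk starts at the news root and treats files at every depth the same way, existing flat articles and future date-based articles would be picked up by one call, which is the dual-layout behaviour this pull request aims to keep.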

Comment on lines +121 to 127
  // Check for language-specific news articles (flat or date-based directory structure)
  const languages: readonly string[] = ['en', 'sv', 'da', 'no', 'fi', 'de', 'fr', 'es', 'nl', 'ar', 'he', 'ja', 'ko', 'zh'];
  const foundLanguages = new Set<string>();

  languages.forEach(lang => {
-   const pattern = new RegExp(`news/\\d{4}-\\d{2}-\\d{2}-.+-${lang}\\.html`);
+   const pattern = new RegExp(`news/(?:\\d{4}/\\d{2}/)?\\d{4}-\\d{2}-\\d{2}-.+-${lang}\\.html`);
    if (pattern.test(sitemapContent)) {

Copilot AI Mar 1, 2026

The language-specific pattern was broadened for flat + date-based layouts, but the earlier "should include news articles" regex still only matches /news/YYYY-MM-DD-... and will fail once any articles are actually placed under news/YYYY/MM/. Please broaden that earlier assertion in the same way so tests reflect the intended dual-layout support.

Contributor Author

Fixed in bcb38c2. The "should include news articles" URL regex now accepts both flat (news/YYYY-MM-DD-*) and date-based (news/YYYY/MM/YYYY-MM-DD-*) paths, matching the pattern already used in the language-variant test below it.
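
To make the dual-layout matching concrete, here is a small sketch that applies the broadened pattern to three URLs. The regular expression has the same shape as the one in the diff above with the language fixed to `en`; the `example.com` domain and the article names are made up for illustration.

```typescript
// Same shape as the broadened pattern, with the language fixed to "en".
// The optional group (?:\d{4}\/\d{2}\/)? admits a YYYY/MM/ directory prefix.
const articleUrl = /news\/(?:\d{4}\/\d{2}\/)?\d{4}-\d{2}-\d{2}-.+-en\.html/;

// Illustrative URLs only; the domain and article names are assumptions.
const samples: string[] = [
  "https://example.com/news/2026-02-14-evening-analysis-en.html", // flat layout
  "https://example.com/news/2026/03/2026-03-01-evening-analysis-en.html", // date-based layout
  "https://example.com/news/2026/2026-03-01-evening-analysis-en.html", // malformed: year folder only
];

for (const url of samples) {
  console.log(articleUrl.test(url), url);
}
// The first two URLs print true; the third prints false.
```

The third case shows that the optional group is all-or-nothing: a partial prefix such as a year folder without a month folder is not accepted.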


scanDir(NEWS_DIR);

console.log(` Found ${articles.size} news articles`);

Copilot AI Mar 1, 2026

This log line now reports articles.size, which is the number of language-groups (base slugs), not the number of article files. Since the output label still says "news articles", it can be misleading during troubleshooting; consider renaming it to "article groups" or logging both group count and total file count.

Suggested change
- console.log(` Found ${articles.size} news articles`);
+ console.log(` Found ${articles.size} news article groups`);

Contributor Author

Fixed in bcb38c2. The log line now reads Found ${articles.size} news article groups to make clear that the count is language-group (base-slug) count, not individual HTML file count.
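
The difference between the group count and the file count can be illustrated with a short sketch. The grouping rule below (strip the trailing `-<lang>.html` to obtain a base slug) is an assumption based on the YYYY-MM-DD-type-lang.html naming pattern; `groupByBaseSlug` and the sample file names are hypothetical.

```typescript
// Hypothetical grouping (not the PR's code): strip the trailing -<lang>.html
// to get a base slug, then collect each article's language variants under it.
function groupByBaseSlug(fileNames: string[]): Map<string, string[]> {
  const grouped = new Map<string, string[]>();
  for (const fileName of fileNames) {
    if (!fileName.endsWith(".html")) continue;
    const stem = fileName.slice(0, -".html".length);
    const lastDash = stem.lastIndexOf("-");
    if (lastDash < 0) continue; // no language suffix, so not an article file
    const baseSlug = stem.slice(0, lastDash);
    const lang = stem.slice(lastDash + 1);
    const langs = grouped.get(baseSlug) ?? [];
    langs.push(lang);
    grouped.set(baseSlug, langs);
  }
  return grouped;
}

// Illustrative file names only.
const articleFiles = [
  "2026-03-01-evening-analysis-en.html",
  "2026-03-01-evening-analysis-sv.html",
  "2026-03-01-evening-analysis-de.html",
  "2026-03-01-committee-report-en.html",
];
const articleGroups = groupByBaseSlug(articleFiles);
console.log(`Found ${articleGroups.size} news article groups (${articleFiles.length} files)`);
// prints "Found 2 news article groups (4 files)"
```

With roughly 14 languages per article, the two numbers can differ by an order of magnitude, which is why the label matters when reading build logs.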

@pethers
Copy link
Member

pethers commented Mar 1, 2026

@copilot apply changes based on the comments in this thread

…st URL pattern

- generate-sitemap.ts: log message now says 'news article groups' since articles.size
  counts base-slug language-groups, not individual HTML files
- sitemap-generation.test.ts: 'should include news articles' URL regex now accepts
  both flat (news/YYYY-MM-DD-*) and date-based (news/YYYY/MM/YYYY-MM-DD-*) paths,
  consistent with the language-variant test in the same describe block

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Comment on lines 73 to 75
const allFilePaths = collectNewsFiles(newsDir);
const files = allFilePaths.map((fp) => relative(newsDir, fp));

Copilot AI Mar 1, 2026

relative(newsDir, fp) will return platform-specific separators (e.g., \ on Windows). Since file and derived slug are persisted into data/news-articles.json and used to represent URL-like paths, this can produce backslash-containing slugs and inconsistent outputs across OSes. Normalize the relative paths to POSIX-style (/) before storing them (and before computing slug).

Contributor Author

Fixed in 338170c. Added sep to the path named imports and changed line 74 to .split(sep).join('/') so file and derived slug always use forward-slash separators in news-articles.json, regardless of OS.
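
The separator issue and the `.split(sep).join('/')` fix can be reproduced on any operating system with Node's `path.win32` namespace, which applies Windows path rules regardless of the host. The sketch below is illustrative: `toPosixPath` and the `C:\repo\news` paths are hypothetical names, not code from the pull request.

```typescript
import * as path from "node:path";

// Hypothetical helper mirroring the split(sep).join('/') approach described
// above. `separator` defaults to the host platform's path.sep.
function toPosixPath(relativePath: string, separator: string = path.sep): string {
  return relativePath.split(separator).join("/");
}

// path.win32 applies Windows rules on any host, so the backslash case can be
// reproduced without a Windows machine. The paths are made up for illustration.
const windowsRelative = path.win32.relative(
  "C:\\repo\\news",
  "C:\\repo\\news\\2026\\03\\2026-03-01-evening-analysis-en.html",
);

console.log(windowsRelative);
// prints 2026\03\2026-03-01-evening-analysis-en.html
console.log(toPosixPath(windowsRelative, path.win32.sep));
// prints 2026/03/2026-03-01-evening-analysis-en.html
```

On POSIX hosts `path.sep` is already `/`, so the normalization is a no-op there and only changes the output on Windows.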

Comment on lines 151 to 155
function scanDir(dir: string): void {
const entries = fs.readdirSync(dir, { withFileTypes: true });
for (const entry of entries) {
if (entry.isDirectory()) {
scanDir(path.join(dir, entry.name));

Copilot AI Mar 1, 2026

The recursive scan inserts into articles in filesystem traversal order (readdirSync), and Array.from(articles.values()) preserves that order. With recursive traversal this can make sitemap output order non-deterministic across platforms/filesystems, causing noisy diffs. Consider sorting entries (e.g., by entry.name) before iterating and/or sorting the final ArticleGroup[] by baseSlug before emitting URLs.

Contributor Author

Fixed in 338170c. readdirSync entries are now sorted by name before iterating in scanDir(), and Array.from(articles.values()) is sorted by baseSlug before being returned — making sitemap output fully deterministic across platforms and filesystems.
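
The effect of sorting by baseSlug can be sketched with two maps that hold the same groups in different insertion orders, as two filesystems might produce. Only the names `ArticleGroup` and `baseSlug` come from the review thread; the `languages` field, the comparator, and the sample data are assumptions, and the actual comparison used in the pull request is not shown in this conversation.

```typescript
// Hypothetical shape: only the names ArticleGroup and baseSlug appear in the
// review thread; the languages field is an assumption for illustration.
interface ArticleGroup {
  baseSlug: string;
  languages: string[];
}

// Plain code-unit comparison. Unlike localeCompare, the result does not
// depend on the host's locale settings.
function byCodeUnit(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

function sortedGroups(groupMap: Map<string, ArticleGroup>): ArticleGroup[] {
  return Array.from(groupMap.values()).sort((x, y) => byCodeUnit(x.baseSlug, y.baseSlug));
}

// Same content, different insertion order.
const firstScan = new Map<string, ArticleGroup>([
  ["2026-03-01-b", { baseSlug: "2026-03-01-b", languages: ["en"] }],
  ["2026-02-28-a", { baseSlug: "2026-02-28-a", languages: ["en", "sv"] }],
]);
const secondScan = new Map<string, ArticleGroup>(Array.from(firstScan.entries()).reverse());

console.log(sortedGroups(firstScan).map((g) => g.baseSlug));
console.log(sortedGroups(secondScan).map((g) => g.baseSlug));
// Both lines list 2026-02-28-a first, then 2026-03-01-b.
```

The same comparator could be applied to directory entries by name before iterating, so that Map insertion order is also stable; sorting the final array alone is already enough to make the emitted order independent of traversal order.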

@pethers
Copy link
Member

pethers commented Mar 1, 2026

@copilot apply changes based on the comments in this thread

… entries deterministically

- extract-news-metadata.ts: import `sep` from 'path'; normalize relative() output
  with .split(sep).join('/') so file/slug fields in news-articles.json always use
  forward slashes regardless of OS
- generate-sitemap.ts scanDir(): sort readdirSync entries by name before iterating
  so Map insertion order is consistent across filesystems/platforms; sort the
  returned ArticleGroup[] by baseSlug for fully deterministic sitemap output

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

@pethers pethers marked this pull request as ready for review March 1, 2026 13:05
@pethers pethers merged commit 53b8224 into main Mar 1, 2026
19 checks passed
@pethers pethers deleted the copilot/organize-news-articles-directory branch March 1, 2026 13:05

Labels

html-css (HTML/CSS changes); javascript (JavaScript code changes); news (News articles and content generation); performance (Performance optimization); refactor (Code refactoring); rtl (RTL language support: Arabic, Hebrew); size-m (Medium change, 50-250 lines); testing (Test coverage); translation (Translation updates)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Organize news articles into date-based directory structure for scalability

3 participants