Parse Vietnamese dates with zero-padded months (#1297) #1338

Open
gaoflow wants to merge 1 commit into scrapinghub:master from gaoflow:fix-1297-vi-zero-padded-months

gaoflow commented Jun 16, 2026

Fixes #1297.

Vietnamese dates with a zero-padded month such as 21 tháng 09 năm 2025 were not parsed as absolute dates. The vi locale data only listed the un-padded month forms (tháng 9), so tháng 09 did not match a month name and fell through to the generic tháng -> month token, yielding a wrong relative date:

import dateparser
dateparser.parse("21 tháng 09 năm 2025", languages=["vi"])
# -> datetime.datetime(2024, 9, 16, 8, 16, 26)  # garbage, drifts with "now"

dateparser.parse("21 tháng 9 năm 2025", languages=["vi"])  # un-padded works
# -> datetime.datetime(2025, 9, 21, 0, 0)
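
The fall-through described above can be illustrated with a toy longest-match translator. This is only a sketch, not dateparser's actual implementation: the tables, the `SKIP` set and the `translate` function are hypothetical stand-ins, and treating "năm" as a dropped word is an assumption made to mirror the translations quoted in this PR.

```python
# Toy locale tables; hypothetical and far smaller than dateparser's real vi data.
UNPADDED = {"tháng 9": "september", "tháng": "month"}
PADDED = {**UNPADDED, "tháng 09": "september"}
SKIP = {"năm"}  # assumption: the year marker is dropped during translation


def translate(text, table):
    """Replace known phrases with English tokens, trying the longest phrase first."""
    words = [w for w in text.lower().split() if w not in SKIP]
    max_len = max(len(key.split()) for key in table)
    out, i = [], 0
    while i < len(words):
        for n in range(max_len, 0, -1):
            chunk = " ".join(words[i:i + n])
            if chunk in table:
                out.append(table[chunk])
                i += n
                break
        else:
            # No phrase starts here, so the word passes through unchanged.
            out.append(words[i])
            i += 1
    return " ".join(out)


print(translate("21 tháng 09 năm 2025", UNPADDED))  # 21 month 09 2025
print(translate("21 tháng 09 năm 2025", PADDED))    # 21 september 2025
```

With only the un-padded entry, "tháng 09" matches nothing longer than the bare "tháng" token, which is the behaviour the description attributes to the original data.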

Fix

Add zero-padded month aliases (tháng 0N and thg 0N) for months 1-9 in the supplementary language data and regenerate vi.py. Months 10-12 already carry their two-digit forms, so they are untouched.

dateparser.parse("21 tháng 09 năm 2025", languages=["vi"])
# -> datetime.datetime(2025, 9, 21, 0, 0)
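
For illustration, the set of aliases the fix describes could be enumerated as follows. This is a minimal sketch: `zero_padded_aliases` is a hypothetical helper, and the actual change edits the supplementary language data and regenerates vi.py rather than running code like this.

```python
def zero_padded_aliases(prefixes=("tháng", "thg")):
    """Map month numbers 1-9 to their zero-padded aliases, e.g. 9 -> 'tháng 09'."""
    return {
        month: [f"{prefix} {month:02d}" for prefix in prefixes]
        for month in range(1, 10)  # months 10-12 are already two digits
    }


for month, aliases in zero_padded_aliases().items():
    print(month, aliases)
```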

Tests

Added a test_translation case (tests/test_languages.py): param("vi", "21 tháng 09 năm 2025", "21 september 2025"). It fails (RED) on the un-padded-only data, where the input translates to 21 month 09 2025, and passes (GREEN) with the fix. The full tests/test_languages.py suite passes (1366 passed).

Disclosure: I prepared this fix with AI assistance under my direction; I reviewed and verified the change and the test myself.

Vietnamese month names like "tháng 09" did not parse: the locale data
only listed the un-padded forms ("tháng 9"), so a zero-padded month fell
through to the generic "tháng" -> month token and produced a wrong
relative date instead of an absolute one.

Add zero-padded aliases ("tháng 0N" / "thg 0N") for months 1-9 in the
supplementary language data and regenerate vi.py. Months 10-12 already
carry two-digit forms.

codecov Bot commented Jun 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.11%. Comparing base (08c78d3) to head (291c840).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1338   +/-   ##
=======================================
  Coverage   97.11%   97.11%           
=======================================
  Files         235      235           
  Lines        2909     2909           
=======================================
  Hits         2825     2825           
  Misses         84       84           

Development

Successfully merging this pull request may close these issues.

Vietnamese dates with leading 0's not translating correctly