Skip to content

Comments

Update CumberlandCouncil.py#1860

Open
makemelegal wants to merge 2 commits intorobbrad:masterfrom
makemelegal:master
Open

Update CumberlandCouncil.py#1860
makemelegal wants to merge 2 commits intorobbrad:masterfrom
makemelegal:master

Conversation

@makemelegal
Copy link

@makemelegal makemelegal commented Feb 19, 2026

Admittedly this was Claude but i have checked it and it appears to work as expected (with my limited capability)

Problem

The CumberlandCouncil scraper returns February collections with the
correct year (2026), but assigns year 2025 to every other month
(March through August). Home Assistant therefore only shows the
current month — future collections appear to be in the past.

Root Cause

Two bugs were found:

Bug 1 — Hardcoded year assumption
The code used this logic to assign a year to each month:

year_2026 = "2026" in text_content
if year_2026:
    current_year = "2026" if line in ["January","February"] else "2025"

This was written for a specific point in time (when the calendar
straddled Dec 2025 and Jan/Feb 2026). It now assigns 2025 to every
month other than January and February, which is wrong for the live
calendar that currently shows February through August 2026.

Bug 2 — Incomplete subtype skip
The code only skipped "Refuse" and "Recycling" subtypes:

if i + 2 < len(lines) and lines[i + 2] in ["Refuse", "Recycling"]:

The page also has "Paper" and "Green" subtypes which were not
being skipped, causing occasional parsing errors.

Fix

  1. Extract the base year from the page heading:
    "Collection calendar: February to August 2026" -> year = 2026
    This works correctly for any year, not just 2026.

  2. Track previous month number; if months go backwards
    (e.g. December -> January) increment the year.
    This handles year-boundary calendars robustly.

  3. Replace the hardcoded subtype list with a general check:
    skip any line that is not a digit and not a month name.
    This correctly handles all four subtypes on the page:
    Refuse, Recycling, Paper, Green.

Tested Against

Live URL: https://www.cumberland.gov.uk/bins-recycling-and-street-cleaning/
waste-collections/bin-collection-schedule/view/100110734613
UPRN: 100110734613

Result: All 20+ collections from February to August 2026 now return
with the correct year 2026. All four subtype variants confirmed working.

Summary by CodeRabbit

  • Bug Fixes
    • Improved parsing for multi-line calendar headings and robust handling of multi-year collection calendars.
    • Corrected month/year detection and date construction so collection dates sort and display accurately across year boundaries.
    • Enhanced bin-type/day extraction to skip non-date sublines reliably.
    • Added explicit error handling when a collection year cannot be determined.

fix: CumberlandCouncil - correct year assignment for all months - previous had accidentally hardcoded the year as 2025 for all months except Jan/Feb
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 19, 2026

📝 Walkthrough

Walkthrough

Refactors CumberlandCouncil.py's calendar parsing: introduces a module-level month-name list, extracts calendar year from a "Collection calendar" heading, computes month numbers and year rollover logic, generalizes day/bin parsing, constructs/sorts collection dates using the configured date_format, and adds error handling for undetermined years.

Changes

Cohort / File(s) Summary
Cumberland council parser
uk_bin_collection/uk_bin_collection/councils/CumberlandCouncil.py
Added _MONTH_NAMES constant; refactored URL construction/readability; parse "Collection calendar" to extract calendar_year, start_month_num, end_month_num; compute is_same_year and determine current_year with year-rollover handling; detect month names to set current_month; broadened day/bin parsing to skip subtype lines by checking numeric/month tokens; construct dates using date_format and use that for sorting; raise ValueError when collection year cannot be determined; minor loop/comment formatting changes.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • fix: Cumberland Council #1764 — implements the same CumberlandCouncil.py parsing overhaul (month-name detection, year/month computation, subtype skipping, and sorting).

Poem

🐰
I hopped through headers, month-names in tow,
Found years in "Collection calendar" aglow.
When months roll over I nudge the year right,
Bins lined in order from morning to night.
A tiny rabbit cheers the parser's bright light.

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'Update CumberlandCouncil.py' is vague and generic, using non-descriptive language that doesn't convey the specific fix or main change being made. Consider using a more descriptive title like 'Fix CumberlandCouncil year assignment and subtype parsing' to clearly indicate the specific problem being addressed.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
uk_bin_collection/uk_bin_collection/councils/CumberlandCouncil.py (2)

56-59: Month name list is duplicated.

The same 12-element tuple appears twice: once in the while loop's month-detection branch and again in the subtype-skip condition. Extract it to a module-level constant to keep them in sync.

♻️ Proposed refactor
+_MONTH_NAMES = [
+    "January", "February", "March", "April", "May", "June",
+    "July", "August", "September", "October", "November", "December",
+]

 ...
-            if line in [
-                "January", "February", "March", "April", "May", "June",
-                "July", "August", "September", "October", "November", "December",
-            ]:
+            if line in _MONTH_NAMES:

 ...
-                        and lines[i + 2] not in [
-                            "January", "February", "March", "April", "May", "June",
-                            "July", "August", "September", "October", "November", "December",
-                        ]
+                        and lines[i + 2] not in _MONTH_NAMES

Also applies to: 85-88

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/CumberlandCouncil.py` around
lines 56 - 59, Extract the duplicated 12-month sequence into a module-level
constant (e.g., MONTH_NAMES) and replace both inline lists used in the
month-detection branch (where `if line in [...]` is used) and the subtype-skip
condition with that constant; update any references in the CumberlandCouncil
class or its methods to use MONTH_NAMES so the two checks stay in sync.

44-50: Year extraction falls back silently and assumes start/end months share the same year.

Two concerns with this block:

  1. Silent fallback: If no "Collection calendar" line is found, or it contains no 4-digit year, current_year silently stays as datetime.now().year. This masks future format changes. Per the project guideline, an explicit exception should be raised on unexpected formats.

  2. Cross-year calendar assumption: The code extracts the last 4-digit word from the heading and assigns it to all months as the base year. A heading like "Collection calendar: November to March 2026" yields current_year = 2026, but November and December should be 2025. The subsequent rollover logic (month_num < previous_month_num) would then further corrupt the year (January rolls from 2026 → 2027 instead of landing on 2026). This only works correctly when the entire calendar falls within one calendar year (the current Feb–Aug case).

💡 Suggested fix: raise on missing year + document the single-year assumption
-        # Determine the base year from the page heading, e.g.
-        # "Collection calendar: February to August 2026"
-        # This is more reliable than checking whether "2026" appears anywhere
-        # in the page, which broke the year assignment for all non-Jan/Feb months.
-        for line in lines:
-            if "Collection calendar" in line:
-                for word in reversed(line.split()):
-                    if word.isdigit() and len(word) == 4:
-                        current_year = int(word)
-                        break
-                break
+        # Determine the base year from the page heading, e.g.
+        # "Collection calendar: February to August 2026"
+        # NOTE: Assumes the calendar does not span a year boundary
+        # (i.e. all months share the same calendar year). If a cross-year
+        # schedule is ever published, this logic must be revisited.
+        calendar_year = None
+        for line in lines:
+            if "Collection calendar" in line:
+                for word in reversed(line.split()):
+                    if word.isdigit() and len(word) == 4:
+                        calendar_year = int(word)
+                        break
+                break
+        if calendar_year is None:
+            raise ValueError(
+                "Could not determine collection year from 'Collection calendar' heading. "
+                "Page format may have changed."
+            )
+        current_year = calendar_year

Based on learnings: "prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors. This ensures format changes are detected early."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/CumberlandCouncil.py` around
lines 44 - 50, The code currently falls back silently to datetime.now().year and
assigns that single year to all months; update the "Collection calendar" parsing
loop (where current_year is set) to (1) raise a clear ValueError if no 4-digit
year is found in the "Collection calendar" line instead of silently using
datetime.now().year, and (2) handle cross-year headings by detecting the
start-month in the same header (parse the first month name in the heading) and
compute each row's year relative to that start month: keep current_year as the
header year but when assigning a year for a given month_num compare it to the
parsed start_month_num and if month_num < start_month_num subtract 1 from
current_year (use the existing previous_month_num/month_num rollover logic to
maintain consistency).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@uk_bin_collection/uk_bin_collection/councils/CumberlandCouncil.py`:
- Around line 56-59: Extract the duplicated 12-month sequence into a
module-level constant (e.g., MONTH_NAMES) and replace both inline lists used in
the month-detection branch (where `if line in [...]` is used) and the
subtype-skip condition with that constant; update any references in the
CumberlandCouncil class or its methods to use MONTH_NAMES so the two checks stay
in sync.
- Around line 44-50: The code currently falls back silently to
datetime.now().year and assigns that single year to all months; update the
"Collection calendar" parsing loop (where current_year is set) to (1) raise a
clear ValueError if no 4-digit year is found in the "Collection calendar" line
instead of silently using datetime.now().year, and (2) handle cross-year
headings by detecting the start-month in the same header (parse the first month
name in the heading) and compute each row's year relative to that start month:
keep current_year as the header year but when assigning a year for a given
month_num compare it to the parsed start_month_num and if month_num <
start_month_num subtract 1 from current_year (use the existing
previous_month_num/month_num rollover logic to maintain consistency).

further feedback from AI agent. this fixes the issues flagged and builds on previous commit
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (2)
uk_bin_collection/uk_bin_collection/councils/CumberlandCouncil.py (2)

99-100: Inconsistent (and locale-sensitive) month-to-number conversion.

Lines 72 and 74 use _MONTH_NAMES.index(lines[j]) + 1, but line 100 re-derives the same value via datetime.strptime(line, "%B").month. strptime is locale-dependent and adds unnecessary overhead when the index into the already-checked _MONTH_NAMES list gives an identical, locale-agnostic result.

♻️ Proposed fix
-                month_num = datetime.strptime(line, "%B").month
+                month_num = _MONTH_NAMES.index(line) + 1
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/CumberlandCouncil.py` around
lines 99 - 100, The code uses datetime.strptime(line, "%B").month to get
month_num despite having already validated the token against the _MONTH_NAMES
list; replace that locale-sensitive call with a direct, locale-agnostic lookup
using _MONTH_NAMES.index(line) + 1 so month_num is computed the same way as
elsewhere (consistent with uses like _MONTH_NAMES.index(lines[j]) + 1) and
remove the unnecessary strptime import/usage in the CumberlandCouncil parsing
logic.

67-68: Redundant .strip() on an already-stripped list element.

lines is built with .strip() inside the list comprehension at line 43, so line.strip() at line 68 is a no-op.

♻️ Proposed fix
-        if line.strip().startswith("Collection calendar"):
+        if line.startswith("Collection calendar"):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/CumberlandCouncil.py` around
lines 67 - 68, The loop in the CumberlandCouncil parser redundantly calls
line.strip() because lines are already stripped when constructed; update the
conditional in the for loop (the enumerate over lines in the method that
processes the calendar lines) to use line.startswith("Collection calendar")
instead of line.strip().startswith(...). Locate the for i, line in
enumerate(lines): block in CumberlandCouncil.py and change the check to rely on
the already-stripped line variable.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@uk_bin_collection/uk_bin_collection/councils/CumberlandCouncil.py`:
- Around line 85-89: The current is_same_year logic silently treats unparsed
months as same-year because it returns True when start_month_num or
end_month_num is None; update the code around is_same_year (the variables
start_month_num, end_month_num, and is_same_year) to fail explicitly instead: if
either month number is None, raise a clear parsing/format exception (e.g.,
ValueError with context including the raw heading) or skip the cross-year branch
and raise when cross-year handling would be needed, so that unparseable headings
are surfaced rather than defaulting to same-year.
- Around line 134-144: The try/except block around building date_str and parsing
it (using date_str, collection_date, current_month, current_year, date_format
and appending dict_data to bindata["bins"]) silently swallows ValueError; change
it to surface the failure by either logging the failing date string and bin_type
then re-raising a ValueError or simply re-raising with contextual information
(e.g. include date_str and bin_type), so format regressions in
CumberlandCouncil.py are not ignored; do not keep a bare `except ValueError:
pass`.
- Line 100: Replace the star import in CumberlandCouncil.py with explicit
imports for the used names: import datetime (or from datetime import datetime as
needed) and import date_format from its defining module; then update usages like
month_num = datetime.strptime(line, "%B").month and any other references at the
locations that previously relied on the star import (the occurrences of datetime
and date_format) so they resolve to the explicitly imported symbols and avoid
the Ruff F405 ambiguity.

---

Nitpick comments:
In `@uk_bin_collection/uk_bin_collection/councils/CumberlandCouncil.py`:
- Around line 99-100: The code uses datetime.strptime(line, "%B").month to get
month_num despite having already validated the token against the _MONTH_NAMES
list; replace that locale-sensitive call with a direct, locale-agnostic lookup
using _MONTH_NAMES.index(line) + 1 so month_num is computed the same way as
elsewhere (consistent with uses like _MONTH_NAMES.index(lines[j]) + 1) and
remove the unnecessary strptime import/usage in the CumberlandCouncil parsing
logic.
- Around line 67-68: The loop in the CumberlandCouncil parser redundantly calls
line.strip() because lines are already stripped when constructed; update the
conditional in the for loop (the enumerate over lines in the method that
processes the calendar lines) to use line.startswith("Collection calendar")
instead of line.strip().startswith(...). Locate the for i, line in
enumerate(lines): block in CumberlandCouncil.py and change the check to rely on
the already-stripped line variable.

Comment on lines +85 to +89
is_same_year = (
start_month_num is None
or end_month_num is None
or end_month_num >= start_month_num
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Silent same-year fallback when month names cannot be parsed.

is_same_year evaluates to True whenever start_month_num or end_month_num is None. This means a page whose heading contains the year but not parseable month names will silently produce wrong years for a cross-year calendar instead of surfacing the problem. Per the project's explicit-failure preference, consider raising here (or at least guarding the cross-year path) so a structural change in the heading is caught immediately.

Based on learnings from the UKBinCollectionData project: prefer explicit failures (raise exceptions on unexpected formats) over silent defaults, to ensure format changes are detected early.

🛡️ Proposed fix
+        if start_month_num is None or end_month_num is None:
+            raise ValueError(
+                "Could not determine start/end months from 'Collection calendar' heading. "
+                "Page format may have changed."
+            )
+
         is_same_year = (
-            start_month_num is None
-            or end_month_num is None
-            or end_month_num >= start_month_num
+            end_month_num >= start_month_num
         )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
is_same_year = (
start_month_num is None
or end_month_num is None
or end_month_num >= start_month_num
)
if start_month_num is None or end_month_num is None:
raise ValueError(
"Could not determine start/end months from 'Collection calendar' heading. "
"Page format may have changed."
)
is_same_year = (
end_month_num >= start_month_num
)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/CumberlandCouncil.py` around
lines 85 - 89, The current is_same_year logic silently treats unparsed months as
same-year because it returns True when start_month_num or end_month_num is None;
update the code around is_same_year (the variables start_month_num,
end_month_num, and is_same_year) to fail explicitly instead: if either month
number is None, raise a clear parsing/format exception (e.g., ValueError with
context including the raw heading) or skip the cross-year branch and raise when
cross-year handling would be needed, so that unparseable headings are surfaced
rather than defaulting to same-year.

if year_2026:
current_year = "2026" if line in ["January", "February"] else "2025"
if line in _MONTH_NAMES:
month_num = datetime.strptime(line, "%B").month
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, find the file
fd -t f "CumberlandCouncil.py"

Repository: robbrad/UKBinCollectionData

Length of output: 136


🏁 Script executed:

# Check the import statements and lines mentioned
cat -n uk_bin_collection/uk_bin_collection/councils/CumberlandCouncil.py | head -20

Repository: robbrad/UKBinCollectionData

Length of output: 853


🏁 Script executed:

# Check around the lines mentioned (100, 136, 140, 153)
sed -n '95,160p' uk_bin_collection/uk_bin_collection/councils/CumberlandCouncil.py | cat -n

Repository: robbrad/UKBinCollectionData

Length of output: 2671


🏁 Script executed:

# Check what's exported from common.py
cat -n uk_bin_collection/uk_bin_collection/common.py | head -100

Repository: robbrad/UKBinCollectionData

Length of output: 3297


🏁 Script executed:

# Check if datetime is imported in common.py
rg -n "^import datetime|^from datetime" uk_bin_collection/uk_bin_collection/common.py

Repository: robbrad/UKBinCollectionData

Length of output: 113


Replace star import with explicit imports of datetime and date_format.

Lines 100, 136, 140, and 153 use datetime and date_format from the star import. Explicit imports remove the Ruff F405 ambiguity and clarify dependencies.

Also applies to: 136-136, 140-140, 153-153

🧰 Tools
🪛 Ruff (0.15.1)

[error] 100-100: datetime may be undefined, or defined from star imports

(F405)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/CumberlandCouncil.py` at line
100, Replace the star import in CumberlandCouncil.py with explicit imports for
the used names: import datetime (or from datetime import datetime as needed) and
import date_format from its defining module; then update usages like month_num =
datetime.strptime(line, "%B").month and any other references at the locations
that previously relied on the star import (the occurrences of datetime and
date_format) so they resolve to the explicitly imported symbols and avoid the
Ruff F405 ambiguity.

Comment on lines 134 to 144
try:
date_str = f"{day} {current_month} {current_year}"
collection_date = datetime.strptime(date_str, "%d %B %Y")

dict_data = {
"type": bin_type,
"collectionDate": collection_date.strftime(date_format),
}
bindata["bins"].append(dict_data)
except ValueError:
pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Silent except ValueError: pass masks format regressions.

The bare pass means any date-construction failure (including a silently wrong year assignment that produces an out-of-range date) is swallowed with no signal. Per the project's coding preference, this should propagate or at minimum log the failure so format changes are detected immediately rather than producing an empty result set.

Based on learnings from the UKBinCollectionData project: in uk_bin_collection/**/*.py, prefer explicit failures (raise exceptions on unexpected formats) over silent defaults or swallowed errors, and use clear exception types so format changes are detected early.

🛡️ Proposed fix
                    try:
                        date_str = f"{day} {current_month} {current_year}"
                        collection_date = datetime.strptime(date_str, "%d %B %Y")
-                    except ValueError:
-                        pass
+                    except ValueError as e:
+                        raise ValueError(
+                            f"Failed to parse collection date '{date_str}': {e}"
+                        ) from e
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
try:
date_str = f"{day} {current_month} {current_year}"
collection_date = datetime.strptime(date_str, "%d %B %Y")
dict_data = {
"type": bin_type,
"collectionDate": collection_date.strftime(date_format),
}
bindata["bins"].append(dict_data)
except ValueError:
pass
try:
date_str = f"{day} {current_month} {current_year}"
collection_date = datetime.strptime(date_str, "%d %B %Y")
dict_data = {
"type": bin_type,
"collectionDate": collection_date.strftime(date_format),
}
bindata["bins"].append(dict_data)
except ValueError as e:
raise ValueError(
f"Failed to parse collection date '{date_str}': {e}"
) from e
🧰 Tools
🪛 Ruff (0.15.1)

[error] 136-136: datetime may be undefined, or defined from star imports

(F405)


[error] 140-140: date_format may be undefined, or defined from star imports

(F405)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@uk_bin_collection/uk_bin_collection/councils/CumberlandCouncil.py` around
lines 134 - 144, The try/except block around building date_str and parsing it
(using date_str, collection_date, current_month, current_year, date_format and
appending dict_data to bindata["bins"]) silently swallows ValueError; change it
to surface the failure by either logging the failing date string and bin_type
then re-raising a ValueError or simply re-raising with contextual information
(e.g. include date_str and bin_type), so format regressions in
CumberlandCouncil.py are not ignored; do not keep a bare `except ValueError:
pass`.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant