Remove Selenium from BarkingDagenham by PaulBrack · Pull Request #1813 · robbrad/UKBinCollectionData

PaulBrack · 2026-01-15T21:14:57Z

Use a publicly available JSON endpoint

Summary by CodeRabbit

Refactor
- Updated the Barking & Dagenham bin collection service to use a direct API integration for retrieving schedules, replacing web scraping methodology.
- Streamlined data processing to enhance performance and improve service reliability.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

coderabbitai · 2026-01-15T21:15:10Z

📝 Walkthrough

Walkthrough

Replace Selenium and BeautifulSoup web scraping with direct HTTP API request to a REST endpoint. Update parse_data signature to make the page parameter optional with default None and introduce UPRN-based data retrieval. Remove postcode-based navigation and complex DOM parsing; replace with JSON response processing.

Changes

Cohort / File(s)	Summary
Web Scraping to REST API Migration `uk_bin_collection/uk_bin_collection/councils/BarkingDagenham.py`	Replace Selenium/BeautifulSoup scraping with HTTP GET request to REST API endpoint; change parse_data signature (page parameter now optional); remove postcode navigation, driver setup, and DOM parsing; add JSON parsing logic to extract bin type and collection dates from API response; normalize date strings and format as dd/mm/YYYY

Sequence Diagram

sequenceDiagram
    participant Client
    participant Parser
    participant REST API
    
    Client->>Parser: Provide UPRN
    Parser->>Parser: Validate UPRN (check_uprn)
    Parser->>REST API: GET /rest/bin/{uprn}<br/>(Accept: application/json)
    activate REST API
    REST API-->>Parser: JSON response<br/>(bin data + dates)
    deactivate REST API
    Parser->>Parser: Parse JSON payload
    Parser->>Parser: Extract bin_type and dates<br/>from response
    Parser->>Parser: Normalize and parse<br/>date strings
    Parser->>Parser: Format dates as dd/mm/YYYY
    Parser->>Client: Return bin collection data

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

fix: Cumberland Council #1764: Converts council scrapers from multi-step form or Selenium flows to direct UPRN-based retrieval with simpler parsing
feat: add direct mobile API support to North Hertfordshire District Council scraper #1715: Replaces Selenium/BeautifulSoup scrapers with direct UPRN-driven HTTP API calls and corresponding JSON parsing
Changed to use dedicated API to retrieve collection data #1742: Converts parser from Selenium/BeautifulSoup scraping to UPRN-based direct JSON API call with check_uprn validation

Poem

🐰 No more Selenium's dance, no DOM to parse with haste,
REST API calls are cleaner—embrace the JSON taste!
UPRN leads the way, dates flutter free,
Faster, simpler, sweeter—hop hop, collection spree! 🚛✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'Remove Selenium from BarkingDagenham' directly and clearly describes the main change: replacing Selenium-based scraping with an HTTP API approach in the BarkingDagenham council module.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@uk_bin_collection/uk_bin_collection/councils/BarkingDagenham.py`:
- Around line 19-28: The User-Agent header passed to requests.get in
BarkingDagenham.py is truncated and ends with a trailing space; update the
"User-Agent" value in the requests.get call (the headers dict) to a complete,
standard browser User-Agent string (e.g., include browser/version and platform)
so servers that validate User-Agent accept the request; locate the requests.get
invocation in BarkingDagenham.py and replace the current truncated "User-Agent"
entry with a full string.

🧹 Nitpick comments (3)

uk_bin_collection/uk_bin_collection/councils/BarkingDagenham.py (3)
4-4: Consider explicit imports for better clarity.

The star import makes it harder to verify that check_uprn and remove_ordinal_indicator_from_date_string are defined. Static analysis tools flag this as potentially undefined (F405).

While this pattern appears common in this codebase, explicit imports would improve maintainability:
♻️ Suggested explicit imports
-from uk_bin_collection.uk_bin_collection.common import *
+from uk_bin_collection.uk_bin_collection.common import (
+    check_uprn,
+    remove_ordinal_indicator_from_date_string,
+)
10-10: Fix implicit Optional type hint.

Per PEP 484, page: str = None should explicitly indicate the None type. The page parameter is kept for interface compatibility with the parent class.
♻️ Suggested fix
-    def parse_data(self, page: str = None, **kwargs) -> dict:
+    def parse_data(self, page: str | None = None, **kwargs) -> dict:
46-58: Silent exception swallowing may hide issues.

The bare except Exception: continue silently discards parsing errors. While skipping unparseable dates is reasonable, logging would help diagnose issues with unexpected date formats.
♻️ Suggested improvement with logging
+import logging
+
+logger = logging.getLogger(__name__)
+
                 for date_text in date_strings:
                     try:
                         cleaned = remove_ordinal_indicator_from_date_string(date_text)
                         parsed = parse(cleaned, fuzzy=True)

                         data["bins"].append(
                             {
                                 "type": bin_type,
                                 "collectionDate": parsed.strftime("%d/%m/%Y"),
                             }
                         )
-                    except Exception:
+                    except (ValueError, TypeError) as e:
+                        logger.debug(f"Failed to parse date '{date_text}': {e}")
                         continue

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 274152f and 9009b97.

📒 Files selected for processing (1)

uk_bin_collection/uk_bin_collection/councils/BarkingDagenham.py

🧰 Additional context used

🧬 Code graph analysis (1)

uk_bin_collection/uk_bin_collection/councils/BarkingDagenham.py (3)

uk_bin_collection/uk_bin_collection/councils/NuneatonBedworthBoroughCouncil.py (1)

get_bin_data (63-945)

uk_bin_collection/uk_bin_collection/get_bin_data.py (1)

AbstractGetBinDataClass (43-146)

uk_bin_collection/uk_bin_collection/common.py (1)

check_uprn (67-78)

🪛 Ruff (0.14.11)

uk_bin_collection/uk_bin_collection/councils/BarkingDagenham.py

4-4: from uk_bin_collection.uk_bin_collection.common import * used; unable to detect undefined names

(F403)

10-10: Unused method argument: page

(ARG002)

10-10: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

15-15: check_uprn may be undefined, or defined from star imports

(F405)

48-48: remove_ordinal_indicator_from_date_string may be undefined, or defined from star imports

(F405)

57-58: try-except-continue detected, consider logging the exception

(S112)

57-57: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (2)

uk_bin_collection/uk_bin_collection/councils/BarkingDagenham.py (2)

14-15: LGTM!

Good use of check_uprn to validate the UPRN before proceeding with the API call.

29-32: LGTM!

Good use of raise_for_status() for HTTP error handling and safe access to the results key with a default empty list.

_{✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.}

coderabbitai · 2026-01-15T21:19:16Z

uk_bin_collection/uk_bin_collection/councils/BarkingDagenham.py

+            r = requests.get(
+                url,
+                headers={
+                    "User-Agent": (
+                        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
+                    ),
+                    "Accept": "application/json",
+                },
+                timeout=30,
            )


⚠️ Potential issue | 🟡 Minor

User-Agent string appears truncated.

The User-Agent header ends abruptly with a trailing space and lacks the typical browser identifier portion. This could potentially cause issues with servers that validate User-Agent strings.

♻️ Suggested complete User-Agent

r = requests.get( url, headers={ "User-Agent": ( - "Mozilla/5.0 (Windows NT 10.0; Win64; x64) " + "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" ), "Accept": "application/json", }, timeout=30, )

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

r = requests.get(

url,

headers={

"User-Agent": (

"Mozilla/5.0 (Windows NT 10.0; Win64; x64) "

),

"Accept": "application/json",

},

timeout=30,

)

r = requests.get(

url,

headers={

"User-Agent": (

"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"

),

"Accept": "application/json",

},

timeout=30,

)

🤖 Prompt for AI Agents

In `@uk_bin_collection/uk_bin_collection/councils/BarkingDagenham.py` around lines 19 - 28, The User-Agent header passed to requests.get in BarkingDagenham.py is truncated and ends with a trailing space; update the "User-Agent" value in the requests.get call (the headers dict) to a complete, standard browser User-Agent string (e.g., include browser/version and platform) so servers that validate User-Agent accept the request; locate the requests.get invocation in BarkingDagenham.py and replace the current truncated "User-Agent" entry with a full string.

codecov · 2026-01-17T07:32:26Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.67%. Comparing base (274152f) to head (9009b97).

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #1813   +/-   ##
=======================================
  Coverage   86.67%   86.67%           
=======================================
  Files           9        9           
  Lines        1141     1141           
=======================================
  Hits          989      989           
  Misses        152      152

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Remove Selenium from BarkingDagenham

9009b97

coderabbitai bot reviewed Jan 15, 2026

View reviewed changes

PaulBrack marked this pull request as draft January 16, 2026 14:37

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

Remove Selenium from BarkingDagenham#1813

Remove Selenium from BarkingDagenham#1813
PaulBrack wants to merge 1 commit intorobbrad:masterfrom
PaulBrack:remove_selenium_barking

PaulBrack commented Jan 15, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Jan 15, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Jan 15, 2026

Uh oh!

codecov bot commented Jan 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Comments

Conversation

PaulBrack commented Jan 15, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Jan 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 15, 2026

Choose a reason for hiding this comment

Uh oh!

codecov bot commented Jan 17, 2026

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

PaulBrack commented Jan 15, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jan 15, 2026 •

edited

Loading