Remove Selenium from BarkingDagenham#1813
Conversation
📝 WalkthroughWalkthroughReplace Selenium and BeautifulSoup web scraping with direct HTTP API request to a REST endpoint. Update parse_data signature to make the page parameter optional with default None and introduce UPRN-based data retrieval. Remove postcode-based navigation and complex DOM parsing; replace with JSON response processing. Changes
Sequence DiagramsequenceDiagram
participant Client
participant Parser
participant REST API
Client->>Parser: Provide UPRN
Parser->>Parser: Validate UPRN (check_uprn)
Parser->>REST API: GET /rest/bin/{uprn}<br/>(Accept: application/json)
activate REST API
REST API-->>Parser: JSON response<br/>(bin data + dates)
deactivate REST API
Parser->>Parser: Parse JSON payload
Parser->>Parser: Extract bin_type and dates<br/>from response
Parser->>Parser: Normalize and parse<br/>date strings
Parser->>Parser: Format dates as dd/mm/YYYY
Parser->>Client: Return bin collection data
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@uk_bin_collection/uk_bin_collection/councils/BarkingDagenham.py`:
- Around line 19-28: The User-Agent header passed to requests.get in
BarkingDagenham.py is truncated and ends with a trailing space; update the
"User-Agent" value in the requests.get call (the headers dict) to a complete,
standard browser User-Agent string (e.g., include browser/version and platform)
so servers that validate User-Agent accept the request; locate the requests.get
invocation in BarkingDagenham.py and replace the current truncated "User-Agent"
entry with a full string.
🧹 Nitpick comments (3)
uk_bin_collection/uk_bin_collection/councils/BarkingDagenham.py (3)
4-4: Consider explicit imports for better clarity.The star import makes it harder to verify that
check_uprnandremove_ordinal_indicator_from_date_stringare defined. Static analysis tools flag this as potentially undefined (F405).While this pattern appears common in this codebase, explicit imports would improve maintainability:
♻️ Suggested explicit imports
-from uk_bin_collection.uk_bin_collection.common import * +from uk_bin_collection.uk_bin_collection.common import ( + check_uprn, + remove_ordinal_indicator_from_date_string, +)
10-10: Fix implicitOptionaltype hint.Per PEP 484,
page: str = Noneshould explicitly indicate theNonetype. Thepageparameter is kept for interface compatibility with the parent class.♻️ Suggested fix
- def parse_data(self, page: str = None, **kwargs) -> dict: + def parse_data(self, page: str | None = None, **kwargs) -> dict:
46-58: Silent exception swallowing may hide issues.The bare
except Exception: continuesilently discards parsing errors. While skipping unparseable dates is reasonable, logging would help diagnose issues with unexpected date formats.♻️ Suggested improvement with logging
+import logging + +logger = logging.getLogger(__name__) + for date_text in date_strings: try: cleaned = remove_ordinal_indicator_from_date_string(date_text) parsed = parse(cleaned, fuzzy=True) data["bins"].append( { "type": bin_type, "collectionDate": parsed.strftime("%d/%m/%Y"), } ) - except Exception: + except (ValueError, TypeError) as e: + logger.debug(f"Failed to parse date '{date_text}': {e}") continue
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
uk_bin_collection/uk_bin_collection/councils/BarkingDagenham.py
🧰 Additional context used
🧬 Code graph analysis (1)
uk_bin_collection/uk_bin_collection/councils/BarkingDagenham.py (3)
uk_bin_collection/uk_bin_collection/councils/NuneatonBedworthBoroughCouncil.py (1)
get_bin_data(63-945)uk_bin_collection/uk_bin_collection/get_bin_data.py (1)
AbstractGetBinDataClass(43-146)uk_bin_collection/uk_bin_collection/common.py (1)
check_uprn(67-78)
🪛 Ruff (0.14.11)
uk_bin_collection/uk_bin_collection/councils/BarkingDagenham.py
4-4: from uk_bin_collection.uk_bin_collection.common import * used; unable to detect undefined names
(F403)
10-10: Unused method argument: page
(ARG002)
10-10: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
15-15: check_uprn may be undefined, or defined from star imports
(F405)
48-48: remove_ordinal_indicator_from_date_string may be undefined, or defined from star imports
(F405)
57-58: try-except-continue detected, consider logging the exception
(S112)
57-57: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (2)
uk_bin_collection/uk_bin_collection/councils/BarkingDagenham.py (2)
14-15: LGTM!Good use of
check_uprnto validate the UPRN before proceeding with the API call.
29-32: LGTM!Good use of
raise_for_status()for HTTP error handling and safe access to theresultskey with a default empty list.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
| r = requests.get( | ||
| url, | ||
| headers={ | ||
| "User-Agent": ( | ||
| "Mozilla/5.0 (Windows NT 10.0; Win64; x64) " | ||
| ), | ||
| "Accept": "application/json", | ||
| }, | ||
| timeout=30, | ||
| ) |
There was a problem hiding this comment.
User-Agent string appears truncated.
The User-Agent header ends abruptly with a trailing space and lacks the typical browser identifier portion. This could potentially cause issues with servers that validate User-Agent strings.
♻️ Suggested complete User-Agent
r = requests.get(
url,
headers={
"User-Agent": (
- "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
+ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
),
"Accept": "application/json",
},
timeout=30,
)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| r = requests.get( | |
| url, | |
| headers={ | |
| "User-Agent": ( | |
| "Mozilla/5.0 (Windows NT 10.0; Win64; x64) " | |
| ), | |
| "Accept": "application/json", | |
| }, | |
| timeout=30, | |
| ) | |
| r = requests.get( | |
| url, | |
| headers={ | |
| "User-Agent": ( | |
| "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" | |
| ), | |
| "Accept": "application/json", | |
| }, | |
| timeout=30, | |
| ) |
🤖 Prompt for AI Agents
In `@uk_bin_collection/uk_bin_collection/councils/BarkingDagenham.py` around lines
19 - 28, The User-Agent header passed to requests.get in BarkingDagenham.py is
truncated and ends with a trailing space; update the "User-Agent" value in the
requests.get call (the headers dict) to a complete, standard browser User-Agent
string (e.g., include browser/version and platform) so servers that validate
User-Agent accept the request; locate the requests.get invocation in
BarkingDagenham.py and replace the current truncated "User-Agent" entry with a
full string.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #1813 +/- ##
=======================================
Coverage 86.67% 86.67%
=======================================
Files 9 9
Lines 1141 1141
=======================================
Hits 989 989
Misses 152 152 ☔ View full report in Codecov by Sentry. |
Use a publicly available JSON endpoint
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.