
msenior85/dla_piper


Scrape Data Protection Laws

A web scraping project built using Scrapy to extract data from https://www.dlapiperdataprotection.com.

This project is a solution to the Upwork job described below.

Screenshot of the job description from Upwork

🚀 Getting Started

1. Clone the Repo

git clone https://github.com/msenior85/dla_piper.git
cd dla_piper

2. Install uv (if it is not already installed)

If you are on Linux or macOS

curl -Ls https://astral.sh/uv/install.sh | sh

Or, using Homebrew (macOS)

brew install uv

If you are on Windows

powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

3. Run the spider

uv run scrapy crawl laws

4. Export scraped data (optional)

uv run scrapy crawl laws -O laws.csv

The spider writes the scraped items to a CSV file (laws.csv) in the project root. The -O flag overwrites the file if it already exists; use -o instead to append to it.
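If you prefer to start the export from Python rather than the command line, the sketch below builds the equivalent of `-O laws.csv` with Scrapy's `FEEDS` setting (`"overwrite": True` mirrors `-O`). It assumes it lives in a hypothetical `run_laws.py` in the project root, so that `get_project_settings()` picks up the project's settings.py; the helper names `build_feeds` and `main` are illustrative, not part of the repository.

```python
# run_laws.py (hypothetical helper script, placed in the project root)


def build_feeds(path="laws.csv"):
    """Return a FEEDS setting equivalent to `scrapy crawl laws -O <path>`."""
    # "overwrite": True replaces an existing file, like -O (-o would append).
    return {path: {"format": "csv", "overwrite": True}}


def main():
    # Scrapy is imported lazily so build_feeds() works without it installed.
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    settings = get_project_settings()  # loads the project's settings.py
    settings.set("FEEDS", build_feeds())
    process = CrawlerProcess(settings)
    process.crawl("laws")  # spider name, as in `scrapy crawl laws`
    process.start()  # blocks until the crawl finishes
```

Usage, from the project root: `uv run python -c "import run_laws; run_laws.main()"`.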

🧰 Configuration

Edit middlewares.py to use your own proxy string. As written, CustomProxyMiddleware reads the proxy URL from the proxy_us environment variable and attaches it to outgoing requests.

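import os


class CustomProxyMiddleware:
    def __init__(self):
        # Read the proxy URL (e.g. http://user:pass@host:port) from the
        # "proxy_us" environment variable; replace with your own proxy string.
        self.proxy = os.getenv("proxy_us")

    def process_request(self, request, spider):
        # Only route the request through the proxy when one is configured.
        if self.proxy:
            request.meta["proxy"] = self.proxy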
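A possible variant, not part of the repository: instead of reading the environment directly in `__init__`, the middleware can receive its proxy through Scrapy's standard `from_crawler` hook, which lets you set the proxy in settings.py and still fall back to the `proxy_us` environment variable. The setting name `PROXY_URL` is a hypothetical choice you would need to define yourself.

```python
import os


class CustomProxyMiddleware:
    def __init__(self, proxy=None):
        # Proxy URL such as "http://user:pass@host:port", or None for no proxy.
        self.proxy = proxy

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook to build the middleware. PROXY_URL is a
        # hypothetical custom setting; fall back to the proxy_us env variable.
        proxy = crawler.settings.get("PROXY_URL") or os.getenv("proxy_us")
        return cls(proxy)

    def process_request(self, request, spider):
        # Leave the request untouched when no proxy is configured.
        if self.proxy:
            request.meta["proxy"] = self.proxy
        return None  # let Scrapy continue processing the request normally
```

With this design, an unset proxy simply means requests go out directly, so the middleware can stay enabled on machines without a proxy.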

Alternatively, if you have no proxy, disable the proxy middleware by commenting out the following line in settings.py:

DOWNLOADER_MIDDLEWARES = {
    "dla_piper.middlewares.CustomProxyMiddleware": 543, # comment out this line
}
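Instead of commenting the line in and out, settings.py could decide at startup. The sketch below is a hypothetical variant that relies on Scrapy's documented behavior that mapping a middleware path to None disables it; it assumes the proxy is supplied through the proxy_us environment variable, as in middlewares.py.

```python
import os

# Hypothetical settings.py variant: enable the proxy middleware only when the
# proxy_us environment variable is set. A value of None disables a middleware
# in Scrapy, so no manual commenting out is needed.
DOWNLOADER_MIDDLEWARES = {
    "dla_piper.middlewares.CustomProxyMiddleware": 543 if os.getenv("proxy_us") else None,
}
```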

📦 CSV Output Example

| country | description | last_modified |
|---|---|---|
| Algeria | Law No. 18-07 of 10 June 2018 on protection of natural persons in personal data processing (“Law No. 18-07”). | 20 January 2025 |
| Armenia | Personal Data Protection Law as of 18.05.2015, number ՀՕ-49-Ն. | 28 January 2025 |
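To work with the exported file afterwards, a minimal sketch using Python's standard csv module is enough. It assumes the header row matches the example above (country, description, last_modified) and that the file is UTF-8, which is Scrapy's default encoding for CSV feeds; `load_laws` is a hypothetical helper name.

```python
import csv


def load_laws(path="laws.csv"):
    """Read the exported CSV into a dict keyed by country name."""
    # newline="" and utf-8 let the csv module handle quoted fields (some
    # descriptions contain commas) and non-ASCII text such as "ՀՕ-49-Ն".
    with open(path, newline="", encoding="utf-8") as f:
        return {row["country"]: row for row in csv.DictReader(f)}
```

For example, `load_laws()["Armenia"]["last_modified"]` returns the date string for Armenia from laws.csv.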

🛡️ License

MIT License – see LICENSE for details.
