A simple Amazon scraper that extracts product details and prices from Amazon.com using Python Requests and Selectorlib.
Full article at ScrapeHero Tutorials
There are five simple scrapers in this project.
- Product Detail Page Scraper: bin/product_detail.py
- Product Detail Page Spider Scraper: bin/product_detail_spider.py
- Product Reviews Page Scraper: bin/product_reviews.py
- Product Reviews Page Spider Scraper: bin/product_reviews_spider.py
- Search Results Page Scraper: bin/product_search_results.py
Step 1: Clone repo.
$ git clone https://github.com/adrianmarino/amazon-scraper.git
$ cd amazon-scraper
Step 2: Create environment.
$ conda env create -f environment.yml

Step 1: Enable project environment.
$ conda activate amazon-scraper
Step 2: Configure the fields to scrape in the config files:
- config/product_detail_selectors.yml: Maps CSS/XPath selectors to JSON fields for product detail scraping.
- config/product_detail_urls: URLs used by the bin/product_detail.py scraper.
- config/product_reviews_selectors.yml: Maps CSS/XPath selectors to JSON fields for product review scraping.
- config/product_reviews_urls: URLs used by the bin/product_reviews.py scraper.
- config/product_search_results_selectors.yml: Maps CSS/XPath selectors to JSON fields for product search result scraping.
- config/product_search_results_urls: URLs used by the bin/product_search_results.py and bin/product_detail_spider.py scrapers.
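The selector files above pair each JSON field with a selector. The sketch below illustrates that idea only; it does not use Selectorlib or the project's YAML format. It assumes a hypothetical `SELECTORS` dict and a small well-formed markup sample, and it relies on the limited XPath support in the standard library's `xml.etree.ElementTree`, which would not cope with real Amazon pages.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical selector map: JSON field name -> XPath expression.
# In the project this role is played by the config/*_selectors.yml files.
SELECTORS = {
    "name": ".//h1[@id='title']",
    "price": ".//span[@class='price']",
}


def extract_fields(markup, selectors):
    """Return one entry per selector; fields whose node is missing map to None."""
    root = ET.fromstring(markup.strip())
    result = {}
    for field, xpath in selectors.items():
        node = root.find(xpath)
        has_text = node is not None and node.text
        result[field] = node.text.strip() if has_text else None
    return result


# Well-formed sample markup, invented for this illustration.
SAMPLE = """
<html><body>
  <h1 id="title"> Example product </h1>
  <span class="price">$19.99</span>
</body></html>
"""

record = extract_fields(SAMPLE, SELECTORS)
print(json.dumps(record, indent=2))
```

Keeping the selectors in data rather than in code is what lets the scrapers be retargeted by editing a config file when the page layout changes.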
Notes
- bin/product_detail_spider.py reads the URLs listed in config/product_search_results_urls and uses both config/product_search_results_selectors.yml and config/product_detail_selectors.yml to scrape product details. The result is one file per product in the output path.
- bin/product_reviews_spider.py reads the URLs found in the output/[PRODUCT_ID | PRODUCT_ID_varaint_PRODUCT_ID.json] files and uses config/product_reviews_selectors.yml to scrape product reviews. The result is one file per product in the output path.
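The detail spider flow described in the notes (search pages, then product pages, then one file per product) can be sketched roughly as follows. This is not the project's code: `run_detail_spider`, `fake_search`, `fake_product` and the `id` field are hypothetical, and the network calls are replaced by stand-in functions so the sketch runs offline.

```python
import json
import tempfile
from pathlib import Path


def run_detail_spider(search_urls, fetch_product_urls, fetch_product, output_dir):
    """Follow each product link found on the search pages; write one JSON file per product."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for search_url in search_urls:
        for product_url in fetch_product_urls(search_url):
            product = fetch_product(product_url)
            # Assumption: each product record carries an "id" used as the file name.
            path = output_dir / f"{product['id']}.json"
            path.write_text(json.dumps(product, indent=2), encoding="utf-8")
            written.append(path)
    return written


# Stand-ins for the real scraping calls (hypothetical, no network access).
def fake_search(url):
    return [f"{url}/dp/A1", f"{url}/dp/B2"]


def fake_product(url):
    return {"id": url.rsplit("/", 1)[-1], "url": url}


with tempfile.TemporaryDirectory() as tmp:
    files = run_detail_spider(["https://example.com/s"], fake_search, fake_product, tmp)
    names = sorted(p.name for p in files)
    print(names)
```

Passing the fetch functions in as arguments keeps the crawl logic separate from the page parsing, which is also what makes it testable without hitting a live site.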
Step 3: From a terminal, run any of the following commands:
$ python bin/product_detail.py
$ python bin/product_reviews.py
$ python bin/product_search_results.py
$ python bin/product_detail_spider.py
$ python bin/product_reviews_spider.py
Notes
- bin/product_reviews_spider.py requires bin/product_detail_spider.py to be run first.
- bin/product_reviews_spider.py generates product review files from the result files of bin/product_detail_spider.py.
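Because the reviews spider depends on the detail spider's result files, a guard like the one below could fail early with a clear message. It is a sketch under assumptions: `product_files` is a hypothetical helper, and it assumes the result files are `*.json` files directly inside the output directory.

```python
import tempfile
from pathlib import Path


def product_files(output_dir):
    """Return the product JSON files, or raise if the detail spider has not run yet."""
    files = sorted(Path(output_dir).glob("*.json"))
    if not files:
        raise FileNotFoundError(
            f"No product files in {output_dir}; "
            "run bin/product_detail_spider.py first."
        )
    return files


with tempfile.TemporaryDirectory() as tmp:
    try:
        product_files(tmp)
        empty_dir_rejected = False
    except FileNotFoundError:
        empty_dir_rejected = True

    # Simulate a result file left behind by the detail spider.
    (Path(tmp) / "A1.json").write_text("{}", encoding="utf-8")
    found = [p.name for p in product_files(tmp)]

print(empty_dir_rejected, found)
```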
Step 4: Scraped data is written to the output directory: one file per product details page and one file per search results page.
- Proxies lists:
- Set up proxies in src/scrapper/scrapper_factory.py
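How src/scrapper/scrapper_factory.py wires proxies in is not shown here, so the following is only a sketch of one common approach: rotating through a list and building the scheme-to-URL dict that Requests accepts as its `proxies` argument. The addresses and the `proxy_rotation` helper are hypothetical.

```python
from itertools import cycle

# Hypothetical proxy addresses; replace with proxies you are allowed to use.
PROXIES = ["http://10.0.0.1:8080", "http://10.0.0.2:8080"]


def proxy_rotation(proxy_urls):
    """Yield Requests-style proxy dicts indefinitely, cycling through proxy_urls."""
    for url in cycle(proxy_urls):
        yield {"http": url, "https": url}


rotation = proxy_rotation(PROXIES)
first, second, third = next(rotation), next(rotation), next(rotation)
print(first)
print(second)
```

Each yielded dict could then be passed along with a request, for example `requests.get(url, proxies=next(rotation))`, so consecutive requests leave through different proxies.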