
NE7K/Ml-crawlers-lab


🤖 ml-crawlers-lab Repository

A comprehensive collection of Python practice projects focused on web crawling, data automation, AI integration, data analysis, and visualization. This repository includes modular examples demonstrating real-world applications such as scraping popular websites, processing stock and financial data, integrating OpenAI, and analyzing datasets using pandas and matplotlib.

🛠 Technologies Used

Python 3.x

Web Crawling: Selenium, BeautifulSoup, requests

Automation & Scheduling: time, threading, os

Data Handling: json, dotenv, pandas, openpyxl

Visualization: matplotlib

AI & LLMs: openai, LangChain

Excel & Image Processing: Pillow (PIL), xlsxwriter

📌 Key Topics Covered

📦 Amazon Website Crawler: Extract product and context data from Amazon using requests and HTML parsing.

📸 Instagram User Info Crawler: Automate login and extract user content, with environment variables for secure auth.

🟢 Naver Auto Login & Blog Crawler: Automate login, handle CAPTCHA, and scrape blog content using scrolling logic.

🪙 CoinOne Cryptocurrency Price Crawler: Real-time price scraping and storage in JSON.

📈 Korea Stock Price Crawler: Scrape and parse South Korean stock market data with multi-threaded logic.

🧵 Multithreading Practice: Apply threading to speed up crawling and reduce blocking I/O time.

🧠 OpenAI Vision & Text Integration: Use gpt-4, dall-e, and vision APIs to analyze images and generate text.

📊 Data Analysis with Pandas: Clean, group, and analyze data with pandas and visualize with matplotlib.

🧮 Regression Analysis: From basic linear regression to advanced model comparison using datasets.

🧱 Object-Oriented Programming: Class structure and object management examples (CreateObject.py).

📁 File I/O and Management: Read/write handling for .txt, .json, .xlsx and directory ops.

🖼 Image Resizing Automation: Process and resize images using the PIL library.

⏱ Time-Based Execution & Control: Scripts using time.sleep(), timing execution, or simulating delays.
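The multithreading and timing topics above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in MultiThread.py or TimeProcess.py: `fake_fetch` is a hypothetical stand-in that sleeps instead of making a network request, the URLs are placeholders, and `concurrent.futures.ThreadPoolExecutor` is used here in place of raw `threading` for brevity.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Placeholder URLs (assumption); the repository's real targets are not shown here.
URLS = [f"https://example.com/page/{i}" for i in range(5)]
DELAY = 0.1  # simulated network latency in seconds (assumption)


def fake_fetch(url: str) -> str:
    """Hypothetical stand-in for a blocking HTTP request."""
    time.sleep(DELAY)
    return f"fetched {url}"


def run_sequential(urls):
    # One request at a time: total time is roughly len(urls) * DELAY.
    return [fake_fetch(u) for u in urls]


def run_threaded(urls, workers=5):
    # Threads overlap the waiting time; pool.map keeps the input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fake_fetch, urls))


def timed(func, *args):
    """Return (result, elapsed_seconds) for a single call."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    _, seq_t = timed(run_sequential, URLS)
    _, thr_t = timed(run_threaded, URLS)
    print(f"sequential: {seq_t:.2f}s, threaded: {thr_t:.2f}s")
```

Because the simulated work is pure waiting, the threaded run should finish in roughly one `DELAY` rather than five. Real crawlers are also bounded by rate limits and the target site's terms, so the worker count is something to tune rather than maximize.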

📂 Project Structure

Python-Study-Repository/
│
├── Ai/                                      # OpenAI API and translation-related scripts
│   ├── .env
│   ├── english.xlsx                         # Input Excel file for translation
│   ├── OpenAi_img.py                        # OpenAI Vision API (image-to-text)
│   ├── OpenAi_text.py                       # OpenAI Text API usage
│   ├── output.xlsx                          # Output Excel file with translated results
│   ├── test.jpg                             # Test image for Vision API
│   ├── Translate_exel.py                    # Excel translation handler
│   ├── Translate_http.py                    # HTTP-based translation script
│   └── Translate.py                         # General-purpose translation logic
│
├── Langchain/                               # LangChain-based LLM experiment scripts
│   └── Langchain_1.py                       # Sample test using LangChain + ChatModel
│
├── Pandas/                                  # Data analysis using pandas
│   ├── credit.csv                           # Sample dataset
│   ├── Pandas_1.py                          # Basic DataFrame operations
│   ├── Pandas_2.py                          # Grouping and aggregation
│   ├── Pandas_3.py                          # Cleaning and filtering
│   ├── PandasAnalyze.py                     # Custom analysis logic
│   └── product.xlsx                         # Excel-based product dataset
│
├── Visualization/                           # Data visualization and regression analysis
│   ├── california_housing.csv               # Dataset for regression examples
│   ├── income.txt                           # Example dataset (income)
│   ├── matplot_Graph.py                     # Graph drawing using matplotlib
│   ├── regression_analysis_1.py             # Basic linear regression analysis
│   ├── regression_analysis_2.py             # Multiple regression
│   ├── regression_analysis_3.py             # Regression using scikit-learn
│   ├── regression_analysis_4.py             # Comparison of regression metrics (R², MSE)
│   ├── regression_analysis_5.py             # Comparing different regression models
│   └── StockData.py                         # Regression visualization on stock data
│
├── WebCrawler/                              # Web scraping scripts by target site
│   ├── Amazon/
│   │   ├── WebContext.txt                   # Crawler context or notes
│   │   └── WebsiteCrawle.py                 # Amazon crawler
│   │
│   ├── Instagram/
│   │   ├── .env
│   │   ├── .gitignore
│   │   └── InstagramCrawle.py               # Instagram user data crawler
│   │
│   ├── Naver/
│   │   ├── .env
│   │   ├── .gitignore
│   │   ├── BlogScrollCrawler.py             # Naver blog scroll crawler
│   │   └── Logincaptcha.py                  # Login + CAPTCHA automation
│   │
│   └── Stock/
│       ├── CoinCrawler.py                   # Coin market crawler
│       ├── CrawlerResult.txt                # Example output of crawling
│       ├── MultiThread.py                   # Web crawling using multithreading
│       ├── project.py                       # Stock crawling project entry
│       ├── StockCrawler.py                  # Stock data crawler
│       ├── test.json                        # Sample output in JSON format
│       └── WebCrawler.py                    # General-purpose web crawler
│
├── Others/                                  # Miscellaneous files
│
├── testFile/                                # Temporary test scripts or data
├── testFile2/                               # Another test directory
│
├── CreateObject.py                          # Object-oriented programming example
├── FileControll.py                          # File read/write control
├── ImageResizing.py                         # Image resize operations
├── TimeProcess.py                           # Script demonstrating time-related operations
│
├── .gitignore
└── README.md
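
The `.env` files that sit next to a `.gitignore` in the Instagram/ and Naver/ folders suggest that login credentials are kept out of version control and read from environment variables. A minimal sketch of that pattern follows. The variable names (`NAVER_ID`, `NAVER_PW`) and the `load_credentials` helper are hypothetical and may not match what the crawler scripts actually use.

```python
import os


def load_credentials(prefix: str) -> dict:
    """Read <PREFIX>_ID and <PREFIX>_PW from the environment.

    The variable naming scheme is an assumption made for this example.
    """
    user = os.environ.get(f"{prefix}_ID")
    password = os.environ.get(f"{prefix}_PW")
    if not user or not password:
        # Fail early with a clear message instead of attempting a login with None.
        raise RuntimeError(
            f"Set {prefix}_ID and {prefix}_PW before running the crawler"
        )
    return {"id": user, "password": password}


if __name__ == "__main__":
    try:
        creds = load_credentials("NAVER")
        print("Loaded credentials for", creds["id"])
    except RuntimeError as exc:
        print(exc)
```

If the python-dotenv package listed under Data Handling is installed, calling `load_dotenv()` from `dotenv` before `load_credentials` would copy the values from a local `.env` file into `os.environ`, so the same helper works in both setups.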


📃 License

This project is open for educational and personal use. No specific license is applied.

About

🤖 Projects for web crawling, automation, and machine learning with Python.
