# Polymarket Async Data Pipeline

A high-performance asynchronous ETL (Extract, Transform, Load) pipeline built in Python to fetch, process, and archive historical trade data from the Polymarket CLOB (Central Limit Order Book).
## 🚀 Overview
This project was designed to overcome the limitations of synchronous data fetching. By leveraging asyncio and aiohttp, the pipeline can fetch trade history for hundreds of markets simultaneously, reducing total execution time by over 90% compared to traditional sequential methods.
## 🛠 Tech Stack

- **Language:** Python 3.10+
- **Concurrency:** asyncio (event loop, coroutines, tasks)
- **Networking:** aiohttp (asynchronous HTTP client with connection pooling)
- **Data Analysis:** pandas (vectorized data transformation)
- **Persistence:** CSV (simple, append-friendly storage for time-series data)
## 🏗 Architecture & Logic

### Asynchronous Design

The core of the pipeline uses a non-blocking I/O pattern:

1. **Event Loop Management:** The script initializes a single `aiohttp.ClientSession` so that TCP/SSL connections are pooled and reused across requests.
2. **Task Orchestration:** Market IDs are mapped into a list of coroutine objects.
3. **Concurrency:** `asyncio.gather()` schedules these coroutines on the event loop, overlapping the network latency of hundreds of requests.
4. **Graceful Error Handling:** Status-code checks (e.g., HTTP 429 for rate limiting) and exponential backoff keep the pipeline stable under load.
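The orchestration pattern above can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the fetch body is a stub standing in for an aiohttp GET against the Polymarket API, and the function names, semaphore limit, and retry parameters are assumptions.

```python
import asyncio

async def fetch_market(market_id: str, sem: asyncio.Semaphore, max_retries: int = 3) -> dict:
    """Fetch one market's trade history with exponential backoff (illustrative stub)."""
    async with sem:  # cap in-flight requests to stay under rate limits
        for attempt in range(max_retries):
            try:
                # Placeholder for the real call, e.g. session.get(url, params=...)
                await asyncio.sleep(0)
                return {"market": market_id, "trades": []}
            except Exception:
                # On HTTP 429 or transient errors: wait 1s, 2s, 4s, ... then retry
                await asyncio.sleep(2 ** attempt)
        raise RuntimeError(f"{market_id}: retries exhausted")

async def run(market_ids: list[str]) -> list[dict]:
    sem = asyncio.Semaphore(20)  # concurrency limit shared by all tasks
    tasks = [fetch_market(m, sem) for m in market_ids]
    # gather() runs all coroutines concurrently and preserves input order
    return await asyncio.gather(*tasks)

results = asyncio.run(run([f"mkt-{i}" for i in range(5)]))
```

The semaphore is the key design choice here: `gather()` alone would fire every request at once, while the semaphore bounds concurrency so backoff and rate limits stay meaningful.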
### Data Processing Flow

1. **Extraction:** Fetches raw JSON trade data from the Polymarket API.
2. **Transformation:** Uses pandas to normalize the JSON structures, convert Unix timestamps to UTC ISO 8601 format, and compute price movements.
3. **Loading:** Deduplicates the data and persists it to a structured CSV for downstream quantitative analysis.
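A minimal sketch of the transform-and-load steps above. The sample records and field names (`id`, `price`, `size`, `timestamp`) are illustrative assumptions, not the actual Polymarket trade schema:

```python
import pandas as pd

# Raw trade records roughly as an API might return them (illustrative only)
raw = [
    {"id": "t1", "price": "0.42", "size": "100", "timestamp": 1700000000},
    {"id": "t2", "price": "0.45", "size": "50",  "timestamp": 1700000060},
    {"id": "t1", "price": "0.42", "size": "100", "timestamp": 1700000000},  # duplicate
]

# Normalize the JSON records into a flat DataFrame
df = pd.json_normalize(raw)

# Vectorized type conversion; Unix seconds -> tz-aware UTC datetimes
# (serialized as ISO 8601 when written to CSV)
df["price"] = df["price"].astype(float)
df["size"] = df["size"].astype(float)
df["timestamp"] = pd.to_datetime(df["timestamp"], unit="s", utc=True)

# Price movement between consecutive trades
df = df.sort_values("timestamp")
df["price_change"] = df["price"].diff()

# Deduplicate on trade id, then persist for downstream analysis
df = df.drop_duplicates(subset="id")
df.to_csv("trades.csv", index=False)
```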
## 📈 Performance Benchmarks

| Method                      | Markets Fetched | Approx. Execution Time |
|-----------------------------|-----------------|------------------------|
| Synchronous                 | 200             | ~180 seconds           |
| Asynchronous (this project) | 200             | ~12 seconds            |
## ⚙️ Setup & Usage

### Prerequisites

- Python 3.10 or higher

```shell
pip install aiohttp pandas
```

### Running the Pipeline

```shell
python main.py
```
## 🧠 Core Competencies Demonstrated

- **Asynchronous Programming:** Expert use of async/await, `gather`, and `ClientSession` context managers.
- **Resource Management:** Efficient handling of network sockets and memory during high-concurrency tasks.
- **API Integration:** Robust communication with RESTful endpoints, including query parameterization and header management.
- **Data Engineering:** Designing scalable pipelines that turn "dirty" API data into clean, analysis-ready formats.
*Developed for quantitative analysis and market research on the Polymarket ecosystem.*