ibalgo/insider-trading-prediction
# Polymarket Async Data Pipeline

A high-performance asynchronous ETL (Extract, Transform, Load) pipeline built in Python to fetch, process, and archive historical trade data from the Polymarket CLOB (Central Limit Order Book).

## 🚀 Overview

This project was designed to overcome the limitations of synchronous data fetching. By leveraging asyncio and aiohttp, the pipeline can fetch trade history for hundreds of markets simultaneously, reducing total execution time by over 90% compared to traditional sequential methods.

## 🛠 Tech Stack

- **Language:** Python 3.10+
- **Concurrency:** `asyncio` (event loop, coroutines, tasks)
- **Networking:** `aiohttp` (asynchronous HTTP client with connection pooling)
- **Data Analysis:** `pandas` (vectorized data transformation)
- **Persistence:** CSV (simple, append-friendly time-series storage)

## 🏗 Architecture & Logic

### Asynchronous Design

The core of the pipeline utilizes a non-blocking I/O pattern:

1. **Event Loop Management:** The script initializes a single `aiohttp.ClientSession` to take advantage of TCP/SSL connection pooling.
2. **Task Orchestration:** Market IDs are mapped into a list of coroutine objects.
3. **Concurrency:** `asyncio.gather()` schedules these coroutines on the event loop, overlapping the network latency of hundreds of requests.
4. **Graceful Error Handling:** Implements status-code checks (e.g., HTTP 429 for rate limiting) and exponential backoff to ensure pipeline stability.
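A minimal sketch of this pattern (the endpoint URL, query parameter, and retry count here are illustrative assumptions, not the project's actual values):

```python
import asyncio
import aiohttp

BASE_URL = "https://clob.polymarket.com/trades"  # hypothetical endpoint for illustration

async def fetch_market(session: aiohttp.ClientSession, market_id: str, retries: int = 3) -> list:
    """Fetch trade history for one market, backing off exponentially on HTTP 429."""
    for attempt in range(retries):
        async with session.get(BASE_URL, params={"market": market_id}) as resp:
            if resp.status == 429:
                # Rate limited: wait 1 s, 2 s, 4 s, ... before retrying.
                await asyncio.sleep(2 ** attempt)
                continue
            resp.raise_for_status()
            return await resp.json()
    raise RuntimeError(f"Still rate limited after {retries} attempts: {market_id}")

async def fetch_all(market_ids: list[str]) -> list:
    # A single ClientSession reuses pooled TCP/SSL connections across all requests.
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_market(session, mid) for mid in market_ids]
        return await asyncio.gather(*tasks)
```

Because every request shares one session, connection setup cost is paid once per host rather than once per request.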

### Data Processing Flow

1. **Extraction:** Fetches raw JSON trade data from the Polymarket API.
2. **Transformation:** Uses pandas to normalize JSON structures, convert Unix timestamps to timezone-aware UTC (ISO 8601) format, and calculate price movements.
3. **Loading:** Deduplicates records and persists them to a structured CSV for downstream quantitative analysis.
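The transform and load steps might look like the following sketch (the column names `trade_id`, `timestamp`, and `price` are assumptions, not the project's actual schema):

```python
import pandas as pd

def transform(raw_trades: list[dict]) -> pd.DataFrame:
    """Normalize raw JSON trades into an analysis-ready frame."""
    df = pd.json_normalize(raw_trades)
    # Unix seconds -> timezone-aware UTC timestamps (ISO 8601 when serialized).
    df["timestamp"] = pd.to_datetime(df["timestamp"], unit="s", utc=True)
    df = df.sort_values("timestamp")
    # Vectorized price movement between consecutive trades.
    df["price_change"] = df["price"].diff()
    return df

def load(df: pd.DataFrame, path: str) -> None:
    """Deduplicate and persist to CSV for downstream analysis."""
    df.drop_duplicates(subset=["trade_id"]).to_csv(path, index=False)
```

Deduplicating on a stable trade identifier at load time keeps repeated fetches of overlapping time windows idempotent.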

## 📈 Performance Benchmarks

| Method | Markets Fetched | Approx. Execution Time |
| --- | --- | --- |
| Synchronous | 200 | ~180 seconds |
| Asynchronous (this project) | 200 | ~12 seconds |
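The speedup comes from overlapping I/O waits rather than from faster individual requests. A toy benchmark with simulated 50 ms round-trips illustrates the effect (the numbers are illustrative, not the project's measurements):

```python
import asyncio
import time

async def fake_request(delay: float = 0.05) -> None:
    """Simulates one network round-trip."""
    await asyncio.sleep(delay)

async def run_sequential(n: int) -> float:
    """Awaits each request one after another; total time ~ n * delay."""
    start = time.perf_counter()
    for _ in range(n):
        await fake_request()
    return time.perf_counter() - start

async def run_concurrent(n: int) -> float:
    """Overlaps all requests with gather; total time ~ one delay."""
    start = time.perf_counter()
    await asyncio.gather(*(fake_request() for _ in range(n)))
    return time.perf_counter() - start
```

With 200 simulated requests, the sequential version takes roughly 200 × 50 ms = 10 s, while the concurrent version finishes in roughly one round-trip.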

## ⚙️ Setup & Usage

### Prerequisites

- Python 3.10 or higher

```bash
pip install aiohttp pandas
```

### Running the Pipeline

```bash
python main.py
```

## 🧠 Core Competencies Demonstrated

- **Asynchronous Programming:** Expert use of `async`/`await`, `gather`, and `ClientSession` context managers.
- **Resource Management:** Efficient handling of network sockets and memory during high-concurrency tasks.
- **API Integration:** Robust communication with RESTful endpoints, including query parameterization and header management.
- **Data Engineering:** Designing scalable pipelines that handle "dirty" API data and convert it into clean, analysis-ready formats.

*Developed for quantitative analysis and market research on the Polymarket ecosystem.*

## About

Prediction of insider trading trends on Polymarket.
