A powerful tool designed for structured extraction of entries, topics, users, and keyword-based content from Ekşi Sözlük. This scraper helps analysts, researchers, and developers collect clean, organized data from one of Turkey’s largest discussion platforms. Use it to automate deep content retrieval, trend monitoring, or dataset generation at scale.
Created by Bitbash, built to showcase our approach to Scraping and Automation!
If you're looking for an eksi-sozluk-scraper, you've just found your team. Let's Chat. 👆👆
The scraper automates collecting entries, topics, user submissions, and keyword-based search results from Ekşi Sözlük. It eliminates the complexity of navigating pagination, filters, and geo-restrictions by providing a streamlined interface for structured data extraction.
- Enables structured access to content without an official API.
- Supports keyword, author, date-range, and topic-specific extraction.
- Offers flexible pagination, result limits, and sorting modes.
- Delivers clean, ready-to-use JSON data suited for analytics and research.
- Designed for scalable and customizable extraction workflows.
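To make the extraction options above concrete, here is a minimal input sketch. Only `startUrls`, `endPage`, and `maxItems` are named elsewhere in this README; the other field names (`search`, `author`, `from`, `to`, `sort`) are illustrative placeholders, not a confirmed schema.

```json
{
  "search": "akademik çalışmalar",
  "author": "ssg",
  "from": "2020-01-01",
  "to": "2020-12-31",
  "sort": "newest",
  "maxItems": 200
}
```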
| Feature | Description |
|---|---|
| Flexible keyword search | Retrieve entries using a keyword, with optional filters for author, dates, and sorting. |
| Topic & entry scraping | Extract detailed content from specific entries or topic pages. |
| User-centric scraping | Collect all entries from specific Ekşi Sözlük users for profiling and research. |
| Pagination control | Limit scraping to a specific number of pages or intervals. |
| Result limiting | Set a maximum number of items per run for controlled data extraction. |
| Custom mapping & extension functions | Inject your own logic to format or enhance scraped items. |
| Proxy support | Bypass geo-restrictions using custom proxies. |

| Field Name | Field Description |
|---|---|
| type | Identifies the scraped object type (e.g., entry). |
| issueUrl | URL of the topic or issue where the entry belongs. |
| entryId | Unique identifier for the entry. |
| authorId | Unique ID of the entry author. |
| authorName | Username of the entry author. |
| authorUrl | Profile URL of the entry author. |
| authorAvatar | Direct link to the author’s profile image. |
| issueTitle | Title of the related topic/issue. |
| entryUrl | URL of the specific entry. |
| content | Full HTML-formatted entry content. |
| date | Publish date of the entry. |
| favoriteCount | Number of times the entry was favorited. |
```json
[
  {
    "type": "entry",
    "issueUrl": "https://www.eksisozluk.com/eksi-sozluk-hakkindaki-akademik-calismalar--2131734",
    "entryId": "110286085",
    "authorId": "8097",
    "authorName": "ssg",
    "authorUrl": "https://www.eksisozluk.com/biri/ssg",
    "authorAvatar": "https://img.ekstat.com/profiles/ssg-638271629469815122.jpg",
    "issueTitle": "ekşi sözlük hakkındaki akademik çalışmalar",
    "entryUrl": "https://www.eksisozluk.com/entry/110286085",
    "content": "web-based macroseismic intensity study in turkey...",
    "date": "16.07.2020 21:19",
    "favoriteCount": "17"
  }
]
```
```
Eksi Sozluk Scraper/
├── src/
│   ├── main.js
│   ├── crawler/
│   │   ├── eksisozluk_parser.js
│   │   └── utils_date.js
│   ├── helpers/
│   │   └── formatter.js
│   ├── outputs/
│   │   └── export_manager.js
│   └── config/
│       └── settings.example.json
├── data/
│   ├── input.sample.json
│   └── sample_output.json
├── package.json
├── requirements.txt
└── README.md
```
- Researchers extract topic-based discussions for linguistic, sociological, or sentiment analysis.
- Digital journalists gather historical entries to support investigative stories or trend timelines.
- Marketing analysts track public sentiment around brands, celebrities, or events.
- Developers integrate real-time Ekşi Sözlük content into custom dashboards or monitoring tools.
- Data scientists build datasets for machine learning models focused on classification, NLP, or topic modeling.
Q1: Do I need proxies to run the scraper?
Yes. Ekşi Sözlük strongly enforces geo-targeting. Using Turkish IP proxies ensures stable access and prevents request failures.
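A hypothetical proxy block, assuming an Apify-style `proxyConfiguration` object routed through Turkish IPs (the exact field names depend on your runtime and are not confirmed by this README):

```json
{
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyCountry": "TR"
  }
}
```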
Q2: Can I scrape only specific pages of a topic?
Yes. Provide any topic page URL inside startUrls and set endPage to the number of pages you want. You can also target page intervals by specifying a starting page and a higher end page.
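As a sketch of targeting a page interval, assuming `startUrls` accepts request objects and pages are addressed with the site's `?p=` query parameter (the URL and values are illustrative):

```json
{
  "startUrls": [
    { "url": "https://www.eksisozluk.com/eksi-sozluk-hakkindaki-akademik-calismalar--2131734?p=3" }
  ],
  "endPage": 5
}
```

This would cover pages 3 through 5 of the topic.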
Q3: How do I limit the number of results?
Use the maxItems parameter to cap the total number of items returned, useful for large search results or performance-driven workflows.
Q4: Can I customize the output structure?
Yes. Use extendOutputFunction or customMapFunction to modify or enrich each extracted item according to your own logic.
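A minimal sketch of such a function, using the field names from the sample output above. It assumes `extendOutputFunction` receives one scraped item and returns the enriched version; the exact signature is an assumption, not a documented contract.

```javascript
// Hypothetical extendOutputFunction: receives one scraped item and
// returns it with normalized and extra fields.
const extendOutputFunction = (item) => ({
  ...item,
  // Cast the string counter from the sample output (e.g. "17") to a number.
  favoriteCount: Number(item.favoriteCount),
  // Tag each item with the time it was processed.
  scrapedAt: new Date().toISOString(),
});
```

Passed as input, this would turn `"favoriteCount": "17"` into the numeric `17` on every item while leaving the other fields untouched.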
Primary Metric: Processes ~100 entries in about 2 minutes under stable proxy conditions, offering rapid extraction for large topic archives.
Reliability Metric: Consistently maintains high success rates when operating through Turkish IP addresses, minimizing blocked requests.
Efficiency Metric: Optimized for low compute consumption, typically ranging between 0.01–0.15 compute units per hundred entries depending on pagination depth.
Quality Metric: Delivers highly complete datasets, capturing entry content, metadata, author details, and timestamps with strong accuracy.
