depeelalgussz/eksi-sozluk-scraper

Eksi Sozluk Scraper

A powerful tool designed for structured extraction of entries, topics, users, and keyword-based content from Ekşi Sözlük. This scraper helps analysts, researchers, and developers collect clean, organized data from one of Turkey’s largest discussion platforms. Use it to automate deep content retrieval, trend monitoring, or dataset generation at scale.

Created by Bitbash, built to showcase our approach to Scraping and Automation!
If you are looking for eksi-sozluk-scraper you've just found your team — Let’s Chat. 👆👆

Introduction

The scraper automates collecting entries, topics, user submissions, and keyword-based search results from Ekşi Sözlük. It eliminates the complexity of navigating pagination, filters, and geo-restrictions by providing a streamlined interface for structured data extraction.

Why This Scraper Matters

  • Enables structured access to content without an official API.
  • Supports keyword, author, date-range, and topic-specific extraction.
  • Offers flexible pagination, result limits, and sorting modes.
  • Delivers clean, ready-to-use JSON data suited for analytics and research.
  • Designed for scalable and customizable extraction workflows.

Features

| Feature | Description |
| --- | --- |
| Flexible keyword search | Retrieve entries using a keyword, with optional filters for author, dates, and sorting. |
| Topic & entry scraping | Extract detailed content from specific entries or topic pages. |
| User-centric scraping | Collect all entries from specific Ekşi Sözlük users for profiling and research. |
| Pagination control | Limit scraping to a specific number of pages or intervals. |
| Result limiting | Set a maximum number of items per run for controlled data extraction. |
| Custom mapping & extension functions | Inject your own logic to format or enhance scraped items. |
| Proxy support | Bypass geo-restrictions using custom proxies. |

What Data This Scraper Extracts

| Field Name | Field Description |
| --- | --- |
| type | Identifies the scraped object type (e.g., entry). |
| issueUrl | URL of the topic or issue where the entry belongs. |
| entryId | Unique identifier for the entry. |
| authorId | Unique ID of the entry author. |
| authorName | Username of the entry author. |
| authorUrl | Profile URL of the entry author. |
| authorAvatar | Direct link to the author’s profile image. |
| issueTitle | Title of the related topic/issue. |
| entryUrl | URL of the specific entry. |
| content | Full HTML-formatted entry content. |
| date | Publish date of the entry. |
| favoriteCount | Number of times the entry was favorited. |

Example Output

[
  {
    "type": "entry",
    "issueUrl": "https://www.eksisozluk.com/eksi-sozluk-hakkindaki-akademik-calismalar--2131734",
    "entryId": "110286085",
    "authorId": "8097",
    "authorName": "ssg",
    "authorUrl": "https://www.eksisozluk.com/biri/ssg",
    "authorAvatar": "https://img.ekstat.com/profiles/ssg-638271629469815122.jpg",
    "issueTitle": "ekşi sözlük hakkındaki akademik çalışmalar",
    "entryUrl": "https://www.eksisozluk.com/entry/110286085",
    "content": "web-based macroseismic intensity study in turkey...",
    "date": "16.07.2020 21:19",
    "favoriteCount": "17"
  }
]
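
The sample record stores the date and the favorite count as strings, so downstream analysis will usually want typed values. The following is a minimal sketch of one way to normalize a record. The helper name `normalizeEntry` and the added `isoDate` field are hypothetical, and the `DD.MM.YYYY HH:mm` date pattern is an assumption based only on the sample above.

```javascript
// Hypothetical helper: convert the string fields of one scraped entry into typed values.
function normalizeEntry(raw) {
  // Assumption: dates look like "16.07.2020 21:19" (DD.MM.YYYY HH:mm), as in the sample.
  const match = /^(\d{2})\.(\d{2})\.(\d{4}) (\d{2}):(\d{2})$/.exec(String(raw.date));
  // Build a zone-less ISO-style timestamp; the source does not state a time zone.
  const isoDate = match
    ? `${match[3]}-${match[2]}-${match[1]}T${match[4]}:${match[5]}`
    : null;
  const count = Number.parseInt(raw.favoriteCount, 10);
  return {
    ...raw,
    isoDate,
    // Fall back to 0 when the count is missing or not numeric.
    favoriteCount: Number.isNaN(count) ? 0 : count,
  };
}

const sample = { entryId: "110286085", date: "16.07.2020 21:19", favoriteCount: "17" };
const normalized = normalizeEntry(sample);
console.log(normalized.isoDate, normalized.favoriteCount);
```

Dates that do not match the assumed pattern are left as `isoDate: null` rather than guessed, so they can be inspected separately.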

Directory Structure Tree

Eksi Sozluk Scraper/
├── src/
│   ├── main.js
│   ├── crawler/
│   │   ├── eksisozluk_parser.js
│   │   └── utils_date.js
│   ├── helpers/
│   │   └── formatter.js
│   ├── outputs/
│   │   └── export_manager.js
│   └── config/
│       └── settings.example.json
├── data/
│   ├── input.sample.json
│   └── sample_output.json
├── package.json
├── requirements.txt
└── README.md

Use Cases

  • Researchers extract topic-based discussions for linguistic, sociological, or sentiment analysis.
  • Digital journalists gather historical entries to support investigative stories or trend timelines.
  • Marketing analysts track public sentiment around brands, celebrities, or events.
  • Developers integrate real-time Ekşi Sözlük content into custom dashboards or monitoring tools.
  • Data scientists build datasets for machine learning models focused on classification, NLP, or topic modeling.

FAQs

Q1: Do I need proxies to run the scraper? Yes. Ekşi Sözlük strongly enforces geo-targeting. Using Turkish IP proxies ensures stable access and prevents request failures.

Q2: Can I scrape only specific pages of a topic? Yes. Provide any topic page URL inside startUrls and set endPage to the number of pages you want. You can also target page intervals by specifying a starting page and a higher end page.

Q3: How do I limit the number of results? Use the maxItems parameter to cap the total number of items returned, useful for large search results or performance-driven workflows.
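
As a rough illustration of Q2 and Q3, the sketch below assembles an input object from the three parameters named there (`startUrls`, `endPage`, `maxItems`). The helper `buildInput` is hypothetical, and the exact shape the scraper expects is an assumption: in particular, `startUrls` may need objects rather than plain URL strings.

```javascript
// Hypothetical helper that assembles an input object from the parameters named in the FAQ.
function buildInput({ topicUrl, endPage, maxItems }) {
  if (!Number.isInteger(endPage) || endPage < 1) {
    throw new RangeError("endPage must be a positive integer");
  }
  if (!Number.isInteger(maxItems) || maxItems < 1) {
    throw new RangeError("maxItems must be a positive integer");
  }
  return {
    // Assumption: startUrls accepts an array of plain URL strings.
    startUrls: [topicUrl],
    endPage,
    maxItems,
  };
}

const input = buildInput({
  topicUrl: "https://www.eksisozluk.com/eksi-sozluk-hakkindaki-akademik-calismalar--2131734",
  endPage: 3,
  maxItems: 50,
});
console.log(JSON.stringify(input, null, 2));
```

Validating the numbers before a run is a design choice of this sketch, not a documented behavior of the scraper.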

Q4: Can I customize the output structure? Yes. Use extendOutputFunction or customMapFunction to modify or enrich each extracted item according to your own logic.
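
A mapping function of the kind Q4 describes might look like the sketch below. The signature is an assumption (one scraped item in, one reshaped item out); how the function is actually passed to the scraper, for example as a string in the input, is not specified here. The output keys `author` and `text` are illustrative.

```javascript
// Assumption: the mapping function receives one scraped item and returns the item to store.
const customMapFunction = (item) => ({
  entryId: item.entryId,
  author: item.authorName,
  // Naive tag stripping for illustration only; a real HTML parser is safer.
  text: String(item.content ?? "").replace(/<[^>]*>/g, "").trim(),
});

const mapped = customMapFunction({
  entryId: "110286085",
  authorName: "ssg",
  content: '<p>hello <a href="#">world</a></p>',
});
console.log(mapped);
```

Because `content` is described as HTML-formatted, deriving a plain-text field is a common first step before sentiment or topic analysis.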


Performance Benchmarks and Results

Primary Metric: Processes ~100 entries in about 2 minutes under stable proxy conditions, offering rapid extraction for large topic archives.

Reliability Metric: Consistently maintains high success rates when operating through Turkish IP addresses, minimizing blocked requests.

Efficiency Metric: Optimized for low compute consumption, typically 0.01–0.15 compute units per hundred entries, depending on pagination depth.

Quality Metric: Delivers highly complete datasets, capturing entry content, metadata, author details, and timestamps with strong accuracy.
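
For planning purposes, the throughput and compute figures above can be turned into a back-of-the-envelope estimate. The sketch below assumes both scale linearly with the number of items, which the benchmarks do not guarantee; the function name `estimateRun` is hypothetical.

```javascript
// Rough planning estimate based on the stated figures:
// ~2 minutes and 0.01 to 0.15 units per 100 items.
// Assumption: cost and time scale linearly with item count.
function estimateRun(itemCount) {
  const hundreds = itemCount / 100;
  return {
    minutes: hundreds * 2,
    minUnits: hundreds * 0.01,
    maxUnits: hundreds * 0.15,
  };
}

const estimate = estimateRun(5000);
console.log(estimate);
```

Under these assumptions, a 5,000-item run would take on the order of 100 minutes; actual figures depend on proxy stability and pagination depth.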

Review 1

"Bitbash is a top-tier automation partner, innovative, reliable, and dedicated to delivering real results every time."

Nathan Pennington
Marketer
★★★★★

Review 2

"Bitbash delivers outstanding quality, speed, and professionalism, truly a team you can rely on."

Eliza
SEO Affiliate Expert
★★★★★

Review 3

"Exceptional results, clear communication, and flawless delivery. Bitbash nailed it."

Syed
Digital Strategist
★★★★★
