📰 Bangla News Text Analysis & Topic Modeling

Made with R

📌 Objective

This project performs an end-to-end text analysis of Bangla news articles scraped from the NTV BD news portal.
It demonstrates practical skills in:

  • Web scraping Bangla news articles
  • Text preprocessing for Bangla language
  • Visual analytics (word frequency, word clouds, bar plots)
  • Topic modeling with Latent Dirichlet Allocation (LDA)

🛠 Features

  • Web Scraping: Extracted Bangla news article texts from multiple category pages on NTV BD.
  • Text Preprocessing: Removed punctuation, digits, extra spaces, and common Bangla stopwords.
  • Word Frequency Analysis: Visualized the most frequent words with a word cloud and frequency bar plots.
  • Topic Modeling (LDA): Discovered hidden topics and visualized top words per topic.
  • Document–Topic Distribution: Interpreted results with stacked bar plots for each article.
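The preprocessing step described above can be sketched roughly as follows. This is a minimal base-R illustration, not the project's actual code: the function name `clean_bangla_text` and the short stopword list are hypothetical, and the approach (keeping only characters from the Bengali Unicode block) is one assumption about how punctuation and digits might be stripped.

```r
# Hypothetical stopword list for illustration only; the project's real list
# is not shown in this README.
bangla_stopwords <- c("এবং", "ও", "এই", "থেকে", "করে")

clean_bangla_text <- function(text, stopwords = bangla_stopwords) {
  # Keep only characters from the Bengali Unicode block (U+0980 to U+09FF)
  # and whitespace. This drops Latin text, ASCII digits, ASCII punctuation,
  # and the danda (U+0964), which lies outside that block.
  text <- gsub("[^\u0980-\u09FF\\s]", " ", text, perl = TRUE)
  # Bangla digits (U+09E6 to U+09EF) sit inside the block, so remove them separately.
  text <- gsub("[\u09E6-\u09EF]", " ", text, perl = TRUE)
  # Split on runs of whitespace, which also collapses extra spaces.
  tokens <- strsplit(trimws(text), "\\s+", perl = TRUE)
  # Drop stopwords and empty tokens, then rebuild one string per document.
  vapply(
    tokens,
    function(tok) paste(tok[nzchar(tok) & !(tok %in% stopwords)], collapse = " "),
    character(1)
  )
}

cleaned <- clean_bangla_text("ঢাকা থেকে ১০ জন, এবং খবর।")
```

The function is vectorised over documents, so it can be applied directly to a character vector of article texts.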

📊 Workflow

  1. Scraping → Collect articles from NTV BD using rvest.
  2. Preprocessing → Clean Bangla text for analysis.
  3. Word Frequency Analysis → Generate word cloud & bar plots.
  4. Topic Modeling → Apply LDA to discover topics.
  5. Visualization & Interpretation → Topic distributions and insights.
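As a rough sketch of steps 4 and 5, the snippet below fits an LDA model with `topicmodels` and extracts per-topic word probabilities and per-document topic proportions with `tidytext`. The four toy documents, the choice of `k = 2`, and the seed are assumptions made for illustration; the project's own script may structure this differently.

```r
library(dplyr)
library(tidyr)
library(tidytext)
library(topicmodels)

# Made-up, already-cleaned documents standing in for scraped articles.
docs <- data.frame(
  doc_id = c("a1", "a2", "a3", "a4"),
  text = c(
    "খেলা ক্রিকেট দল ম্যাচ",
    "ক্রিকেট দল খেলা ম্যাচ",
    "বাজার দাম অর্থনীতি ব্যাংক",
    "ব্যাংক অর্থনীতি দাম ঋণ"
  ),
  stringsAsFactors = FALSE
)

# One row per (document, word) pair with its count.
word_counts <- docs %>%
  mutate(word = strsplit(text, "\\s+")) %>%
  unnest(cols = word) %>%
  count(doc_id, word, name = "n")

# Cast to a document-term matrix and fit LDA (k and seed are arbitrary here).
dtm <- cast_dtm(word_counts, document = doc_id, term = word, value = n)
lda_fit <- LDA(dtm, k = 2, control = list(seed = 1234))

# Per-topic word probabilities (beta): the top 3 terms of each topic.
top_terms <- tidy(lda_fit, matrix = "beta") %>%
  group_by(topic) %>%
  slice_max(beta, n = 3, with_ties = FALSE) %>%
  ungroup()

# Per-document topic proportions (gamma), the input for a stacked bar plot.
doc_topics <- tidy(lda_fit, matrix = "gamma")
```

`doc_topics` has one row per document and topic, so it can be passed to `ggplot2` with `geom_col()` and `fill = factor(topic)` to draw the stacked document-topic bars.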

🧰 Tech Stack

  • Language: R
  • Libraries:
    • rvest – Web scraping
    • readr – Reading data files
    • dplyr – Data manipulation
    • purrr – Functional programming tools
    • tidyr – Data tidying
    • stringr – String processing
    • tm – Text mining
    • tokenizers – Tokenization
    • wordcloud – Word cloud generation
    • RColorBrewer – Color palettes for visualizations
    • topicmodels – LDA topic modeling
    • tidytext – Text mining using tidy data principles
    • ggplot2 – Data visualization
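To illustrate how a few of these packages might fit together for the word-frequency step, here is a small sketch using `dplyr` and `ggplot2`. The token vector is made up, and the variable names are hypothetical rather than taken from the project's script.

```r
library(dplyr)
library(ggplot2)

# Made-up tokens standing in for the cleaned words of all articles.
tokens <- c("খবর", "দল", "খবর", "বাজার", "দল", "খবর")

# Count each word and sort by descending frequency.
word_freq <- tibble(word = tokens) %>%
  count(word, sort = TRUE)

# Horizontal bar plot of word frequencies, most frequent word on top.
freq_plot <- ggplot(word_freq, aes(x = reorder(word, n), y = n)) +
  geom_col() +
  coord_flip() +
  labs(x = "Word", y = "Frequency")

# A word cloud could be drawn from the same table, for example:
# wordcloud::wordcloud(words = word_freq$word, freq = word_freq$n, min.freq = 1,
#                      colors = RColorBrewer::brewer.pal(8, "Dark2"))
```

Rendering Bangla glyphs correctly depends on a font that supports the script being available to the graphics device.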

🚀 How to Run

  1. Clone the repo:

    git clone https://github.com/WasifAsad/Bangla-News-Text-Analysis.git
    cd Bangla-News-Text-Analysis
  2. Open the .R scripts or R Markdown files.

  3. Install required libraries:

    install.packages(c("rvest", "readr", "dplyr", "purrr", "tidyr", "stringr", "tm", "tokenizers", "wordcloud", "RColorBrewer", "topicmodels", "tidytext", "ggplot2"))
    
  4. Run the script: Bangla-Text-Analysis.R

N.B.:

  • To scrape a different news portal, replace the URL assigned to base_url.
  • Update the output paths (word cloud image, .csv file, etc.) to match your own folders.
  • You can extend the Bangla stopword list with additional words.
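When pointing the scraper at another portal, the CSS selectors usually need to change along with `base_url`. The sketch below shows the general `rvest` pattern on an inline HTML snippet so that it runs offline. The placeholder URL, the `div.article p` selector, and the helper `extract_paragraphs` are all assumptions for illustration, not values from the project's script.

```r
library(rvest)

# Placeholder value; replace with the category page of the portal you want.
base_url <- "https://example.com/news"

# Hypothetical helper: pull the text of every node matching a CSS selector.
extract_paragraphs <- function(page, selector = "p") {
  page %>%
    html_elements(selector) %>%
    html_text2()
}

# Offline demonstration on a small inline page instead of a live request.
sample_page <- minimal_html(
  '<div class="article"><p>প্রথম অনুচ্ছেদ</p><p>দ্বিতীয় অনুচ্ছেদ</p></div>'
)
paragraphs <- extract_paragraphs(sample_page, "div.article p")

# For a live page, the same helper would be called on read_html(base_url).
```

Inspecting the target site's HTML in a browser is the usual way to find the selector that matches its article body.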

📸 Sample Outputs (from the day the code was run)

Word Cloud

[Image: word cloud]

Top Words Visualization per Topic

[Images: bar plots of top words for topics 1 to 5]

📜 License

This project is licensed under the MIT License. Feel free to use, modify, and distribute this project.


📧 Contact

For inquiries or support, feel free to reach out:
Authors:
Wasif Asad Alvi
Md. Tamjid Hossain
Rifat Talukdar
Md. Tanziul Haque
Email: wasifasad35@gmail.com
GitHub: WasifAsad
