This project performs end-to-end text analysis on Bangla news articles scraped from the NTV BD news portal.
It demonstrates practical skills in:
- Web scraping Bangla news articles
- Text preprocessing for the Bangla language
- Visual analytics (word frequency, word clouds, bar plots)
- Topic modeling with Latent Dirichlet Allocation (LDA)
- Web Scraping: Extracted Bangla news article texts from multiple category pages on NTV BD.
- Text Preprocessing: Removed punctuation, digits, extra spaces, and common Bangla stopwords.
- Word Frequency Analysis: Visualized top words with WordCloud and frequency bar plots.
- Topic Modeling (LDA): Discovered hidden topics and visualized top words per topic.
- Document–Topic Distribution: Interpreted results with stacked bar plots for each article.
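The preprocessing and word-frequency steps listed above can be sketched in base R. This is a minimal illustration, not the project's actual code: the helper names clean_bangla and word_freq, the short stopword list, and the sample sentence are all assumptions made for the example.

```r
# Hypothetical helper: clean one or more Bangla strings.
clean_bangla <- function(text, stopwords = character()) {
  # Replace Unicode punctuation (including the danda), symbols, and
  # decimal digits (Latin and Bangla) with spaces.
  text <- gsub("[\\p{P}\\p{S}\\p{Nd}]", " ", text, perl = TRUE)
  # Collapse runs of whitespace and trim both ends.
  text <- trimws(gsub("\\s+", " ", text, perl = TRUE))
  # Split on single spaces and drop stopwords, one document at a time.
  tokens <- strsplit(text, " ", fixed = TRUE)
  vapply(
    tokens,
    function(tok) paste(tok[!tok %in% stopwords], collapse = " "),
    character(1)
  )
}

# Hypothetical helper: count words across all cleaned documents.
word_freq <- function(cleaned) {
  tokens <- unlist(strsplit(cleaned, " ", fixed = TRUE))
  tokens <- tokens[nzchar(tokens)]
  sort(table(tokens), decreasing = TRUE)
}

# Illustrative stopwords only; the project's own list is longer.
stopwords_bn <- c("এবং", "ও", "এই")

cleaned <- clean_bangla("ঢাকা ও চট্টগ্রামে ২০২৪ সালে বৃষ্টি হয়েছে।", stopwords_bn)
freq <- word_freq(cleaned)
head(freq, 10)
```

The resulting table can be passed to a plotting function, for example using names(freq) as the words and as.integer(freq) as the counts for a word cloud or bar plot.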
- Scraping → Collect articles from NTV BD using rvest.
- Preprocessing → Clean Bangla text for analysis.
- Word Frequency Analysis → Generate word cloud & bar plots.
- Topic Modeling → Apply LDA to discover topics.
- Visualization & Interpretation → Topic distributions and insights.
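The topic modeling and document-topic steps of this workflow might look roughly like the sketch below. It assumes the text has already been cleaned and uses a tiny made-up corpus rather than the scraped NTV BD data; the object names, the choice of k = 2, and the seed are illustrative assumptions, not values taken from the project script.

```r
library(dplyr)
library(tidytext)
library(topicmodels)
library(ggplot2)

# Toy cleaned corpus (illustrative only).
articles <- tibble(
  doc_id = c("a1", "a2", "a3", "a4"),
  text = c(
    "ক্রিকেট খেলা দল জয় ক্রিকেট",
    "ফুটবল খেলা দল গোল",
    "বাজেট অর্থনীতি ব্যাংক বাজেট",
    "ব্যাংক ঋণ অর্থনীতি বাজার"
  )
)

# One row per (document, word) with its count; split on whitespace only.
word_counts <- articles %>%
  unnest_tokens(word, text, token = "regex", pattern = "\\s+") %>%
  count(doc_id, word)

# Document-term matrix, then an LDA fit with an assumed number of topics.
dtm <- cast_dtm(word_counts, doc_id, word, n)
lda_fit <- LDA(dtm, k = 2, control = list(seed = 1234))

# Top words per topic (beta = per-topic word probability).
top_terms <- tidy(lda_fit, matrix = "beta") %>%
  group_by(topic) %>%
  slice_max(beta, n = 3) %>%
  ungroup()

# Per-document topic proportions (gamma), one row per document and topic.
doc_topics <- tidy(lda_fit, matrix = "gamma")

# Stacked bar plot: each bar is one article, segments are topic shares.
p <- ggplot(doc_topics, aes(x = document, y = gamma, fill = factor(topic))) +
  geom_col() +
  labs(x = "Article", y = "Topic proportion", fill = "Topic")
```

With a corpus this small the topics are not meaningful; the point is only the shape of the pipeline. On real data, the number of topics usually needs some experimentation.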
- Language: R
- Libraries:
  - rvest – Web scraping
  - readr – Reading data files
  - dplyr – Data manipulation
  - purrr – Functional programming tools
  - tidyr – Data tidying
  - stringr – String processing
  - tm – Text mining
  - tokenizers – Tokenization
  - wordcloud – Word cloud generation
  - RColorBrewer – Color palettes for visualizations
  - topicmodels – LDA topic modeling
  - tidytext – Text mining using tidy data principles
  - ggplot2 – Data visualization
- Clone the repo and move into it:
    git clone https://github.com/WasifAsad/Bangla-News-Text-Analysis.git
    cd Bangla-News-Text-Analysis
- Open the .R scripts or R Markdown files.
- Install the required libraries:
    install.packages(c("rvest", "readr", "dplyr", "purrr", "tidyr", "stringr", "tm", "tokenizers", "wordcloud", "RColorBrewer", "topicmodels", "tidytext", "ggplot2"))
- Run the script: Bangla-Text-Analysis.R
- To analyze a different news portal, replace the URL assigned to base_url.
- Change the output paths (word cloud image, .csv file, etc.) to match your own folders.
- You can also add more Bangla stopwords to the stopword list.
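One way to keep these three customization points together is to gather them at the top of the script, as in the sketch below. Only base_url is a name taken from this README; the other variable names, the example.com URL, the a.article-link selector, and the get_article_links helper are placeholders invented for illustration, and a real portal will need its own selector.

```r
library(rvest)

# --- Settings a user would typically edit (all values are placeholders) ---
base_url <- "https://www.example.com"   # root URL of the news portal
output_dir <- "output"                  # folder for the word cloud, .csv, etc.
extra_stopwords <- c("তবে", "যদি")      # additional Bangla stopwords

# Build output paths from one folder so only output_dir needs changing.
dir.create(output_dir, showWarnings = FALSE, recursive = TRUE)
wordcloud_path <- file.path(output_dir, "wordcloud.png")
csv_path <- file.path(output_dir, "articles.csv")

# Merge a (shortened, illustrative) default list with the user's additions.
default_stopwords <- c("এবং", "ও")
stopwords_bn <- union(default_stopwords, extra_stopwords)

# Hypothetical helper: collect absolute article links from a parsed page.
# link_selector is an assumed CSS selector, not the one NTV BD uses.
get_article_links <- function(page, base_url, link_selector = "a.article-link") {
  hrefs <- page %>%
    html_elements(link_selector) %>%
    html_attr("href")
  unique(xml2::url_absolute(hrefs[!is.na(hrefs)], base_url))
}

# Offline demo on an inline HTML snippet instead of a live request.
demo_page <- minimal_html('
  <ul>
    <li><a class="article-link" href="/news/1">One</a></li>
    <li><a class="article-link" href="/news/2">Two</a></li>
    <li><a href="/about">About</a></li>
  </ul>
')
links <- get_article_links(demo_page, base_url)
```

For a live site, demo_page would be replaced by read_html() on a category page URL built from base_url.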
This project is licensed under the MIT License. Feel free to use, modify, and distribute this project.
For inquiries or support, feel free to reach out:
Authors:
Wasif Asad Alvi
Md. Tamjid Hossain
Rifat Talukdar
Md. Tanziul Haque
Email: wasifasad35@gmail.com
GitHub: WasifAsad