Welcome to the R Programming with Visualizations repository! This project provides a comprehensive guide to using R for data analysis, focusing on creating impactful visualizations. Whether you're a beginner or an experienced R user, you'll find practical examples and best practices.
Key features of the repository:
- 🔹Introduces R programming and its strengths in statistical computing and graphics.
- 🔹Covers prerequisites and setup instructions for R, RStudio, and essential R packages.
- 🔹Provides step-by-step guides for data preparation, including cleaning and transformation using packages like dplyr and tidyr.
- 🔹Demonstrates basic and advanced visualization techniques, such as scatter plots, bar charts, histograms, boxplots, heatmaps, and time series plots, primarily with ggplot2.
- 🔹Offers guidance on customizing plots with themes, colors, and annotations.
- 🔹Explains how to create interactive visualizations using Plotly and Shiny, including sample code for building interactive web apps.
- 🔹Shares best practices for effective visualization and communication of data insights.
- 🔹Includes sample projects (e.g., EDA on the Iris dataset, COVID-19 time series visualization, and interactive dashboards) and references for further learning.
The repository is well-structured, making it a valuable resource for learning and applying R programming concepts related to data visualization.
- 📚 Introduction
- 📚 Getting Started
- 📈 Data Preparation
- 📊 Basic Visualizations
- 🎨 Advanced Visualizations
- 🧩 Customizing Plots
- 🎨 Interactive Visualizations
- 📊 Statistical Visualizations
- 📦 Best Practices
- ✨ Sample Projects
- 🔗 References
🚀 R is a powerful, open-source language and environment designed for statistical computing and graphics. Visualization is a crucial part of data analysis: it helps you understand complex data and communicate results effectively.
✨ R is widely used by statisticians, data scientists, and researchers worldwide.
✨ With a syntax that is easy to learn and an extensive collection of packages, R supports tasks ranging from simple data summaries to complex modeling and machine learning. Its visualization tooling, led by ggplot2, is among the strongest in the data science ecosystem.
- R (version 4.0 or higher recommended)
- RStudio (optional but recommended)
- Anaconda Navigator (optional; an alternative way to install RStudio)
- Basic understanding of R syntax
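The prerequisites above can be checked from an R console before installing anything. The following is a minimal sketch, assuming the core package list used in this guide; required_pkgs and missing_pkgs are illustrative names, not part of any package.

```r
# Check that the running R version meets the recommended minimum (4.0).
r_ok <- getRversion() >= "4.0.0"
if (!r_ok) {
  warning("R ", as.character(getRversion()),
          " detected; version 4.0 or higher is recommended.")
}

# Core packages used in this guide (illustrative list; adjust as needed).
required_pkgs <- c("ggplot2", "plotly", "dplyr", "tidyr", "readr")

# Keep only the packages that are not installed yet.
installed <- rownames(installed.packages())
missing_pkgs <- setdiff(required_pkgs, installed)

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # Uncomment to install them:
  # install.packages(missing_pkgs)
} else {
  message("All core packages are installed.")
}
```

Installing only what is missing avoids re-downloading packages that are already present.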
Install the core packages for visualization:
install.packages(c("ggplot2", "plotly", "dplyr", "tidyr", "readr"))
For interactive and advanced plots, install (shinythemes and zoo are used by later examples):
install.packages(c("shiny", "shinythemes", "leaflet", "DT", "reshape2", "viridis", "zoo"))
For statistical visualizations, install:
install.packages(c("ggstatsplot", "correlation", "bayesplot", "tidyverse"))
# Load libraries
library(ggplot2)
library(dplyr)
# Create sample data
df <- data.frame(
x = rnorm(100),
y = rnorm(100),
group = rep(c("A", "B", "C", "D"), 25)
)
# Quick visualization
ggplot(df, aes(x = x, y = y, color = group)) +
geom_point(size = 3, alpha = 0.7) +
theme_minimal() +
ggtitle("Quick Start Visualization")
Proper data preparation is essential for effective visualization.
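The examples in this guide read data/sample_data.csv and refer to columns such as Date, Category, Value, Variable1, and Variable2, but the file's contents are not shown. To run the examples without that file, one option is a synthetic stand-in. The sketch below is an assumption about the schema, inferred from the column names used in the plots; the name sample_data and all values are hypothetical, and the random numbers carry no meaning.

```r
set.seed(42)  # make the random values reproducible

n <- 120
sample_data <- data.frame(
  ID          = seq_len(n),
  Date        = seq(as.Date("2023-01-01"), by = "day", length.out = n),
  Category    = rep(c("A", "B", "C", "D"), length.out = n),
  Category1   = rep(c("North", "South"), each = n / 2),
  Category2   = rep(c("X", "Y", "Z"), length.out = n),
  Subcategory = rep(c("s1", "s2"), length.out = n),
  Variable1   = runif(n, min = 1, max = 100),
  Variable2   = rlnorm(n, meanlog = 2, sdlog = 0.5),  # strictly positive, so log scales work
  Value       = rnorm(n, mean = 50, sd = 10),
  stringsAsFactors = FALSE
)

# Blank out a few values so the cleaning steps have something to do.
sample_data$Value[c(5, 40, 77)] <- NA

str(sample_data)

# To use it with the read.csv() call in this guide, it could be saved first:
# dir.create("data", showWarnings = FALSE)
# write.csv(sample_data, "data/sample_data.csv", row.names = FALSE)
```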
library(dplyr)
library(tidyr)
# Read data
data <- read.csv("data/sample_data.csv")
# Basic cleaning
clean_data <- data %>%
filter(!is.na(Value)) %>%
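# read.csv() returns dates as text; convert them (assumes ISO format, YYYY-MM-DD)
mutate(Date = as.Date(Date), Category = as.factor(Category)) %>%
arrange(Date)
# Group and summarize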
summary_stats <- data %>%
group_by(Category) %>%
summarise(
mean_value = mean(Value, na.rm = TRUE),
median_value = median(Value, na.rm = TRUE),
sd_value = sd(Value, na.rm = TRUE),
count = n(),
.groups = 'drop'
)
# Handle missing values
data_imputed <- data %>%
replace_na(list(Value = mean(data$Value, na.rm = TRUE))) %>%
mutate(across(where(is.character), ~replace_na(., "Unknown")))
# Pivot data for different formats
wide_format <- data %>%
pivot_wider(
names_from = Category,
values_from = Value,
values_fill = 0
)
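long_format <- data %>%
  pivot_longer(
    # Pivot numeric columns only: combining text and numeric columns
    # into a single value column raises a type error
    cols = where(is.numeric),
    names_to = "variable",
    values_to = "value"
  )
# Summary statistics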
head(clean_data)
summary(clean_data)
str(clean_data)
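Alongside head(), summary(), and str(), a per-column count of missing values is a quick way to confirm that the cleaning step did what was intended. A minimal base R sketch follows; the small data frame toy is hypothetical and stands in for clean_data.

```r
# Hypothetical stand-in for clean_data.
toy <- data.frame(
  Category = c("A", "B", NA, "A"),
  Value    = c(1.5, NA, 3.2, NA)
)

# Number of missing values in each column.
na_counts <- colSums(is.na(toy))
print(na_counts)

# Share of rows with no missing value in any column.
complete_share <- mean(complete.cases(toy))
print(complete_share)
```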
# Correlation analysis
library(correlation)
cor_matrix <- correlation::correlation(clean_data)
print(cor_matrix)
library(ggplot2)
# Basic scatter plot
ggplot(clean_data, aes(x = Variable1, y = Variable2, color = Category)) +
geom_point(size = 3, alpha = 0.6) +
labs(title = "Scatter Plot Example", x = "Variable 1", y = "Variable 2") +
theme_minimal()
# Scatter plot with trend line
ggplot(clean_data, aes(x = Variable1, y = Variable2)) +
geom_point(aes(color = Category), size = 3, alpha = 0.6) +
geom_smooth(method = "lm", se = TRUE, color = "blue") +
facet_wrap(~Category) +
labs(title = "Scatter Plot with Trend Lines")
# Simple bar chart
ggplot(clean_data, aes(x = Category, fill = Category)) +
geom_bar() +
labs(title = "Bar Chart Example", x = "Category", y = "Count") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
# Grouped bar chart
ggplot(clean_data, aes(x = Category1, fill = Category2)) +
geom_bar(position = "dodge") +
labs(title = "Grouped Bar Chart", x = "Category 1", y = "Count") +
scale_fill_brewer(palette = "Set2")
# Stacked bar chart
ggplot(clean_data, aes(x = Category1, fill = Category2)) +
geom_bar(position = "stack") +
labs(title = "Stacked Bar Chart", y = "Count")
# Count summary bar chart
summary_df <- clean_data %>%
count(Category, sort = TRUE)
ggplot(summary_df, aes(x = reorder(Category, -n), y = n, fill = Category)) +
geom_col() +
geom_text(aes(label = n), vjust = -0.5) +
labs(title = "Category Counts", x = "Category", y = "Count")
# Basic histogram
ggplot(clean_data, aes(x = Value)) +
geom_histogram(bins = 30, fill = "blue", color = "black", alpha = 0.7) +
labs(title = "Histogram Example", x = "Value", y = "Frequency") +
theme_minimal()
# Histogram with density curve
ggplot(clean_data, aes(x = Value)) +
geom_histogram(aes(y = after_stat(density)), bins = 30, fill = "blue", alpha = 0.6) +
geom_density(color = "red", size = 1) +
labs(title = "Histogram with Density Curve")
# Overlapped histograms by group
ggplot(clean_data, aes(x = Value, fill = Category)) +
geom_histogram(bins = 25, alpha = 0.5, position = "identity") +
labs(title = "Overlapped Histograms by Category") +
scale_fill_brewer(palette = "Set1")
# Faceted histograms
ggplot(clean_data, aes(x = Value)) +
geom_histogram(bins = 20, fill = "steelblue", color = "black") +
facet_wrap(~Category) +
labs(title = "Histograms by Category")
# Basic line plot
ggplot(clean_data, aes(x = Date, y = Value, color = Category)) +
geom_line(size = 1) +
geom_point(size = 2) +
labs(title = "Line Plot Example", x = "Date", y = "Value") +
theme_minimal()
# Multiple line plots with different styles
ggplot(clean_data, aes(x = Date, y = Value, color = Category, linetype = Category)) +
geom_line(size = 1) +
# Supply one linetype per category level; four are given here to match categories A-D
scale_linetype_manual(values = c("solid", "dashed", "dotted", "dotdash")) +
labs(title = "Multi-Category Line Plot")
# Basic boxplot
ggplot(clean_data, aes(x = Category, y = Value, fill = Category)) +
geom_boxplot(alpha = 0.7) +
labs(title = "Boxplot Example", x = "Category", y = "Value") +
theme_minimal()
# Boxplot with individual points
ggplot(clean_data, aes(x = Category, y = Value, fill = Category)) +
geom_boxplot(alpha = 0.6) +
geom_jitter(width = 0.2, alpha = 0.3) +
labs(title = "Boxplot with Data Points") +
scale_fill_brewer(palette = "Set2")
# Violin plot (similar to boxplot but shows distribution)
ggplot(clean_data, aes(x = Category, y = Value, fill = Category)) +
geom_violin(alpha = 0.7) +
geom_boxplot(width = 0.2, alpha = 0.9) +
labs(title = "Violin Plot with Boxplot")
library(reshape2)
# Create matrix data
matrix_data <- clean_data %>%
group_by(Category1, Category2) %>%
summarise(Value = mean(Value, na.rm = TRUE), .groups = 'drop') %>%
pivot_wider(names_from = Category2, values_from = Value, values_fill = 0)
# Basic heatmap
heatmap_matrix <- as.matrix(matrix_data[,-1])
rownames(heatmap_matrix) <- matrix_data$Category1
heatmap(heatmap_matrix, Rowv = NA, Colv = NA, scale = "column",
col = viridis::viridis(100))
# ggplot2 heatmap (more customizable)
library(viridis)
heatmap_df <- clean_data %>%
group_by(Category1, Category2) %>%
summarise(Value = mean(Value, na.rm = TRUE), .groups = 'drop')
ggplot(heatmap_df, aes(x = Category2, y = Category1, fill = Value)) +
geom_tile() +
scale_fill_viridis_c() +
labs(title = "Heatmap Visualization") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
# Basic time series
ggplot(clean_data, aes(x = Date, y = Value)) +
geom_line(color = "darkgreen", size = 1) +
geom_point(color = "darkgreen", size = 2) +
labs(title = "Time Series Plot", x = "Date", y = "Value") +
theme_minimal()
# Time series with moving average
library(zoo)
clean_data <- clean_data %>%
arrange(Date) %>%
mutate(MA_7 = rollmean(Value, k = 7, fill = NA))
ggplot(clean_data, aes(x = Date)) +
geom_line(aes(y = Value, color = "Original"), alpha = 0.5) +
geom_line(aes(y = MA_7, color = "7-Day MA"), size = 1) +
scale_color_manual(values = c("Original" = "gray", "7-Day MA" = "blue")) +
labs(title = "Time Series with Moving Average")
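The moving average above depends on the zoo package. If zoo is not available, a centered moving average can be sketched in base R with stats::filter(); for an odd window it should give the same result as rollmean(x, k, fill = NA) under its default centered alignment. The helper name moving_average is hypothetical.

```r
# Centered moving average over a window of k points (k should be odd).
# Positions without a full window are returned as NA.
# stats::filter is called explicitly because dplyr masks filter().
moving_average <- function(x, k = 7) {
  weights <- rep(1 / k, k)
  as.numeric(stats::filter(x, weights, sides = 2))
}

values <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
ma <- moving_average(values, k = 7)
print(ma)  # the first three and last three positions are NA
```

Inside a dplyr pipeline this could replace the rollmean() call, for example mutate(MA_7 = moving_average(Value, k = 7)).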
# Time series with a ±1 SD band (shows overall variability; this is not a confidence interval)
ggplot(clean_data, aes(x = Date, y = Value)) +
geom_ribbon(aes(ymin = Value - sd(Value), ymax = Value + sd(Value)),
alpha = 0.3, fill = "blue") +
geom_line(color = "darkblue", size = 1) +
labs(title = "Time Series with ±1 SD Band")
# Facet wrap
ggplot(clean_data, aes(x = Variable1, y = Variable2, color = Subcategory)) +
geom_point(size = 2) +
facet_wrap(~Category, scales = "free") +
labs(title = "Faceted Scatter Plots") +
theme_minimal()
# Facet grid
ggplot(clean_data, aes(x = Date, y = Value)) +
geom_line() +
facet_grid(Category1 ~ Category2) +
labs(title = "Grid Faceted Time Series") +
theme_minimal()
# Basic density plot
ggplot(clean_data, aes(x = Value, fill = Category)) +
geom_density(alpha = 0.5) +
labs(title = "Density Plot by Category") +
theme_minimal()
# 2D density plot
ggplot(clean_data, aes(x = Variable1, y = Variable2)) +
geom_density_2d() +
labs(title = "2D Density Plot") +
theme_minimal()
# 2D density with fill
ggplot(clean_data, aes(x = Variable1, y = Variable2)) +
stat_density_2d(aes(fill = after_stat(level)), geom = "polygon") +
scale_fill_viridis_c() +
labs(title = "2D Density Plot with Color")
# Apply different themes
base_plot <- ggplot(clean_data, aes(x = Variable1, y = Variable2, color = Category)) +
geom_point(size = 3)
# Minimal theme
base_plot + theme_minimal()
# Black and white theme
base_plot + theme_bw()
# Classic theme
base_plot + theme_classic()
# Dark theme
base_plot + theme_dark()
# Custom theme
custom_theme <- function() {
theme(
plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
plot.subtitle = element_text(size = 12, hjust = 0.5, color = "gray40"),
axis.title = element_text(size = 12, face = "bold"),
axis.text = element_text(size = 10),
legend.position = "right",
panel.background = element_rect(fill = "white"),
panel.grid.major = element_line(color = "gray90"),
panel.grid.minor = element_blank()
)
}
base_plot + custom_theme()
# Using built-in palettes
ggplot(clean_data, aes(x = Category, fill = Category)) +
geom_bar() +
scale_fill_brewer(palette = "Set1") +
labs(title = "Brewer Palette - Set1")
ggplot(clean_data, aes(x = Category, fill = Category)) +
geom_bar() +
scale_fill_brewer(palette = "Pastel2") +
labs(title = "Brewer Palette - Pastel2")
# Viridis palette
ggplot(clean_data, aes(x = Variable1, y = Variable2, color = Value)) +
geom_point(size = 3) +
scale_color_viridis_c() +
labs(title = "Viridis Color Scale")
# Custom color palette
custom_colors <- c("A" = "#FF6B6B", "B" = "#4ECDC4", "C" = "#45B7D1", "D" = "#FFA07A")
ggplot(clean_data, aes(x = Category, fill = Category)) +
geom_bar() +
scale_fill_manual(values = custom_colors) +
labs(title = "Custom Color Palette")
# Add text annotations
ggplot(clean_data, aes(x = Variable1, y = Variable2)) +
geom_point(aes(color = Category), size = 3) +
geom_text(aes(label = ID), nudge_y = 0.05, size = 3) +
labs(title = "Plot with Text Labels")
# Add statistical annotations
mean_val <- mean(clean_data$Value)
ggplot(clean_data, aes(x = Date, y = Value)) +
geom_line() +
geom_hline(yintercept = mean_val, linetype = "dashed", color = "red") +
annotate("text", x = min(clean_data$Date), y = mean_val + 5,
label = paste("Mean:", round(mean_val, 2)), hjust = 0) +
labs(title = "Time Series with Annotations")
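# Add shaded regions
ggplot(clean_data, aes(x = Date, y = Value)) +
  # annotate() draws the rectangle once; geom_rect() with aes() would redraw it
  # for every row, stacking the transparency into a near-opaque block
  annotate("rect", xmin = as.Date("2023-01-01"), xmax = as.Date("2023-03-31"),
           ymin = -Inf, ymax = Inf, fill = "yellow", alpha = 0.2) +
  geom_line() +
  labs(title = "Plot with Highlighted Region")
# Detailed customization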
ggplot(clean_data, aes(x = Variable1, y = Variable2)) +
geom_point(aes(color = Category, size = Value), alpha = 0.6) +
scale_size_continuous(range = c(2, 8)) +
scale_x_continuous(breaks = seq(0, 100, by = 20), limits = c(0, 100)) +
scale_y_continuous(trans = "log10") +
labs(
title = "Advanced Customization",
subtitle = "Multiple customization techniques",
x = "Variable 1 (custom scale)",
y = "Variable 2 (log scale)",
color = "Category",
size = "Value"
) +
theme_minimal() +
theme(
plot.title = element_text(size = 14, face = "bold"),
plot.subtitle = element_text(size = 10, color = "gray50"),
legend.position = "bottom",
legend.box = "horizontal"
)
library(plotly)
# Convert ggplot to interactive
p <- ggplot(clean_data, aes(x = Variable1, y = Variable2, color = Category)) +
geom_point(size = 3)
ggplotly(p)
# Direct plotly creation
plot_ly(clean_data, x = ~Variable1, y = ~Variable2, color = ~Category,
type = 'scatter', mode = 'markers',
marker = list(size = 8)) %>%
layout(title = "Interactive Scatter Plot",
xaxis = list(title = "Variable 1"),
yaxis = list(title = "Variable 2"))
# Interactive line plot with hover info
plot_ly(clean_data, x = ~Date, y = ~Value, color = ~Category,
type = 'scatter', mode = 'lines+markers',
hovertemplate = '<b>%{fullData.name}</b><br>Date: %{x}<br>Value: %{y}<extra></extra>') %>%
layout(title = "Interactive Time Series",
xaxis = list(title = "Date"),
yaxis = list(title = "Value"))
# Interactive bar chart
plot_ly(data = summary_stats, x = ~Category, y = ~mean_value, type = 'bar',
marker = list(color = ~mean_value, colorscale = 'Viridis')) %>%
layout(title = "Interactive Bar Chart",
yaxis = list(title = "Mean Value"))
library(shiny)
# Basic Shiny app
ui <- fluidPage(
titlePanel("Interactive Data Explorer"),
sidebarLayout(
sidebarPanel(
sliderInput("bins", "Number of bins:", min = 5, max = 50, value = 30),
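# "All" must be one of the choices: the server logic and the reset button both rely on it
selectInput("category", "Select Category:",
choices = c("All", as.character(unique(clean_data$Category)))),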
br(),
actionButton("reset", "Reset Filters")
),
mainPanel(
tabsetPanel(
tabPanel("Histogram", plotOutput("histPlot")),
tabPanel("Summary", tableOutput("summary"))
)
)
)
)
server <- function(input, output, session) {
filtered_data <- reactive({
if (input$category == "All") {
clean_data
} else {
clean_data %>% filter(Category == input$category)
}
})
output$histPlot <- renderPlot({
hist(filtered_data()$Value,
breaks = input$bins,
col = 'skyblue',
border = 'white',
main = paste("Histogram -", input$category),
xlab = "Value")
})
output$summary <- renderTable({
filtered_data() %>%
summarise(
Mean = mean(Value, na.rm = TRUE),
Median = median(Value, na.rm = TRUE),
SD = sd(Value, na.rm = TRUE),
Count = n()
)
})
observeEvent(input$reset, {
updateSliderInput(session, "bins", value = 30)
updateSelectInput(session, "category", selected = "All")
})
}
shinyApp(ui = ui, server = server)
# Advanced Shiny with Multiple Visualizations
advanced_ui <- fluidPage(
theme = shinythemes::shinytheme("flatly"),  # the package is named shinythemes
titlePanel("Advanced Data Dashboard"),
sidebarLayout(
sidebarPanel(
dateRangeInput("daterange", "Date range:",
start = min(clean_data$Date),
end = max(clean_data$Date)),
checkboxGroupInput("categories", "Categories:",
choices = unique(clean_data$Category),
selected = unique(clean_data$Category)),
hr(),
downloadButton("downloadData", "Download Data")
),
mainPanel(
tabsetPanel(
tabPanel("Overview",
h4("Key Statistics"),
tableOutput("stats")),
tabPanel("Visualizations",
plotOutput("scatterPlot"),
plotOutput("timeSeriesPlot")),
tabPanel("Data Table",
DT::dataTableOutput("dataTable"))
)
)
)
)
advanced_server <- function(input, output, session) {
filtered <- reactive({
clean_data %>%
filter(Date >= input$daterange[1],
Date <= input$daterange[2],
Category %in% input$categories)
})
output$stats <- renderTable({
filtered() %>%
group_by(Category) %>%
summarise(Mean = mean(Value), SD = sd(Value), Count = n())
})
output$scatterPlot <- renderPlot({
ggplot(filtered(), aes(x = Variable1, y = Variable2, color = Category)) +
geom_point(size = 3) +
theme_minimal()
})
output$timeSeriesPlot <- renderPlot({
ggplot(filtered(), aes(x = Date, y = Value, color = Category)) +
geom_line() +
facet_wrap(~Category) +
theme_minimal()
})
output$dataTable <- DT::renderDataTable({
filtered()
})
output$downloadData <- downloadHandler(
filename = "filtered_data.csv",
content = function(file) {
write.csv(filtered(), file, row.names = FALSE)
}
)
}
shinyApp(ui = advanced_ui, server = advanced_server)
library(ggstatsplot)
# Summary statistics plot
ggstatsplot::ggbetweenstats(
data = clean_data,
x = Category,
y = Value,
title = "Between-group Comparisons"
)
# Correlation plot
ggstatsplot::ggscatterstats(
data = clean_data,
x = Variable1,
y = Variable2,
title = "Correlation Analysis"
)
# Distribution plot
ggstatsplot::gghistostats(
data = clean_data,
x = Value,
title = "Distribution Analysis"
)
# Summary with confidence intervals
summary_ci <- clean_data %>%
group_by(Category) %>%
summarise(
mean = mean(Value),
se = sd(Value) / sqrt(n()),
ci_lower = mean - 1.96 * se,
ci_upper = mean + 1.96 * se,
.groups = 'drop'
)
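The 1.96 multiplier above comes from the normal approximation, which is reasonable for large groups. For small groups, a t quantile gives a wider and usually more appropriate interval. The sketch below shows that variant in base R; ci_mean is a hypothetical helper name.

```r
# Confidence interval for a mean using the t distribution.
ci_mean <- function(x, level = 0.95) {
  x <- x[!is.na(x)]                                # drop missing values
  n <- length(x)
  se <- sd(x) / sqrt(n)                            # standard error of the mean
  t_crit <- qt(1 - (1 - level) / 2, df = n - 1)    # two-sided critical value
  c(mean = mean(x),
    lower = mean(x) - t_crit * se,
    upper = mean(x) + t_crit * se)
}

ci <- ci_mean(c(1, 2, 3, 4, 5))
print(round(ci, 3))
```

In the dplyr summary, the same idea amounts to replacing 1.96 with qt(0.975, df = n() - 1).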
# Plot with error bars
ggplot(summary_ci, aes(x = Category, y = mean, fill = Category)) +
geom_col() +
geom_errorbar(aes(ymin = ci_lower, ymax = ci_upper), width = 0.2) +
labs(title = "Mean with 95% Confidence Intervals") +
theme_minimal()
- ✅ Always clean and explore your data first using summary(), head(), and str().
- ✅ Choose the right type of plot for your data and audience; consider data types and relationships.
- ✅ Label axes and legends clearly with descriptive titles and units.
- ✅ Use color and size strategically to highlight key information without overwhelming viewers.
- ✅ Avoid clutter and unnecessary decorations - less is often more.
- ✅ Test your visualizations with different screen sizes for responsiveness.
- ✅ Document your code with comments explaining complex visualizations.
- ✅ Use consistent color schemes across related plots for easier comparison.
- ✅ Include data sources and creation dates in your visualizations.
- ✅ Consider accessibility - use colorblind-friendly palettes when possible.
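For the accessibility point above, base R (4.0 and later) ships the Okabe-Ito palette, which was designed to stay distinguishable under common forms of color vision deficiency. The sketch below builds a named color vector for four hypothetical category levels, A to D; the name cb_colors is illustrative.

```r
# Okabe-Ito colors from base R's grDevices (available in R >= 4.0).
okabe_ito <- grDevices::palette.colors(n = 5, palette = "Okabe-Ito")

# Skip the first entry (black) and name the rest after hypothetical category levels.
cb_colors <- unname(okabe_ito[2:5])
names(cb_colors) <- c("A", "B", "C", "D")
print(cb_colors)
```

The resulting vector could be passed to scale_fill_manual(values = cb_colors) or scale_color_manual(values = cb_colors) in ggplot2. For continuous data, the viridis scales already used in this guide serve the same purpose.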
- Exploratory Data Analysis of Iris Dataset
- COVID-19 Time Series Visualization
- Interactive Dashboard using Shiny
- R for Data Science
- The ggplot2 Book
- R Graph Gallery
- Shiny Documentation
- Plotly R Documentation
- Data Visualization with R
Happy visualizing! 🎉 Feel free to explore, modify, and build upon these examples in your own projects.