vyasdeepti/R-Programming-with-Visualizations

R Programming with Visualizations 🎨

Welcome to the R Programming with Visualizations repository! This project provides a comprehensive guide to using R for data analysis, focusing on creating impactful visualizations. Whether you're a beginner or an experienced R user, you'll find practical examples and best practices.

Key features of the repository:

  • 🔹Introduces R programming and its strengths in statistical computing and graphics.
  • 🔹Covers prerequisites and setup instructions for R, RStudio, and essential R packages.
  • 🔹Provides step-by-step guides for data preparation, including cleaning and transformation using packages like dplyr and tidyr.
  • 🔹Demonstrates basic and advanced visualization techniques, such as scatter plots, bar charts, histograms, boxplots, heatmaps, and time series plots, primarily with ggplot2.
  • 🔹Offers guidance on customizing plots with themes, colors, and annotations.
  • 🔹Explains how to create interactive visualizations using Plotly and Shiny, including sample code for building interactive web apps.
  • 🔹Shares best practices for effective visualization and communication of data insights.
  • 🔹Includes sample projects (e.g., EDA on the Iris dataset, COVID-19 time series visualization, and interactive dashboards) and references for further learning.

The repository is well-structured, making it a valuable resource for learning and applying R programming concepts related to data visualization.

Table of Contents 📝

  1. 📚 Introduction
  2. 📚 Getting Started
  3. 📈 Data Preparation
  4. 📊 Basic Visualizations
  5. 🎨 Advanced Visualizations
  6. 🧩 Customizing Plots
  7. 🎨 Interactive Visualizations
  8. 📊 Statistical Visualizations
  9. 📦 Best Practices
  10. Sample Projects
  11. 🔗 References

Introduction to R Programming

🚀 R is an open-source programming language and environment designed for statistical computing and graphics, widely used by statisticians, data scientists, and researchers. Visualization is a crucial part of data analysis: it helps in understanding complex data and communicating results effectively.

✨ With an approachable syntax and an extensive collection of packages, R supports tasks ranging from simple data summaries to complex modeling and machine learning. Packages such as ggplot2, plotly, and shiny make it a strong choice for static and interactive visualization.


Getting Started

Prerequisites

Installation

Install the core packages for visualization:

install.packages(c("ggplot2", "plotly", "dplyr", "tidyr", "readr"))

For interactive and advanced plots, install:

install.packages(c("shiny", "shinythemes", "leaflet", "DT", "reshape2", "viridis", "zoo"))

For statistical visualizations, install:

install.packages(c("ggstatsplot", "correlation", "bayesplot", "tidyverse"))

Quick Start Example

# Load libraries
library(ggplot2)
library(dplyr)

# Create sample data (seed set so the random draws are reproducible)
set.seed(123)
df <- data.frame(
  x = rnorm(100),
  y = rnorm(100),
  group = rep(c("A", "B", "C", "D"), 25)
)

# Quick visualization
ggplot(df, aes(x = x, y = y, color = group)) +
  geom_point(size = 3, alpha = 0.7) +
  theme_minimal() +
  ggtitle("Quick Start Visualization")

Data Preparation

Proper data preparation is essential for effective visualization.

Basic Data Cleaning

library(dplyr)
library(tidyr)

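# Read data (read.csv() imports dates as character strings)
data <- read.csv("data/sample_data.csv")

# Basic cleaning
clean_data <- data %>%
  filter(!is.na(Value)) %>%
  mutate(
    Category = as.factor(Category),
    Date = as.Date(Date)  # assumes ISO-formatted dates (YYYY-MM-DD)
  ) %>%
  arrange(Date)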
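
The examples in this guide read data/sample_data.csv. If that file is not at hand, a synthetic stand-in can be built in base R so the later plots have something to run on. This is only a sketch: the column names are taken from the examples that follow (Date, Value, Category, Category1, Category2, Subcategory, Variable1, Variable2, ID), while the values, category levels, and the name sample_data are assumptions made for illustration.

```r
# Hypothetical stand-in for data/sample_data.csv (values and levels are invented)
set.seed(42)
n <- 120

sample_data <- data.frame(
  ID = seq_len(n),
  Date = seq(as.Date("2023-01-01"), by = "day", length.out = n),
  Category = factor(rep(c("A", "B", "C", "D"), length.out = n)),
  Category1 = rep(c("North", "South", "East"), length.out = n),
  Category2 = rep(c("Online", "Retail"), each = n / 2),
  Subcategory = rep(c("X", "Y"), length.out = n),
  Variable1 = runif(n, min = 0, max = 100),          # fits the 0-100 axis limits used later
  Variable2 = rlnorm(n, meanlog = 2, sdlog = 0.5),   # strictly positive, safe for log scales
  Value = rnorm(n, mean = 50, sd = 10)
)

# Introduce a few missing values so the cleaning step has something to do
sample_data$Value[c(5, 50, 95)] <- NA

# Base R equivalent of the filter/arrange cleaning shown above
clean_data <- sample_data[!is.na(sample_data$Value), ]
clean_data <- clean_data[order(clean_data$Date), ]

str(clean_data)
```

Because Date is created with as.Date() and Category as a factor, the resulting clean_data already has the column types the plotting examples expect.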

Advanced Data Manipulation

# Group and summarize
summary_stats <- data %>%
  group_by(Category) %>%
  summarise(
    mean_value = mean(Value, na.rm = TRUE),
    median_value = median(Value, na.rm = TRUE),
    sd_value = sd(Value, na.rm = TRUE),
    count = n(),
    .groups = 'drop'
  )

# Handle missing values
data_imputed <- data %>%
  replace_na(list(Value = mean(data$Value, na.rm = TRUE))) %>%
  mutate(across(where(is.character), ~replace_na(., "Unknown")))

# Pivot data for different formats
wide_format <- data %>%
  pivot_wider(
    names_from = Category,
    values_from = Value,
    values_fill = 0
  )

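# Pivot only the numeric columns: combining character and numeric
# columns into a single value column raises an error
long_format <- data %>%
  pivot_longer(
    cols = where(is.numeric),
    names_to = "variable",
    values_to = "value"
  )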

Data Exploration

# Summary statistics
head(clean_data)
summary(clean_data)
str(clean_data)

# Correlation analysis
library(correlation)
cor_matrix <- correlation::correlation(clean_data)
print(cor_matrix)
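
For comparison, a plain correlation matrix can also be computed with base R's cor(). The sketch below uses the built-in mtcars dataset as a stand-in for clean_data; the choice of columns is an assumption made for illustration.

```r
# Pearson correlation matrix on a few numeric columns of a built-in dataset
numeric_cols <- mtcars[, c("mpg", "wt", "hp")]

cor_mat <- cor(numeric_cols, use = "pairwise.complete.obs", method = "pearson")

# Round for display; the diagonal is 1 and the matrix is symmetric
print(round(cor_mat, 2))
```

A matrix in this shape can be converted to long format and passed to geom_tile() to draw a correlation heatmap.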

Basic Visualizations

Scatter Plot

library(ggplot2)

# Basic scatter plot
ggplot(clean_data, aes(x = Variable1, y = Variable2, color = Category)) +
  geom_point(size = 3, alpha = 0.6) +
  labs(title = "Scatter Plot Example", x = "Variable 1", y = "Variable 2") +
  theme_minimal()

# Scatter plot with trend line
ggplot(clean_data, aes(x = Variable1, y = Variable2)) +
  geom_point(aes(color = Category), size = 3, alpha = 0.6) +
  geom_smooth(method = "lm", se = TRUE, color = "blue") +
  facet_wrap(~Category) +
  labs(title = "Scatter Plot with Trend Lines")

Bar Chart

# Simple bar chart
ggplot(clean_data, aes(x = Category, fill = Category)) +
  geom_bar() +
  labs(title = "Bar Chart Example", x = "Category", y = "Count") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# Grouped bar chart
ggplot(clean_data, aes(x = Category1, fill = Category2)) +
  geom_bar(position = "dodge") +
  labs(title = "Grouped Bar Chart", x = "Category 1", y = "Count") +
  scale_fill_brewer(palette = "Set2")

# Stacked bar chart
ggplot(clean_data, aes(x = Category1, fill = Category2)) +
  geom_bar(position = "stack") +
  labs(title = "Stacked Bar Chart", y = "Count")

# Count summary bar chart
summary_df <- clean_data %>%
  count(Category, sort = TRUE)

ggplot(summary_df, aes(x = reorder(Category, -n), y = n, fill = Category)) +
  geom_col() +
  geom_text(aes(label = n), vjust = -0.5) +
  labs(title = "Category Counts", x = "Category", y = "Count")

Histogram

# Basic histogram
ggplot(clean_data, aes(x = Value)) +
  geom_histogram(bins = 30, fill = "blue", color = "black", alpha = 0.7) +
  labs(title = "Histogram Example", x = "Value", y = "Frequency") +
  theme_minimal()

# Histogram with density curve
ggplot(clean_data, aes(x = Value)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30, fill = "blue", alpha = 0.6) +
  geom_density(color = "red", linewidth = 1) +
  labs(title = "Histogram with Density Curve")

# Overlapped histograms by group
ggplot(clean_data, aes(x = Value, fill = Category)) +
  geom_histogram(bins = 25, alpha = 0.5, position = "identity") +
  labs(title = "Overlapped Histograms by Category") +
  scale_fill_brewer(palette = "Set1")

# Faceted histograms
ggplot(clean_data, aes(x = Value)) +
  geom_histogram(bins = 20, fill = "steelblue", color = "black") +
  facet_wrap(~Category) +
  labs(title = "Histograms by Category")
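
The histogram examples above hard-code the number of bins (20, 25, or 30). As a rough starting point, base R ships rules of thumb that derive a bin count from the data. The sketch below applies two of them to the built-in iris dataset, which stands in for clean_data here.

```r
# Built-in dataset used as a stand-in; 150 observations
x <- iris$Sepal.Length

# Sturges' rule: ceiling(log2(n) + 1), depends only on sample size
bins_sturges <- nclass.Sturges(x)

# Freedman-Diaconis rule: bin width based on the interquartile range
bins_fd <- nclass.FD(x)

cat("Sturges:", bins_sturges, " Freedman-Diaconis:", bins_fd, "\n")
```

Either value can be passed as bins = ... to geom_histogram(). These rules are heuristics, so it is still worth trying a few values and checking that the shape of the distribution is stable.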

Line Plot

# Basic line plot
ggplot(clean_data, aes(x = Date, y = Value, color = Category)) +
  geom_line(linewidth = 1) +
  geom_point(size = 2) +
  labs(title = "Line Plot Example", x = "Date", y = "Value") +
  theme_minimal()

# Multiple line plots with different styles
ggplot(clean_data, aes(x = Date, y = Value, color = Category, linetype = Category)) +
  geom_line(linewidth = 1) +
  # One linetype per category; four values cover categories A-D
  scale_linetype_manual(values = c("solid", "dashed", "dotted", "dotdash")) +
  labs(title = "Multi-Category Line Plot")

Advanced Visualizations

Boxplot

# Basic boxplot
ggplot(clean_data, aes(x = Category, y = Value, fill = Category)) +
  geom_boxplot(alpha = 0.7) +
  labs(title = "Boxplot Example", x = "Category", y = "Value") +
  theme_minimal()

# Boxplot with individual points
ggplot(clean_data, aes(x = Category, y = Value, fill = Category)) +
  geom_boxplot(alpha = 0.6) +
  geom_jitter(width = 0.2, alpha = 0.3) +
  labs(title = "Boxplot with Data Points") +
  scale_fill_brewer(palette = "Set2")

# Violin plot (similar to boxplot but shows distribution)
ggplot(clean_data, aes(x = Category, y = Value, fill = Category)) +
  geom_violin(alpha = 0.7) +
  geom_boxplot(width = 0.2, alpha = 0.9) +
  labs(title = "Violin Plot with Boxplot")

Heatmap

library(reshape2)

# Create matrix data
matrix_data <- clean_data %>%
  group_by(Category1, Category2) %>%
  summarise(Value = mean(Value, na.rm = TRUE), .groups = 'drop') %>%
  pivot_wider(names_from = Category2, values_from = Value, values_fill = 0)

# Basic heatmap
heatmap_matrix <- as.matrix(matrix_data[,-1])
rownames(heatmap_matrix) <- matrix_data$Category1
heatmap(heatmap_matrix, Rowv = NA, Colv = NA, scale = "column", 
        col = viridis::viridis(100))

# ggplot2 heatmap (more customizable)
library(viridis)

heatmap_df <- clean_data %>%
  group_by(Category1, Category2) %>%
  summarise(Value = mean(Value, na.rm = TRUE), .groups = 'drop')

ggplot(heatmap_df, aes(x = Category2, y = Category1, fill = Value)) +
  geom_tile() +
  scale_fill_viridis_c() +
  labs(title = "Heatmap Visualization") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Time Series Plot

# Basic time series
ggplot(clean_data, aes(x = Date, y = Value)) +
  geom_line(color = "darkgreen", linewidth = 1) +
  geom_point(color = "darkgreen", size = 2) +
  labs(title = "Time Series Plot", x = "Date", y = "Value") +
  theme_minimal()

# Time series with moving average
library(zoo)

clean_data <- clean_data %>%
  arrange(Date) %>%
  mutate(MA_7 = rollmean(Value, k = 7, fill = NA))

ggplot(clean_data, aes(x = Date)) +
  geom_line(aes(y = Value, color = "Original"), alpha = 0.5) +
  geom_line(aes(y = MA_7, color = "7-Day MA"), linewidth = 1) +
  scale_color_manual(values = c("Original" = "gray", "7-Day MA" = "blue")) +
  labs(title = "Time Series with Moving Average")

# Time series with a variability band (Value +/- 1 SD; not a formal confidence interval)
ggplot(clean_data, aes(x = Date, y = Value)) +
  geom_ribbon(aes(ymin = Value - sd(Value), ymax = Value + sd(Value)), 
              alpha = 0.3, fill = "blue") +
  geom_line(color = "darkblue", linewidth = 1) +
  labs(title = "Time Series with +/- 1 SD Band")
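
To see what the moving average in the time series example computes, here is a small base R sketch of a centered moving average. It uses stats::filter() on a toy vector with a window of 3 instead of 7. The result should correspond to zoo::rollmean(x, k = 3, fill = NA) under its default centered alignment, though that equivalence is stated here as an expectation, not verified against zoo.

```r
# Toy series and window size (both chosen for illustration)
x <- c(2, 4, 6, 8, 10, 12, 14)
k <- 3

# Centered moving average: each output is the mean of the point and its neighbors.
# sides = 2 centers the window; the ends have no full window and become NA.
ma <- as.numeric(stats::filter(x, rep(1 / k, k), sides = 2))

print(ma)
```

The NA values at both ends explain why the smoothed line in the plot starts later and ends earlier than the original series.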

Faceted Plots

# Facet wrap
ggplot(clean_data, aes(x = Variable1, y = Variable2, color = Subcategory)) +
  geom_point(size = 2) +
  facet_wrap(~Category, scales = "free") +
  labs(title = "Faceted Scatter Plots") +
  theme_minimal()

# Facet grid
ggplot(clean_data, aes(x = Date, y = Value)) +
  geom_line() +
  facet_grid(Category1 ~ Category2) +
  labs(title = "Grid Faceted Time Series") +
  theme_minimal()

Density Plot

# Basic density plot
ggplot(clean_data, aes(x = Value, fill = Category)) +
  geom_density(alpha = 0.5) +
  labs(title = "Density Plot by Category") +
  theme_minimal()

# 2D density plot
ggplot(clean_data, aes(x = Variable1, y = Variable2)) +
  geom_density_2d() +
  labs(title = "2D Density Plot") +
  theme_minimal()

# 2D density with fill
ggplot(clean_data, aes(x = Variable1, y = Variable2)) +
  stat_density_2d(aes(fill = after_stat(level)), geom = "polygon") +
  scale_fill_viridis_c() +
  labs(title = "2D Density Plot with Color")

Customizing Plots

Themes and Styling

# Apply different themes
base_plot <- ggplot(clean_data, aes(x = Variable1, y = Variable2, color = Category)) +
  geom_point(size = 3)

# Minimal theme
base_plot + theme_minimal()

# Black and white theme
base_plot + theme_bw()

# Classic theme
base_plot + theme_classic()

# Dark theme
base_plot + theme_dark()

# Custom theme
custom_theme <- function() {
  theme(
    plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
    plot.subtitle = element_text(size = 12, hjust = 0.5, color = "gray40"),
    axis.title = element_text(size = 12, face = "bold"),
    axis.text = element_text(size = 10),
    legend.position = "right",
    panel.background = element_rect(fill = "white"),
    panel.grid.major = element_line(color = "gray90"),
    panel.grid.minor = element_blank()
  )
}

base_plot + custom_theme()

Color Palettes

# Using built-in palettes
ggplot(clean_data, aes(x = Category, fill = Category)) +
  geom_bar() +
  scale_fill_brewer(palette = "Set1") +
  labs(title = "Brewer Palette - Set1")

ggplot(clean_data, aes(x = Category, fill = Category)) +
  geom_bar() +
  scale_fill_brewer(palette = "Pastel2") +
  labs(title = "Brewer Palette - Pastel2")

# Viridis palette
ggplot(clean_data, aes(x = Variable1, y = Variable2, color = Value)) +
  geom_point(size = 3) +
  scale_color_viridis_c() +
  labs(title = "Viridis Color Scale")

# Custom color palette
custom_colors <- c("A" = "#FF6B6B", "B" = "#4ECDC4", "C" = "#45B7D1", "D" = "#FFA07A")

ggplot(clean_data, aes(x = Category, fill = Category)) +
  geom_bar() +
  scale_fill_manual(values = custom_colors) +
  labs(title = "Custom Color Palette")

Labels and Annotations

# Add text annotations
ggplot(clean_data, aes(x = Variable1, y = Variable2)) +
  geom_point(aes(color = Category), size = 3) +
  geom_text(aes(label = ID), nudge_y = 0.05, size = 3) +
  labs(title = "Plot with Text Labels")

# Add statistical annotations
mean_val <- mean(clean_data$Value)

ggplot(clean_data, aes(x = Date, y = Value)) +
  geom_line() +
  geom_hline(yintercept = mean_val, linetype = "dashed", color = "red") +
  annotate("text", x = min(clean_data$Date), y = mean_val + 5, 
           label = paste("Mean:", round(mean_val, 2)), hjust = 0) +
  labs(title = "Time Series with Annotations")

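# Add shaded regions (annotate() draws the rectangle once; geom_rect() with
# constant aesthetics would redraw it for every row and cancel out the alpha)
ggplot(clean_data, aes(x = Date, y = Value)) +
  annotate("rect", xmin = as.Date("2023-01-01"), xmax = as.Date("2023-03-31"),
           ymin = -Inf, ymax = Inf, fill = "yellow", alpha = 0.2) +
  geom_line() +
  labs(title = "Plot with Highlighted Region")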

Advanced Customization

# Detailed customization
ggplot(clean_data, aes(x = Variable1, y = Variable2)) +
  geom_point(aes(color = Category, size = Value), alpha = 0.6) +
  scale_size_continuous(range = c(2, 8)) +
  scale_x_continuous(breaks = seq(0, 100, by = 20), limits = c(0, 100)) +
  scale_y_continuous(trans = "log10") +
  labs(
    title = "Advanced Customization",
    subtitle = "Multiple customization techniques",
    x = "Variable 1 (custom scale)",
    y = "Variable 2 (log scale)",
    color = "Category",
    size = "Value"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 10, color = "gray50"),
    legend.position = "bottom",
    legend.box = "horizontal"
  )

Interactive Visualizations

Plotly

library(plotly)

# Convert ggplot to interactive
p <- ggplot(clean_data, aes(x = Variable1, y = Variable2, color = Category)) +
  geom_point(size = 3)
ggplotly(p)

# Direct plotly creation
plot_ly(clean_data, x = ~Variable1, y = ~Variable2, color = ~Category,
        type = 'scatter', mode = 'markers',
        marker = list(size = 8)) %>%
  layout(title = "Interactive Scatter Plot",
         xaxis = list(title = "Variable 1"),
         yaxis = list(title = "Variable 2"))

# Interactive line plot with hover info
plot_ly(clean_data, x = ~Date, y = ~Value, color = ~Category,
        type = 'scatter', mode = 'lines+markers',
        hovertemplate = '<b>%{fullData.name}</b><br>Date: %{x}<br>Value: %{y}<extra></extra>') %>%
  layout(title = "Interactive Time Series",
         xaxis = list(title = "Date"),
         yaxis = list(title = "Value"))

# Interactive bar chart
plot_ly(data = summary_stats, x = ~Category, y = ~mean_value, type = 'bar',
        marker = list(color = ~mean_value, colorscale = 'Viridis')) %>%
  layout(title = "Interactive Bar Chart",
         yaxis = list(title = "Mean Value"))

Shiny Apps

library(shiny)

# Basic Shiny app
ui <- fluidPage(
  titlePanel("Interactive Data Explorer"),
  sidebarLayout(
    sidebarPanel(
      sliderInput("bins", "Number of bins:", min = 5, max = 50, value = 30),
      selectInput("category", "Select Category:", 
                  choices = c("All", as.character(unique(clean_data$Category)))),
      br(),
      actionButton("reset", "Reset Filters")
    ),
    mainPanel(
      tabsetPanel(
        tabPanel("Histogram", plotOutput("histPlot")),
        tabPanel("Summary", tableOutput("summary"))
      )
    )
  )
)

server <- function(input, output, session) {
  filtered_data <- reactive({
    if (input$category == "All") {
      clean_data
    } else {
      clean_data %>% filter(Category == input$category)
    }
  })
  
  output$histPlot <- renderPlot({
    hist(filtered_data()$Value, 
         breaks = input$bins, 
         col = 'skyblue', 
         border = 'white',
         main = paste("Histogram -", input$category),
         xlab = "Value")
  })
  
  output$summary <- renderTable({
    filtered_data() %>%
      summarise(
        Mean = mean(Value, na.rm = TRUE),
        Median = median(Value, na.rm = TRUE),
        SD = sd(Value, na.rm = TRUE),
        Count = n()
      )
  })
  
  observeEvent(input$reset, {
    updateSliderInput(session, "bins", value = 30)
    updateSelectInput(session, "category", selected = "All")
  })
}

shinyApp(ui = ui, server = server)

# Advanced Shiny with Multiple Visualizations
advanced_ui <- fluidPage(
  theme = shinythemes::shinytheme("flatly"),
  
  titlePanel("Advanced Data Dashboard"),
  
  sidebarLayout(
    sidebarPanel(
      dateRangeInput("daterange", "Date range:", 
                     start = min(clean_data$Date),
                     end = max(clean_data$Date)),
      checkboxGroupInput("categories", "Categories:",
                         choices = unique(clean_data$Category),
                         selected = unique(clean_data$Category)),
      hr(),
      downloadButton("downloadData", "Download Data")
    ),
    
    mainPanel(
      tabsetPanel(
        tabPanel("Overview",
                 h4("Key Statistics"),
                 tableOutput("stats")),
        tabPanel("Visualizations",
                 plotOutput("scatterPlot"),
                 plotOutput("timeSeriesPlot")),
        tabPanel("Data Table",
                 DT::dataTableOutput("dataTable"))
      )
    )
  )
)

advanced_server <- function(input, output, session) {
  filtered <- reactive({
    clean_data %>%
      filter(Date >= input$daterange[1],
             Date <= input$daterange[2],
             Category %in% input$categories)
  })
  
  output$stats <- renderTable({
    filtered() %>%
      group_by(Category) %>%
      summarise(Mean = mean(Value), SD = sd(Value), Count = n())
  })
  
  output$scatterPlot <- renderPlot({
    ggplot(filtered(), aes(x = Variable1, y = Variable2, color = Category)) +
      geom_point(size = 3) +
      theme_minimal()
  })
  
  output$timeSeriesPlot <- renderPlot({
    ggplot(filtered(), aes(x = Date, y = Value, color = Category)) +
      geom_line() +
      facet_wrap(~Category) +
      theme_minimal()
  })
  
  output$dataTable <- DT::renderDataTable({
    filtered()
  })
  
  output$downloadData <- downloadHandler(
    filename = "filtered_data.csv",
    content = function(file) {
      write.csv(filtered(), file, row.names = FALSE)
    }
  )
}

shinyApp(ui = advanced_ui, server = advanced_server)
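
The category filter inside the basic app's reactive() can be factored into a plain function, which makes the logic easy to check without launching the app. The sketch below is one possible way to do that; filter_by_category and the demo data frame are hypothetical names introduced for illustration.

```r
# Hypothetical helper mirroring the reactive filter in the basic app:
# "All" returns every row, any other value keeps only the matching category
filter_by_category <- function(df, category) {
  if (identical(category, "All")) {
    return(df)
  }
  df[df$Category == category, , drop = FALSE]
}

# Small demo data frame (invented values)
demo <- data.frame(
  Category = c("A", "B", "A", "C"),
  Value = c(1.5, 2.0, 3.5, 4.0)
)

nrow(filter_by_category(demo, "All"))  # all four rows
nrow(filter_by_category(demo, "A"))    # the two rows in category A
```

Inside the server function, the reactive could then call filter_by_category(clean_data, input$category), keeping the Shiny code thin.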

Statistical Visualizations

Statistical Plots with ggstatsplot

library(ggstatsplot)

# Summary statistics plot
ggstatsplot::ggbetweenstats(
  data = clean_data,
  x = Category,
  y = Value,
  title = "Between-group Comparisons"
)

# Correlation plot
ggstatsplot::ggscatterstats(
  data = clean_data,
  x = Variable1,
  y = Variable2,
  title = "Correlation Analysis"
)

# Distribution plot
ggstatsplot::gghistostats(
  data = clean_data,
  x = Value,
  title = "Distribution Analysis"
)

Confidence Intervals and Error Bars

# Summary with confidence intervals
summary_ci <- clean_data %>%
  group_by(Category) %>%
  summarise(
    mean = mean(Value),
    se = sd(Value) / sqrt(n()),
    ci_lower = mean - 1.96 * se,
    ci_upper = mean + 1.96 * se,
    .groups = 'drop'
  )

# Plot with error bars
ggplot(summary_ci, aes(x = Category, y = mean, fill = Category)) +
  geom_col() +
  geom_errorbar(aes(ymin = ci_lower, ymax = ci_upper), width = 0.2) +
  labs(title = "Mean with 95% Confidence Intervals") +
  theme_minimal()
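
The interval above uses the normal critical value 1.96, which is a reasonable approximation for large groups. For small groups, the t distribution gives a somewhat wider interval. The base R sketch below compares the two on the built-in iris dataset, used here as a stand-in for clean_data; the column names in ci_table are assumptions made for illustration.

```r
# Split one numeric column by group (iris stands in for clean_data)
groups <- split(iris$Sepal.Length, iris$Species)

ci_table <- do.call(rbind, lapply(names(groups), function(g) {
  v <- groups[[g]]
  n <- length(v)
  se <- sd(v) / sqrt(n)             # standard error of the mean
  t_crit <- qt(0.975, df = n - 1)   # two-sided 95% critical value from the t distribution
  data.frame(
    group = g,
    n = n,
    mean = mean(v),
    z_lower = mean(v) - 1.96 * se,
    z_upper = mean(v) + 1.96 * se,
    t_lower = mean(v) - t_crit * se,
    t_upper = mean(v) + t_crit * se
  )
}))

print(ci_table)
```

With 50 observations per group the two intervals are close; the gap grows as group sizes shrink.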

Best Practices

  • ✅ Always clean and explore your data first using summary(), head(), and str().
  • ✅ Choose the right type of plot for your data and audience - consider data types and relationships.
  • ✅ Label axes and legends clearly with descriptive titles and units.
  • ✅ Use color and size strategically to highlight key information without overwhelming viewers.
  • ✅ Avoid clutter and unnecessary decorations - less is often more.
  • ✅ Test your visualizations with different screen sizes for responsiveness.
  • ✅ Document your code with comments explaining complex visualizations.
  • ✅ Use consistent color schemes across related plots for easier comparison.
  • ✅ Include data sources and creation dates in your visualizations.
  • ✅ Consider accessibility - use colorblind-friendly palettes when possible.
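
As one way to follow the accessibility advice above, base R (version 4.0 or later) includes the Okabe-Ito palette, which was designed with color vision deficiency in mind. The sketch below extracts four colors and, if ggplot2 is installed, applies them to a bar chart of the built-in mtcars data. The dataset and the variable name okabe_ito are assumptions made for illustration.

```r
# Take the first four Okabe-Ito colors; unname() so scale_fill_manual()
# assigns them by position, not by color name
okabe_ito <- unname(palette.colors(n = 4, palette = "Okabe-Ito"))
print(okabe_ito)

# Plot only if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(mtcars, aes(x = factor(cyl), fill = factor(cyl))) +
    geom_bar() +
    scale_fill_manual(values = okabe_ito) +
    labs(title = "Okabe-Ito Palette", x = "Cylinders", y = "Count", fill = "Cylinders") +
    theme_minimal()
  print(p)
}
```

The viridis scales used elsewhere in this guide (scale_fill_viridis_c(), scale_color_viridis_c()) are another option for continuous data.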

Sample Projects


References


Happy visualizing! 🎉 Feel free to explore, modify, and build upon these examples in your own projects.
