## Hands-on Virtual Workshop on R for Data Science & Machine Learning

### We will cover the following sections

Probability Distributions & Inferential Statistics

in the Oct – Dec 2021 session (Batch-2)

## Section Title: Probability Distributions

#### Day-1: Environment Setup

- Difference between R and RStudio
  - What is R?
  - What is RStudio?
- Why should you use RStudio?
- Uninstalling the older version of R
- Why should you install R before installing RStudio?
- Uninstalling the older version of RStudio
- Downloading R from https://www.r-project.org/
- Installing R
- Downloading RStudio from https://www.rstudio.com/
- Installing RStudio
- Introducing a cloud drive for uploading classwork & homework
- Solving learners' problems
- Saving the classwork on Cloud Drive
- Collecting the classwork & homework links
- Assigning the handout of the day
- Recording the attendance of the learners

#### Day-2: An Introduction to the For Loop

- Understanding the for loop
- Vectors and variables in R
- General Concept of the For Loop
- Storing the values produced by a for loop

- Collecting the Day-1 handout & Assigning the Day-2 handout
- Solving learners' problems
- Checking the classwork/homework of the learners
- Taking the attendance of the learners
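A minimal sketch of storing the values produced by a for loop. The loop over `1:5` and the object name `squares` are illustrative choices:

```r
# Preallocate a numeric vector with one slot per iteration
squares <- numeric(5)
for (i in 1:5) {
  squares[i] <- i^2  # store the result of iteration i at position i
}
squares  # 1 4 9 16 25
```

Preallocating the result vector with `numeric()` avoids growing it inside the loop, which becomes slow for long loops.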

#### Day-3: Central limit theorem demonstration using R (Part-1)

- What is a uniform distribution?
- Introduction to the Central Limit Theorem
- Creating a uniform distribution of 10000 numbers
- Creating a vector containing a uniform distribution of 10000 numbers
- Creating a histogram for the 10000 numbers
- Calculating mean & standard deviation for the numbers
- Understanding standard deviation
- Understanding standard error of the mean
- Taking a specific number of samples from the population
- Calculating the mean of the samples
- Generating 1000 sample means using the “For Loop”
- Creating a histogram for the 1000 sample means
- Collecting the Day-2 handout & Assigning the Day-3 handout
- Solving learners' problems
- Checking the classwork/homework of the learners
- Taking the attendance of the learners
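The Day-3 demonstration could be sketched in R roughly as follows. The seed, the sample size `n = 30`, and the object names are assumptions for illustration:

```r
set.seed(42)  # assumption: a fixed seed so the results are reproducible

# Population: 10000 numbers drawn from a uniform distribution on [0, 1]
population <- runif(10000, min = 0, max = 1)
hist(population, main = "Uniform population", xlab = "Value")
pop_mean <- mean(population)  # about 0.5
pop_sd <- sd(population)      # about 0.289, i.e. 1 / sqrt(12)

# Draw 1000 samples of size n and store each sample mean
n <- 30  # assumed sample size
sample_means <- numeric(1000)
for (i in 1:1000) {
  sample_means[i] <- mean(sample(population, size = n))
}
hist(sample_means, main = "1000 sample means (n = 30)", xlab = "Sample mean")

# The spread of the sample means approximates the standard error of the mean
sd(sample_means)
pop_sd / sqrt(n)
```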

#### Day-4: Central limit theorem demonstration using R (Part-2)

- Explaining the central limit theorem by gradually increasing the sample size
- Plotting a 2-by-2 grid of histograms to summarize the results
- Adding a density curve over the histogram
- Collecting the Day-3 handout & Assigning the Day-4 handout
- Solving learners' problems
- Checking the classwork/homework of the learners
- Taking the attendance of the learners
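A possible sketch of the Day-4 summary plot, assuming sample sizes of 1, 5, 10, and 30 (chosen only for illustration). As `n` grows, the histograms of sample means narrow and look increasingly bell-shaped:

```r
set.seed(1)  # assumption: fixed seed for reproducibility
population <- runif(10000)
sizes <- c(1, 5, 10, 30)  # assumed sample sizes, one per panel
all_means <- list()

old_par <- par(mfrow = c(2, 2))  # 2 x 2 grid of plots
for (n in sizes) {
  means <- numeric(1000)
  for (i in 1:1000) {
    means[i] <- mean(sample(population, size = n))
  }
  all_means[[as.character(n)]] <- means
  # freq = FALSE puts the histogram on the density scale so the curve fits
  hist(means, freq = FALSE, main = paste("n =", n), xlab = "Sample mean")
  lines(density(means), col = "red", lwd = 2)
}
par(old_par)  # restore the previous plot layout
```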

#### Day-5: Some Important Concepts of a Normal Probability Distribution

- Some important characteristics of the normal probability distribution curve
- How is a normal distribution curve defined by its mean and standard deviation?
- Some more properties of a normal probability distribution
- Equation for the normal distribution curve
- What is a Z score?
- Introducing the Z table for standard normal distribution
- Collecting the Day-4 handout & Assigning the Day-5 handout
- Solving learners' problems
- Checking the classwork/homework of the learners
- Taking the attendance of the learners
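A small sketch that checks the normal curve equation against R's `dnorm()` and computes a Z score. The observation, mean, and standard deviation are hypothetical values:

```r
# Hypothetical values: an observation of 85 from a normal distribution
# with mean 70 and standard deviation 10
x <- 85
mu <- 70
sigma <- 10

# Normal density from its equation: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi))
f_manual <- exp(-(x - mu)^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi))
f_builtin <- dnorm(x, mean = mu, sd = sigma)  # same value from the built-in function

# Z score: how many standard deviations x lies above the mean
z <- (x - mu) / sigma  # 1.5

# Area to the left of z, the value a standard normal Z table lists
pnorm(z)  # about 0.9332
```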

#### Day-6: Four Important Functions for Normal Probability Distribution

- Understanding the function rnorm() and its R implementation
- Understanding the function pnorm() and its R implementation
- Understanding the function qnorm() and its R implementation
- Understanding the function dnorm() and its R implementation
- Plotting normal distribution using R functions
- Saving the classwork
- Collecting the Day-5 handout & Assigning the Day-6 handout
- Solving learners' problems
- Checking the classwork/homework of the learners
- Taking the attendance of the learners
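A minimal sketch of the four functions, using an assumed normal distribution with mean 100 and standard deviation 15:

```r
set.seed(123)  # assumption: fixed seed for reproducibility

# rnorm(): random draws from N(mean = 100, sd = 15)
draws <- rnorm(5, mean = 100, sd = 15)

# pnorm(): cumulative probability P(X <= 115)
p <- pnorm(115, mean = 100, sd = 15)  # about 0.841

# qnorm(): the inverse of pnorm(), here the 97.5th percentile of N(0, 1)
q <- qnorm(0.975)  # about 1.96

# dnorm(): height of the density curve, here the standard normal at 0
d <- dnorm(0)  # about 0.399

# Plotting a normal curve with dnorm()
curve(dnorm(x, mean = 100, sd = 15), from = 40, to = 160,
      xlab = "x", ylab = "Density", main = "N(100, 15)")
```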

#### Day-7: Binomial Probability Distribution (Theory Part)

- Difference between continuous and discrete data
- Probability distribution for the discrete data
- Properties of binomial distribution
- Probability distribution of flipping two coins at the same time
- Formula for binomial distribution
- What is the probability of getting 1 head when you flip a coin 10 times?

- Collecting the Day-6 handout & Assigning the Day-7 handout
- Solving learners' problems
- Checking the classwork/homework of the learners
- Taking the attendance of the learners
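The binomial formula, P(X = k) = C(n, k) p^k (1 - p)^(n - k), can be checked in R for both coin examples above. This sketch assumes a fair coin (p = 0.5):

```r
# Probability distribution for the number of heads when flipping two fair coins
two_coins <- dbinom(0:2, size = 2, prob = 0.5)  # 0.25 0.50 0.25

# Binomial formula for exactly 1 head in 10 flips
n <- 10   # number of flips
k <- 1    # number of heads
p <- 0.5  # probability of heads on one flip (fair coin assumed)
manual <- choose(n, k) * p^k * (1 - p)^(n - k)  # 10 / 1024 = 0.009765625

# R's built-in binomial probability function gives the same value
builtin <- dbinom(k, size = n, prob = p)
```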

#### Day-8: Introducing the “Visualize” Package for the Normal Probability Distribution

- What is an R package?
- Installation of the “visualize” package
- Importing a package
- Understanding the “visualize” package for the standard normal distribution
- Understanding the “section” argument for the standard normal distribution
- Understanding the “visualize” package for a different normal distribution
- Understanding the “section” argument for a different normal distribution
- Collecting the Day-7 handout & Assigning the Day-8 handout
- Solving learners' problems
- Checking the classwork/homework of the learners
- Taking the attendance of the learners

#### Day-9: Four Important Functions for Binomial Probability Distribution

- Understanding the function rbinom() and its R implementation
- Understanding the function pbinom() and its R implementation
- Collecting the Day-8 handout & Assigning the Day-9 handout
- Solving learners' problems
- Checking the classwork/homework of the learners
- Taking the attendance of the learners
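A minimal sketch of `rbinom()` and `pbinom()` for 10 flips of a coin; the fair coin and the seed are assumptions:

```r
set.seed(2021)  # assumption: fixed seed for reproducibility

# rbinom(): 5 simulated experiments, each counting heads in 10 fair flips
heads <- rbinom(5, size = 10, prob = 0.5)

# pbinom(): cumulative probability of at most 3 heads in 10 flips
p_at_most_3 <- pbinom(3, size = 10, prob = 0.5)  # 176 / 1024 = 0.171875

# The same value, summing the individual probabilities from dbinom()
sum(dbinom(0:3, size = 10, prob = 0.5))
```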

*Next Topics*

- Binomial Distribution in R
- Poisson Distribution in R
- t Distribution and t Scores in R

## Section Title: Inferential Statistics

- One-Sample t-Test in R
- Two-Sample t-Test in R
- Mann-Whitney U Test (aka Wilcoxon Rank-Sum Test) in R
- Bootstrap Hypothesis Testing in R
- Bootstrap Confidence Interval with R
- Permutation Hypothesis Test in R
- Paired t-Test in R
- Wilcoxon Signed-Rank Test in R
- ANOVA, Multiple Comparisons & Kruskal-Wallis in R
- Chi-Square Test, Fisher's Exact Test, and Cross Tabulations in R
- Calculate Odds Ratio and Relative Risk in R
- Correlation and Covariance in R

## Section Title: R Programming Basics

*Day – 1: Environment Setup*

- Difference between R and RStudio
  - What is R?
  - What is RStudio?
- Why should you use RStudio?
- Uninstalling the older version of R
- Why should you install R before installing RStudio?
- Uninstalling the older version of RStudio
- Downloading R from https://www.r-project.org/
- Installing R
- Downloading RStudio from https://www.rstudio.com/
- Installing RStudio
- Issuing the handout
- Uploading the classwork on Cloud Drive

*Day – 2: Basic Arithmetic Functions & Coding*

- Assigning a value to an object in R
- Printing a value of an object in R
- Case sensitivity
- Overwriting the value of an object
- R workspace memory
- Observing the list of the objects using the ls() command
- Removing an object from the workspace memory using the rm() command
- Object naming rules: https://www.w3schools.com/r/r_variables.asp
- Assigning character values to objects
- Performing arithmetic operations in R: addition, subtraction, multiplication, division, squares, square roots, exponents, and logarithms (natural and other bases)
- Calculating absolute value using abs() command
- Incomplete commands in R
- Accessing the previously entered commands using the Arrow Keys
- Writing notes or comments in R
- Uploading the classwork on Cloud Drive
- Assigning the Day-2 handout
- Solving learners' problems
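A short sketch of the Day-2 commands; the object names and values are illustrative only:

```r
age <- 25           # assign a value to an object
age                 # print it
Age <- 40           # R is case sensitive: Age and age are different objects
age <- 26           # overwrite the value of age
greeting <- "hello" # assign a character value
ls()                # list the objects in the workspace
rm(Age)             # remove an object from the workspace

x <- 16
x + 4; x - 4; x * 4; x / 4  # 20 12 64 4
x^2                 # 256
sqrt(x)             # 4
log(x)              # natural logarithm
log(x, base = 2)    # 4
log10(1000)         # 3
exp(1)              # e, about 2.718
abs(-7)             # 7
```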

*Day – 3: Creating Vectors, Matrices, & Performing Some Simple Operations on Them*

- Clearing the console of RStudio
- Creating a vector of numbers using the c() command
- Creating a vector of character elements using the c() command
- Creating a sequence of integer values
- Creating a sequence of integer/noninteger values using the seq() command
- Creating a vector of repeated numbers or characters using the rep() command
- Repeating a sequence of integer values multiple times
- Repeating a sequence of noninteger values multiple times
- Repeating a sequence of characters multiple times
- Adding/subtracting/multiplying/dividing a value to each element of a vector
- Extracting elements from a vector
- Creating a matrix of values
- Storing a matrix in an object
- Extracting elements from a matrix
- Adding/subtracting/multiplying/dividing a value to each element of a matrix
- Saving all the console inputs in a text file
- Uploading the classwork on Cloud Drive
- Assigning the Day-3 handout
- Solving learners' problems
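A minimal sketch of the vector and matrix operations above, with illustrative values:

```r
v <- c(3, 6, 9)                  # numeric vector
fruits <- c("apple", "banana")   # character vector
1:5                              # integer sequence: 1 2 3 4 5
seq(0, 1, by = 0.25)             # 0.00 0.25 0.50 0.75 1.00
rep(7, times = 3)                # 7 7 7
rep(1:3, times = 2)              # 1 2 3 1 2 3
rep(c(0.5, 1.5), each = 2)       # 0.5 0.5 1.5 1.5
rep(fruits, times = 2)           # "apple" "banana" "apple" "banana"
v * 2                            # 6 12 18
v[2]                             # 6
v[c(1, 3)]                       # 3 9

m <- matrix(1:6, nrow = 2)       # filled column by column
m[2, 3]                          # 6: row 2, column 3
m[1, ]                           # first row: 1 3 5
m + 10                           # adds 10 to every element
```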

*Day – 4: Importing & Copying Data from Excel to R*

- Downloading the necessary resources
- Importing a CSV file to RStudio
  - Using the read.csv() command
  - Using the read.table() command
- Importing a tab-delimited text file to RStudio
  - Using the read.delim() command
  - Using the read.table() command
- Importing Excel data (both .xls and .xlsx formats) into R with the readxl package
- Solving learners' problems
- Uploading the classwork on Cloud Drive
- Assigning the Day-4 handout
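A self-contained sketch of the import functions. It writes small temporary files first, so the file contents are hypothetical; the readxl lines are commented out because they need the package and a real Excel file:

```r
# Write a small example CSV to a temporary file so the sketch runs anywhere
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, score = c(90, 85, 77)), csv_path, row.names = FALSE)

df_csv <- read.csv(csv_path)                               # comma-separated
df_csv2 <- read.table(csv_path, header = TRUE, sep = ",")  # same file via read.table()

# The same data as a tab-delimited text file
txt_path <- tempfile(fileext = ".txt")
write.table(df_csv, txt_path, sep = "\t", row.names = FALSE, quote = FALSE)

df_txt <- read.delim(txt_path)                              # tab-delimited
df_txt2 <- read.table(txt_path, header = TRUE, sep = "\t")  # same file via read.table()

# Excel files need the readxl package (hypothetical file name):
# library(readxl)
# df_xlsx <- read_excel("my_data.xlsx")
```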

*Day – 5: Checking the Imported Data & Working with Variables*

- Downloading the necessary resources
- Importing a dataset
- Understanding the dataset
- Checking the dimensions of the dataset using the dim() command
- Observing the first 6 rows using the head() command
- Observing the last 6 rows using the tail() command
- Observing the other rows of the dataset using square brackets
- Observing the variable names using the names() command
- Extracting a variable from a dataset
- Attaching a dataset to the R search path using the attach() command
- Detaching a dataset using the detach() command
- Checking the variable type using the class() command
- Observing the categories of a variable using the levels() command
- Converting a character type variable to a factor type variable using the as.factor() command
- Observing the general summary of the dataset using the summary() command
- Changing the data type of a variable while importing a dataset
- Uploading the classwork on Cloud Drive
- Assigning the Day-5 handout
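A sketch of these checks using R's built-in `iris` dataset as a stand-in for the class dataset (an assumption; the workshop files may differ). The last lines use a small inline CSV to show a type being set at import time:

```r
data(iris)                       # built-in example dataset
dim(iris)                        # 150 rows, 5 columns
head(iris)                       # first 6 rows
tail(iris)                       # last 6 rows
iris[51:55, ]                    # other rows, using square brackets
names(iris)                      # variable names
petal_len <- iris$Petal.Length   # extract one variable
class(iris$Species)              # "factor"
levels(iris$Species)             # "setosa" "versicolor" "virginica"
summary(iris)

# Converting a character variable to a factor
sex <- c("M", "F", "F", "M")
class(sex)                       # "character"
sex <- as.factor(sex)
levels(sex)                      # "F" "M" (levels are sorted alphabetically)

# Setting the data type while importing, via colClasses (hypothetical data)
imported <- read.csv(text = "id,group\n1,a\n2,b",
                     colClasses = c("integer", "factor"))
class(imported$group)            # "factor"
```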

*Day – 6: Subsetting Data Based on Conditions & Logical Statements*

- Downloading the necessary resources
- Observing the number of observations in an object or variable using the length() command
- Subsetting data using square brackets for a single variable
- Subsetting one variable based on another variable (e.g., calculating the mean age for males or females only)
- Creating an object for specific categories
- Creating an object by subsetting data on two variables (e.g., a data frame of females/males over 15 years old)
- Creating a logical vector or variable
- Converting a logical vector to 0/1 values using the as.numeric() command
- Creating a logical vector for multiple conditions
- Appending a logical vector as a new column of the original dataset using the cbind() command
- Uploading the classwork on Cloud Drive
- Assigning the Day-6 handout
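A minimal sketch of subsetting and logical vectors, using a small hypothetical data frame:

```r
# Hypothetical data on five people
people <- data.frame(
  age = c(12, 34, 18, 45, 16),
  sex = c("F", "M", "F", "M", "M"),
  stringsAsFactors = FALSE
)

length(people$age)                          # 5 observations
people$age[people$age > 15]                 # ages above 15: 34 18 45 16
mean(people$age[people$sex == "F"])         # mean age of females: (12 + 18) / 2 = 15
females <- people[people$sex == "F", ]      # object holding one category
over15_f <- people[people$age > 15 & people$sex == "F", ]  # two conditions at once

is_adult <- people$age >= 18                # logical: FALSE TRUE TRUE TRUE FALSE
as.numeric(is_adult)                        # 0 1 1 1 0
older_male <- people$age > 30 & people$sex == "M"  # multiple conditions
people <- cbind(people, is_adult)           # attach as a new column
```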

*Day – 7: Setting Up a Working Directory*

- Downloading the necessary resources
- Observing the current working directory using the getwd() command
- Changing the current working directory using the setwd() command
- Changing the current working directory from the RStudio menu
- Saving the current workspace using the save.image() command
- Saving the current workspace from the RStudio menu
- Clearing the workspace from the RStudio menu
- Loading the previous workspace image using the load() command
- Loading the previous workspace image using the RStudio menu
- Uploading the classwork on Cloud Drive
- Assigning the Day-7 handout
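A sketch of the workspace commands. It switches to a temporary folder so no real files are touched; the file name `my_workspace.RData` is hypothetical:

```r
old_wd <- getwd()            # remember the current working directory
setwd(tempdir())             # assumption: a temporary folder, for illustration

x <- 1:10
save.image(file = "my_workspace.RData")  # save every object in the workspace
rm(x)                        # x is no longer in the workspace
load("my_workspace.RData")   # restores the saved objects, including x

setwd(old_wd)                # return to the original working directory
```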

*Day – 8: History, Scripts, & Installing Packages*

- Downloading the necessary resources
- Loading history from an existing file
- Sending commands from history to console
- Sending commands from history to script
- Removing the selected history entries
- Clearing all history entries
- Creating, opening, and saving R scripts
- Running commands from R script
- Installing a new package using the install.packages() command
- Removing a package using the remove.packages() command
- Installing/Removing packages from the RStudio menu
- Uploading the classwork on Cloud Drive
- Assigning the Day-8 handout
- Solving learners' problems

*Day – 9: Customizing the Look of RStudio & Introducing the apply() Function*

- Downloading the necessary resources
- Changing the default working directory
- Changing the appearance of RStudio
- Customizing the pane layout
- Changing the primary CRAN repository
- Introducing the apply() function
- Uploading the classwork on Cloud Drive
- Assigning the Day-9 handout
- Solving learners' problems

*Day – 10: More with the apply() Function*

- Downloading the necessary resources
- Calculating percentiles for each column
- Plotting each column as a line
- Calculating the sum of each row
- Calculating the sum of each row using the rowSums() command
- Plotting the market value of each day
- Adding some nice colored points to the plots
- Uploading the classwork on Cloud Drive
- Assigning the Day-10 handout
- Solving learners' problems
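A minimal sketch of `apply()` and `rowSums()` on a small hypothetical matrix of daily prices (rows are days, columns are stocks):

```r
# Hypothetical prices: 3 days (rows) x 2 stocks (columns), filled column by column
prices <- matrix(c(10, 12, 11,
                   20, 19, 21), nrow = 3,
                 dimnames = list(NULL, c("stock_a", "stock_b")))

col_means <- apply(prices, 2, mean)                # MARGIN = 2: mean of each column
apply(prices, 2, quantile, probs = c(0.25, 0.75))  # percentiles for each column
daily_total <- apply(prices, 1, sum)               # MARGIN = 1: sum of each row -> 30 31 32
rowSums(prices)                                    # same row sums, computed more efficiently

# Plot each day's total market value, then add colored points
plot(daily_total, type = "l", xlab = "Day", ylab = "Total value")
points(daily_total, col = "darkorange", pch = 19, cex = 1.5)
```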

*Day – 11: tapply() Function*

- Downloading the necessary resources
- How to use the tapply() function
- How to use the tapply() function on subsets of a variable or vector
- Using the simplify argument in the tapply() function
- Including the summary function in the tapply() function
- Applying the quantile function in the tapply() function
- Passing a list of factors to the INDEX argument of the tapply() function
- Uploading the classwork on Cloud Drive
- Assigning the Day-11 handout
- Solving learners' problems
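A minimal sketch of `tapply()` with hypothetical scores grouped by one or two factors:

```r
scores <- c(80, 90, 70, 60, 85, 75)               # hypothetical test scores
group <- factor(c("A", "A", "B", "B", "C", "C"))  # group of each score
sex <- factor(c("F", "M", "F", "M", "F", "M"))    # second grouping factor

group_means <- tapply(scores, group, mean)        # A 85, B 65, C 80
tapply(scores, group, summary)                    # a list of summaries, one per group
tapply(scores, group, quantile, probs = 0.9)      # extra arguments go to quantile()
tapply(scores, group, mean, simplify = FALSE)     # keep results as list elements
two_way <- tapply(scores, list(group, sex), mean) # list of factors: group x sex table
```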

*Day – 12: R Data Frames*

- Downloading the necessary resources
- Introduction to R Data Frames
- Learning the basics of data frames in R
- Extracting data from a data frame in R
- Overview of the operations you can perform on a data frame in R
- Data frame training exercise
- Uploading the classwork on Cloud Drive
- Assigning the handout
- Solving learners' problems
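A minimal sketch of common data frame operations with a hypothetical three-row data frame:

```r
students <- data.frame(
  name = c("Ann", "Bob", "Cy"),  # hypothetical data
  age = c(28, 35, 42),
  stringsAsFactors = FALSE
)
str(students)                          # structure: column names and types
nrow(students); ncol(students)         # 3 rows, 2 columns
students$age                           # one column as a vector
students[2, ]                          # second row
students[students$age > 30, "name"]    # names of people older than 30: "Bob" "Cy"
students$senior <- students$age > 40   # add a new logical column
students[order(-students$age), ]       # rows sorted by age, oldest first
```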

## Section Title: Data Visualization and Descriptive Statistics with R

*Day – 1: Making Barcharts & Piecharts*

- Downloading the necessary resources
- What is a bar chart?
- Creating a frequency table using the table() command
- Calculating the relative frequency of the categories of a categorical variable
- Producing a bar chart using the barplot() command
- Adding a title and labels to a bar chart
- Rotating the values of the y-axis using the las argument
- Changing the labels of the bars in a bar chart using the names.arg argument
- Rotating the bar chart horizontally using the horiz argument
- Producing a pie chart using the pie() command
- Adding a title to the pie chart using the main argument
- Adding a box around the pie chart using the box() command
- Uploading the classwork on Cloud Drive
- Assigning the Day-12 handout
- Solving learners' problems
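A minimal sketch of the bar and pie chart commands with hypothetical categorical data:

```r
# Hypothetical categorical data: favorite color of 6 respondents
fav_color <- c("red", "blue", "red", "green", "red", "blue")

counts <- table(fav_color)               # frequency table: blue 2, green 1, red 3
rel_freq <- counts / length(fav_color)   # relative frequencies

barplot(counts, main = "Favorite color", xlab = "Color", ylab = "Count",
        las = 1,                                # horizontal y-axis labels
        names.arg = c("Blue", "Green", "Red"))  # custom bar labels
barplot(counts, horiz = TRUE)                   # horizontal bars

pie(counts, main = "Favorite color")
box()                                           # box around the plot
```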

*Day – 2: Making Boxplots & Stratified Boxplots*

- Downloading the necessary resources
- Producing a boxplot using the boxplot() command
- Understanding the boxplot using the quantile() command
- Adding labels to the boxplot
- Setting limits to the y-axis of the boxplot using the ylim argument
- Rotating the values of the y-axis using the las argument
- Comparing groups of a categorical variable using boxplots
- Producing a boxplot for one or more groups of a categorical variable
- Creating stratified boxplots
- Uploading the classwork on Cloud Drive
- Assigning the Day-13 handout
- Solving learners' problems
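A sketch of single, grouped, and stratified boxplots using simulated data (the variables and values are hypothetical):

```r
set.seed(7)  # assumption: simulated data for illustration
demo <- data.frame(
  capacity = rnorm(60, mean = 8, sd = 2),
  smoker = rep(c("no", "yes"), times = 30),
  gender = rep(c("female", "female", "male", "male"), times = 15)
)

boxplot(demo$capacity, main = "Capacity", ylab = "Capacity",
        ylim = c(0, 16), las = 1)
quantile(demo$capacity)  # quartiles that roughly match the box edges and median

boxplot(capacity ~ smoker, data = demo)                    # compare groups
strat <- boxplot(capacity ~ smoker + gender, data = demo)  # stratified: 4 boxes
```

The box edges come from Tukey's hinges, so in small samples they can differ slightly from the `quantile()` quartiles.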

*Day – 3: Histograms in R*

- Downloading the necessary resources
- Understanding the hist() command from the help menu
- Producing a histogram using the hist() command
- Changing the histogram plot from the default values
- Converting the y-axis from frequency to probability density
- An alternative way of converting the y-axis from frequency to probability density
- Changing the y-axis limit using the ylim argument
- Changing the x-axis limit using the xlim argument
- Changing the bin width using the breaks argument
- Specifying the breakpoints using the breaks argument
- Using the seq() command in the breaks argument
- Changing the axis labels and title of the histogram
- Rotating the y-axis labels using the las argument
- Adding a density curve over the histogram using the lines() command
- Changing the color and thickness of the density curve using the col and lwd arguments respectively
- Uploading the classwork on Cloud Drive
- Assigning the Day-14 handout
- Solving learners' problems
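A sketch of the histogram options on simulated data (hypothetical values). `freq = FALSE` switches the y-axis to probability density; `probability = TRUE` is an equivalent alternative:

```r
set.seed(10)  # assumption: simulated data for illustration
x <- rnorm(500, mean = 50, sd = 10)

# Breakpoints every 5 units that cover the full range of the data
brks <- seq(floor(min(x)), ceiling(max(x)) + 5, by = 5)

h <- hist(x, freq = FALSE,               # y-axis as probability density
          breaks = brks,
          xlim = c(10, 90), ylim = c(0, 0.05),
          main = "Simulated scores", xlab = "Score", las = 1)
lines(density(x), col = "blue", lwd = 2) # density curve on top
```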

*Day – 4: Making Stacked Barcharts, Clustered Barcharts, & Scatterplots*

- Downloading the necessary resources
- Graphically examining the relationship between two categorical variables using barplots
- Producing a contingency table using the table() command before producing a bar chart
- Producing a barplot using the created contingency table
- Transforming a stacked barchart to a clustered barchart using the beside argument
- Adding legends to a barplot using the legend.text argument
- Modifying legends from their defaults
- Changing the defaults of a barchart
- Computing the Pearson correlation to gauge the strength of the linear relationship between two numeric variables
- Creating a scatter plot using the plot() command
- Changing the defaults of a scatterplot
- Resizing the plotting dots from the defaults using the cex argument
- Changing the plotting characters using the pch argument
- Adding a linear regression line in a scatterplot
- Solving learners' problems
- Uploading the classwork on Cloud Drive
- Assigning the Day-15 handout
- Taking learners' attendance
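A sketch of stacked and clustered bar charts and a scatterplot with a regression line, using hypothetical data:

```r
# Hypothetical categorical data for 8 people
smoker <- c("yes", "no", "no", "yes", "no", "no", "yes", "no")
gender <- c("f", "m", "f", "m", "f", "m", "m", "f")
tab <- table(smoker, gender)                     # contingency table

barplot(tab, legend.text = TRUE)                 # stacked (the default)
barplot(tab, beside = TRUE, legend.text = TRUE,  # clustered
        args.legend = list(x = "topright"))

# Hypothetical numeric data with a linear trend plus noise
set.seed(3)
height <- rnorm(50, mean = 170, sd = 10)
weight <- 0.9 * height - 90 + rnorm(50, sd = 5)

r <- cor(height, weight)                         # Pearson correlation
plot(height, weight, cex = 0.8, pch = 16, main = "Weight vs height")
abline(lm(weight ~ height), col = "red", lwd = 2)  # fitted regression line
```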

*Day – 5: Producing Numeric Summaries for Specific Variables*

- Summarizing a categorical variable
  - Creating a frequency table for a categorical variable using the table() command
  - Observing the proportions of the categories of a categorical variable
  - Producing a contingency table for two variables
- Summarizing a numeric variable
  - Calculating the arithmetic mean using the mean() command
  - Calculating trimmed means using the trim argument
  - Calculating the median using the median() command
  - Calculating the variance of a variable using the var() command
  - Calculating the standard deviation of a variable using the sd() command
  - Calculating the minimum observation using the min() command
  - Calculating the maximum observation using the max() command
  - Calculating the range of a variable using the range() command
  - Calculating percentiles using the quantile() command
  - Calculating the sum of all the observed values of a variable using the sum() command
  - Calculating the Pearson correlation coefficient using the cor() command
  - Calculating the Spearman correlation using the method argument of the cor() command
- Solving learners' problems
- Uploading the classwork on Cloud Drive
- Assigning the Day-16 handout
- Taking learners' attendance
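A minimal sketch of the summary commands on small hypothetical vectors:

```r
x <- c(2, 4, 4, 5, 7, 9, 30)  # hypothetical numeric variable
y <- c(1, 3, 2, 5, 6, 8, 9)   # a second hypothetical variable

mean(x)                  # about 8.71
mean(x, trim = 0.2)      # drops 1 value from each end: mean of 4, 4, 5, 7, 9 = 5.8
median(x)                # 5
var(x); sd(x)
min(x); max(x); range(x) # range() returns c(min, max): 2 30
quantile(x, probs = c(0.1, 0.9))
sum(x)                   # 61
cor(x, y)                        # Pearson (the default)
cor(x, y, method = "spearman")   # Spearman rank correlation

# Categorical summaries
sex <- c("F", "M", "F", "F")
smoker <- c("yes", "no", "no", "yes")
table(sex)
prop.table(table(sex))   # proportions: F 0.75, M 0.25
table(sex, smoker)       # contingency table
```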

*Day – 6: Customizing and Modifying the Look of Plots in RStudio (Part I)*

- Creating a simple scatter plot
- Changing the size of text and value labels on a plot using the “cex” argument
- Changing the fonts of a plot using the “font” argument
- Changing colors on plots using the “col” argument
- Changing plotting character using the “pch” argument
- Adding a regression line to the scatter plot using the “abline” command
- Changing the regression line's color, type, and width
- Solving learners' problems
- Uploading the classwork on Cloud Drive
- Assigning the Day-17 handout
- Taking learners' attendance

*Day – 7: Customizing and Modifying the Look of Plots in RStudio (Part II)*

- Downloading the necessary practice files
- Distinguishing the categories of a variable on the same plot using plotting characters & colors
- Creating separate plots on one screen in R
- Relabeling the axis of a plot in R
- Solving learners' problems
- Uploading the classwork on Cloud Drive
- Assigning the Day-18 handout
- Taking learners' attendance

*Day – 8: Adding Text to Plots & Modifying Text in R*

- Adding text to a plot using the text() command
- Controlling the text position in the x and y coordinates using the “adj” argument
- Changing the size, color, and font of the text
- Drawing a horizontal line at the mean of the y-axis variable
- Adding text for the horizontal line
- Adding text to the margins of the plot
- Uploading the classwork on Cloud Drive
- Assigning the Day-19 handout
- Taking learners' attendance

*Day – 9: Adding Legends to Plots in R*

- Adding a legend to the plot using the legend() command
- Customizing legends using the pch argument
- Removing the box from the legend using the “bty” argument
- Adding legends for lines
- Changing the line types in the legend
- Uploading the classwork on Cloud Drive
- Assigning the handout
- Taking learners' attendance
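A minimal sketch of legends for points and for lines, with illustrative data:

```r
x <- 1:10
y1 <- x^2 / 10  # illustrative series 1
y2 <- 2 * x     # illustrative series 2

# Legend for points: pch and col match the plotted symbols; bty = "n" removes the box
plot(x, y1, pch = 16, col = "blue", ylim = c(0, 20), ylab = "y")
points(x, y2, pch = 17, col = "red")
leg_points <- legend("topleft", legend = c("y1", "y2"),
                     pch = c(16, 17), col = c("blue", "red"), bty = "n")

# Legend for lines: lty matches the line types
plot(x, y1, type = "l", lty = 1, ylim = c(0, 20), ylab = "y")
lines(x, y2, lty = 2)
leg_lines <- legend("topleft", legend = c("y1", "y2"), lty = c(1, 2))
```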

*Day – 10: Getting Started with Data Visualization in R with ggplot2 by Creating Scatterplots*

- Installing the necessary packages
- Importing the necessary libraries
- Observing the list of the existing R data sets
- Understanding the background of an existing dataset
- Creating a simple geometric canvas/map using ggplot
- A shorter way of creating a simple geometric canvas/map using ggplot
- Working with more example datasets for understanding ggplot
- Working with the “pipe” operator
- Uploading the classwork on Cloud Drive
- Assigning the handout
- Taking learners' attendance

*Day – 11: Creating boxplots using ggplot2*

- Creating boxplots
- Adding points to the boxplots
- Changing the size and color of the points based on the values of other variables
- Changing the transparency of the points
- Flipping the orientation of the boxplots
- Producing separate boxplots based on a categorical variable
- Changing the theme of a boxplot
- Adding a title to the boxplot
- Uploading the classwork on Cloud Drive
- Assigning the handout
- Taking learners' attendance

*Day – 12: Creating histograms using ggplot2*

- Installing the package “ggplot2movies”
- Introducing the RStudio ggplot cheat sheet
- Importing the necessary libraries
- Viewing the “movies” dataset
- Creating an object for the main ggplot aesthetic
- Creating a histogram in the main ggplot aesthetic
- Setting specific bin width for the histogram
- Changing the border color of the histogram’s bins
- Changing the fill color of the histogram’s bins
- Changing the transparency of the fill color
- Changing the labels of the plot
- Adding a title to the plot
- Uploading the classwork on Cloud Drive
- Assigning the handout
- Taking learners' attendance

## Section Title: Data Preprocessing in R

## Section Title: Regression

- Simple Linear Regression in R
- Checking Linear Regression Assumptions in R
- Multiple Linear Regression in R
- Changing Numeric Variable to Categorical in R
- Creating Dummy Variables or Indicator Variables in R
- Change Reference (Baseline) Category in Regression Model with R
- Including Variables/Factors in Regression with R, Part I
- Including Variables/Factors in Regression with R, Part II
- Multiple Linear Regression with Interaction in R
- Interpreting Interaction in Linear Regression with R
- Partial F-Test for Variable Selection in Linear Regression with R
- Polynomial Regression in R
- Support Vector Regression in R
- Decision Tree Regression in R
- Random Forest Regression in R
- Evaluating Regression Models Performance
- Regression Model Selection in R

## Section Title: Classification

- Logistic Regression in R
- K-Nearest Neighbors in R
- Support Vector Machine in R
- Kernel SVM in R
- Naive Bayes in R
- Decision Tree Classification in R
- Random Forest Classification in R
- Evaluating Classification Models Performance

## Section Title: Clustering

- K-Means Clustering in R
- Hierarchical Clustering in R

## Section Title: Association Rule Learning

- Apriori in R
- Eclat in R

## Section Title: Reinforcement Learning

- Upper Confidence Bound in R
- Thompson Sampling

## Section Title: Natural Language Processing in R

## Section Title: Deep Learning

- Artificial Neural Networks in R
- Convolutional Neural Networks

## Section Title: Dimensionality Reduction

- Principal Component Analysis
- Linear Discriminant Analysis
- Kernel PCA

## Section Title: Model Selection & Boosting

- Model Selection
- XGBoost