Michael Harris

Regression with Cross-Validation in R

Cross-validation is a statistical method used to estimate the performance of a model on unseen data. It is widely used for model validation in both classification and regression problems. In this post, we will explore how to perform cross-validation for regression models in R using packages such as caret and glmnet.
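
To make the idea concrete, here is a minimal sketch of k-fold cross-validation written in base R rather than with caret or glmnet, so it runs without extra packages. The built-in mtcars data, the predictors wt and hp, and the choice of five folds are illustrative assumptions, not the setup used in the post.

```r
# Minimal k-fold cross-validation for a linear regression (base R only).
# Data set, predictors, and k are illustrative assumptions.
set.seed(123)                               # reproducible fold assignment

k     <- 5
n     <- nrow(mtcars)
folds <- sample(rep(1:k, length.out = n))   # random fold label for each row

fold_rmse <- numeric(k)
for (i in 1:k) {
  train <- mtcars[folds != i, ]             # k - 1 folds for fitting
  test  <- mtcars[folds == i, ]             # one held-out fold

  fit  <- lm(mpg ~ wt + hp, data = train)
  pred <- predict(fit, newdata = test)

  fold_rmse[i] <- sqrt(mean((test$mpg - pred)^2))
}

cv_rmse <- mean(fold_rmse)                  # average out-of-sample RMSE
cv_rmse
```

Each row is predicted exactly once by a model that never saw it, which is what makes the averaged RMSE an estimate of performance on unseen data.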

Stepwise Regression with BIC in R

Stepwise regression is a popular method used for selecting a subset of predictor variables by either adding or removing them from the model based on certain criteria. In this blog post, we will learn how to perform stepwise regression in R using the Bayesian Information Criterion (BIC) as the selection criterion.
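
As a rough sketch of the approach, base R's step() function can be switched from its default AIC penalty to BIC by setting k = log(n). The mtcars data and the starting set of predictors below are assumptions chosen only for illustration.

```r
# Backward stepwise selection with a BIC penalty, using stats::step().
# The data set and candidate predictors are illustrative assumptions.
full_model <- lm(mpg ~ wt + hp + disp + qsec + drat, data = mtcars)

n <- nrow(mtcars)

# step() penalizes each parameter by k; k = 2 is AIC, k = log(n) is BIC.
bic_model <- step(full_model, direction = "backward", k = log(n), trace = 0)

formula(bic_model)   # predictors that survived selection
BIC(bic_model)       # BIC of the selected model
```

Because log(n) exceeds 2 once n is larger than about 7, BIC tends to keep fewer predictors than AIC on the same data.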

Solving Calculus Problems in R: Limits, Derivatives, and Integrals

Calculus is at the core of many scientific, engineering, and statistical problems. Fortunately, R, a powerful programming language for data analysis and computation, can also solve calculus problems like limits, derivatives, and integrals. In this post, we'll explore how to tackle these problems using R.
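
A small sketch of all three tasks using only base R is shown here. The example functions are assumptions chosen for illustration. Base R has no symbolic limit function, so the limit is approximated numerically by evaluating the function closer and closer to the point of interest.

```r
# Derivative: symbolic differentiation of x^3 with respect to x
dfdx <- D(expression(x^3), "x")
dfdx                                    # an unevaluated expression in x
slope_at_2 <- eval(dfdx, list(x = 2))   # evaluate the derivative at x = 2

# Integral: numerical integration of x^2 from 0 to 1 (exact value is 1/3)
area <- integrate(function(x) x^2, lower = 0, upper = 1)$value

# Limit: approximate sin(x) / x as x approaches 0 from the right
f <- function(x) sin(x) / x
h <- 10^(-(1:6))                        # 0.1, 0.01, ..., 0.000001
limit_estimates <- f(h)                 # values should settle near 1
limit_estimates
```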

Simulating Random Processes and Sampling in R: A Comprehensive Guide

One of R’s great strengths is its ability to simulate random processes and perform various types of sampling. Whether you're running Monte Carlo simulations, bootstrapping, or just generating random numbers, R provides powerful tools for these tasks. In this blog post, we'll dive into simulating random processes and how to perform sampling in R.
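
As a quick taste, the sketch below runs a small Monte Carlo simulation and a bootstrap with base R. The dice example, the number of replications, and the use of the built-in mtcars data are illustrative assumptions.

```r
set.seed(42)   # fix the random number stream so results are reproducible

# Monte Carlo: estimate P(two dice sum to 7); the true value is 1/6
n_sims  <- 10000
die1    <- sample(1:6, n_sims, replace = TRUE)
die2    <- sample(1:6, n_sims, replace = TRUE)
p_seven <- mean(die1 + die2 == 7)

# Bootstrap: resample mpg with replacement to estimate the
# standard error of its mean
boot_means <- replicate(2000, mean(sample(mtcars$mpg, replace = TRUE)))
boot_se    <- sd(boot_means)

p_seven
boot_se
```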

Mastering Function Writing in R: A Guide to Creating Reusable Code

Functions are an essential part of programming in R, allowing you to create reusable code that can streamline workflows, simplify complex operations, and make your scripts more organized. In this blog post, we’ll walk through how to write functions in R, covering everything from the basic syntax to more advanced topics like using default arguments and return values.
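
Here is a minimal sketch of a function with default arguments and an explicit return value. The name rescale and its behavior are hypothetical, made up for this illustration.

```r
# rescale() is a hypothetical helper: it maps a numeric vector onto the
# interval [lower, upper], which defaults to [0, 1].
rescale <- function(x, lower = 0, upper = 1) {
  if (!is.numeric(x)) stop("`x` must be numeric")

  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) stop("`x` must contain at least two distinct values")

  scaled <- (x - rng[1]) / (rng[2] - rng[1])   # now between 0 and 1
  return(lower + scaled * (upper - lower))     # explicit return value
}

rescale(c(2, 4, 6))                          # defaults: returns 0, 0.5, 1
rescale(c(2, 4, 6), lower = -1, upper = 1)   # overrides: returns -1, 0, 1
```

Callers who are happy with the defaults can ignore the extra arguments, while anyone who needs a different range can override them by name.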

Neural Networks with R: Predictive Modeling using nnet and Regression Comparisons

Neural networks provide a powerful tool for predictive modeling, capable of capturing complex relationships in data. In this blog post, we will explore how to implement a neural network using the nnet package in R. We will build a neural network model to predict gender, education, and age based on personality test data. Furthermore, we will compare the neural network's performance to traditional regression models using Root Mean Squared Error (RMSE).

Predictive Modeling with H2O: Comparing H2O AutoML and Traditional Regression

Predictive modeling is a core technique in data science, and using machine learning frameworks can greatly improve both the accuracy and speed of model development. In this blog post, we explore how to use the h2o package in R to automate the model building process with H2O's AutoML, and compare it with traditional regression models. We'll also calculate performance metrics using the Root Mean Squared Error (RMSE).

Using R to Automate Markdown Code

I recently learned that Google requires affiliate links to be “nofollow” links. Unlike ordinary “follow” links, which are often the default, “nofollow” links remove the endorsement that one website typically implies by linking to another.

Missing Data Imputation for Machine Learning

I go over methods of data imputation for training machine learning models. These techniques are inappropriate for hypothesis testing because they do not account for the uncertainty in the imputed data. However, if you are training a neural network on thousands of rows of data and some values are missing, these methods could be a good solution.

Visually Determining Normality in R

Many statistical methods assume that the data we are using are normally distributed. To check this pervasive assumption, we either inspect the data visually or use a hypothesis test. While hypothesis tests like the Shapiro-Wilk test offer a clear-cut decision, a simple visual inspection is sometimes preferable.
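
For a sense of what both routes look like, here is a minimal base R sketch. The simulated sample is an assumption standing in for real data.

```r
set.seed(10)
x <- rnorm(200, mean = 50, sd = 10)   # simulated stand-in for real data

# Visual checks: a roughly bell-shaped histogram and Q-Q points that
# fall close to the reference line both suggest approximate normality.
hist(x, breaks = 20, main = "Histogram of x")
qqnorm(x)
qqline(x)

# Hypothesis test: a small p-value is evidence against normality.
sw <- shapiro.test(x)
sw$p.value
```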

Performing T-Tests in R

A guide to performing several types of t-tests: one-sample, two-sample assuming equal variances, two-sample assuming unequal variances, and paired (dependent-measures) two-sample.
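
All four variants can be run with base R's t.test(). The sketch below uses the built-in sleep data set as an illustrative assumption; it records extra hours of sleep for the same ten subjects under two drugs.

```r
# Split the built-in sleep data into the two drug conditions
x <- sleep$extra[sleep$group == 1]
y <- sleep$extra[sleep$group == 2]

one_sample <- t.test(x, mu = 0)                 # one-sample test against 0
pooled     <- t.test(x, y, var.equal = TRUE)    # two-sample, equal variances
welch      <- t.test(x, y)                      # two-sample, unequal variances (default)
paired     <- t.test(x, y, paired = TRUE)       # dependent (paired) measures

# Compare the degrees of freedom and p-values across the variants
sapply(list(one_sample = one_sample, pooled = pooled,
            welch = welch, paired = paired),
       function(res) c(df = unname(res$parameter), p = res$p.value))
```

Because the same subjects appear in both groups, the paired test is the appropriate one for these particular data; the other calls are shown only to illustrate the syntax.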

Naive Bayes Classification in R

Naive Bayes is a computationally simple but highly effective method for classification. In this tutorial, I will show you how to fit the model and determine its classification accuracy.

Regression by Sampling

We may encounter situations where the data set and the calculations required for a regression model do not fit in RAM. This article presents a technique for estimating regression coefficients by sampling from the data set and running smaller regressions.
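
One way to sketch the idea is to fit the same model on many small random samples and average the coefficients. The simulated data, the sample sizes, and the choice to combine estimates with a simple mean are all assumptions made for illustration; the article's own procedure may differ. The data frame here sits in memory only as a stand-in for a source that would really live on disk or in a database.

```r
set.seed(1)

# Simulated stand-in for a large data set; true intercept 2, true slope 3
n     <- 100000
big   <- data.frame(x = rnorm(n))
big$y <- 2 + 3 * big$x + rnorm(n)

n_samples   <- 50     # number of small regressions to run
sample_size <- 1000   # rows used in each one

# Fit the model on each random sample and collect the coefficients
coef_draws <- replicate(n_samples, {
  rows <- sample(n, sample_size)
  coef(lm(y ~ x, data = big[rows, ]))
})

estimate <- rowMeans(coef_draws)   # average across samples
estimate
```

Each regression touches only 1,000 rows at a time, yet the averaged coefficients should land close to the values used to generate the data.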

Scrambling the letters of a message with R

One of the most powerful aspects of R is its diverse set of random number generators. We can use these tools to obscure a message in what appears to be a meaningless string of text (cryptography).
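
A minimal sketch of the idea: shuffle the letters with a seeded random permutation, then reverse the shuffle by regenerating the same permutation from the same seed. The seed value and message are assumptions, and this is a toy scramble rather than secure encryption.

```r
secret_seed  <- 2024                 # shared secret (illustrative value)
message_text <- "meet me at noon"

# Scramble: split into characters and apply a random permutation
set.seed(secret_seed)
chars     <- strsplit(message_text, "")[[1]]
perm      <- sample(length(chars))
scrambled <- paste(chars[perm], collapse = "")

# Unscramble: rebuild the same permutation from the seed, then invert it.
# order(perm) gives the inverse of a permutation.
set.seed(secret_seed)
perm_again      <- sample(nchar(scrambled))
scrambled_chars <- strsplit(scrambled, "")[[1]]
recovered       <- paste(scrambled_chars[order(perm_again)], collapse = "")

scrambled
recovered
```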

Writing while loops in R

A while loop runs for as long as its condition evaluates to TRUE. The basic setup, then, is to write a condition that changes over time and eventually evaluates to FALSE once the loop no longer needs to run.
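
That setup can be sketched in a few lines. The counter name and the stopping value are arbitrary choices for illustration.

```r
count <- 1

# The loop body runs while the condition is TRUE
while (count <= 5) {
  print(count)
  count <- count + 1   # without this update the condition would never turn FALSE
}

count   # 6: the first value for which the condition is FALSE
```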

Using For Loops in R

For newcomers, one of the most confusing aspects of R is using loops to repeat a chunk of code. I hope you find this tutorial helpful and come to understand the power loops give our code.
