Creating pipe functions with variable pass-through

I was initially planning to write a basic tutorial on the base R and magrittr pipe operators, but I found an article about them by Hadley Wickham that is better than what I would have written. If you are interested in that topic, you can read the article here: https://www.tidyverse.org/blog/2023/04/base-vs-magrittr-pipe/. Instead, I became interested in writing functions that describe data flowing through a pipe without changing it. Let me illustrate what I mean with a simple pipe, where the output of the rnorm() function on the left is passed as the first argument to the mean() function on the right.

X <- rnorm(10) |> mean()
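The base pipe is handled by the parser: `x |> f(y)` is turned into the ordinary call `f(x, y)`, so the piped value always lands in the first argument and anything written inside the parentheses follows it. Here is a small check that uses fixed numbers instead of random draws. It assumes R 4.1 or later, where `|>` was introduced.

```r
# Fixed numbers so the result is reproducible (unlike rnorm())
v <- c(1, 2, 4)

# v |> mean() is rewritten by the parser as mean(v)
piped  <- v |> mean()
direct <- mean(v)
identical(piped, direct)   # TRUE

# Extra arguments come after the piped value:
# this chain is round(mean(v), digits = 1), i.e. 2.3
rounded <- v |> mean() |> round(digits = 1)
```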

X will hold only the mean of the values generated by rnorm(), so the original data are no longer available for subsequent operations. Alternatively, I could assign the output of rnorm() to its own variable and keep running descriptive functions on that variable.

X <- rnorm(10)
M <- X |> mean()
S <- X |> sd()

This gets tedious after a few lines, because what we really want is information about X without changing it. Luckily, it is easy to define functions that do this: each one prints what we want to know and then returns its original input with return(). I have written a few descriptive functions below as examples of what can be done.

# Individual descriptive statistics: pipe pass-through
# Each function prints one statistic and returns its input unchanged.
mean_pipe <- function(x){
  print(paste("The mean is:", mean(x)))
  return(x)
}

median_pipe <- function(x){
  print(paste("The median is:", median(x)))
  return(x)
}

sd_pipe <- function(x){
  print(paste("The sd is:", sd(x)))
  return(x)
}

# Multiple descriptive statistics pipes
quantile_pipe <- function(x){
  print(quantile(x))
  return(x)
}

summary_pipe <- function(x){
  print(summary(x))
  return(x)
}

# Distributional graphs pass-through
hist_pipe <- function(x){
  hist(x)
  return(x)
}

qqnorm_pipe <- function(x){
  qqnorm(x)
  # Reference line y = x, which suits standard normal data such as rnorm();
  # qqline(x) fits a line through the quartiles for other data
  abline(0, 1)
  return(x)
}
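The first three functions above share one template: compute a statistic, print it, and return the input. If you find yourself writing many of these, the template can be captured once. The sketch below uses a hypothetical helper, `make_pipe()`, that takes any summary function and a label and returns a new pass-through function. The helper and the names `mean_tap()` and `max_tap()` are my own illustration, not part of base R.

```r
# Hypothetical helper: build a pass-through pipe function from a summary
# function f and a label used in the printed message.
make_pipe <- function(f, label) {
  force(f)      # evaluate the arguments now so each generated
  force(label)  # function keeps its own f and label
  function(x) {
    print(paste("The", label, "is:", f(x)))
    return(x)   # pass the untouched input to the next step
  }
}

# Two pass-through functions generated from the same template
mean_tap <- make_pipe(mean, "mean")
max_tap  <- make_pipe(max, "max")

# Prints the messages "The mean is: 3" and "The max is: 5";
# out is the original vector
out <- c(1, 5, 3) |> mean_tap() |> max_tap()
```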

Now it is possible to describe the data in many ways in a single chain of commands, since the data passes through the pipe unchanged. The one exception is sort(), which reorders the values but does not affect any of the statistics or plots that follow.

X <- rnorm(100) |>
  sort() |>
  mean_pipe() |>
  median_pipe() |>
  sd_pipe() |>
  summary_pipe() |>
  hist_pipe() |>
  qqnorm_pipe()
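To check the claim that the data comes out of the chain untouched, the sketch below compares the chain's output with the sorted input, using a fixed seed for reproducibility. It re-defines two of the pass-through functions so the snippet runs on its own. It uses `all.equal()` for the statistics because summing the same numbers in a different order can change the last bits of a floating-point result.

```r
set.seed(42)   # fixed seed so the check is reproducible
raw <- rnorm(100)

# Re-defined here so the snippet is self-contained
mean_pipe <- function(x) { print(paste("The mean is:", mean(x))); return(x) }
sd_pipe   <- function(x) { print(paste("The sd is:", sd(x))); return(x) }

out <- raw |> sort() |> mean_pipe() |> sd_pipe()

identical(out, sort(raw))         # only sort() changed the data
all.equal(mean(out), mean(raw))   # reordering does not change the mean
all.equal(sd(out), sd(raw))       # ...or the standard deviation
```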

I acknowledge that this functionality can also be achieved with the %T>% ("tee") pipe from the magrittr package, which evaluates its right-hand side for its side effect and then passes its left-hand side along unchanged. In the code below, I have replaced the mean_pipe() function with this approach. The operator after sort() is the %T>% pipe, and I needed to wrap the expression in { }, using “.” to mark where the output of the preceding sort() call should go. This is also a perfectly reasonable solution, although I feel that it makes the code harder to read and modify than a dedicated function does.

library(magrittr)

X <- rnorm(100) |>
  sort() %T>%
  {print(paste("The mean is:", mean(.)))} |>
  median_pipe()…
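If you would rather not load magrittr, a similar one-off side effect can be written with the base pipe alone by calling an anonymous function inline. This is a sketch that assumes R 4.1 or later (for `|>` and the `\(d)` lambda shorthand); it is an alternative to the tee pipe, not how %T>% works internally. The extra parentheses arguably make it even harder to read than the %T>% version.

```r
set.seed(1)
out <- rnorm(100) |>
  sort() |>
  # Anonymous function: print the mean, then return the data unchanged.
  # The surrounding parentheses are needed so |> sees a function call.
  (\(d) { print(paste("The mean is:", mean(d))); d })()
```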

If writing out the whole chain every time seems unwieldy, you can package all of these commands into a single function that reports everything you want to know in one line.

Pipe_Info <- function(x){
  # qqnorm_pipe() returns x, so Pipe_Info() also returns the original data
  x |>
    mean_pipe() |>
    median_pipe() |>
    sd_pipe() |>
    summary_pipe() |>
    hist_pipe() |>
    qqnorm_pipe()
}

# The function in use
X2 <- rnorm(100) |> Pipe_Info()
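Two refinements are possible here, sketched below under my own naming. First, a hypothetical `describe_pipe()` takes the statistics to report as a named list instead of hard-coding them. Second, it returns the data with `invisible()`, so running the chain at the console without assigning it does not print every value. Graphics are left out to keep the sketch short.

```r
# Hypothetical generalization of Pipe_Info(): the statistics to report
# are passed in as a named list of functions.
describe_pipe <- function(x, fns = list(mean = mean, median = median, sd = sd)) {
  for (nm in names(fns)) {
    print(paste("The", nm, "is:", fns[[nm]](x)))
  }
  invisible(x)   # pass the data through without auto-printing it
}

out <- c(4, 8, 15, 16, 23, 42) |>
  describe_pipe(fns = list(min = min, max = max, IQR = IQR))
```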

An obvious drawback is that these functions, as written, do not let us save a value such as the mean for later use. We can fix this by modifying the functions above to create the variables we need with the assign() function. I also changed the printed message so that it reflects the “name” argument.

# Mean and standard deviation assignment pipes
# Each prints the statistic, saves it in the global environment under
# the name given by `name`, and returns the input unchanged.
mean_assign_pipe <- function(x, name = "mean"){
  M <- mean(x)
  print(paste("The value of", name, "is:", round(M, 2)))
  assign(name, M, envir = .GlobalEnv)
  return(x)
}

sd_assign_pipe <- function(x, name = "sd"){
  S <- sd(x)
  print(paste("The value of", name, "is:", round(S, 2)))
  assign(name, S, envir = .GlobalEnv)
  return(x)
}

Now, when we use these modified functions, the values are printed to the console, and variables are created based on what is provided in the “name” arguments. The code below will create a variable “X3” with the original data, a variable “mean” with the mean of the data, and a variable “sd” with the standard deviation of the data.

X3 <- rnorm(10) |>
  mean_assign_pipe() |>
  sd_assign_pipe()
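Because assign() writes to .GlobalEnv, these functions always create global variables, even when they are called from inside another function where a local result would usually be preferable. One possible alternative, sketched below, adds an `envir` argument that defaults to `parent.frame()`, the environment the function was called from. The `envir` argument and the names `mean_assign_pipe2()` and `summarise_inside()` are my own additions for illustration.

```r
# Hypothetical variant of mean_assign_pipe(): by default it assigns into
# the caller's environment instead of always writing to .GlobalEnv.
mean_assign_pipe2 <- function(x, name = "mean", envir = parent.frame()) {
  M <- mean(x)
  print(paste("The value of", name, "is:", round(M, 2)))
  assign(name, M, envir = envir)
  return(x)
}

# Called inside a function, the variable is created in that function's frame
summarise_inside <- function(v) {
  v |> mean_assign_pipe2(name = "v_mean")
  v_mean * 2
}

result <- summarise_inside(c(1, 2, 3))   # 4
exists("v_mean")                         # FALSE: nothing leaked out
```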

This can even be extended to loop automatically through the columns of a data set and create variables for the mean and standard deviation of each column (although I recognize that the apply family of functions is probably a better choice). Below is a loop that takes each of the four numeric columns of the “iris” data set and assigns meaningfully named variables for their means and standard deviations, such as Sepal.Length.Mean and Sepal.Length.Sd.

data(iris)

# Columns 1 to 4 of iris are the numeric measurements
for(i in 1:4){
  iris[, i] |>
    mean_assign_pipe(name = paste0(names(iris)[i], ".Mean")) |>
    sd_assign_pipe(name = paste0(names(iris)[i], ".Sd"))
}
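As noted above, the apply family is probably the better tool when the goal is simply to collect column summaries. Here is a sketch using sapply() that stores the results in two named vectors rather than in eight separate global variables:

```r
data(iris)

# Apply mean() and sd() to each of the four numeric columns;
# the results are named vectors, e.g. col_means["Sepal.Length"]
col_means <- sapply(iris[, 1:4], mean)
col_sds   <- sapply(iris[, 1:4], sd)

# One small table with a row per statistic and a column per variable
round(rbind(Mean = col_means, Sd = col_sds), 2)
```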

To be sure, this will not revolutionize R programming, but now that a pipe operator is built into base R, you may find applications where this coding practice makes sense. Of course, another option is to use tidyverse functions, which would probably avoid the need for this approach altogether.
