Mastering Function Writing in R: A Guide to Creating Reusable Code

Sep 26

Functions are an essential part of programming in R, allowing you to create reusable code that can streamline workflows, simplify complex operations, and make your scripts more organized. In this blog post, we’ll walk through how to write functions in R, covering everything from the basic syntax to more advanced topics like using default arguments and return values.

1. Basic Syntax of a Function in R

At its core, a function in R is defined using the function keyword. The syntax is simple and looks like this:

my_function <- function(arguments){
    # Code that performs the task
    return(output) # Return the result
}

Let's break this down:

my_function: The name of the function you create. This can be anything descriptive of the task the function performs.
arguments: A list of inputs the function will take. These can be variables, data frames, vectors, etc.
return(output): The output or result that the function provides when it is called. This can be any R object.
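
A detail worth knowing alongside this template: when return() is omitted, an R function returns the value of the last expression it evaluates. The sketch below uses two hypothetical functions, add_explicit and add_implicit (names chosen only for illustration), to show that both styles give the same result.

```r
# Hypothetical example functions, not part of the template above
add_explicit <- function(a, b){
    result <- a + b
    return(result)   # Explicit return
}

add_implicit <- function(a, b){
    a + b            # Value of the last expression is returned automatically
}

add_explicit(2, 3)   # 5
add_implicit(2, 3)   # 5
```

Which style to use is largely a matter of taste; an explicit return() is most helpful when a function needs to exit early.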

2. Writing Your First Function

Let's write a simple function that calculates the mean of a vector, ignoring any missing values (NA).

# A simple function to calculate mean ignoring NAs
mean_ignore_na <- function(x){
    mean_value <- mean(x, na.rm = TRUE)  # Calculate the mean, ignoring NAs
    return(mean_value)                   # Return the result
}

Here’s how the function works:

x: The vector passed to the function.
mean_value: The mean of x, calculated with na.rm = TRUE so that missing values are ignored.
The function returns the calculated mean using return(mean_value).

We can test the function by passing in a vector:

# Test the function
test_vector <- c(1, 2, NA, 4, 5)
mean_ignore_na(test_vector) # Output: 3

The function successfully calculates the mean while ignoring the NA value.

3. Adding Default Arguments

One of the great features of R functions is the ability to define default arguments. Let’s enhance our mean_ignore_na function by adding an na.rm argument with a default value of TRUE, allowing the user to control whether missing values should be ignored or not.

# Adding a default argument
mean_ignore_na <- function(x, na.rm = TRUE){
    mean_value <- mean(x, na.rm = na.rm)  # Calculate the mean based on na.rm
    return(mean_value)                    # Return the result
}

Now, the user can decide whether to remove NA values or not by specifying na.rm:

# Test the function
mean_ignore_na(test_vector)      # Output: 3 (ignores NA by default)
mean_ignore_na(test_vector, na.rm = FALSE) # Output: NA (does not ignore NA)

The function behaves differently depending on whether na.rm is TRUE or FALSE. By setting a default, we give flexibility without requiring users to always specify the argument.
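
As a small illustration of how defaults interact with argument matching, the sketch below repeats the function and test vector so that it runs on its own, then calls the function three ways. The variable names by_default, by_position, and by_name are only for illustration. Arguments can be matched by position or by name; naming optional arguments such as na.rm usually reads more clearly.

```r
mean_ignore_na <- function(x, na.rm = TRUE){
    mean_value <- mean(x, na.rm = na.rm)
    return(mean_value)
}
test_vector <- c(1, 2, NA, 4, 5)

# na.rm is not supplied, so it falls back to the default TRUE
by_default <- mean_ignore_na(test_vector)

# The second argument is matched to na.rm by position
by_position <- mean_ignore_na(test_vector, FALSE)

# Named arguments can be given in any order
by_name <- mean_ignore_na(na.rm = FALSE, x = test_vector)

by_default    # 3
by_position   # NA
by_name       # NA
```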

4. Returning Multiple Values from a Function

In R, you can return multiple values from a function by using a list. For example, let’s create a function that returns both the mean and standard deviation of a vector:

# Function to return both mean and standard deviation
mean_and_sd <- function(x, na.rm = TRUE){
    mean_value <- mean(x, na.rm = na.rm)  # Calculate mean
    sd_value <- sd(x, na.rm = na.rm)      # Calculate standard deviation
    
    return(list(mean = mean_value, sd = sd_value)) # Return a list of results
}

This function computes the mean and standard deviation and returns both in a list. Here’s how to use it:

# Test the function
results <- mean_and_sd(test_vector)

# Accessing results
results$mean  # Output: 3
results$sd    # Output: 1.825742

The function outputs a list, which allows you to access both the mean and standard deviation separately.
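
Besides the $ operator, a returned list can be read in a few other ways. The sketch below redefines mean_and_sd in a compact form so that it runs on its own, then shows double-bracket indexing and unlist(); the variable name summary_vector is only for illustration.

```r
mean_and_sd <- function(x, na.rm = TRUE){
    list(mean = mean(x, na.rm = na.rm), sd = sd(x, na.rm = na.rm))
}
results <- mean_and_sd(c(1, 2, NA, 4, 5))

results[["mean"]]    # Same element as results$mean
names(results)       # "mean" "sd"

# Flatten the list into a named numeric vector
summary_vector <- unlist(results)
summary_vector["sd"]
```

Flattening with unlist() works well here because both elements are single numbers; a list remains the better container when the returned pieces have different types or lengths.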

5. Vectorized Functions

One of R's strengths is vectorization, allowing you to apply functions over entire vectors or matrices without needing to write explicit loops. Let’s create a vectorized function that scales a vector to a range between 0 and 1:

# A vectorized function to scale a vector between 0 and 1
scale_01 <- function(x){
    scaled <- (x - min(x)) / (max(x) - min(x))  # Vectorized scaling
    return(scaled)
}

Here, R handles the entire vector x at once: min(x) and max(x) each reduce the vector to a single number, and the subtraction and division are then applied to every element of x without requiring an explicit loop. Let’s see this in action:

# Test the function
test_vector <- c(1, 2, 3, 4, 5)
scaled_vector <- scale_01(test_vector)
scaled_vector # Output: 0.00 0.25 0.50 0.75 1.00

The function scales the values of the vector between 0 and 1, but it does not handle missing values: if x contains an NA, min(x) and max(x) both return NA, so every element of the result is NA. We can add na.rm as an argument to improve the function:

# Improved version with na.rm
scale_01 <- function(x, na.rm = TRUE){
    if (na.rm){x <- x[!is.na(x)]}  # Remove NA values if na.rm is TRUE
    scaled <- (x - min(x)) / (max(x) - min(x))  # Vectorized scaling
    return(scaled)
}

This improved version handles missing values by dropping them before scaling when na.rm is TRUE. Keep in mind that the result is then shorter than the input whenever NA values were removed.
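
One design choice to be aware of: dropping the NA values changes the length of the result, which can be inconvenient when the scaled values need to line up with other columns. A possible alternative, sketched below under the hypothetical name scale_01_keep_na, passes na.rm to min() and max() instead, so NA positions stay NA while the remaining values are scaled.

```r
# Hypothetical variant: keeps the input length, leaving NA where the input had NA
scale_01_keep_na <- function(x, na.rm = TRUE){
    lowest  <- min(x, na.rm = na.rm)   # Smallest value, ignoring NAs if na.rm is TRUE
    highest <- max(x, na.rm = na.rm)   # Largest value, ignoring NAs if na.rm is TRUE
    scaled <- (x - lowest) / (highest - lowest)  # NA elements stay NA
    return(scaled)
}

with_na <- c(1, 2, NA, 4, 5)
scaled_with_na <- scale_01_keep_na(with_na)
scaled_with_na   # 0.00 0.25 NA 0.75 1.00
```

Note that this kind of scaling divides by zero when all values in x are equal, which produces NaN; a production version might want to check for that case.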

6. More Advanced: Functions Inside Functions

In R, functions can be nested inside other functions. A helper defined this way is visible only inside the enclosing function, which keeps small, task-specific logic out of the global workspace. Let’s look at an example where we define a helper function inside another function:

# Function with a helper function inside
outer_function <- function(x){
    # Inner helper function to square a number
    square <- function(y){
        return(y^2)
    }
    
    squared_values <- square(x)  # Use the inner function
    return(squared_values)
}

This function squares its input using the inner helper function:

# Test the function
outer_function(1:5)  # Output: 1 4 9 16 25

Nesting functions can be useful when you want to encapsulate specific tasks within larger, more complex operations.
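
A related point is that an inner function can read the arguments and variables of the function that encloses it. The sketch below uses hypothetical names (center_by and subtract_center) to illustrate this; it also checks that the helper does not exist outside the enclosing function.

```r
# Hypothetical example: the helper uses `center`, an argument of the
# enclosing function, without receiving it as one of its own arguments
center_by <- function(x, center = 0){
    subtract_center <- function(y){
        return(y - center)   # `center` is looked up in the enclosing function
    }
    return(subtract_center(x))
}

centered <- center_by(c(10, 20, 30), center = 20)
centered   # -10 0 10

# The helper is local to center_by, so it is not visible here
exists("subtract_center")   # FALSE
```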

7. Best Practices for Writing Functions

Here are some best practices to keep in mind when writing functions in R:

Use descriptive names: Choose function names that describe the task the function performs.
Keep functions focused: A function should perform a single task or a set of closely related tasks.
Document arguments: Use comments (for example, roxygen2-style #' comments) to explain what each argument does, especially if they have default values.
Return meaningful outputs: Ensure that the return value of your function is useful and well-structured, especially if it returns multiple results (use lists or data frames).
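
To make these practices concrete, here is a minimal sketch that combines them in one small function. The name scale_to_unit is hypothetical, and the #' lines follow the comment convention used by the roxygen2 package; to R itself they are ordinary comments, so the code runs without roxygen2 installed.

```r
#' Scale a numeric vector to the range 0 to 1
#'
#' @param x A numeric vector.
#' @param na.rm Logical; if TRUE (the default), NA values are ignored
#'   when finding the minimum and maximum.
#' @return A numeric vector the same length as x.
scale_to_unit <- function(x, na.rm = TRUE){
    stopifnot(is.numeric(x))                 # Fail early on non-numeric input
    value_range <- range(x, na.rm = na.rm)   # c(minimum, maximum)
    (x - value_range[1]) / (value_range[2] - value_range[1])
}

scale_to_unit(c(10, 15, 20))   # 0.0 0.5 1.0
```

The early stopifnot() check is one simple way to give a clear error when the input is not what the function expects.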

Conclusion

Writing functions in R is a powerful way to create reusable, organized, and efficient code. By mastering the basics and learning how to incorporate default arguments, return multiple values, and use vectorized operations, you can significantly improve the quality of your R scripts. Functions help modularize your code, making it easier to debug, test, and maintain over time. Happy coding!

Michael Harris
