The “fill” Function in R

  • Package: tidyr

  • Purpose: To fill missing values in selected columns with the previous non-missing value (the default) or the next non-missing value.

  • General Class: Data Reshaping

  • Required Argument(s):

    • data: A data frame to manipulate.

    • ...: Columns to fill.

  • Notable Optional Arguments:

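    • .direction: Direction in which to fill missing values. One of "down" (the default), "up", "downup", or "updown".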

  • Example (with Explanation):

  • # Load necessary packages
    library(tidyr)
    library(dplyr)  # provides arrange(), which the pipeline below relies on

    # Define Variables
    set.seed(1)
    ID = 1:10
    Missing_Values = rnorm(10)
    Dependent_Value = 2*Missing_Values + rnorm(10)

    # Make some missing values and create the data frame
    Missing_Values[c(2,4,6,8)] <- NA
    data <- data.frame(ID, Missing_Values, Dependent_Value)

    # Show data frame
    print(data)

    # Fill the values using arrange and fill
    Filled_data <- data %>%

      # Sort the rows by the dependent variable
      arrange(Dependent_Value) %>%

      # Fill each missing value from the nearest preceding row in that ordering
      fill(Missing_Values) %>%

      # Restore the original ordering by the ID variable
      arrange(ID)

    # Print the filled data
    print(Filled_data)

  • In this example, the fill function from the tidyr package is used to fill missing values in the ‘Missing_Values’ column of the sample data frame data. The arrange function (from the dplyr package) first sorts the rows by Dependent_Value, so each missing value is filled from the nearest row with a lower Dependent_Value and a non-missing value, rather than from the row that happens to precede it in the irrelevant ID ordering. Note that if the first row after sorting is missing, it stays missing, because there is no earlier value to carry down. This is not how I would recommend imputing missing values. You would be better served by methods that comprehensively account for the relationships between variables when making imputation decisions. You may want to look into the following packages for data imputation: mice, Amelia, and missForest. I also have a tutorial on missing data imputation for machine learning applications at: https://www.statswithr.com/tutorials/missing-data-imputation-for-machine-learning
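
The direction in which fill carries values can be easier to see on a tiny data frame. The following is a minimal sketch, not part of the original example: the data frame df and its columns id and value are made up for illustration, and it only uses fill with its .direction argument.

```r
library(tidyr)

# Hypothetical data frame (made up for illustration) with gaps in `value`
df <- data.frame(
  id    = 1:6,
  value = c(NA, 10, NA, NA, 20, NA)
)

# Default direction ("down"): each NA takes the last non-missing value above it.
# The leading NA stays missing because nothing precedes it.
down <- fill(df, value)
print(down$value)    # NA 10 10 10 20 20

# "up": each NA takes the next non-missing value below it.
# The trailing NA stays missing because nothing follows it.
up <- fill(df, value, .direction = "up")
print(up$value)      # 10 10 20 20 20 NA

# "downup": fill down first, then fill up whatever is still missing.
downup <- fill(df, value, .direction = "downup")
print(downup$value)  # 10 10 10 10 20 20
```

If values at the start or end of a column remain missing after a plain fill, as can happen after sorting with arrange, "downup" or "updown" is one way to cover both ends. Whether that is appropriate depends on the data.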
