The “extract” Function in R

  • Package: tidyr

  • Purpose: To extract substrings from a character column of a data frame into new columns, using the capture groups of a regular expression pattern.

  • General Class: Data Reshaping

  • Required Argument(s):

    • data: The data frame containing the column to extract from.

    • col: The name of the column to extract from.

    • into: The names of the new columns that store the extracted substrings, one name per capture group.

  • Notable Optional Arguments:

    • regex: The regular expression pattern to use for extraction. Each capture group fills one of the columns named in into. The default is "([[:alnum:]]+)", which captures the first run of alphanumeric characters.

    • remove: Whether to remove the original column after extraction. The default is TRUE.

  • Example (with Explanation):

    # Load necessary packages
    library(tidyr)

    # Create a sample data frame
    data <- data.frame(
      ID = 1:3,
      Name = c("John Doe", "Jane Smith", "Bob Johnson")
    )

    # Extract the first and last names from the 'Name' column.
    # The tidyr:: prefix is explicit because other attached packages
    # (for example, magrittr) also export a function named extract,
    # and R may otherwise call that one instead.
    result <- tidyr::extract(
      data,
      col = Name,
      into = c("First_Name", "Last_Name"),
      regex = "(\\w+) (\\w+)"
    )

    # Display the result
    print(result)

  • In this example, the extract function from the tidyr package splits the ‘Name’ column of the sample data frame data into first and last names. The regular expression pattern "(\\w+) (\\w+)" contains two capture groups, each matching a run of word characters, separated by a single space. The first group is stored in the new column ‘First_Name’ and the second in ‘Last_Name’. Because remove defaults to TRUE, the original ‘Name’ column is dropped, so the resulting data frame result contains ‘ID’, ‘First_Name’, and ‘Last_Name’.
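
  • A further sketch, not part of the original example: it assumes a hypothetical data frame named people in which one name has no space, to illustrate two behaviors described above. Setting remove = FALSE keeps the original column, and a value that the pattern cannot match should yield NA in the new columns.

```r
# Load tidyr; the tidyr:: prefix below guards against name clashes
library(tidyr)

# Hypothetical sample data (an assumption for illustration):
# the third name contains no space, so "(\\w+) (\\w+)" cannot match it
people <- data.frame(
  ID = 1:3,
  Name = c("John Doe", "Jane Smith", "Cher"),
  stringsAsFactors = FALSE
)

# remove = FALSE keeps the original 'Name' column alongside the new columns
kept <- tidyr::extract(
  people,
  col = Name,
  into = c("First_Name", "Last_Name"),
  regex = "(\\w+) (\\w+)",
  remove = FALSE
)

# Display the result
print(kept)
```

    Under these assumptions, kept should still contain ‘Name’, and the row for "Cher" should show NA in both ‘First_Name’ and ‘Last_Name’, since the pattern requires two words separated by a space.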
