The “extract” Function in R
Package: tidyr
Purpose: To extract substrings from a character column of a data frame into new columns, one per capture group of a regular expression pattern.
General Class: Data Reshaping
Required Argument(s):
data: The data frame containing the column to extract from.
col: The name of the column to extract from.
into: The names of the new columns that will store the extracted substrings, one name per capture group in regex.
Notable Optional Arguments:
regex: The regular expression pattern to use for extraction, with one capture group per new column. The default is "([[:alnum:]]+)", so in practice a pattern is almost always supplied.
remove: Whether to remove the original column after extraction. The default is TRUE.
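As a minimal sketch of how the optional arguments change the result, the snippet below runs the same extraction three ways. The data frame df and its column names are made up for illustration. It also uses convert, another optional argument of tidyr's extract (default FALSE), which asks tidyr to run type conversion on the new columns.

```r
library(tidyr)

# Hypothetical sample data, invented for this sketch
df <- data.frame(code = c("A-10", "B-20"))

# remove = FALSE keeps the original 'code' column next to the new columns
kept <- tidyr::extract(df, col = code, into = c("letter", "number"),
                       regex = "([A-Z])-(\\d+)", remove = FALSE)

# With the default remove = TRUE, 'code' is dropped from the result
dropped <- tidyr::extract(df, col = code, into = c("letter", "number"),
                          regex = "([A-Z])-(\\d+)")

# convert = TRUE runs type conversion on the new columns,
# so 'number' comes back as integer instead of character
typed <- tidyr::extract(df, col = code, into = c("letter", "number"),
                        regex = "([A-Z])-(\\d+)", convert = TRUE)

print(names(kept))
print(names(dropped))
print(class(typed$number))
```

Keeping the original column with remove = FALSE can be handy while a pattern is still being developed, since the input and the extracted pieces can be compared side by side.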
Example (with Explanation):
# Load necessary packages
library(tidyr)
# Create a sample data frame
data <- data.frame(
ID = 1:3,
Name = c("John Doe", "Jane Smith", "Bob Johnson")
)
# Extract the first and last names from the 'Name' column
# tidyr:: is written out explicitly because on my system a bare extract()
# call resolved to a function of the same name from another package.
result <- tidyr::extract(
  data,
  col = Name,
  into = c("First_Name", "Last_Name"),
  regex = "(\\w+) (\\w+)"
)
# Display the result
print(result)

In this example, the extract function from the tidyr package splits the ‘Name’ column of the sample data frame data into first and last names. The regular expression pattern "(\\w+) (\\w+)" captures two groups of word characters separated by a single space, one group for each name. The captured substrings are stored in the new columns ‘First_Name’ and ‘Last_Name’, and because remove defaults to TRUE, the original ‘Name’ column is dropped. The result is a new data frame result containing ‘ID’, ‘First_Name’, and ‘Last_Name’.
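One thing the example does not show is what happens when a value does not fit the pattern. The sketch below reuses the same pattern on a hypothetical data frame, people, in which one name has only a single word. tidyr's extract fills the new columns with NA for rows where the pattern finds no match.

```r
library(tidyr)

# Hypothetical data: the second entry has only one word
people <- data.frame(
  ID = 1:3,
  Name = c("John Doe", "Cher", "Bob Johnson")
)

parts <- tidyr::extract(people, col = Name,
                        into = c("First_Name", "Last_Name"),
                        regex = "(\\w+) (\\w+)")

# Rows where the pattern did not match have NA in the new columns
print(parts)
print(is.na(parts$First_Name))
```

Checking the new columns for NA after the call is a quick way to spot rows the pattern failed to cover.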