The “distinct” Function in R

Feb 3

Written By Michael Harris

Package: dplyr

Purpose: To identify and extract unique rows in a data frame or tibble based on specified columns.

General Class: Data Manipulation

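Required Argument(s):

.data: The data frame or tibble to process.

Notable Optional Arguments:

...: Columns to use for identifying unique rows. If no columns are given, all columns are used.

.keep_all: If TRUE, keeps all columns, taking their values from the first row of each distinct combination; if FALSE (the default), keeps only the columns used for distinctness.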
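The effect of .keep_all is easiest to see side by side. The following is a minimal sketch, assuming a small made-up data frame named scores (a hypothetical name and data set used only for illustration):

```r
library(dplyr)

# Hypothetical sample data, chosen only for illustration
scores <- data.frame(
  category = c("A", "B", "A", "B", "A"),
  value    = c(10, 15, 8, 12, 10)
)

# Default (.keep_all = FALSE): only the column used for distinctness is returned
only_category <- distinct(scores, category)

# .keep_all = TRUE: every column is returned, with values taken from the
# first row encountered for each distinct category
first_per_category <- distinct(scores, category, .keep_all = TRUE)

print(only_category)
print(first_per_category)
```

Both results have one row per category; the second also carries the value column from the first matching row, which may or may not be the row you want, so sorting beforehand can matter.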

Example (with Explanation):
# Load necessary packages
library(dplyr)

# Create a sample data frame
data <- data.frame(
  category = c("A", "B", "A", "B", "A"),
  value = c(10, 15, 8, 12, 10)
)

# Extract distinct rows based on the 'category' and 'value' columns
distinct_result <- distinct(data, category, value)

# Display the distinct rows
print(distinct_result)
In this example, distinct() from the dplyr package extracts the unique rows of the sample data frame, with uniqueness determined by the combination of values in the category and value columns. The fifth row (A, 10) repeats the first, so the result is a data frame with four rows: (A, 10), (B, 15), (A, 8), and (B, 12). This makes distinct() useful for removing duplicate observations from a larger dataset.
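Because category and value are the only columns in the sample data frame, listing them is optional here. As a rough sketch using the same sample data (the names all_columns and named_columns are introduced only for this illustration), the call can be written with no columns at all:

```r
library(dplyr)

# Same sample data as in the example above
data <- data.frame(
  category = c("A", "B", "A", "B", "A"),
  value = c(10, 15, 8, 12, 10)
)

# With no columns listed, distinct() compares entire rows
all_columns <- distinct(data)

# Listing every column explicitly selects the same rows
named_columns <- distinct(data, category, value)

# Compare row counts before and after removing duplicates
print(nrow(data))
print(nrow(all_columns))
print(nrow(named_columns))
```

On a data frame with additional columns the two calls would differ: the first compares every column, while the second compares only the columns named.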