The “randomForest” Function in R
Package: randomForest
Purpose: Fit a random forest model for classification or regression tasks.
General class: Model
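Required argument(s) (default interface):
x: A data frame or matrix of predictor variables.
y: A response vector (optional). A factor gives classification, a numeric vector gives regression, and omitting y runs the forest in unsupervised mode.
Alternatively, use the formula interface with formula (for example, Species ~ .) and data, as in the example below.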
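The default (non-formula) interface can be sketched as follows, reusing the iris data. The object names x, y and rf_xy and the seed value are illustrative choices, not part of the original text.

```r
library(randomForest)
set.seed(1)                       # illustrative seed for reproducibility
# Predictors go in x (a data frame or matrix), the response in y
x <- iris[, 1:4]                  # sepal/petal length and width
y <- iris$Species                 # a factor, so a classification forest is fit
rf_xy <- randomForest(x = x, y = y, ntree = 100)
print(rf_xy$type)                 # "classification"
# Out-of-bag confusion matrix, with a per-class error column
print(rf_xy$confusion)
```

This should give a model comparable to the formula call randomForest(Species ~ ., data = iris, ntree = 100).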
Notable optional arguments:
ntree: Number of trees to grow (default is 500).
mtry: Number of variables randomly sampled as candidates at each split (default is floor(√p) for classification and max(floor(p/3), 1) for regression, where p is the number of predictors).
importance: Logical; whether to assess variable importance (default is FALSE).
proximity: Logical; whether to compute proximity measures among the rows (observations) (default is FALSE).
oob.prox: Logical; whether to compute proximities using only out-of-bag data (default is the value of proximity).
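A minimal sketch of how these options affect the fitted object, again using iris (4 predictors). The names p and rf_opts are illustrative. It checks the default mtry, the extra columns that importance = TRUE adds, and the size of the proximity matrix.

```r
library(randomForest)
set.seed(2)                          # illustrative seed
p <- ncol(iris) - 1                  # 4 predictors besides Species
print(floor(sqrt(p)))                # default classification mtry: 2
rf_opts <- randomForest(Species ~ ., data = iris, ntree = 200,
                        importance = TRUE, proximity = TRUE)
print(rf_opts$mtry)                  # 2, matching the default computed above
# importance = TRUE adds per-class and MeanDecreaseAccuracy columns
# alongside MeanDecreaseGini
print(colnames(importance(rf_opts)))
# proximity = TRUE stores an n x n matrix of case similarities
print(dim(rf_opts$proximity))        # 150 150
```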
Example:
# Load the required library
library(randomForest)
# Load the iris dataset
data(iris)
# Fix the random seed so the results are reproducible
set.seed(42)
# Fit a random forest model to predict species based on other variables
rf_model <- randomForest(Species ~ ., data = iris, ntree = 100, importance = TRUE)
# Print the model summary
print(rf_model)
# View the importance of each predictor variable
print(importance(rf_model))

In this example, the randomForest function builds a random forest that predicts the species of iris flowers from the four other variables in the iris dataset. Setting ntree = 100 grows 100 trees, and importance = TRUE additionally computes permutation-based importance (mean decrease in accuracy) alongside the default Gini-based measure. After fitting, print() shows the out-of-bag (OOB) error estimate and confusion matrix, and importance() returns the importance of each predictor. Random forests are widely used for classification and regression because they capture nonlinear effects and interactions among predictors without specifying them by hand, scale to datasets with many predictors, and provide an OOB error estimate without requiring a separate test set.
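The example above covers classification. A comparable regression sketch, assuming the built-in mtcars dataset (not used in the original example), fits mpg on the remaining ten columns. Because mpg is numeric, randomForest fits a regression forest and uses the regression default for mtry. The name rf_reg and the seed are illustrative.

```r
library(randomForest)
set.seed(3)                   # illustrative seed
# mpg is numeric, so randomForest fits a regression forest
rf_reg <- randomForest(mpg ~ ., data = mtcars, ntree = 300, importance = TRUE)
print(rf_reg$type)            # "regression"
print(rf_reg$mtry)            # 3, i.e. max(floor(10 / 3), 1) for 10 predictors
# Out-of-bag mean squared error after all 300 trees have been grown
print(tail(rf_reg$mse, 1))
# For regression, importance = TRUE yields %IncMSE and IncNodePurity columns
print(importance(rf_reg))
# Predict mpg for "new" rows (the first five cars, reused purely for illustration)
print(predict(rf_reg, newdata = mtcars[1:5, ]))
```

Calling predict(rf_reg) without newdata returns the out-of-bag predictions for the training rows instead.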