The “h2o.gbm” Function in R
Package: h2o
Purpose: Trains a Gradient Boosting Machine (GBM) model for classification or regression tasks.
General class: Modeling
Required argument(s):
x: A vector containing the names or indices of the predictor variables. If x is omitted, all columns except y are used.
y: The name or index of the response variable.
training_frame: The H2OFrame object containing the training data.
Notable optional arguments:
validation_frame: The H2OFrame object containing the validation data.
nfolds: The number of folds for cross-validation. Defaults to 0, which disables cross-validation.
distribution: The distribution family for the response. Defaults to “AUTO” which chooses the appropriate distribution based on the response data.
ntrees: The number of trees to build. Defaults to 50.
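
As a rough sketch of how the optional arguments above can be combined, the snippet below requests cross-validation through nfolds and sets distribution explicitly instead of relying on "AUTO". The values nfolds = 5 and seed = 123 and the object names iris_hf and cv_model are illustrative assumptions, not recommendations. It assumes a local H2O cluster can be started and that the public iris file is reachable.

```r
library(h2o)
h2o.init()

# Public iris file with a header row (columns: sepal_len, sepal_wid, petal_len, petal_wid, class)
iris_hf <- h2o.importFile("https://s3.amazonaws.com/h2o-public-test-data/smalldata/iris/iris_wheader.csv")

# Cross-validated GBM; nfolds = 5 is an illustrative choice
cv_model <- h2o.gbm(
  x = c("sepal_len", "sepal_wid", "petal_len", "petal_wid"),
  y = "class",
  training_frame = iris_hf,
  nfolds = 5,                    # 5-fold cross-validation instead of a separate validation_frame
  distribution = "multinomial",  # the response is categorical with three levels
  ntrees = 50,
  seed = 123                     # fixed seed so fold assignment is repeatable
)

# Metrics aggregated over the cross-validation folds
h2o.performance(cv_model, xval = TRUE)
```

With nfolds set, a separate validation_frame is not needed to obtain an out-of-sample estimate, which can be convenient for small data sets such as this one.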
Example:
# Load the h2o library
library(h2o)
# Initialize h2o
h2o.init()
# Import data
data <- h2o.importFile("https://s3.amazonaws.com/h2o-public-test-data/smalldata/iris/iris_wheader.csv")
# Define predictor and response variables
predictors <- c("sepal_len", "sepal_wid", "petal_len", "petal_wid")
response <- "class"
# Split the data into train and test sets
split <- h2o.splitFrame(data, ratios = 0.8, seed = 123)
train <- h2o.assign(split[[1]], "train")
test <- h2o.assign(split[[2]], "test")
# Train the GBM model
gbm_model <- h2o.gbm(x = predictors, y = response, training_frame = train, validation_frame = test, ntrees = 50)
# View the model summary
summary(gbm_model)
This example trains a GBM model with the h2o.gbm function from the h2o package. The call passes the predictor names (x), the response name (y), and the training data (training_frame), together with the optional validation_frame and ntrees arguments. The summary function then prints the model summary.
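
A natural follow-up is to score the held-out rows and inspect the fitted model. The sketch below is one possible way to do that using h2o.performance, h2o.confusionMatrix, h2o.predict, and h2o.varimp. It repeats the setup so it can run on its own, and, as an assumption that differs from the example above, it does not pass the test split as validation_frame so that the split stays untouched during training. The object names perf and preds are illustrative.

```r
library(h2o)
h2o.init()

# Same data and split as in the example above
data <- h2o.importFile("https://s3.amazonaws.com/h2o-public-test-data/smalldata/iris/iris_wheader.csv")
split <- h2o.splitFrame(data, ratios = 0.8, seed = 123)
train <- split[[1]]
test <- split[[2]]

# Train on the 80% split only
gbm_model <- h2o.gbm(
  x = c("sepal_len", "sepal_wid", "petal_len", "petal_wid"),
  y = "class",
  training_frame = train,
  ntrees = 50
)

# Metrics computed on the held-out rows
perf <- h2o.performance(gbm_model, newdata = test)
print(perf)
h2o.confusionMatrix(perf)

# Row-level predictions: a predicted label plus one probability column per class
preds <- h2o.predict(gbm_model, newdata = test)
head(preds)

# Relative importance of each predictor in the fitted trees
h2o.varimp(gbm_model)
```

When the session is finished, h2o.shutdown(prompt = FALSE) stops the local cluster.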