The “h2o.randomForest” Function in R

  • Package: h2o

  • Purpose: Performs Random Forest modeling for classification and regression tasks.

  • General class: Modeling

  • Required argument(s):

    • x: A vector containing the names or indices of the predictor variables.

    • y: The name or index of the response variable.

    • training_frame: The H2OFrame object containing the training data.

  • Notable optional arguments:

    • validation_frame: The H2OFrame object containing the validation data.

    • ntrees: The number of trees to grow in the forest. Defaults to 50.

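    • max_depth: The maximum depth of each tree in the forest. Defaults to 20.

    • mtries: The number of variables randomly sampled as candidates at each split. Defaults to -1, which uses the square root of the number of predictors for classification and one third of the predictors for regression.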

  • Example:

    # Load the h2o library
    library(h2o)

    # Initialize h2o
    h2o.init()

    # Import data
    data <- h2o.importFile("https://s3.amazonaws.com/h2o-public-test-data/smalldata/iris/iris_wheader.csv")

    # Define predictor and response variables
    predictors <- c("sepal_len", "sepal_wid", "petal_len", "petal_wid")
    response <- "class"

    # Split the data into train and test sets
    split <- h2o.splitFrame(data, ratios = 0.8, seed = 123)
    train <- h2o.assign(split[[1]], "train")
    test <- h2o.assign(split[[2]], "test")

    # Train the Random Forest model (mtries = -1 keeps the default behavior)
    rf_model <- h2o.randomForest(
      x = predictors,
      y = response,
      training_frame = train,
      validation_frame = test,
      ntrees = 100,
      max_depth = 20,
      mtries = -1
    )

    # View the model summary
    summary(rf_model)

  • This example trains a Random Forest classifier on the iris data with the h2o.randomForest function from the h2o package. The call supplies the predictor names (x), the response name (y), and the training data (training_frame), and passes the held-out split as validation_frame so that validation metrics are reported alongside the training metrics. The optional arguments ntrees, max_depth, and mtries control the size and shape of the forest. Finally, summary() prints the model details, including the scoring history and variable importances.
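
  • A trained model is usually followed by prediction and evaluation on held-out data. The sketch below is one possible way to do that, not part of the original example: it assumes a local H2O cluster can be started, and it uses R's built-in iris data frame (with its own column names) instead of the remote CSV so that it runs without a download. The seed value and variable names are illustrative choices.

```r
library(h2o)
h2o.init()

# Assumption: use R's built-in iris data so the sketch is self-contained
iris_hf <- as.h2o(iris)

# 80/20 split into training and held-out frames
splits <- h2o.splitFrame(iris_hf, ratios = 0.8, seed = 123)
train <- splits[[1]]
test <- splits[[2]]

# Species is a factor, so H2O builds a classification forest
rf_model <- h2o.randomForest(
  x = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"),
  y = "Species",
  training_frame = train,
  ntrees = 100,
  seed = 123
)

# Predicted class plus per-class probabilities for each held-out row
preds <- h2o.predict(rf_model, newdata = test)
head(preds)

# Performance metrics computed on the held-out frame
perf <- h2o.performance(rf_model, newdata = test)
h2o.confusionMatrix(perf)

# Relative importance of each predictor
h2o.varimp(rf_model)
```

    Scoring a frame that was not passed as validation_frame keeps the reported metrics independent of anything seen during training.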
