The “h2o.kmeans” Function in R

  • Package: h2o

  • Purpose: Clustering

  • General class: Algorithm

  • Required argument(s):

    • training_frame: An H2OFrame containing the data to be clustered. Convert an ordinary R data frame with as.h2o() first.

  • Notable optional arguments:

    • k: The number of clusters. The default is 1.

    • seed: Seed for random initialization of centroids. The default is -1 (time-based random seed).

    • max_iterations: The maximum number of iterations. The default is 10.

  • Example:

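  • # Load the necessary packages
    library(h2o)
    library(ggplot2)

    # Start (or connect to) a local H2O cluster
    h2o.init()

    # Upload the iris data frame to H2O as an H2OFrame
    data <- as.h2o(iris)

    # Perform K-means clustering with 3 clusters on the four numeric columns
    model <- h2o.kmeans(training_frame = data, x = names(iris)[1:4], k = 3, seed = 1234)

    # View the cluster centers (reported in the original units of the features)
    print(h2o.centers(model))

    # Assign each row to its nearest cluster and bring the labels back into R
    assignments <- as.data.frame(h2o.predict(model, data))
    plot_data <- cbind(iris, cluster = factor(assignments$predict))

    # The centers are in feature space (not principal components), so name them by feature
    centers <- h2o.centers(model)
    names(centers) <- names(iris)[1:4]

    # Plot the clusters on the first two features, with centers as black triangles
    ggplot() +
      geom_point(data = plot_data, aes(x = Sepal.Length, y = Sepal.Width, color = cluster)) +
      geom_point(data = centers, aes(x = Sepal.Length, y = Sepal.Width),
                 color = "black", size = 3, shape = 17) +
      labs(title = "K-means Clustering", x = "Sepal length", y = "Sepal width")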

  • Explanation: This example demonstrates how to perform K-means clustering on the Iris dataset using the h2o.kmeans function from the h2o package. The k parameter is set to 3 to specify the number of clusters. The cluster centers are then printed and plotted to examine the resulting clusters.
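To make the roles of k, seed, and max_iterations (and what a "cluster center" is) more concrete, here is a minimal base-R sketch of Lloyd's k-means iterations. It is not H2O's implementation: h2o.kmeans distributes the work across the H2O cluster, standardizes columns by default, and picks starting centroids according to its own init argument, so its centers and labels can differ. The helper lloyd_kmeans() is hypothetical, and its argument names only mirror h2o.kmeans for illustration.

```r
# Hypothetical helper: a minimal Lloyd's k-means in base R (not H2O's algorithm).
# Argument names mirror h2o.kmeans for illustration only.
lloyd_kmeans <- function(x, k = 1, seed = -1, max_iterations = 10) {
  x <- as.matrix(x)
  # seed = -1 means "no fixed seed": the current RNG state is used as-is
  if (seed != -1) set.seed(seed)
  # Random initialization: k distinct rows become the starting centroids
  centers <- x[sample(nrow(x), k), , drop = FALSE]
  cluster <- integer(nrow(x))  # all 0, so the first pass never looks "converged"
  for (iter in seq_len(max_iterations)) {
    # Assignment step: squared Euclidean distance from each row to each centroid
    d <- sapply(seq_len(k), function(j) colSums((t(x) - centers[j, ])^2))
    new_cluster <- max.col(-d, ties.method = "first")
    # Stop early once no row changes cluster
    if (all(new_cluster == cluster)) break
    cluster <- new_cluster
    # Update step: move each centroid to the mean of the rows assigned to it
    for (j in seq_len(k)) {
      members <- x[cluster == j, , drop = FALSE]
      if (nrow(members) > 0) centers[j, ] <- colMeans(members)
    }
  }
  list(centers = centers, cluster = cluster, iterations = iter)
}

# Example run on the four numeric iris columns
res <- lloyd_kmeans(iris[, 1:4], k = 3, seed = 42)
print(res$centers)
print(table(cluster = res$cluster, species = iris$Species))
```

Each center ends up as the mean of the rows assigned to it, which is why the centers are reported in the same units as the input columns. max_iterations only caps the loop; the sketch stops earlier once assignments stop changing.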
