The “h2o.kmeans” Function in R

Mar 14

training_frame: The H2OFrame containing the data to be clustered. A regular R data frame must first be converted with as.h2o().

k: The number of clusters. The default is 1.
seed: Seed for random initialization of centroids. The default is -1 (time-based random seed).
max_iterations: The maximum number of iterations. The default is 10.
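
The parameters above can be combined in a single call. The following is a minimal sketch, not a recommended configuration: the values chosen for seed and max_iterations are arbitrary, the names iris_hf, features, and km are made up for illustration, and the x argument (which selects the columns to cluster on) is not listed above but is part of h2o.kmeans.

```r
library(h2o)
h2o.init()

# Convert the built-in iris data frame to an H2OFrame
iris_hf <- as.h2o(iris)

# Cluster on the four numeric measurements only
features <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

km <- h2o.kmeans(
  training_frame = iris_hf,
  x = features,
  k = 3,                # number of clusters
  max_iterations = 50,  # arbitrary value, higher than the default of 10
  seed = 1234           # arbitrary fixed seed so repeated runs match
)

# Number of training rows assigned to each cluster
sizes <- h2o.cluster_sizes(km)
print(sizes)
```

Fixing the seed matters mostly when results need to be compared across runs; with the default of -1, the initial centroids change each time.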

Example:
# Load necessary packages
library(h2o)
library(ggplot2)

# Initialize H2O
h2o.init()

# Load a sample dataset
data <- as.h2o(iris)

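# Perform K-means clustering with 3 clusters on the four numeric measurements
# (Species is left out so that the label does not influence the clusters)
features <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
model <- h2o.kmeans(training_frame = data, x = features, k = 3)

# View the cluster centers
centers <- as.data.frame(h2o.centers(model))
names(centers) <- features
print(centers)

# Plot the observations colored by species, with the cluster centers
# overlaid as black triangles
ggplot() +
  geom_point(data = iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(data = centers, aes(x = Sepal.Length, y = Sepal.Width), color = "black", size = 3, shape = 17) +
  labs(title = "K-means Clustering", x = "Sepal.Length", y = "Sepal.Width")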
Explanation: This example demonstrates how to perform K-means clustering on the Iris dataset using the h2o.kmeans function from the h2o package. The k parameter is set to 3 to specify the number of clusters. The cluster centers are then printed and plotted over the observations to examine the resulting clusters. Because seed is left at its default of -1, the centers can differ slightly from run to run.
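
To see how the clusters relate to the known species, each row can be assigned to a cluster and the result cross-tabulated against the labels. This is a minimal sketch under a few assumptions: the names iris_hf, features, km, and assignments are made up for illustration, the seed value is arbitrary, and the rows returned by h2o.predict are assumed to be in the same order as in iris.

```r
library(h2o)
h2o.init()

iris_hf <- as.h2o(iris)
features <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
km <- h2o.kmeans(training_frame = iris_hf, x = features, k = 3, seed = 1234)

# h2o.predict returns an H2OFrame whose "predict" column holds the
# cluster index assigned to each row
assignments <- as.data.frame(h2o.predict(km, iris_hf))

# Cross-tabulate cluster assignments against the species labels
print(table(cluster = assignments$predict, species = iris$Species))
```

Cluster indices carry no meaning of their own, so the table is read by checking whether each species falls mostly into a single cluster.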

The “h2o.gbm” Function in R