The “h2o.kmeans” Function in R
Package: h2o
Purpose: Clustering
General class: Algorithm
Required argument(s):
training_frame: The H2OFrame containing the data to be clustered. An ordinary R data frame must first be converted with as.h2o().
Notable optional arguments:
k: The number of clusters. The default is 1.
seed: Seed for random initialization of centroids. The default is -1 (time-based random seed).
max_iterations: The maximum number of iterations. The default is 10.
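The optional arguments above can be combined in a single call. The following is a minimal sketch rather than a recommended configuration: the iris data, the choice of columns, and the values seed = 1234 and max_iterations = 100 are assumptions made only for illustration.

```r
library(h2o)
h2o.init()

# Assumption: the built-in iris data set stands in for real data
iris_hf <- as.h2o(iris)

# Illustrative values only; a fixed seed makes the initial centroids repeatable
model <- h2o.kmeans(
  training_frame = iris_hf,
  x = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"),
  k = 3,
  seed = 1234,
  max_iterations = 100
)

# Number of rows assigned to each cluster
h2o.cluster_sizes(model)
# Number of iterations actually run (at most max_iterations)
h2o.num_iterations(model)
```

Leaving seed at its default of -1 gives a different initialization on each run, so results may vary between runs.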
Example:
# Load the necessary packages
library(h2o)
library(ggplot2)
# Initialize H2O
h2o.init()
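# Load a sample dataset as an H2OFrame
data <- as.h2o(iris)
# Cluster on the four numeric measurements only, leaving out the Species label
features <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
# Perform K-means clustering with 3 clusters (a fixed seed makes the run repeatable)
model <- h2o.kmeans(training_frame = data, x = features, k = 3, seed = 1234)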
# View the cluster centers
print(model@model$centers)
# Plot the data with the cluster centers overlaid
# h2o.centers() returns the centers without the centroid index column
centers <- as.data.frame(h2o.centers(model))
# The first two columns hold the centers for Sepal.Length and Sepal.Width
names(centers)[1:2] <- c("Sepal.Length", "Sepal.Width")
ggplot() +
  geom_point(data = iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(data = centers, aes(x = Sepal.Length, y = Sepal.Width), color = "black", size = 3, shape = 17) +
  labs(title = "K-means Clustering", x = "Sepal.Length", y = "Sepal.Width")
Explanation: This example demonstrates how to perform K-means clustering on the Iris dataset using the h2o.kmeans function from the h2o package. The k parameter is set to 3 to specify the number of clusters. The cluster centers are then printed and drawn as black triangles over a scatter plot of sepal length against sepal width, with the points colored by species, so the centers can be compared with the known groups.
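To examine the resulting clusters beyond their centers, each row can be assigned to a cluster and compared with the known species. This is a sketch under stated assumptions: the object names (iris_hf, features, assignments, clusters) and the seed value are illustrative, and the cross-tabulation is only a quick check, not a formal evaluation.

```r
library(h2o)
h2o.init()

# Assumption: cluster on the four numeric iris measurements only
iris_hf <- as.h2o(iris)
features <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
model <- h2o.kmeans(training_frame = iris_hf, x = features, k = 3, seed = 1234)

# Assign every row to a cluster; the result has a single "predict" column
assignments <- h2o.predict(model, iris_hf)
clusters <- as.vector(assignments$predict)

# Cross-tabulate cluster labels against the known species
table(cluster = clusters, species = iris$Species)

# Total within-cluster sum of squares (lower means tighter clusters)
h2o.tot_withinss(model)
```

The cluster labels are arbitrary identifiers, so a cluster lines up with a species only through the counts in the table, not through the label value itself.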