Monday, December 31, 2018

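K-Means Clustering Analysis in R

I have been promising the K-Means clustering analysis for a long time, and I am finally using the New Year holiday to publish the commands.
The advantage of R is that it reduces these complex mathematical computations to a few commands, so you only need to focus on analyzing the data and choosing the right analysis method. The reference site listed at the end is admirably thorough and well worth studying carefully.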

###
###  Load data and remove 1st and 2nd column
###
# Drop the first two columns in one step; kmeans() needs numeric input only
data <- DATA_SOURCE_NAME[-(1:2)]
View(data)

###
### Normalization
###
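# Convert every column to z-scores (mean 0, standard deviation 1)
data <- scale(data, center = TRUE, scale = TRUE)
View(data)
# col.names = NA adds an empty header cell above the row names so the CSV columns line up
write.table(data, file = "DATA_Z-Score.csv", sep = ",", col.names = NA)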
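
To make the normalization step concrete, here is a minimal sketch of what scale() computes. It uses the built-in iris data set as a stand-in, which is an assumption for illustration only; the post's DATA_SOURCE_NAME is not available here.

```r
# Stand-in data (assumption): the four numeric columns of the built-in iris set
x <- as.matrix(iris[, 1:4])

# z-score by hand: subtract each column mean, then divide by each column sd
z_manual <- sweep(x, 2, colMeans(x), "-")
z_manual <- sweep(z_manual, 2, apply(x, 2, sd), "/")

# The same transformation via scale()
z_scale <- scale(x, center = TRUE, scale = TRUE)

# Compare the values only; scale() attaches extra attributes
all.equal(z_manual, z_scale, check.attributes = FALSE)

round(colMeans(z_scale), 10)  # column means after scaling
apply(z_scale, 2, sd)         # column standard deviations after scaling
```

Because every column ends up on the same scale, no single variable dominates the Euclidean distances that K-Means relies on.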

###
### K = 2
###
km <- kmeans(data, centers = 2, nstart = 10)  # keep the best of 10 random starts
library(factoextra)
fviz_cluster(km, data = data, geom = c("point", "text"), ellipse.type = "norm")
# Within-cluster, between-cluster, and total sum of squares; a lower WSS / TSS means tighter clusters
(WSS <- km$tot.withinss)
(BSS <- km$betweenss)
(TSS <- BSS + WSS)
(ratio <- WSS / TSS)

> (WSS <- km$tot.withinss)
[1] 56253.93
> (BSS <- km$betweenss)
[1] 22738.07
> (TSS <- BSS + WSS)
[1] 78992
> (ratio <- WSS / TSS)
[1] 0.7121472
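
The sums of squares printed above can also be recomputed by hand, which shows what kmeans() is reporting. This is a sketch under assumptions: it uses the built-in iris data as a stand-in, and the names x and km_demo are made up for the illustration.

```r
# Stand-in data (assumption): z-scored numeric columns of the built-in iris set
x <- scale(as.matrix(iris[, 1:4]))

set.seed(1)  # arbitrary seed so the random starts are repeatable
km_demo <- kmeans(x, centers = 2, nstart = 10)

# TSS: squared distance of every point to the overall column means
tss_manual <- sum(sweep(x, 2, colMeans(x), "-")^2)

# WSS: squared distance of every point to the centre of its own cluster
wss_manual <- sum((x - km_demo$centers[km_demo$cluster, ])^2)

all.equal(tss_manual, km_demo$totss)
all.equal(wss_manual, km_demo$tot.withinss)
all.equal(km_demo$totss, km_demo$tot.withinss + km_demo$betweenss)

# For z-scored data, TSS equals (rows - 1) * columns
all.equal(km_demo$totss, (nrow(x) - 1) * ncol(x))
```

TSS depends only on the data, not on K, which is consistent with the same TSS of 78992 appearing in both the K = 2 and K = 3 runs of this post.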

# Cross-tabulate the original labels against the assigned clusters
outdata <- table(DATA_SOURCE_NAME$Label, km$cluster)
write.table(outdata, file = "DATA_Z-Score-2.csv", sep = ",", col.names = NA)
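
For readers who want to see what the label-by-cluster table looks like, here is a small sketch. The iris data and its Species column stand in for the post's data and Label column; both are assumptions for illustration.

```r
# Stand-in data (assumption): z-scored numeric columns of the built-in iris set
x <- scale(as.matrix(iris[, 1:4]))

set.seed(1)  # arbitrary seed so the random starts are repeatable
km_demo <- kmeans(x, centers = 2, nstart = 10)

# Rows: known labels; columns: cluster numbers assigned by k-means
tab <- table(Label = iris$Species, Cluster = km_demo$cluster)
tab

# Share of each label that lands in each cluster (each row sums to 1)
round(prop.table(tab, margin = 1), 2)
```

A label whose row is concentrated in one column is well separated by the clustering; a row spread across columns means that label is split between clusters. The cluster numbers themselves are arbitrary and can differ from run to run.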


###
### K = 3
###
km <- kmeans(data, centers = 3, nstart = 10)  # keep the best of 10 random starts
library(factoextra)
fviz_cluster(km, data = data, geom = c("point", "text"), ellipse.type = "norm")
# Within-cluster, between-cluster, and total sum of squares; a lower WSS / TSS means tighter clusters
(WSS <- km$tot.withinss)
(BSS <- km$betweenss)
(TSS <- BSS + WSS)
(ratio <- WSS / TSS)

> (WSS <- km$tot.withinss)
[1] 40904.54
> (BSS <- km$betweenss)
[1] 38087.46
> (TSS <- BSS + WSS)
[1] 78992
> (ratio <- WSS / TSS)
[1] 0.5178315

# Cross-tabulate the original labels against the assigned clusters
outdata <- table(DATA_SOURCE_NAME$Label, km$cluster)
write.table(outdata, file = "DATA_Z-Score-3.csv", sep = ",", col.names = NA)
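
The K = 2 and K = 3 runs differ only in the centers argument, so the same comparison of WSS / TSS can be done in a loop over several values of K. This is a sketch only: the iris stand-in data and the range 2 to 6 are assumptions, not choices made in the post.

```r
# Stand-in data (assumption): z-scored numeric columns of the built-in iris set
x <- scale(as.matrix(iris[, 1:4]))

set.seed(1)  # arbitrary seed so the random starts are repeatable
ks <- 2:6    # candidate numbers of clusters (arbitrary range)

# WSS / TSS for each K, computed the same way as in the sections above
ratio <- sapply(ks, function(k) {
  km_k <- kmeans(x, centers = k, nstart = 10)
  km_k$tot.withinss / km_k$totss
})

data.frame(K = ks, WSS_over_TSS = round(ratio, 4))
```

The ratio generally shrinks as K grows, so a smaller value alone does not mean a better clustering. A common rule of thumb is to look for the K after which the drop levels off.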

Ref:
R系列筆記 (R Series Notes)
