I need a deterministic clustering method to team values in distributions that might be one of two people random, normal or log-normal. Google greatly turns increase k-means, i m sorry isn"t deterministic. Resolving the stochastic entry (e.g. R"s set.seed) is less desirable than approaches that always return similar results for a details set, so ns can start to understand and predict your behaviour. Does such a clustering an approach exist?



I can allude you to an algorithm and also a household of algorithms:

EDIT: also, there are some approaches for deterministic initialization that K-Means clusters, such together this one.

You are watching: Is k-means an example of a deterministic algorithm?


Hierarchical Agglomerative Clustering is deterministic other than for tied distances when not making use of single-linkage.

DBSCAN is deterministic, other than for permutation that the data collection in rarely cases.

k-means is deterministic except for initialization. You can initialize with the an initial k objects, then it is deterministic, too.

PAM like k-means.

... But there is most likely 100 more clustering algorithms!


All the algorithms, by definition, room deterministic offered their inputs. Any kind of algorithm that supplies pseudo-random number is deterministic given the seed.

K-means, the you provided as example, starts v randomly favored cluster centroids for this reason to discover optimal ones. As well as the initialization, the algorithm is totally deterministic, together you have the right to make sure looking in ~ it"s pseudocode:


Nothing prohibits friend from starting with non-random centroids. We usage random centroids so to make sure that bad chosen beginning points would certainly not lead united state to bad results. The very same with other "random" algorithms: you have the right to use castle in "deterministic" fashion, yet in most instances this is not a wise thing to do.

In instance of k-means the algorithm deterministically minimizes the within-cluster amount of squares to discover the optimal clustering solution. Unfortunately, that is perceptible to how the algorithm to be initialized. Clustering troubles in most instances do not have clear-cut solutions, due to the fact that of that we often want to usage randomized procedures to robustify them.

Imagine that you provided some deterministic ordered clustering algorithm. Imagine that it goes v your data sequentially, starting from the first observation. What would happen if the an initial case to be an outlier? on the various other hand, if girlfriend initialized it numerous times at arbitrarily points, the procedure would be less prone to difficulties with data.

See more: Where Does The Word Car Come From Carrus, — Learn Even More About Cars

Moreover, if you run non-"deterministic" algorithm lot of times and then use majority vote to choose for each case the class that appeared most commonly among the results, climate the final output will likewise by very deterministic.