Yinsen Miao Tech Blogs

A Cluster Algorithm by Local Density

##Introduction## The approach bases on the idea that cluster centers are characterized by a higher density than their neighbors and by a relatively large distance from points with higher density. Clusters with an arbitrary shape are easily detected by approaches on local density of data points.

###Assumption### Cluster centers are surrounded by neighbors with lower local density and that they are at a relative large distance from any points with a higher local density.

###Process 1### For each data point , we compute two quantities: its local density and its distance from points of higher density. Both these quantities depend on the distance between data points, which are assumed to satisfy the triangular inequality.

The local density is defined as

where

and is a cutoff distance. Basically, is equal to the number of points that are closer than to point . The algorithm is sensitive only to the relative maginitude of in different points, implying that, for large data sets, the results of the analysis are robust with respect to the choice of .

is measured by computing the minimum distance between the point and other points with higher density than point :

For the point with highest density $\ref{aa}$, we conventionally take

After the cluster centers have been found, each remaining point is assigned to the same cluster as its nearest neighbor of higher density.

  1. Exactly the same in the Alex Rodriguez and Alessandro Laio’s paper Clustering by fast search and find of density peaks.