# 14.4 - Agglomerative Hierarchical Clustering

14.4 - Agglomerative Hierarchical Clustering

## Combining Clusters in the Agglomerative Approach

In the agglomerative hierarchical approach, we define each data point as a cluster and combine existing clusters at each step. Here are four different methods for this approach:

1. Single Linkage: In single linkage, we define the distance between two clusters as the minimum distance between any single data point in the first cluster and any single data point in the second cluster. On the basis of this definition of distance between clusters, at each stage of the process we combine the two clusters with the smallest single linkage distance.

2. Complete Linkage: In complete linkage, we define the distance between two clusters to be the maximum distance between any single data point in the first cluster and any single data point in the second cluster. On the basis of this definition of distance between clusters, at each stage of the process we combine the two clusters that have the smallestcomplete linkage distance.

3. Average Linkage: In average linkage, we define the distance between two clusters to be the average distance between data points in the first cluster and data points in the second cluster. On the basis of this definition of distance between clusters, at each stage of the process we combine the two clusters that have the smallest average linkage distance.

4. Centroid Method: In centroid method, the distance between two clusters is the distance between the two mean vectors of the clusters. At each stage of the process we combine the two clusters that have the smallest centroid distance.

5. Ward’s Method: This method does not directly define a measure of distance between two points or clusters. It is an ANOVA based approach. One-way univariate ANOVAs are done for each variable with groups defined by the clusters at that stage of the process. At each stage, two clusters merge that provide the smallest increase in the combined error sum of squares.

In the following table the mathematical form of the distances are provided. The graph gives a geometric interpretation.

Note! None of these four methods is uniformly the best. In practice, it’s advisable to try several methods and then compare the results to form an overall judgment about the final formation of clusters.

Notationally, define

• $\boldsymbol{X _ { 1 , } X _ { 2 , } , \dots , X _ { k }}$ = Observations from cluster 1
• $\boldsymbol{Y _ { 1 , } Y _ { 2 , } , \dots , Y _ { l }}$ = Observations from cluster 2
• d(x,y) = Distance between a subject with observation vector x and a subject with observation vector y

#### Linkage Methods or Measuring Association d12 Between Clusters 1 and 2

Single Linkage Complete Linkage $d_{12} = \displaystyle \min_{i,j}\text{ } d(\mathbf{X}_i, \mathbf{Y}_j)$ This is the distance between the closest members of the two clusters. $d_{12} = \displaystyle \max_{i,j}\text{ } d(\mathbf{X}_i, \mathbf{Y}_j)$ This is the distance between the members that are farthest apart (most dissimilar) $d_{12} = \frac{1}{kl}\sum\limits_{i=1}^{k}\sum\limits_{j=1}^{l}d(\mathbf{X}_i, \mathbf{Y}_j)$ This method involves looking at the distances between all pairs and averages all of these distances. This is also called UPGMA - Unweighted Pair Group Mean Averaging. $d_{12} = d(\bar{\mathbf{x}},\bar{\mathbf{y}})$ This involves finding the mean vector location for each of the clusters and taking the distance between the two centroids.

The following video shows the the linkage method types listed on the right for a visual representation of how the distances are determined for each method.

 [1] Link ↥ Has Tooltip/Popover Toggleable Visibility