9.39. K Medoids

Group (Subgroup)

DREAM3D Review (Clustering)

Description

Warning: The randomness in this filter is not currently consistent between operating systems, specifically between Unix and Windows, even if the same seed is used. This does not affect the resulting clusters, but the cluster IDs will not correspond. For example, if the Cluster Identifier at index one is 1 on Linux, it could be 2 on Windows; the overarching clusters will be the same, but their IDs will differ.
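Because only the ID numbering is expected to differ, two runs can be compared by checking that they describe the same partition rather than comparing IDs element by element. The following is a minimal Python sketch of that check; the function name `same_partition` and the sample ID lists are illustrative assumptions and are not part of the filter.

```python
def same_partition(labels_a, labels_b):
    """Return True if two label sequences describe the same clusters
    up to a renaming of the cluster IDs."""
    if len(labels_a) != len(labels_b):
        return False
    a_to_b = {}
    b_to_a = {}
    for a, b in zip(labels_a, labels_b):
        # setdefault stores the first pairing seen and returns it afterwards,
        # so any later conflicting pairing is detected here.
        if a_to_b.setdefault(a, b) != b or b_to_a.setdefault(b, a) != a:
            return False
    return True


# Hypothetical Cluster Ids from two platforms: same groups, different numbering.
linux_ids = [1, 1, 2, 2, 3]
windows_ids = [2, 2, 1, 1, 3]
print(same_partition(linux_ids, windows_ids))      # True
print(same_partition(linux_ids, [1, 2, 2, 2, 3]))  # False
```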

This Filter applies the k medoids algorithm to an Attribute Array. K medoids is a clustering algorithm that assigns a cluster Id to each point of the Attribute Array. The user must specify the number of clusters into which the array is partitioned. Specifically, a k medoids partitioning is such that each point in the data set is associated with the cluster that minimizes the sum of the pair-wise distances between the data points and their associated cluster centers (medoids). This approach is analogous to k means, but uses actual data points (the medoids) as the cluster exemplars instead of the means. A medoid in this context is the data point in each cluster that is most like all other data points, i.e., the data point whose average distance to all other data points in the cluster is smallest. Unlike k means, since pair-wise distances are minimized instead of variance, any arbitrary concept of “distance” may be used; this Filter allows for the selection of a variety of distance metrics.
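To make the medoid definition concrete, here is a minimal Python sketch that finds the medoid of a single cluster under an interchangeable distance function. The helper names (`euclidean`, `manhattan`, `medoid_index`) and the sample points are assumptions made for illustration; they are not the filter's API, and the two metrics shown are only examples of what a "distance" could be.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def medoid_index(points, distance=euclidean):
    """Index of the point whose total distance to all points is smallest."""
    totals = [sum(distance(p, q) for q in points) for p in points]
    return min(range(len(points)), key=totals.__getitem__)


cluster = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]

# The medoid is always one of the input points.
print(cluster[medoid_index(cluster)])             # (2.0, 0.0)
print(cluster[medoid_index(cluster, manhattan)])  # (2.0, 0.0)

# The mean, by contrast, need not coincide with any data point.
mean_x = sum(p[0] for p in cluster) / len(cluster)
print(mean_x)  # 3.2
```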

This Filter uses the Voronoi iteration algorithm to produce the clustering. The algorithm is iterative and proceeds as follows:

  1. Choose k points at random to serve as the initial cluster medoids

  2. Associate each point to the closest medoid

  3. Until convergence, repeat the following steps:

  • For each cluster, change the medoid to the point in that cluster that minimizes the sum of distances between that point and all other points in the cluster

  • Reassign each point to the closest medoid

Convergence is reached when the medoids no longer change position. Because the algorithm is iterative and starts from randomly chosen medoids, it yields only an approximate solution and may produce different classifications on each execution with the same input data. The user may opt to use a mask to ignore certain points; where the mask is false, the points will be placed in cluster 0.
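The steps above can be sketched in a few lines of Python. This is an illustrative sketch only, not the filter's implementation: the function name `k_medoids`, its parameters, the `max_iter` safety cap, the numbering of clusters from 1 to k, and the use of `math.dist` (Euclidean distance) as the default metric are all assumptions made for the example.

```python
import math
import random


def k_medoids(points, k, mask=None, seed=None, distance=math.dist, max_iter=100):
    """Voronoi-iteration sketch. Returns (cluster_ids, medoid_indices).

    Usable points get ids 1..k; points where the mask is false get id 0.
    """
    rng = random.Random(seed)
    usable = [i for i in range(len(points)) if mask is None or mask[i]]
    if k < 1 or k > len(usable):
        raise ValueError("k must be between 1 and the number of usable points")

    def assign(medoids):
        ids = [0] * len(points)  # masked-out points stay in cluster 0
        for i in usable:
            nearest = min(range(k),
                          key=lambda c: distance(points[i], points[medoids[c]]))
            ids[i] = nearest + 1
        return ids

    medoids = rng.sample(usable, k)  # step 1: random initial medoids
    ids = assign(medoids)            # step 2: associate points with medoids
    for _ in range(max_iter):        # step 3: repeat until convergence
        new_medoids = []
        for c in range(k):
            members = [i for i in usable if ids[i] == c + 1]
            if not members:          # keep the old medoid for an empty cluster
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest sum of distances
            # to all members of its cluster.
            best = min(members,
                       key=lambda m: sum(distance(points[m], points[j])
                                         for j in members))
            new_medoids.append(best)
        if new_medoids == medoids:   # medoids did not move: converged
            break
        medoids = new_medoids
        ids = assign(medoids)
    return ids, medoids


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
mask = [True] * 6 + [False]          # the last point is ignored
ids, medoids = k_medoids(points, k=2, mask=mask, seed=42)
print(ids, [points[m] for m in medoids])
```

Which points end up as medoids depends on the random initial choice, so the printed IDs can differ between seeds; the masked-out point is always reported as cluster 0.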

Note: In SIMPLNX there is no explicit positional subtyping for Attribute Matrix, so the next section should be treated as a high-level understanding of what is being created. Naming the Attribute Matrix to include the type listed on the respective line in the ‘Attribute Matrix Created’ column is encouraged to help with readability and comprehension.

A clustering algorithm can be considered a kind of segmentation; this implementation of k medoids does not rely on the Geometry on which the data lie, only the topology of the space that the array itself forms. Therefore, this Filter has the effect of creating either Features or Ensembles depending on the kind of array passed to it for clustering. If an Element array (e.g., voxel-level Cell data) is passed to the Filter, then Features are created (in the previous example, a Cell Feature Attribute Matrix will be created). If a Feature array is passed to the Filter, then an Ensemble Attribute Matrix is created. The following table shows what type of Attribute Matrix is created based on what sort of array is used for clustering:

| Attribute Matrix Source | Attribute Matrix Created |
|-------------------------|--------------------------|
| Generic                 | Generic                  |
| Vertex                  | Vertex Feature           |
| Edge                    | Edge Feature             |
| Face                    | Face Feature             |
| Cell                    | Cell Feature             |
| Vertex Feature          | Vertex Ensemble          |
| Edge Feature            | Edge Ensemble            |
| Face Feature            | Face Ensemble            |
| Cell Feature            | Cell Ensemble            |
| Vertex Ensemble         | Vertex Ensemble          |
| Edge Ensemble           | Edge Ensemble            |
| Face Ensemble           | Face Ensemble            |
| Cell Ensemble           | Cell Ensemble            |

This Filter will store the medoids for the final clusters within the created Attribute Matrix.

Random Number Seed Parameters

| Parameter Name | Parameter Type | Parameter Notes | Description |
|----------------|----------------|-----------------|-------------|
| Use Seed for Random Generation | Bool | | When true the user will be able to put in a seed for random generation |
| Seed Value | Scalar Value | UInt64 | The seed fed into the random generator |
| Stored Seed Value Array Name | DataObjectName | | Name of array holding the seed value |

Input Parameter(s)

| Parameter Name | Parameter Type | Parameter Notes | Description |
|----------------|----------------|-----------------|-------------|
| Use Mask Array | Bool | | Specifies whether or not to use a mask array |
| Mask Array | Array Selection | Allowed Types: uint8, boolean | DataPath to the boolean or uint8 mask array. Values that are true will mark that cell/point as usable. |
| Number of Clusters | Scalar Value | UInt64 | Number of clusters to create; this sets the tuple count of the Cluster Attribute Matrix and of the arrays within it |
| Distance Metric | Choices | | Distance Metric type to be used for calculations |

Input Data Objects

| Parameter Name | Parameter Type | Parameter Notes | Description |
|----------------|----------------|-----------------|-------------|
| Attribute Array to Cluster | Array Selection | Allowed Types: int8, uint8, int16, uint16, int32, uint32, int64, uint64, float32, float64 | The array to find the medoids for |

Output Data Object(s)

| Parameter Name | Parameter Type | Parameter Notes | Description |
|----------------|----------------|-----------------|-------------|
| Cluster Ids Array Name | DataObjectName | | Name of the ids array to be created in Attribute Array to Cluster’s parent group |
| Cluster Attribute Matrix | DataGroupCreation | | Name and path of Attribute Matrix to hold Cluster Data |
| Cluster Medoids Array Name | DataObjectName | | Name of the medoids array to be created in Cluster Attribute Matrix |

References

[1] A simple and fast algorithm for K-medoids clustering, H.S. Park and C.H. Jun, Expert Systems with Applications, vol. 36 (2), pp. 3336-3341, 2009.

Example Pipelines

DREAM3D-NX Help

If you need help, need to file a bug report or want to request a new feature, please head over to the DREAM3DNX-Issues GitHub site where the community of DREAM3D-NX users can help answer your questions.