8.15. Compute Attribute Array Statistics

Group (Subgroup)

DREAM3D Review (Statistics)

Description

This Filter computes a variety of statistics for a given scalar array. The currently available statistics are the histogram, array length, minimum, maximum, (arithmetic) mean, median, mode, standard deviation, summation, and number of unique values; any combination of these statistics may be computed by this Filter. A standardized copy of the input array may also be produced. Any scalar array, of any primitive type, may be used as input. The type of each output array depends on the kind of statistic computed:

Statistic	Primitive Type
Histogram	uint64 (of user set component size)
Length	signed 64-bit integer
Minimum	same type as input
Maximum	same type as input
Mean	double
Median	double
Mode	same type as input
Standard Deviation	double
Summation	double
Standardized	double
Number of Unique Values	signed 32-bit integer

The user may optionally use a mask to specify points to be ignored when computing the statistics; only points where the supplied mask is true will be considered. Additionally, the user may choose to have the statistics computed per Feature or Ensemble by supplying an Ids array. For example, if the user opts to compute statistics per Feature and selects an array that has 10 unique Feature Ids, then this Filter will compute 10 sets of statistics (e.g., the mean of the supplied array for each Feature, the total number of points in each Feature (the length), etc.).
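The mask and per-Feature grouping described above can be sketched in plain Python. This is only an illustration of the behavior, not the Filter's implementation; the function name `per_feature_stats` and the dictionary layout of the result are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def per_feature_stats(values, feature_ids, mask=None):
    """Group values by Feature Id and compute a few statistics per group.

    Hypothetical helper: points whose mask entry is False are skipped.
    """
    groups = defaultdict(list)
    for i, (value, fid) in enumerate(zip(values, feature_ids)):
        if mask is not None and not mask[i]:
            continue  # masked-out point: ignored for every statistic
        groups[fid].append(value)
    return {
        fid: {
            "length": len(vals),
            "min": min(vals),
            "max": max(vals),
            "mean": fmean(vals),
        }
        for fid, vals in sorted(groups.items())
    }


values = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
feature_ids = [1, 1, 1, 2, 2, 2]
mask = [True, True, False, True, True, True]  # third point is ignored

stats = per_feature_stats(values, feature_ids, mask)
print(stats[1])  # statistics for Feature 1, computed from two points
print(stats[2])  # statistics for Feature 2, computed from three points
```

Two unique Feature Ids yield two sets of statistics, and the masked point contributes to neither the length nor the mean of Feature 1.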

The input array may also be standardized, meaning that the array values will be adjusted such that they have a mean of 0 and unit variance. This Standardize Data option requires the selection of both the Find Mean and Find Standard Deviation options. The standardized data will be saved as a new array object stored in the same Attribute Matrix as the input array. Note that if the Standardize Data option is selected, the mean and standard deviation values created by this Filter reflect the mean and standard deviation of the original array; the new standardized array has a mean of 0 and unit variance. The standardized array will be computed in double precision. If the statistics are being computed per Feature or Ensemble, then the array values are standardized according to the mean and standard deviation for each Feature/Ensemble. For example, if 5 unique Features were being analyzed and Standardize Data was selected, then the array values for Feature 1 would be standardized according to the mean and standard deviation for Feature 1, then the array values for Feature 2 would be standardized according to the mean and standard deviation for Feature 2, and so on for the remaining Features.
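As a rough sketch of per-Feature standardization, the following plain-Python example subtracts each Feature's mean and divides by that Feature's standard deviation. Several details are assumptions made for illustration only: the name `standardize_per_feature`, the use of the population standard deviation, and the choice to emit 0.0 when a Feature has zero spread. The Filter's own conventions may differ.

```python
from statistics import fmean, pstdev


def standardize_per_feature(values, feature_ids):
    """Return (standardized values, per-Feature (mean, std) of the originals)."""
    # Collect the original values belonging to each Feature.
    groups = {}
    for value, fid in zip(values, feature_ids):
        groups.setdefault(fid, []).append(value)

    # Mean and standard deviation of the ORIGINAL data, per Feature.
    # Assumption: population standard deviation.
    params = {fid: (fmean(vals), pstdev(vals)) for fid, vals in groups.items()}

    standardized = []
    for value, fid in zip(values, feature_ids):
        mu, sigma = params[fid]
        # Assumption: a Feature with zero spread maps to 0.0 to avoid dividing by zero.
        standardized.append((value - mu) / sigma if sigma > 0 else 0.0)
    return standardized, params


values = [1.0, 3.0, 10.0, 30.0]
feature_ids = [1, 1, 2, 2]
standardized, params = standardize_per_feature(values, feature_ids)
print(standardized)  # double-precision values with mean 0 within each Feature
print(params)        # mean and standard deviation of the original data
```

Note that `params` still describes the original array, matching the statement above that the reported mean and standard deviation are not those of the standardized output.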

The user must select a destination Attribute Matrix in which the computed statistics will be stored. If electing to Compute Statistics Per Feature/Ensemble, then a reasonable selection for this Attribute Matrix is the Feature/Ensemble Attribute Matrix associated with the supplied Feature/Ensemble Ids. However, the only requirement is that the number of columns in the selected destination Attribute Matrix match the number of Features/Ensembles specified by the supplied Id array. This requirement is enforced at run time. If computing statistics for the entire input array, then only one value is computed per statistic; therefore, each array produced contains only one value. In this case, the destination Attribute Matrix should contain only 1 tuple. If such a Generic Attribute Matrix does not exist, it can be created.

Special operations occur for certain statistics if the supplied array is of type bool (for example, a mask array produced [when thresholding](@ref multithresholdobjects)). The length, minimum, maximum, median, mode, and summation are computed as normal (although the resulting values may be platform dependent). The mean and standard deviation for a boolean array will be true if there are more instances of true in the array than false. If Standardize Data is chosen for a boolean array, no actual modifications will be made to the input. These operations for boolean inputs are chosen as a basic convention, and are not intended to be representative of true boolean logic.
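The majority convention for boolean inputs can be illustrated with a few lines of Python. The function name `bool_majority` is hypothetical; the tie behavior follows the wording above, where the result is true only when true values strictly outnumber false values.

```python
def bool_majority(flags):
    """Return True only when True values strictly outnumber False values."""
    n_true = sum(1 for flag in flags if flag)
    n_false = len(flags) - n_true
    return n_true > n_false


majority_case = bool_majority([True, True, False])   # two True, one False
tie_case = bool_majority([True, False])              # equal counts
print(majority_case, tie_case)
```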

Histogram Notes:

When creating a histogram the output arrays can take 2 different layouts.

Histogram when “Compute Statistics Per Feature/Ensemble” is NOT selected

The output is in the form of 2 Data Arrays. The first data array will have the counts. The number of tuples of the array is the same as the number of bins in the histogram. The second data array will have the bin ranges. The array has 2 components where the first component of each tuple is the minimum of the bin (inclusive) and the second component of the tuple is the maximum for that bin (exclusive).
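The two-array layout described above can be sketched as follows. This is a simplified illustration assuming equal-width bins over a user-supplied range; the name `histogram` is hypothetical, and two choices here are assumptions rather than documented behavior: values outside the range are skipped, and a value exactly equal to the upper bound is counted in the last bin.

```python
def histogram(values, num_bins, lo, hi):
    """Return (counts, ranges) with one entry per bin."""
    if num_bins <= 0 or hi <= lo:
        raise ValueError("need num_bins > 0 and hi > lo")
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    # Each range tuple has 2 components: (bin minimum, bin maximum).
    ranges = [(lo + i * width, lo + (i + 1) * width) for i in range(num_bins)]
    for v in values:
        if v < lo or v > hi:
            continue  # assumption: out-of-range values are not counted
        # Minimum is inclusive, maximum exclusive; clamp v == hi into the last bin.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    return counts, ranges


counts, ranges = histogram([0.0, 1.0, 2.5, 4.0], num_bins=4, lo=0.0, hi=4.0)
print(counts)  # one tuple per bin
print(ranges)  # one (min, max) pair per bin
```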

Histogram when “Compute Statistics Per Feature/Ensemble” IS selected

The output is again in the form of 2 arrays, but for each output array the number of tuples is the same as the number of Features/Ensembles for which the statistics are calculated. The number of components for the “Counts” array is now the number of bins. The second array has the same number of tuples as the counts array, but its number of components is the number of bins * 2, and the data is encoded as [Bin Min, Bin Max], [Bin Min, Bin Max], and so on for each bin.
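The per-Feature layout can be sketched in the same spirit. The example below assumes that Feature Ids index the output tuples directly (0 to `num_features - 1`) and that every Feature shares one user-supplied range; both are assumptions for illustration, as is the name `per_feature_histogram`.

```python
def per_feature_histogram(values, feature_ids, num_features, num_bins, lo, hi):
    """Return (counts, ranges), each with one tuple (row) per Feature."""
    width = (hi - lo) / num_bins
    # Counts: num_features tuples, num_bins components each.
    counts = [[0] * num_bins for _ in range(num_features)]
    for v, fid in zip(values, feature_ids):
        if lo <= v <= hi:
            idx = min(int((v - lo) / width), num_bins - 1)
            counts[fid][idx] += 1
    # Ranges: num_features tuples, 2 * num_bins components each,
    # flattened as Bin Min, Bin Max, Bin Min, Bin Max, ...
    flat = []
    for i in range(num_bins):
        flat += [lo + i * width, lo + (i + 1) * width]
    ranges = [list(flat) for _ in range(num_features)]
    return counts, ranges


feat_counts, feat_ranges = per_feature_histogram(
    values=[0.5, 1.5, 1.7, 3.9],
    feature_ids=[1, 1, 2, 2],
    num_features=3,
    num_bins=2,
    lo=0.0,
    hi=4.0,
)
print(feat_counts)  # 3 tuples x 2 components
print(feat_ranges)  # 3 tuples x 4 components
```

Feature 0 receives no values in this example, which is the kind of case the Feature-Has-Data array listed among the output parameters is meant to flag.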

Input Data

Parameter Name	Parameter Type	Parameter Notes	Description
Attribute Array to Compute Statistics	Array Selection	Allowed Types: int8, uint8, int16, uint16, int32, uint32, int64, uint64, float32, float64, boolean Comp. Shape: 1	Input Attribute Array for which to compute statistics

Output Data

Parameter Name	Parameter Type	Parameter Notes	Description
Destination Attribute Matrix	DataGroupCreation		Attribute Matrix in which to store the computed statistics

Histogram Options

Parameter Name	Parameter Type	Parameter Notes	Description
Find Histogram	Bool		Whether to compute the histogram of the input array
Number of Bins	Scalar Value	Int32	Number of bins in histogram
Use Full Range for Histogram	Bool		If true, ignore the custom min and max values and use the minimum and maximum of the array on which the histogram is computed
Custom Histogram Min Value	Scalar Value	Float64	Min cutoff value for histogram
Custom Histogram Max Value	Scalar Value	Float64	Max cutoff value for histogram
Histogram Bin Counts Array Name	DataObjectName		The name of the histogram bin counts array
Histogram Bin Ranges Array Name	DataObjectName		The name of the histogram bin ranges array
Most Populated Bin Array Name	DataObjectName		The name of the Most Populated Bin array
Find Modal Histogram Bin Ranges	Bool		Whether to compute the histogram bin ranges that contain the mode values. This option requires that “Find Mode” is turned on.
Modal Histogram Bin Ranges Array Name	DataObjectName		The name of the array that stores the histogram bin range(s) that contain the mode(s) of the data.

Optional Data Mask

Parameter Name	Parameter Type	Parameter Notes	Description
Use Mask Array	Bool		Specifies whether or not to use a mask array
Mask Array	Array Selection	Allowed Types: uint8, boolean Comp. Shape: 1	DataPath to the boolean mask array. Values that are true will mark that cell/point as usable.

Algorithm Options

Parameter Name	Parameter Type	Parameter Notes	Description
Compute Statistics Per Feature/Ensemble	Bool		Whether the statistics should be computed on a Feature/Ensemble basis
Cell Feature Ids	Array Selection	Allowed Types: int32 Comp. Shape: 1	Specifies to which Feature each Element belongs

Output Arrays

Parameter Name	Parameter Type	Description
Feature-Has-Data Array Name	DataObjectName	The name of the boolean array that indicates whether or not each feature contains any data. This array is especially useful for determining whether the output statistics are valid for a given feature.
Find Length	Bool	Whether to compute the length of the input array
Length Array Name	DataObjectName	The name of the length array
Find Minimum	Bool	Whether to compute the minimum of the input array
Minimum Array Name	DataObjectName	The name of the minimum array
Find Maximum	Bool	Whether to compute the maximum of the input array
Maximum Array Name	DataObjectName	The name of the maximum array
Find Mean	Bool	Whether to compute the arithmetic mean of the input array
Mean Array Name	DataObjectName	The name of the mean array
Find Median	Bool	Whether to compute the median of the input array
Median Array Name	DataObjectName	The name of the median array
Find Mode	Bool	Whether to compute the mode of the input array
Mode Array Name	DataObjectName	The name of the mode array
Find Standard Deviation	Bool	Whether to compute the standard deviation of the input array
Standard Deviation Array Name	DataObjectName	The name of the standard deviation array
Find Summation	Bool	Whether to compute the summation of the input array
Summation Array Name	DataObjectName	The name of the summation array
Standardize Data	Bool	Whether to generate a standardized data array
Standardized Data Array Name	DataObjectName	The name of the standardized data array
Find Number of Unique Values	Bool	Whether to compute the number of unique values in the input array
Number of Unique Values Array Name	DataObjectName	The name of the array which stores the calculated number of unique values

Example Pipelines

License & Copyright

Please see the description file distributed with this plugin.

DREAM3D-NX Help

If you need help, need to file a bug report, or want to request a new feature, please head over to the DREAM3DNX-Issues GitHub site, where the community of DREAM3D-NX users can help answer your questions.