```{index} single: Filters; Compute Attribute Array Statistics
```
# Compute Attribute Array Statistics

## Group (Subgroup)

DREAM3D Review (Statistics)

## Description

***WARNING: Histogram functionality moved to a different filter.***

This **Filter** computes a variety of statistics for a given scalar array. The currently available statistics are array length, minimum, maximum, (arithmetic) mean, median, mode, standard deviation, summation, and number of unique values; any combination of these statistics may be computed by this **Filter**. Any scalar (single-component) array of any primitive type may be used as input. The type of each output array depends on the statistic computed:

| Statistic               | Primitive Type                      |
|-------------------------|-------------------------------------|
| Length                  | signed 64-bit integer               |
| Minimum                 | same type as input                  |
| Maximum                 | same type as input                  |
| Mean                    | double                              |
| Median                  | double                              |
| Mode                    | same type as input                  |
| Standard Deviation      | double                              |
| Summation               | double                              |
| Standardized            | double                              |
| Number of Unique Values | signed 32-bit integer               |
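
As a rough illustration of what each statistic means for a whole (unmasked, non-per-Feature) array, the following Python sketch computes them with the standard library. The helper name `compute_array_statistics` is hypothetical and this is not the filter's implementation; in particular, the use of the population standard deviation (dividing by *n*) and the smallest-value tie-break for the mode are assumptions.

```python
import math
import statistics
from collections import Counter


def compute_array_statistics(values):
    """Whole-array statistics for a single-component array (hypothetical helper)."""
    if not values:
        raise ValueError("statistics are undefined for an empty array")
    n = len(values)
    total = math.fsum(values)  # Summation is reported as a double
    mean = total / n
    # Population standard deviation (divide by n); the normalization is an assumption.
    std_dev = math.sqrt(math.fsum((v - mean) ** 2 for v in values) / n)
    counts = Counter(values)
    highest = max(counts.values())
    # Mode keeps the input type; ties are broken toward the smallest value (assumption).
    mode = min(v for v, c in counts.items() if c == highest)
    return {
        "length": n,
        "minimum": min(values),  # same type as input
        "maximum": max(values),  # same type as input
        "mean": mean,
        "median": float(statistics.median(values)),
        "mode": mode,
        "standard_deviation": std_dev,
        "summation": total,
        "number_of_unique_values": len(counts),
    }


stats = compute_array_statistics([1, 2, 2, 3, 7])
print(stats["mean"], stats["median"], stats["mode"])  # 3.0 2.0 2
```

Note how the minimum, maximum, and mode keep the input's type while the mean, median, standard deviation, and summation are floating point, mirroring the table above.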

The user may optionally use a mask to specify points to be ignored when computing the statistics; only points where the supplied mask is *true* will be considered when computing statistics.  Additionally, the user may select to have the statistics computed per **Feature** or **Ensemble** by supplying an Ids array.  For example, if the user opts to compute statistics per **Feature** and selects an array that has 10 unique **Feature** Ids, then this **Filter** will compute 10 sets of statistics (e.g., find the mean of the supplied array for each **Feature**, find the total number of points in each **Feature** (the length), etc.).  
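
The sketch below shows, for two of the statistics, how a mask and a Feature Ids array combine: each point contributes only to its own **Feature**, and only if its mask value is true. The helper name is hypothetical, Feature Ids are assumed to index `0..max`, and the placeholder mean of `0.0` for a **Feature** with no usable points is an assumption; the `has_data` list plays the role of the *Feature-Has-Data* array described in the parameter table.

```python
def per_feature_length_and_mean(values, feature_ids, mask=None, num_features=None):
    """Per-Feature length, mean and has-data flags (hypothetical helper)."""
    if num_features is None:
        num_features = max(feature_ids) + 1  # Feature Ids are assumed to index 0..max
    sums = [0.0] * num_features
    lengths = [0] * num_features
    for i, (value, fid) in enumerate(zip(values, feature_ids)):
        if mask is not None and not mask[i]:
            continue  # only points where the mask is true contribute
        sums[fid] += value
        lengths[fid] += 1
    has_data = [n > 0 for n in lengths]
    # Features with no unmasked points get a placeholder mean of 0.0 (assumption);
    # has_data plays the role of the Feature-Has-Data array.
    means = [s / n if n > 0 else 0.0 for s, n in zip(sums, lengths)]
    return lengths, means, has_data


lengths, means, has_data = per_feature_length_and_mean(
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1, 1, 2, 2, 2],
    mask=[True, True, True, False, True],
)
print(lengths, means, has_data)  # [0, 2, 2] [0.0, 1.5, 4.0] [False, True, True]
```

Here **Feature** 0 has no points at all, so its mean is only a placeholder; checking the has-data flag is what tells you that value should not be trusted.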

The input array may also be *standardized*, meaning that the array values will be adjusted such that they have a mean of 0 and unit variance.  This *Standardize Data* option requires the selection of both the *Find Mean* and *Find Standard Deviation* options.  The standardized data will be saved as a new array object stored in the same **Attribute Matrix** as the input array.  Note that if the *Standardize Data* option is selected, the mean and standard deviation values created by this **Filter** reflect the mean and standard deviation of the *original* array; the new standardized array has a mean of 0 and unit variance.  The standardized array will be computed in double precision.  If the statistics are being computed per **Feature** or **Ensemble**, then the array values are standardized according to the mean and standard deviation *for each **Feature/Ensemble***.  For example, if 5 unique **Features** were being analyzed and *Standardize Data* was selected, then the array values for **Feature** 1 would be standardized according to the mean and standard deviation for **Feature** 1, then the array values for **Feature** 2 would be standardized according to the mean and standard deviation for **Feature** 2, and so on for the remaining **Features**.  
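
Standardizing per **Feature** amounts to replacing each value with its z-score, $(x - \mu_f) / \sigma_f$, where $\mu_f$ and $\sigma_f$ are the mean and standard deviation of the point's own **Feature**. The Python sketch below illustrates this under stated assumptions: the helper name is hypothetical, the mask is ignored for brevity, the population standard deviation is used, and a **Feature** with zero spread is mapped to `0.0`. As the paragraph above notes, the returned per-Feature statistics describe the *original* values.

```python
import math


def standardize_per_feature(values, feature_ids):
    """Z-score each value using the mean/std of its own Feature (hypothetical helper)."""
    groups = {}
    for value, fid in zip(values, feature_ids):
        groups.setdefault(fid, []).append(value)
    # Mean and population standard deviation of the *original* values per Feature.
    stats = {}
    for fid, vals in groups.items():
        mean = math.fsum(vals) / len(vals)
        std = math.sqrt(math.fsum((x - mean) ** 2 for x in vals) / len(vals))
        stats[fid] = (mean, std)
    standardized = []
    for value, fid in zip(values, feature_ids):
        mean, std = stats[fid]
        # A constant Feature (std == 0) is mapped to 0.0 here (assumption).
        standardized.append((value - mean) / std if std > 0 else 0.0)
    return standardized, stats


z, stats = standardize_per_feature([1.0, 3.0, 10.0, 20.0], [1, 1, 2, 2])
print(z)      # [-1.0, 1.0, -1.0, 1.0]
print(stats)  # {1: (2.0, 1.0), 2: (15.0, 5.0)}
```

Although the two Features have very different raw scales, both end up with mean 0 and unit variance after standardization.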

The user must select a destination **Attribute Matrix** in which the computed statistics will be stored. If electing to *Compute Statistics Per Feature/Ensemble*, a reasonable selection for this **Attribute Matrix** is the **Feature/Ensemble** **Attribute Matrix** associated with the supplied **Feature/Ensemble** Ids. However, the only requirement is that the number of tuples in the selected destination **Attribute Matrix** matches the number of **Features/Ensembles** specified by the supplied Ids array. This requirement is enforced at run time. If computing statistics for the entire input array, only one value is computed per statistic, so each output array contains a single value. In this case, the destination **Attribute Matrix** should contain only 1 tuple. If such a **Generic Attribute Matrix** does not exist, it can be created.

Special operations occur for certain statistics if the supplied array is of type *bool* (for example, a mask array produced [when thresholding](@ref multithresholdobjects)). The length, minimum, maximum, median, mode, and summation are computed as normal (although the resulting values may be platform dependent). The mean and standard deviation for a boolean array will be true if there are more instances of true in the array than false. If *Standardize Data* is chosen for a boolean array, no actual modifications will be made to the input. These operations for boolean inputs are chosen as a basic convention and are not intended to be representative of true boolean logic.
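
The boolean convention for the mean (and, per the paragraph above, the standard deviation) is a simple majority vote. The short Python sketch below reads "more instances of true than false" literally, so an exact tie yields `False`; that tie handling is an interpretation, and the helper name is hypothetical.

```python
def boolean_majority(values):
    """True iff the array holds strictly more True than False values (hypothetical helper)."""
    n_true = sum(1 for v in values if v)
    n_false = len(values) - n_true
    return n_true > n_false  # a tie is not "more true", so it yields False


print(boolean_majority([True, True, False]))  # True
print(boolean_majority([True, False]))        # False
```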

## Ranges Breakdown

The ranges feature was added primarily to offer the following functionality:

1. An option to output an array that holds the Feature Id of each output tuple (the Feature Ids Indexing Array).
2. An option to set the "Feature Id" range, in order to:
   - allow the user to "pad out" the Feature Ids to a specific range;
   - allow the user to compute statistics only for specific Feature Ids.
3. An option to ignore Feature Id zero.
4. An option to remove the empty entries that occur when the Feature Ids start above 1.

All of these can be achieved with the new functionality, as described below:

For option 1, this array (the Feature Ids Indexing Array) is automatically created for any range selection other than `None`. Note that if your custom range or `Shrink To Fit` already covers all the Features, this array is redundant and can be removed; however, this is a very niche occurrence, and users are encouraged to simply select `None` if they know this to be the case.

For option 2, this is provided by both `Padded Custom Range` and `Minimum Size in Custom Range`. The latter is intended for users who are trying to reduce the output size without a priori knowledge of the number of Features: it discards anything above the custom upper bound, or uses the maximum Feature Id if the custom upper bound exceeds it. Likewise, for the lower bound it uses the larger of the provided lower bound and the minimum Feature Id. `Padded Custom Range` instead generates extra (padded) entries for Ids below the minimum Feature Id and above the maximum Feature Id, so the output spans exactly the requested range. See the bonus note below for additional range features.

For option 3, the ability to ignore Feature Id Zero (the invalid Feature Id) is provided directly in the form of `Ignore Feature 0` and indirectly through ranges.

For option 4, the most direct way to address this is the `Shrink to Fit` range option; however, it can also be achieved with `Minimum Size in Custom Range`.

*Bonus: If you are unsure of the max feature id in your range, supplying a `-1` will determine the maximum feature id and use it as the upper bound in execution.*
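
To make the range options concrete, the following Python sketch resolves the inclusive Feature Id bounds that each range type would produce and builds the matching indexing array. It is an interpretation of the descriptions above, not the filter's code: the helper names are hypothetical, the range-type strings simply mirror the option names in this document, and the assumption that `None` keeps one tuple per Id from 0 to the maximum Feature Id is ours.

```python
def resolve_feature_range(feature_ids, range_type, custom_range=(0, -1)):
    """Inclusive (lower, upper) Feature Id bounds of the output (hypothetical helper)."""
    min_id, max_id = min(feature_ids), max(feature_ids)
    lower, upper = custom_range
    if upper == -1:
        upper = max_id  # "-1" resolves to the maximum Feature Id in the array
    if range_type == "None":
        return 0, max_id  # assumption: one tuple per Id from 0 to the maximum
    if range_type == "Shrink to Fit":
        return min_id, max_id  # drop the empty entries below the smallest Id
    if range_type == "Padded Custom Range":
        return lower, upper  # pad Ids outside [min_id, max_id]
    if range_type == "Minimum Size in Custom Range":
        return max(lower, min_id), min(upper, max_id)  # clamp to the data
    raise ValueError(f"unknown range type: {range_type}")


def feature_ids_indexing_array(lower, upper):
    """Feature Id stored in each output tuple; tuple k holds Id lower + k."""
    return list(range(lower, upper + 1))


ids = [3, 3, 5, 7]
print(resolve_feature_range(ids, "Shrink to Fit"))                          # (3, 7)
print(resolve_feature_range(ids, "Padded Custom Range", (1, 10)))           # (1, 10)
print(resolve_feature_range(ids, "Minimum Size in Custom Range", (1, 10)))  # (3, 7)
print(feature_ids_indexing_array(3, 7))                                     # [3, 4, 5, 6, 7]
```

With a custom lower bound of 1, Feature 0 falls outside the output range, which is how ranges can indirectly ignore the invalid Feature Id (option 3).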


### Input Data

| Parameter Name | Parameter Type | Parameter Notes | Description |
|----------------|----------------|-----------------|-------------|
| Attribute Array to Compute Statistics | Array Selection | Allowed Types: int8, uint8, int16, uint16, int32, uint32, int64, uint64, float32, float64, boolean Comp. Shape: 1 | Input Attribute Array for which to compute statistics |

### Output Data

| Parameter Name | Parameter Type | Parameter Notes | Description |
|----------------|----------------|-----------------|-------------|
| Destination Attribute Matrix | DataGroupCreation |  | Attribute Matrix in which to store the computed statistics |

### Optional Data Mask

| Parameter Name | Parameter Type | Parameter Notes | Description |
|----------------|----------------|-----------------|-------------|
| Use Mask Array | Bool |  | Specifies whether or not to use a mask array |
| Mask Array | Array Selection | Allowed Types: uint8, boolean Comp. Shape: 1 | DataPath to the boolean mask array. Values that are true will mark that cell/point as usable. |

### Algorithm Options

| Parameter Name | Parameter Type | Parameter Notes | Description |
|----------------|----------------|-----------------|-------------|
| Compute Statistics Per Feature/Ensemble | Bool |  | Whether the statistics should be computed on a Feature/Ensemble basis |
| Cell Feature Ids | Array Selection | Allowed Types: int32 Comp. Shape: 1 | Specifies to which feature each cell belongs. |
| Feature Range Type | Choices |  | Set a range to manipulate output size in various ways. See detailed breakdown and tips in the documentation... |
| Custom Feature ID Range | Vector of Int32 Values | Order=Lower Bound (inclusive), Upper Bound (inclusive) | The range of Feature Ids, inclusive; `-1` in the upper bound will be resolved to the maximum Feature Id in the array. Only applies to `Padded Custom Range` and `Minimum Size in Custom Range` |
| Feature ID Indexing Name | DataObjectName |  | The name of the indexing array for the output mapping in the reduced `Destination Attribute Matrix`, not applicable if `None` selected |

### Output Arrays

| Parameter Name | Parameter Type | Parameter Notes | Description |
|----------------|----------------|-----------------|-------------|
| Feature-Has-Data Array Name | DataObjectName |  | The name of the boolean array that indicates whether or not each feature contains any data. This array is especially useful for determining whether the output statistics are actually valid for a given feature. |
| Find Length | Bool |  | Whether to compute the length of the input array |
| Length Array Name | DataObjectName |  | The name of the length array |
| Find Minimum | Bool |  | Whether to compute the minimum of the input array |
| Minimum Array Name | DataObjectName |  | The name of the minimum array |
| Find Maximum | Bool |  | Whether to compute the maximum of the input array |
| Maximum Array Name | DataObjectName |  | The name of the maximum array |
| Find Mean | Bool |  | Whether to compute the arithmetic mean of the input array |
| Mean Array Name | DataObjectName |  | The name of the mean array |
| Find Median | Bool |  | Whether to compute the median of the input array |
| Median Array Name | DataObjectName |  | The name of the median array |
| Find Mode | Bool |  | Whether to compute the mode of the input array |
| Mode Array Name | DataObjectName |  | The name of the mode array |
| Find Standard Deviation | Bool |  | Whether to compute the standard deviation of the input array |
| Standard Deviation Array Name | DataObjectName |  | The name of the standard deviation array |
| Find Summation | Bool |  | Whether to compute the summation of the input array |
| Summation Array Name | DataObjectName |  | The name of the summation array |
| Standardize Data | Bool |  | Should a standardized data array be generated |
| Standardized Data Array Name | DataObjectName |  | The name of the standardized data array |
| Find Number of Unique Values | Bool |  | Whether to compute the number of unique values in the input array |
| Number of Unique Values Array Name | DataObjectName |  | The name of the array which stores the calculated number of unique values |

## Example Pipelines

## License & Copyright

Please see the description file distributed with this plugin.

## DREAM3D-NX Help

If you need help, need to file a bug report or want to request a new feature, please head over to the [DREAM3DNX-Issues](https://github.com/BlueQuartzSoftware/DREAM3DNX-Issues/discussions) GitHub site where the community of DREAM3D-NX users can help answer your questions.
