pyrolite.comp.impute
- pyrolite.comp.impute.EMCOMP(X, threshold=None, tol=0.0001, convergence_metric=<function <lambda>>, max_iter=30)
EMCOMP replaces rounded zeros in a compositional data set based on a set of thresholds. After Palarea-Albaladejo and Martín-Fernández (2008) [1].
- Parameters
X (numpy.ndarray) – Dataset with rounded zeros.
threshold (numpy.ndarray) – Array of threshold values for each component, as a proportion.
tol (float) – Tolerance to check for convergence.
convergence_metric (callable) – Callable function to check for convergence. Here we use a compositional distance rather than a maximum absolute difference, with very similar performance. The function needs to accept two numpy.ndarray arguments and a third tolerance argument.
max_iter (int) – Maximum number of iterations before an error is thrown.
- Returns
X_est (numpy.ndarray) – Dataset with rounded zeros replaced.
prop_zeros (float) – Proportion of zeros in the original data set.
n_iters (int) – Number of iterations needed for convergence.
Notes
At least one component without missing values is needed for the divisor. Rounded zeros/missing values are replaced by values below their respective detection limits.
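The two conditions in this note can be checked outside the routine. The following is a minimal sketch using only NumPy, not part of pyrolite: the helper names `complete_components` and `below_thresholds` are hypothetical, the data are made up for illustration, and zeros or NaNs are assumed to mark the values to be replaced.

```python
import numpy as np


def complete_components(X):
    """Return indices of columns with no zeros and no NaNs (candidate divisors)."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X) | (X <= 0)
    return np.flatnonzero(~missing.any(axis=0))


def below_thresholds(X, X_est, threshold):
    """Check that each replaced value is positive and below its component threshold."""
    X = np.asarray(X, dtype=float)
    X_est = np.asarray(X_est, dtype=float)
    missing = np.isnan(X) | (X <= 0)
    # Broadcast the per-component thresholds (shape (D,)) to the data shape (N, D).
    limits = np.broadcast_to(np.asarray(threshold, dtype=float), X.shape)
    replaced = X_est[missing]
    return bool(np.all((replaced > 0) & (replaced < limits[missing])))


# Hypothetical 3-component data as proportions; zeros mark rounded zeros.
X = np.array([
    [0.60, 0.40, 0.00],
    [0.55, 0.44, 0.01],
    [0.70, 0.00, 0.30],
])
threshold = np.array([0.005, 0.005, 0.005])  # assumed: one threshold per component

divisors = complete_components(X)
if divisors.size == 0:
    raise ValueError("No component is free of zeros/missing values.")
```

`below_thresholds` could then be applied to the returned `X_est` to confirm that imputed values sit under their detection limits.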
This routine is not completely numerically stable as written.
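Given this caveat, a user-supplied `convergence_metric` might reasonably treat non-finite values as "not converged". The sketch below shows two callables with the documented shape (two numpy.ndarray arguments plus a tolerance). The names `max_abs_metric` and `norm_metric` are hypothetical, neither is the library's default, and what the two arrays represent inside the routine is not assumed here.

```python
import numpy as np


def _finite_difference(A, B):
    """Return A - B, or None if the difference contains inf or NaN."""
    with np.errstate(invalid="ignore"):  # inf - inf would otherwise warn
        diff = np.asarray(A, dtype=float) - np.asarray(B, dtype=float)
    return diff if np.all(np.isfinite(diff)) else None


def max_abs_metric(A, B, tol):
    """Converged when the largest absolute element-wise change is below tol."""
    diff = _finite_difference(A, B)
    return diff is not None and bool(np.max(np.abs(diff)) < tol)


def norm_metric(A, B, tol):
    """Converged when the Euclidean (Frobenius) norm of the change is below tol."""
    diff = _finite_difference(A, B)
    return diff is not None and bool(np.linalg.norm(diff) < tol)


# Hypothetical successive estimates from two iterations.
previous = np.ones((2, 3))
current = previous + 1e-6
converged = norm_metric(previous, current, 1e-4)
```

Based on the signature above, such a callable would be passed as `convergence_metric=norm_metric`. The norm-based variant is stricter than the maximum absolute difference for the same tolerance, because it accumulates the change over all elements.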
Todo
Implement methods to deal with variable detection limits (i.e. thresholds are an array of shape (N, D)).
Consider non-normal models for data distributions.
Improve numerical stability to reduce the chance of np.inf appearing.
References
- [1] Palarea-Albaladejo J. and Martín-Fernández J. A. (2008) A modified EM ALR-algorithm for replacing rounded zeros in compositional data sets. Computers & Geosciences 34, 902–917. doi: 10.1016/j.cageo.2007.09.015