pyrolite.comp.impute

pyrolite.comp.impute.EMCOMP(X, threshold=None, tol=0.0001, convergence_metric=<function <lambda>>, max_iter=30)[source]

EMCOMP replaces rounded zeros in a compositional data set based on a set of thresholds. After Palarea-Albaladejo and Martín-Fernández (2008) [1].

Parameters
  • X (numpy.ndarray) – Dataset with rounded zeros

  • threshold (numpy.ndarray) – Array of threshold values for each component, expressed as proportions.

  • tol (float) – Tolerance to check for convergence.

  • convergence_metric (callable) – Callable function to check for convergence. Here we use a compositional distance rather than a maximum absolute difference, with very similar performance. The function needs to accept two numpy.ndarray arguments and a third tolerance argument.

  • max_iter (int) – Maximum number of iterations before an error is thrown.
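
As an illustration of the convergence_metric parameter, the following is a minimal sketch of a callable built on the maximum absolute difference mentioned above. The name max_abs_diff_converged is hypothetical, and the sketch assumes the callable should return a truthy value once the two successive estimates agree to within the tolerance; the source only states the argument list, not the return convention.

```python
import numpy as np


def max_abs_diff_converged(A, B, tol):
    """Hypothetical convergence check (not part of pyrolite).

    A, B : numpy.ndarray estimates from two successive iterations.
    tol  : tolerance, passed as the third argument.

    Assumption: a truthy return value means "converged".
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Largest element-wise change between the two estimates.
    return bool(np.max(np.abs(A - B)) < tol)
```

Under that assumption it could be supplied as EMCOMP(X, threshold=threshold, convergence_metric=max_abs_diff_converged).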

Returns

  • X_est (numpy.ndarray) – Dataset with rounded zeros replaced.

  • prop_zeros (float) – Proportion of zeros in the original data set.

  • n_iters (int) – Number of iterations needed for convergence.
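
The call pattern and the three return values can be sketched as follows. The data and threshold values are made up for illustration, and impute_rounded_zeros is a hypothetical wrapper that imports pyrolite lazily so the input preparation can be read on its own. Given the numerical-stability caveat in the notes, a data set this small may not converge cleanly; treat it as a shape-and-signature example rather than a demonstration of results.

```python
import numpy as np

# Toy composition: 4 samples, 3 components, rows sum to 1.
# Zeros stand in for values below the detection limit (rounded zeros).
X = np.array(
    [
        [0.60, 0.40, 0.00],
        [0.55, 0.44, 0.01],
        [0.70, 0.30, 0.00],
        [0.50, 0.48, 0.02],
    ]
)

# One threshold per component, as a proportion (illustrative values only).
threshold = np.array([0.005, 0.005, 0.005])


def impute_rounded_zeros(X, threshold):
    """Hypothetical wrapper around EMCOMP using the documented defaults."""
    from pyrolite.comp.impute import EMCOMP  # requires pyrolite to be installed

    X_est, prop_zeros, n_iters = EMCOMP(X, threshold=threshold, tol=1e-4, max_iter=30)
    return X_est, prop_zeros, n_iters
```

Calling impute_rounded_zeros(X, threshold) should then return the imputed array, the proportion of zeros in X, and the iteration count, in that order.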

Notes

  • At least one component without missing values is needed for the divisor. Rounded zeros/missing values are replaced by values below their respective detection limits.

  • This routine is not completely numerically stable as written.
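
Both notes lend themselves to simple checks around the call. The sketch below uses two hypothetical helpers (neither is part of pyrolite): one lists the components that could serve as the divisor because they contain no zeros or missing values, and one verifies that an imputed array contains no np.inf or NaN values.

```python
import numpy as np


def complete_components(X):
    """Indices of columns with strictly positive, finite values in every row."""
    X = np.asarray(X, dtype=float)
    usable = np.isfinite(X) & (X > 0)
    return np.flatnonzero(usable.all(axis=0))


def all_finite(X_est):
    """True if the array contains no np.inf or NaN values."""
    return bool(np.isfinite(np.asarray(X_est, dtype=float)).all())
```

An empty result from complete_components(X) indicates that no component is free of missing values, so the requirement in the first note is not met.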

Todo

  • Implement methods to deal with variable detection limits (i.e. thresholds given as an array of shape (N, D)).

  • Consider non-normal models for data distributions.

  • Improve numerical stability to reduce the chance of np.inf appearing.

References

[1] Palarea-Albaladejo J. and Martín-Fernández J. A. (2008) A modified EM ALR-algorithm for replacing rounded zeros in compositional data sets. Computers & Geosciences 34, 902–917. doi: 10.1016/j.cageo.2007.09.015