Log-transforms
pyrolite includes a few functions for dealing with compositional data, at the heart of
which are i) closure (i.e. rescaling so that everything sums to 100%) and ii) log-transforms,
which map the constrained compositional space onto unconstrained real space. The commonly
used log-transformations include the Additive Log-Ratio (ALR()), Centred Log-Ratio (CLR()),
and Isometric Log-Ratio (ILR()) [1, 2].
This example will show you how to access and use some of these functions in pyrolite.
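Before turning to the pyrolite functions themselves, it may help to see what closure and two of these transforms compute. The following is a minimal NumPy sketch, not pyrolite's implementation; the helper names (close_rows, alr_sketch, clr_sketch) and the input values are made up for illustration.

```python
import numpy as np


def close_rows(x):
    """Closure: rescale each row so that its components sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)


def alr_sketch(x, ind=-1):
    """Additive log-ratio: log of each remaining component over a divisor component."""
    x = close_rows(x)
    divisor = x[..., [ind]]              # keep the axis so it broadcasts across columns
    others = np.delete(x, ind, axis=-1)  # drop the divisor column itself
    return np.log(others / divisor)


def clr_sketch(x):
    """Centred log-ratio: log of each component over the row's geometric mean."""
    logx = np.log(close_rows(x))
    return logx - logx.mean(axis=-1, keepdims=True)


# illustrative composition (hypothetical values, summing to 100)
composition = np.array([[60.0, 10.0, 20.0, 10.0]])
alr_values = alr_sketch(composition)  # one fewer column than the input
clr_values = clr_sketch(composition)  # same number of columns; each row sums to zero
```

Note that the ALR output has one fewer column than the input, while the CLR output keeps every column but is constrained to sum to zero across each row.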
First let’s create some example data:
from pyrolite.util.synthetic import normal_frame, random_cov_matrix

df = normal_frame(
    size=100,
    cov=random_cov_matrix(sigmas=[0.1, 0.05, 0.3, 0.6], dim=4, seed=32),
    seed=32,
)
df.describe()
Let’s have a look at some of the log-transforms, which can be accessed directly from
your dataframes (via pyrolite.comp.pyrocomp) after you’ve imported pyrolite.comp.
Note that the transformations return new dataframes rather than modifying their inputs.
For example:
import pyrolite.comp
lr_df = df.pyrocomp.CLR() # using a centred log-ratio transformation
The transformations are implemented such that the column names generally make it evident which transformations have been applied (here using default simple labelling; see below for other examples):
lr_df.columns
Index(['CLR(SiO2/G)', 'CLR(CaO/G)', 'CLR(MgO/G)', 'CLR(FeO/G)', 'CLR(TiO2/G)'], dtype='object')
To invert these transformations, you can call the respective inverse transform:
back_transformed = lr_df.pyrocomp.inverse_CLR()  # invert the centred log-ratio transform
Given we haven’t done anything to our dataframe in the meantime, we should be back
where we started, and our values should all be equal within numerical precision.
To verify this, we can use numpy.allclose():
import numpy as np
np.allclose(back_transformed, df)
True
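The reason the round trip recovers the original values can be sketched in plain NumPy: exponentiating CLR coordinates gives each component divided by the row's geometric mean, and re-closing the result removes that common factor. This is an illustrative sketch under the assumption that the input rows are already closed (sum to 1); the function names and random data are hypothetical and are not pyrolite's API.

```python
import numpy as np


def clr_sketch(x):
    """Centred log-ratio of rows of strictly positive values."""
    logx = np.log(x)
    return logx - logx.mean(axis=-1, keepdims=True)


def inverse_clr_sketch(y):
    """Invert the CLR: exponentiate, then re-close each row to sum to 1."""
    expy = np.exp(y)
    return expy / expy.sum(axis=-1, keepdims=True)


# hypothetical closed compositions: 100 samples, 5 components
rng = np.random.default_rng(32)
raw = rng.uniform(0.1, 1.0, size=(100, 5))
compositions = raw / raw.sum(axis=1, keepdims=True)

round_trip = inverse_clr_sketch(clr_sketch(compositions))
recovered = np.allclose(round_trip, compositions)
```

If the input rows were not closed beforehand, the round trip would return the closed version of the data rather than the raw values, since the CLR discards the row totals.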
In addition to easy access to the transforms, there’s also a convenience function,
logratiomean(), for taking a log-transformed mean (log-transforming, taking a mean,
and then applying the inverse log-transform):
df.pyrocomp.logratiomean()
SiO2 0.241
CaO 0.395
MgO 0.094
FeO 0.104
TiO2 0.166
dtype: float64
While this function defaults to using clr(), you can specify other log-transforms to use:
df.pyrocomp.logratiomean(transform="CLR")
SiO2 0.241
CaO 0.395
MgO 0.094
FeO 0.104
TiO2 0.166
dtype: float64
Notably, however, the logratio means should all give you the same result:
np.allclose(
    df.pyrocomp.logratiomean(transform="CLR"),
    df.pyrocomp.logratiomean(transform="ALR"),
) and np.allclose(
    df.pyrocomp.logratiomean(transform="CLR"),
    df.pyrocomp.logratiomean(transform="ILR"),
)
True
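This agreement follows from the fact that each log-ratio mean reduces to the closed geometric mean of the components, whichever transform is used along the way. The sketch below checks this for CLR and ALR with plain NumPy; it is an illustration under stated assumptions (closed, strictly positive rows; hypothetical helper names and random data), not pyrolite's implementation.

```python
import numpy as np


def close_vector(v):
    """Rescale a vector so that it sums to 1."""
    return v / v.sum()


def mean_via_clr(x):
    """CLR-transform each row, average the coordinates, then invert."""
    logx = np.log(x)
    clr = logx - logx.mean(axis=1, keepdims=True)
    return close_vector(np.exp(clr.mean(axis=0)))


def mean_via_alr(x):
    """ALR-transform with the last column as divisor, average, then invert."""
    alr = np.log(x[:, :-1] / x[:, [-1]])
    # the divisor has a log-ratio of zero with itself, i.e. a ratio of 1
    return close_vector(np.append(np.exp(alr.mean(axis=0)), 1.0))


# hypothetical closed compositions: 100 samples, 5 components
rng = np.random.default_rng(0)
raw = rng.uniform(0.1, 1.0, size=(100, 5))
compositions = raw / raw.sum(axis=1, keepdims=True)

# closed geometric mean of each component, computed directly
geometric = close_vector(np.exp(np.log(compositions).mean(axis=0)))

same_mean = np.allclose(mean_via_clr(compositions), mean_via_alr(compositions))
matches_geometric = np.allclose(mean_via_clr(compositions), geometric)
```

Both routes differ only by a constant factor before the final closure, which is why closure makes them identical.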
To change the default labelling outputs for column names, you can use the label_mode parameter, for example to get nice labels for plotting:
import matplotlib.pyplot as plt
import pyrolite.plot  # registers the .pyroplot dataframe accessor

df.pyrocomp.ILR(label_mode="latex").iloc[:, 0:2].pyroplot.scatter()
plt.show()
Alternatively, if you simply want numeric indexes which you can use in e.g. a machine
learning pipeline, you can use label_mode="numeric":
df.pyrocomp.ILR(label_mode="numeric").columns
Index(['ILR0', 'ILR1', 'ILR2', 'ILR3'], dtype='object')
[1] Aitchison, J., 1984. The statistical analysis of geochemical compositions. Journal of the International Association for Mathematical Geology 16, 531–564. doi: 10.1007/BF01029316
[2] Egozcue, J.J., Pawlowsky-Glahn, V., Mateu-Figueras, G., Barceló-Vidal, C., 2003. Isometric Logratio Transformations for Compositional Data Analysis. Mathematical Geology 35, 279–300. doi: 10.1023/A:1023818214614