Log-transforms

pyrolite includes a few functions for dealing with compositional data, at the heart of which are i) closure (i.e. the components of each sample sum to a constant, such as 100%) and ii) log-transforms to deal with the compositional space. The commonly used log-transformations include the Additive Log-Ratio (ALR()), Centred Log-Ratio (CLR()), and Isometric Log-Ratio (ILR()) [1, 2].
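For reference, closure and the three transforms can be written as below for a composition x with D strictly positive parts. This is a sketch in notation common in the compositional data literature; the symbols (C, g, V) are chosen here for illustration, and the divisor used for the ALR and the basis V used for the ILR are conventions that may differ from pyrolite’s defaults.

```latex
\begin{aligned}
\mathcal{C}(\mathbf{x}) &= \frac{\mathbf{x}}{\sum_{i=1}^{D} x_i}, \\
g(\mathbf{x}) &= \Bigl(\prod_{i=1}^{D} x_i\Bigr)^{1/D}, \\
\operatorname{alr}(\mathbf{x}) &= \Bigl[\ln\frac{x_1}{x_D}, \dots, \ln\frac{x_{D-1}}{x_D}\Bigr], \\
\operatorname{clr}(\mathbf{x}) &= \Bigl[\ln\frac{x_1}{g(\mathbf{x})}, \dots, \ln\frac{x_D}{g(\mathbf{x})}\Bigr], \\
\operatorname{ilr}(\mathbf{x}) &= \mathbf{V}^{\mathsf{T}} \operatorname{clr}(\mathbf{x}), \quad
\mathbf{V} \in \mathbb{R}^{D \times (D-1)}, \;
\mathbf{V}^{\mathsf{T}}\mathbf{V} = \mathbf{I}_{D-1}, \;
\mathbf{V}^{\mathsf{T}}\mathbf{1} = \mathbf{0}.
\end{aligned}
```

The ALR and ILR each return D-1 coordinates, while the CLR returns D coordinates that sum to zero.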

This example will show you how to access and use some of these functions in pyrolite.

First let’s create some example data:

from pyrolite.util.synthetic import normal_frame, random_cov_matrix

df = normal_frame(
    size=100,
    cov=random_cov_matrix(sigmas=[0.1, 0.05, 0.3, 0.6], dim=4, seed=32),
    seed=32,
)
df.describe()
          SiO2      CaO      MgO      FeO     TiO2
count  100.000  100.000  100.000  100.000  100.000
mean     0.240    0.389    0.093    0.105    0.172
std      0.037    0.017    0.008    0.028    0.057
min      0.122    0.338    0.072    0.046    0.074
25%      0.216    0.379    0.088    0.084    0.131
50%      0.239    0.390    0.093    0.103    0.168
75%      0.263    0.401    0.099    0.121    0.207
max      0.343    0.430    0.111    0.180    0.408


Let’s have a look at some of the log-transforms, which can be accessed directly from your dataframes (via pyrolite.comp.pyrocomp), after you’ve imported pyrolite.comp. Note that the transformations will return new dataframes, rather than modify their inputs. For example:

import pyrolite.comp

lr_df = df.pyrocomp.CLR()  # using a centred log-ratio transformation

The transformations are implemented such that the column names generally make it evident which transformations have been applied (here using default simple labelling; see below for other examples):

lr_df.columns
Index(['CLR(SiO2/G)', 'CLR(CaO/G)', 'CLR(MgO/G)', 'CLR(FeO/G)', 'CLR(TiO2/G)'], dtype='object')
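The G in these labels denotes the geometric mean of each sample. As a rough illustration of what a centred log-ratio computes for a single sample, here is a minimal sketch in plain Python. The function name clr and the sample values are hypothetical and chosen for illustration; this is not pyrolite’s implementation.

```python
import math


def clr(composition):
    """Centred log-ratio: log of each part relative to the geometric mean G."""
    logs = [math.log(value) for value in composition]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean G
    return [log_value - mean_log for log_value in logs]


# Hypothetical five-part composition (fractions summing to 1).
sample = [0.24, 0.39, 0.09, 0.10, 0.18]
clr_sample = clr(sample)

# The same composition expressed in percent gives the same coordinates,
# because the scale factor cancels against the geometric mean.
clr_percent = clr([100 * value for value in sample])

print([round(value, 3) for value in clr_sample])
```

CLR coordinates always sum to zero, and rescaling the input leaves them unchanged, which is why it does not matter whether the data are closed to 1 or to 100%.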

To invert these transformations, you can call the respective inverse transform:

back_transformed = lr_df.pyrocomp.inverse_CLR()

Given we haven’t done anything to our dataframe in the meantime, we should be back where we started, and our values should all be equal within numerical precision. To verify this, we can use numpy.allclose():

import numpy as np

np.allclose(back_transformed, df)
True
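To see why the round trip works, the sketch below inverts a centred log-ratio by exponentiating and then re-closing. The helper names clr and inverse_clr and the sample values are hypothetical, written in plain Python for illustration rather than taken from pyrolite.

```python
import math


def clr(composition):
    """Centred log-ratio of a single composition."""
    logs = [math.log(value) for value in composition]
    mean_log = sum(logs) / len(logs)
    return [log_value - mean_log for log_value in logs]


def inverse_clr(coordinates):
    """Exponentiate the coordinates, then close so the parts sum to 1."""
    exponentiated = [math.exp(value) for value in coordinates]
    total = sum(exponentiated)
    return [value / total for value in exponentiated]


# Hypothetical composition already closed to 1.
sample = [0.24, 0.39, 0.09, 0.10, 0.18]
recovered = inverse_clr(clr(sample))

round_trip_ok = all(math.isclose(a, b) for a, b in zip(sample, recovered))
print(round_trip_ok)
```

In this sketch the inverse always returns a composition closed to 1, so an input expressed in percent would come back as fractions; the relative proportions are preserved either way.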

In addition to easy access to the transforms, there’s also a convenience function for taking a log-ratio mean (log-transforming, taking the mean, and inverse log-transforming; logratiomean()):

df.pyrocomp.logratiomean()
SiO2    0.241
CaO     0.395
MgO     0.094
FeO     0.104
TiO2    0.166
dtype: float64

While this function defaults to using clr(), you can specify which log-transform to use via the transform parameter:

df.pyrocomp.logratiomean(transform="CLR")
SiO2    0.241
CaO     0.395
MgO     0.094
FeO     0.104
TiO2    0.166
dtype: float64

Notably, however, the logratio means should all give you the same result:

np.allclose(
    df.pyrocomp.logratiomean(transform="CLR"),
    df.pyrocomp.logratiomean(transform="ALR"),
) and np.allclose(
    df.pyrocomp.logratiomean(transform="CLR"),
    df.pyrocomp.logratiomean(transform="ILR"),
)
True
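The reason the results agree is that each route reduces to the closed geometric mean of the samples. The sketch below checks this for the CLR and ALR routes in plain Python. The helper names and the three-part samples are hypothetical, and the ALR here assumes the last part as the divisor.

```python
import math


def close(parts):
    """Rescale parts so that they sum to 1."""
    total = sum(parts)
    return [value / total for value in parts]


def mean_via_clr(samples):
    """CLR-transform each sample, average per column, then invert."""
    clr_rows = []
    for row in samples:
        logs = [math.log(value) for value in row]
        mean_log = sum(logs) / len(logs)
        clr_rows.append([log_value - mean_log for log_value in logs])
    column_means = [sum(column) / len(samples) for column in zip(*clr_rows)]
    return close([math.exp(value) for value in column_means])


def mean_via_alr(samples):
    """ALR-transform (last part as divisor), average per column, then invert."""
    alr_rows = [[math.log(value / row[-1]) for value in row[:-1]] for row in samples]
    column_means = [sum(column) / len(samples) for column in zip(*alr_rows)]
    # Inverse ALR: the divisor contributes exp(0) = 1 before closing.
    return close([math.exp(value) for value in column_means] + [1.0])


def closed_geometric_mean(samples):
    """Geometric mean of each part across samples, closed to 1."""
    log_means = [
        sum(math.log(value) for value in column) / len(samples)
        for column in zip(*samples)
    ]
    return close([math.exp(value) for value in log_means])


# Hypothetical three-part compositions.
samples = [
    [0.20, 0.50, 0.30],
    [0.10, 0.60, 0.30],
    [0.25, 0.25, 0.50],
]
clr_mean = mean_via_clr(samples)
alr_mean = mean_via_alr(samples)
geo_mean = closed_geometric_mean(samples)
print([round(value, 3) for value in clr_mean])
```

The constant subtracted by each transform differs, but it is the same for every part of the averaged vector, so it cancels when the result is closed.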

To change the default labelling outputs for column names, you can use the label_mode parameter, for example to get nice labels for plotting:

import matplotlib.pyplot as plt
import pyrolite.plot  # registers the .pyroplot accessor used below

df.pyrocomp.ILR(label_mode="latex").iloc[:, 0:2].pyroplot.scatter()
plt.show()

Alternatively, if you simply want numeric indexes (for use in, e.g., a machine learning pipeline), you can use label_mode="numeric":

df.pyrocomp.ILR(label_mode="numeric").columns
Index(['ILR0', 'ILR1', 'ILR2', 'ILR3'], dtype='object')
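Note that the five oxides yield only four ILR columns: an isometric log-ratio maps a D-part composition to D-1 coordinates. The sketch below illustrates this with one possible orthonormal (Helmert-type) basis in plain Python. The function names and sample values are hypothetical, and the basis pyrolite uses may differ in ordering or sign.

```python
import math


def clr(composition):
    """Centred log-ratio of a single composition."""
    logs = [math.log(value) for value in composition]
    mean_log = sum(logs) / len(logs)
    return [log_value - mean_log for log_value in logs]


def ilr(composition):
    """ILR coordinates using one possible orthonormal (Helmert-type) basis."""
    logs = [math.log(value) for value in composition]
    coordinates = []
    for i in range(1, len(logs)):
        # Log geometric mean of the first i parts, contrasted with part i + 1.
        mean_log_leading = sum(logs[:i]) / i
        scale = math.sqrt(i / (i + 1))
        coordinates.append(scale * (mean_log_leading - logs[i]))
    return coordinates


# Hypothetical five-part composition.
sample = [0.24, 0.39, 0.09, 0.10, 0.18]
ilr_sample = ilr(sample)
print(len(sample), "parts ->", len(ilr_sample), "coordinates")

# "Isometric": the squared length of the ILR vector matches that of the CLR vector.
squared_norm_ilr = sum(value**2 for value in ilr_sample)
squared_norm_clr = sum(value**2 for value in clr(sample))
print(math.isclose(squared_norm_ilr, squared_norm_clr))
```

Because the ILR coordinates are unconstrained (unlike CLR coordinates, which sum to zero), they are often a convenient input for methods that expect a full-rank feature matrix.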

[1] Aitchison, J., 1984. The statistical analysis of geochemical compositions. Journal of the International Association for Mathematical Geology 16, 531–564. doi: 10.1007/BF01029316

[2] Egozcue, J.J., Pawlowsky-Glahn, V., Mateu-Figueras, G., Barceló-Vidal, C., 2003. Isometric Logratio Transformations for Compositional Data Analysis. Mathematical Geology 35, 279–300. doi: 10.1023/A:1023818214614
