pyrolite.util.synthetic

Utility functions for creating synthetic (geochemical) data.

pyrolite.util.synthetic.random_cov_matrix(dim, sigmas=None, validate=False, seed=None)[source]

Generate a random covariance matrix which is symmetric positive-semidefinite.

Parameters

dim (int) – Dimensionality of the covariance matrix.

sigmas (numpy.ndarray) – Optionally specified sigmas for the variables.

validate (bool) – Whether to validate output.

seed (int, None) – Random seed to use, optionally specified.

Returns

Covariance matrix of shape (dim, dim).

Return type

numpy.ndarray

Todo

Implement a characteristic scale for the covariance matrix.
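
As a rough illustration of what "symmetric positive-semidefinite" means here, the sketch below builds a random covariance matrix with NumPy alone. It is not pyrolite's implementation: the function name sketch_random_cov and the way sigmas are applied (rescaling a correlation matrix) are assumptions made for the example.

```python
import numpy as np


def sketch_random_cov(dim, sigmas=None, seed=None):
    """Illustrative random covariance matrix (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # A @ A.T is symmetric and positive semi-definite for any real matrix A.
    a = rng.normal(size=(dim, dim))
    cov = a @ a.T
    if sigmas is not None:
        sigmas = np.asarray(sigmas, dtype=float)
        # Assumption: reduce to a correlation matrix, then impose the sigmas.
        d = np.sqrt(np.diag(cov))
        corr = cov / np.outer(d, d)
        cov = corr * np.outer(sigmas, sigmas)
    return cov


cov = sketch_random_cov(4, sigmas=[0.1, 0.2, 0.3, 0.4], seed=0)
# A simple validation step: symmetric, and no meaningfully negative eigenvalues.
assert np.allclose(cov, cov.T)
assert np.linalg.eigvalsh(cov).min() > -1e-12
```

With sigmas given, the diagonal of the result equals sigmas squared, so each variable has the requested standard deviation.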

pyrolite.util.synthetic.random_composition(size=1000, D=4, mean=None, cov=None, propnan=0.1, missing_columns=None, missing=None, seed=None)[source]

Generate a simulated random unimodal compositional dataset, optionally with missing data.

Parameters

size (int) – Size of the dataset.

D (int) – Dimensionality of the dataset.

mean (numpy.ndarray, None) – Optional specification of mean composition.

cov (numpy.ndarray, None) – Optional specification of covariance matrix (in log space).

propnan (float, [0, 1)) – Proportion of missing values in the output dataset.

missing_columns (int | tuple) – Specification of columns to be missing. If an integer, it is interpreted as the number of columns containing missing data (at a proportion defined by propnan). If a tuple or list, it gives the specific columns to contain missing data.

missing (str, None) – Missingness pattern. If not None, one of "MCAR", "MAR", "MNAR".

If missing = "MCAR", data will be missing completely at random.

If missing = "MAR", data will be missing with some relationship to other parameters.

If missing = "MNAR", data will be thresholded at some lower bound.

seed (int, None) – Random seed to use, optionally specified.

Returns

Simulated dataset, with missing values where specified.

Return type

numpy.ndarray

Todo

Add a feature to translate a rough covariance in D to a log-covariance in D-1.

Update the missing = "MAR" example to be more realistic/variable.
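
The three missingness patterns can be easier to compare side by side. The following is a minimal NumPy sketch, not pyrolite's implementation: sketch_composition is a hypothetical helper, the log-ratio mean and covariance are arbitrary, and the MAR and MNAR rules (a quantile cutoff on another column, and a per-column lower quantile) are simplifying assumptions.

```python
import numpy as np


def sketch_composition(size=1000, D=4, propnan=0.1, missing=None, seed=None):
    """Illustrative unimodal compositional dataset (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Draw in (D-1)-dimensional log-ratio space; mean and cov are arbitrary.
    logratios = rng.multivariate_normal(
        np.zeros(D - 1), 0.1 * np.eye(D - 1), size=size
    )
    # Inverse additive log-ratio: append a zero column, exponentiate, close to 1.
    expd = np.exp(np.hstack([logratios, np.zeros((size, 1))]))
    data = expd / expd.sum(axis=1, keepdims=True)

    if missing == "MCAR":
        # Every value has the same chance of being missing.
        data[rng.random(data.shape) < propnan] = np.nan
    elif missing == "MAR":
        # Column 0 goes missing where another observed column (1) is high.
        cutoff = np.quantile(data[:, 1], 1 - propnan)
        data[data[:, 1] > cutoff, 0] = np.nan
    elif missing == "MNAR":
        # Values below a per-column lower bound are censored.
        threshold = np.quantile(data, propnan, axis=0)
        data[data < threshold] = np.nan
    return data


complete = sketch_composition(size=500, D=4, seed=0)
assert np.allclose(complete.sum(axis=1), 1.0)  # each row is a closed composition
```

In this sketch MCAR removes values everywhere with equal probability, MAR removes values in one column based on a different column, and MNAR removes values based on their own magnitude.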

pyrolite.util.synthetic.normal_frame(columns=['SiO2', 'CaO', 'MgO', 'FeO', 'TiO2'], size=10, mean=None, **kwargs)[source]

Creates a pandas.DataFrame with samples from a single multivariate-normal distributed composition.

Parameters

columns (list) – List of columns to use for the dataframe. These won’t have any direct impact on the data returned, and are only for labelling.

size (int) – Index length for the dataframe.

mean (numpy.ndarray, None) – Optional specification of mean composition.

Return type

pandas.DataFrame
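
To illustrate the point that column names only label the data, here is a small sketch using NumPy and pandas directly. It does not call pyrolite, and the distribution parameters (zero mean, identity covariance in log space) are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

columns = ["SiO2", "CaO", "MgO", "FeO", "TiO2"]
size = 10

rng = np.random.default_rng(0)
# Assumed parameters: standard multivariate normal in log space.
logs = rng.multivariate_normal(np.zeros(len(columns)), np.eye(len(columns)), size=size)
values = np.exp(logs)
values /= values.sum(axis=1, keepdims=True)  # close each row to 1

df = pd.DataFrame(values, columns=columns)

# Relabelling the columns leaves the underlying numbers unchanged.
relabelled = pd.DataFrame(values, columns=["A", "B", "C", "D", "E"])
assert np.array_equal(df.to_numpy(), relabelled.to_numpy())
```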

pyrolite.util.synthetic.normal_series(index=['SiO2', 'CaO', 'MgO', 'FeO', 'TiO2'], mean=None, **kwargs)[source]

Creates a pandas.Series with a single sample from a single multivariate-normal distributed composition.

Parameters

index (list) – List of indexes for the series. These won’t have any direct impact on the data returned, and are only for labelling.

mean (numpy.ndarray, None) – Optional specification of mean composition.

Return type

pandas.Series
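
The single-sample case can be sketched the same way. This example uses NumPy and pandas only, with assumed distribution parameters, and is not pyrolite's implementation.

```python
import numpy as np
import pandas as pd

index = ["SiO2", "CaO", "MgO", "FeO", "TiO2"]

rng = np.random.default_rng(0)
# One draw from an assumed standard multivariate normal in log space.
sample = np.exp(rng.multivariate_normal(np.zeros(len(index)), np.eye(len(index))))
ser = pd.Series(sample / sample.sum(), index=index)  # closed to sum to 1
```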

pyrolite.util.synthetic.example_spider_data(start='EMORB_SM89', norm_to='PM_PON', size=120, noise_level=0.5, offsets=None, units='ppm')[source]

Generate some random data for demonstrating spider plots.

By default, this generates a composition based around EMORB, normalised to Primitive Mantle.

Parameters

start (str) – Composition to start with.

norm_to (str) – Composition to normalise to. Can optionally specify None.

size (int) – Number of observations to include (index length).

noise_level (float) – Noise level in log-units (1σ).

offsets (dict) – Dictionary of offsets in log-units.

units (str) – Units to use before conversion. Should have no effect other than reducing calculation times if norm_to is None.

Returns

df – Dataframe of example synthetic data.

Return type

pandas.DataFrame
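
The roles of noise_level and offsets can be sketched with NumPy alone. This is an assumption-laden illustration rather than pyrolite's implementation: the element list and starting abundances are invented placeholders (not EMORB or Primitive Mantle values), the noise is taken to be independent per element, and sketch_spider_data is a hypothetical helper.

```python
import numpy as np

elements = ["La", "Ce", "Nd", "Sm", "Yb"]
# Placeholder normalised abundances, invented for illustration only.
start = np.array([10.0, 8.0, 6.0, 4.0, 2.0])


def sketch_spider_data(size=120, noise_level=0.5, offsets=None, seed=None):
    """Illustrative noisy patterns around a starting composition."""
    rng = np.random.default_rng(seed)
    offsets = offsets or {}
    # Offsets are additive in log space, i.e. multiplicative in abundance.
    shift = np.array([offsets.get(el, 0.0) for el in elements])
    # Assumption: independent Gaussian noise (1 sigma = noise_level) per element.
    noise = rng.normal(scale=noise_level, size=(size, len(elements)))
    return np.exp(np.log(start) + shift + noise)


data = sketch_spider_data(size=120, noise_level=0.5, offsets={"Ce": 1.0}, seed=0)
assert data.shape == (120, len(elements))
```

Because the noise and offsets are applied in log space, all generated abundances stay positive, and an offset of 1.0 scales that element by a factor of e.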

pyrolite.util.synthetic.example_patterns_from_parameters(fit_parameters, radii=None, n=100, proportional_noise=0.15, includes_tetrads=False, columns=None)[source]