pyrolite.util.synthetic

Utility functions for creating synthetic (geochemical) data.

pyrolite.util.synthetic.random_cov_matrix(dim, sigmas=None, validate=False, seed=None)

Generate a random covariance matrix which is symmetric positive-semidefinite.

Parameters
  • dim (int) – Dimensionality of the covariance matrix.

  • sigmas (numpy.ndarray) – Optional standard deviations (sigmas) for each variable.

  • validate (bool) – Whether to validate the output.

  • seed (int, None) – Random seed to use, optionally specified.

Returns

Covariance matrix of shape (dim, dim).

Return type

numpy.ndarray

Todo

  • Implement a characteristic scale for the covariance matrix.
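A common way to obtain a random symmetric positive-semidefinite matrix is to form A·Aᵀ from a random real matrix A, then rescale it to requested standard deviations via its correlation matrix. The sketch below illustrates that idea; it is not pyrolite's implementation, and the helper names `random_cov_sketch` and `is_psd` are hypothetical. The eigenvalue check in `is_psd` is one plausible reading of what "validate" could mean.

```python
import numpy as np


def random_cov_sketch(dim, sigmas=None, seed=None):
    """Hypothetical sketch of a random symmetric positive-semidefinite
    covariance matrix; not pyrolite's own implementation."""
    rng = np.random.default_rng(seed)
    # For any real matrix A, A @ A.T is symmetric and positive semidefinite
    A = rng.normal(size=(dim, dim))
    cov = A @ A.T
    if sigmas is not None:
        # Convert to a correlation matrix, then rescale to the requested sigmas
        d = np.sqrt(np.diag(cov))
        corr = cov / np.outer(d, d)
        sigmas = np.asarray(sigmas, dtype=float)
        cov = corr * np.outer(sigmas, sigmas)
    return cov


def is_psd(matrix, tol=1e-10):
    """Check symmetry and that no eigenvalue is meaningfully negative."""
    symmetric = np.allclose(matrix, matrix.T)
    return bool(symmetric and np.all(np.linalg.eigvalsh(matrix) >= -tol))


cov = random_cov_sketch(3, sigmas=[1.0, 0.5, 2.0], seed=42)
print(cov.shape)  # (3, 3)
print(is_psd(cov))  # True
```

Rescaling through the correlation matrix keeps the matrix positive semidefinite (it is a congruence transform) while forcing the diagonal to equal sigmas².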

pyrolite.util.synthetic.random_composition(size=1000, D=4, mean=None, cov=None, propnan=0.1, missing_columns=None, missing=None, seed=None)

Generate a simulated random unimodal compositional dataset, optionally with missing data.

Parameters
  • size (int) – Number of samples (rows) in the dataset.

  • D (int) – Number of components (columns) in the dataset.

  • mean (numpy.ndarray, None) – Optional specification of mean composition.

  • cov (numpy.ndarray, None) – Optional specification of covariance matrix (in log space).

  • propnan (float, [0, 1)) – Proportion of missing values in the output dataset.

  • missing_columns (int | tuple) – Specification of the columns to contain missing data. If an integer, the number of columns containing missing data (at a proportion defined by propnan). If a tuple or list, the specific columns to contain missing data.

  • missing (str, None) – Missingness pattern. If not None, one of "MCAR", "MAR", "MNAR".

    • If missing = "MCAR", data will be missing completely at random.

    • If missing = "MAR", data will be missing with some relationship to other variables.

    • If missing = "MNAR", data will be thresholded at some lower bound.

  • seed (int, None) – Random seed to use, optionally specified.

Returns

Simulated dataset with missing values.

Return type

numpy.ndarray

Todo

  • Add a feature to translate a rough covariance in D dimensions to a log-covariance in D-1 dimensions.

  • Update the missing = "MAR" example to be more realistic/variable.
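One way to simulate a unimodal compositional dataset is to sample a multivariate normal in additive log-ratio (ALR) space and map back to the simplex, then mask values according to a missingness pattern. The sketch below does this under stated assumptions: the last component is the ALR divisor, so `cov` is taken to be (D-1, D-1); "MAR" hides the other columns where the first column is high; "MNAR" censors values below a per-column quantile. The function name `random_composition_sketch` is hypothetical, `missing_columns` is omitted, and this is not pyrolite's own code.

```python
import numpy as np


def random_composition_sketch(size=1000, D=4, mean=None, cov=None,
                              propnan=0.1, missing=None, seed=None):
    """Hypothetical sketch: logistic-normal compositions with optional missingness.

    Samples are drawn in ALR space using the last component as the divisor,
    so ``cov`` is assumed to have shape (D - 1, D - 1).
    """
    rng = np.random.default_rng(seed)
    if mean is None:
        mean_alr = np.zeros(D - 1)  # centre on the equal-proportion composition
    else:
        mean = np.asarray(mean, dtype=float)
        mean = mean / mean.sum()
        mean_alr = np.log(mean[:-1] / mean[-1])
    if cov is None:
        cov = 0.1 * np.eye(D - 1)
    Y = rng.multivariate_normal(mean_alr, cov, size=size)
    # Inverse ALR: exponentiate, append the divisor (exp(0) = 1), close rows to 1
    X = np.hstack([np.exp(Y), np.ones((size, 1))])
    X /= X.sum(axis=1, keepdims=True)

    if missing is None or propnan == 0:
        return X
    mask = np.zeros(X.shape, dtype=bool)
    if missing == "MCAR":
        # Each value is dropped with the same probability, independent of the data
        mask = rng.random(X.shape) < propnan
    elif missing == "MAR":
        # Other columns go missing where the (observed) first column is high
        high = X[:, 0] > np.quantile(X[:, 0], 1 - propnan)
        mask[:, 1:] = high[:, None]
    elif missing == "MNAR":
        # Values below a per-column lower bound are censored
        mask = X < np.quantile(X, propnan, axis=0)
    else:
        raise ValueError(f"Unknown missingness pattern: {missing!r}")
    X[mask] = np.nan
    return X


X = random_composition_sketch(size=500, D=4, propnan=0.1, missing="MNAR", seed=0)
```

Working in ALR space means any real-valued normal sample maps to a valid composition (positive parts summing to 1), which is why the covariance is specified in log space.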

pyrolite.util.synthetic.normal_frame(columns=['SiO2', 'CaO', 'MgO', 'FeO', 'TiO2'], size=10, mean=None, **kwargs)

Creates a pandas.DataFrame with samples from a single multivariate-normally distributed composition.

Parameters
  • columns (list) – List of columns to use for the dataframe. These won’t have any direct impact on the data returned, and are only for labelling.

  • size (int) – Number of rows (index length) of the dataframe.

  • mean (numpy.ndarray, None) – Optional specification of mean composition.

Return type

pandas.DataFrame
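The sketch below illustrates the behaviour described above: rows drawn around one composition with multivariate-normal noise in log-ratio space, wrapped in a DataFrame whose column names serve only as labels. It assumes ALR coordinates relative to the last column and an isotropic spread set by a hypothetical `log_sigma` parameter; `normal_frame_sketch` is a hypothetical name, not pyrolite's implementation.

```python
import numpy as np
import pandas as pd


def normal_frame_sketch(columns=("SiO2", "CaO", "MgO", "FeO", "TiO2"), size=10,
                        mean=None, log_sigma=0.1, seed=None):
    """Hypothetical sketch: rows of a closed composition with multivariate-normal
    noise in log-ratio space. Column names only label the output."""
    rng = np.random.default_rng(seed)
    D = len(columns)
    mean = np.full(D, 1.0 / D) if mean is None else np.asarray(mean, dtype=float)
    mean = mean / mean.sum()
    # ALR coordinates of the mean, relative to the last component
    mean_alr = np.log(mean[:-1] / mean[-1])
    cov = log_sigma**2 * np.eye(D - 1)  # assumed isotropic spread
    Y = rng.multivariate_normal(mean_alr, cov, size=size)
    X = np.hstack([np.exp(Y), np.ones((size, 1))])
    X /= X.sum(axis=1, keepdims=True)  # close each row to sum to 1
    return pd.DataFrame(X, columns=list(columns))


a = normal_frame_sketch(size=5, seed=0)
b = normal_frame_sketch(columns=["A", "B", "C", "D", "E"], size=5, seed=0)
```

With the same seed and the same number of columns, `a` and `b` hold identical values; only the labels differ, matching the note that columns have no direct impact on the data.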

pyrolite.util.synthetic.normal_series(index=['SiO2', 'CaO', 'MgO', 'FeO', 'TiO2'], mean=None, **kwargs)

Creates a pandas.Series with a single sample from a single multivariate-normally distributed composition.

Parameters
  • index (list) – List of index labels for the series. These won’t have any direct impact on the data returned, and are only for labelling.

  • mean (numpy.ndarray, None) – Optional specification of mean composition.

Return type

pandas.Series
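For a single sample, a simple approach is to perturb the log of the closed mean composition with one draw from a multivariate normal (here with a diagonal covariance), exponentiate, and re-close. This is a simplified, hypothetical sketch (`normal_series_sketch` and its `log_sigma` parameter are not part of pyrolite) meant only to show the shape of the output: one composition, labelled by `index`.

```python
import numpy as np
import pandas as pd


def normal_series_sketch(index=("SiO2", "CaO", "MgO", "FeO", "TiO2"), mean=None,
                         log_sigma=0.1, seed=None):
    """Hypothetical sketch: one closed composition drawn around ``mean``."""
    rng = np.random.default_rng(seed)
    D = len(index)
    mean = np.full(D, 1.0 / D) if mean is None else np.asarray(mean, dtype=float)
    # A single draw from a multivariate normal with diagonal covariance in log space
    logx = np.log(mean / mean.sum()) + rng.normal(scale=log_sigma, size=D)
    x = np.exp(logx)
    # Re-close so the parts sum to 1, then label with the supplied index
    return pd.Series(x / x.sum(), index=list(index))


s = normal_series_sketch(seed=0)
```

Setting `log_sigma=0.0` returns the closed mean itself, which is a convenient sanity check.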

pyrolite.util.synthetic.example_spider_data(start='EMORB_SM89', norm_to='PM_PON', size=120, noise_level=0.5, offsets=None, units='ppm')

Generate some random data for demonstrating spider plots.

By default, this generates a composition based around EMORB, normalised to Primitive Mantle.

Parameters
  • start (str) – Composition to start with.

  • norm_to (str) – Composition to normalise to. Can optionally be None.

  • size (int) – Number of observations to include (index length).

  • noise_level (float) – Noise level in log units (1 sigma).

  • offsets (dict) – Dictionary of offsets in log units.

  • units (str) – Units to use before conversion. Should have no effect other than reducing calculation times if norm_to is None.

Returns

df – Dataframe of example synthetic data.

Return type

pandas.DataFrame
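The generation described above amounts to: start from a reference pattern, optionally shift some elements by fixed log offsets, add independent 1-sigma noise in log space to each observation, and divide by a normalising composition. The sketch below assumes base-10 log units and an `offsets` dict keyed by element name, and takes reference and normalising compositions as plain dicts rather than names such as 'EMORB_SM89'; `example_spider_sketch` is hypothetical and not pyrolite's implementation.

```python
import numpy as np
import pandas as pd


def example_spider_sketch(reference, norm_to=None, size=120, noise_level=0.5,
                          offsets=None, seed=None):
    """Hypothetical sketch of spider-plot example data.

    reference, norm_to: dicts mapping element -> abundance in the same units.
    noise_level and offsets are treated as base-10 log units (an assumption).
    """
    rng = np.random.default_rng(seed)
    elements = list(reference)
    log_ref = np.log10([reference[el] for el in elements])
    if offsets:
        # Shift selected elements by a fixed amount in log units
        log_ref = log_ref + np.array([offsets.get(el, 0.0) for el in elements])
    # Independent 1-sigma noise in log space for each observation and element
    noise = rng.normal(scale=noise_level, size=(size, len(elements)))
    df = pd.DataFrame(10 ** (log_ref + noise), columns=elements)
    if norm_to is not None:
        # Divide each column by its normalising abundance (aligned on column names)
        df = df / pd.Series(norm_to)[elements]
    return df


ref = {"La": 10.0, "Ce": 20.0, "Nd": 15.0}  # placeholder values, not a real reference composition
norm = {"La": 1.0, "Ce": 2.0, "Nd": 1.5}  # placeholder values
df = example_spider_sketch(ref, norm_to=norm, size=5, noise_level=0.0,
                           offsets={"Ce": 1.0}, seed=0)
```

With `noise_level=0.0` every row equals the offset reference divided by `norm_to` (La and Nd at 10, Ce at 100), which makes the effect of each parameter easy to check before adding noise.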

pyrolite.util.synthetic.example_patterns_from_parameters(fit_parameters, radii=None, n=100, proportional_noise=0.15, includes_tetrads=False, columns=None)