pyrolite.util.pd

pyrolite.util.pd.drop_where_all_empty(df)[source]

Drop rows and columns which are completely empty.

Parameters

df (pandas.DataFrame | pandas.Series) – Pandas object from which to drop completely empty rows and columns.
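
As a rough illustration of the behaviour described above, the following sketch uses plain pandas rather than pyrolite itself. The helper name `drop_all_empty_sketch` and the sample data are hypothetical, and the library's own implementation may differ.

```python
import numpy as np
import pandas as pd


def drop_all_empty_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows, then columns, in which every value is missing."""
    return df.dropna(how="all", axis=0).dropna(how="all", axis=1)


df = pd.DataFrame(
    {
        "SiO2": [50.1, np.nan, 48.3],
        "MgO": [np.nan, np.nan, np.nan],  # entirely empty column
        "CaO": [10.2, np.nan, 11.0],
    }
)  # the row at index 1 is entirely empty

cleaned = drop_all_empty_sketch(df)
print(cleaned)
```

Rows or columns that contain at least one value are kept, even if they are mostly empty.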

pyrolite.util.pd.read_table(filepath, index_col=0, **kwargs)[source]

Read tabular data from an Excel or CSV text-based file.

Parameters

filepath (str | pathlib.Path) – Path to file.

Return type

pandas.DataFrame
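
One plausible way to read either file type is to choose a pandas reader from the file extension. The sketch below assumes that approach; `read_table_sketch`, the set of accepted extensions and the sample file are hypothetical and not taken from the pyrolite source.

```python
import tempfile
from pathlib import Path

import pandas as pd


def read_table_sketch(filepath, index_col=0, **kwargs) -> pd.DataFrame:
    """Pick a pandas reader based on the file extension (hypothetical helper)."""
    path = Path(filepath)
    suffix = path.suffix.lower()
    if suffix in (".xls", ".xlsx"):
        reader = pd.read_excel
    elif suffix == ".csv":
        reader = pd.read_csv
    else:
        raise NotImplementedError(f"Unsupported file extension: {suffix}")
    # Extra keyword arguments are passed straight to the chosen reader.
    return reader(path, index_col=index_col, **kwargs)


# Write a small CSV to a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "samples.csv"
    csv_path.write_text("Sample,SiO2,MgO\nA,50.1,7.2\nB,48.3,9.1\n")
    table = read_table_sketch(csv_path)

print(table)
```

With `index_col=0`, the first column of the file (here `Sample`) becomes the index.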

pyrolite.util.pd.column_ordered_append(df1, df2, **kwargs)[source]

Appends one dataframe to another, preserving the column order of the first and adding new columns on the right. Also accepts and passes on standard keyword arguments for pd.DataFrame.append.

Parameters
  • df1 (pandas.DataFrame) – The dataframe whose column order is preserved in the output.

  • df2 (pandas.DataFrame) – The dataframe whose new columns are added on the right of the output.

Return type

pandas.DataFrame
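
The column-ordering rule can be sketched with plain pandas as follows. This is an illustration only: `column_ordered_concat_sketch` is a hypothetical name, and it uses `pandas.concat` instead of `DataFrame.append`, which pandas 2.0 and later no longer provide.

```python
import pandas as pd


def column_ordered_concat_sketch(df1, df2, **kwargs):
    """Stack df2 below df1, keeping df1's column order; new columns go on the right."""
    new_columns = [c for c in df2.columns if c not in df1.columns]
    ordered = list(df1.columns) + new_columns
    # kwargs are forwarded to pandas.concat in this sketch.
    return pd.concat([df1, df2], **kwargs).reindex(columns=ordered)


df1 = pd.DataFrame({"SiO2": [50.1], "MgO": [7.2]})
df2 = pd.DataFrame({"FeO": [9.8], "MgO": [8.4], "SiO2": [47.9]})

combined = column_ordered_concat_sketch(df1, df2, ignore_index=True)
print(combined)
```

Rows from `df1` have no value for the new `FeO` column, so pandas fills those cells with NaN.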

pyrolite.util.pd.accumulate(dfs, ignore_index=False, trace_source=False, names=[])[source]

Accumulate an iterable of pandas.DataFrame objects into a single frame.

Parameters
  • dfs (list) – Sequence of dataframes.

  • ignore_index (bool) – Whether to ignore the indexes upon joining.

  • trace_source (bool) – Whether to retain a reference to the source of the data rows.

  • names (list) – Names to use in place of indexes for source names.

Returns

Accumulated dataframe.

Return type

pandas.DataFrame
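
A minimal sketch of how the parameters above might interact, written with plain pandas. The function name `accumulate_sketch` and the `source` column used to trace each row's origin are assumptions made for illustration; the library may record the source differently.

```python
import pandas as pd


def accumulate_sketch(dfs, ignore_index=False, trace_source=False, names=None):
    """Concatenate dataframes, optionally tagging each row with its source."""
    dfs = list(dfs)
    # Fall back to positional labels when no names are supplied.
    labels = list(names) if names else list(range(len(dfs)))
    frames = []
    for label, df in zip(labels, dfs):
        frame = df.copy()
        if trace_source:
            frame["source"] = label  # hypothetical column name
        frames.append(frame)
    return pd.concat(frames, ignore_index=ignore_index)


a = pd.DataFrame({"SiO2": [50.1, 48.3]})
b = pd.DataFrame({"SiO2": [61.0]})

acc = accumulate_sketch(
    [a, b], ignore_index=True, trace_source=True, names=["lab1", "lab2"]
)
print(acc)
```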

pyrolite.util.pd.to_frame(ser)[source]

Simple utility for converting to pandas.DataFrame.

Parameters

ser (pandas.Series | pandas.DataFrame) – Pandas object to ensure is in the form of a dataframe.

Return type

pandas.DataFrame

pyrolite.util.pd.to_ser(df)[source]

Simple utility for converting a single-column pandas.DataFrame to pandas.Series.

Parameters

df (pandas.DataFrame | pandas.Series) – Pandas object to ensure is in the form of a series.

Return type

pandas.Series
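
The two converters, to_frame and to_ser, can be pictured as a pair of "ensure this type" helpers. The sketch below is an assumption-laden illustration in plain pandas: the helper names are hypothetical, and whether the library orients a Series as a row or as a column is not stated in this reference, so the sketch simply keeps it as a single column.

```python
import pandas as pd


def to_frame_sketch(obj):
    """Return a DataFrame, converting a Series to a single-column frame."""
    if isinstance(obj, pd.Series):
        return obj.to_frame()  # assumption: Series becomes one column
    return obj


def to_series_sketch(obj):
    """Return a Series, converting a single-column DataFrame if needed."""
    if isinstance(obj, pd.Series):
        return obj
    if obj.shape[1] != 1:
        raise ValueError("Expected a single-column DataFrame.")
    return obj.iloc[:, 0]


ser = pd.Series([7.2, 9.1], name="MgO")
frame = to_frame_sketch(ser)
back = to_series_sketch(frame)
print(frame)
print(back)
```

Passing an object that already has the target type returns it unchanged, which is what makes such helpers convenient at the top of functions that accept either type.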

pyrolite.util.pd.to_numeric(df, errors: str = 'coerce', exclude=['float', 'int'])[source]

Converts non-numeric columns to numeric type where possible.

Notes

Avoid using .loc or .iloc on the LHS to make sure that data dtypes are propagated.
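
The note about `.loc` and `.iloc` can be illustrated with a short pandas sketch. Here `to_numeric_sketch` is a hypothetical stand-in, not the library function, and it excludes columns using the `"number"` dtype selector rather than the documented default of `['float', 'int']`.

```python
import pandas as pd


def to_numeric_sketch(df, errors="coerce", exclude=("number",)):
    """Convert non-numeric columns to numeric dtypes where possible."""
    out = df.copy()
    targets = out.select_dtypes(exclude=list(exclude)).columns
    if len(targets) > 0:
        # Assign whole columns (out[targets] = ...) rather than
        # out.loc[:, targets] = ..., so the new dtypes replace the old ones.
        out[targets] = out[targets].apply(pd.to_numeric, errors=errors)
    return out


df = pd.DataFrame(
    {
        "SiO2": ["50.1", "48.3", "n.d."],  # strings, one not convertible
        "MgO": [7.2, 9.1, 8.0],            # already numeric, left untouched
    }
)

converted = to_numeric_sketch(df)
print(converted.dtypes)
```

With `errors='coerce'`, entries that cannot be parsed (such as `"n.d."`) become NaN instead of raising an error.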

pyrolite.util.pd.zero_to_nan(df, rtol=1e-05, atol=1e-08)[source]

Replace floats that are close to, less than, or equal to zero with np.nan in a dataframe.

Parameters
  • df (pandas.DataFrame) – DataFrame to censor.

  • rtol (float) – The relative tolerance parameter.

  • atol (float) – The absolute tolerance parameter.

Returns

Censored DataFrame.

Return type

pandas.DataFrame
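
The tolerance parameters have the same names as those of `numpy.isclose`, so one way to picture the censoring is the sketch below. It assumes that `numpy.isclose` semantics apply and that all columns are numeric; `zero_to_nan_sketch` is a hypothetical helper, not the library code.

```python
import numpy as np
import pandas as pd


def zero_to_nan_sketch(df, rtol=1e-5, atol=1e-8):
    """Mask values that are negative, zero, or within tolerance of zero."""
    values = df.to_numpy(dtype=float)
    censor = (values <= 0) | np.isclose(values, 0.0, rtol=rtol, atol=atol)
    # DataFrame.mask replaces entries with NaN where the condition is True.
    return df.mask(censor)


df = pd.DataFrame(
    {
        "MgO": [7.2, 0.0, -1.0],
        "FeO": [1e-10, 9.8, 5.0],  # 1e-10 is within atol of zero
    }
)

censored = zero_to_nan_sketch(df)
print(censored)
```

Because the comparison target is zero, the relative tolerance contributes nothing in this sketch, and `atol` alone sets how close to zero a value must be to be censored.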

pyrolite.util.pd.outliers(df, cols=[], detect=<function <lambda>>, quantile_select=(0.02, 0.98), logquantile=False, exclude=False)[source]

pyrolite.util.pd.concat_columns(df, columns=None, astype=<class 'str'>, **kwargs)[source]

Concatenate strings across columns.

Parameters
  • df (pandas.DataFrame) – Dataframe to concatenate.

  • columns (list) – List of columns to concatenate.

  • astype (type) – Type to convert final concatenation to.

Return type

pandas.Series
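
A minimal sketch of row-wise string concatenation in plain pandas is given here for orientation. The name `concat_columns_sketch` is hypothetical, and details such as separators or handling of missing values in the real function are not covered.

```python
import pandas as pd


def concat_columns_sketch(df, columns=None, astype=str):
    """Join the string form of the selected columns, row by row."""
    columns = list(df.columns) if columns is None else list(columns)
    joined = df[columns[0]].astype(str)
    for col in columns[1:]:
        joined = joined + df[col].astype(str)
    # Convert the final concatenation to the requested type.
    return joined.astype(astype)


df = pd.DataFrame({"Locality": ["Etna", "Hekla"], "Year": [2001, 1947]})

labels = concat_columns_sketch(df, columns=["Locality", "Year"])
print(labels)
```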

pyrolite.util.pd.uniques_from_concat(df, columns=None, hashit=True)[source]

Creates ideally unique keys from multiple columns. Optionally hashes the concatenated string to standardise the length of the identifier.

Parameters
  • df (pandas.DataFrame) – DataFrame to create indexes for.

  • columns (list) – Columns to use in the string concatenation.

  • hashit (bool, True) – Whether to use a hashing algorithm to create the key from a typically longer string.

Return type

pandas.Series
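
To show why hashing standardises key length, the sketch below concatenates columns and hashes the result with SHA-256 from the standard library. The choice of hash algorithm and the helper name `unique_keys_sketch` are assumptions; this reference does not say which algorithm the library uses.

```python
import hashlib

import pandas as pd


def unique_keys_sketch(df, columns=None, hashit=True):
    """Build one key per row from the concatenated string form of the columns."""
    columns = list(df.columns) if columns is None else list(columns)
    joined = df[columns].astype(str).apply("".join, axis=1)
    if hashit:
        # Assumption: SHA-256 stands in for whatever hash the library uses.
        joined = joined.map(
            lambda s: hashlib.sha256(s.encode("utf-8")).hexdigest()
        )
    return joined


df = pd.DataFrame(
    {
        "Locality": ["Etna", "Hekla", "Etna"],
        "Sample": ["A1", "A1", "B27-long-name"],
    }
)

keys = unique_keys_sketch(df)
raw = unique_keys_sketch(df, hashit=False)
print(keys)
```

The unhashed keys vary in length with the input strings, whereas every hashed key is a 64-character hexadecimal string. Keys are only "ideally" unique: rows with identical values in the chosen columns produce identical keys.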

pyrolite.util.pd.df_from_csvs(csvs, dropna=True, ignore_index=False, **kwargs)[source]

Takes a list of .csv filenames and converts them to a single DataFrame. Combines columns across dataframes, preserving the order of the first entered.

E.g.

SiO2, Al2O3, MgO, MnO, CaO
SiO2, MgO, FeO, CaO
SiO2, Na2O, Al2O3, FeO, CaO

=> SiO2, Na2O, Al2O3, MgO, FeO, MnO, CaO

  • Existing neighbours take priority (i.e. FeO won’t be inserted before Al2O3).

  • Earlier inputs take priority (where ordering is ambiguous, place the earlier first).

Todo

Attempt to preserve column ordering across column sets, assuming they are generally in the same order but preserving only some of the information.
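
For comparison with the ordering goal described for df_from_csvs, the sketch below shows the simplest strategy in plain pandas: keep the columns of the first file in place and add any new columns on the right in order of first appearance. It does not attempt the neighbour-aware interleaving from the example. The helper name `df_from_csvs_sketch` is hypothetical, the `dropna` option is not modelled, and in-memory buffers stand in for .csv files.

```python
import io

import pandas as pd


def df_from_csvs_sketch(csvs, ignore_index=False, **kwargs):
    """Read several CSV sources and stack them into one DataFrame."""
    frames = [pd.read_csv(src, **kwargs) for src in csvs]
    # Column order: first appearance across the inputs, earliest input first.
    ordered = []
    for frame in frames:
        for col in frame.columns:
            if col not in ordered:
                ordered.append(col)
    return pd.concat(frames, ignore_index=ignore_index).reindex(columns=ordered)


# In-memory stand-ins for three .csv files with different column sets.
sources = [
    io.StringIO("SiO2,Al2O3,MgO,MnO,CaO\n50,15,7,0.2,10\n"),
    io.StringIO("SiO2,MgO,FeO,CaO\n48,9,10,11\n"),
    io.StringIO("SiO2,Na2O,Al2O3,FeO,CaO\n61,4,17,5,6\n"),
]

merged = df_from_csvs_sketch(sources, ignore_index=True)
print(list(merged.columns))
```

With this simple strategy FeO and Na2O end up after CaO, which is the kind of result the neighbour-priority rules are meant to improve on.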