pyrolite.util.pd
- pyrolite.util.pd.drop_where_all_empty(df)[source]
Drop rows and columns which are completely empty.
- Parameters
df (pandas.DataFrame | pandas.Series) – Pandas object from which completely empty rows and columns are dropped.
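The behaviour described above can be approximated with plain pandas. The following is a minimal sketch, not pyrolite's implementation; the helper name `drop_empty_sketch` and the sample data are made up for illustration.

```python
import numpy as np
import pandas as pd


def drop_empty_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows, then columns, in which every value is missing (hypothetical helper)."""
    return df.dropna(how="all", axis=0).dropna(how="all", axis=1)


df = pd.DataFrame(
    {
        "SiO2": [50.1, np.nan, 48.3],
        "MgO": [np.nan, np.nan, np.nan],  # completely empty column
        "CaO": [10.2, np.nan, 11.0],
    }
)
cleaned = drop_empty_sketch(df)  # row 1 and column "MgO" are removed
```

Rows or columns that contain at least one value are kept, since only fully empty ones are targeted.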
- pyrolite.util.pd.read_table(filepath, index_col=0, **kwargs)[source]
Read tabular data from an Excel file or a text-based CSV file.
- Parameters
filepath (str | pathlib.Path) – Path to file.
- Return type
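One plausible way to read either file type is to dispatch on the file extension. The sketch below assumes that approach; `read_table_sketch` and the set of recognised extensions are illustrative assumptions rather than pyrolite's actual logic.

```python
import tempfile
from pathlib import Path

import pandas as pd


def read_table_sketch(filepath, index_col=0, **kwargs):
    """Pick a pandas reader based on the file extension (hypothetical helper)."""
    path = Path(filepath)
    ext = path.suffix.lower()
    if ext in (".xls", ".xlsx"):
        return pd.read_excel(path, index_col=index_col, **kwargs)
    if ext == ".csv":
        return pd.read_csv(path, index_col=index_col, **kwargs)
    raise ValueError(f"Unsupported file extension: {ext!r}")


# Write a small CSV to a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "samples.csv"
    csv_path.write_text("Sample,SiO2,MgO\nA,50.1,7.2\nB,48.3,9.1\n")
    table = read_table_sketch(csv_path)
```

With `index_col=0`, the first column (`Sample` here) becomes the index of the resulting frame.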
- pyrolite.util.pd.column_ordered_append(df1, df2, **kwargs)[source]
Appends one dataframe to another, preserving the column order of the first and adding new columns on the right. Also accepts and passes on standard keyword arguments for pd.DataFrame.append.
- Parameters
df1 (pandas.DataFrame) – The dataframe whose column order is preserved in the output.
df2 (pandas.DataFrame) – The dataframe whose new columns are appended to the output.
- Return type
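The column-ordering rule can be sketched as follows. Note that `DataFrame.append` was removed in pandas 2.0, so this sketch uses `pd.concat` instead; the helper name `column_ordered_append_sketch` is hypothetical and the code is not taken from pyrolite.

```python
import pandas as pd


def column_ordered_append_sketch(df1, df2, **kwargs):
    """Stack df2 under df1, keeping df1's column order and adding new columns on the right."""
    new_cols = [c for c in df2.columns if c not in df1.columns]
    out = pd.concat([df1, df2], **kwargs)
    # Explicitly fix the order: df1's columns first, then columns only found in df2.
    return out.reindex(columns=list(df1.columns) + new_cols)


df1 = pd.DataFrame({"SiO2": [50.1], "MgO": [7.2]})
df2 = pd.DataFrame({"FeO": [9.8], "SiO2": [48.3]})
combined = column_ordered_append_sketch(df1, df2, ignore_index=True)
```

Here `FeO` ends up as the rightmost column even though it comes first in `df2`, and cells with no source value are filled with NaN.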
- pyrolite.util.pd.accumulate(dfs, ignore_index=False, trace_source=False, names=[])[source]
Accumulate an iterable containing multiple pandas.DataFrame objects into a single frame.
- Parameters
- Returns
Accumulated dataframe.
- Return type
pandas.DataFrame
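The parameter descriptions for this function are missing from this page, so the following is only a guess at the general idea: stack frames from any iterable and, optionally, label each row with its source. The helper `accumulate_sketch`, its `source_names` argument, and the `source` column are all hypothetical.

```python
import pandas as pd


def accumulate_sketch(dfs, ignore_index=False, source_names=None):
    """Concatenate an iterable of frames, optionally tagging rows with a source label."""
    frames = list(dfs)  # materialise generators so they can be zipped and concatenated
    if source_names is not None:
        # "source" is an illustrative column name, not one defined by pyrolite.
        frames = [f.assign(source=name) for f, name in zip(frames, source_names)]
    return pd.concat(frames, ignore_index=ignore_index)


frames = (pd.DataFrame({"SiO2": [v]}) for v in (50.1, 48.3, 52.7))
stacked = accumulate_sketch(frames, ignore_index=True, source_names=["a", "b", "c"])
```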
- pyrolite.util.pd.to_frame(ser)[source]
Simple utility for converting to pandas.DataFrame.
- Parameters
ser (pandas.Series | pandas.DataFrame) – Pandas object to ensure is in the form of a dataframe.
- Return type
pandas.DataFrame
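A minimal sketch of this kind of coercion is shown below. It assumes a Series becomes a single-column frame; the orientation pyrolite actually uses is not stated here, and `to_frame_sketch` is a hypothetical name.

```python
import pandas as pd


def to_frame_sketch(obj):
    """Return obj unchanged if it is a DataFrame, otherwise convert a Series to one."""
    if isinstance(obj, pd.DataFrame):
        return obj
    if isinstance(obj, pd.Series):
        return obj.to_frame()  # assumption: one column named after the series
    raise TypeError(f"Expected a pandas Series or DataFrame, got {type(obj).__name__}")


ser = pd.Series([50.1, 48.3], name="SiO2")
frame = to_frame_sketch(ser)
```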
- pyrolite.util.pd.to_ser(df)[source]
Simple utility for converting a single-column pandas.DataFrame to a pandas.Series.
- Parameters
df (pandas.DataFrame | pandas.Series) – Pandas object to ensure is in the form of a series.
- Return type
pandas.Series
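The reverse conversion might look like the sketch below. How pyrolite treats frames with more than one column is not described here, so raising an error in that case is an assumption, as is the name `to_ser_sketch`.

```python
import pandas as pd


def to_ser_sketch(obj):
    """Return obj unchanged if it is a Series, otherwise squeeze a one-column frame."""
    if isinstance(obj, pd.Series):
        return obj
    if isinstance(obj, pd.DataFrame):
        if obj.shape[1] != 1:
            # Assumption: anything other than exactly one column is rejected.
            raise ValueError("Expected a DataFrame with exactly one column")
        return obj.iloc[:, 0]
    raise TypeError(f"Expected a pandas Series or DataFrame, got {type(obj).__name__}")


frame = pd.DataFrame({"SiO2": [50.1, 48.3]})
series = to_ser_sketch(frame)
```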
- pyrolite.util.pd.to_numeric(df, errors: str = 'coerce', exclude=['float', 'int'])[source]
Converts non-numeric columns to numeric type where possible.
Notes
Avoid using .loc or .iloc on the left-hand side of the assignment, so that the converted data dtypes are propagated.
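As an illustration of the note above, the sketch below converts each non-numeric column with `pandas.to_numeric` and assigns the result by column name rather than through `.loc` or `.iloc`. The helper `to_numeric_sketch` is hypothetical and does not model the `exclude` argument.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def to_numeric_sketch(df, errors="coerce"):
    """Convert non-numeric columns to numeric where possible (hypothetical helper)."""
    out = df.copy()
    for col in out.columns:
        if not is_numeric_dtype(out[col]):
            # Whole-column assignment by name replaces the column, dtype included.
            out[col] = pd.to_numeric(out[col], errors=errors)
    return out


df = pd.DataFrame({"SiO2": ["50.1", "48.3"], "MgO": ["7.2", "n.d."], "n": [1, 2]})
numeric = to_numeric_sketch(df)
```

With `errors="coerce"`, entries that cannot be parsed (such as `"n.d."`) become NaN instead of raising.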
- pyrolite.util.pd.zero_to_nan(df, rtol=1e-05, atol=1e-08)[source]
Replace floats that are close to, less than, or equal to zero with np.nan in a dataframe.
- Parameters
df (pandas.DataFrame) – DataFrame to censor.
rtol (float) – The relative tolerance parameter.
atol (float) – The absolute tolerance parameter.
- Returns
Censored DataFrame.
- Return type
pandas.DataFrame
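The censoring rule can be sketched with `numpy.isclose`, which takes the same `rtol` and `atol` tolerances. This is an assumed reading of the description, not pyrolite's code, and `zero_to_nan_sketch` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def zero_to_nan_sketch(df, rtol=1e-5, atol=1e-8):
    """Replace values that are near zero, zero, or negative with NaN."""
    values = df.to_numpy(dtype=float)
    near_or_below_zero = np.isclose(values, 0.0, rtol=rtol, atol=atol) | (values <= 0)
    # mask() sets entries to NaN wherever the boolean array is True.
    return df.astype(float).mask(near_or_below_zero)


df = pd.DataFrame({"SiO2": [50.1, 0.0, -0.2], "MgO": [1e-10, 7.2, 9.1]})
censored = zero_to_nan_sketch(df)
```

In this example the exact zero, the negative value, and the tiny positive value `1e-10` (within `atol` of zero) are all replaced.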
- pyrolite.util.pd.outliers(df, cols=[], detect=<function <lambda>>, quantile_select=(0.02, 0.98), logquantile=False, exclude=False)[source]
- pyrolite.util.pd.concat_columns(df, columns=None, astype=<class 'str'>, **kwargs)[source]
Concatenate strings across columns.
- Parameters
df (pandas.DataFrame) – Dataframe to concatenate.
columns (list) – List of columns to concatenate.
astype (type) – Type to convert final concatenation to.
- Return type
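A rough sketch of row-wise string concatenation is given below. It joins values with no separator, which is an assumption since the extra keyword arguments are not documented here; `concat_columns_sketch` is a hypothetical name.

```python
import pandas as pd


def concat_columns_sketch(df, columns=None, astype=str):
    """Join the string form of the selected columns for each row."""
    if columns is None:
        columns = list(df.columns)
    joined = df[columns].astype(str).apply(lambda row: "".join(row), axis=1)
    return joined.astype(astype)


df = pd.DataFrame({"Site": ["A", "B"], "Year": [2019, 2020]})
keys = concat_columns_sketch(df, columns=["Site", "Year"])
```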
- pyrolite.util.pd.uniques_from_concat(df, columns=None, hashit=True)[source]
Creates ideally unique keys from multiple columns. Optionally hashes string to standardise length of identifier.
- Parameters
df (pandas.DataFrame) – DataFrame to create indexes for.
columns (list) – Columns to use in the string concatenation.
hashit (bool, True) – Whether to use a hashing algorithm to create the key from a typically longer string.
- Return type
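The idea of hashing a concatenated key to a fixed length can be sketched as follows. The choice of SHA-256 is an assumption (the hashing algorithm is not named here), and `uniques_from_concat_sketch` is a hypothetical helper.

```python
import hashlib

import pandas as pd


def uniques_from_concat_sketch(df, columns=None, hashit=True):
    """Build one key per row from several columns, optionally hashed to a fixed length."""
    if columns is None:
        columns = list(df.columns)
    keys = df[columns].astype(str).apply(lambda row: "".join(row), axis=1)
    if hashit:
        # Assumption: SHA-256 hex digests, which are always 64 characters long.
        keys = keys.map(lambda s: hashlib.sha256(s.encode("utf-8")).hexdigest())
    return keys


df = pd.DataFrame({"Site": ["A", "B"], "Year": [2019, 2020]})
hashed = uniques_from_concat_sketch(df)
plain = uniques_from_concat_sketch(df, hashit=False)
```

The keys are only "ideally" unique: two rows with identical values in the chosen columns produce the same key, hashed or not.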
- pyrolite.util.pd.df_from_csvs(csvs, dropna=True, ignore_index=False, **kwargs)[source]
Takes a list of .csv filenames and converts to a single DataFrame. Combines columns across dataframes, preserving order of the first entered.
E.g.
SiO2, Al2O3, MgO, MnO, CaO
SiO2, MgO, FeO, CaO
SiO2, Na2O, Al2O3, FeO, CaO
=> SiO2, Na2O, Al2O3, MgO, FeO, MnO, CaO
- Existing neighbours take priority (i.e. FeO won’t be inserted before Al2O3)
- Earlier inputs take priority (where ordering is ambiguous, place the earlier first)
Todo
Attempt to preserve column ordering across column sets, assuming they are generally in the same order but preserving only some of the information.
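For comparison with the ordering described above, the sketch below combines three in-memory CSVs using the simpler "first seen" column order. It is an illustration under stated assumptions, not pyrolite's implementation: the sample values are made up, and the `dropna` argument is not modelled.

```python
import io

import pandas as pd

# Three CSV sources with the column sets from the example (values are made up).
csv_texts = [
    "SiO2,Al2O3,MgO,MnO,CaO\n50,15,7,0.2,10\n",
    "SiO2,MgO,FeO,CaO\n48,9,10,11\n",
    "SiO2,Na2O,Al2O3,FeO,CaO\n52,3,16,8,9\n",
]
frames = [pd.read_csv(io.StringIO(text)) for text in csv_texts]

# Collect columns in the order they are first encountered across the inputs.
ordered = []
for frame in frames:
    for col in frame.columns:
        if col not in ordered:
            ordered.append(col)

merged = pd.concat(frames, ignore_index=True).reindex(columns=ordered)
```

This yields the order SiO2, Al2O3, MgO, MnO, CaO, FeO, Na2O: new columns are simply appended on the right, which is not the neighbour-aware ordering that the Todo aims for.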