pyrolite.util.text
- pyrolite.util.text.to_width(multiline_string, width=79, **kwargs)
Uses the built-in textwrap module to wrap text to a specific width.
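As a rough illustration of wrapping with textwrap, the sketch below wraps each line of a multi-line string separately. The name to_width_sketch and the line-by-line strategy are assumptions made for this example and are not taken from the pyrolite source.

```python
import textwrap


def to_width_sketch(multiline_string, width=79, **kwargs):
    """Wrap every line of a multi-line string to at most `width` characters.

    Hypothetical stand-in for illustration; not the pyrolite implementation.
    """
    wrapped = []
    for line in multiline_string.splitlines():
        # textwrap.wrap returns [] for an empty line, so keep blank lines explicitly
        wrapped.extend(textwrap.wrap(line, width=width, **kwargs) or [""])
    return "\n".join(wrapped)


text = "The quick brown fox jumps over the lazy dog.\nShort line."
print(to_width_sketch(text, width=20))
```

Extra keyword arguments are passed straight through to textwrap.wrap in this sketch, which is one plausible reading of the **kwargs in the signature.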
- pyrolite.util.text.normalise_whitespace(strg)
Replaces runs of whitespace (extra tabs, newlines etc.) with a single space.
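One common way to get this behaviour is a single regular-expression substitution. The function name below is hypothetical, and whether the library also strips leading and trailing whitespace is not stated here, so this sketch leaves the ends alone.

```python
import re


def normalise_whitespace_sketch(strg):
    """Collapse any run of whitespace into one space (illustrative only)."""
    # \s+ matches one or more spaces, tabs, or newlines
    return re.sub(r"\s+", " ", strg)


print(normalise_whitespace_sketch("SiO2\t\t(wt%)\n  major"))
```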
- pyrolite.util.text.remove_prefix(z, prefix)
Remove a specific prefix from the start of a string.
- pyrolite.util.text.remove_suffix(x, suffix=' ')
Remove a specific suffix from the end of a string.
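The two helpers above can be approximated with plain string slicing. The following is a minimal sketch with hypothetical names. It assumes the prefix or suffix is removed at most once and that the string is returned unchanged when there is no match.

```python
def remove_prefix_sketch(z, prefix):
    """Drop `prefix` from the start of `z` if present (illustrative only)."""
    if prefix and z.startswith(prefix):
        return z[len(prefix):]
    return z


def remove_suffix_sketch(x, suffix=" "):
    """Drop `suffix` from the end of `x` if present (illustrative only)."""
    if suffix and x.endswith(suffix):
        return x[: -len(suffix)]
    return x


print(remove_prefix_sketch("Fe2O3T", "Fe"))
print(remove_suffix_sketch("SiO2_wt", "_wt"))
```

The guard on an empty prefix or suffix avoids slicing with -0, which would otherwise return an empty string.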
- pyrolite.util.text.titlecase(s, exceptions=['and', 'in', 'a'], abbrv=['ID', 'IGSN', 'CIA', 'CIW', 'PIA', 'SAR', 'SiTiIndex', 'WIP'], capitalize_first=True, split_on='[\\.\\s_-]+', delim='')
Formats strings in CamelCase, with exceptions for simple articles and omitted abbreviations which retain their capitalization.
Todo
Option for retaining original CamelCase.
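To make the roles of the parameters concrete, here is a simplified sketch of this kind of formatting. The function name and the exact rules (case-insensitive matching of abbreviations, lower-casing of exception words, forcing a capital on the first word) are assumptions for illustration and may differ from the library's behaviour.

```python
import re


def titlecase_sketch(s, exceptions=("and", "in", "a"), abbrv=("ID", "IGSN"),
                     capitalize_first=True, split_on=r"[\.\s_-]+", delim=""):
    """Join capitalised words, keeping exceptions and abbreviations as given."""
    # Map lower-cased abbreviations back to their canonical capitalisation
    canonical = {a.lower(): a for a in abbrv}
    words = [w for w in re.split(split_on, s) if w]
    out = []
    for i, word in enumerate(words):
        lowered = word.lower()
        if lowered in canonical:
            out.append(canonical[lowered])
        elif lowered in exceptions and not (i == 0 and capitalize_first):
            out.append(lowered)
        else:
            out.append(word.capitalize())
    return delim.join(out)


print(titlecase_sketch("sample_id and location"))
print(titlecase_sketch("sample_id and location", delim=" "))
```

Note that word.capitalize() lower-cases the rest of each word, so existing CamelCase in the input is lost in this sketch, which matches the open Todo item above.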
- pyrolite.util.text.string_variations(names, preprocess=['lower', 'strip'], swaps=[(' ', '_'), (' ', '_'), ('-', ' '), ('_', ' '), ('-', ''), ('_', '')])
Returns equivalent string variations based on an input set of strings.
- Parameters
names ({list, str}) – String or list of strings to generate name variations of.
preprocess (list) – List of preprocessing string functions to apply before generating variations.
swaps (list) – List of tuples for str.replace(out, in).
- Returns
Set (or SortedSet, if sortedcontainers installed) of unique string variations.
- Return type
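The parameters above suggest a two-stage process: preprocess each name, then apply each swap to produce variants. The sketch below follows that reading under stated assumptions: the function name is hypothetical, each swap is applied independently to the preprocessed name rather than cumulatively, and a plain set is always returned.

```python
def string_variations_sketch(
    names,
    preprocess=("lower", "strip"),
    swaps=((" ", "_"), ("-", " "), ("_", " "), ("-", ""), ("_", "")),
):
    """Collect simple variations of one or more names (illustrative only)."""
    if isinstance(names, str):
        names = [names]
    variations = set()
    for name in names:
        # Apply each named str method in turn, e.g. name.lower().strip()
        for method in preprocess:
            name = getattr(name, method)()
        variations.add(name)
        # Each swap is applied to the preprocessed name on its own
        for old, new in swaps:
            variations.add(name.replace(old, new))
    return variations


print(sorted(string_variations_sketch("Sample-ID")))
```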
- pyrolite.util.text.parse_entry(entry, regex='(\\s)*?(?P<value>[\\.\\w]+)(\\s)*?', delimiter=',', values_only=True, first_only=True, errors=None, replace_nan='None')
Parses an arbitrary string data entry to return values based on a regular expression containing named fields including ‘value’ (and any others). If the entry is of non-string type, this will return the value (e.g. int, float, NaN, None).
- Parameters
entry (str) – String entry in which to search for the regex pattern.
regex (str) – Regular expression to compile and use to search the entry for a value.
delimiter (str, ',') – Optional delimiter to split the string in case of multiple inclusion.
values_only (bool, True) – Option to return only values (single or list), or to instead return the dictionary corresponding to the matches.
first_only (bool, True) – Option to return only the first match, or else all matches.
errors – Error value to denote ‘no match’. Not yet implemented.
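The interaction of delimiter, values_only and first_only is easier to see in code. The following is a minimal sketch with a hypothetical function name. It assumes the entry is split on the delimiter before matching, that matched values are returned as strings without type conversion, and it omits the errors and replace_nan parameters entirely.

```python
import re


def parse_entry_sketch(entry, regex=r"(\s)*?(?P<value>[\.\w]+)(\s)*?",
                       delimiter=",", values_only=True, first_only=True):
    """Extract named-group matches from a delimited string (illustrative only)."""
    if not isinstance(entry, str):
        return entry  # non-string input (int, float, None, ...) is passed through
    pattern = re.compile(regex)
    matches = []
    for part in entry.split(delimiter):
        match = pattern.search(part)
        if match is not None:
            # groupdict() holds only the named groups, e.g. {"value": "1.5"}
            matches.append(match.groupdict())
    if values_only:
        matches = [m["value"] for m in matches]
    if first_only:
        return matches[0] if matches else None
    return matches


print(parse_entry_sketch(" 1.5 , 2.0"))
print(parse_entry_sketch(" 1.5 , 2.0", first_only=False))
print(parse_entry_sketch(" 1.5 , 2.0", values_only=False))
```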
- pyrolite.util.text.split_records(data, delimiter='\\r\\n')
Splits records in a CSV where quotation marks are used. Splits on a delimiter followed by an even number of quotation marks.
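The "even number of quotation marks" rule can be expressed as a regular-expression lookahead: a delimiter inside a quoted field is followed by an odd number of quotes, so it is skipped. The sketch below is one way to write that, under the assumptions that the function name is hypothetical, the delimiter is a literal string (escaped before use), and only double quotes are considered.

```python
import re


def split_records_sketch(data, delimiter="\r\n"):
    """Split on delimiters that are outside double-quoted fields (illustrative)."""
    # Lookahead: the remainder of the string must contain quotes only in pairs
    pattern = re.escape(delimiter) + r'(?=(?:[^"]*"[^"]*")*[^"]*$)'
    return re.split(pattern, data)


data = 'a,"line1\r\nline2",b\r\nc,d,e'
for record in split_records_sketch(data):
    print(repr(record))
```

Because the lookahead rescans the rest of the string at every candidate delimiter, this approach is quadratic in the worst case; for large files the standard csv module is likely a better fit.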