pyrolite.util.text

pyrolite.util.text.to_width(multiline_string, width=79, **kwargs)[source]

Uses the builtin textwrap module to wrap text to a specific width.
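
As an illustration of line-wise wrapping with the standard library, the following is a minimal sketch. It is not pyrolite's implementation. The name wrap_lines_sketch is hypothetical, and it is an assumption that each input line is wrapped independently and that extra keyword arguments are passed on to textwrap.wrap.

```python
import textwrap


def wrap_lines_sketch(multiline_string, width=79, **kwargs):
    """Hypothetical helper: wrap each line of a string separately."""
    wrapped = []
    for line in multiline_string.splitlines():
        # textwrap.wrap returns a list of lines no longer than `width`
        wrapped.append("\n".join(textwrap.wrap(line, width, **kwargs)))
    return "\n".join(wrapped)


text = "alpha beta gamma delta\nshort"
result = wrap_lines_sketch(text, width=11)
print(result)
```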

pyrolite.util.text.normalise_whitespace(strg)[source]

Replaces runs of whitespace (extra tabs, newlines etc.) with a single space.
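
The behaviour described above can be approximated with a single regular expression substitution. This is a sketch under the assumption that any run of whitespace collapses to one space. The name collapse_whitespace_sketch is hypothetical and not part of pyrolite.

```python
import re


def collapse_whitespace_sketch(strg):
    """Hypothetical helper: collapse any run of whitespace to one space."""
    return re.sub(r"\s+", " ", strg)


cleaned = collapse_whitespace_sketch("a\t\tb\n c")
print(cleaned)
```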

pyrolite.util.text.remove_prefix(z, prefix)[source]

Remove a specific prefix from the start of a string.

pyrolite.util.text.remove_suffix(x, suffix=' ')[source]

Remove a specific suffix from the end of a string.
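
Prefix and suffix removal of this kind can be sketched with plain string methods. The helper names below are hypothetical, and it is an assumption that a string without the given prefix or suffix is returned unchanged.

```python
def strip_prefix_sketch(s, prefix):
    """Hypothetical helper: drop `prefix` only if `s` starts with it."""
    if prefix and s.startswith(prefix):
        return s[len(prefix):]
    return s


def strip_suffix_sketch(s, suffix=" "):
    """Hypothetical helper: drop `suffix` only if `s` ends with it."""
    if suffix and s.endswith(suffix):
        return s[: -len(suffix)]
    return s


print(strip_prefix_sketch("wt_SiO2", "wt_"))
print(strip_suffix_sketch("SiO2_ppm", "_ppm"))
```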

pyrolite.util.text.quoted_string(s)[source]

pyrolite.util.text.titlecase(s, exceptions=['and', 'in', 'a'], abbrv=['ID', 'IGSN', 'CIA', 'CIW', 'PIA', 'SAR', 'SiTiIndex', 'WIP'], capitalize_first=True, split_on='[\\.\\s_-]+', delim='')[source]

Formats strings in CamelCase, with exceptions for simple articles (exceptions) and for abbreviations listed in abbrv, which retain their capitalization.

Todo

  • Option for retaining original CamelCase.
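
To make the roles of exceptions, abbrv, split_on and delim concrete, here is a rough sketch of one way such formatting could work. It is not pyrolite's implementation. The name camel_case_sketch is hypothetical, and the handling of the first word and of lowercase exceptions is an assumption.

```python
import re


def camel_case_sketch(
    s,
    exceptions=("and", "in", "a"),
    abbrv=("ID", "IGSN"),
    capitalize_first=True,
    split_on=r"[\.\s_-]+",
    delim="",
):
    """Hypothetical helper: join capitalized words, keeping abbreviations."""
    words = [w for w in re.split(split_on, s) if w]
    out = []
    for i, word in enumerate(words):
        if word in abbrv:
            out.append(word)  # abbreviations keep their capitalization
        elif i == 0:
            out.append(word.capitalize() if capitalize_first else word)
        elif word.lower() in exceptions:
            out.append(word.lower())  # simple articles stay lowercase
        else:
            out.append(word.capitalize())
    return delim.join(out)


print(camel_case_sketch("sample ID and name"))
print(camel_case_sketch("rock_type", delim=" "))
```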

pyrolite.util.text.string_variations(names, preprocess=['lower', 'strip'], swaps=[(' ', '_'), (' ', '_'), ('-', ' '), ('_', ' '), ('-', ''), ('_', '')])[source]

Returns equivalent string variations based on an input set of strings.

Parameters
  • names ({list, str}) – String or list of strings to generate name variations of.

  • preprocess (list) – List of preprocessing string functions to apply before generating variations.

  • swaps (list) – List of tuples for str.replace(out, in).

Returns

Set (or SortedSet, if sortedcontainers is installed) of unique string variations.

Return type

set
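
The interaction between preprocess and swaps can be sketched as follows. This is an illustration only. The name variations_sketch is hypothetical, a plain set is always returned, and it is an assumption that each swap is applied independently to the preprocessed name.

```python
def variations_sketch(
    names,
    preprocess=("lower", "strip"),
    swaps=((" ", "_"), ("-", " "), ("_", " "), ("-", ""), ("_", "")),
):
    """Hypothetical helper: collect simple variants of each input name."""
    if isinstance(names, str):
        names = [names]
    out = set()
    for name in names:
        for method in preprocess:
            # each entry names a no-argument str method, e.g. "lower"
            name = getattr(name, method)()
        out.add(name)
        for old, new in swaps:
            out.add(name.replace(old, new))
    return out


variants = variations_sketch("Sample-ID")
print(sorted(variants))
```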

pyrolite.util.text.parse_entry(entry, regex='(\\s)*?(?P<value>[\\.\\w]+)(\\s)*?', delimiter=',', values_only=True, first_only=True, errors=None, replace_nan='None')[source]

Parses an arbitrary string data entry to return values based on a regular expression containing named fields including ‘value’ (and any others). If the entry is of non-string type, this will return the value (e.g. int, float, NaN, None).

Parameters
  • entry (str) – String entry to search for the regex pattern.

  • regex (str) – Regular expression to compile and use to search the entry for a value.

  • delimiter (str, ',') – Optional delimiter to split the string in case of multiple inclusion.

  • values_only (bool, True) – Option to return only values (single or list), or to instead return the dictionary corresponding to the matches.

  • first_only (bool, True) – Option to return only the first match, or else all matches.

  • errors – Error value to denote ‘no match’. Not yet implemented.
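
The parameters above can be illustrated with a small sketch built on the re module and the default regex from the signature. It is not pyrolite's implementation. The name parse_sketch is hypothetical, and returning None when nothing matches is an assumption.

```python
import re


def parse_sketch(
    entry,
    regex=r"(\s)*?(?P<value>[\.\w]+)(\s)*?",
    delimiter=",",
    values_only=True,
    first_only=True,
):
    """Hypothetical helper: extract named-group matches from a string."""
    if not isinstance(entry, str):
        return entry  # non-string entries are passed through unchanged
    pattern = re.compile(regex)
    matches = []
    for part in entry.split(delimiter):
        m = pattern.search(part)
        if m:
            matches.append(m.groupdict())  # named groups only
    if values_only:
        matches = [m["value"] for m in matches]
    if first_only:
        return matches[0] if matches else None
    return matches


print(parse_sketch(" 1.5 , 2.0"))
print(parse_sketch("1.5, 2.0", first_only=False))
```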

pyrolite.util.text.split_records(data, delimiter='\\r\\n')[source]

Splits records in a csv where quotation marks are used. Splits on a delimiter followed by an even number of quotation marks.
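
The "even number of quotation marks" rule can be expressed as a regex lookahead. The sketch below handles double quotes only and is an assumption about how such a split might be written. The name split_records_sketch is hypothetical.

```python
import re


def split_records_sketch(data, delimiter=r"\r\n"):
    """Hypothetical helper: split on delimiters that sit outside quotes."""
    # The lookahead succeeds only when the rest of the string contains an
    # even number of double quotes, i.e. the delimiter is not inside a
    # quoted field.
    outside_quotes = r'(?=(?:[^"]*"[^"]*")*[^"]*$)'
    return re.split(delimiter + outside_quotes, data)


records = split_records_sketch('a,"b\r\nc"\r\nd,e')
print(records)
```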

pyrolite.util.text.slugify(value, delim='-')[source]

Normalizes a string, removes non-alpha characters, converts spaces to delimiters.

Parameters
  • value (str) – String to slugify.

  • delim (str) – Delimiter to replace whitespace with.

Return type

str
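
A common way to build a slug of this kind uses unicodedata and re from the standard library. The following is a sketch, not pyrolite's implementation. The name slug_sketch is hypothetical, and it is an assumption that case is preserved and that accented characters are reduced to ASCII.

```python
import re
import unicodedata


def slug_sketch(value, delim="-"):
    """Hypothetical helper: ASCII-normalize, strip symbols, join on delim."""
    # Decompose accented characters, then drop anything outside ASCII
    value = (
        unicodedata.normalize("NFKD", value)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Remove characters that are not word characters, whitespace or hyphens
    value = re.sub(r"[^\w\s-]", "", value).strip()
    # Replace runs of whitespace/hyphens with the delimiter
    return re.sub(r"[-\s]+", delim, value)


print(slug_sketch("Héllo, World!"))
```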

pyrolite.util.text.int_to_alpha(num)[source]

Encode an integer into alpha characters, useful for sequences of axes/figures.

Parameters

num (int) – Integer to encode.

Returns

Alpha-encoding of a small integer.

Return type

str
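
For labelling a short sequence of axes or figures, an encoding like this can be sketched with string.ascii_lowercase. Zero-based indexing, lowercase output and the 0 to 25 range are assumptions here, and the name int_to_alpha_sketch is hypothetical.

```python
import string


def int_to_alpha_sketch(num):
    """Hypothetical helper: map 0 -> 'a', 1 -> 'b', ..., 25 -> 'z'."""
    if not 0 <= num < len(string.ascii_lowercase):
        raise ValueError("num must be between 0 and 25")
    return string.ascii_lowercase[num]


labels = [int_to_alpha_sketch(i) for i in range(3)]
print(labels)
```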