mahautils.multics.MahaMulticsConfigFile

class mahautils.multics.MahaMulticsConfigFile(path: str | Path | None = None, unit_converter: UnitConverter | None = None)

Bases: TextFile

A generic class for processing Maha Multics configuration files

This class is intended to represent a range of Maha Multics configuration files. It configures settings (such as the character used for comments) that apply to all Maha Multics configuration files, and it provides general methods for processing such files. Subclasses should generally be created and customized for specific types of Maha Multics files.

Attributes

unit_converter

The unit converter used to convert the units of quantities stored in the file

Methods

__init__([path, unit_converter])

Creates an object to represent Maha Multics configuration files

extract_section_by_keyword(section_label, ...)

Extracts a section from the contents list of file lines

parse()

Parses the data in contents and stores it in class attributes

Inherited Attributes

comment_chars

A tuple of all characters considered to denote comments

contents

A reference to a list containing the (potentially modified) file content of each line of the file

hashes

A copy of the dictionary containing any file hashes previously computed for the file specified by the path attribute

line_ending

The character(s) used to denote the end of lines in the text file

path

Path describing the location of the file on the disk

raw_contents

A copy of the raw file content

trailing_newline

Whether the original file had a newline at the end of the file

Inherited Methods

clean_contents([remove_comments, ...])

Clean contents in-place

clear_file_hashes()

Clears any stored file hashes

compute_file_hashes([hash_functions, store])

Computes hashes of the file specified by the path attribute

has_changed()

Returns whether the file specified by the path attribute has changed since the last time file hashes were computed

overwrite([prologue, epilogue, line_ending])

Write data in contents to the file specified by path

read([path, parse])

Read file from disk

set_contents(contents, trailing_newline[, ...])

Add data to the contents list

set_read_metadata([path])

Configures metadata related to file to be read from disk

store_file_hashes([hash_functions])

Computes and stores hashes of the file specified by the path attribute

track_new_file(path[, hash_functions])

Shortcut for simultaneously modifying the path attribute and storing file hashes

update_contents()

Updates the contents list based on object attributes

write(output_file[, write_mode, ...])

Write file to disk

__init__(path: str | Path | None = None, unit_converter: UnitConverter | None = None) → None

Creates an object to represent Maha Multics configuration files

Creates an instance of the MahaMulticsConfigFile class, including configuring file comments to be represented by the # character.

Parameters:
  • path (str or pathlib.Path, optional) – Location of the text file in the file system (default is None)

  • unit_converter (pyxx.units.UnitConverter, optional) – A pyxx.units.UnitConverter instance which will be used to convert units of quantities stored in the configuration file (default is None, which uses the MahaMulticsUnitConverter unit converter to perform unit conversions)

property unit_converter: UnitConverter

The unit converter used to convert the units of quantities stored in the file

clean_contents(remove_comments: bool = False, skip_full_line_comments: bool = False, strip: bool = False, concat_lines: bool = False, remove_blank_lines: bool = False) → None

Clean contents in-place

Cleans contents (removing comments, blank lines, etc.) based on user-defined rules. Modifications are made in-place (i.e., the resulting content is stored in contents).

Parameters:
  • remove_comments (bool, optional) – Whether to remove comments from file (default is False)

  • skip_full_line_comments (bool, optional) – Whether to skip removing comments where the comment is the only text on a line. Only applies if remove_comments is True (default is False)

  • strip (bool, optional) – Whether to strip leading and trailing whitespace from each line (default is False)

  • concat_lines (bool, optional) – Whether to concatenate lines ending with a backslash with the following line (default is False)

  • remove_blank_lines (bool, optional) – Whether to remove lines that contain no content after other cleaning operations have completed (default is False)
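The cleaning rules described above can be illustrated with a small standalone sketch. This is not the library's implementation: the function name clean_lines, the order in which the steps run, and the sample lines are assumptions made for illustration, and the sketch always applies all four operations rather than exposing a flag for each.

```python
from typing import List, Tuple


def clean_lines(lines: List[str],
                comment_chars: Tuple[str, ...] = ('#',),
                skip_full_line_comments: bool = False) -> List[str]:
    """Illustrative stand-in that applies all four cleaning steps"""
    cleaned = list(lines)

    # 1. Remove comments (optionally keeping comment-only lines)
    for i, line in enumerate(cleaned):
        if skip_full_line_comments and line.strip().startswith(comment_chars):
            continue
        for char in comment_chars:
            line = line.split(char, maxsplit=1)[0]
        cleaned[i] = line

    # 2. Strip leading and trailing whitespace
    cleaned = [line.strip() for line in cleaned]

    # 3. Join lines ending with a backslash to the following line
    merged: List[str] = []
    continuing = False
    for line in cleaned:
        if continuing:
            merged[-1] += line
        else:
            merged.append(line)
        continuing = merged[-1].endswith('\\')
        if continuing:
            merged[-1] = merged[-1][:-1]  # drop the trailing backslash

    # 4. Remove lines left empty by the previous steps
    return [line for line in merged if line]


example = ['# header comment', '  x = 1  # inline', '', 'y = 2 \\', '  + 3']
print(clean_lines(example))
print(clean_lines(example, skip_full_line_comments=True))
```

With skip_full_line_comments enabled, the comment-only first line survives while the inline comment is still removed.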

clear_file_hashes() → None

Clears any stored file hashes

property comment_chars: Tuple[str, ...] | None

A tuple of all characters considered to denote comments

compute_file_hashes(hash_functions: tuple | str = ('md5', 'sha256'), store: bool = False) → Dict[str, str]

Computes hashes of the file specified by the path attribute

Computes and returns the hashes of the file specified by the path attribute, with the option to populate the hashes dictionary with their values.

Parameters:
  • hash_functions (tuple or str, optional) – Tuple of strings (or individual string) specifying which hash(es) to compute. Any hash functions supported by hashlib can be used. Default is ('md5', 'sha256')

  • store (bool, optional) – Whether to store the computed hashes in the hashes dictionary (default is False)

Returns:

A dictionary containing the file hashes specified by hash_functions

Return type:

dict

See also

pyxx.files.compute_file_hash

Function used to compute file hashes

Notes

Prior to calling this method, the path attribute must be defined. To simultaneously set the path attribute and store file hashes, use track_new_file().
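As a rough illustration of the kind of dictionary this method returns, the sketch below computes hashes directly with hashlib. The helper name compute_hashes and the temporary file are hypothetical, and the real method may read the file differently (for example, in chunks).

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, Tuple, Union


def compute_hashes(path: Path,
                   hash_functions: Union[Tuple[str, ...], str] = ('md5', 'sha256')
                   ) -> Dict[str, str]:
    """Returns {hash name: hex digest} for the file at ``path``"""
    # Accept a single hash name as well as a tuple of names
    if isinstance(hash_functions, str):
        hash_functions = (hash_functions,)

    data = path.read_bytes()
    return {name: hashlib.new(name, data).hexdigest() for name in hash_functions}


with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp) / 'example.txt'
    config.write_bytes(b'viscosity = 0.05\n')
    digests = compute_hashes(config)
    only_sha = compute_hashes(config, 'sha256')

print(sorted(digests))
```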

property contents: List[str]

A reference to a list containing the (potentially modified) file content of each line of the file

Warning

This attribute returns the list by reference. This means that if you set a variable equal to this reference, then editing this variable will edit the contents attribute (e.g., if you set my_content = MyTextFile.contents, then editing my_content will change the content stored in MyTextFile).

Notes

To replace the contents attribute, do not assign to it directly (i.e., don’t use code similar to MyTextFile.contents = ['line1', 'line2', 'line3']). Instead, use the set_contents() method, as it offers greater control over whether the contents are passed by reference or value.

extract_section_by_keyword(section_label: str, begin_regex: str, end_regex: str, section_line_regex: str = '(.*)', max_sections: int | None = None, begin_idx: int = 0, allow_comment_lines: bool = True) → Tuple[List[Match], List[Tuple[str, ...]], int, int]

Extracts a section from the contents list of file lines

Many Maha Multics configuration files contain sections with certain types of data, where the section begins following a formatted section marker and ends at another marker (both with unique, identifiable regex patterns). This method extracts the data from such a section. If multiple sections are found, the data in all sections is merged, unless specified otherwise by setting max_sections.

Parameters:
  • section_label (str) – A descriptive name identifying the section. This is not used in parsing the file; it is only used to customize error messages and make them more descriptive

  • begin_regex (str) – The regex pattern which marks the beginning of the section

  • end_regex (str) – The regex pattern which marks the end of the section

  • section_line_regex (str, optional) – If provided, this regex pattern must be matched by all lines inside the section (default is '(.*)', which matches any text)

  • max_sections (int, optional) – The maximum number of sections to extract; that is, only the first max_sections encountered will be extracted and returned (default is None, which extracts all sections)

  • begin_idx (int, optional) – The index (in the contents list) at which to begin to search for and extract data from sections (default is 0)

  • allow_comment_lines (bool, optional) – If True, any lines within the section that do not match section_line_regex but begin with any of the characters in comment_chars will be returned (as part of the second output of the method); if False, any lines within the section that do not match section_line_regex will result in an error being raised (default is True)

Returns:

  • list – A list of re.Match objects containing the matches for the regex pattern section_line_regex for all lines in the section(s)

  • list – A list (of the same length as the first value returned) of tuples of strings. For each re.Match object, the corresponding item in this list contains a tuple with any full-line comments preceding the matched line

  • int – The index (in contents) of the line immediately following the line on which end_regex was found

  • int – The number of sections that were extracted from the contents list
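The four return values are easier to picture with a simplified, standalone sketch of the same idea. Everything here is an assumption made for illustration: the function name extract_sections, the use of re.match, the ValueError, and the begin_fluid { ... } file syntax are invented and are not taken from the library or from any real Maha Multics file format.

```python
import re
from typing import List, Optional, Tuple


def extract_sections(lines: List[str], begin_regex: str, end_regex: str,
                     section_line_regex: str = '(.*)',
                     max_sections: Optional[int] = None,
                     begin_idx: int = 0,
                     comment_chars: Tuple[str, ...] = ('#',)
                     ) -> Tuple[List[re.Match], List[Tuple[str, ...]], int, int]:
    matches: List[re.Match] = []
    comments: List[Tuple[str, ...]] = []
    pending: List[str] = []   # full-line comments waiting for a data line
    inside = False
    next_idx = begin_idx
    num_sections = 0

    for i in range(begin_idx, len(lines)):
        line = lines[i]
        if not inside:
            if max_sections is not None and num_sections >= max_sections:
                break
            inside = re.match(begin_regex, line) is not None
            continue
        if re.match(end_regex, line):
            inside, pending = False, []
            num_sections += 1
            next_idx = i + 1   # line immediately after the end marker
            continue
        match = re.match(section_line_regex, line)
        if match is not None:
            matches.append(match)
            comments.append(tuple(pending))
            pending = []
        elif line.strip().startswith(comment_chars):
            pending.append(line.strip())
        else:
            raise ValueError(f'Invalid line in section: {line!r}')

    return matches, comments, next_idx, num_sections


lines = [
    'general = 1',
    'begin_fluid {',
    '    # dynamic viscosity',
    '    mu = 0.05',
    '    rho = 850',
    '}',
    'trailing = 2',
]
matches, comments, next_idx, count = extract_sections(
    lines, begin_regex=r'\s*begin_fluid\s*\{', end_regex=r'\s*\}',
    section_line_regex=r'\s*(\w+)\s*=\s*(\S+)')
print([m.groups() for m in matches], comments, next_idx, count)
```

Note that the comment line is collected only because it fails to match section_line_regex; with the default pattern '(.*)', every line would match and no comments would be gathered.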

has_changed() → bool

Returns whether the file specified by the path attribute has changed since the last time file hashes were computed

Returns:

Whether file has changed since the last time file hashes were computed

Return type:

bool

property hashes: Dict[str, str]

A copy of the dictionary containing any file hashes previously computed for the file specified by the path attribute

property line_ending: str | Tuple[str, ...]

The character(s) used to denote the end of lines in the text file

This property only applies to files that were read using the read() method. After reading a file, this property stores the line ending(s) used in the file. Lines in text files can be terminated with '\n' (LF), '\r\n' (CRLF), '\r', or a combination of these characters (potentially with different line endings on different lines).

After reading a file, this property stores either a single string (if every line of the file uses the same line ending) or a tuple containing all distinct line endings encountered throughout the file.
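The string-or-tuple behavior can be sketched with a short regular expression. This is only an illustration: the helper name detect_line_endings is hypothetical, and returning an empty tuple for text without any line ending is a choice made for this sketch, not documented behavior.

```python
import re
from typing import Tuple, Union


def detect_line_endings(raw: str) -> Union[str, Tuple[str, ...]]:
    """Returns one ending if the text is uniform, else a tuple of endings"""
    # Try '\r\n' first so that CRLF is not counted as a separate CR and LF
    endings = re.findall(r'\r\n|\r|\n', raw)
    unique = tuple(dict.fromkeys(endings))   # de-duplicate, keep first-seen order
    return unique[0] if len(unique) == 1 else unique


uniform = detect_line_endings('a = 1\nb = 2\n')
mixed = detect_line_endings('a = 1\r\nb = 2\nc = 3\r')
print(repr(uniform), mixed)
```

When reading a real file for this purpose, open it with newline='' (or in binary mode) so that Python does not translate the line endings before they can be inspected.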

overwrite(prologue: str = '', epilogue: str | None = None, line_ending: str = '\n') → None

Write data in contents to the file specified by path

Writes the lines of content in the contents attribute to the (previously-defined) file specified by the path attribute, suppressing warnings before overwriting the file. This is useful for cases when the file contents are manually populated and it is desired to “dump” them to a file. This method is also useful if a file’s contents need to be updated periodically based on the results of another process.

Parameters:
  • prologue (str, optional) – Content written at beginning of file (default is '')

  • epilogue (str, optional) – Content written at end of file (default is to use the value of the line_ending argument if trailing_newline is True and '' otherwise)

  • line_ending (str, optional) – String written at the end of each line when writing file content (default is '\n')
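One plausible reading of how prologue, epilogue, and line_ending combine is sketched below. It assumes that line_ending separates consecutive lines and that the epilogue supplies the final terminator; the helper name assemble_text is hypothetical, and the library may assemble the output differently.

```python
from typing import List, Optional


def assemble_text(contents: List[str], trailing_newline: bool,
                  prologue: str = '', epilogue: Optional[str] = None,
                  line_ending: str = '\n') -> str:
    """Builds the text that would be written to disk (illustrative only)"""
    # Default epilogue: a final line ending only if the file should end with one
    if epilogue is None:
        epilogue = line_ending if trailing_newline else ''
    return prologue + line_ending.join(contents) + epilogue


with_newline = assemble_text(['a = 1', 'b = 2'], trailing_newline=True)
with_header = assemble_text(['a = 1', 'b = 2'], trailing_newline=False,
                            prologue='# generated\n', line_ending='\r\n')
print(repr(with_newline), repr(with_header))
```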

property path: Path | None

Path describing the location of the file on the disk

Assigning a value to this attribute (regardless of whether it matches the current value or is a different path) will save the value as a pathlib.Path and will automatically clear any saved file hashes.

property raw_contents: List[str] | None

A copy of the raw file content

If the file was read using the read() method, this attribute stores the original, unaltered contents of each line of the input file, and it returns a copy of this list of lines. If the file was not read with the read() method, this attribute stores a value of None.

read(path: str | Path | None = None, parse: bool = True) → None

Read file from disk

Calling this method reads the file specified by the path attribute from the disk, populating contents and raw_contents. Additionally, the file hashes stored in the hashes attribute are updated (to make it easier to check if the file has been modified later).

Parameters:
  • path (str or pathlib.Path, optional) – Location of the text file in the file system (default is None)

  • parse (bool, optional) – Whether to call the parse() method after reading the file (default is True)

set_contents(contents: List[str], trailing_newline: bool, pass_by_reference: bool = False) → None

Add data to the contents list

Allows users to manually fill the contents list with user-defined content. The input list must be a list of strings, and the user can optionally choose whether to pass the input by reference or value.

Parameters:
  • contents (list) – List of strings which are to be assigned to the contents list

  • trailing_newline (bool) – Whether the contents being added represent a file with a trailing newline (because the file wasn’t read, the object has no way to determine whether the file has a trailing newline, so users must provide this information)

  • pass_by_reference (bool, optional) – Whether to pass the contents argument by reference (default is False)

Notes

If passing contents by reference, this means that if subsequent changes are made to the original contents object, they will be reflected in the contents attribute. If passing by value, then a copy of the contents argument will be made, so changing the object outside the class instance will not affect the contents attribute.
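The difference between the two modes can be demonstrated with a minimal stand-in class. ContentsHolder is hypothetical and mimics only the copy-versus-reference choice described in the note above; it is not the TextFile implementation.

```python
from typing import List


class ContentsHolder:
    """Hypothetical stand-in; only mimics the copy/reference choice"""

    def __init__(self) -> None:
        self.contents: List[str] = []
        self.trailing_newline = False

    def set_contents(self, contents: List[str], trailing_newline: bool,
                     pass_by_reference: bool = False) -> None:
        if not all(isinstance(line, str) for line in contents):
            raise TypeError('contents must be a list of strings')
        # Keep the caller's list itself, or an independent copy of it
        self.contents = contents if pass_by_reference else list(contents)
        self.trailing_newline = trailing_newline


lines = ['line1', 'line2']
by_value, by_reference = ContentsHolder(), ContentsHolder()
by_value.set_contents(lines, trailing_newline=True)
by_reference.set_contents(lines, trailing_newline=True, pass_by_reference=True)

lines.append('line3')   # edit the original list after both calls

print(by_value.contents)       # unaffected by the later append
print(by_reference.contents)   # reflects the later append
```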

set_read_metadata(path: str | Path | None = None) → None

Configures metadata related to file to be read from disk

This method performs several pre-processing steps to prepare to read a file from the disk:

  1. Sets the path attribute. If the path argument was provided, the attribute is set to this value; otherwise, the existing value stored in the path attribute is used (or an error is thrown if not defined).

  2. Verifies that the file specified by the path attribute exists.

  3. Stores the hashes for the file.

It is advised that this method be called prior to reading any file.

Parameters:

path (str or pathlib.Path, optional) – Location of the file in the file system (default is None)

Raises:
  • AttributeError – If both the path argument and the existing path attribute are None

  • FileNotFoundError – If the file specified by path (after completing Step 1 above) does not exist
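The three steps and the two documented exceptions can be sketched with a small stand-in class. TrackedFile is hypothetical, the error messages are invented, and the hashing shown here (md5 and sha256 via hashlib) is only an assumption about what "storing the hashes" involves.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, Optional, Union


class TrackedFile:
    """Hypothetical stand-in that mimics the three pre-processing steps"""

    def __init__(self) -> None:
        self.path: Optional[Path] = None
        self.hashes: Dict[str, str] = {}

    def set_read_metadata(self, path: Union[str, Path, None] = None) -> None:
        # Step 1: prefer the argument, otherwise fall back to the attribute
        if path is not None:
            self.path = Path(path)
        if self.path is None:
            raise AttributeError('No path argument or path attribute defined')

        # Step 2: verify that the file exists
        if not self.path.is_file():
            raise FileNotFoundError(f'File not found: {self.path}')

        # Step 3: store hashes of the file
        data = self.path.read_bytes()
        self.hashes = {name: hashlib.new(name, data).hexdigest()
                       for name in ('md5', 'sha256')}


outcomes = []
with tempfile.TemporaryDirectory() as tmp:
    existing = Path(tmp) / 'example.txt'
    existing.write_text('a = 1\n')

    tracked = TrackedFile()
    tracked.set_read_metadata(existing)
    outcomes.append(sorted(tracked.hashes))

    for bad_path in (None, Path(tmp) / 'missing.txt'):
        try:
            TrackedFile().set_read_metadata(bad_path)
        except (AttributeError, FileNotFoundError) as exc:
            outcomes.append(type(exc).__name__)

print(outcomes)
```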

store_file_hashes(hash_functions: tuple | str = ('md5', 'sha256')) → None

Computes and stores hashes of the file specified by the path attribute

Computes given hashes of the file specified by the path attribute and populates the hashes dictionary with their values.

Parameters:

hash_functions (tuple or str, optional) – Tuple of strings (or individual string) specifying which hash(es) to compute. Any hash functions supported by hashlib can be used. Default is ('md5', 'sha256')

See also

pyxx.files.compute_file_hash

Function used to compute file hashes

track_new_file

Use this method if you want to store file hashes but the path attribute isn’t yet defined

Notes

Prior to calling this method, the path attribute must be defined. To simultaneously set the path attribute and store file hashes, use track_new_file().

track_new_file(path: str | Path, hash_functions: tuple | str = ('md5', 'sha256')) → None

Shortcut for simultaneously modifying the path attribute and storing file hashes

This method functions as a “shortcut,” both modifying the path attribute and storing an optionally user-specified list of file hashes in the hashes attribute. The intention of this method is that if a File instance is tracking a given file, and the user wants to switch to tracking another file, this provides a convenient way to do so with a single line of code.

Parameters:
  • path (str or pathlib.Path) – File that the object is to represent

  • hash_functions (tuple or str, optional) – Tuple of strings (or individual string) specifying which hash(es) to compute. Any hash functions supported by hashlib can be used. Default is ('md5', 'sha256')

See also

pyxx.files.compute_file_hash

Function used to compute file hashes

property trailing_newline: bool

Whether the original file had a newline at the end of the file

update_contents() → None

Updates the contents list based on object attributes

This method by default does nothing. However, it is intended that subclasses of TextFile should override this method and define file-specific behavior in this method for converting custom object attributes to lines of text in the file, and storing these data in contents.

For example, if defining a CSV-parser, the class might have an attribute that stores numerical data in a NumPy array, and the update_contents() method might convert the data in this array to comma-separated strings and store them in contents.
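The CSV scenario above might look roughly like the following sketch. MinimalTextFile is a hypothetical stand-in for a TextFile-like base class, CsvFile and its data attribute are invented for illustration, and plain nested lists are used in place of a NumPy array to keep the example dependency-free.

```python
from typing import List, Sequence


class MinimalTextFile:
    """Hypothetical stand-in for a TextFile-like base class"""

    def __init__(self) -> None:
        self.contents: List[str] = []

    def update_contents(self) -> None:
        """Does nothing by default; subclasses override this method"""


class CsvFile(MinimalTextFile):
    """Stores numeric rows and converts them to comma-separated lines"""

    def __init__(self, data: Sequence[Sequence[float]]) -> None:
        super().__init__()
        self.data = data

    def update_contents(self) -> None:
        # Modify the existing list in place instead of reassigning it
        self.contents[:] = [','.join(str(value) for value in row)
                            for row in self.data]


base = MinimalTextFile()
base.update_contents()          # default behavior: contents stays empty

table = CsvFile([[1, 2.5], [3, 4.0]])
table.update_contents()
print(table.contents)
```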

write(output_file: str | Path, write_mode: str = 'w', warn_before_overwrite: bool = True, prologue: str = '', epilogue: str | None = None, line_ending: str = '\n', update_contents: bool = True) → None

Write file to disk

Calling this method writes the file contents stored in contents to the disk.

Parameters:
  • output_file (str or pathlib.Path) – Output file to which to write content

  • write_mode (str, optional) – Any mode (such as 'w' or 'a') for the built-in open() function for writing files (default is 'w')

  • warn_before_overwrite (bool, optional) – Whether to throw an error if output_file already exists (default is True)

  • prologue (str, optional) – Content written at beginning of file (default is '')

  • epilogue (str, optional) – Content written at end of file (default is to use the value of the line_ending argument if trailing_newline is True and '' otherwise)

  • line_ending (str, optional) – String written at the end of each line when writing file content (default is '\n')

  • update_contents (bool, optional) – Whether to call the update_contents() method before writing the file (default is True)

parse() → None

Parses the data in contents and stores it in class attributes

For Maha Multics configuration files, the parse() method verifies that the file has been read prior to attempting to parse it.