API Documentation for utils

update_dict(target_dict, *source_dicts)

Updates the target dictionary in place with keys and values from one or more source dictionaries. Only keys not already present in the target are added; existing keys keep their current values.

Parameters:

target_dict (dict) – The dictionary that needs to be updated with new keys and values.
source_dicts (dict) – One or more dictionaries whose keys and values are used to update the target dictionary.

Returns:

None

Return type:

None
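
The behaviour described for update_dict can be sketched in a few lines of Python. This is a minimal illustration, not the module's actual implementation: the name update_dict_sketch is hypothetical, and the rule that the first source dictionary wins when several sources introduce the same new key is an assumption that the description does not settle.

```python
def update_dict_sketch(target_dict, *source_dicts):
    """Add keys from each source dict to target_dict, keeping existing keys."""
    for source in source_dicts:
        for key, value in source.items():
            # setdefault writes only when the key is missing from the target,
            # so values already present are never overwritten.
            target_dict.setdefault(key, value)


settings = {"a": 1}
update_dict_sketch(settings, {"a": 99, "b": 2}, {"b": 3, "c": 4})
print(settings)  # {'a': 1, 'b': 2, 'c': 4}
```

Because the target is modified in place, the call itself returns None and the caller keeps working with the original dictionary object.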

combine_subdict_values(data)

Combines the values from the individual sub-dictionaries into a single list.

Parameters:

data (dict) – Dictionary with values that are sub-dictionaries.

Returns:

A dictionary with a single key named ‘all’ that contains a list of all combined values from all the sub-dictionaries.

Return type:

dict
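
A possible reading of combine_subdict_values is sketched below. The function name combine_subdict_values_sketch and the example keys are hypothetical. The sketch also assumes that list values are concatenated while any other value is appended as a single item; the description does not say which of these the real function does.

```python
def combine_subdict_values_sketch(data):
    """Collect every value from every sub-dictionary under the key 'all'."""
    combined = []
    for sub_dict in data.values():
        for value in sub_dict.values():
            if isinstance(value, list):
                # Assumption: list values are flattened into the result.
                combined.extend(value)
            else:
                combined.append(value)
    return {"all": combined}


example = {
    "chain_A": {"donors": [1, 2], "acceptors": [3]},
    "chain_B": {"donors": [2, 4]},
}
result = combine_subdict_values_sketch(example)
print(result)  # {'all': [1, 2, 3, 2, 4]}
```

Duplicates are kept in this sketch, since the description only mentions combining the values.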

update_values(df, new, unique_data, row_name)

Updates the values in the input DataFrame in place, based on its existing values and a reference DataFrame.

Parameters:

df (pd.DataFrame) – Input DataFrame that will be updated.
new (pd.DataFrame) – The reference DataFrame containing values that are used to update the input DataFrame.
unique_data (dict) – A dictionary containing keys that represent the specific unique column names that need to be updated in the input DataFrame.
row_name (str) – The name of the column in df whose values are used to index into the new DataFrame.

Returns:

None. The function updates df in place and does not return anything.

Return type:

None

remove_duplicate_values(data)

Remove the duplicate values from sub-dictionaries within the input dictionary.

Parameters:

data (dict) – The input dictionary containing sub-dictionaries with possible duplicate values.

Returns:

A dictionary without duplicate values.

Return type:

dict
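
The description leaves open where duplicates are detected. The sketch below shows one plausible interpretation and should be read as an assumption throughout: it takes each sub-dictionary value to be a list of hashable items, removes repeats within each list while preserving order, and returns a new dictionary. The name remove_duplicate_values_sketch and the example keys are hypothetical.

```python
def remove_duplicate_values_sketch(data):
    """Return a copy of data with repeats removed from each sub-dict list."""
    return {
        outer_key: {
            # dict.fromkeys keeps the first occurrence of each item, in order.
            inner_key: list(dict.fromkeys(values))
            for inner_key, values in sub_dict.items()
        }
        for outer_key, sub_dict in data.items()
    }


example = {"chain_A": {"donors": [1, 2, 2, 1, 3]}, "chain_B": {"donors": [4, 4]}}
cleaned = remove_duplicate_values_sketch(example)
print(cleaned)  # {'chain_A': {'donors': [1, 2, 3]}, 'chain_B': {'donors': [4]}}
```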

read_pdb_as_dataframe(pdb_file)

Helper function that reads the atom coordinates of a PDB file into a DataFrame.

Parameters:

pdb_file (str) – Path to the PDB file.

Returns:

DataFrame containing the x, y, z coordinates of the atoms in the PDB file.

Return type:

pd.DataFrame

Note

This function extracts only lines starting with ‘ATOM’ and parses the x, y, z coordinates from the fixed-width fields of the PDB format. It assumes the coordinates are located at columns 31–54.
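
The fixed-column parsing described in the note can be illustrated with the standard library alone. This is a sketch under stated assumptions rather than the function's real code: parse_atom_coordinates and make_atom_line are hypothetical names, and the split of columns 31–54 into three 8-character fields (x: 31–38, y: 39–46, z: 47–54) follows the standard PDB ATOM record layout.

```python
def parse_atom_coordinates(pdb_lines):
    """Return (x, y, z) tuples for every line that starts with 'ATOM'."""
    coords = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        # PDB columns are 1-based, Python slices are 0-based and end-exclusive.
        x = float(line[30:38])  # columns 31-38
        y = float(line[38:46])  # columns 39-46
        z = float(line[46:54])  # columns 47-54
        coords.append((x, y, z))
    return coords


def make_atom_line(serial, res_name, res_seq, x, y, z):
    """Build a fixed-width ATOM record for the example (hypothetical helper)."""
    return (
        f"ATOM  {serial:>5}  CA  {res_name:>3} A{res_seq:>4}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}"
    )


atom_line = make_atom_line(1, "MET", 1, 27.34, 24.43, 2.614)
lines = ["REMARK example input", atom_line, "HETATM" + atom_line[6:]]
coords = parse_atom_coordinates(lines)
print(coords)  # [(27.34, 24.43, 2.614)]
```

An open file object can be passed in place of the list of lines. The tuples could then be handed to pandas, for example pd.DataFrame(coords, columns=["x", "y", "z"]); those column names are an assumption, since the documentation does not list them.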

filter_and_parse_pdb(protein_pdb)

Reads a PDB file, filters its lines, and returns the structure parsed with Biopython (Bio.PDB).

Parameters:

protein_pdb (str) – Path to a protein PDB file.

Returns:

Parsed PDB structure object containing protein atoms.

Return type:

Bio.PDB.Structure.Structure

Note

The function:

- Includes only lines starting with ‘ATOM’.
- Excludes water molecules (residue names ‘HOH’, ‘WAT’) and terminal phosphates (‘T4P’, ‘T3P’).
- Skips lines with non-numeric residue sequence identifiers.
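
The three filtering rules listed in the note can be sketched as a line-level predicate. The names is_kept, make_record and EXCLUDED_RESIDUES are hypothetical. The sketch also assumes that the residue name is read from columns 18–20 and the residue sequence number from columns 23–26, as in the standard PDB ATOM record; the documentation does not state how the real function locates these fields.

```python
EXCLUDED_RESIDUES = {"HOH", "WAT", "T4P", "T3P"}


def is_kept(line):
    """Return True if a PDB line passes the three filtering rules."""
    if not line.startswith("ATOM"):
        return False
    if line[17:20].strip() in EXCLUDED_RESIDUES:  # columns 18-20
        return False
    try:
        int(line[22:26])  # columns 23-26 must hold an integer
    except ValueError:
        return False
    return True


def make_record(record, serial, res_name, res_seq):
    """Build a fixed-width PDB record for the example (hypothetical helper)."""
    return (
        f"{record:<6}{serial:>5}  CA  {res_name:>3} A{res_seq:>4}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}"
    )


lines = [
    make_record("ATOM", 1, "ALA", "1"),
    make_record("ATOM", 2, "HOH", "2"),     # water, excluded
    make_record("HETATM", 3, "LIG", "3"),   # not an ATOM record
    make_record("ATOM", 4, "GLY", "X1"),    # non-numeric sequence identifier
    make_record("ATOM", 5, "T3P", "5"),     # excluded residue name
]
kept = [line for line in lines if is_kept(line)]
print([line[17:20] for line in kept])  # ['ALA']
```

One way to obtain a Bio.PDB.Structure.Structure from the surviving lines would be to join them, wrap the text in io.StringIO, and pass that handle to Bio.PDB.PDBParser().get_structure(). Whether filter_and_parse_pdb works this way internally is not stated here.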