API Documentation for utils

update_dict(target_dict, *source_dicts)

Updates the target dictionary in place with keys and values from one or more source dictionaries. Only new keys (those not already present in the target) are added; existing keys keep their current values.

Parameters:
  • target_dict (dict) – The dictionary that needs to be updated with new keys and values.

  • source_dicts (dict) – One or more dictionaries whose new keys and values are added to the target dictionary.

Returns:

None

Return type:

None
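
The behaviour described above can be illustrated with a minimal sketch. It is not the library's implementation; the name update_dict_sketch is hypothetical, and the sketch assumes that when several source dictionaries supply the same new key, the first one wins.

```python
def update_dict_sketch(target_dict, *source_dicts):
    """Add keys from each source dict to target_dict, keeping existing keys."""
    for source in source_dicts:
        for key, value in source.items():
            # setdefault writes only when the key is missing from the target.
            target_dict.setdefault(key, value)


target = {"a": 1}
update_dict_sketch(target, {"a": 99, "b": 2}, {"b": 3, "c": 4})
print(target)  # {'a': 1, 'b': 2, 'c': 4}
```

The existing key 'a' keeps its value, and the target is modified in place, which matches the None return value documented above.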

combine_subdict_values(data)

Combines the values from the individual sub-dictionaries into a single list.

Parameters:

data (dict) – Dictionary with values that are sub-dictionaries.

Returns:

A dictionary with a single key named ‘all’ that contains a list of all combined values from all the sub-dictionaries.

Return type:

dict
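
A possible reading of this description is sketched below. The name combine_subdict_values_sketch and the example keys are hypothetical, and the sketch assumes that each sub-dictionary maps its keys to lists, which may differ from the actual data layout.

```python
def combine_subdict_values_sketch(data):
    """Concatenate every list found in the sub-dictionaries under the key 'all'."""
    combined = []
    for sub_dict in data.values():
        for values in sub_dict.values():
            # Assumption: each value is a list; extend keeps the result flat.
            combined.extend(values)
    return {"all": combined}


data = {
    "group_1": {"x": [1, 2], "y": [3]},
    "group_2": {"x": [4]},
}
result = combine_subdict_values_sketch(data)
print(result)  # {'all': [1, 2, 3, 4]}
```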

update_values(df, new, unique_data, row_name)

Update the values in the input DataFrame based on the frame's values and a reference DataFrame.

Parameters:
  • df (pd.DataFrame) – Input DataFrame that will be updated.

  • new (pd.DataFrame) – The reference DataFrame containing values that are used to update the input DataFrame.

  • unique_data (dict) – A dictionary containing keys that represent the specific unique column names that need to be updated in the input DataFrame.

  • row_name (str) – The name of the column in the DataFrame used to index into the new DataFrame.

Returns:

None. This function updates the input DataFrame in place and does not return anything.

Return type:

None

remove_duplicate_values(data)

Remove the duplicate values from sub-dictionaries within the input dictionary.

Parameters:

data (dict) – The input dictionary containing sub-dictionaries with possible duplicate values.

Returns:

A dictionary without duplicate values.

Return type:

dict
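
The description leaves open where duplicates are detected. The sketch below shows one plausible interpretation: each sub-dictionary maps keys to lists, and duplicates are removed within each list while keeping first-seen order. The name remove_duplicate_values_sketch and the example keys are hypothetical.

```python
def remove_duplicate_values_sketch(data):
    """Return a new dict whose sub-dictionary lists contain no repeated items."""
    cleaned = {}
    for outer_key, sub_dict in data.items():
        cleaned[outer_key] = {
            # dict.fromkeys drops repeats and preserves insertion order.
            inner_key: list(dict.fromkeys(values))
            for inner_key, values in sub_dict.items()
        }
    return cleaned


data = {
    "site_1": {"residues": ["ALA", "GLY", "ALA"]},
    "site_2": {"residues": ["SER", "SER"]},
}
result = remove_duplicate_values_sketch(data)
print(result)
# {'site_1': {'residues': ['ALA', 'GLY']}, 'site_2': {'residues': ['SER']}}
```

This approach requires the list items to be hashable; the input dictionary is left unchanged.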

read_pdb_as_dataframe(pdb_file)

Helper function that reads a PDB file into a DataFrame.

Parameters:

pdb_file (str) – Path to the PDB file.

Returns:

DataFrame containing the x, y, z coordinates of the atoms in the PDB file.

Return type:

pd.DataFrame

Note

This function extracts only lines starting with ‘ATOM’ and parses the x, y, z coordinates based on selected fields in the PDB format. Assumes coordinates are located at columns 31–54.
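
The fixed-width parsing described in the note can be sketched as follows. This is an illustration only, not the library's code: parse_atom_coordinates and make_atom_line are hypothetical helpers, and the sketch returns a list of dictionaries to stay dependency-free (such a list could be passed to pd.DataFrame to obtain a frame with x, y, z columns). It assumes the standard PDB layout, where x, y and z occupy columns 31–38, 39–46 and 47–54.

```python
def parse_atom_coordinates(lines):
    """Extract x, y, z from 'ATOM' records using fixed PDB columns 31-54."""
    rows = []
    for line in lines:
        if not line.startswith("ATOM"):
            continue  # skip HETATM, HEADER, TER and any other record type
        rows.append({
            "x": float(line[30:38]),  # columns 31-38 (1-based)
            "y": float(line[38:46]),  # columns 39-46
            "z": float(line[46:54]),  # columns 47-54
        })
    return rows


def make_atom_line(serial, name, res_name, chain, res_seq, x, y, z):
    """Build a column-aligned ATOM record for the example (hypothetical helper)."""
    return (f"ATOM  {serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


sample = [
    "HEADER    HYPOTHETICAL EXAMPLE",
    make_atom_line(1, "N", "ALA", "A", 1, 11.104, 6.134, -6.504),
    make_atom_line(2, "CA", "ALA", "A", 1, 11.639, 6.071, -5.147),
    "HETATM    3  O   HOH A   2      10.000  10.000  10.000",
]
rows = parse_atom_coordinates(sample)
print(rows[0])  # {'x': 11.104, 'y': 6.134, 'z': -6.504}
```

Slicing by column rather than splitting on whitespace matters because adjacent PDB fields can touch when values are wide or negative.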

filter_and_parse_pdb(protein_pdb)

Reads a PDB file, filters its lines, and returns the structure parsed with Biopython's Bio.PDB module.

Parameters:

protein_pdb (str) – Path to a protein PDB file.

Returns:

Parsed PDB structure object containing protein atoms.

Return type:

Bio.PDB.Structure.Structure

Note

The function:
  • Includes only lines starting with ‘ATOM’.

  • Excludes water molecules (residue names ‘HOH’, ‘WAT’) and terminal phosphates (‘T4P’, ‘T3P’).

  • Skips lines with non-numeric residue sequence identifiers.
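
The filtering rules in the note can be sketched without Biopython as shown below. The names filter_pdb_lines and atom_line are hypothetical, and the sketch assumes the standard PDB layout (residue name in columns 18–20, residue sequence number in columns 23–26); the actual function may locate these fields differently.

```python
EXCLUDED_RESIDUES = {"HOH", "WAT", "T4P", "T3P"}


def filter_pdb_lines(lines):
    """Keep ATOM records that are not excluded residues and have numeric resSeq."""
    kept = []
    for line in lines:
        if not line.startswith("ATOM"):
            continue  # only ATOM records are considered
        if line[17:20].strip() in EXCLUDED_RESIDUES:
            continue  # water or terminal phosphate
        try:
            int(line[22:26])  # columns 23-26 must hold an integer
        except ValueError:
            continue  # non-numeric residue sequence identifier
        kept.append(line)
    return kept


def atom_line(res_name, res_seq):
    """Build a column-aligned ATOM record for the example (hypothetical helper)."""
    return (f"ATOM  {1:>5}  CA  {res_name:>3} A{res_seq:>4}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


sample = [
    atom_line("ALA", "1"),     # kept
    atom_line("HOH", "2"),     # dropped: water
    atom_line("T3P", "3"),     # dropped: excluded residue name
    atom_line("GLY", "A000"),  # dropped: non-numeric residue identifier
    "HETATM    5  C1  LIG A   5       0.000   0.000   0.000",  # dropped: not ATOM
]
kept = filter_pdb_lines(sample)
print(len(kept))  # 1
```

The surviving lines could then be handed to Bio.PDB.PDBParser().get_structure, for example through an io.StringIO handle, to obtain the Structure object documented above.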