API Documentation for utils
- update_dict(target_dict, *source_dicts)
Updates the target dictionary in place with keys and values from one or more source dictionaries. Only keys not already present in the target are added; existing entries are left unchanged.
- Parameters:
target_dict (dict) – The dictionary that needs to be updated with new keys and values.
source_dicts (dict) – One or more dictionaries whose new keys and values are added to the target dictionary.
- Returns:
None
- Return type:
None
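The behaviour described above can be sketched as follows. This is a minimal illustration, not the module's actual implementation. It assumes that when the same new key appears in several source dictionaries, the first source wins, which the description does not specify.

```python
def update_dict(target_dict, *source_dicts):
    """Add keys missing from target_dict, in place (illustrative sketch)."""
    for source in source_dicts:
        for key, value in source.items():
            # setdefault only writes when the key is absent,
            # so existing entries in the target are never overwritten.
            target_dict.setdefault(key, value)


target = {"a": 1}
update_dict(target, {"a": 99, "b": 2}, {"b": 3, "c": 4})
print(target)  # {'a': 1, 'b': 2, 'c': 4}
```

Because the function mutates its first argument and returns None, callers who need the original dictionary afterwards should pass a copy.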
- combine_subdict_values(data)
Combines the values from the individual sub-dictionaries into a single list.
- Parameters:
data (dict) – Dictionary with values that are sub-dictionaries.
- Returns:
A dictionary with a single key named ‘all’ that contains a list of all combined values from all the sub-dictionaries.
- Return type:
dict
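One plausible reading of this function is sketched below. It is an assumption that sub-dictionary values may be either lists (which are flattened) or single items (which are appended); the key names in the example are hypothetical.

```python
def combine_subdict_values(data):
    """Collect every value of every sub-dictionary under one 'all' key (sketch)."""
    combined = []
    for sub_dict in data.values():
        for value in sub_dict.values():
            if isinstance(value, (list, tuple, set)):
                # Assumed: container values are flattened into the result.
                combined.extend(value)
            else:
                combined.append(value)
    return {"all": combined}


# Hypothetical input data for illustration only.
data = {
    "chain_A": {"donors": [1, 2], "acceptors": [3]},
    "chain_B": {"donors": [4]},
}
result = combine_subdict_values(data)
print(result)  # {'all': [1, 2, 3, 4]}
```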
- update_values(df, new, unique_data, row_name)
Updates the values in the input DataFrame based on its own values and a reference DataFrame.
- Parameters:
df (pd.DataFrame) – Input DataFrame that will be updated.
new (pd.DataFrame) – The reference DataFrame containing values that are used to update the input DataFrame.
unique_data (dict) – A dictionary containing keys that represent the specific unique column names that need to be updated in the input DataFrame.
row_name (str) – The name of the column in the input DataFrame whose values are used to index into the new DataFrame.
- Returns:
None. The function updates the input DataFrame in place and returns nothing.
- Return type:
None
- remove_duplicate_values(data)
Remove the duplicate values from sub-dictionaries within the input dictionary.
- Parameters:
data (dict) – The input dictionary containing sub-dictionaries with possible duplicate values.
- Returns:
A dictionary without duplicate values.
- Return type:
dict
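The description leaves open where duplicates are looked for. The sketch below assumes the simplest case: each sub-dictionary maps keys to lists of hashable items, and duplicates are removed within each list while keeping the first occurrence. The example keys are hypothetical.

```python
def remove_duplicate_values(data):
    """Drop repeated items inside each sub-dictionary's lists (sketch)."""
    cleaned = {}
    for outer_key, sub_dict in data.items():
        cleaned[outer_key] = {
            # dict.fromkeys keeps first occurrences in their original order.
            inner_key: list(dict.fromkeys(values))
            for inner_key, values in sub_dict.items()
        }
    return cleaned


# Hypothetical input data for illustration only.
data = {"site_1": {"residues": ["ALA", "GLY", "ALA"]}}
cleaned = remove_duplicate_values(data)
print(cleaned)  # {'site_1': {'residues': ['ALA', 'GLY']}}
```

This version returns a new dictionary and leaves the input untouched, which matches the documented dict return type.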
- read_pdb_as_dataframe(pdb_file)
Helper function that reads a PDB file into a DataFrame.
- Parameters:
pdb_file (str) – Path to the PDB file.
- Returns:
DataFrame containing PDB data of the x, y, z coordinates of atoms.
- Return type:
pd.DataFrame
Note
This function reads only lines starting with ‘ATOM’ and parses the x, y, z coordinates from fixed-width fields of the PDB format. It assumes the coordinates are located at columns 31–54.
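The fixed-column parsing in the note can be illustrated without any third-party dependency. In the sketch below, `parse_atom_coordinates` and `demo_line` are hypothetical names, and columns 31–54 are split into three 8-character fields (x: 31–38, y: 39–46, z: 47–54) as in the PDB format. The actual function returns a DataFrame; this sketch stops at a list of rows.

```python
def parse_atom_coordinates(lines):
    """Return x, y, z for each ATOM line (hypothetical helper, sketch only)."""
    rows = []
    for line in lines:
        if not line.startswith("ATOM"):
            continue  # skip HETATM, TER, END and other record types
        rows.append({
            "x": float(line[30:38]),  # columns 31-38
            "y": float(line[38:46]),  # columns 39-46
            "z": float(line[46:54]),  # columns 47-54
        })
    return rows


def demo_line(head, x, y, z):
    # Pad the leading fields to 30 characters so x starts at column 31.
    return f"{head:<30}{x:8.3f}{y:8.3f}{z:8.3f}"


lines = [
    demo_line("ATOM      1  N   ALA A   1", 11.104, 6.134, -6.504),
    demo_line("HETATM    2  O   HOH A   2", 1.0, 2.0, 3.0),
]
rows = parse_atom_coordinates(lines)
print(rows)  # [{'x': 11.104, 'y': 6.134, 'z': -6.504}]
```

A list of dictionaries like `rows` can be passed directly to `pd.DataFrame(rows)` to obtain a frame with x, y, z columns.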
- filter_and_parse_pdb(protein_pdb)
Reads a PDB file, filters its lines, and returns the structure parsed with the Biopython PDB parser.
- Parameters:
protein_pdb (str) – Path to a protein PDB file.
- Returns:
Parsed PDB structure object containing protein atoms.
- Return type:
Bio.PDB.Structure.Structure
Note
The function:
- Includes only lines starting with ‘ATOM’.
- Excludes water molecules (residue names ‘HOH’, ‘WAT’) and terminal phosphates (‘T4P’, ‘T3P’).
- Skips lines with non-numeric residue sequence identifiers.
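The filtering rules in the note can be sketched in plain Python. `filter_atom_lines` and `make_line` are hypothetical names used only for this illustration. The sketch assumes the standard PDB layout, with the residue name in columns 18–20 and the residue sequence number in columns 23–26, and it covers only the filtering step, not the Biopython parsing.

```python
EXCLUDED_RESIDUES = {"HOH", "WAT", "T4P", "T3P"}


def filter_atom_lines(lines):
    """Keep ATOM lines that pass the residue filters (hypothetical helper)."""
    kept = []
    for line in lines:
        if not line.startswith("ATOM"):
            continue
        if line[17:20].strip() in EXCLUDED_RESIDUES:  # columns 18-20
            continue
        try:
            int(line[22:26])  # columns 23-26 must hold an integer
        except ValueError:
            continue
        kept.append(line)
    return kept


def make_line(record, serial, name, res_name, chain, res_seq):
    # Build a demo record with fields in their fixed PDB columns.
    head = f"{record:<6}{serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}"
    return f"{head:<30}{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}"


lines = [
    make_line("ATOM", 1, "CA", "ALA", "A", 1),     # kept
    make_line("ATOM", 2, "O", "HOH", "A", 2),      # water, dropped
    make_line("HETATM", 3, "C1", "LIG", "A", 3),   # not ATOM, dropped
    make_line("ATOM", 4, "CA", "GLY", "A", "X"),   # non-numeric id, dropped
]
kept = filter_atom_lines(lines)
print(len(kept))  # 1
```

The surviving lines could then be joined, wrapped in `io.StringIO`, and passed to `Bio.PDB.PDBParser().get_structure(...)`, which accepts a file handle, to obtain a structure object of the documented return type.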