API Documentation for stablewaters
- class StableWaters(trajectory, topology, water_eps)
A pipeline for identifying and analyzing stable water molecules from molecular dynamics (MD) trajectories.
This class processes an MD trajectory to trace water molecules that move little over time, clusters them with DBSCAN, and identifies a representative position for each cluster. It also analyzes potential interactions between stable water clusters and protein residues.
- Parameters:
trajectory (str) – Path to the trajectory file.
topology (str) – Path to the topology file.
water_eps (float) – Epsilon parameter for DBSCAN clustering, in Angstrom.
- Variables:
u (mda.Universe) – Universe object created from the topology and trajectory files.
water_eps (float) – Epsilon parameter for DBSCAN clustering, in Angstrom.
- stable_waters_pipeline(output_directory='./stableWaters')
Run the full pipeline to extract stable water clusters and their representatives from the topology (PDB) and trajectory (DCD) files.
- Parameters:
output_directory (str, optional) – Directory where output files will be saved. Default is “./stableWaters”.
- Returns:
None. The results are written to files in output_directory.
- Return type:
None
- analyze_protein_and_water_interaction(protein_pdb_file, representative_waters_file, cluster_eps, output_directory='./stableWaters', distance_threshold=5.0)
Analyze interactions between protein residues and water molecules, using a distance threshold that can be set when calling the function.
- Parameters:
protein_pdb_file (str) – Path to the protein PDB file without waters.
representative_waters_file (str) – Path to the representative waters PDB file, or any PDB file containing only waters.
cluster_eps (float) – DBSCAN clustering epsilon parameter.
output_directory (str, optional) – Directory where output files will be saved. Default is “./stableWaters”.
distance_threshold (float, optional) – Threshold distance for identifying interacting residues. Default is 5.0 (Angstrom).
- Returns:
None. The interaction data is stored in a DataFrame and saved; nothing is returned.
- Return type:
None
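The two public methods are meant to be called in sequence. The sketch below shows that call order using only the signatures documented above. Because this section does not state the import path, the class is passed in as an argument; the helper name run_stable_waters, the default water_eps of 0.5, and the reuse of water_eps as cluster_eps are assumptions made for illustration, not part of the library.

```python
def run_stable_waters(stable_waters_cls, trajectory, topology, protein_pdb_file,
                      representative_waters_file, water_eps=0.5,
                      output_directory="./stableWaters"):
    """Run the pipeline, then the residue-water interaction analysis.

    stable_waters_cls is the StableWaters class (passed in because the
    import path is not given in this section). The 0.5 default for
    water_eps is a hypothetical value, not a documented default.
    """
    # Constructor arguments follow the documented signature.
    analysis = stable_waters_cls(trajectory, topology, water_eps)

    # Step 1: trace stable waters, cluster them, and write the output files.
    analysis.stable_waters_pipeline(output_directory=output_directory)

    # Step 2: map water positions to nearby protein residues.
    # Assumption: the same epsilon is reused for cluster_eps.
    analysis.analyze_protein_and_water_interaction(
        protein_pdb_file,
        representative_waters_file,
        cluster_eps=water_eps,
        output_directory=output_directory,
        distance_threshold=5.0,
    )
    return analysis
```

The path to the representative waters file depends on what the pipeline wrote to output_directory, so it is left as an explicit argument here.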
- _trace_waters(output_directory)
Trace the water molecules in a trajectory and write out all that move less than one Angstrom. The cutoff is a hard-coded integer; to change the distance, edit that value in the source.
- Parameters:
output_directory (str) – Directory where output files will be saved.
- Returns:
stable_waters (pd.DataFrame) – DataFrame containing stable water coordinates.
total_frames (int) – Total number of frames.
- Return type:
Tuple[pd.DataFrame, int]
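The tracing criterion can be illustrated with a minimal, standard-library sketch. It is not the library's implementation: it assumes "stable" means that a water oxygen moved less than the cutoff since the previous frame, and it uses plain dictionaries and tuples in place of an MDAnalysis Universe and a DataFrame. The names trace_stable_waters and max_displacement are hypothetical.

```python
import math

def trace_stable_waters(frames, max_displacement=1.0):
    """Return (records, total_frames) for waters that barely move.

    frames: list of dicts mapping a water id to its oxygen (x, y, z)
    position in Angstrom, one dict per trajectory frame.
    """
    records = []
    for frame_index in range(1, len(frames)):
        previous, current = frames[frame_index - 1], frames[frame_index]
        for water_id, position in current.items():
            if water_id not in previous:
                continue  # no earlier position to compare against
            # Assumption: a water is "stable" in this frame if it moved
            # less than the cutoff since the previous frame.
            if math.dist(previous[water_id], position) < max_displacement:
                records.append((frame_index, water_id, *position))
    return records, len(frames)
```

Each record holds the frame index, the water id, and the coordinates, which mirrors the documented return of stable coordinates plus the total frame count.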
- _perform_clustering_and_writing(stable_waters, cluster_eps, total_frames, output_directory)
Perform DBSCAN clustering on the stable water coordinates, and write the clusters and their representatives to PDB files.
- Parameters:
stable_waters (pd.DataFrame) – DataFrame containing stable water coordinates.
cluster_eps (float) – DBSCAN clustering epsilon parameter, in Angstrom. It sets the maximum distance at which two water positions count as neighbors when forming a cluster.
total_frames (int) – Total number of frames.
output_directory (str) – Directory where output files will be saved.
- Returns:
None. Writes the clusters and their representatives to PDB files.
- Return type:
None
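To show what the epsilon parameter controls, here is a compact pure-Python sketch of the DBSCAN algorithm. It is for illustration only and is not the code this class runs; how the class derives min_samples (for example from total_frames) is not stated in this section, so min_samples is a plain argument here.

```python
import math

def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(points)
    # Neighborhoods include the point itself, the usual DBSCAN convention.
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n  # None means "not visited yet"
    cluster_id = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # noise reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # j is a core point: keep expanding
    return labels
```

With water coordinates in Angstrom, a small eps keeps only tightly packed positions together, while a larger eps merges neighboring hydration sites into one cluster.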
- _write_pdb_clusters_and_representatives(clustered_waters, min_samples, output_sub_directory)
Write the clusters and their representatives to PDB files.
- Parameters:
clustered_waters (pd.DataFrame) – DataFrame containing clustered water coordinates.
min_samples (int) – Minimum number of samples for DBSCAN clustering.
output_sub_directory (str) – Subdirectory where output PDB files will be saved.
- Returns:
None. Writes clusters out as PDB files.
- Return type:
None
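As an illustration of this step, the sketch below picks a representative for a cluster and formats it as a PDB record. This section does not say how representatives are chosen, so the centroid used here is an assumption, as are the function names, the chain identifier "W", and the placeholder occupancy and B-factor values. The column layout follows the fixed-width PDB HETATM record.

```python
def centroid(coordinates):
    """Mean position of a list of (x, y, z) tuples."""
    count = len(coordinates)
    return tuple(sum(axis) / count for axis in zip(*coordinates))

def water_hetatm_line(serial, residue_number, position, chain="W"):
    """Format one water oxygen as a fixed-column PDB HETATM record."""
    x, y, z = position
    atom_name, residue_name, element = " O  ", "HOH", "O"
    return (
        # Columns 1-30: record name, serial, atom name, residue, chain, number.
        f"HETATM{serial:5d} {atom_name} {residue_name} {chain:1s}"
        f"{residue_number:4d}{'':4s}"
        # Columns 31-66: coordinates, then placeholder occupancy and B-factor.
        f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}"
        # Columns 67-78: padding and the right-justified element symbol.
        f"{'':10s}{element:>2s}"
    )
```

Calling water_hetatm_line(1, 1, centroid(cluster_coordinates)) for each cluster would give one representative record per cluster.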
- _find_interacting_residues(structure, representative_waters, distance_threshold)
Map waters (e.g. the representative waters) to the interacting residues of a separately supplied PDB structure. Use “filter_and_parse_pdb” to prepare the input for this function.
- Parameters:
structure (Bio.PDB.Structure.Structure) – Biopython PDB structure object.
representative_waters (pd.DataFrame) – DataFrame containing representative water coordinates.
distance_threshold (float) – Threshold distance for identifying interacting residues.
- Returns:
Dictionary mapping cluster numbers to interacting residues.
- Return type:
dict
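The mapping this function returns can be sketched with the standard library alone. The example stands in plain tuples for the Biopython structure and a dict for the DataFrame, and it assumes a residue interacts when any of its atoms lies within the threshold (inclusive); the library's exact rule is not given in this section. The name find_interacting_residues_sketch and the residue labels are hypothetical.

```python
import math

def find_interacting_residues_sketch(protein_atoms, representative_waters,
                                     distance_threshold=5.0):
    """Map each water cluster number to the residues within the threshold.

    protein_atoms: list of (residue_label, (x, y, z)) tuples, one per atom.
    representative_waters: dict mapping cluster number to (x, y, z).
    """
    interactions = {}
    for cluster_number, water_position in representative_waters.items():
        nearby = set()
        for residue_label, atom_position in protein_atoms:
            # Assumption: one atom in range is enough for the residue to count.
            if math.dist(water_position, atom_position) <= distance_threshold:
                nearby.add(residue_label)
        interactions[cluster_number] = sorted(nearby)
    return interactions
```

Clusters with no residue in range map to an empty list, so every cluster number still appears in the result.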