API Documentation for Preprocessing

class Preprocessing

A class providing utilities for preparing and modifying PDB and trajectory files for molecular dynamics analysis.

renumber_protein_residues(input_pdb, reference_pdb, output_pdb)

Renumbers protein residues in a trajectory PDB file based on a reference PDB structure.

Parameters:
  • input_pdb (str) – Path to the input PDB file to be renumbered.

  • reference_pdb (str) – Path to the reference PDB file that carries the correct residue numbering.

  • output_pdb (str) – Path to the output PDB file with updated residue numbering.
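
The idea of copying residue numbers from a reference structure can be illustrated on PDB text directly. The following is only a minimal sketch, not the implementation used by this class: the function names residue_numbers and renumber_like_reference are hypothetical, and the sketch assumes that both files list the same protein residues in the same order and use the fixed-column PDB layout (chain ID in column 22, residue number in columns 23-26).

```python
def residue_numbers(pdb_text):
    """Return residue-number fields (columns 23-26) in order of appearance."""
    numbers, previous = [], None
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            key = line[21:27]  # chain ID + residue number + insertion code
            if key != previous:
                numbers.append(line[22:26])
                previous = key
    return numbers


def renumber_like_reference(input_text, reference_text):
    """Give the n-th residue of the input the number of the n-th reference residue.

    Assumption: both structures contain the same residues in the same order.
    """
    ref_numbers = residue_numbers(reference_text)
    out, previous, index = [], None, -1
    for line in input_text.splitlines():
        if line.startswith("ATOM"):
            key = line[21:27]  # taken from the original, unmodified line
            if key != previous:
                index += 1
                previous = key
            if index < len(ref_numbers):
                line = line[:22] + ref_numbers[index] + line[26:]
        out.append(line)
    return "\n".join(out)
```

Matching residues by position keeps the sketch short; a more careful approach would also compare residue names before copying a number.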

increase_ring_indices(ring, lig_index)

Adjusts ligand ring atom indices so that they match the atom indexing of the full protein-ligand complex.

Parameters:
  • ring (list) – List of ligand atom indices in the ring.

  • lig_index (int) – Index offset of the ligand within the full structure.

Returns:

List of adjusted atom indices.

Return type:

list
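
As an illustration of the index adjustment, here is a minimal sketch that assumes the adjustment is a constant shift, that is, lig_index is added to every ring atom index. The function name shift_ring_indices is hypothetical and the actual method may behave differently.

```python
def shift_ring_indices(ring, lig_index):
    """Shift ligand-local ring atom indices by a constant offset.

    Assumption: complex index = ligand-local index + lig_index.
    """
    return [atom_index + lig_index for atom_index in ring]


# A six-membered ring numbered within the ligand alone, shifted by an
# offset of 2500 (an arbitrary value chosen for this example).
ligand_ring = [0, 1, 2, 3, 4, 5]
complex_ring = shift_ring_indices(ligand_ring, 2500)
```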

process_pdb_file(input_pdb_filename)

Standardizes residue names in a PDB file (for example, water residues are renamed to “HOH”).

Parameters:
  • input_pdb_filename (str) – Path to the PDB file to be processed.
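
The kind of renaming described here can be sketched on PDB text as follows. This is a hedged illustration rather than the method's actual code: the function name standardize_water_names and the set of water aliases are assumptions, and the sketch relies on the residue name occupying columns 18-20 of ATOM and HETATM records.

```python
# Assumed water aliases; the names actually handled by the class may differ.
WATER_NAMES = {"WAT", "H2O", "TIP", "SOL"}


def standardize_water_names(pdb_text):
    """Rename assumed water aliases to HOH in ATOM/HETATM records."""
    out = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")) and line[17:20] in WATER_NAMES:
            # Replace only the three-character residue name field.
            line = line[:17] + "HOH" + line[20:]
        out.append(line)
    return "\n".join(out)
```

Editing the fixed-width field in place leaves every other column of the record untouched.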

extract_and_save_ligand_as_sdf(input_pdb_filename, output_filename, target_resname)

Extracts a ligand from a receptor-ligand complex PDB and saves it in SDF format.

Parameters:
  • input_pdb_filename (str) – Path to the complex PDB file.

  • output_filename (str) – Path for the output SDF file.

  • target_resname (str) – Residue name of the ligand.

renumber_atoms_in_residues(input_pdb_file, output_pdb_file, lig_name)

Renames ligand atoms according to their element and their count within the residue, for clarity.

Parameters:
  • input_pdb_file (str) – Path to the original PDB file.

  • output_pdb_file (str) – Path to save the updated PDB file.

  • lig_name (str) – Name of the ligand residue.
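
One way to picture the renaming is a running counter per element within each ligand residue. The sketch below is an assumption about the naming scheme, not the method's confirmed behavior: the function name rename_ligand_atoms is hypothetical, new names take the form element symbol plus count, and the element is read from columns 77-78 of each record.

```python
from collections import defaultdict


def rename_ligand_atoms(pdb_text, lig_name):
    """Rename atoms of the ligand residue to element + per-element count."""
    counters = defaultdict(int)
    out = []
    for line in pdb_text.splitlines():
        is_atom = line.startswith(("ATOM", "HETATM"))
        if is_atom and line[17:20].strip() == lig_name:
            element = line[76:78].strip()
            if element:  # skip records without an element column
                # Count separately for each residue (chain ID + number).
                key = (line[21], line[22:26], element)
                counters[key] += 1
                new_name = f"{element}{counters[key]}"
                # Atom name field is columns 13-16, left-justified here.
                line = line[:12] + new_name.ljust(4)[:4] + line[16:]
        out.append(line)
    return "\n".join(out)
```

Names longer than four characters are truncated to fit the fixed-width atom name field, so very large ligands would need a different scheme.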

replace_atom_type(data)

Corrects atom type annotations in ligand lines marked with ‘X’.

Parameters:
  • data (str) – Contents of the PDB file as a string.

Returns:

Modified PDB file contents.

Return type:

str

process_pdb(input_file, output_file)

Wrapper function that processes and writes corrected PDB content to a file.

Parameters:
  • input_file (str) – Path to the input PDB file.

  • output_file (str) – Path to the output PDB file.