choppa package
Subpackages
Submodules
choppa.align module
- class choppa.align.AlignFactory(fitness_dict: OrderedDict, complex: Structure)[source]
Bases: object
Base class for aligning fitness data with PDB complexes within choppa.
- complex_get_seqidcs()[source]
From a PDB.Structure.Structure, extracts the sequence indices as stored in the PDB file
- fill_aligned_fitness(aligned_fitness_dict)[source]
For an aligned fitness dict, there may be gaps with respect to the complex PDB. Fills these with empty data for easier parsing during visualization.
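The gap filling described above can be sketched as follows. This is a minimal illustration and not choppa's code: the function name `fill_gaps`, the per-residue entry layout (`wildtype`, `mutants`) and the shape of the placeholder are all assumptions made for the example.

```python
import copy
from collections import OrderedDict


def fill_gaps(aligned_fitness, complex_indices, empty_entry=None):
    """Return a new OrderedDict with one entry per residue index in the complex.

    Indices missing from `aligned_fitness` receive a placeholder entry.
    """
    if empty_entry is None:
        # Hypothetical layout of an 'empty' residue entry.
        empty_entry = {"wildtype": None, "mutants": []}
    filled = OrderedDict()
    for idx in complex_indices:
        if idx in aligned_fitness:
            filled[idx] = aligned_fitness[idx]
        else:
            # Deep copy so placeholders never share a mutable list.
            filled[idx] = copy.deepcopy(empty_entry)
    return filled


aligned = OrderedDict([
    (10, {"wildtype": "A", "mutants": ["G", "S"]}),
    (12, {"wildtype": "K", "mutants": ["R"]}),
])
filled = fill_gaps(aligned, complex_indices=[10, 11, 12, 13])
```

Residues 11 and 13 exist in the structure but not in the fitness data, so they end up with placeholder entries and a viewer can iterate over every residue without special-casing gaps.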
- fitness_get_seqidcs()[source]
From a fitness OrderedDict, extracts the amino acid sequence indices as a list
- fitness_reset_keys(alignment)[source]
Given a fitness OrderedDict and an alignment, reset the keys of the fitness OrderedDict.
NB: also fills the fitness dict with indices that exist in the PDB but not in the fitness data, i.e. represented as ‘empty’ dict entries. This way the fitness HTML view will have ‘empty’ fitness data for those residues.
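One way to picture the re-keying is to walk a gapped pairwise alignment column by column. The sketch below assumes the alignment is available as two equal-length strings with `-` for gaps, that dropped entries are those with no counterpart in the structure, and that an empty dict marks an 'empty' residue; the function name and these conventions are hypothetical, not taken from choppa.

```python
from collections import OrderedDict


def rekey_by_alignment(fitness, aligned_fitness_seq, aligned_complex_seq,
                       complex_indices):
    """Re-key `fitness` (ordered by sequence position) onto PDB residue indices."""
    fitness_values = iter(fitness.values())
    pdb_indices = iter(complex_indices)
    rekeyed = OrderedDict()
    for f_char, c_char in zip(aligned_fitness_seq, aligned_complex_seq):
        # Consume one fitness entry / one PDB index per non-gap character.
        entry = next(fitness_values) if f_char != "-" else None
        pdb_idx = next(pdb_indices) if c_char != "-" else None
        if pdb_idx is None:
            continue  # fitness residue has no counterpart in the structure
        # An empty dict marks a residue present in the PDB but not in the data.
        rekeyed[pdb_idx] = entry if entry is not None else {}
    return rekeyed


fitness = OrderedDict([(1, {"aa": "M"}), (2, {"aa": "K"}), (3, {"aa": "T"})])
# The structure lacks the leading M and has an extra L between K and T.
rekeyed = rekey_by_alignment(fitness, "MK-T", "-KLT", complex_indices=[45, 46, 47])
```

After the call, the K and T entries are stored under PDB indices 45 and 47, and index 46 holds an empty entry.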
- get_alignment(fitness_seq, complex_seq)[source]
Aligns two AA sequences with BioPython’s PairwiseAligner (https://biopython.org/DIST/docs/tutorial/Tutorial.html#sec128). We do a local alignment with BLOSUM to take evolutionary divergence into account.
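To illustrate what a local alignment means here, the sketch below computes a Smith-Waterman score in pure Python. It is a stand-in only: it does not call BioPython, and it uses a flat match/mismatch/gap scoring (the default values are arbitrary assumptions) instead of a BLOSUM substitution matrix.

```python
def local_alignment_score(seq_a, seq_b, match=2, mismatch=-1, gap=-2):
    """Best local (Smith-Waterman) alignment score between two sequences."""
    prev = [0] * (len(seq_b) + 1)
    best = 0
    for a in seq_a:
        curr = [0]
        for j, b in enumerate(seq_b, start=1):
            diag = prev[j - 1] + (match if a == b else mismatch)
            # Clamping at 0 is what makes the alignment local: a poorly
            # scoring prefix is discarded instead of being carried along.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


embedded = local_alignment_score("XXACDEXX", "ACDE")
unrelated = local_alignment_score("AAA", "CCC")
```

Because the score never drops below zero, the flanking `X` residues do not penalise the matching `ACDE` core. This matters when a fitness construct and a crystallised construct cover different stretches of the same protein.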
choppa.logoplots module
- class choppa.logoplots.LogoPlot(residue_dict, fitness_threshold)[source]
Bases: object
Given a dict with mutants for a given residue, generate logoplots.
choppa.render module
- class choppa.render.InteractiveView(filled_aligned_fitness_dict, complex, complex_rdkit, fitness_threshold, output_session_file='out.html')[source]
Bases: object
Uses 3DMol and Jinja to create a single HTML file that can be hosted anywhere to enable shareable interactive views of the fitness data on top of the complex PDB.
- get_confidence_limits()[source]
Determines the global maximum and minimum of the confidence measures (e.g. number of reads) in the experimental protocol of the fitness data. Returns False if there is no confidence measure.
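A minimal sketch of this kind of scan is shown below. The nested layout of the fitness dict, the `confidence` key, the function name and the `(min, max)` return order are assumptions for illustration; only the `False` fallback comes from the description above.

```python
def confidence_limits(fitness, key="confidence"):
    """Return (min, max) of the confidence values, or False if none exist."""
    values = [
        mutant[key]
        for residue in fitness.values()
        for mutant in residue.get("mutants", [])
        if mutant.get(key) is not None
    ]
    if not values:
        return False
    return min(values), max(values)


with_confidence = {
    1: {"mutants": [{"aa": "A", "confidence": 120}, {"aa": "G", "confidence": 15}]},
    2: {"mutants": [{"aa": "S", "confidence": 980}]},
}
without_confidence = {1: {"mutants": [{"aa": "A"}]}}

limits = confidence_limits(with_confidence)
no_limits = confidence_limits(without_confidence)
```

Callers should test the result with `is False` rather than truthiness if a `(0, 0)` range is possible in their data.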
- get_interaction_dict()[source]
Generates interactions to be displayed on the interactive HTML view. Interactions are colored by the same rules as for PyMOL (render.PublicationView()), but a dict of interactions is used which is generated in render.PublicationView().pymol_add_interactions().
- get_logoplot_dict(confidence_lims, multiprocess=True, max_workers=None)[source]
For a fitness dict, load all base64 logoplots into memory using multithreading if requested.
Instead of adding base64 strings to fitness dict (making it uninterpretable), make a separate dict that mimics the form of fitness dict.
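The pattern of building a parallel dict of base64 strings can be sketched with the standard library. Everything named here is hypothetical: `render_logoplot_png` is a placeholder that returns fake bytes where a real plotting routine would go, and the `parallel` flag and entry layout are assumptions.

```python
import base64
from concurrent.futures import ThreadPoolExecutor


def render_logoplot_png(residue_entry):
    """Hypothetical stand-in for a plotting routine; returns fake image bytes."""
    return ("logo:" + residue_entry["wildtype"]).encode("ascii")


def encode_logoplot(item):
    """Map one (index, entry) pair to (index, base64 string)."""
    idx, entry = item
    return idx, base64.b64encode(render_logoplot_png(entry)).decode("ascii")


def build_logoplot_dict(fitness, parallel=True, max_workers=None):
    """Build a dict with the same keys as `fitness`, holding base64 images."""
    if parallel:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # Executor.map preserves input order, so keys stay aligned.
            pairs = list(pool.map(encode_logoplot, fitness.items()))
    else:
        pairs = [encode_logoplot(item) for item in fitness.items()]
    return dict(pairs)


fitness = {45: {"wildtype": "A"}, 46: {"wildtype": "K"}}
logoplots = build_logoplot_dict(fitness)
```

Keeping the images in a separate dict with matching keys leaves the fitness dict readable while still allowing a lookup by residue index at render time.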
- get_surface_coloring_dict()[source]
Based on fitness coloring, creates a dict where keys are colors, values are residue numbers.
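The resulting structure is an inverted mapping from color to residue numbers. A sketch, assuming a per-residue color has already been assigned (the input dict, color names and function name are illustrative only):

```python
from collections import defaultdict


def invert_to_color_groups(residue_colors):
    """Turn {residue number: color} into {color: [residue numbers]}."""
    groups = defaultdict(list)
    for resnum, color in residue_colors.items():
        groups[color].append(resnum)
    # Plain dict with sorted residue lists for predictable output.
    return {color: sorted(nums) for color, nums in groups.items()}


surface_coloring = invert_to_color_groups({45: "red", 46: "white", 47: "red"})
```

Grouping by color means a viewer can issue one styling call per color instead of one per residue.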
- inject_stuff_in_template(sdf_str, pdb_str, surface_coloring, logoplot_dict)[source]
Replaces parts of a template HTML with relevant bits of data to produce an HTML view of the (ligand-) protein, its fitness and its interactions (if any).
Uses Jinja2 templating to render based on static HTML template.
- class choppa.render.PublicationView(filled_aligned_fitness_dict, complex, complex_rdkit, fitness_threshold, output_session_file='out.pse')[source]
Bases: object
Uses the PyMOL API to create a session file for publication-ready views of the fitness data on top of the complex PDB. Users will need to ray the PyMOL view with the desired ray settings in the GUI application themselves, but a combination of pre-set ray settings is provided in the .pse file.
- count_fit_residues(fitness_data)[source]
For a dict with mutants in a fitness dict, counts the number of fit mutants
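Counting fit mutants amounts to a thresholded tally. In the sketch below the mutant record layout, the `fitness` key and the inclusive comparison (`>=`) are assumptions; the source does not state whether the threshold is inclusive.

```python
def count_fit(mutants, fitness_threshold):
    """Count mutants whose fitness meets the threshold (assumed inclusive)."""
    return sum(1 for m in mutants if m["fitness"] >= fitness_threshold)


mutants = [
    {"aa": "A", "fitness": 0.9},
    {"aa": "G", "fitness": 0.2},
    {"aa": "S", "fitness": 0.7},
]
n_fit = count_fit(mutants, fitness_threshold=0.7)
```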
- pymol_add_interactions(p, mutability_color_dict)[source]
Adds interactions to pymol session if a ligand is present. Interactions are colored by fitness of contacted residues, not by interaction type.
- pymol_color_by_fitness(p)[source]
With a pymol session set up with a system using self.pymol_setup_system(), integrates fitness data by coloring residues by mutability degree.
- pymol_color_coder()[source]
Given the aligned fitness dict, returns two dicts:
  - residue_mutability_levels: {residue index: number of fit mutations, ..}
  - mutability_color_dict: {number of fit mutations: color, ..}
- pymol_prettify_system(p, ligands_in_system)[source]
With a pymol session set up with a system using self.pymol_setup_system(), makes the session pretty. The code itself is not pretty, which is a consequence of how the PyMOL API is constructed.
- pymol_select_components(p)[source]
Makes selections in PyMOL for ligand, protein and binding site. Returns whether one or more ligands are present in the system.
choppa.utils module
- choppa.utils.biopython_to_mda(BP_complex)[source]
Converts a biopython protein object to an MDAnalysis one.
- choppa.utils.get_contacts_mda(complex, bigcutoff=4.1, remove_solvent=True)[source]
Use MDAnalysis to generate a dictionary of distance endpoint xyz coordinates between atoms in the ligand and protein residues.
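The idea of collecting endpoint coordinates for atom pairs within a cutoff can be shown without MDAnalysis. The sketch below is a brute-force stand-in: the input tuples, the choice of residue number as dict key and the function name are assumptions, and it makes no attempt at solvent removal.

```python
import math


def find_contacts(ligand_atoms, protein_atoms, cutoff=4.1):
    """Return {residue number: [(ligand xyz, protein xyz), ...]} within cutoff.

    ligand_atoms:  iterable of (atom name, (x, y, z))
    protein_atoms: iterable of (residue number, (x, y, z))
    """
    contacts = {}
    for _lig_name, lig_xyz in ligand_atoms:
        for resnum, prot_xyz in protein_atoms:
            # Euclidean distance between the two atoms.
            if math.dist(lig_xyz, prot_xyz) <= cutoff:
                contacts.setdefault(resnum, []).append((lig_xyz, prot_xyz))
    return contacts


ligand = [("C1", (0.0, 0.0, 0.0))]
protein = [(45, (3.0, 0.0, 0.0)), (46, (10.0, 0.0, 0.0))]
contacts = find_contacts(ligand, protein)
```

Each stored pair gives the two endpoints of a line that a viewer could draw between the ligand atom and the contacting protein atom.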
- choppa.utils.get_ligand_resnames_from_pdb_str(PDB_str, remove_solvent=True)[source]
Uses MDAnalysis to determine the residue names of the ligand(s) in the protein PDB string.
Uses StringIO to avoid having to write to disk.
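The StringIO technique treats a PDB string as a file-like object held in memory, so no temporary file is needed. The sketch below reads HETATM records directly using the fixed PDB columns and does not use MDAnalysis; the solvent name set and the function name are assumptions.

```python
import io

SOLVENT_RESNAMES = {"HOH", "WAT"}  # assumed solvent residue names


def ligand_resnames(pdb_str, remove_solvent=True):
    """Collect unique HETATM residue names from a PDB string, in file order."""
    names = []
    # io.StringIO exposes the string as a file-like object.
    for line in io.StringIO(pdb_str):
        if not line.startswith("HETATM"):
            continue
        resname = line[17:20].strip()  # PDB columns 18-20 hold the residue name
        if remove_solvent and resname in SOLVENT_RESNAMES:
            continue
        if resname not in names:
            names.append(resname)
    return names


pdb_str = (
    "ATOM      1  N   MET A   1      11.104   6.134  -6.504\n"
    "HETATM    2  C1  LIG A 201       1.000   2.000   3.000\n"
    "HETATM    3  O   HOH A 301       4.000   5.000   6.000\n"
)
resnames = ligand_resnames(pdb_str)
```

The same in-memory handle could be passed to any reader that accepts a file-like object.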
- choppa.utils.get_pdb_components(PDB_str, remove_solvent=True)[source]
Splits a protein-ligand PDB string into protein and ligand components.
- choppa.utils.process_ligand(ligand)[source]
Adds bond orders to a PDB ligand in an MDA universe object:
  1. Load the PDB into a PyMOL session (PyMOL does the bond guessing).
  2. Write the ligand to a stream as SDF.
  3. Read the stream into an RDKit molecule.
- choppa.utils.process_protein(protein)[source]
Returns the string for the protein in an MDA universe object.
- choppa.utils.show_contacts(pymol_instance, selection_residues, selection_lig, contact_color, bigcutoff=4.0)[source]
Heavily reduced PyMOL plugin that provides show_contacts command and GUI for highlighting good and bad polar contacts. Factored out of clustermols by Matthew Baumgartner.
Returns: List of contacts
- choppa.utils.split_pdb_str(PDB_str)[source]
From a PDB string, gets the string for the protein and (if present) the ligand SDF (with guessed bond orders).
Inspired by https://gist.github.com/PatWalters/c046fee2760e6894ed13e19b8c99193b
Module contents
Integrates mutational and structural biology data into a concerted HTML view.