choppa package

Submodules

choppa.align module

class choppa.align.AlignFactory(fitness_dict: OrderedDict, complex: Structure)

Bases: object

Base class for aligning Fitness data with PDB complexes within choppa.

align_fitness(): Aligns a fitness OrderedDict to a complex object.

complex_get_seq(): From a PDB.Structure.Structure, extracts the amino acid sequence.

complex_get_seqidcs(): From a PDB.Structure.Structure, extracts the sequence indices as stored in the PDB file.

fill_aligned_fitness(aligned_fitness_dict): An aligned fitness dict may have gaps with respect to the complex PDB. Fills these gaps with empty data for easier parsing during visualization.
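The gap-filling idea can be sketched as follows. This is a minimal illustration, not choppa's implementation: it assumes the aligned fitness dict is keyed by PDB residue index with a dict of data per residue, and the function name fill_gaps and the "wildtype" key are hypothetical.

```python
from collections import OrderedDict


def fill_gaps(aligned_fitness, pdb_residue_idcs):
    """Return a new OrderedDict with one entry per PDB residue index (hypothetical helper).

    Residues present in the PDB but missing from the fitness data get an
    empty dict, so downstream rendering can iterate over every residue.
    Fitness entries whose index is not in the PDB are not carried over.
    """
    filled = OrderedDict()
    for idx in pdb_residue_idcs:
        # Copy existing fitness data; a fresh empty dict marks a gap.
        filled[idx] = aligned_fitness.get(idx, {})
    return filled


fitness = OrderedDict([(10, {"wildtype": "A"}), (12, {"wildtype": "G"})])
print(fill_gaps(fitness, [10, 11, 12]))
```

Keeping the PDB's residue order as the iteration order means the visualization layer never has to special-case missing residues.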

fitness_get_seq(): From a fitness OrderedDict, extracts the amino acid sequence.

fitness_get_seqidcs(): From a fitness OrderedDict, extracts the amino acid sequence indices as a list.

fitness_reset_keys(alignment)

Given a fitness OrderedDict and an alignment, reset the keys of the fitness OrderedDict.

NB: also fills the fitness dict with indices that exist in the PDB but not in the fitness data, represented as ‘empty’ dict entries. This way the fitness HTML view shows ‘empty’ fitness data for those residues.

get_alignment(fitness_seq, complex_seq): Aligns two amino acid sequences with BioPython’s PairwiseAligner (https://biopython.org/DIST/docs/tutorial/Tutorial.html#sec128). A local alignment is performed with a BLOSUM substitution matrix to take evolutionary divergence into account.
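To show what a local alignment computes without a Biopython dependency, here is a minimal Smith-Waterman sketch with a linear gap penalty. It is a swapped-in simplification, not choppa's code: a toy match/mismatch score stands in for the BLOSUM matrix, and the function name and default scores are assumptions.

```python
def local_align(seq_a, seq_b, match=2, mismatch=-1, gap=-2):
    """Minimal Smith-Waterman local alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b) with gaps shown as '-'.
    The toy match/mismatch score is a stand-in for a BLOSUM matrix.
    """
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def pair(i, j):
        return match if seq_a[i - 1] == seq_b[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            # Local alignment: a cell never drops below zero, so an
            # alignment can restart anywhere.
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + pair(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best-scoring cell until a zero cell is reached.
    aligned_a, aligned_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        if score[i][j] == score[i - 1][j - 1] + pair(i, j):
            aligned_a.append(seq_a[i - 1])
            aligned_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(seq_a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(seq_b[j - 1])
            j -= 1
    return best, "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


print(local_align("ACGTT", "ACTT"))
```

With Biopython, the equivalent setup would set `mode = "local"` on a `PairwiseAligner` and load a matrix via `Bio.Align.substitution_matrices.load`; the exact BLOSUM variant and gap penalties choppa uses are not stated here.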

get_fitness_alignment_shift_dict(alignment): Given the complex sequence with its residue indices (which may not start at 0) and the fitness-complex alignment, creates a dictionary of the indices to use for the fitness data, of the form {fitness_idx : aligned_idx}.
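The {fitness_idx : aligned_idx} mapping can be built by walking the alignment column by column. This sketch assumes the alignment is given as two gapped strings of equal length plus the residue numbers of the ungapped residues; for a local alignment, the index lists would need to be trimmed to the aligned region. All names are hypothetical.

```python
def alignment_shift_dict(aligned_fitness, aligned_complex, fitness_idcs, complex_idcs):
    """Map each fitness residue index to the complex residue index it aligns to.

    aligned_fitness / aligned_complex: gapped strings of equal length ('-' = gap).
    fitness_idcs / complex_idcs: residue numbers of the ungapped residues, in order.
    """
    if len(aligned_fitness) != len(aligned_complex):
        raise ValueError("aligned sequences must have equal length")
    shift = {}
    f_pos = c_pos = 0  # positions in the ungapped sequences
    for f_char, c_char in zip(aligned_fitness, aligned_complex):
        if f_char != "-" and c_char != "-":
            # Both residues are present in this column: record the mapping.
            shift[fitness_idcs[f_pos]] = complex_idcs[c_pos]
        if f_char != "-":
            f_pos += 1
        if c_char != "-":
            c_pos += 1
    return shift


# Fitness data numbered from 1; the PDB chain starts at residue 5 and lacks one residue.
print(alignment_shift_dict("MKTAY", "MK-AY", [1, 2, 3, 4, 5], [5, 6, 7, 8]))
```

Fitness residues that fall opposite a gap (residue 3 here) get no entry, which is where gap filling takes over.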

validate_alignment(): [Placeholder] Validates the alignment of a fitness OrderedDict to a complex object.

choppa.logoplots module

class choppa.logoplots.LogoPlot(residue_dict, fitness_threshold)

Bases: object

Given a dict with mutants for a given residue, generate logoplots.

build_logoplot(global_min_confidence=False, global_max_confidence=False)

divide_fitness_types(): Determines which mutants are fit or unfit given the fitness_threshold and returns all data required for logoplot generation as a simple dict.
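A threshold split like the one described can be sketched as below. The input shape ({mutant_aa: fitness_score}), the output keys, and the rule that a score equal to the threshold counts as fit are all assumptions for illustration; the function name is hypothetical.

```python
def divide_by_fitness(mutant_fitness, fitness_threshold):
    """Split mutants into fit and unfit groups around a threshold.

    mutant_fitness: hypothetical {mutant_aa: fitness_score} mapping.
    Assumes a mutant is fit when its score is >= the threshold.
    """
    fit = {aa: s for aa, s in mutant_fitness.items() if s >= fitness_threshold}
    unfit = {aa: s for aa, s in mutant_fitness.items() if s < fitness_threshold}
    # Sort each group by descending fitness for a stable, readable order.
    return {
        "fit": dict(sorted(fit.items(), key=lambda kv: kv[1], reverse=True)),
        "unfit": dict(sorted(unfit.items(), key=lambda kv: kv[1], reverse=True)),
    }


print(divide_by_fitness({"A": 0.5, "G": -1.2, "W": 1.1}, 0.0))
```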

render_logoplot(mutants, global_min_confidence=False, global_max_confidence=False, lhs=True, wildtype=False)

Creates a logoplot as a base64 string. Also annotates it with confidence values if present.

TODO: nicer rounded ticks agnostic to array limits

choppa.logoplots.render_singleres_logoplot(res): Renders a simple, single-letter ‘logoplot’, normally for showing the wildtype residue. The plot is square and is typically placed top center in the downstream view.

choppa.render module

class choppa.render.InteractiveView(filled_aligned_fitness_dict, complex, complex_rdkit, fitness_threshold, output_session_file='out.html')

Bases: object

Uses 3Dmol.js and Jinja2 to create a single HTML file that can be hosted anywhere, enabling shareable interactive views of the fitness data on top of the complex PDB.

get_confidence_limits(): Determines the global minimum and maximum of the confidence measure (e.g. number of reads) from the experimental protocol of the fitness data. Returns False if there is no confidence measure.
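A global minimum/maximum scan with a False fallback might look like this. The nested dict layout and the "confidence" key are assumptions, not choppa's documented format; residues with empty (gap-filled) entries are simply skipped.

```python
def confidence_limits(fitness_dict):
    """Return (global_min, global_max) of all confidence values, or False.

    fitness_dict: hypothetical {residue_idx: {mutant_aa: {"fitness": f, "confidence": c}}}.
    Mutants without a "confidence" entry are skipped.
    """
    values = [
        mutant["confidence"]
        for mutants in fitness_dict.values()
        for mutant in mutants.values()
        if mutant.get("confidence") is not None
    ]
    if not values:
        return False
    return min(values), max(values)


data = {
    1: {"A": {"fitness": 0.1, "confidence": 12}, "G": {"fitness": -0.3, "confidence": 40}},
    2: {},  # gap-filled residue with no fitness data
}
print(confidence_limits(data))
```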

get_interaction_dict(): Generates the interactions to display in the interactive HTML view. Interactions are colored by the same rules as in the PyMOL view (render.PublicationView()); the underlying dict of interactions is generated by render.PublicationView().pymol_add_interactions().

get_logoplot_dict(confidence_lims, multiprocess=True, max_workers=None)

For a fitness dict, loads all base64 logoplots into memory, rendering them in parallel if requested (multiprocess, max_workers).

Instead of adding the base64 strings to the fitness dict (which would make it hard to interpret), creates a separate dict that mirrors the structure of the fitness dict.
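The "separate dict that mirrors the fitness dict" pattern, with optional parallel rendering, can be sketched with the standard library. The renderer below is a hypothetical placeholder that base64-encodes a text summary instead of drawing a plot, and whether choppa uses threads or processes is not stated here.

```python
import base64
from concurrent.futures import ThreadPoolExecutor


def render_placeholder_logoplot(residue_data):
    """Hypothetical stand-in for a real logoplot renderer: returns base64 text."""
    payload = repr(sorted(residue_data.items())).encode()
    return base64.b64encode(payload).decode("ascii")


def build_logoplot_dict(fitness_dict, parallel=True, max_workers=None):
    """Return {residue_idx: base64_str} with the same keys as fitness_dict.

    The fitness dict itself is left untouched.
    """
    keys = list(fitness_dict)
    if parallel:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # map() preserves input order, so results line up with keys.
            images = list(pool.map(render_placeholder_logoplot, (fitness_dict[k] for k in keys)))
    else:
        images = [render_placeholder_logoplot(fitness_dict[k]) for k in keys]
    return dict(zip(keys, images))


print(build_logoplot_dict({1: {"A": 0.5}, 2: {"G": -1.0}}))
```

For CPU-bound plotting, `ProcessPoolExecutor` offers the same interface and sidesteps the GIL, at the cost of requiring a picklable, module-level render function.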

get_surface_coloring_dict(): Based on the fitness coloring, creates a dict where keys are colors and values are residue numbers.
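Grouping residues by color is a dict inversion. This sketch assumes a per-residue color has already been computed as {residue_number: hex_color}; the function name is hypothetical.

```python
from collections import defaultdict


def surface_coloring_dict(residue_colors):
    """Invert {residue_number: hex_color} into {hex_color: [residue_numbers]}."""
    by_color = defaultdict(list)
    # Sorting first keeps each residue list in ascending order.
    for residue, color in sorted(residue_colors.items()):
        by_color[color].append(residue)
    return dict(by_color)


print(surface_coloring_dict({12: "#ff0000", 10: "#ff0000", 11: "#00ff00"}))
```

Grouping by color means the viewer needs one styling call per color rather than one per residue.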

inject_stuff_in_template(sdf_str, pdb_str, surface_coloring, logoplot_dict)

Replaces parts of a template HTML file with the relevant data to produce an HTML view of the (ligand-)protein complex, its fitness and its interactions (if any).

Uses Jinja2 templating to render based on static HTML template.

render()

surface_coloring_dict_to_js(color_res_dict): Transforms a dictionary of residue indices per color (hex) to a JavaScript-compatible string.
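One way to produce a JavaScript-compatible string is to serialize with json, since JSON object syntax is also valid JavaScript. The `const` declaration and the variable name surfaceColoring are hypothetical; choppa's actual output format is not documented here.

```python
import json


def coloring_to_js(color_res_dict, var_name="surfaceColoring"):
    """Serialize {hex_color: [residue indices]} as a JavaScript const declaration.

    json.dumps handles quoting and escaping; hex color keys are already strings.
    """
    return f"const {var_name} = {json.dumps(color_res_dict)};"


print(coloring_to_js({"#ff0000": [10, 12]}))
```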

class choppa.render.PublicationView(filled_aligned_fitness_dict, complex, complex_rdkit, fitness_threshold, output_session_file='out.pse')

Bases: object

Uses the PyMOL API to create a session file with publication-ready views of the fitness data on top of the complex PDB. Users need to ray-trace the PyMOL view with their desired ray settings in the PyMOL GUI themselves, but a combination of preset ray settings is provided in the .pse file.

count_fit_residues(fitness_data): Given the dict of mutants for a residue in the fitness dict, counts the number of fit mutants.

pymol_add_interactions(p, mutability_color_dict): Adds interactions to the PyMOL session if a ligand is present. Interactions are colored by the fitness of the contacted residues, not by interaction type.

pymol_color_by_fitness(p): With a PyMOL session set up using self.pymol_setup_system(), integrates the fitness data by coloring residues by their degree of mutability.

pymol_color_coder(): Given the aligned fitness dict, returns two dicts:
- residue_mutability_levels : {residue index : number of fit mutations, ..}
- mutability_color_dict : {number of fit mutations : color, ..}
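The two return values can be sketched as below, with count_fit_residues-style counting folded in. The input layout ({residue_idx: {mutant_aa: fitness_score}}), the ">= threshold" rule for fit mutants, and the palette handling are assumptions; the function name is hypothetical.

```python
def mutability_color_coder(fitness_dict, fitness_threshold, palette):
    """Return (residue_mutability_levels, mutability_color_dict).

    fitness_dict: hypothetical {residue_idx: {mutant_aa: fitness_score}}.
    palette: non-empty list of colors ordered from least to most mutable.
    """
    # Number of fit mutants per residue (assumes fit means score >= threshold).
    residue_mutability_levels = {
        residue: sum(1 for score in mutants.values() if score >= fitness_threshold)
        for residue, mutants in fitness_dict.items()
    }
    mutability_color_dict = {}
    for level in sorted(set(residue_mutability_levels.values())):
        # Clamp so levels beyond the palette length reuse the last color.
        mutability_color_dict[level] = palette[min(level, len(palette) - 1)]
    return residue_mutability_levels, mutability_color_dict


fd = {1: {"A": 0.5, "G": -1.0}, 2: {"A": -2.0}, 3: {"A": 0.1, "G": 0.3, "W": 0.9}}
print(mutability_color_coder(fd, 0.0, ["white", "yellow", "orange", "red"]))
```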

pymol_prettify_system(p, ligands_in_system): With a PyMOL session set up using self.pymol_setup_system(), makes the session look pretty. The code itself isn’t pretty, which is due to how the PyMOL API is constructed.

pymol_select_components(p): Makes PyMOL selections for the ligand, protein and binding site. Returns whether one or more ligands are present in the system.

pymol_setup_system(p, remove_solvents=True): Sets up the protein (and ligands, if present) without changing any of the visual styling. Also removes components that are not of interest, e.g. solvent when remove_solvents=True.

pymol_start_session(): Boots up a session with the publicly available PyMOL API.

pymol_write_session(p, out_filename): Writes a PyMOL session to a .pse file.

render()[source]: Renders the PyMOL session file.

choppa.utils module

choppa.utils.biopython_to_mda(BP_complex): Converts a Biopython protein object to an MDAnalysis one.

choppa.utils.get_contacts_mda(complex, bigcutoff=4.1, remove_solvent=True): Uses MDAnalysis to generate a dictionary of the xyz coordinates of distance endpoints between atoms in the ligand and protein residues.
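The core of a cutoff-based contact search is a pairwise distance check that keeps both endpoints, so a line can later be drawn between them. This brute-force sketch uses plain coordinate tuples instead of MDAnalysis objects; the input layout and function name are hypothetical.

```python
import math


def contacts_within_cutoff(ligand_atoms, protein_atoms, cutoff=4.1):
    """Return {residue_id: [(ligand_xyz, protein_xyz), ...]} for atom pairs within cutoff.

    ligand_atoms: list of (x, y, z) tuples.
    protein_atoms: list of (residue_id, (x, y, z)) tuples.
    Distances use the coordinates' units (Å for PDB files).
    """
    contacts = {}
    for lig_xyz in ligand_atoms:
        for residue_id, prot_xyz in protein_atoms:
            if math.dist(lig_xyz, prot_xyz) <= cutoff:
                # Keep both endpoints so a distance line can be drawn between them.
                contacts.setdefault(residue_id, []).append((lig_xyz, prot_xyz))
    return contacts


ligand = [(0.0, 0.0, 0.0)]
protein = [(10, (3.0, 0.0, 0.0)), (11, (5.0, 0.0, 0.0))]
print(contacts_within_cutoff(ligand, protein))
```

This is O(N·M) in atom counts; MDAnalysis provides vectorised helpers such as `MDAnalysis.lib.distances.distance_array` for larger systems.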

choppa.utils.get_ligand_resnames_from_pdb_str(PDB_str, remove_solvent=True)

Uses MDAnalysis to determine the residue name(s) of the ligand(s) in the protein PDB (str).

Uses StringIO to avoid having to write to disk.
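The StringIO trick can be illustrated without MDAnalysis: wrapping the PDB string in io.StringIO lets it be read line by line like a file, with nothing written to disk. This plain-text parser is a swapped-in simplification; the solvent residue names and the function name are assumptions.

```python
import io

SOLVENT_RESNAMES = {"HOH", "WAT", "SOL"}  # assumed solvent names; extend as needed


def ligand_resnames_from_pdb_str(pdb_str, remove_solvent=True):
    """Return the sorted residue names of HETATM records in a PDB string."""
    resnames = set()
    for line in io.StringIO(pdb_str):  # file-like view of the string, no disk I/O
        if line.startswith("HETATM"):
            resname = line[17:20].strip()  # PDB columns 18-20 hold the residue name
            if remove_solvent and resname in SOLVENT_RESNAMES:
                continue
            resnames.add(resname)
    return sorted(resnames)


pdb = "ATOM      1  N   ALA A   1\nHETATM 2001  C1  LIG A 401\nHETATM 2002  O   HOH A 501\n"
print(ligand_resnames_from_pdb_str(pdb))
```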

choppa.utils.get_pdb_components(PDB_str, remove_solvent=True): Splits a protein-ligand PDB into protein and ligand components.
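A simple record-type split shows the idea: ATOM lines go to the protein and HETATM lines to the ligand, optionally skipping solvent. This is an illustrative sketch, not choppa's implementation; it would misclassify modified residues stored as HETATM (e.g. selenomethionine), and the solvent names and function name are assumptions.

```python
SOLVENT_RESNAMES = {"HOH", "WAT", "SOL"}  # assumed solvent residue names


def split_pdb_components(pdb_str, remove_solvent=True):
    """Split PDB text into (protein_str, ligand_str) by record type.

    ATOM records go to the protein; HETATM records go to the ligand, with
    solvent optionally dropped. Other records (HEADER, CONECT, ...) are ignored.
    """
    protein_lines, ligand_lines = [], []
    for line in pdb_str.splitlines():
        if line.startswith("ATOM"):
            protein_lines.append(line)
        elif line.startswith("HETATM"):
            # PDB columns 18-20 hold the residue name.
            if remove_solvent and line[17:20].strip() in SOLVENT_RESNAMES:
                continue
            ligand_lines.append(line)
    return "\n".join(protein_lines), "\n".join(ligand_lines)


pdb = "HEADER    TEST\nATOM      1  N   ALA A   1\nHETATM 2001  C1  LIG A 401\nHETATM 2002  O   HOH A 501\nEND\n"
print(split_pdb_components(pdb))
```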

choppa.utils.process_ligand(ligand): Adds bond orders to a PDB ligand in an MDA universe object:
1. Load the PDB into a PyMOL session (PyMOL does the bond guessing).
2. Write the ligand to a stream as SDF.
3. Read the stream into an RDKit molecule.

choppa.utils.process_protein(protein): Returns the string for the protein in an MDA universe object.

choppa.utils.sdf_str_from_pdb(pdb_str): Converts a PDB string to an SDF string using RDKit.

choppa.utils.show_contacts(pymol_instance, selection_residues, selection_lig, contact_color, bigcutoff=4.0)

Heavily reduced version of a PyMOL plugin that provides the show_contacts command and a GUI for highlighting good and bad polar contacts. Factored out of clustermols by Matthew Baumgartner.

Returns: List of contacts

choppa.utils.split_pdb_str(PDB_str)

From a PDB string, gets the string for the protein and (if present) the ligand SDF (with guessed bond orders).

Inspired by https://gist.github.com/PatWalters/c046fee2760e6894ed13e19b8c99193b

Module contents

Integrates mutational and structural biology data into a concerted HTML view.
