Tools for processing FastQ files

File Validation

Pipelines and functions for assessing the quality of input files.

FastQC

class tool.validate_fastqc.fastqcTool(configuration=None)

Tool for assessing the quality of reads in a FastQ file using FastQC

run(input_files, input_metadata, output_files)

Tool for assessing the quality of reads in a FastQ file

Parameters:
  • input_files (dict) –
    fastq : str
    List of file locations
  • input_metadata (dict) –
    fastq : dict
    Required meta data
  • output_files (dict) –
    report : str
    Location of the HTML report file
Returns:

array – First element is a list of the output report files. Second element is a list of the matching metadata

Return type:

list

validate(**kwargs)

FastQC Validator

Parameters:
  • FastQC_file (str) – Location of the FastQ file
  • report_loc (str) – Location of the output report file
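
As a rough illustration of what the validator has to do, the sketch below assembles a FastQC command line and predicts where FastQC will write its HTML report. It assumes the `fastqc` executable is on the PATH; `build_fastqc_command` and `expected_fastqc_report` are hypothetical helpers rather than part of the tool, and the report-naming rule is FastQC's usual convention, not something stated in this documentation.

```python
import os
import shlex
from typing import List


def build_fastqc_command(fastq_file: str, report_loc: str) -> List[str]:
    """Hypothetical helper: run FastQC so that its report lands in the
    directory that will hold report_loc."""
    out_dir = os.path.dirname(os.path.abspath(report_loc))
    return ["fastqc", "--outdir", out_dir, fastq_file]


def expected_fastqc_report(fastq_file: str, out_dir: str) -> str:
    """Assumed FastQC naming: drop a trailing .gz, then a .fastq/.fq
    extension, and append '_fastqc.html'."""
    name = os.path.basename(fastq_file)
    if name.endswith(".gz"):
        name = name[: -len(".gz")]
    for ext in (".fastq", ".fq"):
        if name.endswith(ext):
            name = name[: -len(ext)]
            break
    return os.path.join(out_dir, name + "_fastqc.html")


if __name__ == "__main__":
    cmd = build_fastqc_command("/data/sample.fastq.gz", "/data/qc/sample_report.html")
    print(shlex.join(cmd))
    print(expected_fastqc_report("/data/sample.fastq.gz", "/data/qc"))
```

Copying the generated report to report_loc is left out of the sketch.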

TrimGalore

class tool.trimgalore.trimgalore(configuration=None)

Tool for trimming FASTQ reads that are of low quality

static get_trimgalore_params(params)

Function to handle the extraction of command-line parameters

Parameters:
  • params (dict) –
Returns:

Return type:

list

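
The format of the params dict is not documented here. A minimal sketch, assuming keys are command-line flags and values are either booleans (switches) or option values, shows one way such a dict could be flattened into an argument list. `params_to_cli` is a hypothetical name; `--quality`, `--length`, `--gzip` and `--fastqc` are Trim Galore options used only as sample keys.

```python
from typing import Any, Dict, List


def params_to_cli(params: Dict[str, Any]) -> List[str]:
    """Flatten a {flag: value} dict into a list of command-line arguments.

    Assumed convention (hypothetical, not taken from the tool):
      True          -> bare switch, e.g. "--gzip"
      False or None -> option left out
      other values  -> flag followed by the value as a string
    """
    args: List[str] = []
    for flag, value in params.items():
        if value is None or value is False:
            continue  # option switched off or unset
        if value is True:
            args.append(flag)  # switch with no argument
        else:
            args.extend([flag, str(value)])
    return args


if __name__ == "__main__":
    print(params_to_cli({"--quality": 20, "--length": 36, "--gzip": True, "--fastqc": False}))
```

Relying on dict insertion order keeps the generated arguments in the order the caller supplied them.
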
run(input_files, input_metadata, output_files)

The main function to run TrimGalore to remove low quality and very short reads. TrimGalore uses Cutadapt and FastQC for the analysis.

Parameters:
  • input_files (dict) –
    fastq1 : string
    Location of the FASTQ file
    fastq2 : string
    [OPTIONAL] Location of the paired end FASTQ file
  • input_metadata (dict) – Matching metadata for the input FASTQ files
Returns:

  • output_files (dict) –

    fastq1_trimmed : str

    Location of the trimmed FASTQ file

    fastq2_trimmed : str

    [OPTIONAL] Location of a trimmed paired end FASTQ file

  • output_metadata (dict) – Matching metadata for the output files

trimgalore_paired(**kwargs)

Trims and removes low quality subsections and reads from paired-end FASTQ files

Parameters:
  • fastq_file_in (str) – Location of the input fastq file
  • fastq_file_out (str) – Location of the output fastq file
  • params (dict) – Parameters to use in TrimGalore
Returns:

Indicator of the success of the function

Return type:

bool

trimgalore_single(**kwargs)

Trims and removes low quality subsections and reads from a single-ended FASTQ file

Parameters:
  • fastq_file_in (str) – Location of the input fastq file
  • fastq_file_out (str) – Location of the output fastq file
  • params (dict) – Parameters to use in TrimGalore
Returns:

Indicator of the success of the function

Return type:

bool
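
For orientation, a sketch of how single-end and paired-end Trim Galore invocations differ, assuming the `trim_galore` executable is installed. `build_trimgalore_command` is a hypothetical helper; `--output_dir` and `--paired` are Trim Galore options, while any extra arguments would come from a parameter conversion such as get_trimgalore_params.

```python
import shlex
from typing import List, Optional


def build_trimgalore_command(fastq1: str, out_dir: str,
                             fastq2: Optional[str] = None,
                             extra_args: Optional[List[str]] = None) -> List[str]:
    """Hypothetical helper: assemble a Trim Galore command line.

    Single-end when fastq2 is None, paired-end otherwise.
    """
    cmd = ["trim_galore", "--output_dir", out_dir]
    cmd.extend(extra_args or [])  # e.g. ["--quality", "20"]
    if fastq2 is None:
        cmd.append(fastq1)
    else:
        # --paired tells Trim Galore to treat the two files as mates
        cmd.extend(["--paired", fastq1, fastq2])
    return cmd


if __name__ == "__main__":
    print(shlex.join(build_trimgalore_command("reads.fq.gz", "trimmed")))
    print(shlex.join(build_trimgalore_command("r1.fq.gz", "trimmed", "r2.fq.gz",
                                              ["--quality", "20"])))
```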

trimgalore_version(**kwargs)

Trims and removes low quality subsections and reads from a single-ended FASTQ file

Parameters:
  • fastq_file_in (str) – Location of the input fastq file
  • fastq_file_out (str) – Location of the output fastq file
  • params (dict) – Parameters to use in TrimGalore
Returns:

Indicator of the success of the function

Return type:

bool

Indexers

Bowtie 2

class tool.bowtie_indexer.bowtieIndexerTool(configuration=None)

Tool for running indexers over a genome FASTA file

bowtie2_indexer(**kwargs)

Bowtie2 Indexer

Parameters:
  • file_loc (str) – Location of the genome assembly FASTA file
  • idx_loc (str) – Location of the output index file
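
A sketch of the underlying Bowtie 2 call, assuming the `bowtie2-build` executable is available. The six file names it predicts correspond to the bt2_1_file ... bt2_rev2_file parameters that the Bowtie2 aligner on this page takes; the helper names are hypothetical, and how the tool packages the files into idx_loc is not shown.

```python
import shlex
from typing import List

# File suffixes that bowtie2-build writes for a standard (small) index;
# very large genomes get ".bt2l" files instead.
BT2_SUFFIXES = [".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2",
                ".rev.1.bt2", ".rev.2.bt2"]


def bowtie2_build_command(file_loc: str, idx_prefix: str) -> List[str]:
    """Hypothetical helper: bowtie2-build <reference FASTA> <index basename>."""
    return ["bowtie2-build", file_loc, idx_prefix]


def expected_bt2_files(idx_prefix: str) -> List[str]:
    """List the index files bowtie2-build is expected to create."""
    return [idx_prefix + suffix for suffix in BT2_SUFFIXES]


if __name__ == "__main__":
    print(shlex.join(bowtie2_build_command("genome.fa", "genome")))
    for path in expected_bt2_files("genome"):
        print(path)
```
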
run(input_files, input_metadata, output_files)

Tool for generating assembly aligner index files for use with the Bowtie 2 aligner

Parameters:
  • input_files (list) – List with a single str element with the location of the genome assembly FASTA file
  • input_metadata (list) –
Returns:

array – First element is a list of the index files. Second element is a list of the matching metadata

Return type:

list

BSgenome Index

class tool.forge_bsgenome.bsgenomeTool(configuration=None)

Tool for generating BSgenome index files from a genome FASTA file

bsgenome_creater(**kwargs)

Make BSgenome index files. Uses an R script that wraps the required code.

Parameters:
  • genome (str) –
  • circo_chrom (str) – Comma separated list of chromosome ids that are circular in the genome
  • seed_file_param (dict) – Parameters required for the function to build the seed file
  • genome_2bit (str) –
  • chrom_size (str) –
  • seed_file (str) –
  • bsgenome (str) –
static genome_to_2bit(genome, genome_2bit)

Generate the 2bit genome file from a FASTA file

Parameters:
  • genome (str) – Location of the FASTA genome file
  • genome_2bit (str) – Location of the 2bit genome file
Returns:

True if successful, False if not.

Return type:

bool

static get_chrom_size(genome_2bit, chrom_size, circ_chrom)

Generate the chrom.size file and identify the available chromosomes in the 2bit file.

Parameters:
  • genome_2bit (str) – Location of the 2bit genome file
  • chrom_size (str) – Location to save the chrom.size file to
  • circ_chrom (list) – List of chromosomes that are known to be circular
Returns:

  • If successful, two lists: [0] List of the linear chromosomes in the 2bit file; [1] List of the circular chromosomes in the 2bit file
  • (False, False) if there is an IOError
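
The linear/circular split described above can be sketched in plain Python. The sketch assumes the chrom.size file has already been written (for example with UCSC's `twoBitInfo genome.2bit chrom.sizes`) and that each line is a chromosome name followed by its length; `split_chromosomes` and `get_chrom_size_sketch` are hypothetical names, and only the (False, False) error result is taken from the description above.

```python
from typing import Iterable, List, Tuple, Union


def split_chromosomes(chrom_size_lines: Iterable[str],
                      circ_chrom: List[str]) -> Tuple[List[str], List[str]]:
    """Split chromosome names from chrom.size lines into (linear, circular).

    Each non-empty line is assumed to start with the chromosome name,
    followed by whitespace and its length.
    """
    circular_names = set(circ_chrom)
    linear: List[str] = []
    circular: List[str] = []
    for line in chrom_size_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        name = fields[0]
        if name in circular_names:
            circular.append(name)
        else:
            linear.append(name)
    return linear, circular


def get_chrom_size_sketch(chrom_size: str, circ_chrom: List[str]
                          ) -> Union[Tuple[List[str], List[str]], Tuple[bool, bool]]:
    """Read an existing chrom.size file, returning (False, False) when the
    file cannot be read, as documented for get_chrom_size."""
    try:
        with open(chrom_size) as handle:
            return split_chromosomes(handle, circ_chrom)
    except IOError:
        return False, False


if __name__ == "__main__":
    lines = ["chr1\t248956422", "chr2\t242193529", "chrM\t16569"]
    # bsgenome_creater receives circular chromosomes as a comma-separated string
    print(split_chromosomes(lines, "chrM".split(",")))
```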

run(input_files, input_metadata, output_files)

The main function to generate the BSgenome index files from a genome FASTA file.

Parameters:
  • input_files (list) – List of input bam file locations where 0 is the bam data file and 1 is the matching background bam file
  • input_metadata (dict) –
Returns:

  • output_files (list) – List of locations for the output files.
  • output_metadata (list) – List of matching metadata dict objects

BS-Seeker2 Indexer

class tool.bs_seeker_indexer.bssIndexerTool(configuration=None)

Script from BS-Seeker2 for building the index for alignment. In this case it uses Bowtie2.

bss_build_index(**kwargs)

Function to submit the FASTA file for the reference sequence and build the required index file used by the aligner.

Parameters:
  • fasta_file (str) – Location of the genome FASTA file
  • aligner (str) – Aligner to use by BS-Seeker2. Currently only bowtie2 is available in this build
  • aligner_path (str) – Location of the aligner's binary file
  • bss_path – Location of the BS-Seeker2 libraries
  • idx_out (str) – Location of the output compressed index file
Returns:

bam_out – Location of the output bam alignment file

Return type:

str

static get_bss_index_params(params)

Function to handle the extraction of command-line parameters and format them for use with the BS-Seeker2 indexer

Parameters:
  • params (dict) –
Returns:

Return type:

list

run(input_files, input_metadata, output_files)

Tool for indexing the genome assembly using BS-Seeker2. In this case it is using Bowtie2

Parameters:
  • input_files (list) – Genome assembly FASTA file
  • input_metadata (list) –
Returns:

array – Location of the genome index file

Return type:

list

BWA

class tool.bwa_indexer.bwaIndexerTool(configuration=None)

Tool for running indexers over a genome FASTA file

bwa_indexer(**kwargs)

BWA Indexer

Parameters:
  • file_loc (str) – Location of the genome assembly FASTA file
  • idx_out (str) – Location of the output index file
Returns:

Return type:

bool
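
A sketch of the underlying BWA call, assuming the `bwa` executable is available. The five suffixes are the amb, ann, bwt, pac and sa files that the BWA aligners on this page take as separate parameters; the helper names are hypothetical, and how the tool bundles the files into idx_out is not shown.

```python
import shlex
from typing import List, Optional

# Files produced by "bwa index"; each is written as <prefix><suffix>.
BWA_SUFFIXES = [".amb", ".ann", ".bwt", ".pac", ".sa"]


def bwa_index_command(file_loc: str, prefix: Optional[str] = None) -> List[str]:
    """Hypothetical helper: bwa index [-p prefix] <genome FASTA>.

    Without -p, bwa uses the FASTA path itself as the prefix.
    """
    cmd = ["bwa", "index"]
    if prefix is not None:
        cmd.extend(["-p", prefix])
    cmd.append(file_loc)
    return cmd


def expected_bwa_files(prefix: str) -> List[str]:
    """List the index files bwa index is expected to create."""
    return [prefix + suffix for suffix in BWA_SUFFIXES]


if __name__ == "__main__":
    print(shlex.join(bwa_index_command("genome.fa")))
    print(expected_bwa_files("genome.fa"))
```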

run(input_files, input_metadata, output_files)

Function to run BWA over a genome assembly FASTA file to generate the matching index for use with the aligner

Parameters:
  • input_files (dict) – List containing the location of the genome assembly FASTA file
  • input_metadata (dict) –
  • output_files (dict) – List of output files generated
Returns:

  • output_files (dict) –

    index : str

    Location of the index file defined in the input parameters

  • output_metadata (dict) –

    index : Metadata

    Metadata relating to the index file

GEM

class tool.gem_indexer.gemIndexerTool(configuration=None)

Tool for running indexers over a genome FASTA file

gem_indexer(**kwargs)

GEM Indexer

Parameters:
  • genome_file (str) – Location of the genome assembly FASTA file
  • idx_loc (str) – Location of the output index file
run(input_files, input_metadata, output_files)

Tool for generating assembly aligner index files for use with the GEM mapper

Parameters:
  • input_files (list) – List with a single str element with the location of the genome assembly FASTA file
  • input_metadata (list) –
Returns:

array – First element is a list of the index files. Second element is a list of the matching metadata

Return type:

list

Kallisto

class tool.kallisto_indexer.kallistoIndexerTool(configuration=None)

Tool for running indexers over a genome FASTA file

kallisto_indexer(**kwargs)

Kallisto Indexer

Parameters:
  • file_loc (str) – Location of the cDNA FASTA file for a genome
  • idx_loc (str) – Location of the output index file
run(input_files, input_metadata, output_files)

Tool for generating assembly aligner index files for use with Kallisto

Parameters:
  • input_files (list) – FASTA file location with all the cDNA sequences for a given genome
  • input_metadata (list) –
Returns:

array – First element is a list of the index files. Second element is a list of the matching metadata

Return type:

list

Aligners

Bowtie2

class tool.bowtie_aligner.bowtie2AlignerTool(configuration=None)

Tool for aligning sequence reads to a genome using Bowtie2

bowtie2_aligner_paired(**kwargs)

Bowtie2 Aligner - Paired End

Parameters:
  • genome_file_loc (str) – Location of the genomic fasta
  • read_file_loc1 (str) – Location of the FASTQ file
  • read_file_loc2 (str) – Location of the paired FASTQ file
  • bam_loc (str) – Location of the output aligned bam file
  • bt2_1_file (str) – Location of the <genome>.1.bt2 index file
  • bt2_2_file (str) – Location of the <genome>.2.bt2 index file
  • bt2_3_file (str) – Location of the <genome>.3.bt2 index file
  • bt2_4_file (str) – Location of the <genome>.4.bt2 index file
  • bt2_rev1_file (str) – Location of the <genome>.rev.1.bt2 index file
  • bt2_rev2_file (str) – Location of the <genome>.rev.2.bt2 index file
  • aln_params (dict) – Alignment parameters
Returns:

bam_loc – Location of the output file

Return type:

str

bowtie2_aligner_single(**kwargs)

Bowtie2 Aligner - Single End

Parameters:
  • genome_file_loc (str) – Location of the genomic fasta
  • read_file_loc1 (str) – Location of the FASTQ file
  • bam_loc (str) – Location of the output aligned bam file
  • bt2_1_file (str) – Location of the <genome>.1.bt2 index file
  • bt2_2_file (str) – Location of the <genome>.2.bt2 index file
  • bt2_3_file (str) – Location of the <genome>.3.bt2 index file
  • bt2_4_file (str) – Location of the <genome>.4.bt2 index file
  • bt2_rev1_file (str) – Location of the <genome>.rev.1.bt2 index file
  • bt2_rev2_file (str) – Location of the <genome>.rev.2.bt2 index file
  • aln_params (dict) – Alignment parameters
Returns:

bam_loc – Location of the output file

Return type:

str

static get_aln_params(params, paired=False)

Function to handle the extraction of command-line parameters and format them for use in the Bowtie2 aligner

Parameters:
  • params (dict) –
  • paired (bool) – Indicate if the parameters are paired-end specific. [DEFAULT=False]
Returns:

Return type:

list
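
One plausible reading of the paired switch is that paired-end-only options are dropped for single-end runs. The sketch below assumes that interpretation and a {flag: value} params dict; `get_aln_params_sketch` is hypothetical, while the flags in `PAIRED_ONLY_FLAGS` are real Bowtie 2 paired-end options.

```python
from typing import Any, Dict, List

# Bowtie 2 options that only make sense for paired-end alignment.
PAIRED_ONLY_FLAGS = {
    "-I", "--minins", "-X", "--maxins", "--fr", "--rf", "--ff",
    "--no-mixed", "--no-discordant", "--dovetail",
    "--no-contain", "--no-overlap",
}


def get_aln_params_sketch(params: Dict[str, Any], paired: bool = False) -> List[str]:
    """Flatten a {flag: value} dict into Bowtie 2 arguments, dropping
    paired-end-only flags unless paired=True."""
    args: List[str] = []
    for flag, value in params.items():
        if flag in PAIRED_ONLY_FLAGS and not paired:
            continue  # not valid for a single-end run
        if value is None or value is False:
            continue  # option not set
        if value is True:
            args.append(flag)  # bare switch
        else:
            args.extend([flag, str(value)])
    return args


if __name__ == "__main__":
    params = {"--very-sensitive": True, "-X": 600, "--no-mixed": True, "-p": 4}
    print(get_aln_params_sketch(params))
    print(get_aln_params_sketch(params, paired=True))
```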

run(input_files, input_metadata, output_files)

The main function to align FASTQ reads to a genome using Bowtie2

Parameters:
  • input_files (dict) – File 0 is the genome file location, file 1 is the FASTQ file
  • input_metadata (dict) –
  • output_files (dict) –
Returns:

  • output_files (dict) – First element is a list of output_bam_files, second element is the matching metadata
  • output_metadata (dict)

untar_index(**kwargs)

Extracts the Bowtie2 index files from the genome index tar file.

Parameters:
  • genome_file_name (str) – Location string of the genome fasta file
  • genome_idx (str) – Location of the Bowtie2 index file
  • bt2_1_file (str) – Location of the <genome>.1.bt2 index file
  • bt2_2_file (str) – Location of the <genome>.2.bt2 index file
  • bt2_3_file (str) – Location of the <genome>.3.bt2 index file
  • bt2_4_file (str) – Location of the <genome>.4.bt2 index file
  • bt2_rev1_file (str) – Location of the <genome>.rev.1.bt2 index file
  • bt2_rev2_file (str) – Location of the <genome>.rev.2.bt2 index file
Returns:

Boolean indicating if the task was successful

Return type:

bool
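
A minimal sketch of pulling named index files out of an archive with Python's standard `tarfile` module. It assumes the archive holds files named `<index_name>.1.bt2`, `<index_name>.rev.1.bt2` and so on, possibly inside a directory; `untar_index_sketch` and its `index_name`/`targets` parameters are hypothetical and do not mirror the tool's keyword arguments. The same approach would apply to the BWA .amb/.ann/.bwt/.pac/.sa files.

```python
import os
import shutil
import tarfile
import tempfile
from typing import Dict


def untar_index_sketch(genome_idx: str, index_name: str,
                       targets: Dict[str, str]) -> bool:
    """Copy selected index files out of a tar or tar.gz archive.

    index_name: basename of the index inside the archive, e.g. "genome"
    targets: maps a suffix such as ".rev.1.bt2" to its destination path
    Returns True only if every requested file was found and written.
    """
    try:
        with tarfile.open(genome_idx, "r:*") as tar:
            # Index regular files by basename so nested directories are ignored.
            by_name = {os.path.basename(m.name): m
                       for m in tar.getmembers() if m.isfile()}
            for suffix, dest in targets.items():
                member = by_name.get(index_name + suffix)
                if member is None:
                    return False
                # extractfile() streams the member without trusting its path.
                source = tar.extractfile(member)
                with source, open(dest, "wb") as out:
                    shutil.copyfileobj(source, out)
    except (IOError, tarfile.TarError):
        return False
    return True


if __name__ == "__main__":
    work = tempfile.mkdtemp()
    src = os.path.join(work, "genome.1.bt2")
    with open(src, "wb") as handle:
        handle.write(b"index bytes")
    archive = os.path.join(work, "genome.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(src, arcname="genome/genome.1.bt2")
    dest = os.path.join(work, "extracted.1.bt2")
    print(untar_index_sketch(archive, "genome", {".1.bt2": dest}))
```

Matching on the exact basename keeps `.1.bt2` from accidentally matching `.rev.1.bt2`.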

BWA - ALN

class tool.bwa_aligner.bwaAlignerTool(configuration=None)

Tool for aligning sequence reads to a genome using BWA

bwa_aligner_paired(**kwargs)

BWA ALN Aligner - Paired End

Parameters:
  • genome_file_loc (str) – Location of the genomic fasta
  • read_file_loc1 (str) – Location of the FASTQ file
  • read_file_loc2 (str) – Location of the paired FASTQ file
  • bam_loc (str) – Location of the output aligned bam file
  • amb_file (str) – Location of the amb index file
  • ann_file (str) – Location of the ann index file
  • bwt_file (str) – Location of the bwt index file
  • pac_file (str) – Location of the pac index file
  • sa_file (str) – Location of the sa index file
  • aln_params (dict) – Alignment parameters
Returns:

bam_loc – Location of the output file

Return type:

str

bwa_aligner_single(**kwargs)

BWA ALN Aligner - Single Ended

Parameters:
  • genome_file_loc (str) – Location of the genomic fasta
  • read_file_loc (str) – Location of the FASTQ file
  • bam_loc (str) – Location of the output aligned bam file
  • amb_file (str) – Location of the amb index file
  • ann_file (str) – Location of the ann index file
  • bwt_file (str) – Location of the bwt index file
  • pac_file (str) – Location of the pac index file
  • sa_file (str) – Location of the sa index file
  • aln_params (dict) – Alignment parameters
Returns:

bam_loc – Location of the output file

Return type:

str
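
BWA ALN is a two-stage process: `bwa aln` computes suffix-array coordinates (.sai) for each read file, then `bwa samse` (single-end) or `bwa sampe` (paired-end) turns them into SAM. A sketch of that command sequence, assuming `bwa` and `samtools` are installed and that genome_file_loc doubles as the BWA index prefix; the helper name and the intermediate .sai/.sam file names are hypothetical.

```python
import shlex
from typing import List, Optional


def bwa_aln_commands(genome_file_loc: str, read_file_loc: str, bam_loc: str,
                     read_file_loc2: Optional[str] = None) -> List[List[str]]:
    """Hypothetical helper: the command sequence for a BWA ALN alignment.

    genome_file_loc is used as the index prefix, which is what
    "bwa index" produces when no -p option is given.
    """
    sam_loc = bam_loc + ".sam"
    sai1 = read_file_loc + ".sai"
    cmds = [["bwa", "aln", "-f", sai1, genome_file_loc, read_file_loc]]
    if read_file_loc2 is None:
        cmds.append(["bwa", "samse", "-f", sam_loc,
                     genome_file_loc, sai1, read_file_loc])
    else:
        sai2 = read_file_loc2 + ".sai"
        cmds.append(["bwa", "aln", "-f", sai2, genome_file_loc, read_file_loc2])
        cmds.append(["bwa", "sampe", "-f", sam_loc, genome_file_loc,
                     sai1, sai2, read_file_loc, read_file_loc2])
    # Convert the SAM output into BAM.
    cmds.append(["samtools", "view", "-b", "-o", bam_loc, sam_loc])
    return cmds


if __name__ == "__main__":
    for cmd in bwa_aln_commands("genome.fa", "reads.fq", "reads.bam"):
        print(shlex.join(cmd))
```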

static get_aln_params(params)

Function to handle the extraction of command-line parameters and format them for use in the BWA ALN aligner

Parameters:
  • params (dict) –
Returns:

Return type:

list

run(input_files, input_metadata, output_files)

The main function to align FASTQ reads to a genome using BWA ALN

Parameters:
  • input_files (dict) – File 0 is the genome file location, file 1 is the FASTQ file
  • input_metadata (dict) –
  • output_files (dict) –
Returns:

  • output_files (dict) – First element is a list of output_bam_files, second element is the matching metadata
  • output_metadata (dict)

untar_index(**kwargs)

Extracts the BWA index files from the genome index tar file.

Parameters:
  • genome_file_name (str) – Location string of the genome fasta file
  • genome_idx (str) – Location of the BWA index file
  • amb_file (str) – Location of the amb index file
  • ann_file (str) – Location of the ann index file
  • bwt_file (str) – Location of the bwt index file
  • pac_file (str) – Location of the pac index file
  • sa_file (str) – Location of the sa index file
Returns:

Boolean indicating if the task was successful

Return type:

bool

BWA - MEM

class tool.bwa_mem_aligner.bwaAlignerMEMTool(configuration=None)

Tool for aligning sequence reads to a genome using BWA

bwa_aligner_paired(**kwargs)

BWA MEM Aligner - Paired End

Parameters:
  • genome_file_loc (str) – Location of the genomic fasta
  • read_file_loc1 (str) – Location of the FASTQ file
  • read_file_loc2 (str) – Location of the paired FASTQ file
  • bam_loc (str) – Location of the output aligned bam file
  • amb_file (str) – Location of the amb index file
  • ann_file (str) – Location of the ann index file
  • bwt_file (str) – Location of the bwt index file
  • pac_file (str) – Location of the pac index file
  • sa_file (str) – Location of the sa index file
  • mem_params (dict) – Alignment parameters
Returns:

bam_loc – Location of the output file

Return type:

str
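
Unlike BWA ALN, `bwa mem` aligns in a single step and writes SAM to standard output, so it is natural to stream that straight into `samtools view -b`. A sketch assuming both executables are installed; the helper names are hypothetical, and mem_params is taken here as an already-flattened argument list, whereas the documented parameter is a dict.

```python
import shlex
import subprocess
from typing import List, Optional


def bwa_mem_command(genome_file_loc: str, read_file_loc1: str,
                    read_file_loc2: Optional[str] = None,
                    mem_params: Optional[List[str]] = None) -> List[str]:
    """Hypothetical helper: bwa mem [options] <index prefix> <reads1> [reads2]."""
    cmd = ["bwa", "mem"] + list(mem_params or [])
    cmd += [genome_file_loc, read_file_loc1]
    if read_file_loc2 is not None:
        cmd.append(read_file_loc2)  # second mate file for paired-end data
    return cmd


def run_bwa_mem_to_bam(cmd: List[str], bam_loc: str) -> bool:
    """Stream bwa mem's SAM output straight into samtools to write BAM."""
    bwa = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    samtools = subprocess.Popen(["samtools", "view", "-b", "-o", bam_loc, "-"],
                                stdin=bwa.stdout)
    bwa.stdout.close()  # let bwa receive SIGPIPE if samtools exits early
    samtools.wait()
    bwa.wait()
    return bwa.returncode == 0 and samtools.returncode == 0


if __name__ == "__main__":
    print(shlex.join(bwa_mem_command("genome.fa", "r1.fq", "r2.fq", ["-t", "4"])))
```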

bwa_aligner_single(**kwargs)

BWA MEM Aligner - Single Ended

Parameters:
  • genome_file_loc (str) – Location of the genomic fasta
  • read_file_loc (str) – Location of the FASTQ file
  • bam_loc (str) – Location of the output aligned bam file
  • amb_file (str) – Location of the amb index file
  • ann_file (str) – Location of the ann index file
  • bwt_file (str) – Location of the bwt index file
  • pac_file (str) – Location of the pac index file
  • sa_file (str) – Location of the sa index file
  • mem_params (dict) – Alignment parameters
Returns:

bam_loc – Location of the output file

Return type:

str

static get_mem_params(params)

Function to handle the extraction of command-line parameters and format them for use in the BWA MEM aligner

Parameters:
  • params (dict) –
Returns:

Return type:

list

run(input_files, input_metadata, output_files)

The main function to align FASTQ reads to a genome using BWA MEM

Parameters:
  • input_files (dict) – File 0 is the genome file location, file 1 is the FASTQ file
  • input_metadata (dict) –
  • output_files (dict) –
Returns:

  • output_files (dict) – First element is a list of output_bam_files, second element is the matching metadata
  • output_metadata (dict)

untar_index(**kwargs)

Extracts the BWA index files from the genome index tar file.

Parameters:
  • genome_file_name (str) – Location string of the genome fasta file
  • genome_idx (str) – Location of the BWA index file
  • amb_file (str) – Location of the amb index file
  • ann_file (str) – Location of the ann index file
  • bwt_file (str) – Location of the bwt index file
  • pac_file (str) – Location of the pac index file
  • sa_file (str) – Location of the sa index file
Returns:

Boolean indicating if the task was successful

Return type:

bool

BS-Seeker2 Aligner

class tool.bs_seeker_aligner.bssAlignerTool(configuration=None)

Script from BS-Seeker2 for aligning reads to the reference genome. In this case it uses Bowtie2.

bs_seeker_aligner(**kwargs)

Alignment of the paired ends to the reference genome

Generates bam files for the alignments

This is performed by running the external program rather than reimplementing the code from the main function, which makes it easier to incorporate updates to BS-Seeker2.

Parameters:
  • input_fastq1 (str) – Location of paired end FASTQ file 1
  • input_fastq2 (str) – Location of paired end FASTQ file 2
  • aligner (str) – Aligner to use
  • aligner_path (str) – Location of the aligner
  • genome_fasta (str) – Location of the genome FASTA file
  • genome_idx (str) – Location of the tar.gz genome index file
  • bam_out (str) – Location of the aligned bam file
Returns:

bam_out – Location of the BAM file generated during the alignment.

Return type:

file

bs_seeker_aligner_single(**kwargs)

Alignment of single-ended reads to the reference genome

Generates bam files for the alignments

This is performed by running the external program rather than reimplementing the code from the main function, which makes it easier to incorporate updates to BS-Seeker2.

Parameters:
  • input_fastq1 (str) – Location of paired end FASTQ file 1
  • input_fastq2 (str) – Location of paired end FASTQ file 2
  • aligner (str) – Aligner to use
  • aligner_path (str) – Location of the aligner
  • genome_fasta (str) – Location of the genome FASTA file
  • genome_idx (str) – Location of the tar.gz genome index file
  • bam_out (str) – Location of the aligned bam file
Returns:

bam_out – Location of the BAM file generated during the alignment.

Return type:

file

static get_aln_params(params, paired=False)

Function to handle the extraction of command-line parameters and format them for use in the Bowtie2 aligner

Parameters:
  • params (dict) –
  • paired (bool) – Indicate if the parameters are paired-end specific. [DEFAULT=False]
Returns:

Return type:

list

run(input_files, input_metadata, output_files)

Tool for aligning FASTQ reads to the genome assembly using BS-Seeker2. In this case it is using Bowtie2

Parameters:
  • input_files (list) – FASTQ file
  • output_files (list) – Results files.
  • input_metadata (list) –
Returns:

array – Location of the aligned BAM file

Return type:

list

run_aligner(genome_idx, bam_out, script, params)

Run the aligner

Parameters:
  • genome_idx (str) – Location of the genome index archive
  • bam_out (str) – Location of the output bam file
  • script (str) – Location of the BS Seeker2 aligner script
  • params (list) – Parameter list for the aligner
Returns:

True if the function completed successfully

Return type:

bool

Filters

BioBamBam Filter

class tool.biobambam_filter.biobambam(configuration=None)

Tool to sort and filter bam files

biobambam_filter_alignments(**kwargs)

Sorts and filters the bam file.

It is important that all duplicate alignments have been removed. This can be run as an intermediate step, but should always be run as a check to ensure that the files are sorted and duplicates have been removed.

Parameters:
  • bam_file_in (str) – Location of the input bam file
  • bam_file_out (str) – Location of the output bam file
  • tmp_dir (str) – Temporary location for intermediate files during sorting
Returns:

bam_file_out – Location of the output bam file

Return type:

str

run(input_files, input_metadata, output_files)

The main function to run the BioBamBam filter to remove duplicates and spurious reads from the BAM files before analysis.

Parameters:
  • input_files (dict) – List of input bam file locations where 0 is the bam data file
  • input_metadata (dict) – Matching metadata for the input files
  • output_files (dict) – List of output file locations
Returns:

  • output_files (dict) – Filtered BAM file.
  • output_metadata (dict) – List of matching metadata dict objects

BS-Seeker2 Filter

class tool.bs_seeker_filter.filterReadsTool(configuration=None)

Script from BS-Seeker2 for filtering FASTQ files to remove repeats

bss_seeker_filter(**kwargs)

This step is optional, but removes reads that can be problematic for the alignment of whole genome datasets.

If performing RRBS, this step can be skipped.

This is a function that is installed as part of the BS-Seeker installation process.

Parameters:
  • infile (str) – Location of the FASTQ file
Returns:

outfile – Location of the filtered FASTQ file

Return type:

str

run(input_files, input_metadata, output_files)

Tool for filtering duplicate entries from FASTQ files using BS-Seeker2

Parameters:
  • input_files (list) – FASTQ file
  • input_metadata (list) –
Returns:

array – Location of the filtered FASTQ file

Return type:

list

Trim Galore

The tool.trimgalore.trimgalore class is also used as a filter; its full description (get_trimgalore_params, run, trimgalore_paired, trimgalore_single and trimgalore_version) appears under TrimGalore in the File Validation section above.

Peak Calling

BS-Seeker2 Methylation Caller

iDEAR

class tool.idear.idearTool(configuration=None)

Tool for peak calling for iDamID-seq data

idear_peak_calling(**kwargs)

Make iDamID-seq peak calls. These are saved as BED files that can then be displayed on genome browsers. Uses an R script that wraps the iDEAR protocol.

Parameters:
  • sample_name (str) –
  • bg_name (str) –
  • sample_bam_tar_file (str) – Location of the aligned sequences in bam format
  • bg_bam_tar_file (str) – Location of the aligned background sequences in bam format
  • species (str) – Species name for the alignments
  • assembly (str) – Assembly used for the aligned sequences
  • peak_bed (str) – Location of the peak bed file
Returns:

peak_bed – Location of the collated bed file

Return type:

str

run(input_files, input_metadata, output_files)

The main function to run iDEAR for peak calling over a given BAM file and matching background BAM file.

Parameters:
  • input_files (list) – List of input bam file locations where 0 is the bam data file and 1 is the matching background bam file
  • input_metadata (dict) –
Returns:

  • output_files (list) – List of locations for the output files.
  • output_metadata (list) – List of matching metadata dict objects

iNPS

class tool.inps.inps(configuration=None)

Tool for peak calling for MNase-seq data

inps_peak_calling(**kwargs)

Convert BAM to BED, then make nucleosome peak calls. These are saved as BED files that can then be displayed on genome browsers.

Parameters:
  • bam_file (str) – Location of the aligned sequences in bam format
  • peak_bed (str) – Location of the collated bed file of nucleosome peak calls
Returns:

peak_bed – Location of the collated bed file of nucleosome peak calls

Return type:

str

run(input_files, input_metadata, output_files)

The main function to run iNPS for peak calling over a given BAM file and matching background BAM file.

Parameters:
  • input_files (list) – List of input bam file locations where 0 is the bam data file and 1 is the matching background bam file
  • input_metadata (dict) –
Returns:

  • output_files (list) – List of locations for the output files.
  • output_metadata (list) – List of matching metadata dict objects

Kallisto Quantification

class tool.kallisto_quant.kallistoQuantificationTool(configuration=None)

Tool for quantifying RNA-seq alignments to calculate expression levels of genes within a genome.

kallisto_quant_paired(**kwargs)

Kallisto quantifier for paired end RNA-seq data

Parameters:
  • idx_loc (str) – Location of the Kallisto index file
  • fastq_file_loc_01 (str) – Location of the FASTQ sequence file
  • fastq_file_loc_02 (str) – Location of the paired FASTQ sequence file
Returns:

wig_file_loc – Location of the wig file containing the levels of expression

Return type:

str

kallisto_quant_single(**kwargs)

Kallisto quantifier for single end RNA-seq data

Parameters:
  • idx_loc (str) – Location of the Kallisto index file
  • fastq_file_loc (str) – Location of the FASTQ sequence file
Returns:

wig_file_loc – Location of the wig file containing the levels of expression

Return type:

str
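
For orientation, the single-end and paired-end quantifiers differ mainly in how `kallisto quant` is invoked. In kallisto, single-end mode needs `--single` together with `-l`/`-s`, the estimated fragment length mean and standard deviation; whether this tool derives those values from seq_read_stats is not stated here. kallisto writes its results, including abundance.tsv, into the `-o` directory. `kallisto_quant_command` is a hypothetical helper.

```python
import shlex
from typing import List, Optional, Sequence


def kallisto_quant_command(idx_loc: str, out_dir: str, fastq_files: Sequence[str],
                           fragment_mean: Optional[float] = None,
                           fragment_sd: Optional[float] = None) -> List[str]:
    """Hypothetical helper: build a 'kallisto quant' command.

    Single-end runs need --single plus an estimated fragment length
    mean (-l) and standard deviation (-s); paired-end runs take two files.
    """
    cmd = ["kallisto", "quant", "-i", idx_loc, "-o", out_dir]
    if len(fastq_files) == 1:
        if fragment_mean is None or fragment_sd is None:
            raise ValueError("single-end mode needs fragment_mean and fragment_sd")
        cmd += ["--single", "-l", str(fragment_mean), "-s", str(fragment_sd)]
    elif len(fastq_files) != 2:
        raise ValueError("expected one (single-end) or two (paired-end) FASTQ files")
    cmd += list(fastq_files)
    return cmd


if __name__ == "__main__":
    print(shlex.join(kallisto_quant_command("cdna.idx", "quant_out", ["reads.fq"], 200, 20)))
    print(shlex.join(kallisto_quant_command("cdna.idx", "quant_out", ["r1.fq", "r2.fq"])))
```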

kallisto_tsv2bed(**kwargs)

Converts the Kallisto TSV output to a BigBed file so that it can be viewed within a genome browser.

kallisto_tsv2gff(**kwargs)

Converts the Kallisto TSV output to a GFF file so that it can be viewed within a genome browser.

static load_gff_ensembl(gff_file)

Function to extract all of the genes and their locations from a GFF file generated by ensembl

static load_gff_ucsc(gff_file)

Function to extract all of the genes and their locations from a GFF file generated by UCSC
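
Both loaders pull gene coordinates out of a GFF file. A sketch for GFF3 input, assuming the nine tab-separated columns of the GFF3 format and that gene rows carry a gene_id attribute, as Ensembl GFF3 files do; `load_gff_genes` is a hypothetical, source-agnostic stand-in and does not reproduce any Ensembl- or UCSC-specific handling the tool may have.

```python
from typing import Dict, Iterable, Tuple

GeneLocation = Tuple[str, int, int, str]  # (seqid, start, end, strand)


def load_gff_genes(lines: Iterable[str]) -> Dict[str, GeneLocation]:
    """Collect gene locations from GFF3 lines.

    Keeps rows whose feature type (column 3) is 'gene' and keys them by the
    'gene_id' attribute, falling back to 'ID' when gene_id is absent.
    """
    genes: Dict[str, GeneLocation] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blanks, comments and directives such as ##gff-version
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue
        # Column 9 holds key=value pairs separated by semicolons.
        attrs = dict(item.split("=", 1) for item in cols[8].split(";") if "=" in item)
        gene_id = attrs.get("gene_id") or attrs.get("ID")
        if gene_id is None:
            continue
        genes[gene_id] = (cols[0], int(cols[3]), int(cols[4]), cols[6])
    return genes


if __name__ == "__main__":
    rows = [
        "##gff-version 3",
        "1\tensembl\tgene\t11869\t14409\t.\t+\t.\tID=gene:ENSG00000223972;gene_id=ENSG00000223972",
    ]
    print(load_gff_genes(rows))
```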

run(input_files, input_metadata, output_files)

Tool for calculating the level of expression

Parameters:
  • input_files (list) – Kallisto index file and the FASTQ file(s) for the experimental alignments
  • input_metadata (list) –
Returns:

array – First element is a list of the index files. Second element is a list of the matching metadata

Return type:

list

static seq_read_stats(file_in)

Calculate the mean and standard deviation of the read lengths in a FASTQ file

Parameters:
  • file_in (str) – Location of a FASTQ file
Returns:

  • mean – Mean length of the sequenced reads
  • std – Standard deviation of the lengths of the sequenced reads

Return type:

dict
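
The statistic described above is straightforward to sketch with the standard `statistics` module. The sketch assumes an uncompressed FASTQ file with one sequence line per four-line record, and uses the population standard deviation because the description does not say which variant the tool uses; `read_length_stats` and `seq_read_stats_sketch` are hypothetical names.

```python
import statistics
from typing import Dict, Iterable


def read_length_stats(fastq_lines: Iterable[str]) -> Dict[str, float]:
    """Mean and standard deviation of read lengths in 4-line FASTQ records."""
    lengths = [len(line.strip())
               for index, line in enumerate(fastq_lines)
               if index % 4 == 1]  # 2nd line of each record is the sequence
    # statistics.mean raises StatisticsError for an empty file.
    return {"mean": statistics.mean(lengths), "std": statistics.pstdev(lengths)}


def seq_read_stats_sketch(file_in: str) -> Dict[str, float]:
    """Apply read_length_stats to a FASTQ file on disk."""
    with open(file_in) as handle:
        return read_length_stats(handle)


if __name__ == "__main__":
    records = ["@r1", "ACGT", "+", "IIII", "@r2", "ACGTACGT", "+", "IIIIIIII"]
    print(read_length_stats(records))
```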

MACS2

class tool.macs2.macs2(configuration=None)

Tool for peak calling for ChIP-seq data

static get_macs2_params(params)

Function to handle the extraction of command-line parameters and format them for use with MACS2

Parameters:
  • params (dict) –
Returns:

Return type:

list

macs2_peak_calling(**kwargs)

Function to run MACS2 for peak calling on aligned sequence files and normalised against a provided background set of alignments.

Parameters:
  • name (str) – Name to be used to identify the files
  • bam_file (str) – Location of the aligned FASTQ files as a bam file
  • bai_file (str) – Location of the bam index file
  • bam_file_bgd (str) – Location of the aligned FASTQ files as a bam file representing background values for the cell
  • bai_file_bgd (str) – Location of the background bam index file
  • narrowpeak (str) – Location of the output narrowpeak file
  • summits_bed (str) – Location of the output summits bed file
  • broadpeak (str) – Location of the output broadpeak file
  • gappedpeak (str) – Location of the output gappedpeak file
  • chromosome (str) – If the tool is to be run over a single chromosome the matching chromosome name should be specified. If None then the whole bam file is analysed
Returns:

  • narrowPeak (file) – BED6+4 file - ideal for transcription factor binding site identification
  • summitPeak (file) – BED4+1 file - Contains the peak summit locations for every peak
  • broadPeak (file) – BED6+3 file - ideal for histone binding site identification
  • gappedPeak (file) – BED12+3 file - Contains a merged set of the broad and narrow peak files

Definitions for each of these files come from the MACS2 documentation at https://github.com/taoliu/MACS
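
A sketch of the MACS2 call behind this function, assuming the `macs2` executable is installed and leaving `-g` at MACS2's default (human) genome size. In MACS2, the narrowPeak and summits files come from the default mode and the broadPeak and gappedPeak files from `--broad` mode, so a function returning all four presumably runs `macs2 callpeak` more than once; that is an inference, not something stated here. The helper names are hypothetical.

```python
import os
import shlex
from typing import List, Optional


def macs2_callpeak_command(name: str, bam_file: str, out_dir: str,
                           bam_file_bgd: Optional[str] = None,
                           broad: bool = False) -> List[str]:
    """Hypothetical helper: a 'macs2 callpeak' run with an optional control."""
    cmd = ["macs2", "callpeak", "-t", bam_file, "-f", "BAM",
           "-n", name, "--outdir", out_dir]
    if bam_file_bgd is not None:
        cmd += ["-c", bam_file_bgd]  # background/control alignments
    if broad:
        cmd.append("--broad")  # broad mode emits broadPeak and gappedPeak files
    return cmd


def expected_macs2_outputs(name: str, out_dir: str, broad: bool = False) -> List[str]:
    """Peak files MACS2 names after the -n prefix in each mode."""
    suffixes = (["_peaks.broadPeak", "_peaks.gappedPeak"] if broad
                else ["_peaks.narrowPeak", "_summits.bed"])
    return [os.path.join(out_dir, name + suffix) for suffix in suffixes]


if __name__ == "__main__":
    print(shlex.join(macs2_callpeak_command("sample", "sample.bam", "peaks", "input.bam")))
    print(expected_macs2_outputs("sample", "peaks"))
```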

macs2_peak_calling_nobgd(**kwargs)

Function to run MACS2 for peak calling on aligned sequence files without a background dataset for normalisation.

Parameters:
  • name (str) – Name to be used to identify the files
  • bam_file (str) – Location of the aligned FASTQ files as a bam file
  • bai_file (str) – Location of the bam index file
  • narrowpeak (str) – Location of the output narrowpeak file
  • summits_bed (str) – Location of the output summits bed file
  • broadpeak (str) – Location of the output broadpeak file
  • gappedpeak (str) – Location of the output gappedpeak file
  • chromosome (str) – If the tool is to be run over a single chromosome the matching chromosome name should be specified. If None then the whole bam file is analysed
Returns:

  • narrowPeak (file) – BED6+4 file - ideal for transcription factor binding site identification
  • summitPeak (file) – BED4+1 file - Contains the peak summit locations for every peak
  • broadPeak (file) – BED6+3 file - ideal for histone binding site identification
  • gappedPeak (file) – BED12+3 file - Contains a merged set of the broad and narrow peak files

Definitions for each of these files come from the MACS2 documentation at https://github.com/taoliu/MACS
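
A small sketch for reading the narrowPeak output, assuming the standard ten-column BED6+4 layout (chrom, start, end, name, score, strand, signalValue, pValue, qValue, peak), where the last column is the summit offset from the start and -1 means no summit was called. `parse_narrowpeak_line` is a hypothetical helper.

```python
from typing import Any, Dict

NARROWPEAK_FIELDS = ("chrom", "start", "end", "name", "score", "strand",
                     "signal_value", "p_value", "q_value", "peak")


def parse_narrowpeak_line(line: str) -> Dict[str, Any]:
    """Parse one BED6+4 narrowPeak line into a dict."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != len(NARROWPEAK_FIELDS):
        raise ValueError("narrowPeak lines have 10 tab-separated columns")
    record: Dict[str, Any] = dict(zip(NARROWPEAK_FIELDS, cols))
    for key in ("start", "end", "score", "peak"):
        record[key] = int(record[key])
    for key in ("signal_value", "p_value", "q_value"):
        record[key] = float(record[key])  # p/q values are -log10 transformed
    # 'peak' is the summit offset from 'start'; -1 means no summit was called.
    record["summit"] = record["start"] + record["peak"] if record["peak"] >= 0 else None
    return record


if __name__ == "__main__":
    print(parse_narrowpeak_line("chr1\t100\t400\tpeak_1\t57\t.\t5.2\t8.1\t5.7\t150"))
```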

run(input_files, input_metadata, output_files)

The main function to run MACS2 for peak calling over a given BAM file and its matching background BAM file.

Parameters:
  • input_files (dict) – List of input bam file locations where 0 is the bam data file and 1 is the matching background bam file
  • metadata (dict) –
Returns:

  • output_files (dict) – List of locations for the output files.
  • output_metadata (dict) – List of matching metadata dict objects
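Because the background BAM file is optional, a wrapper has to choose between peak calling with a background and `macs2_peak_calling_nobgd`. A minimal sketch of that decision, assuming the input order described for run() (element 0 is the data BAM, element 1 the background BAM) and the samtools convention of naming the index `<bam>.bai`; `choose_macs2_mode` and its return keys are hypothetical.

```python
from typing import Dict, List, Optional


def choose_macs2_mode(input_files: List[str]) -> Dict[str, Optional[str]]:
    """Pick the peak-calling mode from the number of BAM inputs (hypothetical helper)."""
    if not input_files:
        raise ValueError("at least one BAM file is required")
    bam_file = input_files[0]
    # Element 1, if present, is the matching background BAM file
    background = input_files[1] if len(input_files) > 1 else None
    return {
        # "no_background" corresponds to macs2_peak_calling_nobgd
        "mode": "background" if background else "no_background",
        "bam_file": bam_file,
        "bai_file": bam_file + ".bai",  # assumes samtools-style index naming
        "background_bam": background,   # hypothetical key name
    }
```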

Hi-C Parsing

The following tools are split out of the Hi-C pipelines built with the TADbit library.

FASTQ mapping

class tool.tb_full_mapping.tbFullMappingTool

Tool for mapping paired-end FASTQ files to the GEM index files

run(input_files, input_metadata, output_files)

The main function to map the FASTQ files to the GEM file over different window sizes ready for alignment

Parameters:
  • input_files (list) –
    gem_file : str
    Location of the genome GEM index file
    fastq_file_bgd : str
    Location of the FASTQ file
  • metadata (dict) –
    windows : list
    List of lists with the window sizes to be computed
    enzyme_name : str
    Restriction enzyme used [OPTIONAL]
Returns:

  • output_files (list) – List of locations for the output files.
  • output_metadata (list) – List of matching metadata dict objects
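Iterative mapping maps each read several times, truncated to each window listed in `windows`. A minimal sketch of the truncation step, assuming each window is a 1-based, inclusive `[start, end]` pair such as `[1, 25]`; `trim_to_windows` is a hypothetical helper, not the TADbit implementation.

```python
from typing import List, Sequence


def trim_to_windows(read: str, windows: Sequence[Sequence[int]]) -> List[str]:
    """Return the read truncated to each 1-based, inclusive [start, end] window."""
    pieces = []
    for start, end in windows:
        if start < 1 or end < start:
            raise ValueError(f"invalid window {[start, end]}")
        # Convert 1-based inclusive bounds to a Python slice
        pieces.append(read[start - 1:end])
    return pieces
```

With the windows `[[1, 25], [1, 50], [1, 75], [1, 100]]`, a 100-base read yields pieces of 25, 50, 75 and 100 bases.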

tb_full_mapping_frag(**kwargs)

Function to map the FASTQ files to the GEM file based on fragments derived from the restriction enzyme that was used.

Parameters:
  • gem_file (str) – Location of the genome GEM index file
  • fastq_file_bgd (str) – Location of the FASTQ file
  • enzyme_name (str) – Restriction enzyme name (e.g. MboI)
  • windows (list) – List of lists with the window sizes to be computed
  • window_file (str) – Location of the first window index file
Returns:

window_file – Location of the window index file

Return type:

str
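Fragment-based mapping relies on the positions where the restriction enzyme cuts the genome. A minimal sketch of an in-silico digest, assuming MboI, which recognises GATC and cuts before the G; `restriction_fragments` is a hypothetical helper, and TADbit's own enzyme handling may differ.

```python
import re
from typing import List, Tuple


def restriction_fragments(seq: str, site: str = "GATC", cut_offset: int = 0) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) fragments of an in-silico digest.

    site: recognition sequence (GATC for MboI).
    cut_offset: cut position inside the site; MboI cuts before the G, so 0.
    """
    seq = seq.upper()
    # A lookahead pattern reports every occurrence, including overlapping ones
    pattern = re.compile("(?=" + re.escape(site.upper()) + ")")
    cuts = sorted({m.start() + cut_offset for m in pattern.finditer(seq)})
    # Keep only cuts strictly inside the sequence, then add both ends
    bounds = [0] + [c for c in cuts if 0 < c < len(seq)] + [len(seq)]
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if b > a]
```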

tb_full_mapping_iter(**kwargs)

Function to map the FASTQ files to the GEM file over different window sizes ready for alignment

Parameters:
  • gem_file (str) – Location of the genome GEM index file
  • fastq_file_bgd (str) – Location of the FASTQ file
  • windows (list) – List of lists with the window sizes to be computed
  • window1 (str) – Location of the first window index file
  • window2 (str) – Location of the second window index file
  • window3 (str) – Location of the third window index file
  • window4 (str) – Location of the fourth window index file
Returns:

  • window1 (str) – Location of the first window index file
  • window2 (str) – Location of the second window index file
  • window3 (str) – Location of the third window index file
  • window4 (str) – Location of the fourth window index file

Map Parsing

class tool.tb_parse_mapping.tbParseMappingTool

Tool for parsing the mapped reads and generating the list of paired ends that have a match at both ends.

run(input_files, input_metadata, output_files)

The main function to parse the aligned reads and return the matching pairs. Parsing of the mappings can be either iterative or fragment-based. For iterative parsing, the locations of 4 window files for each end of the paired reads need to be provided. For fragment-based parsing, only 2 window locations for each end need to be provided, along with an enzyme name.

Parameters:
  • input_files (list) –
    genome_file : str
    Location of the genome FASTA file
    window1_1 : str
    Location of the first window index file for the first read end
    window1_2 : str
    Location of the second window index file for the first read end
    window1_3 : str
    [OPTIONAL] Location of the third window index file for the first read end
    window1_4 : str
    [OPTIONAL] Location of the fourth window index file for the first read end
    window2_1 : str
    Location of the first window index file for the second read end
    window2_2 : str
    Location of the second window index file for the second read end
    window2_3 : str
    [OPTIONAL] Location of the third window index file for the second read end
    window2_4 : str
    [OPTIONAL] Location of the fourth window index file for the second read end
  • metadata (dict) –
    windows : list
    List of lists with the window sizes to be computed
    enzyme_name : str
    Restriction enzyme name
    mapping : list
    The mapping function used. The options are iter or frag.
Returns:

  • output_files (list) – List of locations for the output files.
  • output_metadata (dict) – Dict of matching metadata dict objects

Example

Iterative:

from tool import tb_parse_mapping

genome_file = 'genome.fasta'

root_name_1 = "/tmp/data/expt_source_1"
root_name_2 = "/tmp/data/expt_source_2"
windows = [[1, 25], [1, 50], [1, 75], [1, 100]]

windows1 = []
windows2 = []

for w in windows:
    # Window bounds are ints, so format them into the file name
    tail = "_full_{}-{}.map".format(w[0], w[1])
    windows1.append(root_name_1 + tail)
    windows2.append(root_name_2 + tail)

# Order: genome, the 4 window files for end 1, then the 4 for end 2
files = [genome_file] + windows1 + windows2

tpm = tb_parse_mapping.tbParseMappingTool()
metadata = {
    'windows': windows,
    'enzyme_name': 'MboI',
    'mapping': ['iter', 'iter'],
    'expt_name': 'test',
}
# Hypothetical output location; run() is documented as
# run(input_files, input_metadata, output_files)
output_files = ['/tmp/data/expt_reads.tsv']
tpm_files, tpm_meta = tpm.run(files, metadata, output_files)

Fragment based mapping:

from tool import tb_parse_mapping

genome_file = 'genome.fasta'

root_name_1 = "/tmp/data/expt_source_1"
root_name_2 = "/tmp/data/expt_source_2"
windows = [[1, 100]]

# Window bounds are ints, so convert them before building the names
start = str(windows[0][0])
end = str(windows[0][1])

window1_1 = root_name_1 + "_full_" + start + "-" + end + ".map"
window1_2 = root_name_1 + "_frag_" + start + "-" + end + ".map"

window2_1 = root_name_2 + "_full_" + start + "-" + end + ".map"
window2_2 = root_name_2 + "_frag_" + start + "-" + end + ".map"

# Order: genome, the full and frag windows for end 1, then for end 2
files = [
    genome_file,
    window1_1, window1_2,
    window2_1, window2_2,
]

tpm = tb_parse_mapping.tbParseMappingTool()
metadata = {
    'windows': windows,
    'enzyme_name': 'MboI',
    'mapping': ['frag', 'frag'],
    'expt_name': 'test',
}
# Hypothetical output location; run() is documented as
# run(input_files, input_metadata, output_files)
output_files = ['/tmp/data/expt_reads.tsv']
tpm_files, tpm_meta = tpm.run(files, metadata, output_files)

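The examples above build the input list in a fixed order: the genome file, then the windows for the first read end, then those for the second. A minimal sketch of how such a list could be split and checked against the mapping mode, assuming both ends supply the same number of windows (2 to 4 for iter, exactly 2 for frag); `split_parse_inputs` is hypothetical and not part of this tool.

```python
from typing import Dict, List


def split_parse_inputs(input_files: List[str], mapping: str) -> Dict[str, object]:
    """Split [genome_file, end-1 windows..., end-2 windows...] (hypothetical helper)."""
    if mapping not in ("iter", "frag"):
        raise ValueError(f"unknown mapping {mapping!r}")
    if not input_files:
        raise ValueError("input_files is empty")
    genome_file, windows = input_files[0], input_files[1:]
    if len(windows) % 2:
        raise ValueError("both read ends need the same number of windows")
    per_end = len(windows) // 2
    if mapping == "frag" and per_end != 2:
        raise ValueError("frag mapping needs 2 windows per end")
    if mapping == "iter" and not 2 <= per_end <= 4:
        raise ValueError("iter mapping needs 2 to 4 windows per end")
    return {
        "genome_file": genome_file,
        "end1": windows[:per_end],
        "end2": windows[per_end:],
    }
```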
tb_parse_mapping_frag(**kwargs)

Function to map the aligned reads and return the matching pairs

Parameters:
  • genome_seq (dict) – Object containing the sequence of each of the chromosomes
  • enzyme_name (str) – Name of the enzyme used to digest the genome
  • window1_full (str) – Location of the full-read window index file for the first read end
  • window1_frag (str) – Location of the fragment window index file for the first read end
  • window2_full (str) – Location of the full-read window index file for the second read end
  • window2_frag (str) – Location of the fragment window index file for the second read end
  • reads (str) – Location of the reads that have a matching location at both ends of the paired reads
Returns:

reads – Location of the intersection of mapped reads that have matching reads in both paired-end files

Return type:

str
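The returned reads file keeps only pairs where both ends mapped. A minimal sketch of that intersection, assuming each end has already been reduced to a dict from read ID to a `(chromosome, position, strand)` hit; `pair_reads` is hypothetical and ignores TADbit's own file formats.

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, int, str]  # (chromosome, position, strand)


def pair_reads(end1: Dict[str, Hit], end2: Dict[str, Hit]) -> List[Tuple[str, Hit, Hit]]:
    """Keep only the read IDs mapped on both ends, sorted by read ID."""
    shared = end1.keys() & end2.keys()  # dict key views support set operations
    return [(read_id, end1[read_id], end2[read_id]) for read_id in sorted(shared)]
```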

tb_parse_mapping_iter(**kwargs)

Function to map the aligned reads and return the matching pairs

Parameters:
  • genome_seq (dict) – Object containing the sequence of each of the chromosomes
  • enzyme_name (str) – Name of the enzyme used to digest the genome
  • window1_1 (str) – Location of the first window index file for the first read end
  • window1_2 (str) – Location of the second window index file for the first read end
  • window1_3 (str) – Location of the third window index file for the first read end
  • window1_4 (str) – Location of the fourth window index file for the first read end
  • window2_1 (str) – Location of the first window index file for the second read end
  • window2_2 (str) – Location of the second window index file for the second read end
  • window2_3 (str) – Location of the third window index file for the second read end
  • window2_4 (str) – Location of the fourth window index file for the second read end
  • reads (str) – Location of the reads that have a matching location at both ends of the paired reads
Returns:

reads – Location of the intersection of mapped reads that have matching reads in both paired-end files

Return type:

str

Filter Aligned Reads

class tool.tb_filter.tbFilterTool(configuration=None)

Tool for filtering out experimental artifacts from the aligned data

run(input_files, input_metadata, output_files)

The main function to filter the reads to remove experimental artifacts

Parameters:
  • input_files (list) –
    reads : str
    Location of the reads that have a matching location at both ends of the paired reads
  • metadata (dict) –
    conservative : bool
    Level of filtering to apply [DEFAULT : True]
Returns:

  • output_files (list) – List of locations for the output files.
  • output_metadata (list) – List of matching metadata dict objects

tb_filter(**kwargs)

Function to filter out experimental artifacts

Parameters:
  • reads (str) – Location of the reads that have a matching location at both ends of the paired reads
  • filtered_reads_file (str) – Location of the filtered reads
  • conservative (bool) – Level of filtering [DEFAULT : True]
Returns:

filtered_reads – Location of the filtered reads

Return type:

str
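PCR duplicates, where two read pairs map to exactly the same positions and strands, are one common Hi-C artifact. A minimal sketch of removing them, assuming pairs are tuples of `(read_id, chr1, pos1, strand1, chr2, pos2, strand2)`; `remove_duplicates` is hypothetical, and this does not reproduce TADbit's filter set or the meaning of `conservative`.

```python
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str, int, str, str, int, str]  # (read_id, chr1, pos1, strand1, chr2, pos2, strand2)


def remove_duplicates(pairs: Iterable[Pair]) -> List[Pair]:
    """Keep the first pair seen for each combination of end positions and strands."""
    seen: Set[tuple] = set()
    kept: List[Pair] = []
    for pair in pairs:
        end_a, end_b = pair[1:4], pair[4:7]
        # Sorting the two ends makes the key independent of which end is listed first
        key = tuple(sorted((end_a, end_b)))
        if key in seen:
            continue
        seen.add(key)
        kept.append(pair)
    return kept
```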

Identify TADs and Compartments

class tool.tb_segment.tbSegmentTool

Tool for finding TADs and compartments in an adjacency matrix

run(input_files, input_metadata, output_files)

The main function to predict TAD sites and compartments for a given resolution from the Hi-C matrix

Parameters:
  • input_files (list) –
    bamin : str
    Location of the tadbit bam paired reads
    biases : str
    Location of the pickled Hi-C biases file
  • metadata (dict) –
    resolution : int
    Resolution of the Hi-C
    workdir : str
    Location of working directory
    ncpus : int
    Number of cpus to use
Returns:

  • output_files (list) – List of locations for the output files.
  • output_metadata (list) – List of matching metadata dict objects

tb_segment(**kwargs)

Function to find TADs and compartments in the Hi-C matrix

Parameters:
  • bamin (str) – Location of the tadbit bam paired reads
  • biases (str) – Location of the pickled Hi-C biases file
  • resolution (int) – Resolution of the Hi-C
  • callers (str) – 1 for TAD calling, 2 for compartment calling
  • workdir (str) – Location of working directory
  • ncpus (int) – Number of cpus to use
Returns:

  • compartments (str) – Location of the TSV file with the compartment definitions
  • tads (str) – Location of the TSV file with the TAD definitions
  • filtered_bins (str) – Location of filtered_bins png
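For intuition about TAD calling, the sketch below uses the insulation-score approach rather than TADbit's own caller: contacts crossing each bin boundary are averaged in a small square next to the diagonal, and local minima mark candidate TAD boundaries. It assumes a dense, symmetric matrix given as a list of lists; `insulation_scores` and `boundary_candidates` are hypothetical names.

```python
import math
from typing import List, Sequence


def insulation_scores(matrix: Sequence[Sequence[float]], window: int) -> List[float]:
    """Score for the boundary between bin i and bin i + 1.

    Averages the window x window square of contacts between bins
    i-window+1..i and i+1..i+window. Boundaries too close to the edge get nan.
    """
    n = len(matrix)
    scores = []
    for i in range(n):
        if i - window + 1 < 0 or i + window > n - 1:
            scores.append(math.nan)
            continue
        total = sum(
            matrix[r][c]
            for r in range(i - window + 1, i + 1)
            for c in range(i + 1, i + window + 1)
        )
        scores.append(total / (window * window))
    return scores


def boundary_candidates(scores: Sequence[float]) -> List[int]:
    """Indices whose score is a strict local minimum (nan neighbours are skipped)."""
    found = []
    for i in range(1, len(scores) - 1):
        a, b, c = scores[i - 1], scores[i], scores[i + 1]
        if any(math.isnan(x) for x in (a, b, c)):
            continue
        if b < a and b < c:
            found.append(i)
    return found
```

With two 4-bin blocks of dense contacts, the only candidate is the boundary between bins 3 and 4.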

Normalize paired end reads file

class tool.tb_normalize.tbNormalizeTool

Tool for normalizing an adjacency matrix

run(input_files, input_metadata, output_files)

The main function for the normalization of the Hi-C matrix to a given resolution

Parameters:
  • input_files (list) –
    bamin : str
    Location of the tadbit bam paired reads
  • metadata (dict) –
    normalization: str
    normalization(s) to apply. Order matters. Choices: [Vanilla, oneD]
    resolution : str
    Resolution of the Hi-C
    min_perc : str
    Lower percentile above which bins are considered good.
    max_perc : str
    Upper percentile below which bins are considered good.
    workdir : str
    Location of working directory
    ncpus : str
    Number of cpus to use
    min_count : str
    Minimum number of reads mapped to a bin (a value of 2500 is suggested). If set, this option overrides perc_zero
    fasta: str
    Location of the fasta file with genome sequence, to compute GC content and number of restriction sites per bin. Required for oneD normalization
    mappability: str
    Location of the file with mappability, required for oneD normalization
    rest_enzyme: str
    For oneD normalization. Name of the restriction enzyme used to do the Hi-C experiment
Returns:

  • output_files (list) – List of locations for the output files.
  • output_metadata (list) – List of matching metadata dict objects

tb_normalize(**kwargs)

Function to normalize the Hi-C matrix at a given resolution

Parameters:
  • bamin (str) – Location of the tadbit bam paired reads
  • normalization (str) – normalization(s) to apply. Order matters. Choices: [Vanilla, oneD]
  • resolution (str) – Resolution of the Hi-C
  • min_perc (str) – Lower percentile above which bins are considered good.
  • max_perc (str) – Upper percentile below which bins are considered good.
  • workdir (str) – Location of working directory
  • ncpus (str) – Number of cpus to use
  • min_count (str) – Minimum number of reads mapped to a bin (a value of 2500 is suggested). If set, this option overrides perc_zero
  • fasta (str) – Location of the fasta file with genome sequence, to compute GC content and number of restriction sites per bin. Required for oneD normalization
  • mappability (str) – Location of the file with mappability, required for oneD normalization
  • rest_enzyme (str) – For oneD normalization. Name of the restriction enzyme used to do the Hi-C experiment
Returns:

  • hic_biases (str) – Location of the Hi-C biases pickle file
  • interactions (str) – Location of interaction decay vs genomic distance pdf
  • filtered_bins (str) – Location of filtered_bins png
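The min_count filter and a coverage-based normalization can be sketched in plain Python. This uses vanilla coverage (VC) normalization, dividing each contact count by the coverage of its row and its column and rescaling by the total; we assume it only approximates what the `Vanilla` option does, and `bins_passing_min_count` and `vanilla_coverage` are hypothetical helpers.

```python
from typing import List, Sequence, Set

Matrix = Sequence[Sequence[float]]


def bins_passing_min_count(matrix: Matrix, min_count: float) -> Set[int]:
    """Indices of bins whose total number of contacts reaches min_count."""
    return {i for i, row in enumerate(matrix) if sum(row) >= min_count}


def vanilla_coverage(matrix: Matrix, keep: Set[int]) -> List[List[float]]:
    """VC normalisation restricted to the bins in keep; filtered bins become 0."""
    n = len(matrix)
    # Coverage of each bin, counted only against bins that passed the filter
    coverage = [sum(matrix[i][j] for j in keep) for i in range(n)]
    total = sum(coverage[i] for i in keep)
    normalised = [[0.0] * n for _ in range(n)]
    for i in keep:
        for j in keep:
            if coverage[i] and coverage[j]:
                normalised[i][j] = matrix[i][j] * total / (coverage[i] * coverage[j])
    return normalised
```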

Extract binned matrix from paired end reads file

class tool.tb_bin.tbBinTool

Tool for binning an adjacency matrix

run(input_files, input_metadata, output_files)

The main function to bin the Hi-C matrix at a given resolution

Parameters:
  • input_files (list) –
    bamin : str
    Location of the tadbit bam paired reads
    biases : str
    Location of the pickled Hi-C biases file
  • input_metadata (dict) –
    resolution : int
    Resolution of the Hi-C
    coord1 : str
    Coordinates of the region to retrieve. By default the whole genome is used; the argument can be either a chromosome name or a coordinate in the form “-c chr3:110000000-120000000”
    coord2 : str
    Coordinates of a second region; the matrix is retrieved for the intersection of this region with the first region.
    norm : str
    Normalization(s) to apply; order matters [DEFAULT : [‘raw’]]. Choices: [norm, decay, raw]
    workdir : str
    Location of working directory
    ncpus : int
    Number of cpus to use
Returns:

  • output_files (list) – List of locations for the output files.
  • output_metadata (list) – List of matching metadata dict objects

tb_bin(**kwargs)

Function to bin to a given resolution the Hi-C matrix

Parameters:
  • bamin (str) – Location of the tadbit bam paired reads
  • biases (str) – Location of the pickled Hi-C biases file
  • resolution (int) – Resolution of the Hi-C
  • coord1 (str) – Coordinates of the region to retrieve. By default the whole genome is used; the argument can be either a chromosome name or a coordinate in the form “-c chr3:110000000-120000000”
  • coord2 (str) – Coordinates of a second region; the matrix is retrieved for the intersection of this region with the first region.
  • norm (list) – Normalization(s) to apply; order matters [DEFAULT : [‘raw’]]. Choices: [norm, decay, raw]
  • workdir (str) – Location of working directory
  • ncpus (int) – Number of cpus to use
Returns:

  • hic_contacts_matrix_raw (str) – Location of the Hi-C raw matrix in text format
  • hic_contacts_matrix_nrm (str) – Location of the Hi-C normalized matrix in text format
  • hic_contacts_matrix_raw_fig (str) – Location of the Hi-C raw matrix in PNG format
  • hic_contacts_matrix_norm_fig (str) – Location of the Hi-C normalized matrix in PNG format
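The coord1 and coord2 values combine a chromosome name with an optional range. A minimal sketch of parsing such a string and mapping it to bin indices, assuming the range is half-open ([start, end)), bins are 0-based and `resolution` bases wide, and the leading `-c` shown above is a command-line flag that is not part of the value; `parse_region` and `region_to_bins` are hypothetical helpers.

```python
import re
from typing import Optional, Tuple

# "chr3" or "chr3:110000000-120000000" (commas inside numbers are allowed)
_REGION = re.compile(r"^(?P<chrom>[^:\s]+)(?::(?P<start>[\d,]+)-(?P<end>[\d,]+))?$")


def parse_region(coord: str) -> Tuple[str, Optional[int], Optional[int]]:
    """Split a region string into (chromosome, start, end).

    start and end are None when a whole chromosome is requested.
    """
    match = _REGION.match(coord.strip())
    if match is None:
        raise ValueError(f"cannot parse region {coord!r}")
    if match.group("start") is None:
        return match.group("chrom"), None, None
    start = int(match.group("start").replace(",", ""))
    end = int(match.group("end").replace(",", ""))
    if end <= start:
        raise ValueError("region end must be greater than its start")
    return match.group("chrom"), start, end


def region_to_bins(start: int, end: int, resolution: int) -> Tuple[int, int]:
    """First and last 0-based bin touched by the half-open range [start, end)."""
    return start // resolution, (end - 1) // resolution
```

At 1 Mb resolution, `chr3:110000000-120000000` covers bins 110 to 119.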

Save Matrix to HDF5 File

class tool.tb_save_hdf5_matrix.tbSaveAdjacencyHDF5Tool

Tool for saving the Hi-C adjacency list into an HDF5 file

run(input_files, output_files, metadata=None)

The main function to save the adjacency list from Hi-C into an HDF5 index file at the defined resolutions.

Parameters:
  • input_files (list) –
    adj_list : str
    Location of the adjacency list
    hdf5_file : str
    Location of the HDF5 output matrix file
  • metadata (dict) –
    resolutions : list
    Levels of resolution at which to save the adjacency list
    assembly : str
    Assembly of the aligned sequences
    normalized : bool
    Whether the dataset should be normalised before saving
Returns:

  • output_files (list) – List of locations for the output files.
  • output_metadata (list) – List of matching metadata dict objects

tb_matrix_hdf5(**kwargs)

Function to save the Hi-C matrix into an HDF5 file

This has to be run sequentially, as multiple streams cannot write to the same HDF5 file. It is a run-once operation: the file is written once and then left as is. There also needs to be a check that no other process is writing to the HDF5 file at the same time; this should be done at the staging and unstaging level to prevent the file from being written to by multiple processes and generating conflicts.

The file needs to include attributes describing the chromosomes at each resolution; see the mg-rest-adjacency hdf5_reader for further details of this requirement. Storing these attributes inside the HDF5 file removes the need to keep chromosome details in secondary storage outside of it.

Parameters:
  • hic_data (hic_data) – Hi-C data object
  • hdf5_file (str) – Location of the HDF5 output matrix file
  • resolution (int) – Resolution at which to read the Hi-C adjacency list
  • chromosomes (list) – List of lists of the chromosome names and their sizes, in the order in which they are presented for indexing
Returns:

hdf5_file – Location of the HDF5 output matrix file

Return type:

str
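The requirement that only one process writes to the HDF5 file at a time can be enforced with an atomically created lock file. A minimal sketch, assuming a sidecar file named `<hdf5_file>.lock` (a hypothetical convention, not one used by this tool); `exclusive_writer` is a hypothetical helper and the actual HDF5 writing is left out.

```python
import os
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def exclusive_writer(target_path: str) -> Iterator[str]:
    """Hold '<target_path>.lock' for the duration of the with-block.

    O_CREAT | O_EXCL makes creation atomic: if another process already holds
    the lock, os.open raises FileExistsError instead of both writing at once.
    """
    lock_path = target_path + ".lock"
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        os.write(fd, str(os.getpid()).encode())  # record the owner for debugging
        yield target_path
    finally:
        os.close(fd)
        os.remove(lock_path)
```

A lock left behind by a crashed process has to be removed by hand, which is the usual trade-off of this approach, and O_EXCL may not be reliable on some older network file systems.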

Generate TAD Predictions

class tool.tb_generate_tads.tbGenerateTADsTool

Tool for taking the adjacency lists and predicting TADs

run(input_files, output_files, metadata=None)

The main function to predict TAD sites for a given resolution from the Hi-C matrix

Parameters:
  • input_files (list) –
    adj_list : str
    Location of the adjacency list
  • metadata (dict) –
    resolutions : list
    Levels of resolution at which to read the adjacency list
    assembly : str
    Assembly of the aligned sequences
Returns:

  • output_files (list) – List of locations for the output files.
  • output_metadata (list) – List of matching metadata dict objects

tb_generate_tads(**kwargs)

Function to predict TAD sites for a given resolution from the Hi-C matrix

Parameters:
  • expt_name (str) – Location of the adjacency list
  • matrix_file (str) – Location of the HDF5 output matrix file
  • resolution (int) – Resolution to read the Hi-C adjacency list at
  • tad_file (str) – Location of the output TAD file
Returns:

tad_file – Location of the output TAD file

Return type:

str

tb_hic_chr(**kwargs)

Get the list of chromosomes in the adjacency list

tb_merge_tad_files(**kwargs)

Merge two TAD adjacency list files
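The merge step has no documented parameters here. A minimal sketch of merging two TAD lists, assuming each is already sorted by `(chromosome, start, end)` and that identical entries should appear only once; the record layout and `merge_tad_lists` are hypothetical.

```python
import heapq
from typing import Iterable, List, Tuple

Tad = Tuple[str, int, int]  # (chromosome, start, end); hypothetical layout


def merge_tad_lists(first: Iterable[Tad], second: Iterable[Tad]) -> List[Tad]:
    """Merge two sorted TAD lists, keeping a single copy of identical entries."""
    merged: List[Tad] = []
    # heapq.merge streams both sorted inputs in order without re-sorting them
    for tad in heapq.merge(first, second):
        if not merged or merged[-1] != tad:
            merged.append(tad)
    return merged
```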

Generate 3D models from binned interaction matrix

class tool.tb_model.tbModelTool

Tool for generating 3D models from a binned, normalized interaction matrix

run(input_files, input_metadata, output_files)

The main function to generate 3D models of a genomic region from the normalized Hi-C matrix

Parameters:
  • input_files (list) –
    hic_contacts_matrix_norm : str
    Location of the tab-separated normalized matrix
  • metadata (dict) –
    optimize_only: bool
    True if only optimize, False for computing the models and stats
    gen_pos_chrom_name : str
    Coordinates of the genomic region to model.
    resolution : str
    Resolution of the Hi-C
    gen_pos_begin : int
    Genomic coordinate from which to start modeling.
    gen_pos_end : int
    Genomic coordinate where to end modeling.
    num_mod_comp : int
    Number of models to compute for each optimization step.
    num_mod_comp : int
    Number of models to keep.
    max_dist : str
    Range of numbers for the optimal maxdist parameter, e.g. 400:1000:100; a single number, e.g. 800; or a list of numbers, e.g. 400 600 800 1000.
    upper_bound : int
    Range of numbers for the optimal upfreq parameter, e.g. 0:1.2:0.3; a single number, e.g. 0.8; or a list of numbers, e.g. 0.1 0.3 0.5 0.9.
    lower_bound : int
    Range of numbers for the optimal low parameter, e.g. -1.2:0:0.3; a single number, e.g. -0.8; or a list of numbers, e.g. -0.1 -0.3 -0.5 -0.9.
    cutoff : str
    Range of numbers for the optimal cutoff distance. The cutoff is computed from the resolution, taking the diameter of a modeled particle in the 3D model as the reference, e.g. 1.5:2.5:0.5; a single number, e.g. 2; or a list of numbers, e.g. 2 2.5.
    workdir : str
    Location of working directory
    ncpus : str
    Number of cpus to use
Returns:

  • output_files (list) – List of locations for the output files.
  • output_metadata (list) – List of matching metadata dict objects
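The max_dist, upper_bound, lower_bound and cutoff values accept three forms: a `start:stop:step` range, a single number, or a space-separated list. A minimal sketch of expanding them into a list of values, assuming the stop value is inclusive (so 400:1000:100 ends at 1000); `parse_param_range` is hypothetical.

```python
from decimal import Decimal
from typing import List


def parse_param_range(spec: str) -> List[float]:
    """Expand 'start:stop:step', a single number, or space-separated numbers.

    Assumes stop is inclusive. Decimal arithmetic keeps steps such as 0.3
    exact, so '0:1.2:0.3' ends at 1.2 instead of drifting past it.
    """
    spec = spec.strip()
    if ":" not in spec:
        return [float(value) for value in spec.split()]
    start, stop, step = (Decimal(part) for part in spec.split(":"))
    if step <= 0:
        raise ValueError("step must be positive")
    values = []
    current = start
    while current <= stop:
        values.append(float(current))
        current += step
    return values
```

Using `Decimal` rather than repeatedly adding floats is the main design choice: with plain floats, `0.3 * 4` accumulates rounding error and the inclusive end point can be lost.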

tb_model(**kwargs)

Function to generate 3D models from the normalized Hi-C matrix

Parameters:
  • optimize_only (bool) – True if only optimize, False for computing the models and stats
  • hic_contacts_matrix_norm (str) – Location of the tab-separated normalized matrix
  • resolution (str) – Resolution of the Hi-C
  • gen_pos_chrom_name (str) – Coordinates of the genomic region to model.
  • gen_pos_begin (int) – Genomic coordinate from which to start modeling.
  • gen_pos_end (int) – Genomic coordinate where to end modeling.
  • num_mod_comp (int) – Number of models to compute for each optimization step.
  • num_mod_comp – Number of models to keep.
  • max_dist (str) – Range of numbers for the optimal maxdist parameter, e.g. 400:1000:100; a single number, e.g. 800; or a list of numbers, e.g. 400 600 800 1000.
  • upper_bound (int) – Range of numbers for the optimal upfreq parameter, e.g. 0:1.2:0.3; a single number, e.g. 0.8; or a list of numbers, e.g. 0.1 0.3 0.5 0.9.
  • lower_bound (int) – Range of numbers for the optimal low parameter, e.g. -1.2:0:0.3; a single number, e.g. -0.8; or a list of numbers, e.g. -0.1 -0.3 -0.5 -0.9.
  • cutoff (str) – Range of numbers for the optimal cutoff distance. The cutoff is computed from the resolution, taking the diameter of a modeled particle in the 3D model as the reference, e.g. 1.5:2.5:0.5; a single number, e.g. 2; or a list of numbers, e.g. 2 2.5.
  • workdir (str) – Location of working directory
  • ncpus (str) – Number of cpus to use
Returns:

  • tadkit_models (str) – Location of TADkit json file
  • modeling_stats (str) – Location of the folder with the modeling files and stats