Testing

Code that processes data within the MuG VRE needs a testing framework so that, as the code evolves, changes do not break the functionality of the pipelines and tools. There are 2 key parts to this:

  1. Sample data
  2. Runnable scripts

Each tool and pipeline within the mg-process-fastq repository has an initial set of sample data and a matching set of tests.

Pipelines

There is a test for each of the pipelines. Each test uses the “process” scripts to run the tools in that pipeline. This ensures that the pipeline scripts are able to call each of the tools and that the correct parameters are handed to each one.

ChIP-Seq

To run the pipeline test:

pytest tests/test_pipeline_chipseq.py

Methods

tests.test_pipeline_chipseq.test_chipseq_pipeline_00()

Test case to ensure that the ChIP-seq pipeline code works.

Running the pipeline with the test data from the command line:

runcompss                                                         \
   --lang=python                                                  \
   --library_path=${HOME}/bin                                     \
   --pythonpath=/<pyenv_virtenv_dir>/lib/python2.7/site-packages/ \
   --log_level=debug                                              \
   process_chipseq.py                                             \
      --taxon_id 9606                                             \
      --genome /<dataset_dir>/Human.GCA_000001405.22.fasta        \
      --assembly GRCh38                                           \
      --file /<dataset_dir>/DRR000150.22.fastq

tests.test_pipeline_chipseq.test_chipseq_pipeline_01()

Test case to ensure that the ChIP-seq pipeline code works.

Running the pipeline with the test data from the command line:

runcompss                                                         \
   --lang=python                                                  \
   --library_path=${HOME}/bin                                     \
   --pythonpath=/<pyenv_virtenv_dir>/lib/python2.7/site-packages/ \
   --log_level=debug                                              \
   process_chipseq.py                                             \
      --taxon_id 9606                                             \
      --genome /<dataset_dir>/Human.GCA_000001405.22.fasta        \
      --assembly GRCh38                                           \
      --file /<dataset_dir>/DRR000150.22.fastq
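
When several runs need to be scripted, the runcompss invocation shown above can also be assembled from Python. The following is a minimal sketch under stated assumptions: the helper name build_runcompss_command and the /path/to/... directories are hypothetical placeholders, not part of mg-process-fastq, and the code only builds the argument list without running COMPSs.

```python
import os


def build_runcompss_command(script, options, virtualenv_dir, library_path):
    """Return the runcompss argument list for one pipeline script.

    `options` is a list of (flag, value) pairs handed to the script.
    This helper is hypothetical and only mirrors the documented command.
    """
    command = [
        "runcompss",
        "--lang=python",
        "--library_path={}".format(library_path),
        "--pythonpath={}/lib/python2.7/site-packages/".format(virtualenv_dir),
        "--log_level=debug",
        script,
    ]
    for flag, value in options:
        command.extend([flag, str(value)])
    return command


# Placeholder locations; substitute the real virtualenv and dataset directories.
dataset_dir = "/path/to/dataset"
chipseq_command = build_runcompss_command(
    "process_chipseq.py",
    [
        ("--taxon_id", 9606),
        ("--genome", dataset_dir + "/Human.GCA_000001405.22.fasta"),
        ("--assembly", "GRCh38"),
        ("--file", dataset_dir + "/DRR000150.22.fastq"),
    ],
    virtualenv_dir="/path/to/virtualenv",
    library_path=os.path.expanduser("~/bin"),
)
print(" ".join(chipseq_command))
```

On a machine where COMPSs is installed, the resulting list could be passed to subprocess.call to start the run.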

iDamID-Seq

To run the pipeline test:

pytest tests/test_pipeline_idamidseq.py

Methods

tests.test_pipeline_idamidseq.test_idamidseq_pipeline_00()

Test case to ensure that the iDamID-seq pipeline code works.

Running the pipeline with the test data from the command line:

runcompss                                                         \
   --lang=python                                                  \
   --library_path=${HOME}/bin                                     \
   --pythonpath=/<pyenv_virtenv_dir>/lib/python2.7/site-packages/ \
   --log_level=debug                                              \
   process_damidseq.py                                            \
      --taxon_id 9606                                             \
      --genome /<dataset_dir>/Human.GCA_000001405.22.fasta        \
      --assembly GRCh38                                           \
      --file /<dataset_dir>/DRR000150.22.fastq

tests.test_pipeline_idamidseq.test_idamidseq_pipeline_01()

Test case to ensure that the iDamID-seq pipeline code works.

Running the pipeline with the test data from the command line:

runcompss                                                         \
   --lang=python                                                  \
   --library_path=${HOME}/bin                                     \
   --pythonpath=/<pyenv_virtenv_dir>/lib/python2.7/site-packages/ \
   --log_level=debug                                              \
   process_damidseq.py                                            \
      --taxon_id 9606                                             \
      --genome /<dataset_dir>/Human.GCA_000001405.22.fasta        \
      --assembly GRCh38                                           \
      --file /<dataset_dir>/DRR000150.22.fastq

Genome Indexing

To run the pipeline test:

pytest tests/test_pipeline_genome.py

Methods

tests.test_pipeline_genome.test_genome_pipeline_00()

Test case to ensure that the Genome indexing pipeline code works.

Running the pipeline with the test data from the command line:

runcompss                                                         \
   --lang=python                                                  \
   --library_path=${HOME}/bin                                     \
   --pythonpath=/<pyenv_virtenv_dir>/lib/python2.7/site-packages/ \
   --log_level=debug                                              \
   process_genome.py                                              \
      --taxon_id 9606                                             \
      --genome /<dataset_dir>/Human.GCA_000001405.22.fasta        \
      --assembly GRCh38                                           \
      --file /<dataset_dir>/DRR000150.22.fastq

tests.test_pipeline_genome.test_genome_pipeline_01()

Test case to ensure that the Genome indexing pipeline code works.

Running the pipeline with the test data from the command line:

runcompss                                                         \
   --lang=python                                                  \
   --library_path=${HOME}/bin                                     \
   --pythonpath=/<pyenv_virtenv_dir>/lib/python2.7/site-packages/ \
   --log_level=debug                                              \
   process_genome.py                                              \
      --taxon_id 9606                                             \
      --genome /<dataset_dir>/Human.GCA_000001405.22.fasta        \
      --assembly GRCh38                                           \
      --file /<dataset_dir>/DRR000150.22.fastq

Sample Data

The genome indexing tests use the genome sequences required by all the tools and pipelines.

Hi-C

To run the pipeline test:

pytest tests/test_pipeline_tb.py

Methods

tests.test_pipeline_tb.test_tb_pipeline()

Test case to ensure that the Hi-C pipeline code works.

Running the pipeline with the test data from the command line:

runcompss                                                                 \
   --lang=python                                                          \
   --library_path=/home/compss/bin                                        \
   --pythonpath=/<pyenv_virtenv_dir>/lib/python2.7/site-packages/         \
   --log_level=debug                                                      \
   process_hic.py                                                         \
      --taxon_id 9606                                                     \
      --genome /<dataset_dir>/tb.Human.GCA_000001405.22_gem.fasta         \
      --assembly GRCh38                                                   \
      --file1 /<dataset_dir>/tb.Human.SRR1658573_1.fastq                  \
      --file2 /<dataset_dir>/tb.Human.SRR1658573_2.fastq                  \
      --genome_gem /<dataset_dir>/tb.Human.GCA_000001405.22_gem.fasta.gem \
      --enzyme_name MboI                                                  \
      --resolutions 10000,100000                                          \
      --windows1 1,100                                                    \
      --windows2 1,100                                                    \
      --normalized 1                                                      \
      --tag tb.Human.SRR1658573                                           \
      --window_type frag

Sample Data

Hi-C Test Data

MNase-Seq

To run the pipeline test:

pytest tests/test_pipeline_mnaseseq.py

Methods

tests.test_pipeline_mnaseseq.test_mnaseseq_pipeline()

Test case to ensure that the MNase-seq pipeline code works.

Running the pipeline with the test data from the command line:

runcompss                                                         \
   --lang=python                                                  \
   --library_path=${HOME}/bin                                     \
   --pythonpath=/<pyenv_virtenv_dir>/lib/python2.7/site-packages/ \
   --log_level=debug                                              \
   process_mnaseseq.py                                            \
      --taxon_id 10090                                            \
      --genome /<dataset_dir>/Mouse.GRCm38.fasta                  \
      --assembly GRCm38                                           \
      --file /<dataset_dir>/DRR000386.fastq

RNA-Seq

To run the pipeline test:

pytest tests/test_pipeline_rnaseq.py

Methods

tests.test_pipeline_rnaseq.test_rnaseq_pipeline()

Test case to ensure that the RNA-seq pipeline code works.

Running the pipeline with the test data from the command line:

runcompss                                                         \
   --lang=python                                                  \
   --library_path=${HOME}/bin                                     \
   --pythonpath=/<pyenv_virtenv_dir>/lib/python2.7/site-packages/ \
   --log_level=debug                                              \
   process_rnaseq.py                                              \
      --taxon_id 9606                                             \
      --genome /<dataset_dir>/Human.GRCh38.fasta                  \
      --assembly GRCh38                                           \
      --file /<dataset_dir>/ERR030872_1.fastq                     \
      --file2 /<dataset_dir>/ERR030872_2.fastq
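
The command lines in the pipeline sections above share a small set of options: --taxon_id, --genome, --assembly, --file and, for paired-end data, --file2. As an illustration of how such options could be read, here is a minimal argparse sketch. It is not the parser used by the process scripts; the function name build_parser, the help texts and the choice of which options are required are assumptions made for this example.

```python
import argparse


def build_parser():
    """Build an illustrative parser for the options shared by the pipelines."""
    parser = argparse.ArgumentParser(description="Illustrative pipeline options")
    # 9606 is the NCBI taxonomy ID for human, 10090 for mouse.
    parser.add_argument("--taxon_id", required=True, help="NCBI taxonomy ID")
    parser.add_argument("--genome", required=True, help="Genome FASTA file")
    parser.add_argument("--assembly", required=True, help="Assembly name")
    parser.add_argument("--file", required=True, help="FASTQ file")
    # Assumption: the second FASTQ file is optional and only set for paired-end runs.
    parser.add_argument("--file2", default=None, help="Second FASTQ file")
    return parser


# Parse the RNA-seq example, with a placeholder dataset directory.
args = build_parser().parse_args([
    "--taxon_id", "9606",
    "--genome", "/path/to/dataset/Human.GRCh38.fasta",
    "--assembly", "GRCh38",
    "--file", "/path/to/dataset/ERR030872_1.fastq",
    "--file2", "/path/to/dataset/ERR030872_2.fastq",
])
print(args.assembly, args.file, args.file2)
```

Values are kept as strings here; whether the real scripts convert --taxon_id to an integer is not stated in this document.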

Whole Genome Bisulfite Sequencing (WGBS)

To run the pipeline test:

pytest tests/test_pipeline_wgbs.py

Methods

Sample Data

WGBS Test Data

Tools

Because only the raw data is stored, the tools for each pipeline have been packaged into a tool chain that runs them in sequence, so that each tool can run without failing. This is done by the tests/test_toolchains.py script.

python tests/test_toolchains.py --pipeline [genome | chipseq | hic | mnaseseq | rnaseq | wgbs]

This script automates the running of each of the tools that are required for a given pipeline.
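
To make the idea of a tool chain concrete, the sketch below shows one way a script could turn a pipeline name into the ordered pytest commands listed in the Methods below. It is not the code of tests/test_toolchains.py: the names TOOLCHAIN_TESTS and toolchain_commands are hypothetical, only two pipelines are included, and mapping a verbose flag to pytest's -v option is an assumption.

```python
# Test files per pipeline, copied from the lists documented on this page.
# Only two pipelines are shown to keep the sketch short.
TOOLCHAIN_TESTS = {
    "chipseq": [
        "tests/test_fastqc_validation.py",
        "tests/test_bwa_indexer.py",
        "tests/test_bwa_aligner.py",
        "tests/test_biobambam.py",
        "tests/test_macs2.py",
    ],
    "rnaseq": [
        "tests/test_fastqc_validation.py",
        "tests/test_kallisto_indexer.py",
        "tests/test_kallisto_quant.py",
    ],
}


def toolchain_commands(pipeline, verbose=False):
    """Return one pytest argument list per test file, in run order."""
    if pipeline not in TOOLCHAIN_TESTS:
        raise ValueError("Unknown pipeline: {}".format(pipeline))
    commands = []
    for test_file in TOOLCHAIN_TESTS[pipeline]:
        command = ["pytest", "-m", pipeline, test_file]
        if verbose:
            # Assumption: verbose output is requested with pytest's -v flag.
            command.append("-v")
        commands.append(command)
    return commands


for command in toolchain_commands("rnaseq"):
    print(" ".join(command))
```

Keeping the test files in an ordered list matters because later tools consume the output of earlier ones, for example an aligner needs the index built by the indexer test.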

Methods

tests.test_toolchains.all_toolchain(verbose=False)

Runs the tests for all of the tools

This set is only required for determining code coverage.

tests.test_toolchains.biobambam_toolchain(verbose=False)

Runs the tests for all of the tools from the BioBamBam pipeline

Runs the following tests:

pytest -m chipseq tests/test_fastqc_validation.py
pytest -m chipseq tests/test_bwa_indexer.py
pytest -m chipseq tests/test_bwa_aligner.py
pytest -m chipseq tests/test_biobambam.py

tests.test_toolchains.bowtie2_toolchain(verbose=False)

Runs the tests for all of the tools from the Bowtie2 pipeline

Runs the following tests:

pytest -m bowtie2 tests/test_fastqc_validation.py
pytest -m bowtie2 tests/test_bowtie_indexer.py
pytest -m bowtie2 tests/test_bowtie2_aligner.py

tests.test_toolchains.bwa_toolchain(verbose=False)

Runs the tests for all of the tools from the BWA pipeline

Runs the following tests:

pytest -m bwa tests/test_fastqc_validation.py
pytest -m bwa tests/test_bwa_indexer.py
pytest -m bwa tests/test_bwa_aligner.py

tests.test_toolchains.chipseq_toolchain(verbose=False)

Runs the tests for all of the tools from the ChIP-seq pipeline

Runs the following tests:

pytest -m chipseq tests/test_fastqc_validation.py
pytest -m chipseq tests/test_bwa_indexer.py
pytest -m chipseq tests/test_bwa_aligner.py
pytest -m chipseq tests/test_biobambam.py
pytest -m chipseq tests/test_macs2.py

tests.test_toolchains.genome_toolchain(verbose=False)

Runs the tests for all of the tools from the Genome indexing pipeline

Runs the following tests:

pytest -m genome tests/test_bowtie_indexer.py
pytest -m genome tests/test_bwa_indexer.py
pytest -m genome tests/test_gem_indexer.py

tests.test_toolchains.hic_toolchain(verbose=False)

Runs the tests for all of the tools from the Hi-C pipeline

Runs the following tests:

pytest -m hic tests/test_fastqc_validation.py
pytest -m hic tests/test_gem_indexer.py
pytest -m hic tests/test_tb_full_mapping.py
pytest -m hic tests/test_tb_parse_mapping.py
pytest -m hic tests/test_tb_filter.py
pytest -m hic tests/test_tb_normalize.py
pytest -m hic tests/test_tb_segment.py
pytest -m hic tests/test_tb_generate_tads.py
pytest -m hic tests/test_tb_bin.py
pytest -m hic tests/test_tb_save_hdf5_matrix.py

tests.test_toolchains.idamidseq_toolchain(verbose=False)

Runs the tests for all of the tools from the iDamID-seq pipeline

Runs the following tests:

pytest -m idamidseq tests/test_bwa_indexer.py
pytest -m idamidseq tests/test_bwa_aligner.py
pytest -m idamidseq tests/test_biobambam.py
pytest -m idamidseq tests/test_bsgenome.py
pytest -m idamidseq tests/test_idear.py

tests.test_toolchains.mnaseseq_toolchain(verbose=False)

Runs the tests for all of the tools from the MNase-seq pipeline

Runs the following tests:

pytest -m mnaseseq tests/test_fastqc_validation.py
pytest -m mnaseseq tests/test_bwa_indexer.py
pytest -m mnaseseq tests/test_bwa_aligner.py
pytest -m mnaseseq tests/test_inps.py

tests.test_toolchains.rnaseq_toolchain(verbose=False)

Runs the tests for all of the tools from the RNA-seq pipeline

Runs the following tests:

pytest -m rnaseq tests/test_fastqc_validation.py
pytest -m rnaseq tests/test_kallisto_indexer.py
pytest -m rnaseq tests/test_kallisto_quant.py

tests.test_toolchains.tidy_data()

Runs the tidy_data.sh script

tests.test_toolchains.wgbs_toolchain(verbose=0)

Runs the tests for all of the tools from the WGBS pipeline

Runs the following tests:

pytest -m wgbs tests/test_fastqc_validation.py
pytest -m wgbs tests/test_bs_seeker_filter.py
pytest -m wgbs tests/test_bs_seeker_indexer.py
pytest -m wgbs tests/test_bs_seeker_aligner.py
pytest -m wgbs tests/test_bs_seeker_methylation_caller.py
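
pytest's -m option runs only the tests that carry the named marker, so a test file that appears in several toolchains has to carry the marker of each toolchain that uses it. The sketch below groups a few of the commands listed above by test file to show which markers each file needs. The names DOCUMENTED_RUNS and markers_by_file are hypothetical and the list covers only two test files; it illustrates the bookkeeping and is not code from the repository.

```python
from collections import defaultdict

# (marker, test file) pairs taken from the toolchain listings above.
DOCUMENTED_RUNS = [
    ("bwa", "tests/test_bwa_indexer.py"),
    ("chipseq", "tests/test_bwa_indexer.py"),
    ("genome", "tests/test_bwa_indexer.py"),
    ("idamidseq", "tests/test_bwa_indexer.py"),
    ("mnaseseq", "tests/test_bwa_indexer.py"),
    ("genome", "tests/test_gem_indexer.py"),
    ("hic", "tests/test_gem_indexer.py"),
]


def markers_by_file(runs):
    """Group markers by test file and return them sorted alphabetically."""
    grouped = defaultdict(set)
    for marker, test_file in runs:
        grouped[test_file].add(marker)
    return {test_file: sorted(markers) for test_file, markers in grouped.items()}


required = markers_by_file(DOCUMENTED_RUNS)
for test_file in sorted(required):
    print("{}: {}".format(test_file, ", ".join(required[test_file])))
```

A listing like this can help when adding a new pipeline: every test file reused by the new toolchain needs the new marker added before pytest -m will select its tests.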