Testing
To ensure that the code written to process the data within the MuG VRE works, it is essential to have a testing framework that catches changes which break the functionality of the pipelines and tools as the code evolves. There are two key parts to this:
- Sample data
- Runnable scripts
Within the mg-process-fastq repository, each tool and pipeline has an initial set of sample data and matching tests.
Sample Data
Pipelines
There is a test for each of the pipelines. Each test uses the "process" scripts to run the tools, ensuring that the pipeline scripts are able to call out to each of the tools and hand the correct parameters to each one.
ChIP-Seq
To run the pipeline test:
pytest tests/test_pipeline_chipseq.py
Methods
tests.test_pipeline_chipseq.test_chipseq_pipeline_00()
Test case to ensure that the ChIP-seq pipeline code works.
Running the pipeline with the test data from the command line:
runcompss \
    --lang=python \
    --library_path=${HOME}/bin \
    --pythonpath=/<pyenv_virtenv_dir>/lib/python2.7/site-packages/ \
    --log_level=debug \
    process_chipseq.py \
    --taxon_id 9606 \
    --genome /<dataset_dir>/Human.GCA_000001405.22.fasta \
    --assembly GRCh38 \
    --file /<dataset_dir>/DRR000150.22.fastq
tests.test_pipeline_chipseq.test_chipseq_pipeline_01()
Test case to ensure that the ChIP-seq pipeline code works.
Running the pipeline with the test data from the command line:
runcompss \
    --lang=python \
    --library_path=${HOME}/bin \
    --pythonpath=/<pyenv_virtenv_dir>/lib/python2.7/site-packages/ \
    --log_level=debug \
    process_chipseq.py \
    --taxon_id 9606 \
    --genome /<dataset_dir>/Human.GCA_000001405.22.fasta \
    --assembly GRCh38 \
    --file /<dataset_dir>/DRR000150.22.fastq
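If you want to script these test-data runs, the runcompss invocation above can be assembled as an argument list in Python. This is a minimal sketch, not part of mg-process-fastq: the helper build_runcompss_command is hypothetical, and it assumes runcompss is on your PATH if you actually launch the command. As written it only builds and prints the command; it does not start COMPSs.

```python
import os
import shlex


def build_runcompss_command(script, options, library_path, pythonpath,
                            log_level="debug"):
    """Return the runcompss argument list for a pipeline script.

    Hypothetical helper: `options` maps pipeline flag names (without the
    leading "--") to values, in the order they should appear.
    """
    cmd = [
        "runcompss",
        "--lang=python",
        "--library_path=" + library_path,
        "--pythonpath=" + pythonpath,
        "--log_level=" + log_level,
        script,
    ]
    for flag, value in options.items():
        cmd.extend(["--" + flag, str(value)])
    return cmd


if __name__ == "__main__":
    # Same ChIP-seq run as above; replace the <...> placeholders first.
    chipseq_cmd = build_runcompss_command(
        "process_chipseq.py",
        {
            "taxon_id": 9606,
            "genome": "/<dataset_dir>/Human.GCA_000001405.22.fasta",
            "assembly": "GRCh38",
            "file": "/<dataset_dir>/DRR000150.22.fastq",
        },
        library_path=os.path.expanduser("~/bin"),
        pythonpath="/<pyenv_virtenv_dir>/lib/python2.7/site-packages/",
    )
    # Print a shell-safe version; to launch it, pass chipseq_cmd to
    # subprocess.check_call (requires COMPSs to be installed).
    print(" ".join(shlex.quote(part) for part in chipseq_cmd))
```

Because dictionaries preserve insertion order (Python 3.7+), the pipeline flags appear in the order they are given.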
Sample Data
iDamID-Seq
To run the pipeline test:
pytest tests/test_pipeline_idamidseq.py
Methods
tests.test_pipeline_idamidseq.test_idamidseq_pipeline_00()
Test case to ensure that the iDamID-seq pipeline code works.
Running the pipeline with the test data from the command line:
runcompss \
    --lang=python \
    --library_path=${HOME}/bin \
    --pythonpath=/<pyenv_virtenv_dir>/lib/python2.7/site-packages/ \
    --log_level=debug \
    process_damidseq.py \
    --taxon_id 9606 \
    --genome /<dataset_dir>/Human.GCA_000001405.22.fasta \
    --assembly GRCh38 \
    --file /<dataset_dir>/DRR000150.22.fastq
tests.test_pipeline_idamidseq.test_idamidseq_pipeline_01()
Test case to ensure that the iDamID-seq pipeline code works.
Running the pipeline with the test data from the command line:
runcompss \
    --lang=python \
    --library_path=${HOME}/bin \
    --pythonpath=/<pyenv_virtenv_dir>/lib/python2.7/site-packages/ \
    --log_level=debug \
    process_damidseq.py \
    --taxon_id 9606 \
    --genome /<dataset_dir>/Human.GCA_000001405.22.fasta \
    --assembly GRCh38 \
    --file /<dataset_dir>/DRR000150.22.fastq
Sample Data
Genome Indexing
To run the pipeline test:
pytest tests/test_pipeline_genome.py
Methods
tests.test_pipeline_genome.test_genome_pipeline_00()
Test case to ensure that the Genome indexing pipeline code works.
Running the pipeline with the test data from the command line:
runcompss \
    --lang=python \
    --library_path=${HOME}/bin \
    --pythonpath=/<pyenv_virtenv_dir>/lib/python2.7/site-packages/ \
    --log_level=debug \
    process_genome.py \
    --taxon_id 9606 \
    --genome /<dataset_dir>/Human.GCA_000001405.22.fasta \
    --assembly GRCh38 \
    --file /<dataset_dir>/DRR000150.22.fastq
tests.test_pipeline_genome.test_genome_pipeline_01()
Test case to ensure that the Genome indexing pipeline code works.
Running the pipeline with the test data from the command line:
runcompss \
    --lang=python \
    --library_path=${HOME}/bin \
    --pythonpath=/<pyenv_virtenv_dir>/lib/python2.7/site-packages/ \
    --log_level=debug \
    process_genome.py \
    --taxon_id 9606 \
    --genome /<dataset_dir>/Human.GCA_000001405.22.fasta \
    --assembly GRCh38 \
    --file /<dataset_dir>/DRR000150.22.fastq
Sample Data
Uses the genome sequences required by all of the tools and pipelines.
Hi-C
To run the pipeline test:
pytest tests/test_pipeline_tb.py
Methods
tests.test_pipeline_tb.test_tb_pipeline()
Test case to ensure that the Hi-C pipeline code works.
Running the pipeline with the test data from the command line:
runcompss \
    --lang=python \
    --library_path=/home/compss/bin \
    --pythonpath=/<pyenv_virtenv_dir>/lib/python2.7/site-packages/ \
    --log_level=debug \
    process_hic.py \
    --taxon_id 9606 \
    --genome /<dataset_dir>/tb.Human.GCA_000001405.22_gem.fasta \
    --assembly GRCh38 \
    --file1 /<dataset_dir>/tb.Human.SRR1658573_1.fastq \
    --file2 /<dataset_dir>/tb.Human.SRR1658573_2.fastq \
    --genome_gem /<dataset_dir>/tb.Human.GCA_000001405.22_gem.fasta.gem \
    --enzyme_name MboI \
    --resolutions 10000,100000 \
    --windows1 1,100 \
    --windows2 1,100 \
    --normalized 1 \
    --tag tb.Human.SRR1658573 \
    --window_type frag
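Several Hi-C options take comma-separated values (--resolutions 10000,100000, --windows1 1,100). To illustrate how such values can be turned into lists of integers, here is a minimal argparse sketch covering the flags in the command above. It is hypothetical: it is not the parser in process_hic.py, and it assumes every comma-separated option holds integers.

```python
import argparse


def int_list(text):
    """Convert a comma-separated string such as "10000,100000" to a list of ints."""
    return [int(part) for part in text.split(",")]


def make_hic_parser():
    """Build a sketch parser for the Hi-C flags shown above (hypothetical)."""
    parser = argparse.ArgumentParser(description="Hi-C pipeline options (sketch)")
    parser.add_argument("--taxon_id", type=int)
    parser.add_argument("--genome")
    parser.add_argument("--assembly")
    parser.add_argument("--file1")
    parser.add_argument("--file2")
    parser.add_argument("--genome_gem")
    parser.add_argument("--enzyme_name")
    # Comma-separated integer lists (assumption based on the example values).
    parser.add_argument("--resolutions", type=int_list)
    parser.add_argument("--windows1", type=int_list)
    parser.add_argument("--windows2", type=int_list)
    parser.add_argument("--normalized", type=int)
    parser.add_argument("--tag")
    parser.add_argument("--window_type")
    return parser


# Parse an explicit argument list rather than sys.argv.
args = make_hic_parser().parse_args(
    ["--taxon_id", "9606", "--enzyme_name", "MboI",
     "--resolutions", "10000,100000", "--windows1", "1,100"])
print(args.resolutions, args.windows1)  # prints: [10000, 100000] [1, 100]
```

With argparse's default store action, repeating a flag keeps only the last value given.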
Sample Data
MNase-Seq
To run the pipeline test:
pytest tests/test_pipeline_mnaseseq.py
Methods
tests.test_pipeline_mnaseseq.test_mnaseseq_pipeline()
Test case to ensure that the MNase-seq pipeline code works.
Running the pipeline with the test data from the command line:
runcompss \
    --lang=python \
    --library_path=${HOME}/bin \
    --pythonpath=/<pyenv_virtenv_dir>/lib/python2.7/site-packages/ \
    --log_level=debug \
    process_mnaseseq.py \
    --taxon_id 10090 \
    --genome /<dataset_dir>/Mouse.GRCm38.fasta \
    --assembly GRCm38 \
    --file /<dataset_dir>/DRR000386.fastq
Sample Data
RNA-Seq
To run the pipeline test:
pytest tests/test_pipeline_rnaseq.py
Methods
tests.test_pipeline_rnaseq.test_rnaseq_pipeline()
Test case to ensure that the RNA-seq pipeline code works.
Running the pipeline with the test data from the command line:
runcompss \
    --lang=python \
    --library_path=${HOME}/bin \
    --pythonpath=/<pyenv_virtenv_dir>/lib/python2.7/site-packages/ \
    --log_level=debug \
    process_rnaseq.py \
    --taxon_id 9606 \
    --genome /<dataset_dir>/Human.GRCh38.fasta \
    --assembly GRCh38 \
    --file /<dataset_dir>/ERR030872_1.fastq \
    --file2 /<dataset_dir>/ERR030872_2.fastq
Sample Data
Tools
As only the raw data is stored, the tools for each pipeline have been packaged into a tool chain that runs them in order, so that each tool has the inputs it needs from the preceding steps instead of failing. This is done with the tests/test_toolchains.py script:
python tests/test_toolchains.py --pipeline [genome | chipseq | hic | mnaseseq | rnaseq | wgbs]
This script automates the running of each of the tools that are required for a given pipeline.
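The tool chains listed under Methods below each come down to running pytest with one marker over an ordered list of test files. The following is a minimal sketch of that pattern, not the code in tests/test_toolchains.py. The names TOOLCHAINS, toolchain_commands, run_toolchain and main are hypothetical; the file lists are copied from the Methods entries; and three behaviours are assumptions: the marker equals the chain name, the run stops at the first failing file, and verbose maps to pytest's -v flag.

```python
import argparse
import subprocess

# Ordered test files per tool chain, copied from the Methods entries in
# this section. Each chain's marker is assumed to equal its name.
TOOLCHAINS = {
    "genome": ["tests/test_bowtie_indexer.py", "tests/test_bwa_indexer.py",
               "tests/test_gem_indexer.py"],
    "chipseq": ["tests/test_fastqc_validation.py", "tests/test_bwa_indexer.py",
                "tests/test_bwa_aligner.py", "tests/test_biobambam.py",
                "tests/test_macs2.py"],
    "hic": ["tests/test_fastqc_validation.py", "tests/test_gem_indexer.py",
            "tests/test_tb_full_mapping.py", "tests/test_tb_parse_mapping.py",
            "tests/test_tb_filter.py", "tests/test_tb_normalize.py",
            "tests/test_tb_segment.py", "tests/test_tb_generate_tads.py",
            "tests/test_tb_bin.py", "tests/test_tb_save_hdf5_matrix.py"],
    "mnaseseq": ["tests/test_fastqc_validation.py", "tests/test_bwa_indexer.py",
                 "tests/test_bwa_aligner.py", "tests/test_inps.py"],
    "rnaseq": ["tests/test_fastqc_validation.py", "tests/test_kallisto_indexer.py",
               "tests/test_kallisto_quant.py"],
    "wgbs": ["tests/test_fastqc_validation.py", "tests/test_bs_seeker_filter.py",
             "tests/test_bs_seeker_indexer.py", "tests/test_bs_seeker_aligner.py",
             "tests/test_bs_seeker_methylation_caller.py"],
}


def toolchain_commands(pipeline, verbose=False):
    """Return one pytest argument list per test file, in chain order."""
    extra = ["-v"] if verbose else []
    return [["pytest", "-m", pipeline] + extra + [path]
            for path in TOOLCHAINS[pipeline]]


def run_toolchain(pipeline, verbose=False):
    """Run each pytest command in turn and stop at the first failure."""
    for cmd in toolchain_commands(pipeline, verbose):
        returncode = subprocess.call(cmd)
        if returncode != 0:
            return returncode
    return 0


def main(argv=None):
    """Parse --pipeline/--verbose and run the selected tool chain."""
    parser = argparse.ArgumentParser(description="Run a tool chain (sketch)")
    parser.add_argument("--pipeline", required=True, choices=sorted(TOOLCHAINS))
    parser.add_argument("--verbose", action="store_true")
    args = parser.parse_args(argv)
    return run_toolchain(args.pipeline, args.verbose)
```

For example, main(["--pipeline", "genome"]) would run the three genome indexer test files in order and return the first non-zero pytest exit code, or 0 if all pass. Placing sys.exit(main()) under an if __name__ == "__main__": guard turns the sketch into a command-line script.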
Methods
tests.test_toolchains.all_toolchain(verbose=False)
Runs the tests for all of the tools
This set is only required for determining code coverage.
tests.test_toolchains.biobambam_toolchain(verbose=False)
Runs the tests for all of the tools from the BioBamBam tool chain
Runs the following tests:
pytest -m chipseq tests/test_fastqc_validation.py
pytest -m chipseq tests/test_bwa_indexer.py
pytest -m chipseq tests/test_bwa_aligner.py
pytest -m chipseq tests/test_biobambam.py
tests.test_toolchains.bowtie2_toolchain(verbose=False)
Runs the tests for all of the tools from the Bowtie2 tool chain
Runs the following tests:
pytest -m bowtie2 tests/test_fastqc_validation.py
pytest -m bowtie2 tests/test_bowtie_indexer.py
pytest -m bowtie2 tests/test_bowtie2_aligner.py
tests.test_toolchains.bwa_toolchain(verbose=False)
Runs the tests for all of the tools from the BWA pipeline
Runs the following tests:
pytest -m bwa tests/test_fastqc_validation.py
pytest -m bwa tests/test_bwa_indexer.py
pytest -m bwa tests/test_bwa_aligner.py
tests.test_toolchains.chipseq_toolchain(verbose=False)
Runs the tests for all of the tools from the ChIP-seq pipeline
Runs the following tests:
pytest -m chipseq tests/test_fastqc_validation.py
pytest -m chipseq tests/test_bwa_indexer.py
pytest -m chipseq tests/test_bwa_aligner.py
pytest -m chipseq tests/test_biobambam.py
pytest -m chipseq tests/test_macs2.py
tests.test_toolchains.genome_toolchain(verbose=False)
Runs the tests for all of the tools from the Genome indexing pipeline
Runs the following tests:
pytest -m genome tests/test_bowtie_indexer.py
pytest -m genome tests/test_bwa_indexer.py
pytest -m genome tests/test_gem_indexer.py
tests.test_toolchains.hic_toolchain(verbose=False)
Runs the tests for all of the tools from the Hi-C pipeline
Runs the following tests:
pytest -m hic tests/test_fastqc_validation.py
pytest -m hic tests/test_gem_indexer.py
pytest -m hic tests/test_tb_full_mapping.py
pytest -m hic tests/test_tb_parse_mapping.py
pytest -m hic tests/test_tb_filter.py
pytest -m hic tests/test_tb_normalize.py
pytest -m hic tests/test_tb_segment.py
pytest -m hic tests/test_tb_generate_tads.py
pytest -m hic tests/test_tb_bin.py
pytest -m hic tests/test_tb_save_hdf5_matrix.py
tests.test_toolchains.idamidseq_toolchain(verbose=False)
Runs the tests for all of the tools from the iDamID-seq pipeline
Runs the following tests:
pytest -m idamidseq tests/test_bwa_indexer.py
pytest -m idamidseq tests/test_bwa_aligner.py
pytest -m idamidseq tests/test_biobambam.py
pytest -m idamidseq tests/test_bsgenome.py
pytest -m idamidseq tests/test_idear.py
tests.test_toolchains.mnaseseq_toolchain(verbose=False)
Runs the tests for all of the tools from the MNase-seq pipeline
Runs the following tests:
pytest -m mnaseseq tests/test_fastqc_validation.py
pytest -m mnaseseq tests/test_bwa_indexer.py
pytest -m mnaseseq tests/test_bwa_aligner.py
pytest -m mnaseseq tests/test_inps.py
tests.test_toolchains.rnaseq_toolchain(verbose=False)
Runs the tests for all of the tools from the RNA-seq pipeline
Runs the following tests:
pytest -m rnaseq tests/test_fastqc_validation.py
pytest -m rnaseq tests/test_kallisto_indexer.py
pytest -m rnaseq tests/test_kallisto_quant.py
tests.test_toolchains.wgbs_toolchain(verbose=0)
Runs the tests for all of the tools from the WGBS pipeline
Runs the following tests:
pytest -m wgbs tests/test_fastqc_validation.py
pytest -m wgbs tests/test_bs_seeker_filter.py
pytest -m wgbs tests/test_bs_seeker_indexer.py
pytest -m wgbs tests/test_bs_seeker_aligner.py
pytest -m wgbs tests/test_bs_seeker_methylation_caller.py
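Every command above selects tests with pytest's -m option, which runs only the tests carrying the given marker. This lets one file such as tests/test_bwa_indexer.py serve several tool chains, with each test opting in through decorators such as @pytest.mark.chipseq. Recent pytest versions warn about markers that have not been registered, so below is a minimal conftest.py sketch that registers the marker names used above through the pytest_configure hook. It assumes the repository does not already declare them in pytest.ini, setup.cfg or tox.ini, and the marker descriptions are illustrative.

```python
# conftest.py (sketch): declare the tool chain markers used with "pytest -m".
# Marker names come from the commands above; descriptions are illustrative.
TOOLCHAIN_MARKERS = {
    "bowtie2": "tests in the Bowtie2 tool chain",
    "bwa": "tests in the BWA tool chain",
    "chipseq": "tests in the ChIP-seq tool chain",
    "genome": "tests in the genome indexing tool chain",
    "hic": "tests in the Hi-C tool chain",
    "idamidseq": "tests in the iDamID-seq tool chain",
    "mnaseseq": "tests in the MNase-seq tool chain",
    "rnaseq": "tests in the RNA-seq tool chain",
    "wgbs": "tests in the WGBS tool chain",
}


def pytest_configure(config):
    """pytest hook: register each marker so --strict-markers accepts it."""
    for name, description in sorted(TOOLCHAIN_MARKERS.items()):
        config.addinivalue_line("markers", "{}: {}".format(name, description))
```

With the markers registered, pytest --markers lists them, and pytest --strict-markers turns a misspelt marker into an error instead of a warning.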