Consensus among five DEG identification methods

Differential expression analysis with a consensus result.

  • Mapping, counting and DEG analysis in one tool

Install

Setting up consexpression takes just a few commands. The installation steps below show, step by step, how to set up and run consexpression.

Cross platform

Consexpression is developed in Python and R. You can run it on Windows, macOS and Linux.

Read the full article

The complete study was published in PLOS ONE in December 2017. PLOS ONE is a journal community working together to advance science for the benefit of society.

Requirements

Consexpression relies on several external tools and libraries:

  • Bowtie 2, a tool for aligning sequencing reads.
  • TopHat 2, a splice junction mapper for RNA-Seq reads.
  • R, a free software environment for statistical computing and graphics.
  • Python, a programming language that lets you work more quickly and integrate your systems more effectively.
  • rpy2, an interface between R and Python that lets you use the libraries of one language while working in the other.
  • HTSeq, a Python framework for working with high-throughput sequencing data.
  • libxml2, a very portable XML parser library that should build and work without serious trouble on a variety of systems.
  • libcurl4-openssl-dev, the libcurl development files built with OpenSSL (Linux only).
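Before installing, it can help to confirm that the command-line tools and Python packages listed above are visible in your environment. The helper below is a hypothetical sketch, not part of consexpression. It assumes the executables are named `bowtie2`, `tophat2` and `R`, and that the Python packages import as `rpy2` and `HTSeq`. System libraries such as libxml2 and libcurl are not checked.

```python
import importlib.util
import shutil

# Assumed executable names for Bowtie 2, TopHat 2 and R.
EXECUTABLES = ["bowtie2", "tophat2", "R"]

# Python packages consexpression needs (import names).
PY_MODULES = ["rpy2", "HTSeq"]


def missing_requirements(executables=EXECUTABLES, modules=PY_MODULES):
    """Return the executables and Python modules that cannot be found."""
    # shutil.which returns None when a program is not on PATH.
    missing_exe = [name for name in executables if shutil.which(name) is None]
    # find_spec returns None when a top-level module is not importable.
    missing_mod = [name for name in modules
                   if importlib.util.find_spec(name) is None]
    return {"executables": missing_exe, "modules": missing_mod}


if __name__ == "__main__":
    report = missing_requirements()
    for kind, names in report.items():
        for name in names:
            print(f"missing {kind[:-1]}: {name}")
    if not any(report.values()):
        print("all checked requirements were found")
```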

Once all requirements are installed, follow the steps below to install consexpression:

Open the R workspace from the terminal by typing:

R

You can also open the R workspace through an IDE such as RStudio. In the R workspace, run the following commands:

# Install BiocManager only if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
# XML and RCurl need libxml2 and libcurl4-openssl-dev (see Requirements)
install.packages("XML")
install.packages("RCurl")
# Differential expression packages
BiocManager::install("DESeq")
BiocManager::install("edgeR")
BiocManager::install("baySeq")
BiocManager::install("EBSeq")
BiocManager::install("samr")
BiocManager::install("NOISeq")
BiocManager::install("limma")
BiocManager::install("DESeq2")
# devtools, then sleuth from its GitHub repository
BiocManager::install("devtools")
BiocManager::install("pachterlab/sleuth")
* These commands were tested with R version 3.6.2.
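Since rpy2 is already a requirement, the R packages can also be checked from Python. This is a minimal sketch built on rpy2's `rpy2.robjects.packages.isinstalled`. The helper `missing_r_packages` is hypothetical and takes the checker as a parameter, so its logic can be exercised without R or rpy2 installed.

```python
# R packages installed by the commands above (sleuth comes from pachterlab/sleuth).
R_PACKAGES = ["XML", "RCurl", "DESeq", "edgeR", "baySeq", "EBSeq", "samr",
              "NOISeq", "limma", "DESeq2", "devtools", "sleuth"]


def _rpy2_isinstalled(name):
    # Imported lazily so this module loads even where rpy2/R are absent.
    from rpy2.robjects.packages import isinstalled
    return isinstalled(name)


def missing_r_packages(packages=R_PACKAGES, is_installed=_rpy2_isinstalled):
    """Return the R packages that the given checker reports as missing."""
    return [pkg for pkg in packages if not is_installed(pkg)]


if __name__ == "__main__":
    try:
        missing = missing_r_packages()
    except ImportError:
        print("rpy2 is not installed; install it before checking R packages")
    else:
        print("missing R packages:", ", ".join(missing) if missing else "none")
```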

Download or clone consexpression

In a terminal, go to the folder where you want to keep the consexpression code:

cd my_folder
git clone git@github.com:costasilvati/consexpression.git
                            
Specific Instructions
Configure your analysis

Consexpression reads the settings for an analysis from a configuration file named CONFIG_tool.txt, located in the dao folder. The path to this file is passed as an argument when running consexpression.

| Parameter | Information expected | Usage |
|---|---|---|
| #=== | Section marker | Lines starting with '#' are not processed. |
| NAME: | Name that identifies the results | Used as a prefix or suffix in generated file names. |
| REPLIC: | Integer number of biological replicates in each treatment | Defines how many columns are expected in the count table. |
| GROUP_NUMBER: | Whole number of treatments | Used to compare expression between groups. |
| GROUP_NAMES: | Comma-separated list of treatment names | Used to identify replicates and treatments. |
| REFERENCE_GENOME: | Path to the FASTA file with the complete genome of the organism | Used to align (map) the reads. |
| READS_DIRECTORY: | Absolute path of the folder containing the treatment folders | Used to locate all FASTQ files in the analysis. This folder must contain one sub-folder per treatment (group). |
| GROUP_DIRECTORIES: | Comma-separated list of absolute paths of the folders with FASTQ files, one folder per group | A folder is expected for each treatment. Each folder must contain the same number of FASTQ files, as given in REPLIC. |
| PAIRED_END: | Boolean: True, TRUE, False or FALSE | Identifies the RNA-Seq sequencing layout. Consexpression does not perform paired-end analysis. |
| THREADS: | Integer | Number of threads used by the mapping tool. |
| MODE: | One of "union", "intersection-strict" or "intersection-nonempty" | Used by htseq-count; defines how mapped reads are counted. Read more in the htseq-count documentation. |
| ANOTATION_FILE: | Absolute path of the GTF file matching the genome given in REFERENCE_GENOME | Used by htseq-count to generate count tables from the SAM mapping files. |
| OUTPUT: | Absolute path of the folder for the results of the expression analysis | Output files of the differential expression tools are written here, together with the file of genes identified by consensus. |
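The table suggests a simple `KEY: value` layout with one parameter per line. The parser below is a sketch under that assumption; the exact format consexpression accepts may differ. `parse_config`, `validate_config` and the example values are hypothetical. The sketch skips lines starting with '#', splits the comma-separated lists and runs a few consistency checks that follow from the table.

```python
# Hypothetical parser for a CONFIG_tool.txt-style file. The "KEY: value"
# layout is an assumption inferred from the parameter table.
BOOLEAN_VALUES = {"True": True, "TRUE": True, "False": False, "FALSE": False}
HTSEQ_MODES = {"union", "intersection-strict", "intersection-nonempty"}
LIST_KEYS = {"GROUP_NAMES", "GROUP_DIRECTORIES"}
INT_KEYS = {"REPLIC", "GROUP_NUMBER", "THREADS"}

# Hypothetical example values for an expression-only run.
EXAMPLE = """\
#=== Example configuration
NAME: test
REPLIC: 3
GROUP_NUMBER: 2
GROUP_NAMES: control,treated
REFERENCE_GENOME:
PAIRED_END: False
THREADS: 4
MODE: union
"""


def parse_config(text):
    """Parse 'KEY: value' lines into a dict, skipping '#' and blank lines."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # comment/section lines are not processed
        key, sep, value = line.partition(":")  # first ':' only, so C:\ paths survive
        if not sep:
            raise ValueError(f"expected 'KEY: value', got {raw!r}")
        key, value = key.strip(), value.strip()
        if key in LIST_KEYS:
            config[key] = [item.strip() for item in value.split(",") if item.strip()]
        elif key in INT_KEYS:
            config[key] = int(value)
        elif key == "PAIRED_END":
            if value not in BOOLEAN_VALUES:
                raise ValueError(f"PAIRED_END must be True/TRUE/False/FALSE, got {value!r}")
            config[key] = BOOLEAN_VALUES[value]
        else:
            config[key] = value  # paths and free text stay as strings
    return config


def validate_config(config):
    """Return a list of problems found in a parsed configuration."""
    problems = []
    names = config.get("GROUP_NAMES", [])
    if len(names) != config.get("GROUP_NUMBER"):
        problems.append("GROUP_NAMES must list GROUP_NUMBER treatments")
    dirs = config.get("GROUP_DIRECTORIES", [])
    if dirs and len(dirs) != len(names):
        problems.append("GROUP_DIRECTORIES needs one folder per group")
    mode = config.get("MODE")
    if mode and mode not in HTSEQ_MODES:
        problems.append("MODE must be union, intersection-strict or intersection-nonempty")
    if config.get("PAIRED_END"):
        problems.append("paired-end analysis is not supported")
    return problems


if __name__ == "__main__":
    cfg = parse_config(EXAMPLE)
    print(cfg)
    print(validate_config(cfg) or "no problems found")
```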

With the configuration file filled in, go to the consexpression folder in a terminal and type:

python experiment.py dao/CONFIG_tool.txt
                                
Warning! The configuration file is distributed with the tool in the project's dao folder, named CONFIG_tool.txt. The file can be moved and renamed, as long as it keeps the original format. If you use a file in another location, pass its absolute path when running the tool, replacing dao/CONFIG_tool.txt with the path to your file.

This tool was developed to perform a complete analysis: mapping, counting, expression analysis and consensus.
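For orientation, the mapping and counting stages correspond roughly to the commands below. This is a hypothetical sketch, not the exact commands consexpression runs. It builds, but does not execute, a Bowtie 2 index command, a single-end TopHat 2 run using THREADS, and an htseq-count call using MODE and ANOTATION_FILE. The index prefix, the per-sample output folders and the example paths are assumptions.

```python
import os
import shlex


def build_index_command(config, index_prefix):
    """bowtie2-build <reference FASTA> <index prefix>."""
    return ["bowtie2-build", config["REFERENCE_GENOME"], index_prefix]


def sample_commands(config, index_prefix, sample, fastq):
    """Mapping and counting commands for one single-end FASTQ file."""
    out_dir = os.path.join(config["OUTPUT"], sample)  # assumed folder layout
    # -p sets the number of threads, -o the TopHat 2 output folder.
    tophat = ["tophat2", "-p", str(config["THREADS"]), "-o", out_dir,
              index_prefix, fastq]
    # TopHat 2 writes its alignments to accepted_hits.bam in out_dir;
    # htseq-count reads them (-f bam) and counts reads per gene (-m MODE).
    htseq = ["htseq-count", "-f", "bam", "-m", config["MODE"],
             os.path.join(out_dir, "accepted_hits.bam"),
             config["ANOTATION_FILE"]]
    return [tophat, htseq]


if __name__ == "__main__":
    # Hypothetical configuration values.
    cfg = {"REFERENCE_GENOME": "/data/genome.fa", "OUTPUT": "/data/results",
           "THREADS": 4, "MODE": "union", "ANOTATION_FILE": "/data/genes.gtf"}
    prefix = os.path.join(cfg["OUTPUT"], "genome_index")
    print(shlex.join(build_index_command(cfg, prefix)))
    for cmd in sample_commands(cfg, prefix, "control_1",
                               "/data/reads/control/control_1.fastq"):
        print(shlex.join(cmd))
```

htseq-count writes its counts to standard output, so if you ran these lists with `subprocess.run(cmd, check=True)` you would redirect `stdout` to a file for the counting step.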

However, many users already have count data and only want the consensus of their expression results.

To perform only the expression analysis, fill in the configuration file as usual with all the information about your experiment, but leave the REFERENCE_GENOME field empty. Consexpression will then detect that mapping and counting are not needed and will run only the expression analysis.
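The rule above can be expressed as a small decision function. `plan_steps` is hypothetical, not consexpression's own code. It assumes the configuration has already been parsed into a dictionary keyed by the parameter names from the table, so the reference field is `REFERENCE_GENOME`.

```python
def plan_steps(config):
    """Decide which stages to run from a parsed configuration dict.

    An empty (or missing) REFERENCE_GENOME means the user already has a
    count table, so mapping and counting are skipped.
    """
    reference = (config.get("REFERENCE_GENOME") or "").strip()
    steps = []
    if reference:
        steps += ["mapping", "counting"]
    steps += ["expression analysis", "consensus"]
    return steps


if __name__ == "__main__":
    print(plan_steps({"REFERENCE_GENOME": "/data/genome.fa"}))
    print(plan_steps({"REFERENCE_GENOME": ""}))
```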

Run the tool as usual, with the command:

python experiment.py dao/CONFIG_tool.txt
                                

During execution, the tool asks for the absolute path of the count table (a comma-separated text file). Type the path of the file and press Enter; the expression and consensus analyses will then be performed.
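Before handing a count table to the tool, you may want to check its shape. The sketch below assumes a header row, gene identifiers in the first column and one integer column per sample, GROUP_NUMBER times REPLIC columns in total; these layout details are assumptions, not something this page specifies. `read_count_table` is a hypothetical helper.

```python
import csv
import io


def read_count_table(handle, group_number, replic):
    """Read a comma-separated count table and check its shape.

    Assumes a header row and a first column of gene identifiers, followed
    by one integer column per sample (group_number * replic columns).
    """
    reader = csv.reader(handle)
    header = next(reader, None)
    if header is None:
        raise ValueError("empty count table")
    expected = group_number * replic
    samples = header[1:]
    if len(samples) != expected:
        raise ValueError(f"expected {expected} sample columns, found {len(samples)}")
    counts = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        gene, values = row[0], row[1:]
        if len(values) != expected:
            raise ValueError(f"gene {gene!r} has {len(values)} values, expected {expected}")
        counts[gene] = [int(v) for v in values]
    return samples, counts


if __name__ == "__main__":
    # Hypothetical table: 2 groups x 2 replicates.
    example = io.StringIO("gene,c1,c2,t1,t2\ngeneA,10,12,30,28\ngeneB,0,1,2,0\n")
    samples, counts = read_count_table(example, group_number=2, replic=2)
    print(samples, counts["geneA"])
```

For a real file, open it with `open(path, newline="")` and pass the handle in place of the `io.StringIO` example.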