QuickNGS: All-in-one data processing for Next-Generation Sequencing

Setting up the server environment


Prerequisites for using QuickNGS

To run in production mode, the software requires a login to a compute cluster with a Linux operating system and one of the widely adopted job scheduling systems such as Torque or SLURM. For productive use, the cluster should have at least 12 compute nodes available for computation. As some of the tools integrated in QuickNGS have high memory demands, at least some of these nodes should have 32 GB or more of main memory. Depending on the data load of your lab, there should be 1 to 5 TB of storage available on a parallel filesystem shared by all compute nodes.
Furthermore, the database-centered organization of the software requires a login to an empty database on a MySQL server. As the database grows with the number of data samples processed, at least 10 GB should be available for the tablespace on the hard drive. To enable connections between the QuickNGS software itself and the MySQL database, a working MySQL client should be available on the cluster system. In addition, full usability, including the end-user web interface, requires a pre-configured Apache web server.
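
The cluster-side requirements above can be checked quickly from a login shell. The following is only a minimal sketch: the function names are hypothetical, the presence of sbatch or qsub is taken as a hint for SLURM or Torque (other schedulers also provide qsub), and the memory check reads /proc/meminfo of the node it runs on, not of the compute nodes.

```shell
#!/bin/sh
# Sketch of a prerequisite check; helper names are illustrative only.

# Succeed if the named command is found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Print a guess at the installed job scheduler: slurm, torque or none.
detect_scheduler() {
    if have_cmd sbatch; then
        echo "slurm"
    elif have_cmd qsub; then
        echo "torque"
    else
        echo "none"
    fi
}

# Print the total main memory of this node in GB (integer, rounded down).
total_mem_gb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
}

echo "scheduler:    $(detect_scheduler)"
if have_cmd mysql; then
    echo "mysql client: found"
else
    echo "mysql client: missing"
fi
if [ -r /proc/meminfo ]; then
    echo "main memory:  $(total_mem_gb) GB (32 GB or more recommended on some nodes)"
fi
```

Running the script on a compute node rather than the login node gives a more meaningful memory figure.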


Installation of the system

In this manual, we refer to two types of directories on your compute cluster, namely the installation directory (<installdir>) and the big data directory (<datadir>). The installation directory contains the QuickNGS code and the software on which the framework is based, whereas the big data directory contains the output of the QuickNGS runs and the reference data downloaded from public repositories. The two locations are kept separate so that the structure and contents of the big data directory do not change when a new QuickNGS release is installed into the installation directory. QuickNGS runs that have been processed with older versions thus always keep their structure in an unchanged directory.

Database configuration. Before setting up the software itself, the database structure distributed together with the code has to be installed into an existing empty database on a MySQL server. We recommend using phpMyAdmin for all interactive database operations. After login, open and run the SQL script QuickNGS.sql in your database. It will create the complete database structure needed to operate the QuickNGS system.
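
If no graphical database tool is available, the same script can presumably be loaded with the MySQL command-line client that is required on the cluster anyway. The sketch below assumes this works for QuickNGS.sql as for any other SQL script; the host, user and database names are hypothetical placeholders, and the script only prints what it would do unless RUN_IMPORT=1 is set.

```shell
#!/bin/sh
# Connection details below are hypothetical placeholders; override them via
# environment variables to match your own MySQL server.
DB_HOST="${DB_HOST:-localhost}"
DB_USER="${DB_USER:-quickngs}"
DB_NAME="${DB_NAME:-quickngs}"
SQL_FILE="${SQL_FILE:-QuickNGS.sql}"

# Describe what the import would do, without touching the database.
describe_import() {
    echo "load $SQL_FILE into database $DB_NAME on $DB_HOST as user $DB_USER"
}

# Run the SQL script against the empty database; -p prompts for the password.
load_schema() {
    mysql -h "$DB_HOST" -u "$DB_USER" -p "$DB_NAME" < "$SQL_FILE"
}

# List the tables afterwards to confirm that the structure was created.
list_tables() {
    mysql -h "$DB_HOST" -u "$DB_USER" -p "$DB_NAME" -e 'SHOW TABLES;'
}

# Dry run by default; set RUN_IMPORT=1 to actually execute the import.
if [ "${RUN_IMPORT:-0}" = "1" ]; then
    load_schema && list_tables
else
    echo "Dry run: would $(describe_import)"
fi
```

The dry-run default is a deliberate choice here, so that the connection settings can be reviewed before anything is written to the database.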

Software installation. To start the QuickNGS installation procedure, run the installer.sh script and follow the instructions. In particular, you have to specify the login data for your database. The database-related options can be changed later by editing the quickngs.config file in the installation directory. If you are planning to configure your system for use with a web server, you need to specify a directory on this server to which your <datadir>/web subdirectory will be mounted, e.g. /var/www/html/quickngs. Apart from this, most options can be left at their default values.

Extra software required by QuickNGS Cancer. The tools used by the QuickNGS Cancer platform include the Genome Analysis Toolkit and MuTect, both of which require user registration. Registration is free of charge for non-profit research. For the QuickNGS installation, this means that these two tools have to be downloaded separately and placed in a directory of your choice, which you then specify when running the QuickNGS installer.

Web server configuration. To mount the <datadir>/web directory to a sub-folder of the Apache directory on your web server, we recommend using sshfs. A typical command to be run on the web server is

sshfs user@cluster:quickngs/web/ /var/www/html/quickngs \
   -o allow_other,uid=1000,gid=1000,umask=002

where 1000 is the uid and gid of a user that is allowed to write to /var/www/html/quickngs on the webserver. The mount point /var/www/html/quickngs must be the same as the one specified during the installation process.
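
After running the sshfs command, it can be useful to confirm that the directory is really mounted before pointing users at the web interface. The following sketch uses the standard mountpoint utility; the helper name mount_state is hypothetical, and the default path simply repeats the example mount point from above.

```shell
#!/bin/sh
# Example mount point from the text; override via the environment if needed.
MOUNT_POINT="${MOUNT_POINT:-/var/www/html/quickngs}"

# Print "mounted" if the given directory is a mount point, else "not mounted".
mount_state() {
    if mountpoint -q "$1" 2>/dev/null; then
        echo "mounted"
    else
        echo "not mounted"
    fi
}

echo "$MOUNT_POINT: $(mount_state "$MOUNT_POINT")"

# To detach the sshfs mount again, the usual FUSE command is:
#   fusermount -u "$MOUNT_POINT"
```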

The web server is used for two purposes. First, the pipeline operators can use the QuickNGS database interface at the URL http://<yourwebserver>/quickngs/db to edit the most important database tables. It allows you to add, edit or delete sample and sample group specifications as well as information on the labs you are collaborating with. Second, all analysis results can be presented to your collaboration partners through the end-user interface. Once an analysis run has finished, the results can be accessed on the personalized and login-protected page http://<yourwebserver>/quickngs/<login>.


Starting a test run

As test cases for a new QuickNGS installation, we have chosen data sets from the NCBI Sequence Read Archive (SRA) under the accession numbers SRP011390 (RNA-Seq), SRP043191 (miRNA-Seq), SRP007261 (ChIP-Seq) and SRP020555 (WGS). The data are mirrored here in FastQ format for more convenient access. Please download the FastQ files to your compute cluster and change the file paths in the Samples section of the database interface (or in the NGSSamples table if you do not have a web server installed). All other metadata are included in the default structure of the MySQL database. You can then start the analysis by creating links to the files in the stack directory:

cd <datadir>/stack/new
ln -s <path_to_data>/*.fq .
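
The two commands above can also be wrapped in a small function that reports how many files were linked and does nothing harmful when the glob matches no file. This is only a sketch: link_fastq is a hypothetical helper, not part of QuickNGS, and the demonstration runs in throw-away directories that stand in for <path_to_data> and <datadir>/stack/new.

```shell
#!/bin/sh
# Link every *.fq file from a source directory into a stack directory.
# Prints the number of links created.
link_fastq() {
    # Resolve the source to an absolute path so the symlinks stay valid.
    src_dir=$(cd "$1" >/dev/null 2>&1 && pwd) || return 1
    stack_dir=$2
    count=0
    for f in "$src_dir"/*.fq; do
        # If the glob matched nothing, the literal pattern is left over; skip it.
        [ -e "$f" ] || continue
        ln -s "$f" "$stack_dir/" && count=$((count + 1))
    done
    echo "$count"
}

# Demonstration in temporary directories (stand-ins for the real paths).
demo=$(mktemp -d)
mkdir "$demo/data" "$demo/stack"
touch "$demo/data/sample1.fq" "$demo/data/sample2.fq" "$demo/data/notes.txt"
echo "linked $(link_fastq "$demo/data" "$demo/stack") file(s)"
ls -l "$demo/stack"
rm -rf "$demo"
```

Only the two .fq files are linked in the demonstration; notes.txt is ignored because it does not match the pattern.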

Because cron jobs for starting the workflows automatically have probably not yet been created at this point, you need to start the wrapper script of the RNA-Seq pipeline manually:

cd <installdir>/scripts/rnaseq
./start_RNAseq.sh

This will run the complete RNA-Seq workflow from basic QC to database export and upload into your web interface.
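
Once the manual test run works, the wrapper script could in principle be started periodically by cron. The sketch below merely prints a crontab line for review; the 15-minute interval, the /opt/quickngs and /data/quickngs paths and the log file name are all hypothetical stand-ins for <installdir> and <datadir>, and QuickNGS may provide its own recommended cron setup.

```shell
#!/bin/sh
# Print a crontab line that runs a wrapper script at a fixed interval.
# Arguments: interval in minutes, script directory, script name, log file.
cron_line() {
    printf '*/%s * * * * cd %s && ./%s >> %s 2>&1\n' "$1" "$2" "$3" "$4"
}

# Hypothetical example values; adjust the paths to your installation.
cron_line 15 /opt/quickngs/scripts/rnaseq start_RNAseq.sh /data/quickngs/cron_rnaseq.log
```

The printed line can be pasted into the crontab of the pipeline user with crontab -e after checking the paths.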



News

June 6th, 2017: Our paper on the cancer genome analysis platform QuickNGS Cancer has been published in Human Mutation.

September 30th, 2016: Please access the 'Multi-Layer Integration' area to combine RNA-Seq and ChIP-Seq data with an early release of our new multi-OMICS data integration platform.

June 23rd, 2016: Tools for gene set enrichment analysis using GO terms and KEGG pathways have been adopted into QuickNGS from version 1.2.2 on.

February 15th, 2016: The latest QuickNGS release now includes QuickNGS Cancer, a new platform specifically designed for cancer genome analysis.

January 4th, 2016: Bluebee High-Performance Genomics B.V., Delft, The Netherlands, have adopted QuickNGS into their cloud-based NGS analysis solution.

September 23rd, 2015: Please refer to our new FAQ to handle common problems with the QuickNGS results.

July 31st, 2015: The QuickNGS paper was published in BMC Genomics! Please cite this paper for all analyses based on the QuickNGS system.

July 11th, 2014: The first public version of the QuickNGS source code has just been released! Please click here to download the software.