SMART-RDA: A Galaxy Workflow for RNA-Seq Data Analysis

Abstract

RNA sequencing (RNA-seq) based on Next Generation Sequencing (NGS) is a common technology for analyzing large-scale RNA transcript data in gene expression studies. However, an appropriate bioinformatics tool is needed to analyze the large amount of transcriptome data produced by an RNA-seq experiment. The aim of this study was to construct a system that can be easily applied to RNA-seq data analysis. The resulting tool, SMART-RDA, is a computational workflow based on the Galaxy framework that converts raw RNA-seq data into gene expression information. The workflow was adapted, with some modifications, from the well-known Tuxedo protocol for RNA-seq analysis. The expression value of each transcript is reported quantitatively as Fragments Per Kilobase of exon per Million fragments mapped (FPKM). RNA-seq data from sterile and fertile oil palm (Pisifera) pollen, obtained from the NCBI Sequence Read Archive (SRA), were used to test the workflow on a local Galaxy server. The results showed that differential gene expression in pollen might be responsible for the sterile and fertile characteristics of Pisifera oil palm.

Keywords: FPKM; Galaxy workflow; Gene expression; RNA sequencing.
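
The FPKM unit named in the abstract can be illustrated with a minimal sketch. It uses the textbook definition (fragments mapped to a transcript, scaled by transcript length in kilobases and by millions of mapped fragments) and is not the statistical estimation performed by the tools in the Tuxedo protocol. The function name `fpkm` and the example numbers are hypothetical and are not taken from the study.

```python
def fpkm(fragment_count, transcript_length_bp, total_mapped_fragments):
    """Simplified FPKM: fragments per kilobase of transcript per million mapped fragments.

    Assumption: this is the plain textbook formula, not the likelihood-based
    abundance estimate produced by an RNA-seq quantification tool.
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total mapped fragments must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (fragments per million)
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical example: 500 fragments on a 2,000 bp transcript,
# in a library with 20 million mapped fragments.
example_value = fpkm(500, 2000, 20_000_000)
print(example_value)
```

Because the value is normalized by both transcript length and library size, it can be compared across transcripts and samples in a way that raw fragment counts cannot.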
