Authors: Quek ZB Randolph & Huang Danwei (2019)

Publication: Effects of missing data and data type on phylotranscriptomic analysis of stony corals (Cnidaria: Anthozoa: Scleractinia)

Journal: Molecular Phylogenetics and Evolution; doi: https://doi.org/10.1016/j.ympev.2019.01.012

Corresponding author: randolphquek@u.nus.edu

In this study, we present 16 new scleractinian transcriptomes, many of which have not been previously sequenced. Samples were collected from Singapore and span the scleractinian phylogeny as per Kitahara et al. (2016: The New Systematics of Scleractinia: Integrating Molecular and Morphological Evidence; in The Cnidaria, Past, Present and Future, edited by Stefano Goffredo and Zvy Dubinsky).

Protocol:

Samples were collected from adult colonies in Singapore via intertidal and subtidal sampling. Coral tissue was stored at -80 degrees Celsius until extraction. The full extraction and library preparation protocol can be found in the publication. Briefly, total RNA was extracted using a modified Trizol method. The NEBNext Ultra RNA Library Prep Kit for Illumina was then used for library preparation, and all 16 libraries were pooled for sequencing across four HiSeq 2500 lanes. Raw reads are deposited in SRA under BioProject PRJNA512601.

Full details of the pipeline can be found in the publication. Briefly, reads were trimmed using Trimmomatic v0.36 under default settings (Bolger et al., 2014). Trimmed reads were assembled using Trinity v2.4.0 under default settings (Grabherr et al., 2011; Haas et al., 2013), and open reading frames (ORFs) were predicted from the transcripts with TransDecoder v4.1.0 (Haas et al., 2013). CD-HIT-EST v4.7 (Fu et al., 2012) was then used to remove duplicate transcripts.
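For readers who want to script a comparable workflow, the trimming, assembly, ORF prediction and deduplication steps above can be outlined as a minimal sketch. It only builds the command lines in order and does not run anything. The function name `build_pipeline`, all file and directory names, the CPU and memory values, and the jar location are hypothetical. The Trimmomatic step string is the example from the Trimmomatic manual and is not necessarily what was used in the study, and running CD-HIT-EST on the predicted coding sequences with its default identity threshold is likewise an assumption. Consult the publication for the actual parameters.

```python
from pathlib import PurePosixPath

# Example trimming steps from the Trimmomatic manual (assumption: the study's
# "default settings" may differ from these values).
TRIM_STEPS = (
    "ILLUMINACLIP:TruSeq3-PE.fa:2:30:10",
    "LEADING:3",
    "TRAILING:3",
    "SLIDINGWINDOW:4:15",
    "MINLEN:36",
)


def build_pipeline(sample, r1, r2, workdir="work", cpus=8, max_memory="50G",
                   trimmomatic_jar="trimmomatic-0.36.jar"):
    """Return the ordered command lines (as argv lists) for one sample.

    All paths and resource values are hypothetical placeholders.
    """
    out = PurePosixPath(workdir) / sample
    p1, u1 = out / "R1.paired.fq.gz", out / "R1.unpaired.fq.gz"
    p2, u2 = out / "R2.paired.fq.gz", out / "R2.unpaired.fq.gz"
    trinity_dir = out / "trinity_out"
    assembly = trinity_dir / "Trinity.fasta"
    # Assumption: TransDecoder writes <input basename>.transdecoder.cds
    # into the current working directory.
    cds = PurePosixPath(assembly.name + ".transdecoder.cds")
    nonredundant = out / "nonredundant.cds.fasta"

    return [
        # 1. Adapter and quality trimming of paired-end reads.
        ["java", "-jar", trimmomatic_jar, "PE", str(r1), str(r2),
         str(p1), str(u1), str(p2), str(u2), *TRIM_STEPS],
        # 2. De novo assembly of the surviving read pairs.
        ["Trinity", "--seqType", "fq", "--left", str(p1), "--right", str(p2),
         "--max_memory", max_memory, "--CPU", str(cpus),
         "--output", str(trinity_dir)],
        # 3. ORF prediction (two-stage TransDecoder run).
        ["TransDecoder.LongOrfs", "-t", str(assembly)],
        ["TransDecoder.Predict", "-t", str(assembly)],
        # 4. Redundancy removal (cd-hit-est default identity threshold).
        ["cd-hit-est", "-i", str(cds), "-o", str(nonredundant)],
    ]
```

Each returned list could be passed to `subprocess.run(cmd, check=True)` once the tools are installed and the placeholder paths are replaced with real ones.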
To filter for Symbiodiniaceae reads, we first mapped the assembled transcripts to 20 coral transcriptomes from reefgenomics.org (Liew et al., 2016) using Qiagen CLC Genomics Workbench v9.5.4 (80% sequence similarity over 50% length) to identify putatively coral transcripts. The resulting transcripts were then mapped to seven transcriptomes and three genomes of Symbiodiniaceae under the same settings. We refer the reader to the supplementary material of the publication for more information, including the full assembly quality comparison (doi: https://doi.org/10.1016/j.ympev.2019.01.012).

IMPORTANT: The **Porites lobata** sample is poorly assembled, and we strongly recommend using other available assemblies instead for downstream applications and analysis.

Samples from this study are:

Acropora millepora
Astreopora expansa
Cyphastrea serailia
Diploastrea heliopora
Fimbriaphyllia ancora
Galaxea astreata
Goniastrea retiformis
Goniopora columna
Herpolitha limax
Lobophyllia radians
Pachyseris speciosa
Platygyra sinensis
Plesiastrea versipora
***Porites lobata*** #POORLY ASSEMBLED; STRONGLY RECOMMENDED TO USE OTHER AVAILABLE, HIGH QUALITY Porites lobata TRANSCRIPTOMES
Oulastrea crispata
Turbinaria mesenterina

References:

Bolger, A. M., Lohse, M., & Usadel, B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics, 30(15), 2114–2120.

Fu, L., Niu, B., Zhu, Z., Wu, S., & Li, W. (2012). CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics, 28(23), 3150–3152.

Grabherr, M. G., Haas, B. J., Yassour, M., Levin, J. Z., Thompson, D. A., Amit, I., … Regev, A. (2011). Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nature Biotechnology, 29(7), 644–652.

Haas, B. J., Papanicolaou, A., Yassour, M., Grabherr, M., Blood, P. D., Bowden, J., … Regev, A. (2013). De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nature Protocols, 8(8), 1494–1512.

Liew, Y. J., Aranda, M., & Voolstra, C. R. (2016). Reefgenomics.Org - a repository for marine genomics data. Database (Oxford), 2016, baw152.
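As a supplementary illustration of the Symbiodiniaceae filtering step described in the protocol above, the following minimal sketch shows the two-step logic on generic mapping results. It does not reproduce CLC Genomics Workbench. The `Hit` record, the function names, and the inclusive treatment of the 80% similarity and 50% length thresholds are assumptions. The sketch also assumes that transcripts passing the coral step but also mapping to Symbiodiniaceae are discarded, which the description above implies but does not state explicitly.

```python
from dataclasses import dataclass

MIN_IDENTITY = 0.80         # 80% sequence similarity
MIN_LENGTH_FRACTION = 0.50  # over 50% of the transcript length


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one transcript-to-reference mapping."""
    transcript_id: str
    identity: float          # similarity within the aligned region (0-1)
    aligned_fraction: float  # fraction of the transcript that aligned (0-1)


def passes(hit):
    """True if a hit meets both thresholds (assumed inclusive)."""
    return (hit.identity >= MIN_IDENTITY
            and hit.aligned_fraction >= MIN_LENGTH_FRACTION)


def filter_transcripts(transcript_ids, coral_hits, symbiont_hits):
    """Keep transcripts that map to coral references and not to symbionts.

    Step 1: retain transcripts with a qualifying hit to a coral reference.
    Step 2: drop those that also have a qualifying Symbiodiniaceae hit
            (assumption about how the second mapping was used).
    """
    coral_like = {h.transcript_id for h in coral_hits if passes(h)}
    symbiont_like = {h.transcript_id for h in symbiont_hits if passes(h)}
    return [t for t in transcript_ids
            if t in coral_like and t not in symbiont_like]
```

Calling `filter_transcripts` with the full list of transcript identifiers and the two sets of hits returns the retained identifiers in their original order.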