Rhesus Macaque PacBio Iso-Seq analysis

----------------
Overview
----------------

Over 2.8 million transcript sequencing reads (Circular Consensus Sequence reads, CCSs) were generated with PacBio Iso-Seq from four rhesus macaque tissues: lymph node, peripheral blood mononuclear cells (PBMC), whole blood, and rectum. The reads range from 300 to 45,549 nucleotides in length. These CCSs were processed and then aligned separately to the rhesus macaque and human genomes, so that splicing patterns could be compared against existing annotation. Using the Iso-Seq pipeline and its supporting Cupcake scripts (https://github.com/Magdoll/cDNA_Cupcake), the aligned CCSs were processed into unique isoforms. A recent isoform classification tool, SQANTI (https://bitbucket.org/ConesaLab/sqanti), was then used to characterize each isoform with over 30 descriptors.

This dataset contains the sequences of the high-quality, full-length isoforms, together with their annotation with respect to the rheMac8 assembly. It also provides the filtered, high-quality CCS reads that were aligned to rhesus macaque, and SQANTI classification data for both rhesus macaque and human.

--------------
Data files
--------------

1. FASTA file: rhe.collapsed.min_fl_2.filtered.rep_corrected.fasta
   This file contains the nucleotide sequences of all final isoforms aligned to rhesus macaque.

2. GTF file: rhe.collapsed.min_fl_2.filtered.rep_corrected.gtf
   This file contains the annotation information for the isoforms aligned to rhesus macaque. It can be viewed in a genome browser using the rheMac8 assembly.

3. FASTQ file: final.filtered.hq.ccs.fastq
   This file contains the high-quality CCS reads generated with the standard CCS protocol. Siamaeric (artificial) reads have been removed.

4. TXT file: rhe.collapsed.min_fl_2.filtered.rep_classification.txt
   This text file lists all SQANTI descriptors for each isoform aligned to rhesus macaque (rheMac8 with NCBI Macaca mulatta Annotation Release 102). The annotation is available here: ftp://ftp.ncbi.nih.gov/genomes/Macaca_mulatta/GFF/ref_Mmul_8.0.1_top_level.gff3.gz

5. TXT file: hg38.collapsed.min_fl_2.filtered.rep_classification.txt
   This text file lists all SQANTI descriptors for each isoform aligned to human (hg38 with Gencode v25 primary assembly annotation). The annotation is available here: ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_25/gencode.v25.primary_assembly.annotation.gtf.gz

--------
Raw data
--------

Iso-Seq raw reads are available in the NCBI Sequence Read Archive (SRA) under BioProject accession no. PRJNA389440.

-------------------------
More details and citation
-------------------------

Brochu H, Tseng E, Smith E, Thomas M, Law L, Picker L, Gale M, Peng X. Resolving Alternative Splicing Patterns in Rhesus Macaque Transcriptomes using Full-Length Transcriptome Sequencing Analysis. Under review.

-------------------
Contact
-------------------

* Xinxia Peng xpeng5@ncsu.edu
* Hayden Brochu hnathan@ncsu.edu
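
The data files listed above use standard formats, so they can be inspected without specialized software. The Python sketch below is a minimal illustration and is not part of the original analysis. It reads an isoform FASTA file to summarize isoform lengths, and tallies isoforms per SQANTI category from a classification file. It assumes the classification files are tab-separated with a header row and a column named structural_category; check the header of your copy and pass a different column name if it differs. The function names (read_fasta, length_summary, tally_categories) are hypothetical helpers, not part of SQANTI or Cupcake.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs from FASTA-formatted lines."""
    record_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, "".join(chunks)
            # Keep only the first whitespace-delimited token of the header.
            record_id, chunks = (line[1:].split() or [""])[0], []
        else:
            # Sequences may be wrapped over several lines; join them later.
            chunks.append(line)
    if record_id is not None:
        yield record_id, "".join(chunks)


def length_summary(lines: Iterable[str]) -> Dict[str, float]:
    """Count records and report min/max/mean sequence length."""
    lengths = [len(seq) for _, seq in read_fasta(lines)]
    if not lengths:
        return {"n": 0}
    return {
        "n": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": sum(lengths) / len(lengths),
    }


def tally_categories(lines: Iterable[str],
                     column: str = "structural_category") -> Counter:
    """Count isoforms per value of `column` in a tab-separated table.

    ASSUMPTION: the SQANTI classification file has a header row and a
    column with this name; adjust `column` to match your file.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in header")
    return Counter(row[column] for row in reader)
```

For example, opening rhe.collapsed.min_fl_2.filtered.rep_classification.txt and passing the file handle to tally_categories, then calling most_common() on the result, lists the categories from most to least frequent. The helpers accept any iterable of text lines (an open file handle works), which keeps them independent of how the files are stored.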