RNA Sequencing Analysis: Everything You Need to Know
RNA comes in several forms, the three best known being messenger RNA (mRNA), transfer RNA (tRNA), and ribosomal RNA (rRNA). Together, they carry the instructions encoded in DNA from the nucleus to the ribosomes and turn them into proteins. This complex mixture of moving parts works together in the gelatinous fluid of a cell to generate life as we know it…
No wonder it’s worth sequencing and analyzing.
DNA vs RNA vs Proteins | A Quick Primer
To get everybody on the same page, here’s a quick primer that everybody can have fun decoding. These three compounds—DNA, RNA, and proteins—have a chain reaction-like relationship.
- DNA is the blueprint for the entire genome. Any and all genetic information for the entire organism is stored in the DNA. For humans, that includes everything from how quickly our fingernails grow to what color our hair will be. All of it depends on the genes contained in every cell.
- RNA is (simplistically speaking) a working copy of a small stretch of DNA. It’s another nucleic acid chain that carries the information from DNA to where proteins are made. Copying DNA into RNA is called transcription, and it allows information from DNA (which stays in the nucleus) to exit the nucleus and eventually go from info to substance. The three main types of RNA involved are messenger, ribosomal, and transfer.
- Proteins are built at the ribosomes, the cell’s protein factories, with all three types of RNA taking part. This process is called translation. The protein then performs its function (whatever that may be).
RNA stands as the link between this profound library of information (DNA) and the fruits of that information (proteins).
Sequencing these strands of nucleotides, combined with the existing literature on DNA, allows researchers to gain insight into this bridge between theory and actualization.
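As a rough illustration of the DNA to RNA to protein chain described above, here is a minimal Python sketch. The function names and the example sequence are made up for illustration, and it simplifies heavily: it assumes the input is the coding strand, that reading starts at the first base, and that the standard genetic code applies.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, listed in UCAG order for each codon position ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna):
    """Transcription (simplified): rewrite the coding strand as mRNA (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Translation (simplified): read codons from the first base until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical 9-base gene: start codon, one alanine codon, stop codon.
mrna = transcribe("ATGGCCTAA")
protein = translate(mrna)
print(mrna, protein)
```

Real transcription and translation involve promoters, splicing, and start-codon selection, none of which this sketch attempts to model.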
With this, let’s move to RNA sequencing.
The Ultimate Guide to RNA Sequencing
RNA sequencing (RNA-seq) is a technique for analyzing the transcriptome, the full set of RNA transcripts in a sample, and with it gene expression. Using next generation sequencing, it captures both which RNA sequences are present and how much of each.
The steps to RNA sequencing analysis include:
- Experimental setup: Identifying what you’re measuring
- Preparing the RNA sample
- Creating the sequencing libraries
- RNA sequencing
- Analyzing the RNA sequencing results
Step 1: The Setup
Due to the amount of time and energy (and research grant money) needed to pull off just one experiment, it’s important that the setup is done correctly. For starters, determine whether the RNA sequencing experiment is meant to provide qualitative or quantitative results:
- Qualitative results – Also known as annotative information, this offers a more exploratory approach to RNA sequencing analysis. It is more likely to be used for identifying anomalies, rare transcripts, and isoforms. It does so by reading genes and gene architecture rather than measuring how strongly each gene is expressed.
- Quantitative results – Also known as differential gene expression, quantitative results offer well-defined measurements that can be used to infer gene counts and the variances associated with them. There are three major sources of variance:
  - Biological variance: variance among control groups or between treatments
  - Sampling variance: only a small fraction of the RNA in a sample is actually sequenced, so counts vary by chance
  - Technical variance: variance that arises from sample processing, such as library creation
By defining the variables properly beforehand (target transcript, effect sizes, measured attributes), you should be able to determine how “relevant” the data will be from the outset. This will also let you plan enough replicate experiments to support statistically significant, publishable results.
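To make the variance terms above a little more concrete, here is a small Python sketch. It assumes, as a common simplification, that sampling variance alone behaves like Poisson noise, so a gene with an expected count of n has a relative spread of about 1/sqrt(n). Any spread across replicates beyond that would then point to biological or technical variance. The function names and the counts are hypothetical.

```python
import math
import statistics


def poisson_cv(mean_count):
    """Coefficient of variation expected from Poisson sampling noise alone."""
    return 1.0 / math.sqrt(mean_count)


def observed_cv(counts):
    """Coefficient of variation actually seen across replicate counts."""
    return statistics.stdev(counts) / statistics.mean(counts)


# Hypothetical read counts for one gene across five replicates.
replicate_counts = [90, 110, 100, 95, 105]

sampling_only = poisson_cv(statistics.mean(replicate_counts))
observed = observed_cv(replicate_counts)
print(f"expected from sampling alone: {sampling_only:.3f}")
print(f"observed across replicates:   {observed:.3f}")
```

The same arithmetic shows why lowly expressed genes are hard to measure: at an expected count of 4, sampling noise alone is already about 50% of the signal.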
Step 2: Preparing the RNA Sample
Once the experiment is set up, it’s time to prepare the sample of RNA you want sequenced. This takes a bit more than just grabbing the nearest pipette. The idea here is to have an intact, high-quality sample of RNA that’s isolated and purified.
This happens one step at a time using (typically) one of two extraction methods:
- Solid-phase extraction – RNA binds to a silica membrane or silica-coated beads, contaminants are washed away, and the purified RNA is then eluted back into solution.
- Organic extraction – Cells are lysed in a phenol-based reagent, and the mixture is separated into phases so that the RNA stays in the aqueous layer while proteins and other contaminants are left behind.
Step 3: Creating the Sequencing Library
In order for the sequencer to read your sample, the RNA first has to be converted into a form the instrument can work with. This is the sequencing library.
Starting from your isolated and purified RNA sample, the strands are typically converted to complementary DNA (cDNA), broken into fragments, and fitted with short adapter sequences. On semiconductor-based sequencers, each nucleotide added to a growing strand releases a hydrogen ion into the solution. Semiconductors measure this reaction and convert the information into data (0’s and 1’s). Barcode sequences in the adapters then work like labels for a barcode scanner, identifying which sample each read came from.
Most next generation sequencing platforms offer library preparation kits that handle much of this for you, using polymerase chain reaction (PCR) to amplify the number of copies available in the sample.
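The PCR amplification mentioned above can be approximated with simple arithmetic: each cycle multiplies the number of copies by (1 + efficiency), where an efficiency of 1.0 means perfect doubling. The short Python sketch below assumes a constant efficiency for every cycle, which real reactions do not hold to as reagents run out. The function name and the numbers are illustrative only.

```python
def expected_copies(start_copies, cycles, efficiency=1.0):
    """Idealized PCR yield: copies grow by a factor of (1 + efficiency) per cycle."""
    return start_copies * (1.0 + efficiency) ** cycles


# Hypothetical library: 100 starting fragments, 10 PCR cycles.
perfect = expected_copies(100, 10)            # every fragment doubles each cycle
realistic = expected_copies(100, 10, 0.9)     # assumed 90% efficiency per cycle
print(f"perfect doubling: {perfect:.0f} copies")
print(f"90% efficiency:   {realistic:.0f} copies")
```

Because the growth is exponential, small differences in efficiency between fragments compound over cycles, which is one reason library creation contributes technical variance.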
Step 4: RNA Sequencing
Next generation sequencing technologies keep producing new and more efficient methods of sequencing RNA. Most of the available platforms can provide the RNA sequencing data needed for analysis. Which one you choose depends on a number of factors. Once you have decided, make sure you have the following information available:
- Whether you need single-end or paired-end sequencing
- The length of each RNA read
- The sequence of the primer binding site
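Sequencers commonly deliver reads as FASTQ files, where each read takes four lines: a name starting with "@", the bases, a "+" separator, and a quality string. As a quick way to check the read lengths listed above, here is a minimal Python sketch that summarizes them. It assumes well-formed, uncompressed FASTQ with exactly four lines per record, and the function names and example reads are hypothetical.

```python
import io
from statistics import mean


def read_fastq(handle):
    """Yield (name, sequence, quality) tuples from a four-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return
        sequence = handle.readline().rstrip()
        handle.readline()  # skip the '+' separator line
        quality = handle.readline().rstrip()
        yield header[1:], sequence, quality


def summarize_lengths(handle):
    """Return the read count and the min, max, and mean read length."""
    lengths = [len(sequence) for _, sequence, _ in read_fastq(handle)]
    if not lengths:
        return {"reads": 0, "min": 0, "max": 0, "mean": 0}
    return {
        "reads": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": mean(lengths),
    }


# Two made-up reads stand in for a real file opened with open(path).
example = io.StringIO("@read1\nACGTACGT\n+\nIIIIIIII\n@read2\nACGTAC\n+\nIIIIII\n")
print(summarize_lengths(example))
```

For paired-end data, the same summary can be run on each of the two files. Both files should then report the same number of reads.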
Step 5: Analyze the RNA Sequencing Results
Sequencing genomic information and extracting usable data is truly at the forefront of modern science. The staggering amount of data contained within the DNA of a single cell borders on the spiritual. And this is revealed when sequencing RNA.
Biologists, mathematicians, computer scientists, and statisticians are all needed to interpret the data. It’s “Big Data” meets “microbiology,” and manipulating the data to curate usable results takes a substantial amount of computing power and creative imagination.
To that end, two key approaches to RNA sequencing analysis have been developed:
- De novo assembly
- Aligning results to a library reference
Prior to using either tool, the data must be processed using the following steps:
- Demultiplexing the sample results. Often several samples are pooled and sequenced together in one run, each tagged with its own barcode. The reads need to be sorted back to their sample of origin.
- Removing the adaptor sequences, including the barcodes used to tag each sample, from the data.
- Filtering out or “trimming” reads, or parts of reads, where sequencing errors are suspected.
- Normalizing the data with statistical analysis tools, so that millions of data points can be compared across samples.
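The four steps above can be sketched in a few lines of Python. This is a toy version under loud assumptions: barcodes sit at the start of each read and match exactly, adapters are found by exact string search, quality strings use the Phred+33 encoding, and normalization is plain counts per million (one of several possible schemes). All function names, barcodes, and sequences are hypothetical. Dedicated tools handle mismatches and many edge cases that this sketch ignores.

```python
def demultiplex(reads, barcodes):
    """Group (sequence, quality) reads by an exact barcode at the read start."""
    samples = {name: [] for name in barcodes.values()}
    samples["undetermined"] = []
    for seq, qual in reads:
        for barcode, name in barcodes.items():
            if seq.startswith(barcode):
                # Drop the barcode from both the bases and the quality string.
                samples[name].append((seq[len(barcode):], qual[len(barcode):]))
                break
        else:
            samples["undetermined"].append((seq, qual))
    return samples


def strip_adapter(seq, qual, adapter):
    """Cut the read at the first exact occurrence of the adapter, if any."""
    pos = seq.find(adapter)
    if pos == -1:
        return seq, qual
    return seq[:pos], qual[:pos]


def quality_trim(seq, qual, min_quality=20, offset=33):
    """Trim bases from the 3' end while their Phred score is below min_quality."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_quality:
        end -= 1
    return seq[:end], qual[:end]


def counts_per_million(counts):
    """Scale raw per-gene counts so that each sample sums to one million."""
    total = sum(counts.values())
    return {gene: count * 1_000_000 / total for gene, count in counts.items()}


# Made-up reads: a 4-base barcode followed by the fragment itself.
reads = [("AAGGACGT", "IIIIIIII"), ("TTCCACGT", "IIIIIIII"), ("GGGGACGT", "IIIIIIII")]
by_sample = demultiplex(reads, {"AAGG": "control", "TTCC": "treated"})
print(by_sample["control"], len(by_sample["undetermined"]))
print(strip_adapter("ACGTAGATCGG", "I" * 11, "AGATCGG"))
print(quality_trim("ACGTAC", "IIII##"))
print(counts_per_million({"geneA": 30, "geneB": 70}))
```

Counts per million only corrects for sequencing depth. Comparing genes of different lengths, or samples with very different compositions, calls for more careful normalization methods.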
Once complete, the analysis tools below help quantify the results when there is no known library reference (de novo) and when there is one (aligning to reference).
De Novo Assembly
When the organism is novel and has no reference built for it, de novo assembly helps you build a reference transcriptome from the reads themselves (hence “de novo,” meaning “from the beginning”). This process, while useful, is incredibly tricky because the assembly has to contend with variants, polymorphisms, and sequencing errors. Determining what is error and what is real sequence makes this a painstaking process. That’s why de novo assembly generally benefits from longer reads.
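To give a feel for what assembling without a reference involves, here is a toy Python sketch that repeatedly merges the two reads with the longest suffix-to-prefix overlap. It is a greedy overlap approach chosen for brevity, not the method any particular assembler uses, and it assumes error-free reads from a single strand. The function names and reads are hypothetical.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b, or 0."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Merge the best-overlapping pair of sequences until no overlaps remain."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_overlap)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Three made-up reads that tile a 12-base transcript.
print(greedy_assemble(["ATGGCC", "GCCTTA", "TTAGGA"]))
```

With short reads, overlaps this small occur by chance all over a real transcriptome, so the greedy choice can easily join the wrong pieces. Longer reads give longer, less ambiguous overlaps.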
Aligning Results to Library Reference
When RNA sequencing produces short reads that can be mapped to an existing reference (a known genome or transcriptome), computational programs can do the mapping quite effectively. The same issues arise here (variants, polymorphisms, and sequencing errors), but most of them can be filtered out using statistical approximations. Additionally, by adjusting the analysis parameters and trying different statistical models, you can gradually improve your analysis of the sequencing data.
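As an illustration of mapping short reads to a reference, the Python sketch below indexes every k-base substring of the reference, uses the first k bases of each read as a seed to find candidate positions, and then accepts the position with the fewest mismatches. It is a simplified seed-and-check scheme under stated assumptions (no insertions, deletions, or splicing, and the seed itself must match exactly), not how any specific aligner works. The names and sequences are hypothetical.

```python
from collections import defaultdict


def build_index(reference, k):
    """Map every k-base substring of the reference to its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def align_read(read, reference, index, k, max_mismatches=1):
    """Return (position, mismatches) for the best placement, or None."""
    best = None
    for pos in index.get(read[:k], []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # the read would run off the end of the reference
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (pos, mismatches)
    return best


# Made-up reference and reads: exact match, one mismatch, and no match.
reference = "ATGGCCTTAGGACCA"
index = build_index(reference, k=4)
for read in ["CCTTAGGA", "CCTTACGA", "GGGGGGGG"]:
    print(read, align_read(read, reference, index, k=4))
```

The mismatch allowance is where variants and sequencing errors come in: a single mismatch could be either, and telling them apart takes many overlapping reads plus a statistical model.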
Sorting Your Cells | The Preliminary “Step 0” in RNA Sequencing
Prior to the steps listed above, there is a step 0 that may be involved in your workflow—sorting the cells in your sample.
Experiments on homogeneous populations make analysis significantly easier because they remove extra variables from the testing. The flow cytometers used for sorting, though, have often been blamed for cell lysis and apoptosis caused by their internal pressures. This is true for industry-standard flow cytometers that use internal pressures of 20 psi or more.
However, NanoCellect has been changing the industry standard. Their WOLF Cell Sorter runs on internal pressures of less than 2 psi, making it at least ten times gentler than those industry-standard instruments. In addition, it doesn’t sacrifice efficiency, and it comes with other helpful qualities, like:
- Disposable cartridges: no need to worry about cross contamination between samples, simply switch out for a new one.
- Small and portable: at less than 2 cubic feet it can fit on the benchtop of any lab.
- Intuitive software: it has all the advanced features researchers want, and the convenience beginners need.
To keep your cells happy when tackling RNA sequencing analysis, consider NanoCellect. Here, cells are sorted, counted, and kept viable.