Rare variants in genomic data have long been the bugaboo of computational biologists, those soldiers at the forefront of personalized medicine. These needle-in-a-haystack mutations are nearly impossible to detect with traditional approaches such as genome-wide association studies (GWAS), yet they can have a huge, even dominant, effect on the biology being studied. Once it became possible to quickly obtain whole-genome DNA sequences from many people, it was assumed, rather intuitively, that a DNA change common among people who share a disease and absent in healthy people must have something to do with that disease.
This approach, however, yielded surprisingly little when it came to explaining how diseases work or identifying new drug targets. It has become increasingly clear that, for relevance to disease, the strength of a DNA change's effect matters more than how common the change is. But strong changes are often too rare for existing mathematical tools to find within the enormous pile of DNA sequence data. New information is needed to give statistical and mathematical approaches enough discriminating power to mine out those rare variants.
Now, in a potential breakthrough, researchers have investigated how rare variation affects gene expression across tissues. The findings were published in a Nature paper, “The impact of rare variation on gene expression across tissues.” The group from Stanford used whole-genome and multi-tissue RNA-sequencing data from the GTEx tissue library to identify gene expression outliers, individuals with very high or very low expression of a gene compared with the population, across 44 human tissues. They then incorporated this new information into a Bayesian statistical model that uses expression data to predict the regulatory effects of rare variants. The tissue differential provided a new independent variable, or feature, for pulling high-impact rare variants out of the noise, a major challenge in the field.
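For readers who think in code, the idea of an expression outlier can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Stanford group's pipeline: the function names, the toy data, the use of a median z-score across tissues, and the cutoff of 2 are all our own choices for the example, and it leaves out the normalization and confounder correction a real analysis would need.

```python
from statistics import mean, median, pstdev


def z_scores(values):
    """Standardize one gene's expression values across individuals in one tissue."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # No spread in this tissue, so nobody can be an outlier here.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def multi_tissue_outliers(expression, threshold=2.0):
    """Flag individuals whose median z-score across tissues is extreme.

    expression: {tissue: {individual: expression value}} for a single gene.
    threshold:  assumed cutoff on |median z|, chosen for illustration only.
    Returns {individual: median z-score} for the flagged individuals.
    """
    z_by_individual = {}
    for by_individual in expression.values():
        ids = list(by_individual)
        zs = z_scores([by_individual[i] for i in ids])
        for individual, z in zip(ids, zs):
            z_by_individual.setdefault(individual, []).append(z)
    medians = {i: median(zs) for i, zs in z_by_individual.items()}
    return {i: m for i, m in medians.items() if abs(m) >= threshold}


# Toy data (hypothetical): ten individuals, three tissues, one gene.
# "ind9" is over-expressed in two tissues; the third tissue shows no variation.
ids = [f"ind{k}" for k in range(10)]
demo = {
    "liver": {i: (20.0 if i == "ind9" else 10.0) for i in ids},
    "lung": {i: (20.0 if i == "ind9" else 10.0) for i in ids},
    "skin": {i: 10.0 for i in ids},
}
outliers = multi_tissue_outliers(demo)
print(outliers)
```

Taking the median across tissues means a single odd measurement in one tissue is not enough to flag someone. The individual has to look unusual in most of the tissues sampled, which is the cross-tissue signal the paragraph above describes.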
GeneCentrix was pleased to find kindred spirits in these scientists. Our interest in integrating tissue-specific molecular data with existing approaches to advance drug discovery is long-standing. The GeneCentrix gene profiler takes the method a step further: with point-and-click mapping of any gene across tissues, it quickly provides information on this important new feature.