Youssef Idaghdour, researcher and Assistant Professor of Biology at New York University Abu Dhabi.
Delving deep into a public health imperative
A New York University Abu Dhabi research team employs advanced analytic tools to better understand the interaction of genomes and environment
New York University Abu Dhabi
|Challenge|Unravel the mysteries of extremely large data sets related to the transmission of malaria.|
|Solution|JMP Genomics workflows and graphical features allow NYU Abu Dhabi researchers to more quickly and effectively make sense of their data.|
|Results|Discoveries are being made that could lead to the control, and perhaps eradication, of malaria.|
According to the Centers for Disease Control and Prevention, reports of what we now know as malaria can be traced back to 2700 B.C., when symptoms were described in the oldest known medical book, the Nei Ching. Hippocrates later wrote of the symptoms of an insect-borne disease that took a devastating toll on ancient Greece. And some 20 centuries later, Shakespeare described a similar malady in The Tempest, Act II, Scene II: “All the infections that the sun sucks up/From bogs, fens, flats, on Prosper fall and make him/By inch-meal a disease!”
Malaria has a long history indeed, and Youssef Idaghdour, a researcher and Assistant Professor of Biology at New York University Abu Dhabi, aims to help bring it to a halt. His research broadly concerns how genomes and the environment interact, and he and his team are currently studying malaria.
Malaria is caused by a parasite carried by a particular type of mosquito. When an infected mosquito bites a human, the result can be anemia, high fever, shaking chills and flu-like illness. If untreated, malaria can be fatal. According to the World Health Organization (WHO), the disease caused an estimated 438,000 deaths globally in 2015; more than two-thirds were children under the age of five. Africa is the continent hardest hit by malaria, and it’s there that Idaghdour now focuses his research.
This research involves the study of the genetic basis of complex traits and their relevance to human health. “We know that complex traits are not a product of just genes or the environment, but both, and that the effects can be interactive,” Idaghdour explains.
Workflows expedite insights into extremely large data sets
Malaria is an especially complex disease to study, requiring that researchers like Idaghdour account for parasite, mosquito and human genomes, as well as the environment.
The NYU Abu Dhabi team is collecting blood samples from 150 children in Burkina Faso before and during infection and after treatment. The researchers gather everything required to conduct a robust statistical analysis: each child’s age, where the child lives, how many parasites are present in the blood and more. From each sample, the team collects DNA and RNA. Then comes the genomic profiling of not only the human DNA and RNA but that of the parasite as well.
“We look at the entire genome,” Idaghdour says. “So just imagine how much data is involved. We’ve generated whole genome sequencing data from which we focus on around 10 million DNA variants per individual in addition to several tens of thousands of data points corresponding to expression levels of messenger RNA of both human and parasite.” In addition to DNA variants and messenger RNA, there are also several thousand microRNA and 866,000 epigenetic markers.
JMP® Genomics helps cut to the chase. Integrating seamlessly with the laboratory’s dedicated SAS server, JMP Genomics boosts the team’s statistical discovery capabilities with workflows specially tailored to genomic research. With SAS and JMP Genomics integration, Idaghdour and his colleagues combine advanced genetic and gene expression analyses with the robust foundation for statistical testing that is characteristic of SAS solutions.
Pairwise relatedness matrix generated using whole genome genotyping data. The matrix is used as a random effect in a Q-K mixed model to account for relatedness in association mapping with various clinical phenotypes.
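For readers unfamiliar with the idea, a pairwise relatedness matrix can be sketched in a few lines. The example below is only an illustration, not the method JMP Genomics uses: it assumes genotypes coded as 0, 1 or 2 copies of an allele and uses a simple identity-by-state similarity as a stand-in for relatedness. The function names and sample IDs are hypothetical.

```python
def ibs_similarity(g1, g2):
    """Identity-by-state similarity between two genotype vectors coded 0/1/2.

    Returns 1.0 for identical genotypes and 0.0 when every marker differs
    by two allele copies.
    """
    total_diff = sum(abs(a - b) for a, b in zip(g1, g2))
    return 1.0 - total_diff / (2.0 * len(g1))


def relatedness_matrix(genotypes):
    """Build a symmetric matrix of pairwise similarities.

    genotypes: dict mapping a sample ID to its list of 0/1/2 genotype codes.
    Returns the sample IDs (row/column order) and the matrix as nested lists.
    """
    ids = list(genotypes)
    matrix = [
        [ibs_similarity(genotypes[i], genotypes[j]) for j in ids]
        for i in ids
    ]
    return ids, matrix


# Made-up genotypes for three hypothetical children at four markers.
genotypes = {
    "child_A": [0, 1, 2, 1],
    "child_B": [0, 1, 2, 1],  # identical to child_A at these markers
    "child_C": [2, 1, 0, 1],
}
ids, matrix = relatedness_matrix(genotypes)
for sample_id, row in zip(ids, matrix):
    print(sample_id, row)
```

In a real study the matrix would be computed from millions of markers, and a mixed model would use it to describe how strongly the random effects of any two individuals are expected to covary.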
Creating workflows to meet specific needs
The complexity of the analytics – with data from multiple genomic sources and the interaction of the environment – is such that “we really need robust statistical pipelines,” Idaghdour says. “JMP Genomics provides several validated and tested workflows with robust statistical features and methods. JMP Genomics and SAS together really speed up the process.”
All the data is run through quality control procedures in JMP Genomics. For expression and epigenetic data, these include normalization, distribution analysis and principal component analysis. QC is followed by supervised analysis, primarily using linear mixed models. For genotyping data, the team runs marker properties followed by association mapping and eQTL analysis.
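The normalization step can be illustrated with a small sketch. This is not the JMP Genomics procedure; it assumes one common, simple approach (log2 transformation followed by centering each sample on its own median), and the function name and sample values are invented for the example.

```python
import math
from statistics import median


def log2_median_center(expression):
    """Log2-transform each sample, then subtract that sample's median.

    expression: dict mapping a sample ID to a list of raw, strictly
    positive expression values (one value per transcript).
    """
    normalized = {}
    for sample_id, values in expression.items():
        logged = [math.log2(v) for v in values]
        center = median(logged)
        normalized[sample_id] = [x - center for x in logged]
    return normalized


# Two made-up samples with the same expression profile; the second one
# simply has twice the overall signal, as might happen with more input RNA.
raw = {
    "sample_1": [8.0, 16.0, 32.0, 64.0, 128.0],
    "sample_2": [16.0, 32.0, 64.0, 128.0, 256.0],
}
norm = log2_median_center(raw)
for sample_id, values in norm.items():
    print(sample_id, values)
```

After centering, the two samples have identical profiles, which is the point of normalization: differences in overall signal are removed so that later analyses, such as principal component analysis or mixed models, are more likely to reflect biology than technical variation.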
“The real beauty of it,” Idaghdour continues, is that he and his colleagues are able to create workflows specific to their own needs. “We can then share those with our collaborators and our students – it’s a very neat way in terms of allowing us to adapt the tool to meet our needs.”
Furthermore, “we’re really in love with the graphical features provided in JMP Genomics.” Idaghdour’s research entails both exploratory and analytical phases, and graphical options ease the movement between the two.
“Given the amount of data we’re analyzing, it’s easy to miss interesting hidden patterns in the data,” Idaghdour says. “We like to start our investigations by using unsupervised or exploratory analysis.” JMP Genomics then allows them, for example, to take the graphical outputs of this analysis and select groups of genes or markers for focused analyses. It’s straightforward and efficient, given that the tables and graphs are linked and features can be selected, highlighted and put into subsets for further analysis.
“For example, I might want to examine how both human and parasite gene expression levels change over time. It’s a complex thing to do because we have tens of thousands of human transcripts and more than 5,000 in the parasite.” JMP Genomics graphical features allow the team to interactively visualize the dynamics of these tens of thousands of genes.
The team uses the JMP Genomics Cross Correlation feature with data from both the human host and the parasite. “Cross Correlation is very useful to us,” Idaghdour explains, “because of the various data sets we generate from the same individuals. It allows us to test for correlations between gene expression and epigenetic data as well as other quantitative data.”
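The underlying idea can be sketched as a table of correlations between every quantitative trait and every gene expression trait measured in the same individuals. The example below is a minimal illustration using Pearson correlation, which is an assumption rather than a description of what the Cross Correlation feature computes; the trait names, gene names and values are made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cross_correlation(traits, expression):
    """Correlate every quantitative trait (rows) with every gene (columns).

    Both arguments map a name to a list of values, with one value per
    individual and individuals in the same order throughout.
    """
    return {
        trait: {gene: pearson(t_values, g_values)
                for gene, g_values in expression.items()}
        for trait, t_values in traits.items()
    }


# Hypothetical measurements on four individuals.
traits = {"parasite_count": [2.0, 4.0, 6.0, 8.0]}
expression = {
    "gene_up": [1.0, 2.0, 3.0, 4.0],    # rises with parasite count
    "gene_down": [4.0, 3.0, 2.0, 1.0],  # falls with parasite count
}
table = cross_correlation(traits, expression)
print(table)
```

With tens of thousands of expression traits, a table like this is what a heatmap summarizes: one row per quantitative trait, one column per gene, and a color for each correlation.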
Because the children the team is testing are from two villages with high levels of relatedness, the researchers use Q-K mixed model analysis, which accounts for that relatedness, when testing for associations with various quantitative traits in the data. They also use K-Means clustering to detect consistent patterns of change over time in both the host and parasite.
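To show how clustering can surface patterns of change over time, here is a minimal K-Means sketch. It is an illustration only and not the JMP Genomics implementation: it assumes each gene is described by its change in expression at three time points (before infection, during infection, after treatment), and it uses a deliberately naive initialization that takes the first k profiles as starting centroids. All names and values are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(profiles, k, max_iterations=20):
    """Assign each named profile to one of k clusters.

    profiles: dict mapping a gene name to its list of values over time.
    Returns a dict mapping each gene name to a cluster index.
    """
    names = list(profiles)
    # Naive, deterministic start: the first k profiles are the centroids.
    centroids = [list(profiles[name]) for name in names[:k]]
    assignment = {}
    for _ in range(max_iterations):
        new_assignment = {
            name: min(range(k),
                      key=lambda c: squared_distance(profiles[name], centroids[c]))
            for name in names
        }
        if new_assignment == assignment:
            break  # no profile changed cluster, so the result is stable
        assignment = new_assignment
        for c in range(k):
            members = [profiles[n] for n in names if assignment[n] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Made-up profiles: expression change before, during and after infection.
profiles = {
    "gene_1": [0.0, 2.0, 0.1],    # switched on during infection
    "gene_2": [0.0, -2.0, -0.2],  # switched off during infection
    "gene_3": [0.1, 1.8, 0.0],
    "gene_4": [-0.1, -1.9, 0.0],
}
clusters = kmeans(profiles, k=2)
print(clusters)
```

Genes that rise during infection end up in one cluster and genes that fall end up in the other. Production tools typically use more careful initialization and many more time points and genes, but the grouping logic is the same.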
Heatmap visualizing the results of cross-correlations between over 15,000 gene expression traits (columns) and 13 quantitative traits (rows).
Making gains on a public health imperative
In 1955, the WHO envisioned a global eradication of malaria. Efforts were focused on house spraying, drug treatment and surveillance. The initiative faltered – due to, among other things, resistance to drugs and insecticides, population movements and a lack of funding – and was eventually terminated.
Most of the region that Idaghdour now studies wasn’t included in that effort. But today he has high hopes of helping to extinguish the disease there and beyond. He and his team strive to better understand, for example, how the parasite affects the host immune response and how responses change over time.
“If we can understand these processes,” Idaghdour explains, “we might understand what the main factors are that modulate the course of the infection. If we understand which genes are important to be turned on or off to limit the ability of the parasite to multiply, that’s going to be very important.”
They’ll examine, for example, whether children who are highly susceptible to infection might be carrying specific mutations that make them resistant to drugs. Those mutations can then be monitored and strategies can be devised to determine which drugs to administer. Or, “If we can identify those children who are able to mount an efficient immune response, that in itself is very useful information. Because then we can really dig deeply into, for example, which cell types within the immune system carry the antibodies that are efficient in neutralizing the parasite. And that can be the basis for strategies to develop vaccines, which is one of the main challenges in confronting malaria, because there is no efficient vaccine.
“There are so many questions here that are critical from a public health perspective. We’re looking to find the answers.”