GENALICE is a biomedical big data company founded in 2011. Its 20 employees develop ground-breaking software with the aim of achieving better diagnosis and treatment for patients with complex DNA diseases. Clients include clinics, hospitals, pharmaceutical companies, and genomics research institutions. The startup’s GENALICE MAP Next-Generation Sequencing (NGS) data analysis suite is a data-processing and data-analysis solution that uses smart, algorithm-based software to identify DNA changes. These are measurable genetic variations that might relate to certain biological conditions. This approach to analysis enables fast detection of DNA changes from large-scale samples, delivering higher-quality research with improved outcomes at a competitive price.
Population-scale analysis involves checking entire groups, or ‘cohorts,’ rather than just one sample.
“Doing this in a meaningful way, and enabling the exploration of variants across multiple samples, requires data processing on a huge scale,” says Jos Lunenberg, chief business officer at GENALICE. “Until recently, the biggest cohort size was around 500, but we wanted to go beyond this.”
To further its vision, GENALICE launched the Population Calling analysis module of its GENALICE MAP Suite. Initially the firm tried running the module on local servers, but the available infrastructure was unable to scale to the levels GENALICE needed.
“The aim with Population Calling was to perform analysis on 800 Alzheimer’s disease samples,” says Lunenberg. “We knew that analyzing them using our onsite servers would take too long, so we started looking for a cloud provider that could deliver reliable scale—and speed.”
“It had the best technology,” says Lunenberg of what led GENALICE to Amazon Web Services (AWS). “Three areas were essential to us: unlimited compute, scalable I/O, and infinite bandwidth between nodes and storage. We could see that AWS was the biggest player in the market, plus it was the first mover in the cloud, so we were confident about its depth of expertise. Combined, its market presence and technical excellence were compelling.”
GENALICE started redesigning Population Calling to run on Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3). And, to put its project in the public eye and showcase the power of its analysis module, it organized a live one-hour webinar on October 8, 2015. The aim of the event was to process the complete genomes—the DNA and genes that comprise an organism—of 800 Alzheimer’s disease patients in 60 minutes. The event was coordinated in scientific partnership with Mount Sinai Hospital in New York, which is one of the centers involved in the Alzheimer’s Disease Neuroimaging Initiative (ADNI) longitudinal study, aimed at the early detection of the disease.
“Two weeks before the demonstration, we got a disk from Mount Sinai Hospital with the 800 samples on it,” says Lunenberg. “We uploaded the data onto AWS in less than a day, and then started some scalability tests to check that Amazon S3 would be able to handle the number of requests.”
“With the Population Calling module running on AWS,” he continues, “we can process one sample on a single node in just six minutes. Through the ‘joint calling’ approach—which is our competitor’s method—it would take 34 hours.”
GENALICE moved on to the live demonstration with confidence that the technology wouldn’t let them down.
The event was a success. The demonstration ran 100 instances in parallel and processed four terabytes of data, made up of 800 complete human genome samples. Each sample was only five gigabytes in size and efficiently stored in the GENALICE Aligned Reads (GAR) file format.
“On that day we processed 800 samples on 100 nodes,” says Lunenberg. “Using our competitor’s method, this would have taken more than two full weeks. With AWS, we did it in 60 minutes.”
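The figures quoted above can be cross-checked with a short back-of-the-envelope calculation. The sketch below is illustrative only: the function names are hypothetical, and it assumes samples are spread evenly across nodes, that each node processes its samples one after another, that startup and data-transfer overhead are ignored, and that 1 TB = 1,000 GB.

```python
import math

# Figures taken from the article.
SAMPLES = 800            # complete genomes processed in the demonstration
NODES = 100              # instances running in parallel
GB_PER_SAMPLE = 5        # size of one sample in GAR format
MINUTES_PER_SAMPLE = 6   # time to process one sample on a single node


def total_data_tb(samples: int, gb_per_sample: float) -> float:
    """Total input size in terabytes (assumes 1 TB = 1,000 GB)."""
    return samples * gb_per_sample / 1000


def wall_clock_minutes(samples: int, nodes: int, minutes_per_sample: float) -> float:
    """Estimated run time, assuming an even split and sequential work per node."""
    samples_per_node = math.ceil(samples / nodes)
    return samples_per_node * minutes_per_sample


if __name__ == "__main__":
    print(total_data_tb(SAMPLES, GB_PER_SAMPLE))                    # 4.0
    print(wall_clock_minutes(SAMPLES, NODES, MINUTES_PER_SAMPLE))   # 48
```

Under these assumptions, each node handles eight samples for an estimated 48 minutes of processing, which fits within the 60-minute window reported for the demonstration, and the total input comes to the four terabytes cited.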
The scale and speed achieved in the demonstration were a first in the biomedical research field. “In our previous architecture, we hit a ceiling at 500 samples; we just didn’t have the memory to go any further,” Lunenberg says. “In the past, our competitor achieved a run of between 1,500 and 2,000 samples in about four months. With the Population Calling module of GENALICE MAP, this same sample should take little more than a couple of hours to process.”
He continues: “We’ve enjoyed working with AWS. Whenever we encountered an issue, there was information available to help us. This, combined with the quality of our in-house team, allowed us to successfully demonstrate our population-scale analysis capabilities.”
The cutting-edge technology has exciting implications. With GENALICE MAP on AWS, customers can perform large-scale studies cost-effectively. This opens up opportunities to smaller institutions, and even makes powerful biomedical NGS research accessible to individual researchers. “Our technology marks a breakthrough in the way we deal with DNA analysis,” says Lunenberg. “For scientists, it shifts attention toward research and away from data management and processing. We want to make our technology available to more than the happy few.”
As an example, he says that by offering GENALICE MAP through AWS, individual researchers can run their analysis on a limited number of nodes and have their answers available quickly and at low cost. They don’t have to worry about the infrastructure behind it, and can instead focus entirely on the differences in the DNA.
“In the long term, this will improve the quality of research, enable more efficient diagnoses of DNA-related diseases, and ultimately create better outcomes for patients,” Lunenberg says. “In the cloud, there are no boundaries to the compute resources available to us. This has made it much easier for us to achieve the goals of this project. Our aim is to extend our use of AWS and make our technology available on a self-service model, bringing population-scale analytics to as many researchers as possible.”
To learn more about genomics in the cloud, visit the AWS Genomics details page.