Supercomputing is the future of genomics research

Release date: 2017-02-10

Today, torrents of data are affecting scientists and researchers in genomics and the other life sciences in a profound way, for two reasons. First, as more and more data sources come online, researchers cannot keep up with the avalanche of collected data. Second, researchers lack the ability to compute over that data quickly and turn it into valuable scientific insight.

Genomics is currently at just such an inflection point: the cost of sequencing a human genome has fallen below $1,000 (compared with roughly $3 billion in 2003) and is expected to keep falling. As sequencing costs drop, genetic testing becomes more common and the corresponding genomic data keeps growing. A single person's genome "run" alone produces about 0.5 TB (1 TB = 1,024 GB) of raw image files, which are complex and consist of scattered, unstructured scientific data that is difficult to manage and analyze.

As sequencing technology advances, the challenge for researchers is how to manage and analyze these large, unstructured genomic data sets, which are typically generated in academic research, clinical trials, and pharmaceutical research around the world. Many organizations now need more advanced data analysis and management for drug discovery, genetic testing for disease, and the creation of personalized treatments in the clinic. Genome sequencing itself, however, is a complex multi-step process involving reading DNA sequences, assembling genomic sequences, analyzing variant regions, and resequencing.
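As a rough illustration of that multi-step process, the sketch below shows how such a pipeline is commonly orchestrated: align reads to a reference, sort and index the alignments, then call variants. The specific tools (bwa, samtools, GATK), file names, and thread count are assumptions chosen for illustration; the article does not name any particular software.

```python
# Minimal sketch of a multi-step genome-analysis pipeline (illustrative only).
# Assumes bwa, samtools, and GATK are installed and that the reference genome
# is already indexed; none of these specifics come from the article.
import subprocess

REF = "reference.fa"                            # hypothetical reference genome
R1, R2 = "sample_R1.fq.gz", "sample_R2.fq.gz"   # hypothetical paired-end reads
BAM = "sample.sorted.bam"
VCF = "sample.vcf.gz"
THREADS = 16                                    # arbitrary example value


def run(cmd: str) -> None:
    """Run one pipeline stage and stop the workflow if it fails."""
    print(f"[pipeline] {cmd}")
    subprocess.run(cmd, shell=True, check=True)


# 1. Align the reads to the reference and sort the output into a BAM file.
run(f"bwa mem -t {THREADS} {REF} {R1} {R2} | samtools sort -@ {THREADS} -o {BAM} -")

# 2. Index the sorted alignments so downstream tools can seek into them.
run(f"samtools index {BAM}")

# 3. Call variants against the reference (GATK4 HaplotypeCaller).
run(f"gatk HaplotypeCaller -R {REF} -I {BAM} -O {VCF}")
```

Real pipelines add quality control, duplicate marking, and resequencing steps on top of this, which is part of why the workloads are so data-intensive.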

The reality is that the technologies we have relied on over the past decade are not powerful enough to analyze these critical data. Such technologies are destined to be replaced, because as sequencing companies continue to innovate, the demand for data analysis is growing even faster, and the demand for genome sequencing itself keeps rising.

What technologists need to do, therefore, is apply sophisticated high-performance computing (HPC), or supercomputing, together with big data technologies so that genomic data can be managed and analyzed more conveniently and efficiently.

Big data in the truest sense

The pursuit of personalized medicine has produced explosive data growth, because doctors and researchers hope to use gene sequencing to deliver the best treatment for each patient based on how that patient's disease presents and how they tolerate drugs. At the same time, with growing funding for genomics research, genetic sequencing has become increasingly commercial, further driving the development of personalized medicine.

In one related case, Kaiser Permanente collected DNA samples, medical records, and other data from more than 210,000 patients across the United States and went on to create the world's largest and most comprehensive precision-medicine database. Building on it, researchers hope to find the specific genes behind various genetic diseases in order to improve the diagnosis, treatment, and prevention of those diseases in clinical practice.

Of course, to successfully make sense of these complex, scattered, unstructured scientific data, researchers need large-scale computing, high-speed data analysis, and flexible computing systems, but traditional computing systems cannot keep pace with growing data demands.

Fortunately, with the advent of modern supercomputing technology, research institutions can keep growing their data volumes and still extract valuable scientific insights from them.

Manage and share new data

To achieve major scientific breakthroughs in a data-intensive era, research teams need to analyze large data sets faster and more easily. In 2016, the Inova Translational Medicine Institute (ITMI), part of the well-known American health system Inova, purchased an HPC system that lets researchers use its genomic database to diagnose patients more accurately and quickly and to provide a higher level of treatment and care.

ITMI's system handles the data-intensive workloads of 25,000 genomes, and its researchers simplify data management by developing and running their own code. With this flexibility, ITMI's IT-management burden drops significantly while its research workflows gain capacity, letting the organization invest more resources in the more challenging areas of chronic disease.
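A quick back-of-the-envelope calculation makes the scale concrete: using the roughly 0.5 TB of raw data per genome run quoted earlier and ITMI's 25,000-genome workload, the raw data alone runs into the petabytes. Both input figures are from the article; the arithmetic below is only an illustration.

```python
# Back-of-the-envelope storage estimate using figures quoted in the article:
# ~0.5 TB of raw data per genome run and a 25,000-genome workload.
TB_PER_GENOME = 0.5
GENOMES = 25_000

raw_tb = TB_PER_GENOME * GENOMES   # 12,500 TB of raw data
raw_pb = raw_tb / 1024             # ~12.2 PB, using the article's 1 TB = 1,024 GB convention

print(f"Raw data for {GENOMES:,} genomes: {raw_tb:,.0f} TB (about {raw_pb:.1f} PB)")
```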

As research institutions process more and more data, future funding will also favor new supercomputing solutions that improve data management and accessibility. Specifically, these systems will deliver faster workflows and faster assembly and analysis, making researchers more efficient. HPC systems can query massive databases an order of magnitude faster or more, explore larger data sets, and run more data investigations at the same time.

Why data storage is critical

The biggest challenge in genomics research is that data sets often need to be stored, analyzed, and then stored again. For example, the US genetic testing company Human Longevity recently partnered with the pharmaceutical company AstraZeneca to sequence 500,000 DNA samples from clinical trials. By 2020 the program is expected to produce one million comprehensive health records combining genomic, molecular, and clinical data. That is a staggering amount of data, all of which must sit in external storage, be transferred over the network to the computer for analysis, and then be written back to external storage.
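The burden of that store-transfer-analyze-store cycle is easy to underestimate, because each data set crosses the network at least twice. The sketch below estimates the transfer time for one 0.5 TB genome run; the 0.5 TB size is the article's figure, while the 10 Gb/s link speed is purely an assumed example.

```python
# Rough estimate of the network cost of moving one genome run's raw data.
# The 0.5 TB size comes from the article; the 10 Gb/s link is an assumption.
DATA_TB = 0.5
LINK_GBPS = 10                         # assumed network bandwidth, gigabits per second

bits = DATA_TB * 1e12 * 8              # 0.5 TB expressed in bits (decimal TB)
seconds = bits / (LINK_GBPS * 1e9)     # ideal transfer time, ignoring protocol overhead

print(f"One {DATA_TB} TB run: ~{seconds:.0f} s (~{seconds / 60:.1f} min) per transfer, "
      f"and it moves at least twice: in for analysis and back out to storage.")
```

Multiplied across hundreds of thousands of samples, that data movement, rather than the computation itself, is often what overwhelms traditional infrastructure.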

This process places an incredible burden on traditional IT infrastructure. Most storage systems cannot withstand the stress of these workloads because they lack the scalability, durability, and longevity that today's biomedical applications require.

Modern supercomputing

Data from genomics research will continue to grow exponentially. As technologists deliver solutions at ever larger scales, data volumes that seemed unmanageable only a few years ago can now be managed and analyzed quickly and easily. Supercomputing systems have also become more affordable and less complex.

Supercomputers play multiple roles in genomics, from helping to compile research data and identify patterns within it, to annotating genetic sequences and building image models.

For research organizations, finding a modern HPC solution matters because such a system not only analyzes the data but also stores it so that other researchers can easily access it again. Storage systems from Silicon Graphics International (SGI) make it easy to integrate high-performance computing with data analysis.

Modern HPC systems provide a large-scale, storage-virtualized data management platform designed to handle the vast amounts of structured and unstructured content generated by life-science applications. In the race to collect, study, link, and analyze the key biomedical research data behind personalized healthcare, SGI offers research institutions and laboratories a shortcut to analysis and innovation.

Conclusion:

Genomics research will advance the identification of disease genes, accelerate the discovery of biomarkers, and give patients more targeted, personalized treatments. At the same time, genomics researchers face real challenges: they need to conduct new, high-quality studies that give clinicians a basis for personalized treatment and that tackle cancer and other diseases through genome sequencing and stem-cell research. The leading capabilities of HPC systems have already enabled some research institutions to make breakthroughs in the life sciences.

Author: Gabriel Broner
Translated and compiled by: Mu Yi
Source: Genetic Engineering & Biotechnology News
