DNA analysis by machine learning

Domain topic

Nano-Bio Physics

Supervisors

Motivation

Personalized medicine promises to be a revolution in the coming decades. Customizing drugs and treatments to a patient's genome can greatly improve treatment efficiency and save valuable time. It also enables tailored preventive care, in which genetic information is used to adapt lifestyles to avoid disease and to schedule frequent, targeted screening for early detection. For this, however, the genome of each patient has to be analyzed, which is extremely expensive and time-consuming. Routines and devices for cost-efficient, high-throughput DNA analysis are still lacking. In addition, current techniques require extensive biocomputational effort to reassemble the obtained information, which translates into errors, high costs, and long processing times.

DNA optical mapping allows rapid screening of DNA molecules one by one. With this method, the length and structure of a molecule can be read almost in real time, with a resolution in the kilobasepair range. DNA optical mapping has so far been used as a complement to sequencing, but it is emerging as a stand-alone methodology. It can be used, for example, to identify bacteria, determine antibiotic resistance, and read intact chromosomal DNA. The main challenge is to do this with high throughput, for which analyzing and handling the data produced is a crucial step.

Project Description

Here we propose to use machine learning for DNA optical mapping. In Lund, a student will implement and use machine learning to analyze stretched molecules in nanochannels and on surfaces, studied by fluorescence microscopy. In Hamburg, the method will be implemented on a laser-assisted DNA readout device. In a first stage, we will focus on DNA analysis by molecule length: the molecules will be analyzed and recognized in real time, and a feedback loop will be used to sort them into different microchannels according to their length. In a second stage, we will extend the method to read barcoded molecules. As a proof of concept, we will use it to find target DNA (for example, bacterial DNA that needs to be analyzed further) in a matrix of human DNA (which can be discarded). The device will separate the bacterial DNA from the rest so that it can be recovered and analyzed by sequencing.
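
To make the first stage concrete, the decision step of such a feedback loop might look roughly like the sketch below. It is only an illustration under stated assumptions, not the project's method: a fixed intensity threshold stands in for the machine-learning recognizer, and the function names, the pixel size, the threshold, and the length bins are all hypothetical.

```python
from bisect import bisect_right


def estimate_length_um(profile, pixel_size_um, threshold):
    """Length of the longest contiguous bright run in a 1D intensity profile.

    Assumption: the molecule is the longest run of pixels above `threshold`;
    a trained model would replace this simple rule in the actual project.
    """
    best = run = 0
    for value in profile:
        run = run + 1 if value > threshold else 0
        best = max(best, run)
    return best * pixel_size_um


def choose_outlet(length_um, bin_edges_um):
    """Index of the outlet microchannel for a molecule of the given length.

    `bin_edges_um` must be sorted; n edges define n + 1 outlets.
    """
    return bisect_right(bin_edges_um, length_um)


# Hypothetical intensity profile along a channel (arbitrary units).
profile = [0, 1, 9, 8, 9, 7, 1, 0, 0, 8, 0]
length = estimate_length_um(profile, pixel_size_um=0.5, threshold=5)
outlet = choose_outlet(length, bin_edges_um=[1.0, 3.0])
print(f"length = {length} um, outlet = {outlet}")
```

In a real device, the outlet index would drive whatever actuation steers the molecule into a microchannel, and both functions would have to run within the time the molecule takes to travel from the detection region to the junction.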

Methodological keywords

Hydrodynamics, optical systems, real-time feedback loops, machine learning, potential for marketing