On graphs and genomes: finding a path through the jungle of repetitive DNA

Jiří Macas

Biology Centre of the Czech Academy of Sciences, České Budějovice

December 2, 2021, 12:20 in S6


The genetic information encoded in the DNA molecules present in the cells of any organism, called the genome, varies enormously in overall length and complexity between species. Interestingly, the genome size of plants and animals is not proportional to the number of genes they carry or to the complexity of the organism. Instead, it is mainly determined by the differential accumulation of repetitive DNA, which consists of sequences that occur in the genome in anywhere from a few to millions of copies. Some of these sequences, such as mobile elements (transposons), encode their own enzymatic apparatus that facilitates their multiplication and "jumping" to new genomic sites, and were therefore previously considered genomic parasites. However, there is growing evidence that these repetitive elements can be beneficial to their host genome. Consequently, the genome is now viewed more as an ecosystem in which species (= different types of repeats) compete for resources and are linked by complex relationships ranging from parasitism to symbiosis. Most of these interactions and their implications for genome evolution and function remain to be elucidated. In this talk, I will present new bioinformatics approaches being developed in our research group to study repetitive DNA in plant genomes. We use high-throughput DNA sequencing to randomly sample many genomic regions and represent the sampled sequences as vertices of a virtual graph, with edges connecting sequences that are similar to each other. We then explore the topology of this graph to detect specific signatures of repetitive DNA sequences that can be extracted and further analyzed, as will be demonstrated with several examples. Finally, I will show how these methods help to elucidate the composition and evolution of plant genomes.
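
To make the graph representation concrete, here is a minimal Python sketch of the general idea. It is not the method used by the research group. Everything in it is an assumption made for illustration: the toy reads, the similarity criterion (a minimum number of shared k-mers, standing in for a real sequence alignment), the parameters `k` and `min_shared`, and the use of connected components as the simplest possible topological signature. Sequences sampled from a repeat share similarity with many other sampled sequences and so fall into one large connected group, whereas sequences from single-copy regions remain isolated vertices.

```python
from collections import defaultdict
from itertools import combinations


def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_similarity_graph(reads, k=4, min_shared=3):
    """Connect two reads by an edge when they share at least min_shared k-mers.

    Hypothetical similarity criterion, chosen only to keep the sketch short;
    a real analysis would rely on proper sequence alignment.
    """
    profiles = [kmers(r, k) for r in reads]
    graph = defaultdict(set)
    for i, j in combinations(range(len(reads)), 2):
        if len(profiles[i] & profiles[j]) >= min_shared:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def connected_components(graph, n_vertices):
    """Group vertices into connected components with an iterative depth-first search."""
    seen = set()
    components = []
    for start in range(n_vertices):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            v = stack.pop()
            component.append(v)
            for w in graph.get(v, ()):
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        components.append(sorted(component))
    return components


# Toy data (invented): reads 0-2 come from the same tandem repeat,
# reads 3 and 4 stand in for unrelated genomic regions.
reads = [
    "ACGTACGTAC",
    "CGTACGTACG",
    "GTACGTACGT",
    "TTGGCCAATT",
    "AGAGAGAGAG",
]
graph = build_similarity_graph(reads)
components = connected_components(graph, len(reads))
largest = max(components, key=len)
print("candidate repeat cluster (read indices):", largest)
```

In this toy input the three reads drawn from the same repeat end up in a single component, while the other two reads stay isolated. The all-against-all comparison is quadratic in the number of reads, so it only illustrates the principle; it would not scale to real sequencing data.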