Unexplored regions of the human genome finally unveiled

It took 20 years to conquer the 8% of the genome that remained unexplored, about as long as it took to sequence the first 92%. This colossal effort mobilized a hundred researchers around the world under the Telomere-to-Telomere (T2T) consortium. The results have just been published in no fewer than six scientific papers, all of which appeared in Science on April 1, 2022, in a special issue on the subject.

Dissect long repeated sequences

Why was this small 8% so difficult to sequence? Because it lies in chromosomal regions known for their particular structure. The telomeres (the regions at the very ends of the chromosome arms) and the centromere (in the center) are made of long repeated sequences. As their name suggests, these are regions of the DNA where a base (A, T, C or G) or a short sequence of bases repeats a large number of times. Earlier sequencing techniques could not produce pieces of DNA sequence, called reads, long enough to span them. To sequence DNA, you must first cut it into small pieces, then read the sequence of each piece and put it back in the right place in the genome, like a puzzle. The Human Genome Project published the first-ever human genome sequence in 2001, but some pieces of the puzzle were impossible to put back together.
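The puzzle problem can be illustrated with a minimal Python sketch. Everything in it is invented for illustration: the toy sequences, the repeat unit "GGAAT", and the helper names `make_genome` and `reads`. It assumes idealised, error-free reads and ignores how often each read is seen. Under those assumptions, two toy genomes that differ only in the number of repeat copies yield exactly the same set of short reads, so the short reads alone cannot say how long the repeat is, whereas reads long enough to span the whole repeat can.

```python
def make_genome(copies):
    """Toy genome: unique flank + tandem repeat + unique flank (invented sequences)."""
    return "ACGTTGCA" + "GGAAT" * copies + "TCAGCTAG"


def reads(genome, read_len):
    """Set of all distinct error-free reads of a fixed length (idealised sequencing)."""
    return {genome[i:i + read_len] for i in range(len(genome) - read_len + 1)}


genome_a = make_genome(10)  # 10 copies of the repeat unit, 66 bases in total
genome_b = make_genome(12)  # 12 copies of the repeat unit, 76 bases in total

# Reads shorter than the repeat array: both genomes give the same puzzle pieces.
short_reads_identical = reads(genome_a, 12) == reads(genome_b, 12)

# Reads long enough to span the array and its flanks: the pieces now differ.
long_reads_identical = reads(genome_a, 60) == reads(genome_b, 60)

print(short_reads_identical)  # True
print(long_reads_identical)   # False
```

Real assemblers also exploit read depth and have to cope with sequencing errors, so this is only a picture of the ambiguity, not of how any actual assembly software works.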

T2T did not sift through the DNA of a human being (a volunteer who agreed to let their genetic heritage serve as the reference) but that of a human cell in culture: a hydatidiform mole, to be precise. A mole is an anomaly of fertilization in which an ovum without genetic material is fertilized by a sperm. The resulting cell carries two identical copies of each chromosome (46 XX in 75% of cases and 46 XY in 25%). Despite this somewhat strange origin, there is nothing to suggest that the genome of a mole differs from that of other cells, says Megan Dennis, a scientist at the University of California and a member of T2T.

The last pieces of the puzzle

Advances in technology have made it possible to read in full the 3,054,815,472 base pairs of chromosomal DNA and the additional 16,569 of mitochondrial DNA. T2T scientists used the Oxford Nanopore Technologies sequencer, which passes the pieces of DNA to be read through a membrane riddled with tiny pores only 1.5 nanometers in diameter at their narrowest point. They were able to obtain DNA reads longer than 100,000 base pairs with very high precision, and therefore to cover the long repeated sequences that had been problematic until now. Sequencing data was also generated on the Pacific Biosciences platform, which is designed for long reads. The scientists were thus able to place the final pieces in the great genetic puzzle they were assembling.
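To give a sense of scale, the figures quoted above can be combined in a few lines of Python. This is a back-of-the-envelope sketch, not part of the T2T analysis: the variable names are invented, and it assumes every read is exactly 100,000 base pairs long and that reads tile the genome end to end with no overlap, which real sequencing never does (in practice each position is covered many times over).

```python
NUCLEAR_BP = 3_054_815_472   # chromosomal DNA, figure quoted in the article
MITO_BP = 16_569             # mitochondrial DNA, figure quoted in the article
READ_LEN = 100_000           # assumed uniform read length (illustrative only)

total_bp = NUCLEAR_BP + MITO_BP

# Ceiling division: the fewest non-overlapping reads that could cover everything.
min_reads = -(-total_bp // READ_LEN)

print(total_bp)   # 3054832041
print(min_reads)  # 30549
```

Even under this idealised assumption, tens of thousands of very long reads are needed to cover the genome once.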

Thanks to this, the human genome has been enriched by almost 200 million base pairs, 90% of which are located in the centromeres. Within them, scientists have identified 99 new genes that potentially code for proteins, plus 2,000 other candidates that have yet to be verified. Errors in the old reference genome have also been corrected. This version 2.0 of the human genome, T2T-CHM13, will surely be at the heart of many genetic discoveries in the future.

Scientists have already identified genetic variants in the centromere region; it remains to be seen whether these subtle differences contribute to the emergence of diseases. Other regions that are now accessible could reveal secrets of the evolution of the human species. Structurally, the analysis of centromeric sequences could help scientists understand why the centromere forms where it does and not elsewhere, when nothing in particular seems to guide it. In short, the range of possibilities seems endless.

Enrich the database

T2T will not stop its investigations here. The consortium has already joined the Human Pangenome Reference Consortium, which aims to sequence the DNA of 350 individuals. "Pangenomics is about capturing the diversity of the human population, and it's also about making sure we've captured the whole genome correctly," explains Benedict Paten, a co-author of the articles published in Science.

In the meantime, the T2T-CHM13 reference genome complements the previous reference genome (named GRCh38), which descends from the 2001 draft, and is available fully annotated on the UCSC Genome Browser, ready to be dissected in every way by scientists around the world.
