A group of scientists is reporting that they have the first gapless telomere-to-telomere sequence of a human chromosome. They used nanopore sequencing, which produces very long reads that avoid the problems caused by assembling many short reads. You can check out the paper here:
So these gaps that we currently have are just hard to sequence? Do we know what percentage of the total genome is gaps?
Some are hard to sequence, but most are just hard to assemble because of lots of short repeats. There's an estimate of the percentage of gaps in the cited article.
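Here is a toy Python sketch of the assembly problem described above. It rests on loudly stated assumptions: the reads are idealized (error-free, with one read starting at every position), and the flanking sequences, repeat counts, and the helpers `read_set` and `repeat_units_between` are hypothetical, made up for illustration. Two genomes that differ only in how many AT units sit between the same unique flanks give identical sets of 10-base reads. A single read that spans the whole repeat reveals the count.

```python
def read_set(genome: str, read_len: int) -> set[str]:
    """All distinct length-read_len substrings: a hypothetical, idealized,
    error-free run that samples every start position (real data is messier)."""
    return {genome[i:i + read_len] for i in range(len(genome) - read_len + 1)}


def repeat_units_between(read: str, left: str, right: str, unit: str = "AT") -> int:
    """Hypothetical helper: count repeat units between two unique flanks,
    given one read that contains both flanks."""
    start = read.index(left) + len(left)
    end = read.index(right, start)
    middle = read[start:end]
    if middle != unit * (len(middle) // len(unit)):
        raise ValueError("region between the flanks is not a clean repeat")
    return len(middle) // len(unit)


# Hypothetical toy genomes: same unique flanks, different number of "AT" units.
left, right = "GGCTTACG", "CCGATTGC"
genome_a = left + "AT" * 20 + right
genome_b = left + "AT" * 25 + right

# 10-base reads: the repeat is longer than a read, so no read touches both
# flanks, and the two genomes yield exactly the same set of reads.
print(read_set(genome_a, 10) == read_set(genome_b, 10))  # True

# A read long enough to cover both flanks (here, the whole toy genome)
# pins down the repeat count.
print(repeat_units_between(genome_a, left, right))  # 20
print(repeat_units_between(genome_b, left, right))  # 25
```

Real assemblers also use read depth and error models, so this only illustrates the core ambiguity, not how any particular tool works. It is also the situation that the long nanopore reads in the original post help with: a read that spans a repeat anchors both of its ends in unique sequence.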
Ah, so if you're doing it bit by bit, then it's hard to tell how many repeats might be in a given long stretch?
OK, I looked real quick but must have missed it.
Yes, and it’s hard to tell where a bit is in a long stretch of repeats or know where that bit is in relation to other, similar bits. Consider these two fragments: ATATATATATATAT and TATATATATATATAT. How would you fit them together, particularly considering that the full sequence might have hundreds of AT repeats? (This is an artificial example: repeats and sequenced fragments are much longer than that. But you get the idea.)
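Here is a minimal Python sketch of the two-fragment example. It assumes, hypothetically, that any exact suffix-prefix match counts as a valid overlap. Real assemblers require much longer minimum overlaps and tolerate sequencing errors. The helper `suffix_prefix_overlaps` is made up for illustration. If you try both orders, the two fragments fit together in 14 different ways, and each placement implies a different total length for the stretch.

```python
def suffix_prefix_overlaps(a: str, b: str) -> list[int]:
    """Lengths k >= 1 where the last k bases of a equal the first k bases of b."""
    return [k for k in range(1, min(len(a), len(b)) + 1) if a[-k:] == b[:k]]


f1 = "ATATATATATATAT"
f2 = "TATATATATATATAT"

# Total merged length implied by each way of overlapping the fragments,
# trying f1-then-f2 and f2-then-f1.
lengths = set()
for a, b in ((f1, f2), (f2, f1)):
    for k in suffix_prefix_overlaps(a, b):
        lengths.add(len(a) + len(b) - k)

print(suffix_prefix_overlaps(f1, f2))  # [1, 3, 5, 7, 9, 11, 13]
print(suffix_prefix_overlaps(f2, f1))  # [2, 4, 6, 8, 10, 12, 14]
print(sorted(lengths))  # every length from 15 to 28
```

Nothing in the two fragments themselves favors one placement over another. With hundreds of repeats and many reads drawn from inside them, the number of consistent arrangements only grows.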
Larry Moran has a good blog post on this.
Moran’s blog is also where I learned about the complete sequencing of the X chromosome.