Improved Human Genome Published

Patrick · November 20, 2018, 1:22am

AllenWitmerMiller · November 20, 2018, 1:51am

This surprised me from the article:

Specifically, the world’s reference genome was assembled from the nucleic acid sequences of a handful of anonymous volunteers. Other researchers later determined that 70 percent of the reference genome derives from a single individual who was half European and half African, and the rest derives from multiple individuals of European and Chinese descent, according to Salzberg.

My opinion means nothing but I would have thought that a genome database should include DNA from the San people (some of the oldest and most divergent Y-chromosome and mitochondrial DNA haplogroups.)

swamidass · November 20, 2018, 2:05am

@AJRoberts will love this.

This is a low level advance but fundamental for the field. The human reference genome is like the Kilogram for physics, and we need to get it right.

A good question to ponder: what percentage of the genome is 300 million bits? Those paying attention should have a very good estimate without reading the paper.

Patrick · November 20, 2018, 2:10am

About 10%?

swamidass · November 20, 2018, 2:19am

How did you compute that number?

Patrick · November 20, 2018, 2:29am

300 million out of 3 billion = 10%

glipsnort · November 20, 2018, 2:34am

Depends on whether they mean 300 million pieces of DNA or 300 million bits of information.

swamidass · November 20, 2018, 2:34am

Math error here. You forgot a unit conversion. In this context it is best to declare your assumptions too.

@glipsnort I didn’t even think of that meaning too. @AndyWalsh will be vectoring on that one.

glipsnort · November 20, 2018, 2:36am

Judging from the linked piece, they mean base-pairs.

Patrick · November 20, 2018, 2:39am

I agree. I think the author means "bits" as in portions of the genome, not the unit of information. It seems that the reference genome is missing about 10% of the human genome, and that the missing 10% could be filled in by looking at the genomes of Africans.

Here is the actual paper. It deals in base pairs and number of reads over the genome.

https://www.nature.com/articles/s41588-018-0273-y

Up to 300 million base pairs are missing from the reference genome.

AllenWitmerMiller · November 20, 2018, 3:14am

I may be getting confused but it seems to me that 3 billion base pairs requires 6 billion bits of computer storage because each base pair is one of four possibilities (ATGC), so that’s 2 bits for each.

Whatever the case, “bits of DNA” seems potentially confusing.
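To make the two readings of "300 million bits" concrete, here is a minimal sketch of the arithmetic discussed above. It assumes a round genome size of 3 billion base pairs and a plain 2-bit encoding per base (no compression, one strand only); the constant and function names are hypothetical, chosen only for illustration.

```python
# Assumptions (not from the paper): a round 3 billion bp genome and
# a naive 2 bits per base pair, since log2(4) = 2 for A, C, G, T.
GENOME_BP = 3_000_000_000
BITS_PER_BP = 2


def fraction_if_base_pairs(n_bp, genome_bp=GENOME_BP):
    """Fraction of the genome if the count means base pairs."""
    return n_bp / genome_bp


def fraction_if_information_bits(n_bits, genome_bp=GENOME_BP,
                                 bits_per_bp=BITS_PER_BP):
    """Fraction of the genome if the count means bits of information."""
    n_bp = n_bits / bits_per_bp  # unit conversion: bits -> base pairs
    return n_bp / genome_bp


genome_bits = GENOME_BP * BITS_PER_BP
as_bp = fraction_if_base_pairs(300_000_000)
as_bits = fraction_if_information_bits(300_000_000)

print(f"Whole genome at 2 bits per base pair: {genome_bits:,} bits")
print(f"300 million base pairs: {as_bp:.0%} of the genome")
print(f"300 million information bits: {as_bits:.0%} of the genome")
```

Under these assumptions the answer is about 10% if the figure means base pairs and about 5% if it means bits of information, which is why the unit matters.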

AndyWalsh · November 20, 2018, 3:16am

Perhaps it is a matter for CPAs and they are missing $37.5M in grant funding.

But the real question is: how much bigger would the human genome need to be before its entropy eclipses that of the Sun?

Patrick · November 20, 2018, 3:38am

The paper had nothing to do with “bits”. It was poor reporting by Science Digest. The paper has to do entirely with base pairs.

swamidass · November 20, 2018, 3:39am

Then this looks just about right:

Patrick · November 20, 2018, 3:45am

The paper is very good. It really shows how incomplete the "reference" genome is and how much diversity it lacks. By adding in diversity from African genomes and from other parts of the world, we can start doing analysis with all of the diversity that we have from being admixes of admixes of admixes.

AllenWitmerMiller · November 20, 2018, 3:47am

Meanwhile, I was under the impression that the human genome was something like 3.3 million base pairs. I’m not trying to be pedantic but if we are concerned with 10% out of the total number of base pairs, then that 0.3 million would seem quite significant.

Or did I remember incorrectly about the 3.3 million? (No, I haven’t read the article yet because Joshua’s challenge was “without reading the article” or whatever.)

glipsnort · November 20, 2018, 3:52am

Yeah, well, sorta. The problem is that there really is no such thing as “the human genome”. The reference genome is missing a bunch of bits that some humans have.

Heh – this takes me back. I may have been the first person to make that estimate, or at least the first person known to those running the project. Eric Lander asked me to figure out what the ancestry of the primary donor to the reference genome was. According to an email I sent him in 2002, I came up with two estimates (based on small sets of informative markers): (1) 60% African, 40% non-African, and (2) 55% African, 45% non-African. Others did a much better job later, of course.

AllenWitmerMiller · November 20, 2018, 3:56am

This is the kind of fascinating, first-hand anecdote which brings me daily to the Peaceful Science forum. (One sure doesn’t find this kind of post on the AIG Facebook page.)

Collectively, the PS posters have had a great many interesting experiences in life.

Patrick · November 20, 2018, 4:02am

The human genome is estimated to be over 3 billion base pairs, but there are still sequencing gaps in it. This paper found that as much as 300 million base pairs are not in the reference genome, which is more European than it should be. That will bias analyses that use the reference genome. What is needed is many reference genomes spanning time and location.

swamidass · November 20, 2018, 4:07am

I'm not sure about this. That is not how the reference genome is used.
