I was forwarded this article by a thoughtful Christian scholar. I would like a level-headed and kind assessment of the science and arguments in this article, so that this scholar can understand how scientists assess work like this.
Well, to start off, the initial assumption is that what is essential today must have been essential from the beginning: that a protein that performs function X must always have performed function X, and that no replacement of old genes by new genes relevant to function X can ever happen. This is of course untrue. To take a simple morphological example, consider the mammalian dentary-squamosal jaw joint. Can we agree that a mammal lacking that joint cannot survive, since it would be unable to eat? Can we also agree that a bird’s articular-quadrate jaw joint is essential to the bird? Yet these joints are not homologous. By Tan’s reasoning, it has just been proven that mammals and birds are not part of the same tree. And while I suspect Tan would agree that they aren’t, he should also agree that the reasoning I just used to arrive at that conclusion is bogus. We actually have fossils of clear intermediates, showing that this particular transition is not impossible. Sadly, molecules tend not to leave fossils, so we can’t use that kind of evidence to show that his molecular claims are bogus; still, we can show by analogy that his claims rely on a faulty assumption.
That’s why anti-evolutionists focus so much of biochem
A random fact-check from the article, as it immediately didn’t sound right to me (my emphasis):
Demuth and colleagues reported 870 primate protein families (with 689 human unique genes) that do not have homologs in rodents and 1773 rodent protein families that do not have homologs in primates (Demuth et al. 2006).
Turns out, if you look at Table 2 of Demuth et al. (2006), you see that these numbers are completely wrong. They report 870 genes in 453 gene families that have expanded in primates, and 1773 genes in 514 gene families that have expanded in rodents. In other words, 453 primate gene families gained an average of 1.92 genes each relative to other lineages, and 514 rodent gene families gained an average of 3.45 genes each relative to other lineages.
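For anyone who wants to check the arithmetic, the per-family averages follow directly from the Table 2 figures quoted above. A minimal Python sketch (the variable and function names are mine, purely for illustration, and not anything from Demuth et al.):

```python
# Figures as quoted above from Table 2 of Demuth et al. (2006).
# All names here are illustrative and do not come from the paper.
expansions = {
    "primates": {"genes_gained": 870, "families": 453},
    "rodents": {"genes_gained": 1773, "families": 514},
}


def mean_gain_per_family(genes_gained: int, families: int) -> float:
    """Average number of genes gained per expanded gene family."""
    return genes_gained / families


averages = {
    lineage: round(mean_gain_per_family(**counts), 2)
    for lineage, counts in expansions.items()
}
print(averages)
```

The point is only that 870 and 1773 are counts of genes gained within expanding families, so dividing by the family counts gives a sensible average, which would not be the case if they were counts of lineage-specific families.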
Nothing to do with “lineage-specific protein families” at all. Basic errors in citing other results don’t exactly bode well for this article. How many other citations are completely botched, I wonder?
There’s a lot to address in that article, so I thought I would start with the main premises in the abstract.
There are also genes that are not essential for survival but beneficial to survival. The author needs to consider DNA sequence under selective pressure if this is going to be a proper model of evolution, and this would include non-essential beneficial alleles/genes.
Excluding rare mutational events is a mistake, since you only need extremely rare events to produce lots of novel genes over the course of life’s history. The author also claims that no novel genes have been observed to emerge, but there is plenty of evidence demonstrating how they emerge, such as:
The author slyly conflates functional arrangements with specific protein domains. Those are not the same thing. A sequence may not form a specific protein domain, but it can still be functional. I don’t know of any studies that have estimated the functional space in amino acid sequences. The best they have done is look for specific functions, but none have been able to look for function in general.
I can agree with this one. However, genes that are used may not be essential, so we need to be careful of confusing essential with used.
Due to interaction between mutations within a protein, neutral mutations may open up more potential beneficial mutations. Also, disruptive mutations will be selected against and can be removed from the population. There are also mutations that can compensate for deleterious mutations due to interactions between amino acids in the protein. These are called epistatic effects, and need to be considered as part of the author’s model.
Since we have evidence of how novel genes emerge, this conclusion is easily refuted.
The argument seems to be that different groups of organisms that possess distinctive sets of taxonomically-restricted essential genes cannot share a common ancestry. It relies heavily on the proposition that new genes cannot emerge. Since this latter assertion has been eviscerated, laid to rest as it were, here on Peaceful Science, it seems as if Tan’s thesis is fatally flawed.
Yes. The claim is rather vacuous. Essential genes are indispensable for survival. Yes, by definition. That’s what it means to call them essential. And of course there are plenty of genes that aren’t essential. And further still, the fact that some gene is essential says nothing about whether it always was. Or even what it is that makes it essential right now. Conditional essentiality is a thing. Most bacteria can live without antibiotic resistance in the presence of antibiotic provided the concentration is low enough. But there’s a point at which the concentration becomes so high that antibiotic resistance becomes essential to survive.
It should be obvious that there are probably many genes that are conditionally essential in the same way. If the environmental circumstances were changed, some essential genes would stop being essential.
This is in large part how Craig Venter’s team have been able to get their synthetic genomes so small, by manipulating the environment to remove challenges and provide many essential nutrients in a form that allows the organism to live and reproduce without having to synthesize a lot of components required for growth and reproduction.
Guerzoni and McLysaght 2011 doesn’t say anything about human genes with no chimp homologs either. It refers to the absence of protein-coding homologs.
Could you expand on this, for those of us watching from the gallery?
It probably bears commenting on the overall structure of the paper in addition to the technical details. Tan has a hypothesis about the relatedness of living things, which as I understand it is that organisms can only be related if the essential genes of one are a superset of the essential genes of the other. That’s an intuitive hypothesis especially if one approaches biology from an essentialist perspective; one can see the appeal. Tan then makes a rhetorical case for this hypothesis based on a review of existing literature. That’s perfectly reasonable and typical of scientific papers. Others have (justifiably, in my view) raised questions about the interpretation of that literature and the accuracy of summaries and conclusions. But those issues aside, it is appropriate to ground one’s hypotheses in what has been studied previously.
At this point in the paper, one would expect a description of a method to test or explore that hypothesis. That might be a set of experiments or a mathematical model or a statistical test. Then would follow the results of the test or exploration, and a discussion of whether those results are consistent with the hypothesis. Instead, as I read the paper, Tan essentially takes her rhetorical argument as conclusive and proceeds to ask the question of whether the essential genes of actual organisms exhibit strict subset/superset relationships. Since they do not, she concludes not all organisms are related by common descent, because under her hypothesis they cannot be with those patterns of essential genes. This is the extent of the actual work being reported–gathering data to see what organisms can possibly be related via common descent under Tan’s model. Such an effort is not trivial, but neither is it the sort of work that would be considered original research to publish.
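To make the subset/superset check concrete for readers in the gallery, here is a rough Python sketch of what "strictly nested essential gene sets" would mean. The organisms and gene names are placeholders I made up, not real data, and this is my reading of the test rather than anything taken from the paper:

```python
from itertools import combinations

# Hypothetical essential-gene sets. Organism and gene names are
# placeholders for illustration only, not real data.
essential = {
    "organism_A": {"g1", "g2", "g3"},
    "organism_B": {"g1", "g2", "g3", "g4"},
    "organism_C": {"g1", "g2", "g5"},
}


def nested(a: set, b: set) -> bool:
    """True if one set contains the other (equal sets count as nested)."""
    return a <= b or b <= a


def non_nested_pairs(sets: dict) -> list:
    """Pairs of organisms whose essential-gene sets are not nested."""
    return [
        (x, y)
        for x, y in combinations(sorted(sets), 2)
        if not nested(sets[x], sets[y])
    ]


print(non_nested_pairs(essential))
```

Under the hypothesis as I understand it, every pair this flags would be declared unrelated. Note that the check says nothing about how often non-nested pairs would be expected under a single tree where essentiality can change over time, which is the missing quantitative step.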
I suppose one could frame the paper instead as putting forward a model–a forest of family trees–and making a prediction about what one would see under that model–non-nested sets of essential genes. Looking at it from that perspective, one would expect a quantitative assessment of whether actual observations are more likely under the proposed model than available alternatives. Different models can have overlapping predictions and thus be consistent with the same observations. At best what Tan presents is a qualitative assessment that non-nested sets of essential genes are unlikely under the single family tree model. But even if they are unlikely, we immediately have the question of whether the numbers that we see are consistent with that likelihood. Without an answer to that question, the conclusion of the paper is at best very weakly supported. (And even then, such support depends on the rhetorical case for the hypothesis being correct.)
They do indeed. Note that this part of the filter only works if the gene is fairly recently evolved. If a necessary gene evolved in the ancestral amniote, for example, all the non-coding (presumably junk) sequences would have changed beyond recognition, likely lost completely in fact.
I took the path ‘yes’ to unique genes and ‘no’ to essential unique genes. Is that the path you took?
Also, how does one handle the fork after answering ‘yes’ to ‘Are unique genes essential?’ Do you have to take both paths? What do you do if you answer ‘yes’ to both subsequent questions (or ‘no’ to both)?
I’ll be kind.
The paper has major flaws. It purports to show that life is a forest rather than a single tree, but AFAICT does so by assuming that new genes cannot arise and later, due to changing environment (including changes to other genes), become essential; and therefore any essential genes that exist only in a limited taxonomic branch must have always existed.
One particularly bad flaw is this:
For a 153 amino acid long ß-lactamase domain, a typical protein domain with α helixes, ß sheets, and loops, the possibility of finding a polypeptide that functions is one in 10^77 (Axe 2004). The human genome contains 23,000~30,000 protein coding genes, with a median length of 375 amino acids (Brocchieri and Karlin 2005; Wijaya et al. 2013). If we scale according to the length of the protein, the possibility of a 375 amino acid long polypeptide functions as a natural protein is one in 10^189 (= 10^(77 × 375/153)).
First, that isn’t the calculation for a polypeptide functioning as a natural protein, it’s the probability that a polypeptide functions as a lactamase. Second, why scale to the length of the protein? There’s absolutely no reason given as to why longer proteins should be less likely to be functional, and in fact the opposite is expected since longer/larger proteins have more scope for containing binding sites.
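For reference, the quoted 10^189 figure comes from nothing more than multiplying the exponent by the ratio of the two lengths. A minimal Python sketch of that step, using only the numbers in the quote (the variable names are mine):

```python
# Figures as quoted from the paper: Axe's (2004) estimate for a
# 153-residue domain, and a median human protein length of 375 residues.
axe_exponent = 77      # "one in 10^77"
domain_length = 153    # residues in the beta-lactamase domain
median_length = 375    # median human protein length, in residues

# The paper's assumption (stated there without justification): the
# exponent grows linearly with sequence length.
scaled_exponent = axe_exponent * median_length / domain_length
print(round(scaled_exponent, 1))
```

This reproduces the paper's number when rounded to the nearest integer, but the whole result rests on the linear-scaling assumption in the last step, which is exactly the part that has no stated support.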
We see here a negative example of the value of peer review. If this paper had been under any sort of real peer review it would not have been published, at least without radical revision. Elementary errors of fact abound.
I would say yes. I don’t know of any essential genes in humans that are not shared by other great apes. There are orphan genes specific to humans, but they aren’t essential. There are also plenty of human orphan genes that have orthologous DNA in other great ape species.
The problem for Tan is that there are plenty of papers outlining the mechanisms responsible for producing novel genes. Tan’s claim that evolutionary mechanisms cannot produce novel genes runs counter to the literature. If nothing else, Tan should have focused on reasons why the existing literature is wrong.
It is even further flawed by the fact that scientists can find a functioning lactamase in a pool of 10^9 antibodies that have a randomly assembled variable region.
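To put rough numbers on that mismatch: if one takes the 1 in 10^77 figure at face value as the chance that a random sequence has lactamase activity (a big assumption, and antibody variable regions are not uniformly random polypeptides either), the expected number of hits in a library of 10^9 is easy to work out. A back-of-the-envelope Python sketch, with names of my own choosing:

```python
# Order-of-magnitude illustration only. It assumes, for the sake of
# argument, that the 1-in-10^77 figure applies to each library member.
library_size_log10 = 9        # 10^9 antibodies screened
hit_probability_log10 = -77   # "one in 10^77" per sequence

# Expected hits = library size * per-sequence probability,
# computed in log10 to keep the numbers readable.
expected_hits_log10 = library_size_log10 + hit_probability_log10
print(f"expected hits ~ 10^{expected_hits_log10}")
```

Finding even one functional lactamase in such a pool would be wildly improbable if the figure applied, which suggests the figure does not measure what the paper uses it for.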
Hi @Roy. A couple of points:
First, beta-lactamase is a natural enzyme (in the sense that it occurs in nature). Second, the matter of scaling “probabilities” to polypeptide length is something one can probably find in the literature, and not just in the ID-friendly literature. It’s not appropriate, as the many discussions here at Peaceful Science have shown. But it is a common enough practice that I would not be so harsh with Dr. Tan.
I am wondering - any chance that @Paul_Nelson can join this thread and bring us up to date about his orfan project? I can see how the work he has alluded to may involve lines of reasoning similar to what Tan presents in the paper we are discussing. Other than more data, are there any new approaches being incorporated into the work? Novel ways to estimate essentiality? Better ways to describe functional sequence space? Tools that can identify the “non-coding” to “coding” path on a genome-wide scale?