The Origin of Life: Can Science Show Intelligence Was Required?


#1

Hi everyone,

As many of you are aware, Dr. James Tour, one of the world’s top chemists, has forcefully criticized scientific models purporting to explain the origin of life. And yet, in an article titled “Origin of Life, Intelligent Design, Evolution, Creation and Faith,” he writes:

I do not know how to use science to prove intelligent design although some others might. I am sympathetic to the arguments and I find some of them intriguing, but I prefer to be free of that intelligent design label. As a modern-day scientist, I do not know how to prove intelligent design using my most sophisticated analytical tools— the canonical tools are, by their own admission, inadequate to answer the intelligent design question.

Over at Evolution News and Views, Dr. Brian Miller, who is a physicist, has argued that intelligent agency was required to coordinate the steps leading to the origin of life. He wrote a series of four articles back in June 2017:

Thermodynamics of the Origin of Life
The Origin of Life, Self-Organization, and Information
Free Energy and the Origin of Life: Natural Engines to the Rescue
Origin of Life and Information — Some Common Myths

I was not satisfied with some of his arguments, so I responded in a post of my own, over at The Skeptical Zone:

Recycling bad arguments: ENV on the origin of life

That was back in June. Dr. Miller has finally responded to my article, in a new series of posts over at ENV:

The Origin of Life: Correcting Common Mistakes on Thermodynamics
The Origin of Life: Dangers of Taking Research Claims at Face Value
The Origin of Life: The Information Challenge

I have to say that Dr. Miller has put up a strong defense of his views. A few brief quotes will serve to convey the tenor of his argument:

The bottom line is that life has higher energy and lower entropy by any definition than the molecules from which it sprang. Therefore, nature would always resist its spontaneous formation. The fact that the building blocks have to be arranged in a highly specific order could simply be added to the entropy as a configurational part. Alternatively, the probability of them coming together properly could be thought of as a separate probabilistic challenge in addition to the entropy challenge. Either way, the entropy/configurational barrier is insurmountable.

All … papers that propose solutions to the thermodynamic challenges [to the origin of life] use the same approach. They ignore nearly all practical challenges and completely disassociate their work from realistic experiments. And they assume the existence of an unlimited source of energy, an efficient energy converter (engine), and information. However, the converter and the required information must already exist before the converter could be created. The only explanation for the sudden appearance of such molecular machinery and the information is intelligence.

No chiral building block of life (e.g. right-handed ribose) has been shown to interact with any substance to self-replicate. On the contrary, in all realistic environments mixtures with a bias of one enantiomer tend toward mixtures of equal percentages of both left-handed and right-handed versions. Goldenfeld “solved” the homochirality problem by creating an artificial world that eliminated all real-world obstacles. All simulations that purport to be breakthroughs in origins problems follow this same pattern. Conditions are created that remove the numerous practical challenges, and the underlying models are biased toward achieving the desired results…

In the same way letters combine to form meaningful sentences, the amino acids in proteins form sequences that cause chains to fold into specific 3D shapes which achieve such functional goals as forming the machinery of a cell or driving chemical reactions. And sentences combine to form a book in the same way multiple proteins work in concert to form the highly integrated cellular structures and to maintain the cellular metabolism. The comparison is nearly exact.

The most essential early enzymes would have needed to connect the breakdown of some high-energy molecule such as ATP with a metabolic reaction which moves energetically uphill. One experiment examined the likelihood of a random amino acid sequence binding to ATP, and results indicated that the chance was on the order of one in a trillion. Already, the odds against finding such a functional sequence on the early Earth is straining credibility. However, a useful protein would have required at least one other binding site, which alone squares the improbability, and an active site which properly oriented target molecules and created the right chemical environment to drive and interconnect two reactions — the breakdown of ATP and the target metabolic one. The odds of a random sequence stumbling on such an enzyme would have to have been far less than 1 in a trillion trillion, clearly beyond the reach of chance.

Venema was referencing the research by Michael Yarus, but he misinterpreted it. Yarus states that no direct physical connection exists between individual amino acids and individual codons. He instead argues for correlations in chains of nucleotides (aptamers) between amino acids and codons residing where the latter binds to the former. However, Koonin argued that correlations only existed for a handful of amino acids, and they were the least likely ones to have formed on the early Earth.

[N]o physical explanation exists for the encoding of amino acid sequences into codons, nor can the decoding process either be explained or directly linked to the encoding process. Such a linkage is crucial since the encoding and decoding must use the same code. However, without any physical connection, the code must have preexisted the cell particularly since both processes would have had to have been instantiated around the same time. The only place a code can exist outside of physical space is in a mind.

I’d like to start the ball rolling by making a few quick comments of my own, and inviting others to join in:

(1) Dr. Miller’s claim that “the entropy/configurational barrier [to the origin of life] is insurmountable” strikes me as a very strong one, as it would apply equally well to any chemical precursor of life which has higher energy and lower entropy by any definition than the molecules from which it sprang.

(2) Dr. Miller states that all solutions that have been proposed to thermodynamic challenges to the origin of life “assume the existence of an unlimited source of energy, an efficient energy converter (engine), and information.” What kind of information is he talking about here?

(3) Dr. Miller’s criticisms of proposed solutions to the origin of homochirality are substantive, and merit a serious response.

(4) I continue to stoutly maintain, contra Miller, that sequences of amino acids in life contain functional but not semantic information. He quotes from a paper by Shen and Tuszynski, claiming that the semantic structure of a protein sequence is similar to a language structure which goes from “letters” to “words,” then to “sentences,” to “chapters,” “books,” and finally to a “language library.” But the passage he quotes undermines his claim. To be sure, the 20 common amino acids in proteins can be likened to the 26 letters of the English alphabet. But Shen and Tuszynski jump from letters to whole sentences or paragraphs, which are held to be the equivalent of protein sequences. There is nothing in between corresponding to words, and hence nothing remotely like English grammar. That makes the comparison invalid, in my book.

(5) As I understand it, the probabilistic resources available to evolution have been estimated at 10^42 [apparently Dryden, Thomson & White argue that up to 4×10^43 different amino acid sequences could have been explored since the origin of life]. So when Dr. Miller writes, “The odds of a random sequence stumbling on such an enzyme would have to have been far less than 1 in a trillion trillion, clearly beyond the reach of chance,” he is factually mistaken, as one trillion trillion is “only” 10^24.
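As a quick sanity check on these magnitudes, here is the arithmetic in Python. Both figures are simply the estimates quoted above (Dryden, Thomson & White’s upper bound, and “one trillion trillion”), not independently verified values:

```python
# Compare the odds Dr. Miller cites against the estimated probabilistic
# resources. Figures are the post's own estimates, taken at face value.

trials_available = 4e43   # upper bound on amino acid sequences explored since life began
odds_against = 1e24       # "one trillion trillion" = 10^24

# Expected number of hits if each trial succeeds with probability 1/odds_against
expected_hits = trials_available / odds_against
print(f"{expected_hits:.1e}")   # ~4.0e+19: far from "beyond the reach of chance"
```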

(6) I can’t comment on Dr. Miller’s claim that Professor Dennis Venema misinterpreted the research conducted by Michael Yarus. Would anyone care to comment? The point here is a crucial one, as it relates to whether we can properly speak of a genetic code, as such. Venema appears to think that this way of speaking is inaccurate, as it assumes that amino acids bind to their codon in a wholly arbitrary fashion. Miller also writes: “Koonin does reference the possibility of the evolution of the modern translation system being aided by chemical attractions between amino acids and pockets in tRNA. But he states that the sequences in those pockets would have been “arbitrary,” so they would not relate to the actual code. As a result, no physical explanation exists for the encoding of amino acid sequences into codons, nor can the decoding process either be explained or directly linked to the encoding process.” Is that a fair summary, in the opinion of readers?

(7) Astonishingly, Dr. Miller says nothing about evidence I put forward in my TSZ article against the standard genetic code having been designed: there are about 10^84 possible genetic codes. The one used by living things is in the top 1 in 100 million (or 1 in 10^8). That means that there are 10^76 possible genetic codes that are better than it. To make matters worse, it’s not even the best code in its local neighborhood. It’s not “on top of a hill,” as it were. Does that make sense, if it was designed?
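The arithmetic behind this point can be verified exactly with Python’s arbitrary-precision integers; both figures are the estimates from my TSZ article, taken at face value:

```python
# Count the genetic codes that would outperform the standard code,
# given the estimates above: ~10^84 possible codes, with the standard
# code ranking only in the top 1 in 10^8.

total_codes = 10**84          # estimated number of possible genetic codes
rank_fraction_denom = 10**8   # standard code is in the top 1 in 10^8

better_codes = total_codes // rank_fraction_denom
print(better_codes == 10**76)   # True: ~10^76 codes would still be better
```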

Just a few thoughts. What do other readers think?


#2

I think you did some serious work here. It will take me some time to think about it and read before I respond. Thanks, and I’ll write more later.


#3

Some good news: Brian Miller from the DI is going to join us tonight on this thread. I also hope @herman joins in, as the “thermodynamic argument against the origin of life” is closely related to The Semiotic Argument Against Naturalism.

There is quite a bit of technical information here, and also some substantial disagreement. It will take a few days before I will be willing to weigh in strongly, as I want to do the background reading first. It is well known that I am not an ID proponent, for example, and have a tumultuous relationship with the DI.

With that in mind, before we get into our disagreements, I wanted to request that we spend some time delineating what our common ground actually is before jumping into the technical details. That will give people time to catch up with what is actually being debated here. The fact of the matter is that we do appear to agree on a lot.

Common Ground

With that in mind, let me kick that off. Here are the points of common ground I think we can agree on before we start.

  1. No credible mechanism has been put forward for the origin of life yet. Appealing to evolution misses the point, because we need a replicating unit first before any sort of natural-selection-based process can kick in.

  2. Anyone who argues that the origin of life is a solved problem is uninformed or agenda-driven. Fundamentally, science has not established how the first life arose. There are interesting theories, but nothing connects all the critical dots from non-life to life in a satisfactory way. Of course there is sensationalized press, and a mythology regarding the Miller-Urey experiments (and other cases), but anyone reading the science would have to agree we are facing an unsolved problem.

  3. This may ultimately be unsolvable for a whole host of reasons. Perhaps there is no natural mechanism available, or perhaps we just do not have enough information and observation time to understand the mechanisms involved. Either way, we would not be able to solve this problem.

  4. A natural mechanism for the origin of life does not somehow constitute a proof against God or against creation in any meaningful sense. For example, the recent novel Origin is not just science fiction; it is also religious fiction when it claims that demonstrating a process for the origin of life would end religion.

  5. This has very little to do with evolutionary science in biology. Evolution is about how biological systems evolve from other biological systems.

  6. Outside science (not wearing our scientific hats), it is entirely reasonable (even if disputed) to draw an inference to design. This is especially true if we are Christians who already affirm that God exists and created us all. No one should act as if science has unsettled that theological claim.

I think most secular scientists would agree with this too. None of this is particularly controversial. In fact, much of this can be considered the consensus view.

The Disagreement?

I suspect the disagreement (from starting to read this) is going to be about:

  1. the validity of specific arguments and specific claims and specific analogies/metaphors
  2. whether this inference to design is allowed inside scientific discourse or not
  3. whether a confrontational or non-confrontational relationship with science is best
  4. whether we can distinguish “impossible by any process” (known and unknown) from “possible by unknown process”

I’ll disclose up front that I already see a truckload of problems under #1; that I think the inference to design here is outside of science (#2), even though it can be warranted; that I prefer non-confrontation (#3); and that I think #4 is not possible.

Still, before we enter the labyrinth, where do you all think we should focus discussion? I doubt we can do all of it without getting lost in the maze. What do you think would be most helpful? Of course, please be sure to articulate what you think our common ground is, and delineate what might be our disagreement. Once we scope this a bit, and some of us catch up on reading, it will be interesting to see how this unfolds.


#4

Hi Joshua,

I look forward to Dr. Miller joining the discussion.

As someone with a philosophical background, I’m used to evaluating arguments when they are expressed in the form of a syllogism. I would very much appreciate it if Dr. Miller could summarize his thermodynamic argument against the origin of life (and any other arguments he thinks might come in handy) in this fashion, in the interests of clarity. That’s all I wanted to say, for now.


#5

5 posts were split to a new topic: Is the Genetic Code Well-Designed?


#6

I first wanted to thank Vincent Torley for the time he spent reviewing my article. Addressing his comments was a very useful exercise for me. I responded to the comments about the optimality of the code in the related thread, and I will address in this post the comments about semantic information. I will soon address other topics.

Semantic Information

The most important points of similarity between letters in a sentence and protein sequences are the rarity and the connection to something external. The fact that protein sequences operate hierarchically in life – amino acids form proteins which form metabolic reaction sequences which form a functional cellular metabolism – highlights the probabilistic challenge. Each level in the hierarchy compounds the problems: very few combinations of proteins will form a coherent metabolic pathway in any environment, and very few combinations of metabolic pathways will form a functional metabolism. Without a full metabolism enclosed in some confined space, the whole system would likely eventually break apart. When an organism dies, it decomposes into simple chemicals.

The reference to something external relates to the fact that most amino acid sequences create chains which do not do anything significant but eventually break apart. However, a small fraction form into a stable enzyme which drives some essential reaction for a cell. Sequences which perform such functions are considered by many experts in the field to have semantic information. A good book on this topic is From Matter to Life: Information and Causality edited by Sarah Walker, Paul Davies, and George Ellis. One of the contributors is Anne-Marie Grisogono who commented (p. 79):

…we might anticipate the appearance of yet more interesting forms of information – for example, information about the environment or about how the autocells reproduce. These constitute two very significant classes of information that represent seemingly abrupt departures from the passive and nonreferential nature of the latent information discussed so far, and arguably underlie the emergence of causal power in information. The fact that such information carries meaning, implications about something other than its own instantiation, places it in a unique category, which we might call semantic information, and its existence raises many deep and important questions and considerations…

In a later chapter, Karola Stotz and Paul E. Griffiths further relate the presence of information to that which demonstrates causal power or fine-grained control over an external process. This idea is further developed by Werner Loewenstein in The Touchstone of Life. Loewenstein describes how the precise 3D shapes of active sites in enzymes contain information because they direct molecules to chemically interact in ways which would be extremely unlikely or impossible on their own (p. 65):

The heart and soul of all this is the information contained in the protein mold. There is a nook in that mold for every piece and enough information passes hand for three functions: (1) the recognition of each piece, (2) the optimal positioning of the pieces relative to one another, and (3) the weakening or splitting of certain bonds in the pieces or the making of new bonds for welding the pieces together.

The fold is the result of the precise sequencing of amino acids, so the information in the sequence has “causal power” via the active site over the target molecules. An important point is that the active site acts like a factory which manipulates molecules in such a way as to contribute toward the purpose of maintaining a cell. And, the structure of the site and its chemical properties cannot be deduced from the chemical properties of the pre-folded chain such as the nature of the chemical bonds.


#7

Hi Dr. Miller,

I’d like to thank you for commenting on this thread. Re semantic information: the distinction between syntax and semantics is a vital one in linguistics. The sentences “Cows flow supremely” and “Cheetahs run swiftly” both conform to the rules of English syntax, but only the second sentence has a semantic meaning. What parallel distinction can you adduce in the field of biochemistry?

I’d now like to address a few of your comments.

[A] small fraction [of amino acid sequences] form into a stable enzyme which drives some essential reaction for a cell. Sequences which perform such functions are considered by many experts in the field to have semantic information

The most important points of similarity between letters in a sentence and protein sequences are the rarity and the connection to something external.

The rarity of protein sequences is better compared with the rules of syntax, rather than semantic meaning. Only a very small proportion of word sequences conform to the rules of English syntax (“Cows flow supremely” does, but “Wanted the about” does not). This does not guarantee, however, that these sequences of words possess semantic meaning.

A protein sequence, taken as a whole, does bear a connection to something external, but in a meaningful English sentence, not only the sentence as a whole, but also the individual words, have to bear a connection to something external. Sentences are about some state of affairs, but the words that make up these sentences possess their own “aboutness.” I see no analogue to this two-level “aboutness” in proteins. Nor do I see any analogue to the rules of English grammar.

While I’m at it, I’d like to highlight another significant difference between protein sequences and human language:

Human language is unique in comparison to other forms of communication, such as those used by non-human animals. Communication systems used by other animals such as bees or apes are closed systems that consist of a finite, usually very limited, number of possible ideas that can be expressed. In contrast, human language is open-ended and productive, meaning that it allows humans to produce a vast range of utterances from a finite set of elements, and to create new words and sentences. This is possible because human language is based on a dual code, in which a finite number of elements which are meaningless in themselves (e.g. sounds, letters or gestures) can be combined to form an infinite number of larger units of meaning (words and sentences). [Wikipedia article: Language]

Let’s move on.

The fact that protein sequences operate hierarchically in life – amino acids form proteins which form metabolic reaction sequences which form a functional cellular metabolism – highlights the probabilistic challenge.

I see what you’re getting at, but if you’re going to talk about a probabilistic challenge that points to intelligent design, then you need to quantify it. There must be some very small probability p which places an event beyond the reach of chance. What’s your cut-off point? As I mentioned above, Dryden, Thomson & White argue that up to 4×10^43 different amino acid sequences could have been explored since the origin of life. Additionally, there are thought to be 10^24 planets in the universe. Not all of these will support life, but it seems to me that if you want to err on the safe side, you need to allow for about 10^65 amino acid sequences that might have been explored in the history of the cosmos. So, how rare are proteins? I used to believe Dr. Axe’s figure of 1 in 10^77 for the rarity of protein sequences, but after reading articles like this one over at The Skeptical Zone, I am no longer able to take such a figure seriously. See also my review of Axe’s book, Undeniable, and scroll down to the section, “The odds against building a 150-amino-acid protein by chance,” for further arguments against Dr. Axe’s estimate.

Now it’s true, as Dr. Meyer points out in Signature in the Cell, that even the simplest modern-day living organisms need around 250 kinds of proteins in order to survive. But it would be a mathematical fallacy to argue that even if Dr. Axe’s estimate is wrong and the odds of a functional protein sequence are (say) 1 in 10^12, the odds of getting 250 proteins are 1 in (10^12)^250, or 1 in 10^3000, which is well below the threshold of 1 in 10^65. I’m sure you don’t need me to point out why.
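To make the exponent arithmetic explicit (the 1-in-10^12 odds and the 250-protein count are the hypothetical figures above, nothing more):

```python
# Exponent arithmetic for the fallacious "just multiply" move: treating
# 250 required proteins as independent one-shot events at 1 in 10^12 each.

per_protein_exp = -12    # hypothetical odds of one functional protein: 1 in 10^12
n_proteins = 250         # rough protein count for a minimal organism

combined_exp = per_protein_exp * n_proteins
print(combined_exp)      # -3000: 1 in 10^3000, *if* the independence assumption held
```

The fallacy, of course, lies in the independence assumption: the proteins need not all arise simultaneously in a single trial, so the individual probabilities cannot simply be multiplied.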

The heart and soul of all this is the information contained in the protein mold. There is a nook in that mold for every piece and enough information passes hand for three functions: (1) the recognition of each piece, (2) the optimal positioning of the pieces relative to one another, and (3) the weakening or splitting of certain bonds in the pieces or the making of new bonds for welding the pieces together. [Loewenstein]

This doesn’t sound anything like semantics to me. It sounds more like putting Lego pieces together. Comparing it to semantic meaning is unhelpful at best and misleading at worst.

The fact that such information carries meaning, implications about something other than its own instantiation, places it in a unique category, which we might call semantic information, and its existence raises many deep and important questions and considerations… [Grisogono]

As we have seen, “aboutness” is not sufficient to warrant a comparison to semantic meaning. It has to be multi-layered.

I’d also like to point out that Grisogono contends that what she refers to as the “semantic information” of life could have evolved by natural processes. For example, on page 93, she discusses the origin of the “semantic information” in the form of coded replication:

Increasing evolvability and specificity seem to play a part. As more efficient catalysts evolved, they also became more specific, since the closer a catalyst’s fit to one set of reactants, the less it would fit alternate reactants with slightly different properties. A set of highly specific catalysts that vastly accelerate a particular process and have little effect on others lays a foundation for the transition from analogue to digital processing, and for the emergence of complex control structures and for the kind of self-organised criticality behaviour (Adami, 1995; Halley and Winkler, 2008) that seems to be required for autonomous choice.

Note that Grisogono talks about structures in the passage above; in other words, functional information rather than truly semantic information. Grisogono goes on to express her confidence that “billions of years and massively parallel searches in possibility space” could account for the origin of the “extraordinarily complex processes we see today” (p. 94).

Dr. Miller, you appear to be arguing that semantic information is (i) found in life and (ii) only generated by a mind. The sources you cite who agree with you on (i) also make it quite clear that they do not accept (ii).

Over to you.


#8

@bjmiller great to have you here. I do want to ask we clarify if possible what our common ground is, and do what we can to scope this.

I would also add that I do not believe there is semantic information germane to DNA. Is it possible you mean functional information? Or information that encodes something that performs a function?

The move to syntax is somewhat helpful, but I’m not sure. Human language is much more constrained than DNA. @bjmiller, you are claiming a strong parallel here between DNA and language. No one disputes that there is information within DNA; however, what type of information is it? That is where the dispute lies.


#9

The precise definitions and categorizations one uses to describe genetic information and the exact similarities and differences with human languages are not the central issues but more, dare I say, a matter of semantics. Sorry, I could not resist. The authors I mentioned seemed comfortable closely connecting the terms semantic and functional, but I am happy to use the term functional information. In the context of life, one only needs to recognize the primary distinction between syntax (rules internal to protein molecules) and semantics (connections to something external). Francis Crick simply defined information as precise sequencing, and he was not too interested in the parsing of words.

Staying focused, the key points which need to be addressed are

  1. The likelihood of natural processes bringing together molecules in such a way as to produce a cell.
  2. The association of a cell with design.

Rarity of Proteins and Cutoff Values

The first question is just how many candidate amino acid sequences could have existed on the early earth which were long enough to fold into a stable enzyme or structural protein. To create any, the following steps would have been required:

  1. Significant quantities of a diversity of amino acids would have needed to form.
  2. This prebiotic mixture would have needed to remain separate from contaminants and other destructive influences.
  3. Only one enantiomer of each amino acid would have had to separate from a racemic mixture.
  4. Individual amino acids would have needed to combine into long chains through only peptide bonds.

This combination of steps is so unlikely that not one candidate chain would likely have formed in the entire history of the Earth, so the story ends before it really even starts.

Let’s ignore all of these problems and assume some source produced enormous quantities of randomized homochiral amino acid sequences for millions of years. And, let’s assume that a miracle membrane formed which was selectively and unidirectionally semipermeable: it only allowed the right molecules in and the right molecules out through one-way portals. In addition, it formed around the ideal collection of molecules to serve as a staging ground for cell formation. How many of the sequences would likely have moved through the membrane to help drive any given reaction in cellular metabolism? Even if our source produced trillions of chains every day, the concentration would have been dramatically diluted very quickly. The nascent cell would have been lucky to have seen a few dozen before some random event broke it apart.

The extremely high estimates of sequences by Dryden et al. were based on the number of genes in living organisms, so they are not as relevant to the origin of life. Other commonly heard numbers (e.g. 10^40) resulted from crude estimates which did not take into account realistic physicochemical constraints. Instead, they assumed that the entire planet was covered with a highly concentrated solution of chains. As a result, a realistic upper limit of a trillion potential candidates is likely extremely generous. Even this number is dwarfed by the improbability of forming a functional enzyme.

The criticism of Axe’s research makes a fascinating study in philosophical bias. His research was reviewed by experts in the field of protein-fold evolution, and they found both his methods and conclusions valid. Arguments against his research fall into two categories. First, they focus on Axe’s methods, and these usually refer to the blog post by Arthur Hunt. Axe has courteously addressed Hunt’s criticisms, although he really did not need to do so: Hunt has nowhere near as much experience in the field as Axe’s JMB reviewers, and no expert in protein-fold evolution has reaffirmed Hunt’s views. In addition, all other studies on fold density in sequence space, which started with actual enzymes, uniformly demonstrate extreme rarity.

The second line of attack is to reference experiments which started with random sequences and then looked for some “function” being generated. Some experiments also used selection. The error in such arguments stems from the use of the term function. I will illustrate with an analogy. Imagine a chimpanzee named George is trained to randomly deform and combine pieces of scrap from a junkyard. One could ask how long would George need to assemble a functional rifle. The answer depends on the definition of functional. If one meant something which could be used as a weapon, the answer would be not very long. A long pipe could be used as a club. However, if one meant a functional firearm, the answer would be never.

Similarly, the aforementioned experiments found such functions as binding to ATP. Others which used selection could generate the ability to efficiently break apart ATP. Still others reported 3D structures. The problem is that none actually tested for structures which demonstrated all of the characteristics of a true enzyme:
• Chain forms into a stable 3D shape which includes an active site.
• The active site binds to target molecules and orients them properly for target reaction(s).
• Amino acid side chains drive target reaction(s).
• After reaction(s) complete, active site releases substrates, and protein returns to the original configuration.

A key stage in the origin of life is when a protocell’s internal environment moves away from equilibrium and maintains a homeostatic state. At this point, the proto-metabolism must maintain reactions which move energetically uphill. Such reactions require an enzyme to connect the uphill reaction to a downhill one, such as the breakdown of an energy-carrying ATP-like molecule. Such an enzyme requires an active site with at least two binding sites, and it must drive and interconnect the two reactions. The site must also repel water molecules to make room for the substrates. Given that the specificity for just one binding site corresponds to a probability of 1 in a trillion, the odds against a random sequence performing all of these tasks must vastly exceed the number of candidate molecules.
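For what it is worth, the arithmetic in this paragraph can be sketched as follows. Every figure is an assumption taken from the posts above (the 1-in-10^12 binding specificity and the “trillion candidates” upper limit), not a measured value:

```python
# Rough expected number of functional enzymes among the candidate pool,
# on the assumptions stated in the thread.

one_site = 1e-12           # assumed probability a random sequence forms one binding site
two_sites = one_site ** 2  # a second independent site squares the improbability
candidates = 1e12          # assumed generous upper limit on candidate chains

expected_hits = candidates * two_sites
print(f"{expected_hits:.0e}")   # ~1e-12: effectively zero expected hits on these assumptions
```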

Upcoming Posts: Identification of design in the cell and responses to other comments


#10

Hi Dr. Miller,

Thank you for your response. I’d just like to comment on a few points you raised:

The authors I mentioned seemed comfortable closely connecting the terms semantic and functional, but I am happy to use the term functional information. In the context of life, one only needs to recognize the primary distinction between syntax (rules internal to protein molecules) and semantics (connections to something external).

So you’ve conceded that functional information would be a better term to use than semantic information. Thank you. For my part, I acknowledge that there is indeed a valid biological distinction (which needs to be explained) between rules internal to protein molecules and connections to something external. However, this is quite unlike the distinction between syntax and semantics. The latter term doesn’t refer to a set of external connections, but to the meaning of the words in a sentence.

Staying focused, the key points which need to be addressed are:

  1. The likelihood of natural processes bringing together molecules in such a way as to produce a cell.
  2. The association of a cell with design.

I completely agree.

The extremely high estimates of sequences by Dryden et al. were based on the number of genes in living organisms, so they are not as relevant to the origin of life.

Fair point. What upper limit would you propose then, for the number of amino acid sequences explored on the primordial Earth prior to the origin of life?

The criticism of Axe’s research makes a fascinating study in philosophical bias. His research was reviewed by experts in the field of protein-fold evolution, and they found both his methods and conclusions valid. Arguments against his research fall into two categories. First, they focus on Axe’s methods, and these usually refer to the blog post by Arthur Hunt. Axe has courteously addressed Hunt’s criticisms, though he really did not need to do so. Hunt has nowhere near as much experience in the field as Axe’s JMB reviewers, and no expert in protein-fold evolution has reaffirmed Hunt’s views. In addition, all other studies on fold density in sequence space which started with actual enzymes uniformly demonstrate extreme rarity.

Both of the links you cite go back to 2011. I can’t tell you how many times I’ve linked to the first one, by Axe himself, while writing pro-ID articles for Uncommon Descent. However, the article criticizing Axe’s 1 in 10^77 figure which I linked to over at The Skeptical Zone was written only last year, in 2016. Why hasn’t Axe responded? As for the second link, did you read Arthur Hunt’s response on the same thread?

The article over at The Skeptical Zone is very systematic and absolutely devastating. I am astonished that Axe has not responded to it. I’ll just quote a few paragraphs, to convey the approach taken by the author:

The experimental evidence will consist of a sample of papers that correspond to three types of experiments. I’ll just call them types A, B and C.

Type A experiments will be references to papers wherein the researchers generated either totally random protein sequences of lengths over 50 amino acids with no bias for structure, function or folding and then subsequently screened large libraries for functions they decided on.

Or they will be experiments where the researchers randomly assembled larger proteins from smaller fragments of already functional proteins in a type of fragment-shuffling and recombination, which is incidentally how the evolutionary process is thought to have actually made most of the proteins in extant life (from insertions and duplications of smaller parts of already existing proteins into others).

Type B will be references to papers wherein the researchers generated protein sequences of various lengths, but this time with the random sequences constituting only a smaller portion of a larger fixed structure. In these experiments the “sequence neighborhood” of these fixed folded structures are then probed for functions by generating variations of these structures with mutations in a subset of the proteins. These types of experiments demonstrate that if you have a functional protein already, there’s usually another function close by in sequence space, meaning most of the functionality is arranged in interconnected networks that can be navigated by relatively few substitutions. Which basically makes Axe’s claim doubly absurd, since there’s probably another function nearby.

Type C will be references to papers that are similar to types A and B, but the sizes of the proteins are less than 50 amino acids, typically in the 7-22 amino acid range. These types of experiments demonstrate that cellular functions don’t need to be carried out by huge, multi-domain proteins and that such large extant proteins could be generated by the stepwise accretion of smaller fragments over the course of evolution (and probably even a random process at or around the origin of life).

The paper also critiqued Axe’s methodology:

It should be obvious that you can’t just take an enzyme, throw mutations into it and then when it stops working, say “this means there’s no other function out there”. That obviously doesn’t follow. For those reasons, Axe’s number just can’t be trusted as an estimation of that. It is a number that represents an estimate of a single very particular thing, which is the frequency of sequences 150 amino acids in length, that adopt the known Beta-lactamase fold and catalyze break down of ampicillin. For that reason it is NOT an estimate of the frequency of all functional proteins in all of 150 amino acid sequence space.

I hope the foregoing quotes are sufficient to convince you that the paper is a substantive one.

I also linked to my own critique of Axe’s 1 in 10^77 estimate. A couple of years ago, I contacted a leading biochemist (who happens to be a Christian) and asked him about Axe’s figure. He replied:

I think it’s fair to argue that modern proteins (50 to 1000’s of amino acids in length) probably didn’t come into existence in one fell swoop by selection from huge sequence pools, as the probability of success would be vanishingly small. Nevertheless, there is no reason that I can see that primitive proteins had to be very large. For example, protein monomers of 10-15 residues could assemble into four-helix bundles or higher oligomers. The sequence information required for stable four-helix bundles is pretty minimal, largely having hydrophobic residues at buried positions. Michael Hecht’s work at Princeton has shown that randomized sequences designed to adopt a four-helix bundle topology (albeit in this case, all of the helices were covalently connected) frequently have primitive enzymatic activity. I assume that once a useful cellular activity arose by chance, then mutagenesis, recombination to fuse protein segments, and selection would provide a path to the proteins and enzymes we now find. So I think the counterargument to the ID folks is not that sequence populations of 10^80 needed to be searched to find a 100-mer with robust enzyme activity, but rather that random populations of a few million relatively small proteins could contain a few molecules from which to start the evolutionary process.

The biochemist also mentioned that other work from Professor Hecht’s lab had shown that short proteins that fold into 4-helix bundles have unexpectedly specific ligand binding properties. The biochemist regarded Dr. Axe’s work as highly biased, because he had based his studies and calculations on very large sequences of amino acids, even though much shorter sequences (such as polypeptides) are known to have biological functions.

You also wrote:

The second line of attack is to reference experiments which started with random sequences and then looked for some “function” being generated… [T]he aforementioned experiments found such functions as binding to ATP. Others which used selection could generate the ability to efficiently break apart ATP. Still others reported 3D structures. The problem is that none actually tested for structures which demonstrated all of the characteristics of a true enzyme:
• Chain forms into a stable 3D shape which includes an active site.
• The active site binds to target molecules and orients them properly for target reaction(s).
• Amino acid side chains drive target reaction(s).
• After reaction(s) complete, active site releases substrates, and protein returns to the original configuration.

The TSZ paper I quoted from above cites an experiment which found an enzyme that catalyzes ATP hydrolysis: the protein was converting ATP to ADP. Later, the author of the TSZ paper references another experiment, where “the frequency of enzymes with Esterase functions is apparently ONE IN 1000 random sequences 140 amino acids in length (the initial library is 1000 members, the subsequent selection steps are from populations of only 10). Let that sink in. 1 in 1000. Not 1 in 10 to the seventy-seventh power. Just one in one thousand.”

Now you might argue that these enzymes don’t do all the things on your bullet list. Not being a chemist, I couldn’t say. In any case, as far as I know, you’re the first person who’s added these extra stipulations. I don’t recall Axe doing so, and as the foregoing quote from the TSZ paper shows, there are legitimate criticisms that can be raised of his methodology.

I think it’s fair to say that Dr. Axe has some explaining to do.

Such an enzyme requires an active site with at least two binding sites, and it must drive and interconnect the two reactions. The site must also repel water molecules to make room for the substrates. Given that the specificity for just one binding site relates to a probability of 1 in a trillion, the improbability of a random sequence performing all of these tasks must far exceed the number of candidate molecules.

The mathematical question we need to ask here is: given the 1 in a trillion specificity for the first binding site, once it is generated, what proportion of possible sequences would meet the requirements for the second binding site? The point I’m making is that we can’t write P(AB) = P(A) · P(B), as if the two were totally independent events. Rather, we should write P(AB) = P(A) · P(B|A).
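To make the point concrete, here is a toy calculation (my own illustration, not from either article; the 10^6 correlation factor is purely hypothetical, chosen only to show how conditioning changes the result):

```python
# Toy illustration of P(AB) = P(A) * P(B|A) versus assuming independence.

p_a = 1e-12  # specificity of the first binding site (the 1-in-a-trillion figure)

# If the second binding site were fully independent of the first:
p_joint_independent = p_a * 1e-12  # about 1e-24

# If sequences that already form one binding site were, say, a million
# times more likely to accommodate a second one (hypothetical number):
p_b_given_a = 1e-6
p_joint_correlated = p_a * p_b_given_a  # about 1e-18

print(p_joint_independent, p_joint_correlated)
```

The six orders of magnitude between the two answers come entirely from the assumed correlation, which is why the degree of dependence between the two requirements matters so much.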

You also seem to be assuming that a proto-enzyme would be absolutely useless until it acquired all of the properties of what you call a “true enzyme.” That’s a big assumption, and I would question it.

I would refer you to a recent article dated November 1, 2017, which mentions two new papers from biochemists and biologists at the University of North Carolina at Chapel Hill and the University of Auckland, supporting the “peptide-RNA” hypothesis over the “RNA-world” hypothesis. The papers “show how recent experimental studies of two enzyme superfamilies surmount the tough theoretical questions about how complex life emerged on Earth more than four billion years ago.”

The special attributes of the ancestral versions of these enzyme superfamilies, and the self-reinforcing feedback system they would have formed with the first genes and proteins, would have kick-started early biology and driven the first life forms toward greater diversity and complexity, the researchers said…

At the heart of the peptide-RNA theory are enzymes so ancient and important that their remnants are still found in all living cells and even in some sub-cellular structures, including mitochondria and viruses. There are 20 of these ancient enzymes called aminoacyl-tRNA synthetases (aaRSs). Each of them recognizes one of the 20 amino acids that serve as the building blocks of proteins…

The 20 aaRS enzymes belong to two structurally distinct families, each with 10 aaRSs. Carter’s recent experimental studies showed that the two small enzyme ancestors of these two families were encoded by opposite, complementary strands of the same small gene. The simplicity of this arrangement, with its initial binary code of just two kinds of amino acids, suggests it occurred at the very dawn of biology. Moreover, the tight, yin-yang interdependence of these two related but highly distinct enzymes would have stabilized early biology in a way that made inevitable the orderly diversification of life that followed.

Thoughts?


#11

@bjmiller on this point, @vjtorley has done his homework. I can add technical details if they come up, but there are solid reasons why biologists reject Doug Axe’s take on proteins.

This does not mean abiogenesis is plausible. Frankly, scientifically speaking, we just do not know how it happened. It is a reasonable place to wonder if God intervened. Maybe He did directly create the first cell. You seem to be trying to take that well-agreed-upon fact further.

However, relying on a fringe theory of protein science that is factually contradicted by empirical evidence and rejected by most biologists is not helping your argument. I wonder if there is something more interesting in your own contributions here from thermodynamics. At the very least, you are much more qualified to comment on your own contributions. Defending Axe is not really going to get us very far.

Also, to be clear, I have met Axe and mean him no ill-will in these assessments. None of this is personal. He strikes me as a deeply convinced person who really believes what he is saying about proteins. Having just read his contribution to the Crossway book on Theistic Evolution, it is also clear he is convinced he must take a confrontational approach to science. So I see why he has been willing to sacrifice so much to make his case. Doug does not seem dishonest. It seems, rather, that he is just honestly wrong on the facts here.

If you are hoping to strengthen the case for design in abiogenesis, it will be more convincing if you can do so without relying on his work.

Very interesting find @vjtorley. Had not seen this yet. Very interesting indeed.


#12

Now you might argue that these enzymes don’t do all the things on your bullet list. Not being a chemist, I couldn’t say. In any case, as far as I know, you’re the first person who’s added these extra stipulations. I don’t recall Axe doing so, and as the foregoing quote from the TSZ paper shows, there are legitimate criticisms that can be raised of his methodology.

In Doug Axe’s JMB article, he explicitly states that he is focusing on true enzymes.

The focus here will be upon enzymatic function, by which we mean not mere catalytic activity but rather catalysis that is mechanistically enzyme-like, requiring an active site with definite geometry (at least during chemical conversion) by which particular side-chains make specific contributions to the overall catalytic process.

Any expert in the field would immediately recognize that his description includes the four criteria I mentioned. In contrast, none of the experiments starting with random sequences cited as counterevidence at The Skeptical Zone or elsewhere identified any genuine active sites. The mechanisms behind generic catalysis and the nano-manipulations performed by the latter (please see video) are qualitatively different. I would be interested in references to any research articles which either started with a genuine enzyme in nature with an active site and demonstrated that true enzymatic folds are not rare, or started with random sequences with the help of selection and demonstrated the formation of a true enzyme with an active site. Also, can all of the other studies which demonstrated rarity in enzyme folds really be ignored so easily? Doug’s research was simply one in a succession of studies. Even the biochemist you cited stated that “modern proteins (50 to 1000’s of amino acids in length) probably didn’t come into existence in one fell swoop by selection from huge sequence pools…” (see below). The active-site distinction is crucial in addressing the issue of the origin of life, for reasons which I will soon discuss.

Information in Enzymes

A helpful understanding of information relates to the reduction of uncertainty or the constraining of outcomes. Specifically, highly improbable reactions must be directed to proceed in the right location. On the early earth, millions of chemical reactions could possibly take place. And, the energetically uphill reactions required for the first metabolic pathways in a protocell are strongly disfavored. As mentioned, they require that the breakdown of a high-energy molecule is linked to the target reaction. The only way to accomplish this feat is for an enzyme to force the two reactions together through an active site which accomplishes several goals such as the following:

  1. The active site must bind to at least two molecules in such a way as they are properly oriented and positioned next to each other.
  2. The active site must first catalyze the breakdown of the high-energy molecule.
  3. It must then chemically bond part of the broken molecule to a target molecule to help drive another reaction, which the enzyme will often direct.
  4. Finally, the new molecules must be released, and the enzyme returns to its original configuration.

The amino acid sequence must fall within the category of sequences which correspond to the active site forming into the proper configuration to bind to the right molecules and then drive the right reactions in the right order. The information related to the sequence corresponds to the improbability of a random sequence falling into that category, and this improbability grows with increased fine-grained control over the reaction. I am linking the Shannon measure of information to functional specificity. Shannon’s measure of information also directly corresponds to entropy, for both equate to the average of the log of the number of microstates (e.g., sequences) which correspond to a given macrostate (e.g., a functional active site). The matching of an amino acid sequence to an external category of patterns also links Dembski’s idea of specificity, and other authors’ use of the term “semantics,” to the idea of causal control. Everything leads back to the question of how improbable generating the right sequence is.
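This linkage between improbability and information can be sketched with the standard Szostak/Hazen measure of functional information (a sketch of mine, using the fractions discussed in this thread as inputs rather than any new measurements):

```python
import math

def functional_information_bits(functional_fraction):
    """Functional information I = -log2(M/N), where M/N is the
    fraction of sequences that meet the functional criterion
    (the Szostak/Hazen definition)."""
    return -math.log2(functional_fraction)

# The ~1 in 10^12 binding-site frequency discussed in this thread:
bits_one_site = functional_information_bits(1e-12)  # roughly 40 bits

# Axe's 1 in 10^77 estimate for a full enzymatic fold:
bits_axe = functional_information_bits(1e-77)  # roughly 256 bits

print(bits_one_site, bits_axe)
```

On this definition, rarer functional categories simply carry more bits, so the whole disagreement reduces to estimating the functional fraction correctly.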

As mentioned, all studies on actual enzymes concluded that functional sequences are extremely rare, and studies on ATP binding indicated that a single binding site corresponds to a probability of 1 in a trillion. In addition, the probability of an active site forming is much worse than simply the product of the probabilities of two binding sites, for the two sites must reside in the right proximity, and all substrate molecules must have the right orientations. Even more challenging, the active site must have amino-acid side chains properly positioned to break off the right atoms and then reconnect the right ones in the right locations to link at least two reactions. Therefore, the odds against such an enzyme forming far exceed the possible number of trials in a local environment.

Thermodynamics and Early Metabolism

Once a protocell moved away from equilibrium into homeostasis, it would require an internal mechanism to process energy, so it could keep driving uphill reactions. This engine would involve several chemical steps, most of which would require their own enzymes. Its goal would be to recharge high-energy molecules which would then be used in other processes directed toward maintaining homeostasis. The protocell would also need to manufacture at least some of its building blocks, which would require additional metabolic pathways. In summary, the move into homeostasis would require immediate access to multiple highly targeted enzymes. The probability of multiple sequences originating which correspond to the right sets of coupled reactions is exponentially smaller than for just one. Information for the sequences must be provided from the outside.

The parallels between human language and the required information to establish the first metabolism are not essential to my argument, but they are too irresistible to go unmentioned:

  1. Letters form words --> AA sequences form alpha helices and beta sheets.
  2. Words form sentences --> Ordered alpha helices and beta sheets with additional AAs form domains which could be smaller proteins.
  3. Sentences form compound sentences --> Multiple domains form larger proteins.
  4. Sentences form paragraphs --> Sets of proteins drive sequential reactions in a pathway.
  5. Paragraphs form chapters --> Pathways interlink to form a functional metabolism.

Identification of Design in Fabrication

The identification of design in relation to the origin of life is demonstrated at two levels. First, experiments and theoretical attempts to understand the origins process consistently require intelligent direction at key junctures. And, the overall characteristics of a minimally functional cell demonstrate clear signs of design. So, design is seen both at the front end of the fabrication requirements and the back end of the final product. In terms of the former, all origins scenarios invoke teleology at nearly every step.

I think it’s fair to argue that modern proteins (50 to 1000’s of amino acids in length) probably didn’t come into existence in one fell swoop by selection from huge sequence pools, as the probability of success would be vanishingly small. Nevertheless, there is no reason that I can see that primitive proteins had to be very large. For example, protein monomers of 10-15 residues could assemble into four-helix bundles or higher oligomers. The sequence information required for stable four-helix bundles is pretty minimal, largely having hydrophobic residues at buried positions. Michael Hecht’s work at Princeton has shown that randomized sequences designed to adopt a four-helix bundle topology (albeit in this case, all of the helices were covalently connected) frequently have primitive enzymatic activity. I assume that once a useful cellular activity arose by chance, then mutagenesis, recombination to fuse protein segments, and selection would provide a path to the proteins and enzymes we now find. So I think the counterargument to the ID folks is not that sequence populations of 10^80 needed to be searched to find a 100-mer with robust enzyme activity, but rather that random populations of a few million relatively small proteins could contain a few molecules from which to start the evolutionary process. [Biochemist]

What is so striking about these comments is that they essentially affirm the conclusion that fully functional enzymes are too rare to come about by chance. The problem with the proposal of bundles combining is that the driving tendency of natural processes would have been to break apart the sequences, not link them into longer chains. And, no selective process was available on the early earth to save those which performed some generic enzyme activity. Long before any sort of replication emerged, the protocell would have needed fully functional enzymes which could pick out the right molecules and the right two-step reactions from a sea of other possibilities. And, even in the most optimistic protein replication theories, a fully functional enzyme is already needed to enable replication. In addition, even if such an assistant enzyme formed, how could it ever find other chains and a developing protocell floating around somewhere on an entire planet? Outside assistance was needed.

Increasing evolvability and specificity seem to play a part. As more efficient catalysts evolved, they also became more specific, since the closer a catalyst’s fit to one set of reactants, the less it would fit alternate reactants with slightly different properties. A set of highly specific catalysts that vastly accelerate a particular process and have little effect on others lays a foundation for the transition from analogue to digital processing, and for the emergence of complex control structures and for the kind of self-organised criticality behaviour (Adami, 1995; Halley and Winkler, 2008) that seems to be required for autonomous choice.
[Grisogono]

Grisogono’s narrative also invokes goal direction for the same reasons. How would nature know to select for the right catalysts? She continues to call upon design later in the chapter when she postulates how autocatalytic cycles could have formed and then been selected for efficiency and the ability to self-replicate. However, autocatalytic cycles are spontaneous; they might only need the help of simple chemical catalysts. In contrast, the reactions in life are either not spontaneous or too slow, and they are often energetically unfavorable. Nature would have selected against the latter long before the emergence of the essential targeted enzymes. Only intelligent design could explain how a specific set of improbable reactions would have been selected from such a vast set of other possibilities to reach the future goal of a functional cell. And only intelligence could explain the creation of numerous information-rich sequences in the same locale which corresponded to those very reactions.

Origins experiments also consistently confirm the need for design. Those which produce most of the building blocks of life require numerous highly orchestrated steps. The production of RNA alone includes over a hundred, many of which require trained chemists. Such complex protocols constrain the molecules to combine into the desired products. The information provided by the protocols plays the same role as the information in the AA sequences of enzymes, and the highly-controlled reaction steps play the role of the active site in reassembling the molecules. Information/design is required in both cases.

As I described, natural processes would have constantly worked against the formation of a cell, and the odds against random arrangements of molecules forming even a single metabolic pathway are too great to be explained by chance. Every origins scenario must involve molecules being forced into fantastically improbable configurations through pathways which move against the driving forces of thermodynamics and the known tendencies of organic reactions. And, they must move toward the end point of a highly integrated system of multiple reactions acting toward the purposeful goal of life. Such processes point, without exception and without ambiguity, to design.

The possibility of future discoveries overturning the most fundamental laws of thermodynamics is remote, and the likelihood that chemists will someday discover how the formation of life could overcome nearly every natural tendency known from chemistry is exceedingly small. Natural tendencies include the following:

  • Homochiral --> Racemic
  • Complex molecules --> Simple molecules or biologically inert tars
  • High concentrations of pure solutions --> Low concentrations of mixed solutions
  • Out of equilibrium (homeostasis) --> Towards equilibrium

Identification of Design in Final Product

One common myth is that a cell could have gradually developed over millions of years as metabolic reactions were added one by one. In reality, with each step away from equilibrium, nature would have pushed a protocell more forcefully back toward simple chemicals. A fully functional metabolism would have been needed very soon. The maintenance of such a metabolism would have required numerous well-interconnected metabolic pathways. If only one enzyme were missing, or if one essential molecule were not continually supplied, the entire system would disintegrate.

In addition, a minimally complex cell demonstrates unmistakable signs of intelligence which invoke such words as foresight, agency, and coordination. It contains nanotechnology for information processing, manufacturing, auto-assembly, and much more. The conclusion of design is consistent with all experiments, theoretical analyses, and human experience. Therefore, one can conclude design as confidently as a SETI researcher who discovered a signal from space which contained the schematics of a spaceship.


#13

Also, have a wonderful Christmas and New Year.

And, thank you for the stimulating discussion.


#14

Hi Dr. Miller,

I hope you’re enjoying Christmas. It’s already Boxing Day over here in Japan.

Your last post was very meaty, so I hope you’ll forgive me if I take a while to respond. For now, I’d just like to say that I don’t believe the first living thing contained modern, long-chain proteins (50 to 1000’s of amino acids in length) of the kind investigated by Dr. Axe. The first living thing was likely much simpler than that.

You correctly point out that natural tendencies would have tended to push any molecules back to a primitive state (racemic, low concentration, simple or biologically inert, and closer to equilibrium), but the vital question is: how quickly? I would also argue that while these natural tendencies would have been very strong for large and complex biomolecules, they would not have been so strong for smaller ones.

Perhaps your best argument is that the set of reactions required to build even a simple living thing would have been so fantastically improbable that only a guiding intelligence could have selected the pathway. You are probably correct here, but if I were arguing this point with an atheist, I would want to have some numbers on which to base my argument, even if they were merely back-of-the-envelope estimates.

The only back-of-the-envelope estimate I’ve seen is Koonin’s, and I’ve criticized his logic in my review of Axe’s book here (scroll down to the first mention of “Koonin” and you’ll find the relevant passage.) While we’re on the subject of Koonin, I’d like to note that his criticisms address a specific scenario for the origin of life (the RNA world) which I’ve rejected above, in favor of a peptide-RNA model. Koonin also concedes that ribozymes could have formed on the primordial Earth, even under a “chance” model. I gather Dr. Axe would disagree on this point.

That’s all I’ll say for now. If Joshua wishes to weigh in, I’d be interested to hear what he has to say.


#15

You correctly point out that natural tendencies would have tended to push any molecules back to a primitive state (racemic, low concentration, simple or biologically inert, and closer to equilibrium), but the vital question is: how quickly? I would also argue that while these natural tendencies would have been very strong for large and complex biomolecules, they would not have been so strong for smaller ones.

The thermodynamic forces would act at every stage. In particular, many of the building blocks for life are relatively high-free-energy molecules, so they would never have formed in significant quantities. In addition, the drives toward racemization and dilution would have acted pervasively, the former immediately. And the prebiotic molecules would have constantly faced the challenge of destructive cross-reactions. Many of these processes would have acted on time scales of seconds or less.

Perhaps your best argument is that the set of reactions required to build even a simple living thing would have been so fantastically improbable that only a guiding intelligence could have selected the pathway. You are probably correct here, but if I were arguing this point with an atheist, I would want to have some numbers on which to base my argument, even if they were merely back-of-the-envelope estimates.

Here is the start of a quick calculation of probabilities. First, producing an AA chain would require homochiral AAs to form a peptide-bonded chain of sufficient length. I will make several very generous assumptions:

  1. The average AA length for an enzyme which could couple two reactions is assumed to be 100.
  2. The chance of a chain of any given length adding a new AA, as opposed to breaking apart, is assumed to be 80%, so the chance of forming a 100-unit chain is (.80)^100.
  3. The starting mixture is assumed to be 80% homochiral in L-AAs, so the probability of only L-handed AAs forming the 100-unit chain is (.80)^100.
  4. The probability of two AAs forming a peptide bond, as opposed to an alternative bond, is assumed to be 80%, so the chance of 100 AAs forming the right bonds is (.80)^100.

The chance of a chain forming which is a candidate for a functional enzyme is then
P = (.80)^100 * (.80)^100 * (.80)^100 = (.80)^300, which is roughly 1 in 10^29.
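As a sanity check, the product of the three factors above can be multiplied out directly. This is just the stated assumptions carried through in Python, not anything from Dr. Miller's own work:

```python
# Multiply out the three (.80)^100 factors from the assumptions above.
chain_length = 100    # assumption 1: average enzyme length in AAs
p_extend = 0.80       # assumption 2: chain grows rather than breaks apart
p_l_handed = 0.80     # assumption 3: a given AA is L-handed
p_peptide = 0.80      # assumption 4: peptide bond vs. alternative bond

p_candidate = (p_extend * p_l_handed * p_peptide) ** chain_length
print(f"{p_candidate:.2e}")  # about 8.5e-30, i.e. roughly 1 in 10^29
```

The result is insensitive to the exact per-step figure only in a narrow sense: nudging 80% up or down by a few points moves the final exponent by many orders of magnitude, which is why the choice of these percentages matters so much to the argument.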

As I mentioned before, the chance of a chain having an enzyme-generating sequence is less than 1 in 10^24. Therefore, the probability of a chain having the right length, chirality, and bonds combined with the right sequence is less than 1 in 10^50. Any metabolic pathway would require at least two enzymes, so the odds drop to less than 1 in 10^100.
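Combining these figures is simple arithmetic; a quick sketch using the round numbers quoted above:

```python
# Combine the chain-formation odds with the sequence odds quoted above.
p_chain = 1e-29      # right length, chirality, and bonds (previous estimate)
p_sequence = 1e-24   # quoted bound for an enzyme-generating sequence

p_one_enzyme = p_chain * p_sequence   # 1e-53, below the 1 in 10^50 bound
p_two_enzymes = p_one_enzyme ** 2     # 1e-106, below the 1 in 10^100 bound
print(p_one_enzyme, p_two_enzymes)
```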

The number of trials available depended on the number of regions on the early earth with significant concentrations of a minimal base set of amino acids and nucleotides. If the AA base number were significantly less than 20, the number of AAs needed in a chain would increase, making the problem worse. Realistic origins experiments suggest that the number of such regions was likely zero.

An additional problem is that an AA chain with weakly catalytic activity cannot slowly evolve into a true enzyme. The chemical reactions and binding in an active site only arise after the chain folds. The relevant AA side chains reside at positions far apart in an unfolded polymer, so any chemical activity before folding is unrelated to the chemical activity in the active site. A proper sequence must be present before any active-site activity commences.

However, the problem is even worse when one considers the energy barriers which are ignored by the previous calculations. The chances of a collection of molecules on the early earth coming together with internal energy equivalent to that of a simple cell were calculated by physicist Harold Morowitz, and he determined that the odds would have been less than 1 in 10 to the power of a billion (Morowitz, Energy Flow in Biology, p. 98). His calculation was based on the canonical ensemble from statistical mechanics, so the result is independent of the specific path, the time allotted, and the number of steps. The analysis assumed equilibrium conditions, but taking non-equilibrium processes into account does not improve the odds, for reasons I described in my other articles.

The challenge of ending with the right configuration of atoms and reactions in a minimal cell is equally problematic. The number of possibilities calculated for a yeast cell is more than 10 to the power of 10 billion. The first bacteria cells were simpler, but the search space is still absolutely enormous, and life-permitting configurations would have been an infinitesimal fraction.

While we’re on the subject of Koonin, I’d like to note that his criticisms address a specific scenario for the origin of life (the RNA world) which I’ve rejected above, in favor of a peptide-RNA model. Koonin also concedes that ribozymes could have formed on the primordial Earth, even under a “chance” model. I gather Dr. Axe would disagree on this point.

The peptide-RNA scenario you mentioned requires highly coordinated reactions, and some would have been energetically unfavorable, certainly the ones leading to nucleotides. Therefore, the proposed model would have required a suite of true enzymes which could interconnect targeted reactions, so it faces all of the probabilistic hurdles mentioned above. The challenges of forming ribozymes are just as daunting, for the same reasons. The barriers of thermodynamics and of searching the vast space of possibilities are inescapable.


#16

@bjmiller this appears to be a flawed calculation, and it also gets to the heart of some of my critiques of your blog posts. To be clear, we are setting aside for a moment the information content of the AA chain and focusing exclusively on forming a long chain.

The probability calculation you lay out here just seems wrong. In fact, if it were true, polymer chemistry in general would not work at all. Polystyrene, for example, would not be possible to form. Neither would nanotubes or sheets.

For #1 I disagree with the assumption, but we can leave that for now (see above).

#2 is predicated on a particular set of conditions that need not hold. For example, in an evaporating pool of water, the concentration of the amino acids is increasing, and because a water molecule is consumed by cleavage, there is a strong push towards “condensing” (forming a peptide bond). Moreover, why is it that only monomers can add to the chain? If we have a solution of peptides, the peptides can join with each other.

It turns out that wet-dry cycles (very unsurprisingly) can build up peptide chains easily, without a catalyst, avoiding the incorrect bonds too. The study appeared in a questionable journal, but it has been cited quite a bit and was done by a legitimate research team (see for example https://link.springer.com/article/10.1007%2Fs00239-017-9799-3). My guess is that this was so blindingly obvious to most organic chemists that they had difficulty publishing it elsewhere. The key point is that your specific claim here, if correct, would prove polymer chemistry impossible; we would just never see non-biotic polymers anywhere.

#3 is false too. It is possible (or even likely) that the first proteins were composed of amino acids that had a preference for forming bonds with those of the same chirality. Moreover, it is not clear that the amino acids for a functional protein need to be all L-AAs. All that is important is that they eventually become all L-AAs, not that they start out that way.

#4 is false too. It is not clear that the bonds in a functional protein need to be exclusively peptide bonds. I’m not sure exactly what you mean by “alternative bonds.” Perhaps the ester bond? However, read the paper I just linked to. The ester bond is less stable than the peptide bond, and the peptide bond is favored thermodynamically (right?).

So the calculation leading to your final P is subject to dispute.

Once again, I dispute this because it is not at all clear that this is required for a metabolic pathway. Maybe it was, but how do you know?

Why would we think this? I do not follow the reasoning.

That I agree with. I imagine everyone does. But the key question is how small an infinitesimal fraction are we talking about? That requires dividing one large number by another much larger number (both of which are uncomputable), with enough confidence to make a statement absent evidence. I’m not sure that is within the bounds of science.


#17

@bjmiller can you please clarify your intention (i.e. end goal) with your argument?

Most scientists would agree with the quoted statement here. What exactly is the point you are trying to make beyond this?


#18

Hi Dr. Miller,

I’d like to thank you for providing the calculations that I requested. I’d just like to ask: have you shown them to Dr. James Tour? What does he think of them?

You contend that the drive towards racemic mixtures and dilution would have acted on time scales of seconds or less. Do OOL researchers in the field agree with that assessment? I’m just curious: if they do, then the attention given to Sutherland’s OOL scenarios in science news reports makes no sense at all.

Hi Joshua,

Thank you very much for your detailed response above. As I indicated previously, it strikes me as quite likely that God created the first cell. However, I agree with you that any scientific argument for that conclusion needs to be a very solid one.

By the way, I should inform you both that I’ll be on holiday in Australia for the next six days, so it may be a little while before my next response. I’d just like to wish you both a happy New Year.


#19

I think I’ve just cited an experiment that invalidates one of the key claims. It turns out a simple system can in fact produce peptides.

The key question is now, however, more pointed. How can we trust the mathematical arguments being offered here if they lead to a solid conclusion that something is impossible, and then a simple experiment demonstrates that it is possible? I do not think we can trust them.

Remember, there is already so much agreement here…

That is where the agreement lies. Most biologists would say those are plausible beliefs. Some would even say they are warranted beliefs. I think the case for this, however, is not aided by incorrect arguments. My pushback @bjmiller is not about your final conclusion, but about the “mathematical work” you put forward to get there.

Is it possible you are right about the conclusion, but wrong in your reasoning to demonstrate it correct?


#20

#3 is false too. It is possible (or even likely) that the first proteins were composed of amino acids that had a preference for forming bonds with those of the same chirality.

The only reference I have ever seen related to this claim is based on the Salt-induced Peptide Formation (SIPF) reaction. The authors of a primary study start by stating that no chiral bias exists with AAs under normal conditions.

Because this energy difference is very small (for amino acids in the order of 10^−38 to 10^−35 J; proportional to Z5, Z being the atomic number) it cannot lead to a significant stereoselective differentiation by itself, but still requires amplification mechanisms to yield a stereoselective preference for one chiral form in the prebiotic reaction pathways leading towards the origin of life.

They then describe the exploitation of a parity violation in Cu(II) based on the weak nuclear force to bias, in some AAs, L-L binding over D-D binding. Under certain circumstances Cu(II) can form a complex which generates this bias, but the conditions include starting with unrealistically high concentrations of Cu(II). Moreover, the bias was only highly significant for valine, and some AAs actually showed dominance for D-D binding. Most significantly, the reported experiment only used pure L-L or D-D amino acids. The researchers did not demonstrate a bias of homochiral binding (L-L or D-D) over heterochiral binding (L-D). I suspect they tested this possibility, but the results were too disappointing to report. As a consequence, the mechanism has only been used as a possible explanation for the dominance of L-proteins over D-proteins today, not for the selection of L-AAs in protein formation.

Moreover, it is not clear that the amino acids for a functional protein need to be all L-AAs. All that is important is that they eventually become all L-AAs, not that they start out that way.

#4 is false too. It is not clear that the bonds in a functional protein need to be exclusively peptide bonds. I’m not sure exactly what you mean by “alternative bonds.” Perhaps the ester bond? However, read the paper I just linked to. The ester bond is less stable than the peptide bond, and the peptide bond is favored thermodynamically (right?).

The current opinion seems to be that homochirality was a prerequisite for functional enzymes due to the need for chains to form stereoregular structures.

Homochirality is now believed to be not just a consequence of life, but also a prerequisite of life, because stereoregular structures such as protein beta sheets, for example, do not form with mixtures of monomers of both chiralities, as described below.
Herdewijn and Kisakurek (Eds.), Origin of Life, p. 218

For the same reason, AAs need to form solely alpha-peptide bonds. Examples of other bonds include the beta-carboxyl group of aspartic acid, the gamma-carboxyl group of glutamic acid, and the epsilon-amino group of lysine. Studies conducted several years back indicated that the non-alpha bonds formed as often as the alpha bonds. (For more detailed discussion, see Mystery of Life’s Origins, p. 157)

However, the situation is actually far worse than I described. All OOL experiments which yielded multiple amino acids also produced several other byproducts in greater total abundance than nearly all of the AAs. As a result, the possible number of bond types formed in any realistic scenario would be quite large. Therefore, my estimate of 80% for forming the alpha-peptide bond is very likely over an order of magnitude too high.

The key question is now, however, more pointed. How can we trust the mathematical arguments being offered here if they lead to a solid conclusion that something is impossible, and then a simple experiment demonstrates that it is possible? I do not think we can trust them.

One has to be careful to distinguish between what is possible in highly controlled experiments and what is plausible in realistic environments. For instance, the Ester-Mediated Amide Bond experiment you referenced started with 100 mM concentrations of amino acids and lactic acid, and the solution was then dehydrated and rehydrated under very controlled conditions. In the end, it did not produce pure peptide chains but chains with ester bonds, even after numerous dry-wet cycles. The ester bonds would have prevented proper chain folding.

In addition, the chances are quite remote for a pond on the early earth to meet all of the corresponding conditions:

  • It must have contained very high concentrations of lactic acid (or the equivalent) and amino acids in the right proportions.
  • Some heat source must have evaporated the entire pond with the right amount of energy for the right amount of time to remove the water, yet not damage building chains.
  • After each evaporation, the pond must have filled with water with additional alpha-hydroxy acids without removing the contents.
  • The AAs and chains must have been shielded from reacting with surrounding molecules.

Equally problematic, the maximum chain length achieved was 14, and that length was only for the smaller AAs. The ones with longer side chains created products which were “more complex.”


A recent article reviewed experiments attempting to generate polypeptides, and it described how the proportion of chains in all circumstances drops off exponentially with length as described by the Flory–Schulz distribution. Based on this equation, the authors estimated that the proportion of chains 40 units in length would have been 1 part in 10 trillion. They responded to this discouraging result by proposing a model for smaller chains combining into larger ones, but they acknowledged that it was founded purely on speculation. In reality, even this meager estimate results from experiments which used ideal (even miraculous) conditions. In realistic settings, the proportion would drop with length far faster. As a result, my estimate of 80% for the addition of an AA to a chain was again extremely generous.

Let me repeat my calculation with more realistic numbers. I will assume a chain 40 AAs in length, so I will use the 1 in 10 trillion figure. And I will assume a solution of molecules matching the output of the most successful OOL experiment under the most likely conditions. I will also combine the probability of the right bond forming for any given AA with another AA, as opposed to another molecule, with the probability associated with the homochirality condition. As a result, I will drop my estimate of an L-AA bonding to another L-AA with an alpha-peptide bond to 10%. My calculation for a candidate chain then becomes the following:
P = (.10)^40 * 10^-13 = 10^-53 (i.e. clearly implausible)
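Both pieces of this revised estimate can be sketched numerically. The Flory–Schulz extension probability `a = 0.475` below is a value I chose so that the length-40 fraction lands near the quoted 1 part in 10 trillion; it is not a figure from the cited review:

```python
# Number-fraction Flory-Schulz distribution: the fraction of chains that
# reach length n when each chain extends by one unit with probability a.
def flory_schulz(n: int, a: float) -> float:
    return (1.0 - a) * a ** (n - 1)

a = 0.475                      # hypothetical extension probability (my choice)
frac_40 = flory_schulz(40, a)  # on the order of 1e-13 (~1 in 10 trillion)

p_right_bond = 0.10            # revised estimate: an L-AA joins another L-AA
                               # via an alpha-peptide bond rather than anything else
p_candidate = p_right_bond ** 40 * 1e-13   # 1e-40 * 1e-13 = 1e-53
print(frac_40, p_candidate)
```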
I have not shown Jim Tour my calculations, but I will be sure to do so after more analysis.

The remarkable nature of enzyme chemistry is that the active site consists of key side chains in close proximity which work together for some function such as ligand binding or driving specific chemical reactions. They also create the right local environment, which is crucial. However, those AAs are often not close together in the unfolded AA sequence. Their coordinated activity only begins after the protein properly folds to form the active site. Even a small difference in the final folded structure can completely eliminate one of the key functions. The positioning and orientation of ligands need to be accurate to within the width of an atom, and the same holds true for the positions of the side chains which drive the reactions. Therefore, any catalytic activity observed before folding is typically not associated with the reactions driven after folding. In other words, the chemistry at the active site is not the sum of chemical activities induced along the unfolded chain which can gradually improve through mutations and then combine together after the folding is complete.

My focus is on the point where a protocell creates an internal environment which is held away from equilibrium. At that point, enzymes must be present which interconnect two reactions, so the enzyme must be highly specified. I would be interested in anyone’s knowledge of the smallest enzyme which interconnects two reactions, where one goes uphill. I am very confident that the size would be well above 40 AAs.

The challenge with the OOL is that, along the path to a minimally functional cell, numerous metabolic pathways would have to form and interconnect. And as the system moves away from equilibrium, they would have to become increasingly efficient and complex. However, the idea that selection can drive this process forward is implausible until the proteins are encoded into DNA and the corresponding decoding process originates. RNA is too unstable for the job. Until that time, improved enzymes could not result from random changes to the sequences, since proteins do not self-replicate. New proteins would have to originate outside the cell through some miraculous process and then find their way into the cell. In fact, even after the encoding-decoding process initiated, selection would still not operate until the entire cellular system developed high-fidelity self-replication.

You contend that the drive towards racemic mixtures and dilution would have acted on time scales of seconds or less. Do OOL researchers in the field agree with that assessment? I’m just curious: if they do, then the attention given to Sutherland’s OOL scenarios in science news reports makes no sense at all.

I was careful to state that many of the processes would have been very fast, not all. Fast reactions would include dilution and many of the cross-reactions. The breakup of proteins would have taken years to decades. This timeframe might seem long compared to waiting for a coffee at Starbucks, but it is minuscule compared to the time required for some new enzyme to find its way inside the membrane of a protocell which could reside hundreds of miles away. The racemization would have been much slower, but not likely slower than any process which could have pushed towards homochirality.

Sutherland’s research is not taken very seriously by many in the OOL community since the reactions require so many highly orchestrated steps. Even he has become more sober-minded of late about their relevance.

That I agree with. I imagine everyone does. But the key question is how small an infinitesimal fraction are we talking about? That requires dividing one large number by another much larger number (both of which are uncomputable), with enough confidence to make a statement absent evidence. I’m not sure that is within the bounds of science.

Nearly all OOL scientists would fully acknowledge that the proportion of molecular configurations corresponding to life, relative to non-life, is too small for nature to stumble upon life by chance. For instance, Eugene Koonin estimated that the likelihood of even the simplest ribozyme-based translation mechanism from RNA to proteins forming was 1 in 10^1000. The actual process of translating from AA sequences in a protein to RNA and then back to a protein using the same code would have been much more difficult. Researchers instead argue that natural driving tendencies helped beat the odds. However, nearly every known natural tendency would have made the odds far worse than pure chance.

Every attempt to explain the OOL falls into one of three categories. Experiments based on highly realistic conditions produce negative results. Those which use a modest degree of investigator control produce meager to very modest results. And those which produce seemingly encouraging results use high levels of investigator control – numerous highly specified steps are used to force desired outcomes. In short, the constant message is that intelligent design is an essential component of the origin of life.