Sal's Flower?

The question is: what is a realistic estimate for a new gene to emerge from a random string and become fixed in a population? When you think through the details, it may be improbable in a vertebrate population no matter what assumptions you make. You have a long waiting-time problem when coordinated mutations are involved.

Please show your work.


Human intuition can be a poor judge of probabilities. For example, how many people do you think you need before you have a 50% chance of two of those people sharing a birthday?
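For reference, the standard answer is 23. Below is a quick check in Python, assuming 365 equally likely birthdays and ignoring leap years. `math.prod` requires Python 3.8 or later.

```python
from math import prod

def p_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday,
    assuming `days` equally likely birthdays (no leap years)."""
    if n > days:
        return 1.0  # pigeonhole: a shared birthday is certain
    # P(all distinct) = (365/365) * (364/365) * ... * ((365 - n + 1)/365)
    p_all_distinct = prod((days - k) / days for k in range(n))
    return 1.0 - p_all_distinct

def smallest_group_for(target=0.5, days=365):
    """Smallest group size whose shared-birthday probability reaches target."""
    n = 1
    while p_shared_birthday(n, days) < target:
        n += 1
    return n

print(smallest_group_for())             # 23
print(round(p_shared_birthday(23), 3))  # 0.507
```

The probability climbs quickly because the number of pairs grows roughly as n²/2. With 23 people there are already 253 pairs.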

Can you cite one example of a “coordinated mutation” in vertebrates?


That depends on what fraction of strings are functional. As it is evidently reasonably high…

No, it doesn’t need to be fixed in the population.

May be. Isn’t.


What do you mean by “strings are functional”? We are observing thousands of de novo genes that can build functioning organisms in Sal’s flower.

Sal’s flower has thousands of de novo genes in each animal type.

Are humans and mice different types?


Other than getting into a rhetorical discussion about coordinated mutations, it can take many searches to find function in a protein-coding gene. The number depends on the specific function, but it can range from as few as 10^8 for a smaller protein to estimates well above the evolutionary resources of vertebrates.
Attachment: Hayashi paper fitness landscape.pdf

In this case Hayashi estimated 10^70 searches to find a wild-type protein. This is a very difficult problem for de novo gene formation in Sal’s flower. The only response to this is to claim that functional sequences are common in sequence space, without defining what counts as function. The data do not support this type of claim, as there are many proteins that are highly conserved over millions of years.

The origin of de novo genes and “lost and found” genes in Sal’s flower creates a large problem for the claim that common descent explains the pattern.

Trying to model a process that can account for de novo genes is one of the most difficult problems in evolution and may not be solvable unless a deterministic process is discovered.

We don’t need a wild-type protein. Why do so many humans have non-wild-type MYH7 alleles?


I’m going to have to request you don’t bring up that unrelated topic in this thread.


No, they estimated, by a rather naive extrapolation from an abstraction of the degree of ruggedness of the fitness landscape (not an actual map of it), that it would take 10^70 different variants of their arbitrary starting sequence, with 35 substitutions, to find a sequence with fitness comparable to the wild type using only an adaptive walk. That is why they think recombination was likely involved in reaching wild-type fitness, and they are right about that, for the reasons they state:

Recombination among neutral or surviving entities may suppress negative mutations and thus escape from mutation-selection-drift balance. Although the importance of recombination or DNA shuffling has been suggested [30], we did not include such mechanisms for the sake of simplicity

It should also be said that the wt D2 domain (the one replaced by the arbitrary sequence) is related, that is homologous, to the D1 domain. So the wild-type sequence actually evolved by duplication and divergence from an (initially) very similar protein domain, which of course strongly biases the sampling of sequence space toward the immediate surroundings of an already existing, highly fit protein domain.

They discovered a protein domain with the function in question simply by mutation and selection of an arbitrary protein, as that was the question they initially set out to address: Can an arbitrary sequence evolve towards acquiring a biological function? The answer is yes. As they state:

However, we achieved 240-fold improvement in phage infectivity through seven cycles of random mutagenesis on the replaced polypeptide and selection of the phage clone with the highest infectivity from a library of only about ten mutant phage clones in each generation. The evolvability of arbitrary chosen random sequence suggests that most positions at the bottom of the fitness landscape have routes toward higher fitness.
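For scale, the figures in that quote can be turned into a per-round rate. This is a back-of-the-envelope sketch that uses only the numbers quoted. It assumes a constant multiplicative gain per round and treats "about ten" clones per round as exactly 10.

```python
total_fold = 240.0   # overall improvement in phage infectivity (from the quote)
cycles = 7           # rounds of random mutagenesis plus selection (from the quote)
library_size = 10    # clones per round; "about ten" taken as exactly 10 (assumption)

# Average (geometric-mean) fold improvement per round: 240 ** (1/7)
per_cycle = total_fold ** (1 / cycles)
clones_screened = cycles * library_size

print(f"{per_cycle:.2f}-fold per round on average")
print(f"about {clones_screened} clones screened in total")
```

That works out to roughly a 2.2-fold average gain per round from on the order of 70 screened clones. The actual gain in each round need not have been equal.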

They then did work to discover how rugged the fitness landscape for the protein domain’s function is. They did not set up an experiment to prove, and never claimed, that an adaptive walk of single substitutions from their starting sequence can reach wild-type fitness.
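Here is a toy illustration of why an adaptive walk of single substitutions can stall on a rugged landscape. It uses Kauffman's NK model on binary strings as a stand-in. That is an assumption for illustration only: it is not Hayashi et al.'s experimental landscape, and N, K, and the random seeds are arbitrary. Each walk takes the best improving single substitution until none exists. In other words, it stops at a local optimum, which on a rugged landscape can sit below the global maximum.

```python
import itertools
import random

def make_nk_landscape(n, k, seed=0):
    """Toy NK fitness landscape on binary strings of length n. Site i's
    contribution depends on itself and the next k sites (wrapping around);
    larger k gives a more rugged landscape."""
    rng = random.Random(seed)
    # One random lookup table per site, indexed by the (k+1)-bit neighbourhood.
    tables = [[rng.random() for _ in range(2 ** (k + 1))] for _ in range(n)]

    def fitness(seq):
        total = 0.0
        for i in range(n):
            idx = 0
            for j in range(k + 1):
                idx = (idx << 1) | seq[(i + j) % n]
            total += tables[i][idx]
        return total / n

    return fitness

def adaptive_walk(start, fitness):
    """Steepest-ascent adaptive walk: take the best single substitution
    (bit flip) that raises fitness; stop when none does (a local optimum)."""
    current = list(start)
    steps = 0
    while True:
        best, best_fit = None, fitness(current)
        for i in range(len(current)):
            neighbour = current.copy()
            neighbour[i] ^= 1
            f = fitness(neighbour)
            if f > best_fit:
                best, best_fit = neighbour, f
        if best is None:
            return current, steps
        current, steps = best, steps + 1

def global_optimum(n, fitness):
    """Exhaustive search; feasible only for tiny n (2**n sequences)."""
    return max(fitness(list(s)) for s in itertools.product((0, 1), repeat=n))

N, K = 12, 4
fit = make_nk_landscape(N, K, seed=1)
rng = random.Random(2)
peak = global_optimum(N, fit)
for trial in range(5):
    start = [rng.randint(0, 1) for _ in range(N)]
    end, steps = adaptive_walk(start, fit)
    print(f"walk {trial}: {steps} steps, fitness {fit(end):.3f} (global max {peak:.3f})")
```

Different starting points typically end on different peaks. Mechanisms that are not single-substitution steps, such as the recombination the authors mention, are one way off a local peak. This toy does not model that.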

You’ve got that statement by the authors wrong EVERY time you’ve brought it up. Will it be another year, or two, and then I will have to repeat all this to you again?


Nope. Those are just genes, period. Nobody said they’re “de novo”. I suspect that a high proportion arise through duplication, some from domain switching, and such. Some will have arisen from non-coding sequences, which is perhaps what you mean. But by no means all. You have no clue.


It has occurred to me that by “de novo”, Bill actually means poofing into existence from nowhere. I would suggest that 0% of genes arise in that way.


Oh, and a thread devoted to Sal’s flower should probably start with the flower itself. This is the original publication frequently referenced (or image-mined, if you prefer) by Sal Cordova:

And here is the image in question, with my additions (the trees) showing that the distribution of gene presences and absences is most parsimoniously explained by invoking gain or loss on the standard tree:
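The parsimony argument can be made concrete with Fitch's small-parsimony algorithm. It counts the minimum number of changes (gains plus losses) that a presence/absence pattern needs on a given tree. This is a minimal sketch. The four taxon labels and the tree shape (((primate, rodent), bird), fish) are assumptions standing in for the figure's taxa, and the presence/absence patterns are hypothetical (1 = present, 0 = absent).

```python
from typing import Dict, Set, Tuple, Union

Tree = Union[str, tuple]  # a leaf name, or a (left, right) pair of subtrees

def fitch(tree: Tree, states: Dict[str, int]) -> Tuple[Set[int], int]:
    """Fitch small-parsimony pass on a rooted binary tree. Returns the
    candidate state set at this node and the minimum number of state
    changes (gains + losses) required within the subtree."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # Children disagree: one change is needed on one of the two branches.
    return left_set | right_set, left_cost + right_cost + 1

STANDARD_TREE = ((("primate", "rodent"), "bird"), "fish")

patterns = {
    "primate+rodent only": {"primate": 1, "rodent": 1, "bird": 0, "fish": 0},
    "absent in bird only": {"primate": 1, "rodent": 1, "bird": 0, "fish": 1},
    "primate+fish only":   {"primate": 1, "rodent": 0, "bird": 0, "fish": 1},
}
for name, states in patterns.items():
    _, changes = fitch(STANDARD_TREE, states)
    print(f"{name}: {changes} change(s)")
```

On this tree the first two patterns need a single gain or loss each, and the third needs two changes. Patterns that fit the tree cheaply are the ones the standard tree explains parsimoniously.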


What do you mean by “strings are functional”? We are observing thousands of de novo genes that can build functioning organisms in Sal’s flower.

Sal’s flower has thousands of de novo genes in each animal type.

See my response in the other thread.

Assuming that the figure posted by @John_Harshman is what you are calling ‘Sal’s flower’, it is not showing de novo genes, simply genes annotated in only one of the four taxa studied. De novo would imply that they had no homology to any closely related taxa. In any event…

The existence of an annotation is not demonstration of functionality for the annotated region.


What exactly do you mean by “de novo”? What the figure records is the presence or absence of particular orthologs, based on annotation of the genomes. The annotation doubtless contains some noise, but I wouldn’t expect that to confound the general picture.

More significantly, do you know the difference between absence of orthologous protein-coding genes and absence of homologous sequences? Human pseudogenes are orthologous to protein-coding genes in other species; some human genes are paralogous to genes in other species. Both would be marked as “absent” in the human lineage.
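Here is a small illustration of that point. A table built from protein-coding orthologs scores a species "absent" in several situations where homologous sequence is still present. This is a minimal sketch. The `Homolog` record, the scoring rule, the gene, and the species list are all hypothetical; it is not how the paper's annotation pipeline works.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Homolog:
    species: str
    relation: str          # "ortholog" or "paralog", relative to the reference gene
    protein_coding: bool   # False for a pseudogene

def ortholog_presence(homologs: List[Homolog], species: List[str]) -> Dict[str, str]:
    """Score one gene the way an ortholog presence/absence table does:
    'present' only for a protein-coding ortholog. Pseudogenized orthologs
    and paralog-only hits are still scored 'absent', although homologous
    sequence is there."""
    row = {}
    for sp in species:
        hits = [h for h in homologs if h.species == sp]
        if any(h.relation == "ortholog" and h.protein_coding for h in hits):
            row[sp] = "present"
        elif hits:
            row[sp] = "absent (homologous sequence exists)"
        else:
            row[sp] = "absent (no homolog detected)"
    return row

# Hypothetical gene: coding ortholog in mouse, pseudogene in human,
# only a paralog in chicken, nothing detected in zebrafish.
hypothetical_gene = [
    Homolog("mouse", "ortholog", True),
    Homolog("human", "ortholog", False),
    Homolog("chicken", "paralog", True),
]
row = ortholog_presence(hypothetical_gene, ["human", "mouse", "chicken", "zebrafish"])
for sp, call in row.items():
    print(f"{sp:10s} {call}")
```

A figure that shows only "present" or "absent" collapses the last three categories into one. That collapse is why an absence in such a figure says nothing on its own about de novo origin.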


Are they de novo genes? The paper doesn’t say they are. It says they don’t have detected orthologs among the four depicted species. That’s absolutely not the same as saying they’re de novo genes, Bill.


That’s right, I don’t have a clue as to their mechanistic origin. The quantity of “de novo” genes in Sal’s flower is a big challenge, and it brings strongly into play the Genesis 1 description of these “kinds” being seeded on earth as described in the Bible.

The @Winston_Ewert dependency graph hypothesis could be tested to understand the critical functions of these genes relative to the distinct “kinds” they belong to:

Fish or more specific kind.
Bird or more specific kind.
Mammals or more specific kind… perhaps primate and rodent “kinds”
