The question is what a realistic number is for a new gene to emerge from a random string and become fixed in a population. When you think through the details, it may be improbable in a vertebrate population no matter what assumptions you make. You have a long waiting-time problem when coordinated mutations are involved.
Please show your work.
Human intuition can be a poor judge of probabilities. For example, how many people do you think you need before you have a 50% chance of two of those people sharing a birthday?
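For anyone who wants to check the birthday question rather than guess, here is a minimal sketch. It assumes 365 equally likely birthdays and ignores leap years and seasonal birth patterns; the function names are just illustrative.

```python
def shared_birthday_probability(n, days=365):
    """Probability that at least two of n people share a birthday,
    assuming every day is equally likely (a simplifying assumption)."""
    p_all_distinct = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


def smallest_group_for(threshold=0.5, days=365):
    """Smallest group size whose shared-birthday probability reaches threshold."""
    n = 1
    while shared_birthday_probability(n, days) < threshold:
        n += 1
    return n


print(smallest_group_for(0.5))  # prints 23
```

Most people guess far higher than that, which is the point: intuition tends to count comparisons against one person rather than all pairs.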
Can you cite one example of a “coordinated mutation” in vertebrates?
That depends on what fraction of strings are functional. As it is evidently reasonably high…
No, it doesn’t need to be fixed in the population.
May be. Isn’t.
What do you mean “strings are functional”? We are observing thousands of de novo genes that can build a functioning organism in Sal’s flower.
Sal’s flower has thousands of de novo genes in each animal type.
Are humans and mice different types?
Hi WD
Other than getting into a rhetorical discussion about coordinated mutations, it can take lots of searches to find function in a protein-coding gene. The number depends on the specific function, but it can vary from as few as 10^8 for a smaller protein to estimates well above the evolutionary resources of vertebrates.
Hayashi paper fitness landscape.pdf (396.6 KB)
In this case Hayashi estimated 10^70 searches to find a wild type protein. This is a very difficult problem for de novo gene formation in Sal’s flower. The only response to this is to claim that function is high in sequence space where function is not defined. The data does not support this type of claim as there are many proteins that are highly conserved over millions of years.
The origin of de novo genes and “lost and found” genes in Sal’s flower creates a large problem for the claim that common descent explains the pattern.
Trying to model a process that can account for de novo genes is one of the most difficult problems in evolution and may not be solvable unless a deterministic process is discovered.
We don’t need a wild-type protein. Why do so many humans have non-wild-type MYH7 alleles?
I’m going to have to request you don’t bring up that unrelated topic in this thread.
No, they estimated, by a rather naive extrapolation from an abstraction of the degree of ruggedness (not an actual map) of the fitness landscape, that it would take 10^70 different variants of their arbitrary starting sequence, with 35 substitutions, to find a sequence with a fitness comparable to the wild type using only an adaptive walk. That is why they think recombination was likely involved in discovering the wt fitness level, which they are totally right about, for all the reasons they state:
Recombination among neutral or surviving entities may suppress negative mutations and thus escape from mutation-selection-drift balance. Although the importance of recombination or DNA shuffling has been suggested [30], we did not include such mechanisms for the sake of simplicity
It should also be said here that the wt D2 domain (the one replaced by the arbitrary sequence) is related, that is, homologous, to the D1 domain. So the wild-type sequence actually evolved by duplication and divergence from an (initially) very similar protein domain, which of course strongly biases the sampling in sequence space toward the immediate surroundings of an already existing, highly fit protein domain.
They discovered a protein domain with the function in question simply by mutation and selection of an arbitrary protein, as that was the question they initially set out to address: Can an arbitrary sequence evolve towards acquiring a biological function? The answer is yes. As they state:
However, we achieved 240-fold improvement in phage infectivity through seven cycles of random mutagenesis on the replaced polypeptide and selection of the phage clone with the highest infectivity from a library of only about ten mutant phage clones in each generation. The evolvability of arbitrary chosen random sequence suggests that most positions at the bottom of the fitness landscape have routes toward higher fitness.
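To put the quoted figures in perspective, here is a rough back-of-the-envelope sketch. This is my arithmetic, not the authors’: it assumes the 240-fold gain was spread evenly across the seven cycles (a geometric mean, which the paper does not claim) and takes “about ten” clones per generation as exactly ten.

```python
def per_cycle_fold(total_fold, cycles):
    """Average fold improvement per cycle, assuming equal multiplicative
    gains in every cycle (an assumption, not a result from the paper)."""
    return total_fold ** (1.0 / cycles)


def clones_screened(clones_per_cycle, cycles):
    """Total number of mutant clones examined across all cycles."""
    return clones_per_cycle * cycles


fold = per_cycle_fold(240, 7)        # figures taken from the quoted passage
screened = clones_screened(10, 7)    # "about ten" treated as exactly 10

print(f"average improvement per cycle: about {fold:.2f}-fold")
print(f"clones screened in total: about {screened}")
```

On those assumptions, the improvement came from screening on the order of 70 clones in total, which is what the authors mean when they say the bottom of the landscape has routes toward higher fitness.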
They then did work to discover how rugged the fitness landscape of the protein domain’s function is. They did not set up an experiment to prove, and never claimed, that an adaptive walk from their starting sequence can reach the wild-type fitness by adaptively walking there by single substitutions.
You’ve got that statement by the authors wrong EVERY time you’ve brought it up. Will it be another year, or two, and then I will have to repeat all this to you again?
Nope. Those are just genes, period. Nobody said they’re “de novo”. I suspect that a high proportion arise through duplication, some from domain switching, and such. Some will have arisen from non-coding sequences, which is perhaps what you mean. But by no means all. You have no clue.
It has occurred to me that by “de novo”, Bill actually means poofing into existence from nowhere. I would suggest that 0% of genes arise in that way.
Oh, and a thread devoted to Sal’s flower should probably start with the flower itself. This is the original publication frequently referenced (or image-mined, if you prefer) by Sal Cordova:
And here is the image in question, with my additions (the trees) showing that the distribution of gene presences and absences is most parsimoniously explained by evoking gain or loss on the standard tree:
What do you mean “strings are functional”? We are observing thousands of de novo genes that can build a functioning organism in Sal’s flower.
Sal’s flower has thousands of de novo genes in each animal type.
See my response in the other thread.
Assuming that the figure posted by @John_Harshman is what you are calling “Sal’s flower”, it is not showing de novo genes, simply genes only annotated in one of the four taxa studied. De novo would imply that they had no homology to any closely related taxa. In any event…
The existence of an annotation is not demonstration of functionality for the annotated region.
What exactly do you mean by “de novo”? What the figure records is the presence or absence of particular orthologs, based on annotation of the genomes. The annotation doubtless contains some noise, but I wouldn’t expect that to confound the general picture.
More significantly, do you know the difference between absence of orthologous protein-coding genes and absence of homologous sequences? Human pseudogenes are orthologous to protein-coding genes in other species; some human genes are paralogous to genes in other species. Both would be marked as “absent” in the human lineage.
Are they de novo genes? It doesn’t say they are in the paper. It says they don’t have detected orthologs among the four depicted species. That’s absolutely not the same as saying they’re de novo genes, Bill.
That’s right, I don’t have a clue as to their mechanistic origin. The quantity of “de novo” genes in Sal’s flower is a big challenge and brings strongly into play the Genesis 1 description that these “kinds” as described in the Bible were seeded on earth as described.
The @Winston_Ewert dependency graph hypothesis could be tested to understand the critical functions of these genes relative to the distinct “kinds” they belong to:
Fish or more specific kind.
Bird or more specific kind.
Mammals or more specific kind… Perhaps primate and rodent “kinds”