Not in general, no, neither is the paper. A new gene arises when the fitness contribution from a locus passes a so-called genic threshold.
Gene birth model
We use the Wright-Fisher framework to model well-mixed populations of fixed size N, composed of asexually reproducing haploid individuals. For each individual i, we consider the fitness contribution F_i of a single locus in its genome. Here, fitness represents exponential growth rate, which is equivalent to the quantities considered in experiments that measured DFEs (e.g., Böndel et al. (2019)). We describe a locus as genic if it consistently contributes a fitness advantage above a predetermined genic threshold.
It is thus not really a matter of the number of mutations or the size of the locus, but of the total fitness effect of the mutations that occur in that locus. This is an entirely reasonable simplification, since a locus can of course be very far from, or very near to, a functional gene, and so could require the accumulation of many small-effect mutations, or just a single large-effect mutation, to transform it into a beneficial gene.
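To make the setup concrete, here is a minimal sketch of that kind of model in Python. It is my own toy version, not the paper's code. The assumptions are mine: every individual receives one mutation per time step, the DFE is a two-sided exponential (beneficial with probability f and mean size p, otherwise deleterious with mean size n), and "consistently above the threshold" is taken to mean that the population mean stays above it for `hold` consecutive steps. The function name and default values are hypothetical.

```python
import math
import random


def simulate_gene_birth(N=100, f=0.1, p=0.01, n=0.01, threshold=0.1,
                        hold=10, max_steps=2500, seed=0):
    """Toy Wright-Fisher run for the fitness contribution of one locus.

    Returns the time step at which the locus counts as genic,
    or None if that never happens within max_steps.
    """
    rng = random.Random(seed)
    F = [0.0] * N          # assumed start: the locus contributes nothing
    above = 0              # consecutive steps with mean F above threshold
    for step in range(1, max_steps + 1):
        # Selection: offspring pick parents with weight exp(F_i),
        # since fitness is treated as an exponential growth rate.
        top = max(F)
        weights = [math.exp(x - top) for x in F]
        F = rng.choices(F, weights=weights, k=N)
        # Mutation: one effect per individual, drawn from the assumed DFE.
        F = [x + (rng.expovariate(1.0 / p) if rng.random() < f
                  else -rng.expovariate(1.0 / n))
             for x in F]
        # Gene birth: mean contribution holds above the genic threshold.
        above = above + 1 if sum(F) / N > threshold else 0
        if above >= hold:
            return step
    return None


if __name__ == "__main__":
    print(simulate_gene_birth(N=100, f=0.1, p=0.01, n=0.01))
```

The point of the sketch is only that gene birth here is defined by the accumulated fitness effect crossing a threshold, not by counting mutations or by hitting a target sequence.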
Paramecium tetraurelia is a unicellular ciliate, which is a model organism for the study of many biological processes. The Paramecium genome has diploid micronuclei and highly polyploid macronuclei (exact number unknown), is approximately 87Mbp in size, and has gone through at least three successive whole genome duplications.
Can you do the math from there? Or do you need more?
Ahh I see now I misunderstood that table. The “model time-step” is not the mean time to birth of a new gene, but the mean time for occurrence of a beneficial mutation in a 100bp locus given the mutation rate and generation time. It’s not clear from the legend to the table what population size was used to get the numbers in column 3. It seems to me the mean time would depend a lot on population size.
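For what it's worth, here is the back-of-the-envelope version of that dependence. This is only a sketch under my own assumptions: mutations arrive independently at a constant per-base rate, so the expected wait scales as 1 / (mutation rate × locus length × population size). The function name and the numbers in the example are illustrative placeholders, not values from the paper's table.

```python
def mean_wait_years(mu_per_bp, locus_bp, gen_time_years,
                    pop_size=1, beneficial_fraction=1.0):
    """Expected waiting time (in years) for a mutation in the locus.

    mu_per_bp: mutation rate per base pair per generation (assumed).
    pop_size: number of individuals in which the mutation could occur.
    beneficial_fraction: fraction of mutations counted as beneficial.
    """
    rate_per_generation = (mu_per_bp * locus_bp * pop_size
                           * beneficial_fraction)
    return gen_time_years / rate_per_generation


if __name__ == "__main__":
    # Illustrative inputs only: 1e-9 per bp per generation, 1-year generations.
    for pop in (1, 100, 1000):
        print(pop, mean_wait_years(1e-9, 100, 1.0, pop_size=pop))
```

Under these assumptions the wait drops in direct proportion to population size, which is why it matters what N was used for that column.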
Scatter plots comparing gene birth time in systems with static vs. fluctuating DFE.
For each point, the parameters for the static DFE and the initial parameters for fluctuating DFE are the same. The x-axis represents average time of gene birth across 100 replicate systems with static DFE, and the y-axis represents the average time of gene birth across 10 replicate systems when the DFE fluctuates, for (A) N = 100, (B) N = 1,000, and (C) N = ∞. The grey points that accumulate at x = 2,500 represent parameter values at which gene birth did not occur in static DFE systems. See also Fig 4–Figure Supplement 1, 2, 3.
As is logical, the time to gene birth is heavily dependent on population size. This is where the fact that the model considers only a single 100bp locus becomes critical for understanding what the rate of gene birth would be in a realistic population, whose genomes carry hundreds of thousands to millions of times more non-coding DNA than a single 100bp locus.
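A rough way to see why the single-locus framing matters: if each locus independently has some small probability q of becoming genic within a given time window, the chance that at least one of K loci does so is 1 - (1 - q)^K. The sketch below assumes independence between loci, which is my simplification and not something the paper claims, and the value of q is a made-up placeholder. The 870,000 figure is just 87 Mbp divided into 100bp windows, an upper bound that ignores how much of the genome is already coding.

```python
import math


def prob_any_locus(q, num_loci):
    """Probability that at least one of num_loci independent loci
    becomes genic, given per-locus probability q (assumed equal)."""
    if q >= 1.0:
        return 1.0
    # 1 - (1 - q)**K, computed stably for very small q.
    return -math.expm1(num_loci * math.log1p(-q))


if __name__ == "__main__":
    q = 1e-6                         # hypothetical per-locus probability
    loci = 87_000_000 // 100         # 100bp windows in an 87 Mbp genome
    print(loci, prob_any_locus(q, loci))
```

Even a per-locus probability that looks hopeless in isolation can add up to a substantial genome-wide probability once the number of candidate loci is taken into account.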
You need to understand that the model works with the fitness contribution of a single 100bp locus, which is taken as a proxy for function. Once the fitness gain passes the genic threshold, that is taken to be the birth of a new gene. The probability of function is necessarily subsumed by the fitness effect.
It would make no sense to include any of the numbers from Art’s Panda’s Thumb article, as these are estimates for the probability of a particular fold with a particular function, rather than the probability of any fitness-contributing functional gene. The latter is what de novo gene evolution from non-coding DNA is all about: not trying to hit particular sequence-structure-function targets.
Across all population sizes, gene birth is more likely when the frequency f and average size p of beneficial mutations are higher, the size of deleterious mutations n is lower, and the DFE is more long-tailed (Fig 3–Figure Supplement 3).
If you are a far distance from a functional sequence the mutations are very unlikely to be beneficial. A search is required to find beneficial function and now wait times become too long.
Sure, if. But why should we assume we are far from a functional sequence? As this shows, we are actually much closer to folding protein sequences in non-coding DNA than you would naively suspect:
As we also know from studies such as the ones I cite in this post, simply obtaining a folding protein sequence means it is likely to have a useful biological function.
I’m sorry to have to tell you, Bill, but this premise you can’t seem to let go of, that function is rare in sequence space, is just wrong. There is no support for it. Once you remove the strange and unrealistic requirement for a specific target sequence or target structure with a specific target function, and instead allow for any fitness-improving function, functions are ubiquitous and unavoidable.
This particular example he conducted is falsified, that doesn’t mean he (or others) could not go on testing for Dependency Graph patterns in other data. My understanding is there is also a methodological problem with this interpretation on phylogenetic data, which means it might not be a very good test, but that is not my concern. I maintain that Ewert is testing a valid scientific hypothesis.
Oh no you don’t! There are threads about this on multiple sites, here, TSZ, and Panda’s Thumb that I am aware of. I’m confident you were part of some of those discussions, and are well aware of the criticisms. Please don’t be disingenuous.
So… the model clearly shows how, given realistic parameters, de novo genes with useful functionality arise from non-coding stretches of DNA.
Your reply is to repeat the same assertion you were throwing around before we discussed the paper. It seems like we’ve made exactly zero progress, which is of course frustrating to all parties concerned.
It seems you have still not understood the model. Do you have a background in stochastic mathematics, Bill? I’m aware of the fact that this math is well beyond basic algebra and geometry, but it shouldn’t be beyond the capabilities of someone who has taken a few semesters of math at the university level.
Of course, not everyone has that sort of background, and maybe you are one of them? It’s not a badge of shame to not have that math background, of course. It just means that we would need to adjust expectations on both sides of the conversation to make sure we can interact productively.
The observed pattern is expected under the evolution model (something to do with deletions???). If you want to say that design looks exactly like evolution, then OK, it’s not falsified - but you would have conceded that evolution IS the designer. It may also be a useless test, again not falsified, but still useless.
I have stated before, and will repeat now: Ewert should refine his methods, test more data, and validate the results. Even if his test were clearly positive for ID, no one would accept it without repeating the experiment.