Mendel's Accountant

Written in “Go”? Seriously?
Hell, I’d code it in VBA just to piss off IT departments… But to code it today for public review?

2 Likes

For what it’s worth, I’m not sure that the Go version of Mendel’s Accountant is the same as the Fortran version. It looks like it started out as a port, but it has had new features and further development work done on it since then, with the latest release on 5 May 2019. It doesn’t seem to be linked from either the Mendel’s Accountant website or the SourceForge download page, whose latest version is dated 2016. The Go version isn’t developed by Sanford et al. but by two other developers whose relationship to the original developers I haven’t been able to ascertain.

5 Likes

ChangeLog::

revision 1.24
date: 2006-05-02 07:29:49 +0900; author: johnrb; state: Exp; lines: +35 -21;
Fixed line that prevented cases with one linkage block from running.
Added new input parameter 'tracking_threshold', mainly to allow user
specification of the width of the fitness bins in the output plot of
the fitness distribution.

init.f90 ::

if(track_neutrals) tracking_threshold = 0.
! If back mutations are allowed, set the tracking threshold to zero
! so that all mutations are tracked.

if(genome_size == 0) genome_size = 1.e8
! When tracking_threshold is input as zero, this means that all
! mutations are to be tracked. In this case, set tracking_threshold
! to be equal to the minimum fitness value to prevent various
! numerical overflow problems that would arise otherwise.
tracking_threshold = max(1./genome_size,tracking_threshold)

This code is an unbelievable mess.
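
For comparison, here’s my reading of what that fragment appears to be trying to do, as a short Python sketch with each comment attached to the statement it actually describes (this is my interpretation of the code, not anything from the authors’ documentation):

# My reading of the init.f90 fragment above (an interpretation, not
# the authors' documentation). Names mirror the Fortran variables.
def init_tracking(track_neutrals, genome_size, tracking_threshold):
    # Track everything when neutrals are tracked: a threshold of
    # zero means no mutation is filtered out of the tracking arrays.
    if track_neutrals:
        tracking_threshold = 0.0

    # Guard against an unset genome size; 1.e8 is the hard-coded
    # fallback in the Fortran.
    if genome_size == 0:
        genome_size = 1.0e8

    # Clamp the threshold to the smallest meaningful fitness effect
    # (1/genome_size), apparently to avoid numerical problems when
    # effects arbitrarily close to zero would otherwise be tracked.
    tracking_threshold = max(1.0 / genome_size, tracking_threshold)
    return genome_size, tracking_threshold

Note that the final clamp means tracking_threshold can never actually be zero, which may be why the “set the tracking threshold to zero so that all mutations are tracked” comment sits so oddly next to the line it annotates.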

3 Likes

Yeah. I was trying to understand what was going on–but it’s spaghetti code, either by design or simply through poor version control management. This is the first time I’ve looked at the code for MA–I didn’t even know it was open source. Some of the default parameters look quite strange to me from a biological perspective:

#inputs.f90
max_fav_fitness_gain = 0.01
fitness_dependent_fertility = .false.
max_fav_mutn_per_indiv = 10000

#mendel.in
&computation
tracking_threshold = 0.0
extinction_threshold = 0.0
max_del_mutn_per_indiv = 10000
max_neu_mutn_per_indiv = 100
max_fav_mutn_per_indiv = 100
random_number_seed = 42
reseed_rng = F
write_dump = F
write_vcf = F
restart_case = F
restart_dump_number = 0
plot_allele_gens = 25
global_allele_analysis = T
verbosity = 1
data_file_path = './'
/

It looks like there are just flat limits on the number of beneficial and neutral mutations that are even possible? It’s hard to tell from the variable naming, but I’m very curious about this:

fitness_dependent_fertility = .false.

It seems this block of code will run if mutational tracking is enabled and the number of offspring with favorable mutations exceeds half the hard limit–which looks like it defaults to either 10000 or 100?

Basically, if any two offspring in the simulation could mate to produce an F1 with greater than the parameter max_fav_mutn_per_indiv, the program stops. lol, seems legit?

[Edit]: Just to add, this same if-then evaluation also exists for deleterious mutations, but the value max_del_mutn_per_indiv does seem to be set much higher.

mating.f90

! Track this mutation if its fitness effect exceeds the value
! of tracking_threshold or if track_neutrals is true.

if(fitness_effect>tracking_threshold .or. track_neutrals) then

 if(mutn_type == fav) then
    
    ! Test to see if the storage limit of array fmutn_offsprng 
    ! has been exceeded.  (Note that we are using the first  
    ! slot to hold the actual mutation count.)
    
    num_mutn = fmutn_offsprng(1,hap_id) + 1 
    
    if(num_mutn + 1 > max_fav_mutn_per_indiv/2) then
       write(6,*) 'Favorable mutation count exceeds limit'
       write(9,*) 'Favorable mutation count exceeds limit'
       stop
    end if
1 Like

I think it would be really interesting to replicate some of their papers in SLiM, to see if you get the same results. And then to track down the specific places that cause the difference.

It is to Sanford’s credit that they make the code open source.
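
As a rough sketch of what such a replication would test, here’s a toy Wright-Fisher forward simulation in Python (purely illustrative; a real replication would use SLiM’s own scripting language, and all the parameter values below are placeholders, not numbers from their papers):

import random

# Toy Wright-Fisher forward simulation (illustrative only).
POP_SIZE    = 1000
GENERATIONS = 2000
MUT_RATE    = 0.1     # chance of one new mutation per offspring
FRAC_FAV    = 0.001   # fraction of mutations that are beneficial
S_DEL       = -0.001  # effect of a deleterious mutation
S_FAV       = 0.01    # effect of a beneficial mutation

pop = [1.0] * POP_SIZE  # each individual is just its fitness

for gen in range(GENERATIONS):
    # Fitness-proportional choice of parents (clonal reproduction).
    offspring = random.choices(pop, weights=pop, k=POP_SIZE)
    # Apply at most one new mutation per offspring (a crude stand-in
    # for a Poisson number of mutations at this low rate).
    for i in range(POP_SIZE):
        if random.random() < MUT_RATE:
            s = S_FAV if random.random() < FRAC_FAV else S_DEL
            offspring[i] *= 1.0 + s
    pop = offspring
    if gen % 500 == 0:
        print(f"gen {gen:5d}: mean fitness = {sum(pop) / POP_SIZE:.4f}")

The point of a real SLiM replication would be to take MA’s published parameter sets and see whether an independent simulator reproduces the fitness decline, and if not, which modelling choice makes the difference.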

1 Like

What does stopping do though? Is that a failure or is it a success?

It’s just a value that is set in the program. I’m not sure it has any simulation meaning–other than limiting the number of generations/iterations performed? I see it defined with two different values in two different locations. It may be user-defined. It would be good to ask others who have played with the program before: @CrisprCAS9 @Sweary_Biochemist @Rumraket

#inputs.f90
max_fav_mutn_per_indiv = 10000

#mendel.in
&computation
tracking_threshold = 0.0
extinction_threshold = 0.0
max_del_mutn_per_indiv = 10000
max_neu_mutn_per_indiv = 100
max_fav_mutn_per_indiv = 100

I have no idea.

I’m not sure “failure or success” can really be applied to the sim, beyond its desperation to prove mutational degradation.
Normally the sim runs for a predefined number of generations (10000, I think), on the understanding that trends in fitness should be discernible by that point, and that tracking the spread of mutations becomes increasingly memory-taxing as generations pass (I cannot confirm the validity of this claim). But in the case of “favourable mutations exceeds limit” it literally just halts. It can do this as early as 170 generations in my hands, and that was, I think, with a 50:50 ratio of favourable:deleterious mutations (which had resulted in a mild decrease in fitness by the time of sim termination).

In its current iteration, it seems literally hard-coded to prevent fitness increases.

3 Likes

I suspect that’s due to the maximum limit on fitness gain per mutation and the maximum limit on total fitness gain. Both parameters can be tweaked somewhat. However, playing with MA yesterday–the weird version that displays everything in Chrome–it did seem to hard-limit the maximum total fitness parameter. Or I just couldn’t figure out how to adjust it above 0.01.

Maybe there is a reason for that; it’s just not clear to me why the limit for favorable fitness effects is set much lower than the limit for deleterious effects if this is a computational resource concern.
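
To make the asymmetry concrete, here’s a toy sketch of what capping beneficial effects at 0.01 while leaving deleterious effects free to reach 1.0 does to sampled effect sizes (my own illustration; MA reportedly draws effects from a Weibull-type distribution, and its exact parameterization may well differ):

import random

MAX_FAV_GAIN = 0.01  # cap on any single beneficial effect (MA default)
MAX_DEL_LOSS = 1.0   # deleterious effects can go all the way to lethal

def sample_effect(beneficial, shape=0.5):
    # The same small-effect-skewed draw on [0, 1] for both classes...
    magnitude = random.random() ** (1.0 / shape)
    # ...but beneficials are squeezed into [0, 0.01] while
    # deleterious effects keep the full [0, 1] range.
    return magnitude * (MAX_FAV_GAIN if beneficial else MAX_DEL_LOSS)

fav = [sample_effect(True) for _ in range(100_000)]
dele = [sample_effect(False) for _ in range(100_000)]
print(f"largest beneficial effect sampled:  {max(fav):.4f}")   # ~0.01
print(f"largest deleterious effect sampled: {max(dele):.4f}")  # ~1.0

Under a scheme like this, even the best possible beneficial mutation is two orders of magnitude weaker than the worst deleterious one, before selection ever gets to act.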

2 Likes

The answer is MA was deliberately written to support the YEC “Genetic Entropy” claims. It does not model actual evolution. It has been rigged from the start to produce only YEC-favorable results, which is why the constant use of MA as support for GE is so disingenuous.

6 Likes

“All models are wrong, but some are useful” - (credited to) George Box

Of course, ‘useful’ is defined in terms of one’s purposes:

For modeling life on Earth, the ‘Mendel’s Accountant’ software is, if anything, ‘anti-useful’.

For supporting a fringe theory lacking support from actual science, it seems to be useful enough.

4 Likes

Sure–even if we didn’t ascribe intentional deception to the programmers (maybe they are just exceptionally bad with numbers/biology), MA doesn’t appear capable of reproducing wetlab experimental results. It is effectively useless as a simulation tool in that way.

I think the goal with looking at the code is to explicitly identify where the program is designed to fail.

5 Likes

So, I started fiddling with SLiM and it’s a really great program. It should give a straightforward way to test for GE with known parameters.

Of note, MA can only handle very small populations. The larger the population, the more effective selection is. So the fact that SLiM can work on much larger populations is important.
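
The usual rule of thumb here is the “drift barrier”: in a diploid population of size N, a mutation with |s| below roughly 1/(2N) is effectively invisible to selection, so the selectable range widens as the population grows. A minimal sketch with example numbers:

# Drift-barrier rule of thumb: selection is effectively blind to
# |s| < ~1/(2N) in a diploid population. Example numbers only.
for n in (100, 1_000, 10_000, 100_000):
    barrier = 1.0 / (2 * n)
    print(f"N = {n:>7}: effectively neutral below |s| ~ {barrier:.1e}")

# A deleterious effect of s = -0.001 is invisible at N = 100
# (barrier 5e-3) but well inside the selectable range at N = 10 000
# (barrier 5e-5).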

8 Likes

Definitely makes it important to use similar values then.

I want to try to simulate the LTEE using the hypermutator mutation rate (0.1 per individual per generation), which in the LTEE shows a fitness increase:

Sanford uses three different mutation rates in his MA run, as we can see in the figure below, where he first lets it run with the normal E. coli mutation rate before artificially changing it to “simulate” the evolution of higher mutation rates in the LTEE.

So I’m going to run MA version 2.5 to reproduce the LTEE (where the hypermutators show continuous fitness increases too). I just plugged in the numbers from the legend to this figure:

Which is the only place in that .pdf where Sanford defines any of the values he uses to simulate the LTEE.

I tried this earlier with the real mutation rate of E. coli. If you set MA to the real mutation rate of E. coli (~10⁻³ per individual per generation), effectively nothing happens, because this mutation rate is so low it takes approximately 10 000 generations to even accumulate 5 mutations in MA. As we can also see in his figure, with such a low mutation rate the number of mutations barely changes.

Now notice what Sanford claims in the legend to that figure. He says:
“Given the LTEE time frame, by now this particular population should have accumulated roughly 2000 slightly deleterious mutations, even while fitness scores have been increasing due to a handful of beneficial reductive mutations”

Doesn’t that leave the impression that this is what happens in his MA simulation? That fitness increases there too? That’s how it reads to me.

Okay, so these are the settings from the legend to the figure that I am using:
Generations = 60 000
Population = 10 000
Overall mutation rate = 0.1 (if this is set to 0.001, almost nothing changes, as too few mutations accumulate to show any noticeable fitness difference even over 60 000 generations, though fitness does decrease a tiny bit)
Deleterious mutation rate = 0.001 (the lowest deleterious mutation rate of those depicted, therefore most generous to Sanford)
Genome size= 4.5 Mbp
Major mutation (the fraction of deleterious mutations defined as those with major effect) = 0.1
Clonal reproduction.

Everything not explicitly defined by Sanford in the figure I have left at default values:

&basic
case_id = 'k0ea39',
mutn_rate = 0.1,
frac_fav_mutn = 0.001,
reproductive_rate = 2.0,
pop_size = 10000,
num_generations = 8000,
/

&mutations
fitness_distrib_type = 1,
fraction_neutral = 0,
genome_size = 4.5E+06,
high_impact_mutn_fraction = 0.1,
high_impact_mutn_threshold = 0.1,
num_initial_fav_mutn = 0,
max_fav_fitness_gain = 0.01,
uniform_fitness_effect_del = 0.001,
uniform_fitness_effect_fav = 0.0001,
fraction_recessive = 0.0,
recessive_hetero_expression = 0.5,
dominant_hetero_expression = 0.5,
multiplicative_weighting = 0.0,
synergistic_epistasis = F,
se_nonlinked_scaling = 0.1,
se_linked_scaling = 0.0,
upload_mutations = F,
allow_back_mutn = F,
polygenic_beneficials = F,
polygenic_init = 'AAAAAA',
polygenic_target = 'TCGTCG',
polygenic_effect = 0.001,
/

&selection
fraction_random_death = 0.0,
heritability = 0.2,
non_scaling_noise = 0.05,
fitness_dependent_fertility = F,
selection_scheme = 2,
partial_truncation_value = 0.5,
/

&population
recombination_model = 1,
fraction_self_fertilization = 0.0,
num_contrasting_alleles = 0,
max_total_fitness_increase = 0,
dynamic_linkage = T,
haploid_chromosome_number = 23,
num_linkage_subunits = 1000,
pop_growth_model = 0,
pop_growth_rate = 1.0,
bottleneck_yes = F,
bottleneck_generation = 0,
bottleneck_pop_size = 0,
num_bottleneck_generations = 0,
/

&substructure
is_parallel = F,
homogenous_tribes = T,
num_indiv_exchanged = 1,
migration_generations = 0,
migration_model = 1,
tribal_competition = F,
tribal_fission = F,
tc_scaling_factor = 0.0,
group_heritability = 0.2,
social_bonus_factor = 1.0,
/

&computation
tracking_threshold = 0.00001,
extinction_threshold = 0.0,
max_del_mutn_per_indiv = 1598.4,
max_neu_mutn_per_indiv = 1000,
max_fav_mutn_per_indiv = 1000,
random_number_seed = 42,
reseed_rng = F,
track_neutrals = F,
write_dump = F,
restart_case = F,
restart_dump_number = 0,
plot_allele_gens = 100,
verbosity = 1,
data_file_path = './',
/

&interface
num_tribes = 2,
restart_case_id = 'test00',
restart_append = F,
auto_malloc = T,
/

I’ll update this post with the plot of fitness when it’s done running.

Okay, so it seems like the simulation is running slower and slower. It has reached about 17500 generations by now, and it seems like the latest 3000 generations took as much time as the first 14500. Here’s a plot of fitness so far:


I’m sure that any second now fitness will suddenly turn around and start looking like it does in the LTEE.

Apparently 858 deleterious mutations and 1 beneficial mutation have fixed so far.

del mutn/indiv =  1036  tracked del/ind =  1036  fav mutn/indiv   =    1.0265

              Fraction of mutations retained versus fitness effect
 effect:  5.E-01  1.E-01  2.E-02  5.E-03  1.E-03  2.E-04  5.E-05  1.E-05  2.E-06  5.E-07
 domint: 0.0002 0.0011 0.0066 0.1299 0.6411 0.8769 0.9795 1.0676 1.0069 1.0998
deleterious selection threshold   = 1.511E-03
deleterious fraction unselectable = 0.647

            Allele summary statistics (tracked mutations only):
    (Statistics are based on     10364539 tracked deleterious mutations
                                    10265 tracked   favorable mutations
                         and            0 tracked     neutral mutations.)
     Very rare   Polymorphic     Fixed      Total
       (0-1%)      (1-99%)      (100%)
        11266        1637         858       13761 deleterious
           10           1           1          12 favorable
            0           0           0           0 neutral

generation =     17000  population size = 10000  frac recessive = 0.0000
before sel: geno fitness =  0.86331  geno s.d. = 0.03712  pheno s.d. = 0.09609
after  sel:                 0.86448              0.03462               0.09475
before sel geno-pheno corr = 0.3914    after sel geno-pheno corr  =    0.3729
del mutn/indiv =  1115  tracked del/ind =  1115  fav mutn/indiv   =    1.0695

It doesn’t look to me like those supposedly “rare beneficials of large effect” are “masking” the fitness decline in Sanford’s MA the way they supposedly do in the LTEE.

11 Likes

I am still very curious why we need a variable like max_fav_fitness_gain, and why the computation variables are set separately for each type of mutation…

I don’t know how to use SLiM (I only have Windows). Is anyone trying to replicate Sanford’s MA LTEE “simulation” with similar values?

2 Likes

Yes, I looked into it a bit.

Seems like if you have 1000:1 deleterious to beneficial mutations below the cutoff, you do see declines in fitness, and this is not rectified by selection.

However, the cutoff is movable, because it is inversely related to population size. By increasing the population size, mutations that were not selectable become selectable. If their fitness effect is, for example, -0.001, a population size of 2000 puts those deleterious mutations in the selectable range. So, with a larger population size, the same set of mutation fitness effects and rates does not result in fitness declines. That’s true even if the deleterious-to-beneficial ratio is an implausible 1000:1.
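
To put numbers on that, here’s a small sketch using Kimura’s standard diffusion approximation for the fixation probability of a single new mutant in a diploid population of size N (the formula is textbook population genetics; the N and s values are just examples):

import math

def p_fix(n, s):
    # Kimura's approximation for a new mutant starting at 1/(2N).
    if s == 0:
        return 1.0 / (2 * n)  # neutral limit
    return (1 - math.exp(-2 * s)) / (1 - math.exp(-4 * n * s))

s = -0.001  # a mildly deleterious mutation
for n in (100, 1_000, 10_000):
    neutral = 1.0 / (2 * n)
    print(f"N = {n:>6}: P(fix) = {p_fix(n, s):.2e}"
          f"  ({p_fix(n, s) / neutral:.2f} x the neutral rate)")

# At N = 100 the allele fixes almost as often as a neutral one;
# at N = 10 000 selection has effectively eliminated it.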

This means that MA’s very small population size limits are critically important. Just by increasing population size to reasonable amounts, GE goes away.

This also means that the LTEE gives a good way to test GE, as @Rumraket was trying to show. Basically, there should be fitness decreases if GE is true, but there are fitness increases. That seems to be a direct test of GE, one that MA cannot actually recapitulate but SLiM can.

8 Likes

This has clarified for me a fairly large conceptual gap in GE. The way he defines “unselectable” is fundamentally flawed. There is no sharp line below which fitness changes are unselectable, because the location of the line ultimately depends on population size. Many of the observations of GE in MA will simply evaporate when using a population larger than 1000, just because the threshold line moves!

Selectable vs. unselectable is not statically related to fitness.

To be clear here, I don’t think MA was artificially limited to small population sizes. It is hard to do large-population forward simulations. SLiM has been under active development by a large team, and they built a very powerful forward simulator that can handle gigantic population sizes. That was not easy.

2 Likes