I’ll just copy-paste the relevant parts of my previous response to @bjmiller’s invoking of the Tokuriki and Tawfik paper (Bershtein et al 2006) here:
No, they don’t. One of them (Bershtein et al 2006) was deliberately set up to exclude several well-characterized mechanisms of evolutionary change in order to better understand, in isolation, the consequences of a single mechanism of change in the absence of the effects of the others. It only allowed the effects of mutations within the reading frame of the protein. Potentially compensatory chromosomal mutations were avoided by deliberately only mutating the plasmid genes with PCR, and then transforming competent cells to measure the fitness effects of those mutations.
The TEM-1 gene was cloned into a plasmid (as it occurs in nature) under its endogenous promoter. Recloning after each round of mutagenesis confined the mutational drift to the open reading frame of TEM-1. Our in vitro random mutagenesis protocol was optimized for high reproducibility and was calibrated to obtain, on average, two mutations per gene per round of mutagenesis. We maintained three populations of randomly drifting TEM-1 genes: one population under no selection (Lib0), and the rest under purifying selection at ‘high’ and ‘low’ stringencies. Each population, or plasmid library, was separately mutated, ligated into an empty vector and transformed into E. coli host cells; it then underwent purifying selection: ‘high’ selection pressure (250 μg ml⁻¹ ampicillin; Lib250; Supplementary Fig. 2), and ‘low’ selection pressure (12.5 μg ml⁻¹ ampicillin; Lib12.5). After growth on selection plates, plasmid DNA was extracted from the surviving E. coli colonies, and the TEM-1 genes were subjected to the next round of mutagenesis. Altogether, ten successive rounds of mutagenesis and purifying selection were performed. Loss of diversity was less than 50% per round, and a diversity of at least 10^6 variants per library was maintained throughout.
As expected, a rapid fitness decline was observed in Lib0 (no selection). The fitness of the selected populations (Lib12.5 and Lib250) remained unchanged under the threshold of selection, and decreased above that threshold (Supplementary Fig. 3).
This figure is from the supplementary materials of Bershtein et al 2006.
This completely rules out the possibility of compensatory duplications, other forms of regulation of gene dosage, compensatory chromosomal mutations, and so on.
And even then, it is noteworthy that the aspect of the protocol that involved purifying selection was still able to maintain structural integrity of the protein against the prevalence of deleterious mutations.
Fig. 3. The fitness ‘landscape’ of the TEM-1 gene.
The fitness dynamics of the different TEM-1 libraries is presented as a function of mutational input. The average fitness (W) of a given population was defined as the fraction of β-lactamase variants that confer resistance at a given concentration of ampicillin (see Methods). Wild-type TEM-1 exhibited W=1 for all ampicillin concentrations ≤ 2500 μg/ml. All fitness measurements are detailed in Supplementary Table 1. The rapid fitness decline of the unselected library Lib0 is shown at 12.5 μg/ml of ampicillin (○). The fitness of the libraries subjected to purifying selection remained unchanged at concentrations under the applied selection thresholds, as exemplified here by Lib12.5 at 50 μg/ml ampicillin (∆), and Lib250 at 500 μg/ml (F). At concentrations exceeding the selection thresholds, constant decreases in fitness were observed, exemplified by Lib12.5 at 500 μg/ml ampicillin (◊). Note that the impact of ampicillin is much higher on freshly transformed cells (as in the purifying selections) than on ongrowing, replicated colonies (as in the fitness measurements). Thus, the threshold ampicillin concentration for the fitness measurements was found to be ≤100 μg/ml for Lib12.5 (selected with freshly transformed cells at 12.5 μg/ml ampicillin), and ≤1000 μg/ml for Lib250 (selected at 250 μg/ml).
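To make the logic of that protocol concrete, here is a toy simulation I put together. It is only a sketch and not the authors' method or data: the mutation-effect model (half of mutations deleterious, each scaling resistance by a random factor), the library size, and the function names `mutate`, `fitness` and `drift` are all my own assumptions. The only things taken from the paper are the average of two mutations per gene per round, the ten rounds, the wild-type resistance of 2500 μg/ml, and the definition of W as the fraction of variants that confer resistance at a given ampicillin concentration. It also ignores the difference between freshly transformed cells and replicated colonies noted in the caption.

```python
import random

WILD_TYPE_RESISTANCE = 2500.0  # µg/ml; the caption gives W = 1 for wild type up to this level


def mutate(resistance, rng, p_deleterious=0.5):
    """Return the resistance after one round of mutagenesis (toy effect model)."""
    # Binomial(10, 0.2) mutation count, so the mean is two mutations per gene per round.
    n_mutations = sum(rng.random() < 0.2 for _ in range(10))
    for _ in range(n_mutations):
        if rng.random() < p_deleterious:
            # Assumed effect: a deleterious mutation scales resistance by a random factor in [0, 1).
            resistance *= rng.random()
    return resistance


def fitness(population, concentration):
    """W: fraction of variants that confer resistance at the given concentration."""
    return sum(r >= concentration for r in population) / len(population)


def drift(threshold=None, rounds=10, size=2000, seed=0):
    """Run rounds of mutagenesis; with a threshold, apply purifying selection each round."""
    rng = random.Random(seed)
    population = [WILD_TYPE_RESISTANCE] * size
    for _ in range(rounds):
        population = [mutate(r, rng) for r in population]
        if threshold is not None:
            survivors = [r for r in population if r >= threshold]
            if not survivors:
                raise RuntimeError("no variant survived selection")
            # Survivors regrow to the original library size before the next round.
            population = rng.choices(survivors, k=size)
    return population


if __name__ == "__main__":
    lib0 = drift(threshold=None)
    lib12_5 = drift(threshold=12.5)
    print("Lib0 at 12.5:", fitness(lib0, 12.5))
    print("Lib12.5 at 12.5:", fitness(lib12_5, 12.5))
    print("Lib12.5 at 500:", fitness(lib12_5, 500.0))
```

In this toy the selected library stays at W = 1 at its own selection threshold by construction, the unselected library declines, and the selected library still loses fitness when measured well above its threshold. That is the same qualitative pattern the caption describes, but the actual numbers here mean nothing.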
The other paper you cited (Lundin et al 2018) explored the fitness effects of mutations and found, completely unsurprisingly, that most mutations are deleterious. They didn’t find anything which supports the view that protein evolution can only go downhill as mutations accumulate. Their protocol did not even include a lineage evolving under purifying selection. All mutations were created directly in DNA by PCR and then inserted in the bacterial chromosome, and their fitness effects were tested. When the effects of multiple mutations in combination were tested, it was again in the absence of purifying selection.