Evidence of a synthetic origin of SARS-CoV-2

More interesting than my own take is the one from Alex Washburne, one of the authors of the preprint. It’s really a must-read, IMHO. Probably even a vital read.

I don’t see any good response to the reality that the same restriction sites are found in basically all the other related coronaviruses, and that other closely related viruses score even higher on their supposed metric of designedness.

3 Likes

I don’t see that Washburne addressed any of the substantive criticisms of his paper.

For example (from the first twitter discussion mentioned above):

Oct 21
OK- so it’s (a) cherry picking of (b) a signal that doesn’t scream ‘engineering’ that (c) isn’t even more extreme than some other natural viruses. And now this is the point where we note every single one of the sites [or lack of] also occurs in a natural coronavirus as well.

The implication upon further review is that, by Washburne’s criteria, all coronaviruses are engineered. Do you think that is the case, @Giltil, after reviewing some of the criticisms (and not Washburne’s non-reply)?

I recommend that you read the linked discussion, especially the bits about recombination. And, when Washburne posts an authentic response to this, be sure to let us know.

4 Likes

I am a bit late to pick up on the conversation, so I will go over what was said here.

It depends on how you define “pseudoscience”.

While there is a substantial debate on the demarcation problem of what separates science from non-science, for the purpose of this conversation I would say that explanations which are testable and falsifiable would count as science sensu lato.

Of course, there are many ideas that would qualify under this lenient criterion. They range from speculations without any support to ideas that have already been thoroughly falsified - ‘fringe science’ at best and ‘bad/defunct science’ at worst, respectively.

If you define ‘pseudoscience’ as ‘unfalsifiable’, then this preprint may not technically be pseudoscience. However, ‘pseudoscience’ is also often used to refer to hypotheses that - while testable and falsifiable - are perpetuated without any supporting evidence and/or in spite of evidence to the contrary. This is probably the sense of ‘pseudoscience’ @dsterncardinale used here. And after reading the paper and the responses, I think he is right. The conclusions made in the preprint are completely unsupported by the data.

Reviewing the preprint

1. What does the preprint say?

The authors of the paper conclude that the SARS-CoV-2 genome contains ‘fingerprints of genomic manipulation’ in the form of restriction enzyme (RE) sites. For those who aren’t familiar, REs are endonucleases that cut DNA at a specific site by binding to a recognition site (RS) that is 4-8 bp long. These enzymes are found in bacteria and archaea as part of their defense mechanism against viruses: they cut and disable the viral DNA. Each RE has its own unique RS. For example, the RS of EcoRI is the following:
[Image: the EcoRI recognition site, GAATTC; top strand shown 5’ to 3’]

REs that cut DNA in a staggered “zig-zag” like this leave DNA overhangs, also called “sticky ends”, because DNA fragments with compatible overhangs can pair up and be ‘glued’ together by a DNA ligase. This allows us to cut and paste different fragments together like this:

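The cut-and-paste logic can be made concrete with a toy script (the sequence and function names below are mine, purely for illustration): scan a strand for the EcoRI site GAATTC and split it at the top-strand cut point, G^AATTC.

```python
# Toy illustration: find EcoRI recognition sites (GAATTC) in a DNA string.
# EcoRI cuts between G and A on the top strand (G^AATTC), leaving 4-nt
# 5' overhangs ("sticky ends") that can anneal to compatible ends.

def find_sites(seq: str, recognition: str) -> list[int]:
    """Return 0-based start positions of every occurrence of `recognition`."""
    positions = []
    start = seq.find(recognition)
    while start != -1:
        positions.append(start)
        start = seq.find(recognition, start + 1)
    return positions

def digest(seq: str, recognition: str = "GAATTC", cut_offset: int = 1):
    """Split `seq` at each top-strand cut point (site start + cut_offset)."""
    cuts = [p + cut_offset for p in find_sites(seq, recognition)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]

seq = "TTTGAATTCAAACCCGAATTCGGG"
print(find_sites(seq, "GAATTC"))  # [3, 15] -- two sites
print(digest(seq))                # three fragments
```

Note that each internal fragment starts with the AATT overhang sequence; that shared overhang is what lets compatible ends anneal before ligation.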
The authors claim to show that the recognition sites of two REs (specifically BsaI and BsmBI) are distributed in a way that is very reminiscent of how scientists would artificially assemble a genome. Specifically, scientists genetically engineer a viral genome by removing/adding RS by means of silent/synonymous mutations, which don’t significantly affect fitness. RS are added/removed such that they are regularly spaced out, giving a minimal number of fragments that are neither too long nor too short. The specific claims are that:

  1. The synonymous mutations that added/removed BsaI/BsmBI RS in SARS-CoV-2 compared to wild coronaviruses “is extremely unlikely to have arisen by random evolution”.
  2. The distribution of the BsaI/BsmBI restriction sites “is anomalous for a wild coronavirus and more likely to have originated from an infectious clone designed as an efficient reverse genetics system.”

2. Why the conclusion is unwarranted.

We must first make clear that RS in viral genomes are not unusual at all; indeed, their presence is exactly what allows prokaryotic REs to defend against viruses. The figure below shows the RS located within several coronavirus genomes (credit to Alex Crits-Christoph):

Note that the SARS-CoV-2 (on the top) does not stand out whatsoever. Now, the authors focus on the RS of two specific REs (BsaI and BsmBI). Here are the RS for these two REs (credit to Alex Crits-Christoph):

Again, all these RS are observed among wild coronaviruses. But the authors specifically claim that the distribution of these sites in SARS-CoV-2 is out of the ordinary. Specifically, they took 72 coronavirus genomes and digested (in silico) each genome 1000 times, with each digestion using a random pair of REs drawn from a set of 214 REs. From each digestion, they took the length of the largest fragment and expressed it as a percentage of the total genome length. From this, they produced a null distribution (grey data points and box plots below). They then added the data points for the maximum fragment length produced by BsaI+BsmBI digestion of the genomes of engineered coronaviruses and of SARS-CoV-2 (colored points). They note that the largest fragment of SARS-CoV-2 is a significant outlier relative to this null distribution, similar to the engineered coronaviruses.
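As I read their methods, the null-distribution step can be sketched like this. This is a simplified mock-up of my own, not their code or data: a random “genome” stands in for a coronavirus, and random 6-mers stand in for the 214 real recognition sites.

```python
import random

# Simplified mock-up of the preprint's null-distribution procedure (my
# reconstruction for illustration, not their code): digest a genome in
# silico with a random pair of "enzymes" and record the largest fragment
# as a fraction of genome length.

random.seed(0)

def site_positions(genome: str, site: str) -> list[int]:
    """All start positions of `site` in `genome`."""
    pos, out = genome.find(site), []
    while pos != -1:
        out.append(pos)
        pos = genome.find(site, pos + 1)
    return out

def max_fragment_fraction(genome: str, sites: list[str]) -> float:
    """Longest in-silico digestion fragment / genome length."""
    cuts = sorted({p for s in sites for p in site_positions(genome, s)})
    bounds = [0] + cuts + [len(genome)]
    return max(b - a for a, b in zip(bounds, bounds[1:])) / len(genome)

def random_site(k: int = 6) -> str:
    return "".join(random.choice("ACGT") for _ in range(k))

genome = "".join(random.choice("ACGT") for _ in range(30_000))

# Null distribution: 1000 digestions, each with a random pair of sites.
null = [max_fragment_fraction(genome, [random_site(), random_site()])
        for _ in range(1000)]
print(f"median max-fragment fraction: {sorted(null)[500]:.2f}")
```

The crucial design choice, and the one criticized below, is that this null is built from *random* enzyme pairs, while the focal data point uses one hand-picked pair.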

This is really weird to me.
First: comparing two different things. Why would they look at the RS pattern of SARS-CoV-2 using two specific REs, and compare that to the RS pattern of wild coronaviruses using RANDOMLY SELECTED pairs of REs? The benefit of using the 1000 randomly selected RE pairs is that they could generate enough data to construct that grey box-plot null distribution. However, I fail to see how the RS pattern of random RE pairs is comparable to the RS pattern of BsaI+BsmBI. The fact that the latter is an outlier with respect to the former could just as easily point to the conclusion that the RS pattern of BsaI+BsmBI within coronavirus genomes (artificial or not) is simply atypical compared to that of random RE pairs. In other words, their null distribution is not a proper NULL. If this is true, then the RS pattern of BsaI+BsmBI in wild coronavirus genomes would similarly deviate from that of random RE pairs. However, they didn’t include these data points in the figure.

Second: the largest fragment. The authors note that evenly spaced RS are a signature of engineered coronaviruses. Taking this assumption for granted, I would expect them to analyze the length of EACH fragment and check whether the fragment lengths are about the same, corresponding to evenly spaced sites, but that’s NOT what they did. As noted before, they only looked at the length of the largest fragment. As people explained in the comments under the preprint, while the largest fragment length is reduced when RS are evenly spaced, the maximum length is not a robust statistic, since it has a high variance, which leads to a high false-positive rate; i.e. a short longest fragment often does not equate to evenly distributed RS. And why look only at the longest fragment? Why not also the shortest? RS that are very close together are undesirable for genome assembly, since they would be redundant. Remember: we want evenly spaced RS. So let’s look at the restriction map of SARS-CoV-2 and see what the shortest fragment is:


Oops. It’s a very small 643 base pair fragment. NOT what you would expect if one had designed the genome with evenly spaced out RS.
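The complaint is easy to make concrete. Given a set of cut positions, an “even spacing” check should consider every fragment length, not just the longest. The genome length and cut map below are invented for illustration (with a 643 bp fragment echoing the real SARS-CoV-2 map):

```python
# Toy check of the "evenly spaced sites" claim: compute EVERY fragment
# length implied by a set of cut positions, not just the longest one.
# The genome length and cut positions are invented for illustration.

def fragment_lengths(genome_length: int, cuts: list[int]) -> list[int]:
    bounds = [0] + sorted(cuts) + [genome_length]
    return [b - a for a, b in zip(bounds, bounds[1:])]

# A cut map whose longest fragment is unremarkable, yet which contains a
# tiny 643 bp fragment -- exactly the kind of unevenness that a
# "longest fragment only" statistic never sees.
lengths = fragment_lengths(30_000, [6_000, 6_643, 13_000, 21_000])
print(lengths)                     # [6000, 643, 6357, 8000, 9000]
print(min(lengths), max(lengths))  # 643 9000
```

A genuinely “even” design would make the full list of lengths roughly equal; a short maximum alone does not guarantee that.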

Third: The kill shot. The BsaI+BsmBI enzymes belong to a specific class of REs called type IIS, which cut the DNA at a certain distance away from the RS, like this:
[Image: a type IIS restriction enzyme cutting at a fixed distance downstream of its recognition site]
The benefits of this are twofold. First, you can create unique sticky ends, so that fragments will be stuck together in a specific order. Second, it allows simultaneous digestion and ligation, because the RS are removed and are not present in the ligated product, which means the end product cannot be digested again. This is very useful in assembling large genomes. Just to make it clear: the RS are removed during this process. Thus, if this method was used to assemble the SARS-CoV-2 genome, you wouldn’t expect the genome to contain these recognition sites anymore. What is funny is that the authors of the preprint specifically cited this study where researchers used exactly this method. As you can see, the final product would not have these RS anymore (the following is Figure S9 from the study).


The RS (which are colored red) would not be there after the sticky ends (denoted by red lines) are ligated.
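To make the type IIS behaviour concrete: BsaI recognizes GGTCTC but cuts 1 nt downstream on the top strand and 5 nt downstream on the bottom (GGTCTC(1/5)), so the 4-nt overhang lies entirely outside the recognition site. A toy calculation (the sequence is invented for illustration):

```python
# BsaI is a type IIS enzyme: recognition site GGTCTC, cutting 1 nt
# downstream on the top strand and 5 nt downstream on the bottom
# (GGTCTC(1/5)). The 4-nt overhang therefore lies entirely OUTSIDE the
# recognition site, so after ligation the site need not be in the product.

REC = "GGTCTC"
TOP_OFFSET, BOTTOM_OFFSET = 1, 5  # standard BsaI cut offsets

def bsai_cuts(seq: str) -> list[tuple[int, int, str]]:
    """Return (top_cut, bottom_cut, overhang) for each top-strand BsaI site."""
    out = []
    for i in range(len(seq) - len(REC) + 1):
        if seq.startswith(REC, i):
            top = i + len(REC) + TOP_OFFSET
            bottom = i + len(REC) + BOTTOM_OFFSET
            out.append((top, bottom, seq[top:bottom]))
    return out

seq = "AAAGGTCTCTCACGTTTTT"
print(bsai_cuts(seq))  # [(10, 14, 'CACG')]
```

Because the overhang (‘CACG’ here) is arbitrary sequence chosen by the designer, fragments can be forced to assemble in one specific order, and in a standard design the GGTCTC site itself ends up on the piece that is discarded.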

Now, people have responded to this by pointing out that one COULD in theory design these RS in a way such that they would be present in the assembled genome. However, (1) there is no reason why one would bother with that: you would lose the benefit of simultaneous digestion and ligation. The scenario the authors of the preprint propose for how SARS-CoV-2 was constructed is far more complicated than necessary, as explained by Friedemann Weber.
And (2) for the RS to be retained in the final product, they would have to be oriented in a specific manner, but they aren’t, as pointed out by Santiago Sanchez. Again, this pretty much rules out the biochemical possibility that these RS and the corresponding REs were used to artificially assemble the genome.

This should already be enough, but there are a whole lot more issues that I have seen. This paper is just a mess. Just quickly, regarding the mutations: they analyzed the mutations involved in the gain/loss of BsaI/BsmBI recognition sites within SARS-CoV-2 with respect to its wild relatives. At minimum, to get these RS from a wild relative, one would need 5 specific mutations. These were single synonymous mutations, and they simulated the mutations in silico to show that the odds of obtaining these 5 mutations are very low. However, RNA viruses are subject to strict purifying selection, so synonymous mutations are expected to occur at far greater rates than they assumed (comment by Jackson Emanuel under the preprint). Furthermore, mutations at these particular sites that cause gain/loss of the RS are common among coronaviruses, as pointed out by Kristian G. Andersen, Zhihua Chen, and Alex Crits-Christoph.

EDIT: They also made the mistake of estimating how many changes you would need to go from a relative (a cousin) to SARS-CoV-2. What they should have estimated is the number of changes needed to go from an ancestor. Scientists have reconstructed the likely ancestor (recCA), and only ONE mutation is needed in this ancestor to get all the RS that are seen in SARS-CoV-2.

In short, this preprint study is picking up a signal from random noise.

Read it. I am not impressed. The only relevant portion merely repeats what they already reported in the preprint. It does not respond to the specific criticisms I have seen, some of which are repeated here ad nauseam.

The majority of the piece is spent flaunting credentials, expressing frustration with rejection by mainstream academia, explaining why he co-founded a platform to “improve science communication”, and espousing the strong convictions and motives behind the study.

This is just the typical reaction I have seen among those peddling crankery who don’t receive the respect THEY think they deserve.

9 Likes

To follow up on my comment, I looked into the comment section of the biorxiv submission and came across an interesting response (offered by a Jared Roach):

The distribution of random fragment lengths is a beta distribution. (Roach JC. Random subcloning. Genome Res. 1995 Dec;5(5):464-73. doi: 10.1101/gr.5.5.464. PMID: 8808467.) Maximum fragment length is not a robust statistic - it has high variance.

This means that any attempt to assign statistical significance to Washburne et al.'s finding is flawed, even doomed (the statisticians following this are welcome to correct me). Basically, Washburne’s argument reduces to “this is what Washburne would have done, therefore it was artificially assembled!” (It looks that way to Washburne! Where have we heard that sort of logic before?)
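Roach’s point is easy to check by simulation. This is a quick Monte Carlo of my own (not from his paper): drop a few random cut points on a genome and watch how much the longest fragment varies from trial to trial.

```python
import random
import statistics

# Quick Monte Carlo of Roach's point (my own sketch, not his analysis):
# place K random cut points on a genome and record the longest fragment.
# Across trials the longest fragment varies enormously, so "the longest
# fragment is short" is a noisy basis for a significance claim.

random.seed(1)
GENOME, K, TRIALS = 30_000, 5, 10_000

def longest_fragment() -> int:
    cuts = sorted(random.randrange(1, GENOME) for _ in range(K))
    bounds = [0] + cuts + [GENOME]
    return max(b - a for a, b in zip(bounds, bounds[1:]))

maxima = [longest_fragment() for _ in range(TRIALS)]
mean, sd = statistics.mean(maxima), statistics.stdev(maxima)
print(f"longest fragment: mean {mean:.0f} bp, sd {sd:.0f} bp")
```

The standard deviation comes out as a large fraction of the mean, which is exactly what “high variance, not a robust statistic” means in practice.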

The distribution of the restriction sites is actually not suspicious, and the design he proposes as optimal isn’t really optimal either. (Others in this thread are addressing this point.) The sad thing is that the far-right loony bin is already seizing on this incorrect assertion (and, @Giltil, there is no doubt that this study is an inane flight of fancy) and will undoubtedly follow with insidious threats (and worse) against the scientific community.

1 Like

Why did you title this as “evidence” when you only refer to what people write about the evidence? Have you looked at the evidence, or only the rhetoric?

All the issues still remain, which is why we are asking about your cognitive processing of the evidence, not theirs. They clearly cherry-picked restriction sites that looked anomalous, making their statistical argument circular, and their description of the process of making DNA constructs is from the early 1990s, completely ignoring more modern methods.

Gil, why don’t they acknowledge the existence of Gibson assembly?

And I’m being generous. This claim from the paper (Fig. 1) is just a lie:

Directed assembly of ~30kb CoV genomes requires several design considerations…

It doesn’t require them at all, because assembly does not have to use restriction endonucleases in the first place.

Gil, I have more than 30 years experience with this stuff, and many other people here have similar experience.

  1. Why did you title this “evidence” when you won’t even discuss the evidence?
  2. Does Alex Washburne have ANY experience in making constructs that approaches ours?
1 Like

Here it is

https://twitter.com/WashburneAlex/status/1584931335453298688

Could you please explain exactly why you think it is “vital” that we read some kook’s self-aggrandizing propaganda? Because I’m not seeing any reasons myself.

4 Likes

Who are these “folk,” and what did they actually say?

Gil, you have people here who between them have more than a century of experience in assembling expression constructs. I think I can safely say that all of us can see how sophomoric Washburne’s preprint is.

How much experience does Washburne have, specifically?

1 Like

Well, as I stated above, the far-right insane asylum has picked up on this as another piece of evidence that COVID-19 is a man-made creation. And I fully expect the decibel level for imprisoning (or worse) the leaders of the health and scientific community to grow exponentially, using this inept preprint as support. It is important to publicize, immediately and with certainty, that Washburne is wrong, that there is precisely zero scientific credibility behind his conclusions.

To recapitulate: the restriction enzyme site profile Washburne describes is not particularly novel or unique (the response by Roach quoted above shows this), it doesn’t make sense as a way of assembling an infectious viral clone (as @Mercer attests, and as I, who also know something about making infectious clones of RNA viruses, can confirm), and it doesn’t make any sense even if one were to go to the trouble of making a restriction-digest-and-ligation-friendly system to synthesize and manipulate an RNA virus.

This isn’t going to sway the lunatic fringe, but answering each of Washburne’s replies with tens to hundreds of social media responses that give an honest and accurate accounting of this is, unfortunately, what is needed to reach the larger lay public.

We can start in our own cozy discussion board here, as it were.

3 Likes

That in no way addresses the point. It’s just “too soon to tell, more research needed” hand-waving. The point is that they should have taken this into account from the get-go.

They looked at the genomes of wild coronaviruses, determined the loci of the recognition sites for restriction enzymes BsmBI and BsaI. And then they simply counted the changes needed to add and/or remove these recognition sites to get the same recognition sites seen in SARS-CoV-2.

From this they propose the following scenarios for “How to make SARS2” by inducing single synonymous mutations to add/remove recognition sites.


Among their scenarios, the minimal number of mutations needed is 5, so they reckon that the likelihood of these 5 mutations occurring naturally is rather remote, and hence that the odds of SARS2 having evolved naturally from its close relatives are similarly remote. This is why they conclude:

The type of mutations (synonymous or silent mutations) that differentiate the restriction sites in SARS-CoV-2 are characteristic of engineering, and the concentration of these silent mutations in the recognition sites is extremely unlikely to have arisen by random evolution.

However, the mistake is twofold: 1) Nobody thinks SARS2 evolved from its relatives/cousins. The proper way to test the natural-evolution hypothesis is to reconstruct the common ancestor of SARS2 and its closest relatives. 2) Moreover, since these viruses often undergo recombination, we must also take recombination into account to reconstruct the likely ancestor of SARS2. This has already been done; it’s called “recCA”.

So, how many mutations are required to go from recCA to SARS2 in terms of these recognition sites? JUST ONE (credit: Kristian G. Andersen).


Top is SARS2 and bottom is recCA.

This is why you have to include recombination. As you can see, the genomic regions that possess/lack the exact recognition sites seen in SARS2 are already present among its wild relatives. Recombination allows these regions to be inherited from multiple lineages without the need for additional mutations! However, the authors wrongly assumed that, under a natural-origin scenario, SARS2 evolved from a single (cousin) lineage without recombination, which would require more mutations than are actually needed.

In short: on this point, the authors only consider a straw-man version of the natural origin of SARS-CoV-2 (evolution from a cousin lineage without recombination). Thus their conclusion that natural evolution is extremely unlikely is erroneous.

5 Likes

Another response I have found:

4 Likes

Two well-written nuggets from it:

The analysis of a single, selectively chosen combination of two restriction enzymes (here: BsaI and BsmBI) is not suitable to prove human intervention. If one analyzes only one combination of two restriction enzymes suitable for a particular virus, it is predictable that this combination will be significantly less suitable for other virus isolates to construct a reverse genetics model.

The assumption of purely random mutations in a viral genome is not valid, since most mutations disrupt or destroy the amino acid sequence of the viral proteins and are thus under selective pressure. In addition, the authors would also have had to analyze all acceptable combinations of restriction enzymes in this case as well.

3 Likes

Hmm, OK.

It might be helpful if you went point by point thru every single argument that has been made in this discussion so far and showed how this authentic response addresses each and every one of them.

I look forward to it!

1 Like

I would very much like to do this but unfortunately I have very little time for it at the moment. Sorry for that.

Correct, thank you.

1 Like

You don’t need to be at the far-right to think that the LL hypothesis has merit.

It reminds me of the way Fauci and Collins tried to disqualify the three eminent researchers behind the GBD by labeling them as fringe scientists. The problem is that the path set out in the GBD is now widely recognised as having been the best response to the pandemic.

Akin to P-hacking?

1 Like

That’s OK. It wasn’t really a serious request, anyway. Anyone can see his supposed “authentic response” is nothing of the sort.

1 Like

You do have to ignore the science, the facts, to give Washburne’s ideas anything more than a cursory thought. That is a pretty apt description of the far-right.

5 Likes