Evidence of a synthetic origin of SARS-CoV-2

I am a bit late to pick up on the conversation, so I will go over what was said here.

It depends on how you define “pseudoscience”.

While there is a substantial debate on the demarcation problem of what separates science from non-science, for the purpose of this conversation I would say that explanations which are testable and falsifiable would count as science sensu lato.

Of course, many ideas qualify under this lenient criterion. They range from speculations without any support to ideas that have already been thoroughly falsified, which could be called ‘fringe science’ at best and ‘bad/defunct science’ at worst, respectively.

If you define ‘pseudoscience’ as ‘unfalsifiable’, then this preprint may not technically be pseudoscience. However, ‘pseudoscience’ is also often used to refer to hypotheses that, while testable and falsifiable, are perpetuated without any supporting evidence and/or in spite of evidence to the contrary. This is probably the sense of ‘pseudoscience’ @dsterncardinale used here. And after reading the paper and the responses, I think he is right. The conclusions made in the preprint are completely unsupported by the data.

Reviewing the preprint

1. What does the preprint say?

The authors of the paper conclude that the SARS-CoV-2 genome contains ‘fingerprints of genomic manipulation’ in the form of restriction enzyme (RE) sites. For those who aren’t familiar, REs are endonucleases that cut DNA at a specific position after binding to a recognition site (RS) that is 4-8 bp long. These enzymes are found in bacteria and archaea as part of their defense mechanism against viruses: they cut and disable the viral DNA. Each RE has its own RS. For example, the RS of EcoRI is the following:
[image: the EcoRI recognition site, GAATTC]
(top is 5’ to 3’)

REs that cut DNA in a “zig-zag” like this leave single-stranded overhangs, also called “sticky ends”, because DNA fragments with compatible overhangs can pair up and be ‘glued’ together by a DNA ligase. This allows us to cut and paste different fragments together like this:

The authors claim to show that the recognition sites of two REs (specifically BsaI and BsmBI) are distributed in a way that is very reminiscent of how scientists would artificially assemble a genome. Specifically, scientists would engineer the viral genome by adding or removing RS through silent/synonymous mutations, which don’t significantly affect fitness. The RS are added or removed so that they end up regularly spaced, giving a minimal number of fragments that are neither too long nor too short. The specific claims are that:

  1. The synonymous mutations that added/removed BsaI/BsmBI RS in SARS-CoV-2 compared to wild coronaviruses “is extremely unlikely to have arisen by random evolution”.
  2. The distribution of the BsaI/BsmBI restriction sites “is anomalous for a wild coronavirus and more likely to have originated from an infectious clone designed as an efficient reverse genetics system.”

2. Why the conclusion is unwarranted.

We must first make clear that RS in viruses are not unusual at all. This is exactly why prokaryotes have REs to defend themselves against viruses. This figure below shows the RS located within several coronavirus genomes (credit to Alex Crits-Christoph):

Note that SARS-CoV-2 (on the top) does not stand out whatsoever. Now, the authors focus on the RS of two specific REs (BsaI and BsmBI). Here are the RS for these two REs (credit to Alex Crits-Christoph):

Again, all these RS are observed among wild coronaviruses. But the authors claim that the distribution of these sites in SARS-CoV-2 is out of the ordinary. Specifically, they took 72 coronavirus genomes and digested each genome (in silico) 1000 times, each time with a random pair of REs drawn from a set of 214 REs. From each digestion, they took the length of the largest fragment and expressed it as a percentage of the total genome length. From this, they produced a null distribution (grey data points and box plots below). Then they added the data points for the maximum fragment length produced by BsaI+BsmBI digestion of the genomes of engineered coronaviruses and SARS-CoV-2 (colored points). They note that the largest fragment of SARS-CoV-2 is a significant outlier relative to this null distribution, similar to engineered coronaviruses.
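For anyone who wants to get a feel for the procedure described above, here is a minimal Python sketch of it. This is not the authors' code. The genome is a random toy sequence, `TOY_SITES` is a hypothetical six-enzyme stand-in for their 214-enzyme set, and as a simplification each cut is placed at the start of the recognition site rather than at the enzyme's true cut position.

```python
import random

def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def site_positions(genome, site):
    """Start indices of `site` on either strand (overlapping hits allowed)."""
    hits = set()
    for pattern in {site, reverse_complement(site)}:
        start = genome.find(pattern)
        while start != -1:
            hits.add(start)
            start = genome.find(pattern, start + 1)
    return sorted(hits)

def max_fragment_fraction(genome, sites):
    """Longest fragment of a linear genome, as a fraction of its length.

    Simplification (assumption): every cut is placed at the start of the
    recognition site, not at the enzyme's real cut position.
    """
    cuts = sorted({pos for site in sites for pos in site_positions(genome, site)})
    bounds = [0] + cuts + [len(genome)]
    return max(b - a for a, b in zip(bounds, bounds[1:])) / len(genome)

def null_distribution(genome, enzyme_sites, n_draws=1000, seed=0):
    """Max-fragment fraction for `n_draws` random pairs of enzymes."""
    rng = random.Random(seed)
    return [max_fragment_fraction(genome, rng.sample(enzyme_sites, 2))
            for _ in range(n_draws)]

# Toy inputs (assumptions): a handful of real recognition sites
# (EcoRI, BamHI, HindIII, PstI, BsaI, BsmBI) and a random 30,000 nt "genome".
TOY_SITES = ["GAATTC", "GGATCC", "AAGCTT", "CTGCAG", "GGTCTC", "CGTCTC"]
toy_rng = random.Random(1)
toy_genome = "".join(toy_rng.choice("ACGT") for _ in range(30000))

null = null_distribution(toy_genome, TOY_SITES, n_draws=200)
observed = max_fragment_fraction(toy_genome, ["GGTCTC", "CGTCTC"])  # BsaI + BsmBI
```

Comparing `observed` against `null` mirrors the comparison in the preprint's figure: one fixed enzyme pair on one side, random enzyme pairs on the other.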

This is really weird to me.
First: Comparing two different things. Why would they look at the RS pattern of SARS-CoV-2 using two specific REs, and compare that to the RS pattern of wild coronaviruses using RANDOMLY SELECTED pairs of REs? The benefit of using 1000 randomly selected RE pairs is that they get a large number of data points to construct that grey box plot null distribution. However, I fail to see how the RS pattern of random RE pairs is comparable to the RS pattern of BsaI+BsmBI. The fact that the latter is an outlier with respect to the former could just as easily mean that the RS pattern of BsaI+BsmBI within coronavirus genomes (artificial or not) is not typical compared to that of random RE pairs. In other words, their null distribution is not a proper NULL. If this is true, then the RS pattern of BsaI+BsmBI in wild coronavirus genomes would similarly deviate from that of random RE pairs. However, they didn’t include these data points in the figure.

Secondly: The largest fragment. The authors note that evenly spaced RS are a signature of engineered coronaviruses. Taking this assumption for granted, I would expect them to analyze the length of EACH fragment and check whether the fragment lengths are about the same, as they would be if the RS were evenly spaced, but that’s NOT what they did. As noted before, they only looked at the length of the largest fragment. As explained by people in the comments under the preprint, while the largest fragment length is reduced when RS are evenly spaced, the maximum length is not a robust statistic: it has a high variance, which leads to a high false positive rate. That is, a short longest fragment often does not mean the RS are evenly distributed. And why look only at the longest fragment? Why not also the shortest? RS that are very close together are not desirable for genome assembly since that would be redundant. Remember! We want evenly spaced RS. So let’s look at the restriction map of SARS-CoV-2 and see what the shortest fragment is:


Oops. It’s a very small 643 base pair fragment. NOT what you would expect if one had designed the genome with evenly spaced out RS.
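To illustrate why the longest fragment alone says little about even spacing, here is a small Python sketch. The cut positions are hypothetical and are not the real SARS-CoV-2 restriction map. The function names and the choice of the coefficient of variation as an evenness measure are my own assumptions, not something taken from the preprint.

```python
from statistics import mean, pstdev

def fragment_lengths(genome_length, cut_positions):
    """Lengths of all fragments of a linear genome cut at `cut_positions`."""
    bounds = [0] + sorted(cut_positions) + [genome_length]
    return [b - a for a, b in zip(bounds, bounds[1:])]

def spacing_summary(genome_length, cut_positions):
    """Shortest and longest fragment plus coefficient of variation (CV).

    CV = standard deviation / mean of the fragment lengths; it is 0 when
    the cuts are perfectly evenly spaced.
    """
    lengths = fragment_lengths(genome_length, cut_positions)
    return {
        "shortest": min(lengths),
        "longest": max(lengths),
        "cv": pstdev(lengths) / mean(lengths),
    }

# Hypothetical cut positions on a 30,000 nt genome (illustration only).
even = [5000, 10000, 15000, 20000, 25000]
uneven = [7000, 7600, 14000, 21000, 26000]

even_summary = spacing_summary(30000, even)
uneven_summary = spacing_summary(30000, uneven)
```

In the `uneven` example the longest fragment is still under a quarter of the genome, yet two cuts sit only 600 nt apart. A statistic based only on the maximum would not flag that, while the shortest fragment and the CV both do.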

Third: The kill shot. The BsaI+BsmBI enzymes belong to a specific class of REs called type IIS, which cut the DNA at a certain distance away from the RS, like this:
[image: a type IIS enzyme cutting at a distance from its recognition site]
The benefits of this are twofold. First, you can create unique sticky ends, so that fragments will be joined in a specific order. Second, it allows for simultaneous digestion and ligation, because the RS are removed and not present in the ligated product, which means the end product cannot be digested again. This is very useful when assembling large genomes. Just to make it clear, the RS are removed during this process. Thus, if this method was used to assemble the SARS-CoV-2 genome, you wouldn’t expect the genome to contain these recognition sites anymore. What is funny is that the authors of the preprint specifically cited this study where researchers used exactly this method. As you can see, the final product would not have these RS anymore (the following is Figure S9 from the study).


The RS (which are colored red) would not be there after the sticky ends (denoted by red lines) are ligated.
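The same point can be shown with a toy string model in Python. This is only a sketch under stated assumptions: the two parts are made-up sequences, DNA is represented by its top strand only, and the helper names `release_insert` and `ligate` are hypothetical. The BsaI details used here (site GGTCTC, cutting 1 nt downstream on the top strand and 5 nt downstream on the bottom strand, which leaves a 4-nt overhang) are the standard ones for this enzyme.

```python
BSAI = "GGTCTC"      # BsaI recognition site as read on the top strand
BSAI_RC = "GAGACC"   # the same site when it lies on the bottom strand
OVERHANG = 4         # BsaI leaves a 4-nt 5' overhang

def release_insert(part):
    """Top-strand span of the insert, including both 4-nt overhangs.

    Assumes the part carries two inward-facing BsaI sites: a forward site
    upstream of the insert and a reverse-oriented site downstream of it.
    Each cut falls between the site and the insert, so the sites leave
    with the discarded flanks.
    """
    fwd = part.find(BSAI)
    rev = part.rfind(BSAI_RC)
    if fwd == -1 or rev == -1 or rev <= fwd:
        raise ValueError("expected inward-facing BsaI sites flanking the insert")
    start = fwd + len(BSAI) + 1  # skip the site and the 1-nt spacer
    end = rev - 1                # drop the 1-nt spacer before the reverse site
    return part[start:end]

def ligate(left, right):
    """Join two released inserts whose facing overhangs are identical."""
    if left[-OVERHANG:] != right[:OVERHANG]:
        raise ValueError("overhangs are not compatible")
    return left + right[OVERHANG:]

# Made-up parts: flank + site + spacer, overhang, payload, overhang, spacer + site + flank.
part_a = "TTGGTCTCA" + "AATG" + "CCCCCCCC" + "GCTA" + "TGAGACCTT"
part_b = "TTGGTCTCA" + "GCTA" + "GGGGGGGG" + "TTAC" + "TGAGACCTT"

product = ligate(release_insert(part_a), release_insert(part_b))
sites_left = BSAI in product or BSAI_RC in product
```

With inward-facing sites, `sites_left` comes out `False`: the assembled product carries no BsaI site on either strand, which is what makes one-pot digestion and ligation work.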

Now, people have responded to this by pointing out that one COULD in theory design these RS such that they remain present in the assembled genome. However, (1) there is no reason why one would bother with that. You wouldn’t be able to benefit from the simultaneous digestion and ligation. The scenario that the authors of the preprint propose for how SARS-CoV-2 was constructed is way more complicated than necessary, as explained by Friedemann Weber.
And (2) in order for the RS to be retained in the final product, the RS have to be oriented in a specific manner, but they aren’t, as pointed out by Santiago Sanchez. Again, this pretty much rules out the biochemical possibility that these RS and the corresponding REs were used to artificially assemble the genome.

This should already be enough, but there are a whole lot more details that I have seen. This paper is just a mess. Just quickly, regarding the mutations: they analyzed the mutations involved in the gain/loss of BsaI/BsmBI recognition sites in SARS-CoV-2 with respect to its wild relative. At minimum, to get these RS from a wild relative, one would need 5 specific mutations. These were single synonymous mutations, and they simulated the mutations in silico to show that the odds of obtaining these 5 mutations are very low. However, RNA viruses are subject to strict purifying selection, so synonymous mutations are expected to accumulate at far greater rates than non-synonymous ones (comment by Jackson Emanuel under the preprint). Furthermore, mutations at these particular sites that cause gain/loss of the RS are common among coronaviruses, as pointed out by Kristian G. Andersen, Zhihua Chen, and Alex Crits-Christoph.

EDIT: They also made the mistake of estimating how many changes you need to go from a relative (a cousin) to SARS-CoV-2. What they should have done is estimate the changes needed to go from an ancestor. Scientists have reconstructed the likely ancestor (recCA), and only ONE mutation is needed in this ancestor to get all the RS that are seen in SARS-CoV-2.

In short, this preprint study is picking up a signal from random noise.

Read it. I am not impressed. The only relevant portion merely repeats what they already reported in the preprint. It does not respond to the specific criticisms I have seen and (in part) repeated here ad nauseam.

The majority of this piece is spent on flaunting credentials, expressing frustration with the rejection by mainstream academia, explaining why he co-founded a platform to “improve science communication”, and espousing the strong convictions and motives behind the study.

This is just the typical reaction I have seen among those peddling crankery who don’t receive the respect THEY think they deserve.
