Especially when you take the birthday paradox into account. It is very likely that overprinting will arise with functional sequences at that density.
I think that writing as though we don’t know much about overlapping genes until you write a paper is offensive.
Adding this without mentioning all of the work already done and published by others is offensive, too.
I don’t see any difference between that and Josh’s characterization.
We do. For specific enzymatic activities. It’s routinely more than one in 10^8 - 10^9. “Folds” is meaningless in this context, as every random protein folds, and there is no one-to-one correspondence between functions and folds.
We have many more than a few. The hundreds we have from catalytic antibodies alone are far more than a few.
Then there are many others.
I find that hard to accept. Scientists cite the relevant data; they don’t use hearsay as you are doing.
Cite the data instead of making an assertion from a questionably authoritative position, please.
The term “highly specified fold” is nonsensical. Folds are structural classifications of proteins. They don’t correspond to functions, as Ann Gauger claims.
My comments were based not on reading Ann’s mind (it was a false dichotomy to imply that either I was making it up or got it from the article), but conversations I have had with her in the past. I don’t have references for my conversations.
By sophisticated I mean intricate. I don’t have a simple objective account of this any more than I do for any such non-empirical principle (e.g. simplicity, elegance, etc.).
I think you should distinguish between overlapping genes and overprinting. Overlapping genes are genetic constructs observed in modern-day genomes. Overprinting is an inferred evolutionary process, and many intelligent design proponents don’t believe that genes originated through such processes, or don’t approach overlapping genes from that direction. I don’t agree with them, but making the conceptual distinction will aid in understanding and representing their views better, if you want to do that. I think there is some work to do in slowing down and actually getting their claims right a bit more.
I’m most interested in antisense overlapping genes as they can be detected with the methods I favour. They are very rare amongst current virus annotations. There are good reasons to think that many overlapping genes are not yet annotated though, which makes comparisons based on current annotations highly problematic. Also most work claiming to review properties of overlapping genes pertains to small overlaps on the same strand, which isn’t the category that I’m interested in.
I would be interested in cancer references, but it won’t apply directly to my work which is in prokaryotes. I would be curious how the ENCODE data is relevant to protein-coding, I thought that relates just to transcription.
The published results on antisense frames being longer that you refer to pertain to the “-1” reading frame, I suspect, as most papers on the topic deal with this frame. Properties differ across frames and haven’t been studied in much depth.
I didn’t mean to suggest that 1 in 10^10 was the lower limit on functionality - I just cited it as one example of a claim from an expert which is much lower than some have claimed. The claim made in the talk was that the relevant number is much smaller than this; the point being made was just what he took to be the surprising rarity. The literature has multiple claims of much lower numbers than this, as most of you reading this probably know.
I don’t find this conversation very productive, other than some of the scientific details.
I maintain that you misrepresented Ann’s argument, and that you later drawing in various irrelevant topics does not help - notably, her article said nothing about Axe’s work. I have explained that the inference she made was fully coherent, but based on premises that I disagree with, and that you have misrepresented it as being internally contradictory or incoherent.
A “fallacy”, or for something to be “fallacious”, requires an invalid inference. The inference from more sophisticated design to a smarter designer is not fallacious. Instead, the point of critique is the premise, which we both disagree with. It pays to be careful about such issues, but I find that scientists typically confuse them, which makes for unnecessarily messy discussions and talking past each other.
I don’t currently have time to check for a bunch of exact citations as John Mercer demands. I don’t find his tone helpful, and as it doesn’t seem that he has much interest in being helpful I don’t see the point in engaging further there. I also suspect there are major empirical disagreements on the nature of proteins and protein evolution, with my views having largely been shaped by talks I’ve heard recently from leaders in the field, but given the approach so far I don’t think it’s worth trying to sort through those details here.
Conflating your inference with Josh’s implication is not helpful. I don’t infer anything of the sort from what he wrote.
The reason it isn’t is that you are avoiding any scientific details.
False. I didn’t demand any “bunch” of exact citations. I’d settle for a few.
As I pointed out, I don’t find your tone to be helpful, either, as one should obviously infer from my description of your unwarranted self-aggrandizement as “offensive.”
I stand by that, and recommend that you show it to colleagues for their opinion. If you puff yourself up like that in manuscripts and grant applications, you’re going to get hammered.
I don’t find your vague citation of “talks I’ve heard recently from leaders in the field” to be helpful in any scientific context.
Yeah, this class of overlapping genes is not going to be relevant to positive sense RNA viruses. If I find a few moments, I may scrounge around to see if anyone has looked at ribosomal profiles for (-) strand RNAs in plants (or animal cells) infected with (+) sense RNA viruses. This had never occurred to me.
Paging Dr. Pot, you have a call from Dr. Kettle on line 1. Take it from this full professor… In terms of hubris and self-appointed authority, brand-new PhDs have few rivals.
But in any event, I have enjoyed this discussion a great deal and some of the points may have given me fresh insights into my lab’s current work, so I appreciate everyone sharing their knowledge and expertise here. @art - I’m telling you, we have a collaboration in our future if you’re interested.
Perhaps. But there are a lot of examples. I’m going to start another thread when I can to show you what I mean.
As I understand it, overprinting is not a claim about ontogenetic mechanism. Overprinting means genes that overlap in their coding regions but in different frames, usually antisense to one another. This distinguishes it, for example, from alternative splicing, which is in the same frame. At least that is how I have been using the term overprinting.
Overprinting is typically used to refer to the mechanism of gene origin for overlapping genes. See the frequency of the phrase “originated by overprinting” and similar.
And usually overlapping genes are not in antisense. I will be interested in the references you find, but there are very few long antisense overlaps in viruses or elsewhere that I am aware of. In prokaryotes, for instance, NCBI by default does not permit annotation of a gene embedded in antisense to another.
I suppose that is consistent with my usage. We could just as easily say “originating by overlapping.”
Overlapping and overprinting are observations in biology. Either way, they shed light on the origin of new genes.
Am I missing something here about these definitions @art and @nlents?
None of your business, Josh! Must you be involved in everything that comes out of the PS forum?
j/k I wrote Art about some RNA structure stuff. You know a little bit about what I’m doing lately and I’m currently drowning in RNA genes and I don’t have that much expertise there, so may need to bring in some expert collaborative assistance for peace of mind, if nothing else.
Wikipedia and several papers seem to support my usage:
An overlapping gene is a gene whose expressible nucleotide sequence partially overlaps with the expressible nucleotide sequence of another gene.[1] In this way, a nucleotide sequence may make a contribution to the function of one or more gene products. Overprinting refers to a type of overlap in which all or part of the sequence of one gene is read in an alternate reading frame from another gene at the same locus. Overprinting has been hypothesized as a mechanism for de novo emergence of new genes from existing sequences, either older genes or previously non-coding regions of the genome.[2] Overprinted genes are particularly common features of the genomic organization of viruses, likely to greatly increase the number of potential expressible genes from a small set of viral genetic information.
@Zachary_Ardern note that the references here directly contradict many of the claims you made. Are these references incorrect?
I do agree we should make a distinction between what we observe (overlapping and overprinting) and its use as an evolutionary mechanism (gene arising by…). It seems however that overprinting is just the alternate reading frame subset of overlapping genes.
“directly contradict many of the claims you made”. This is hyperbole and does not encourage me to bother engaging. What claims that I made are contradicted in the quoted part? I welcome any actual corrections.
I only see one - the use of overprinting to refer to “a type of overlap”. I believe that it is most common and most helpful to use “overprinting” to refer to the inferred mechanism of origin, following a long line of papers that have that usage.
You say “overprinting is just the alternate reading frame subset of overlapping genes” - this makes no sense to me. All overlapping genes are necessarily in alternative reading frames, otherwise they are isoforms of the same gene. Maybe though by “alternate reading frame” you mean directly opposite reading frames [same codons in antisense], or something else (?) - I’m really not clear on what you’re saying here.
You stated overprinting was about an evolutionary mechanism, not merely the observation of an out-of-frame overlap. This contrasts with my understanding, repeated at Wikipedia.
You said that overprinting is rare in viruses.
Note this is only true of some classes of viruses, not all. This ends up being instructive as to the pressures that give rise to overprinting:
You seemed to argue that overprinting implied evolutionary origins, which is not the case. Overprinting is a proposed mechanism of de novo proteins.
Perhaps you are right, and this Wikipedia article is wrong. But these are several discrepancies.
That isn’t quite right, for several reasons.
1. Different isoforms of the same gene do not usually overlap at all.
2. Alternative splicing products do usually overlap, usually with the same reading frame.
3. It is also possible for proteins to overlap in the same reading frame, but not be #1 or #2, though I’m not sure how common this is.
So #2 and #3 are overlapped genes, but #1 (isoforms) usually isn’t. These are in-frame overlaps though.
More than merely overlap, overprinting adds the additional requirement: alternate reading frame. This can be on the antisense strand (shifted by 0, 1, or 2) or in the sense strand (shifted by 1 or 2). So overlapped genes are a large class that includes overprinted genes. The key distinction of overprinting is that it is in a shifted reading frame.
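To make the frame distinction above concrete, here is a minimal Python sketch that classifies a pair of genes as non-overlapping, overlapping in-frame, or overprinted (overlapping in a shifted frame). The `Gene` record, the `classify_overlap` function, and the coordinates are all hypothetical illustrations and do not come from any annotation tool. The way the antisense shift is numbered (comparing codon boundaries of the two strands) is just one possible convention; the literature labels antisense frames in several different ways.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical minimal gene record (coding region only, no introns)."""
    start: int   # 0-based, inclusive, in forward-strand coordinates
    end: int     # exclusive, in forward-strand coordinates
    strand: str  # "+" or "-"


def classify_overlap(a: Gene, b: Gene) -> str:
    """Classify how two coding regions relate, following the usage in the post above."""
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return "no overlap"

    if a.strand == b.strand:
        # Codons are phased from the start on "+" and from the end on "-".
        anchor_a = a.start if a.strand == "+" else a.end
        anchor_b = b.start if b.strand == "+" else b.end
        shift = (anchor_b - anchor_a) % 3
        if shift == 0:
            return "in-frame overlap (same strand)"
        return f"overprinted, sense, shift {shift}"

    # Opposite strands: any overlap is read in a different frame.
    # Assumed convention: shift 0 means the codon boundaries coincide.
    plus, minus = (a, b) if a.strand == "+" else (b, a)
    shift = (minus.end - plus.start) % 3
    return f"overprinted, antisense, shift {shift}"


if __name__ == "__main__":
    reference = Gene(0, 300, "+")
    print(classify_overlap(reference, Gene(100, 400, "+")))
    print(classify_overlap(reference, Gene(99, 399, "+")))
    print(classify_overlap(reference, Gene(150, 450, "-")))
```

Under this sketch the sense strand yields shifts of 1 or 2 for overprinted pairs (a shift of 0 is an in-frame overlap), while the antisense strand yields 0, 1, or 2, matching the five alternate frames listed above.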
There is nothing about the definition of overprinting that presumes an evolutionary mechanism.
I think you are misreading those papers. Can you show me some examples? Overprinting has been proposed as a mechanism of de novo gene origins, but this doesn’t mean overprinting is always about de novo gene origins. For example, in viruses, overprinting is not a step towards de novo genes, but a stably maintained feature of the genome.
Of course maybe you did find some literature that uses the term your way. If that is the case, it would be great to know. There is another body of literature that is using it the way I’m describing. I’d be curious to understand the relationship between those two bodies of literature.
From my personal experience working with adenovirus genomes, you can’t increase the adenovirus genome by more than about 5% without running into capsid packaging problems (i.e. too much DNA to fit into the viral particle). Overlapping ORFs are found in adenoviruses. Is this one of the conditions you are alluding to?
Absolutely. There are more!
I disagree with most of your claims here and I am somewhat frustrated by your approach. I don’t think that this conversation is living up to your aims for Peaceful Science.
Wikipedia does seem to use “overprinting” as meaning an overlapping gene; I acknowledged this as the one point of disagreement. But this is not the standard usage in the literature which I have read - see below on this.
Regarding viruses I said specifically that long antisense overlaps are rare. This is what I have seen in the literature, and what I have been told by experts in it. If you have a bunch of examples to the contrary I would be very happy to see them, as I would like a set to study.
I also said that we don’t know how common overlapping genes are. This is true - genomes are imperfectly annotated, and the rules for doing so regarding overlapping genes seem quite arbitrary - and few overlapping genes have been characterized. The part of the Wikipedia article you quoted doesn’t dispute this.
I said explicitly that overprinting is a mechanism. In light of this, this claim, with the word mechanism in bold, did not make sense to me. “You seemed to argue that overprinting implied evolutionary origins, which is not the case. Overprinting is a proposed mechanism of de novo proteins.”
Regarding “overprinting is just the alternative frame subset of overlapping genes”. Perhaps most of the confusion is coming from the fact that I am working with prokaryotic overlapping genes, as you know, so I am not dealing with alternative splicing. However, the overlapping genes literature that I’m aware of only refers to alternative frame overlaps (none of the other classes) as “overlapping genes”.
You make it sound like the literature I am referring to regarding the usage of the term overprinting is highly abnormal. “maybe you did find some literature …” Frankly this comes across as gaslighting, and I don’t think that it is okay. If you go to Google Scholar and type in “overlapping gene overprinting”, from a quick scan it looks as if 8 or more of the top 10 hits use “overprinting” to refer solely to the mechanism.
As a minor point actually related to the science rather than the previous bickering, you’ve suggested that size constraints in viruses explain the number of overlapping genes there. That hypothesis is commonly believed, but see e.g. Brandes & Linial (2016) “Gene overlapping and size constraints in the viral world”
Seems we disagree. Maybe you are right. When you have time, please show some references. I meant and mean no disrespect.
Looking forward to reading this one! Thanks.
I said that if you go to Google Scholar and type in “overlapping genes overprinting”, nearly all of the top hits on this topic use overprinting in the way I have described. It is in numerous paper titles. “Overprinting” refers to the mechanism of gene origin, and “overlapping gene” to the result. I think you should simply accept that what I said about this was correct, and retract your claim that I am misreading the papers. I think that you are basically engaging in gaslighting here, after I have worked for 3 years on this topic, and I find it highly offensive. I don’t claim to be an expert on much, but this is one area where I do know what I am talking about.
What else do you want references for?
From your side, it would be good to see references showing that there are lots of long antisense overlapping genes in viruses, if you disagree with me on this.
Also references using “overlapping genes” to refer to something other than genes in alternative reading frames would be interesting - though again I am quite sure that the standard meaning of “overlapping” is genes in alternative reading frames.
I can promise you this isn’t gaslighting. That’s quite an accusation @Zachary_Ardern. I have no reason to lie to you. Sometimes disagreements are just honest disagreement.