Random Control for DNA Function

T_aquaticus · January 19, 2021, 5:10pm

Larry Moran has a good blog post on scientific controls as they relate to detecting function in the genome. I think Moran has done a great job describing why controls are important, and what those controls could be when looking for function in the human genome. Larry Moran cites a paper by Sean Eddy:

What are your thoughts? Would random DNA sequence have function according to the criteria set out in the original ENCODE paper? Could random DNA sequence have actual function, and if so, what impact would it have on this control?

Rumraket · January 19, 2021, 5:33pm

Yes, definitely.

Of course random unselected DNA could also turn out, merely by chance, to have a useful biological function. That’s pretty much how junk DNA evolves de novo into functional protein-coding genes.

CrisprCAS9 · January 19, 2021, 5:33pm

I’m not sure if I’ve read that paper before or not, but I’ve used that exact thought experiment in conversations before. It does a great job of getting into the nature of biochemical interactions that most people just don’t intuitively understand. The sorts of biochemical interactions ENCODE describes are not directly indicative of function, at least not in any way that is interesting.

Not only would I expect to see such ‘ENCODE-functionality’ from random sequences, I’d be extremely surprised by its absence.

Random sequences could certainly have function in a general sense, in that they could do ‘something’ that we might find interesting or otherwise alter fitness.

I don’t think it would impact the utility of the control in any way we care about. It is still a baseline comparison: On average 1Mb of random DNA has X biochemical interactions and Y ‘actual’ functions compared to W biochemical interactions and Z ‘actual’ functions in 1Mb of some biological sequence.
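The baseline comparison described above can be sketched in a few lines. This is only an illustration, assuming we already have counts of some event (biochemical interactions, say) and the length of each sequence. The function names `per_mb` and `fold_over_random` are made up for this sketch and do not come from ENCODE or any library.

```python
def per_mb(count, length_bp):
    """Events per megabase of sequence."""
    if length_bp <= 0:
        raise ValueError("length_bp must be positive")
    return count / (length_bp / 1_000_000)


def fold_over_random(bio_count, bio_len_bp, rand_count, rand_len_bp):
    """Ratio of the biological rate to the random-control rate.

    A value near 1 means the biological sequence looks like the
    random baseline for this metric; it says nothing about 'actual'
    function by itself.
    """
    rand_rate = per_mb(rand_count, rand_len_bp)
    if rand_rate == 0:
        raise ValueError("random control has no events; ratio is undefined")
    return per_mb(bio_count, bio_len_bp) / rand_rate
```

Normalizing by length first means the random control and the biological sequence do not need to be the same size to be compared.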

evograd · January 19, 2021, 5:59pm

I’m surprised it hasn’t been done already, to be honest. It seems like a relatively easy set of experiments to do these days.

T_aquaticus · January 19, 2021, 6:04pm

Agreed. It also calls into question the many claims ID proponents have made about the rarity of function in sequence space.

T_aquaticus · January 19, 2021, 6:12pm

A good project for a grad student, perhaps?

I would expect upper limits on the length of synthesized DNA fragments, so you would have to clone one or two segments at a time. I would expect an upper limit on BAC size, but I’m sure there are ways around that as well.

T_aquaticus · January 19, 2021, 6:17pm

If we want to use random sequence as a “no function” control then it could be problematic if random sequence has actual function. However, many negative controls in other data sets have real hits, and significant results are often defined as “number of hits over a random control”. Also, if ENCODE would consider almost any random sequence to be 80% functional then it certainly calls their methods into question.
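One common way to make "hits over a random control" concrete is an empirical p-value: score each test sequence, score the random controls the same way, and ask how often the controls do at least as well. The sketch below assumes each sequence has already been reduced to a single numeric score. The names `empirical_p` and `hits_over_control` and the default cutoff are illustrative assumptions, not anything from the ENCODE paper.

```python
def empirical_p(observed, control_scores):
    """Fraction of random-control scores at least as large as `observed`.

    Uses the usual +1 correction so the estimate is never exactly zero.
    """
    if not control_scores:
        raise ValueError("need at least one control score")
    at_least = sum(1 for c in control_scores if c >= observed)
    return (at_least + 1) / (len(control_scores) + 1)


def hits_over_control(test_scores, control_scores, alpha=0.05):
    """Test scores whose empirical p-value is at or below `alpha`."""
    return [s for s in test_scores
            if empirical_p(s, control_scores) <= alpha]
```

Under this scheme a control that itself contains real hits does not break anything. It just raises the bar that a test sequence has to clear.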

colewd · January 19, 2021, 6:44pm

The challenge here is to look at function during all phases of a eukaryotic animal’s life span, starting from embryonic development. Transcriptional activity differs between phases, from early embryonic development through to the adult animal.

T_aquaticus · January 19, 2021, 6:47pm

That’s not a challenge, especially if you use a model organism like mice. The larger challenge is covering all tissue types within each age range.

Let’s say we do as you suggest with our random chunk of DNA and get good coverage across tissue types. We will also use the methods and criteria set out in the ENCODE paper. We find that 80% of the random DNA sequence has function according to those criteria. Would this indicate that function is easy to find in DNA sequence?

Rumraket · January 19, 2021, 7:37pm

It actually has:

In our paper in this week’s PNAS (http://dx.doi.org/10.1073/pnas.1307449110), we take a stab at answering this question with one of the largest sets of randomly generated DNA sequences ever included in an experimental test of function. We tested 1,300 randomly generated DNAs (more than 100 kb total) for regulatory activity. It turns out that most of those random DNA sequences are active. Conclusion: distinguishing function from non-function is very difficult.

To test DNA for function, we used a new technique to measure whether a piece of DNA can regulate a downstream gene (a barcoded DsRed reporter gene). One way to define functional DNA in the context of this experiment is ‘any piece of DNA that reproducibly regulates the reporter gene.’

We tested about 2,000 native sequences from the genome (more about that in my next post), and, as a negative control, we also tested random DNAs, DNAs created by scrambling the sequences of genomic DNA.
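For anyone wondering what "scrambling" a genomic sequence might look like in practice, here is a minimal sketch. It assumes the simplest scheme, shuffling individual bases so that length and base composition are preserved. The quoted post does not say which scrambling method the authors actually used, so treat this as an illustration only; `scramble` is a made-up name.

```python
import random


def scramble(seq, rng=None):
    """Return `seq` with its bases in random order.

    Length and mononucleotide composition are preserved. Higher-order
    structure (dinucleotide frequencies, motifs) is not.
    """
    rng = rng or random.Random()
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


# Passing a seeded generator makes the control reproducible.
control = scramble("ACGTACGTAAGG", random.Random(0))
```

Keeping the base composition fixed matters because GC content alone can change how a sequence behaves in an assay, so a control with different composition would not be a fair baseline.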

CrisprCAS9 · January 19, 2021, 8:24pm

As do the numerous random sequence experiments with both RNA and peptides. Sequence space is lousy with function, if you use a sufficiently broad definition of ‘function’. Of course, ID proponents insist on using the narrowest possible definition when discussing random sequence experiments and the broadest possible when discussing junk DNA, because otherwise their entire argument would fall apart.

Either a region does something important enough that it will show up on GWAS, or what it does is too unimportant to be relevant to any discussion of ID. And guess what: We already have the answer! Most of the genome is non-functional at the level relevant to ID.

CrisprCAS9 · January 19, 2021, 8:24pm

We don’t want to use them as ‘no function’ controls, because ‘function’ is nearly impossible to define in a way that is universally useful. We want to use them as exactly what they are: a random control. Or, as your original reference calls it, a ‘noise control’. In any given noisy system some of the noise will, by chance, be ‘meaningful’ by whatever standard is relevant for the system in question. The question is: Does the sequence of interest differ quantitatively or qualitatively from a random sequence in some way, be it biochemical interactions or any other definable metric?

Exactly so.

I think we are well past the point of ENCODE’s methods for ‘functional’ annotation being called ‘into question’. But then, that is primarily a problem of the immense difficulty of defining ‘function’ in the first place.

colewd · January 19, 2021, 8:24pm

That would be very interesting. Also test the function in the mouse stem cells as this is the starting point before differentiation.

It does not help a blind and unguided claim if that is your objective. Your starting point is a living organism.

T_aquaticus · January 19, 2021, 8:26pm

Huh? We are talking about the evolution of new functional sequence in already living species. Of course our starting point is a living organism.

CrisprCAS9 · January 19, 2021, 9:45pm

You don’t think the development of novel functionality from random sequences is supportive of evolution? What?

