Random Control for DNA Function

Larry Moran has a good blog post on scientific controls as they relate to detecting function in the genome. I think Moran has done a great job describing why controls are important, and what those controls could be when looking for function in the human genome. Larry Moran cites a paper by Sean Eddy:

What are your thoughts? Would random DNA sequence have function according to the criteria set out in the original ENCODE paper? Could random DNA sequence have actual function, and if so, what impact would it have on this control?

Yes, definitely.

Of course random unselected DNA could also turn out, merely by chance, to have a useful biological function. That’s pretty much how junk-DNA evolves de novo into functional protein coding genes.

I’m not sure if I’ve read that paper before or not, but I’ve used that exact thought experiment in conversations before. It does a great job of getting into the nature of biochemical interactions that most people just don’t intuitively understand. The sorts of biochemical interactions ENCODE describes are not directly indicative of function, at least not in any way that is interesting.

Not only would I expect to see such ‘ENCODE-functionality’ from random sequences, I’d be extremely surprised by its absence.

Random sequences could certainly have function in a general sense, in that they could do ‘something’ that we might find interesting or otherwise alter fitness.

I don’t think it would impact the utility of the control in any way we care about. It is still a baseline comparison: On average 1Mb of random DNA has X biochemical interactions and Y ‘actual’ functions compared to W biochemical interactions and Z ‘actual’ functions in 1Mb of some biological sequence.
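To make the baseline comparison concrete, here is a minimal sketch of how such counts might be normalized and compared. The function names and the numbers are made up for illustration; nothing here comes from ENCODE or any real data set.

```python
def per_mb(count: int, length_bp: int) -> float:
    """Events (interactions or 'actual' functions) per megabase of sequence."""
    return count / (length_bp / 1_000_000)


def fold_over_random(bio_count: int, bio_len_bp: int,
                     rand_count: int, rand_len_bp: int) -> float:
    """Ratio of the biological rate to the random-control rate."""
    rand_rate = per_mb(rand_count, rand_len_bp)
    if rand_rate == 0:
        raise ValueError("random control has no events; the ratio is undefined")
    return per_mb(bio_count, bio_len_bp) / rand_rate


# Hypothetical counts: W = 120 interactions in 1 Mb of biological sequence,
# X = 80 interactions in 1 Mb of random sequence.
fold = fold_over_random(bio_count=120, bio_len_bp=1_000_000,
                        rand_count=80, rand_len_bp=1_000_000)
print(fold)
```

A ratio near 1 would suggest the biological sequence is indistinguishable from noise by that metric, which is the whole point of having the baseline.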

I’m surprised it hasn’t been done already, to be honest. It seems like a relatively easy set of experiments to do these days.

Agreed. It also calls into question the many claims ID proponents have made about the rarity of function in sequence space.

A good project for a grad student, perhaps?

I would expect upper limits on the length of synthesized DNA fragments, so you would have to clone one or two segments at a time. I would expect an upper limit on BAC size, but I’m sure there are ways around that as well.

If we want to use random sequence as a “no function” control then it could be problematic if random sequence has actual function. However, many negative controls in other data sets have real hits, and significant results are often defined as “number of hits over a random control”. Also, if ENCODE would consider almost any random sequence to be 80% functional then it certainly calls their methods into question.
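One common way to express "hits over a random control" is an empirical p-value: run the assay on many random sequences and ask how often they score at least as high as the sequence of interest. The sketch below assumes that setup; the function name and the hit counts are hypothetical, not taken from any study.

```python
def empirical_p_value(observed_hits: int, control_hits: list[int]) -> float:
    """Fraction of random controls with at least as many hits as observed.

    Uses the usual +1 correction so the estimate is never exactly zero.
    """
    if not control_hits:
        raise ValueError("need at least one random control")
    at_least = sum(1 for hits in control_hits if hits >= observed_hits)
    return (1 + at_least) / (1 + len(control_hits))


# Hypothetical hit counts from ten random-sequence controls.
controls = [70, 82, 75, 90, 78, 85, 80, 77, 88, 79]

p_high = empirical_p_value(95, controls)  # exceeds every control
p_mid = empirical_p_value(80, controls)   # sits in the middle of the controls
print(p_high, p_mid)
```

Note that this works even when the random controls have real hits. The control does not need to be hit-free, it only needs to define what "by chance" looks like.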

The challenge here is to look at function across all phases of an animal’s life span, starting from embryonic development. Transcriptional activity differs between phases, from early embryonic development to the adult animal.

That’s not a challenge, especially if you use a model organism like mice. The larger challenge is covering all tissue types within each age range.

Let’s say we do as you suggest with our random chunk of DNA and get good coverage across tissue types. We will also use the methods and criteria set out in the ENCODE paper. We find that 80% of the random DNA sequence has function according to those criteria. Would this indicate that function is easy to find in DNA sequence?

It actually has:

In our paper in this week’s PNAS (http://dx.doi.org/10.1073/pnas.1307449110), we take a stab at answering this question with one of the largest sets of randomly generated DNA sequences ever included in an experimental test of function. We tested 1,300 randomly generated DNAs (more than 100 kb total) for regulatory activity. It turns out that most of those random DNA sequences are active. Conclusion: distinguishing function from non-function is very difficult.

To test DNA for function, we used a new technique to measure whether a piece of DNA can regulate a downstream gene (a barcoded DsRed reporter gene). One way to define functional DNA in the context of this experiment is ‘any piece of DNA that reproducibly regulates the reporter gene.’

We tested about 2,000 native sequences from the genome (more about that in my next post), and, as a negative control, we also tested random DNAs, DNAs created by scrambling the sequences of genomic DNA.
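For anyone wondering what "scrambling" a genomic sequence might look like in practice, here is a minimal sketch. The excerpt does not say how the authors scrambled their sequences, so this is an assumption: a simple mononucleotide shuffle, which keeps the base composition (and so the GC content) but does not preserve dinucleotide frequencies. The example sequence is made up.

```python
import random
from collections import Counter


def scramble(seq: str, rng: random.Random) -> str:
    """Return a random permutation of the bases in seq."""
    bases = list(seq)
    rng.shuffle(bases)  # in-place shuffle using the supplied generator
    return "".join(bases)


# Made-up 28 bp "genomic" sequence, for illustration only.
genomic = "ATGCGTACGTTAGCCGATAGCTAGGCTA"

rng = random.Random(0)  # fixed seed so the control is reproducible
control = scramble(genomic, rng)

# Same length and base composition, different order.
print(Counter(genomic) == Counter(control))
```

Keeping the composition fixed means any difference in activity between the native and scrambled sequence has to come from the order of the bases rather than, say, GC content.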

As do the numerous random sequence experiments with both RNA and peptides. Sequence space is lousy with function, if you use a sufficiently broad definition of ‘function’. Of course, ID proponents insist on using the narrowest possible definition when discussing random sequence experiments and the broadest possible when discussing junk DNA, because otherwise their entire argument would fall apart.

Either a region does something important enough to show up in GWAS, or it is too unimportant to be relevant to any discussion of ID. And guess what: We already have the answer! Most of the genome is non-functional at the level relevant to ID.

We don’t want to use them as ‘no function’ controls, because ‘function’ is nearly impossible to define in a way that is universally useful. We want to use them as exactly what they are: random controls. Or, as your original reference calls it, a ‘noise control’. In any given noisy system some of the noise will, by chance, be ‘meaningful’ by whatever standard is relevant for the system in question. The question is: does the sequence of interest differ quantitatively or qualitatively from a random sequence in some way, be it biochemical interactions or any other definable metric?

Exactly so.

I think we are well past the point of ENCODE’s methods for ‘functional’ annotation being called ‘into question’. But then, that is primarily a problem of the immense difficulty of defining ‘function’ in the first place.

That would be very interesting. It would also be worth testing function in mouse stem cells, since they are the starting point before differentiation.

It does not help a blind and unguided claim if that is your objective. Your starting point is a living organism.

Huh? We are talking about the evolution of new functional sequence in already living species. Of course our starting point is a living organism.

You don’t think the development of novel functionality from random sequences is supportive of evolution? What?
