What is Junk DNA?

We have a genome, and so does every other living organism on Earth. Some regions of the genome carry information as specific nucleotide sequences that are transcribed and translated into functional proteins. Other regions are transcribed to produce functional RNA molecules. The collection of DNA regions encoding functional proteins or RNA is called coding DNA. The rest of the genome, which doesn’t code for functional proteins or RNA, is called noncoding DNA.

Some regions of noncoding DNA are functional, such as regulatory sequences (which control the rate at which coding DNA regions are transcribed), many introns, telomeres and so on.

Other regions of noncoding DNA are nonfunctional and are called junk DNA. Examples include the GULO pseudogene (along with a huge number of other pseudogenes) and a lot of introns.

Mutation and recombination produce both coding and noncoding DNA. For example, a gene duplicate might go on to acquire a new function and sequence composition (becoming a novel coding DNA region), or it may suffer mutational changes that break its original function (becoming noncoding and nonfunctional).
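As a rough illustration of how a gene duplicate can be broken by mutation, the sketch below applies a single-nucleotide change to a toy coding sequence and checks whether a premature in-frame stop codon appears. The sequence, the mutation and the function names (first_stop_codon, point_mutate) are all made up for illustration; real pseudogene annotation relies on far more evidence than this one check.

```python
# The three stop codons of the standard genetic code (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_codon(cds):
    """Return the 0-based codon index of the first in-frame stop, or None."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def point_mutate(cds, position, new_base):
    """Return a copy of cds with the base at `position` replaced."""
    return cds[:position] + new_base + cds[position + 1:]


# Hypothetical toy reading frame: ATG AAA TGG CAG GGT TAA
original = "ATGAAATGGCAGGGTTAA"

# A single G -> A change in the duplicate turns codon TGG into the stop TGA.
duplicate = point_mutate(original, 8, "A")

print(first_stop_codon(original))   # stop only at the final codon
print(first_stop_codon(duplicate))  # stop now appears mid-sequence
```

In this toy case the duplicate keeps its length, but translation would now stop three codons early, which is one simple way a copy can lose its original function.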

How much of DNA is junk (noncoding, nonfunctional DNA)? The answer depends on a lot of factors, not least which organism you look at. I recently conducted a basic comparative genome analysis of Mycobacterium tuberculosis (which causes tuberculosis) and Mycobacterium leprae (which causes leprosy) to look for the genetic basis of some of the marked differences between the two sister “species”. Interestingly, relative to M. leprae, M. tuberculosis has ~6 pseudogenes and over 3000 functional genes. However, relative to M. tuberculosis, M. leprae has over 1000 pseudogenes. In fact, >50% of the genome of M. leprae is made up of pseudogenes, insertion sequences and other such sequences. This isn’t new and has been known for a long time, but it beautifully explains one annoying characteristic of M. leprae: it is very difficult to culture in the lab (in contrast to M. tuberculosis, which is easy to culture in the lab).
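A comparison like this boils down to tallying annotated features by type. Here is a minimal sketch of that tally, assuming the annotation has already been reduced to (name, type) pairs; the records below are hypothetical placeholders, not real M. leprae or M. tuberculosis data, and summarize_annotations is an illustrative name rather than a function from any library.

```python
from collections import Counter


def summarize_annotations(features):
    """Count features by type and return (counts, pseudogene fraction).

    The fraction is pseudogenes / (functional genes + pseudogenes).
    """
    counts = Counter(kind for _name, kind in features)
    genes = counts["gene"] + counts["pseudogene"]
    fraction = counts["pseudogene"] / genes if genes else 0.0
    return counts, fraction


# Hypothetical toy records, NOT real annotation data.
toy_features = [
    ("geneA", "gene"),
    ("geneB", "pseudogene"),
    ("geneC", "pseudogene"),
    ("geneD", "gene"),
    ("IS_1", "insertion_sequence"),
]

counts, fraction = summarize_annotations(toy_features)
print(counts["gene"], counts["pseudogene"], fraction)
```

Running the same tally on the annotation of each species would give the kind of side-by-side pseudogene counts quoted above.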

You see, many genes encoding proteins needed to catabolize carbon sources like galactose (which can form part of the nutrient content of agar media) have become pseudogenized in M. leprae. They are now useless pieces of DNA, but their orthologs are still functional in M. tuberculosis. This explains why M. leprae is seriously difficult to grow in the lab: it has lost, on a massive scale, the genetic material needed to utilize the nutrients in agar media. This massive pseudogenization (and the accompanying genome downsizing) most likely reflects its transition from a free-living to a parasitic lifestyle (reductive evolution). I think the same thing happened to the great apes when we lost the need to make vitamin C endogenously, thanks to an abundant supply of vitamin C-rich foods, leaving GULO behind as a pseudogene. Note that some of these pseudogenes in M. leprae are conserved across multiple Mycobacterium species, which suggests they are functional (and so are most likely not junk).

In humans, noncoding DNA makes up most of our genome. Some of it is functional and some isn’t, and there is good reason to think a huge chunk of our noncoding DNA is nonfunctional. Anyone who thinks otherwise should be able to pass the onion test or the amoeba test with flying colours.