Absence of sequences

Not sure what to make of this, but the topic strays close to matters such as junk DNA, function, neutral evolution, and the like.

The abstract:

Minimal absent words (MAWs) are minimal-length oligomers absent from a genome or proteome. Although some artificially synthesized MAWs have deleterious effects, there is still a lack of a strategy for the classification of non-occurring sequences as potentially malicious or benign. In this work, by using Markovian models with multiple-testing correction, we reveal significant absent oligomers, which are statistically expected to exist. This suggests that their absence is due to negative selection. We survey genomes and proteomes covering the diversity of life and find thousands of significant absent sequences. Common significant MAWs are often mono- or dinucleotide tracts, or palindromic. Significant viral MAWs are often restriction sites and may indicate unknown restriction motifs. Surprisingly, significant mammal genome MAWs are often present, but rare, in other mammals, suggesting that they are suppressed but not completely forbidden. Significant human MAWs are frequently present in prokaryotes, suggesting immune function, but rarely present in human viruses, indicating viral mimicry of the host. More than one-fourth of human proteins are one substitution away from containing a significant MAW, with the majority of replacements being predicted harmful. We provide a web-based, interactive database of significant MAWs across genomes and proteomes.
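To make the definition concrete: a word is a minimal absent word if it does not occur in the sequence, but both its longest proper prefix and its longest proper suffix do. Here is a brute-force sketch of that definition for toy inputs. The function name `minimal_absent_words` and its parameters are made up for illustration, it looks at one strand only, and it is not the paper's method (real genomes need far more efficient index structures).

```python
def minimal_absent_words(seq, max_len, alphabet="ACGT"):
    """Brute-force list of minimal absent words of length 1..max_len.

    Hypothetical helper for illustration only: a word w is reported when
    w is absent from seq while both w[:-1] and w[1:] are present.
    """
    maws = []
    for k in range(1, max_len + 1):
        # All substrings of length k and k-1 that actually occur in seq.
        present_k = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        present_km1 = {seq[i:i + k - 1] for i in range(len(seq) - k + 2)}
        # Extend every present (k-1)-mer by one letter, so the prefix
        # condition holds by construction; then check absence and suffix.
        for prefix in present_km1:
            for letter in alphabet:
                word = prefix + letter
                if word not in present_k and word[1:] in present_km1:
                    maws.append(word)
    return sorted(maws)


# Toy example on a two-letter alphabet.
print(minimal_absent_words("AAC", 3, alphabet="AC"))
```

In the toy example, "CA" and "CC" are absent although "C" and "A" both occur, and "AAA" is absent although "AA" occurs, so those three words are reported.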



How cool! It’s the bullet-holes-in-the-planes story.


Well, from my superficial reading so far, it does make sense that there would be disallowed sequences of DNA or amino acids, either because they have properties that make them interact deleteriously with other cellular processes (and hence are selected against), or because they resemble pathogen sequences recognized by the immune system. Very fascinating.


Could someone list a few of the forbidden oligomers in humans? Curious.

Relevant table from the paper:

Their database:


Table 3.

List of significant genomic MAWs in (A) Homo sapiens, (B) Mus musculus and (C) Pan troglodytes. List of significant peptide MAWs in (D) H. sapiens and (E) M. musculus.

MAW q-value Correction method
A Genomic MAWs (Homo sapiens)
TATTATGCGCG 1.61e-05 Bonferroni
TTTCGCGAAATT 1.64e-05 Bonferroni
AATTTCGCGAAA 2.04e-05 Bonferroni
CGCGCATAATA 2.12e-05 Bonferroni
AAATTGGCGCAGG 8.30e-04 Bonferroni
CCTGCGCCAATTT 8.94e-04 Bonferroni
GGCGATTTTTGGG 9.98e-04 Bonferroni
TTTGGGCGCAACA 1.12e-03 Bonferroni
TGTTGCGCCCAAA 1.71e-03 Bonferroni
ATTTTTTACGGGC 1.96e-03 Bonferroni
ATTTCGCGAAAT 2.39e-03 Bonferroni
CCCAAAAATCGCC 2.92e-03 Bonferroni
GCCCGTAAAAAAT 4.36e-03 Bonferroni
D Peptide MAWs (Homo sapiens)
TVAER 4.40e-05 Tarone
EQAVP 0.000102 Tarone
TVIEL 0.001373 Tarone
AKITL 0.001424 Tarone
ATPAD 0.001916 Tarone
DLKQV 0.002061 Tarone
ALQVI 0.002852 Tarone
VDEAR 0.004138 Tarone
LVVPR 0.004417 Tarone
ELFGV 0.005102 Tarone
PTILA 0.006082 Tarone
NGLGV 0.006793 Tarone
INESS 0.007286 Tarone
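For a feel of where numbers like the q-values above could come from: the abstract mentions Markovian models with multiple-testing correction. The sketch below is only a guess at that style of calculation, assuming a maximal-order Markov estimate of the expected count, a Poisson approximation for the chance of seeing zero occurrences, and Bonferroni correction. All function names are hypothetical, and the paper's actual model (and the Tarone procedure used for peptides) may well differ.

```python
import math


def count_occurrences(seq, word):
    """Count overlapping occurrences of word in seq."""
    k = len(word)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == word)


def expected_count(seq, word):
    """Maximal-order Markov estimate: N(prefix) * N(suffix) / N(core).

    Assumption for illustration; intended for words of length >= 3.
    """
    prefix, suffix, core = word[:-1], word[1:], word[1:-1]
    n_core = count_occurrences(seq, core)
    if n_core == 0:
        return 0.0
    return count_occurrences(seq, prefix) * count_occurrences(seq, suffix) / n_core


def absence_p_value(expected):
    """Poisson approximation of P(count == 0) given the expected count."""
    return math.exp(-expected)


def bonferroni(p_value, n_tests):
    """Bonferroni-corrected p-value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Toy usage: how surprising would the absence of a word be?
toy_seq = "ACGACGACGTTACGAC"
e = expected_count(toy_seq, "GTA")
print(e, bonferroni(absence_p_value(e), n_tests=64))
```

The idea is that a word is only interesting when its pieces are common enough that it should have shown up by chance; a word whose expected count is near zero gets a p-value near 1 and is not flagged.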

Thinking out loud here.

Genetic entropy is quite a big concept for creationists: they argue that genomes inevitably decline and degrade until the accumulated load becomes catastrophic and the species goes extinct.

Perhaps these MAWs can provide a quantitative, objective measure of genetic entropy.

If genetic entropy is true, the sequences that are currently significant MAWs should accumulate unstoppably in genomes over time.

If there is no such inevitable accumulation, then we can say that natural selection is sufficient to keep the genome clean enough for life to continue, refuting the concept of genetic entropy and catastrophe.


Nah, I don’t think that’s a good argument against GE (genetic entropy). From what I have seen of the (at least somewhat competent) proponents of GE, they haven’t argued that mutations with deleterious effects large enough to be visible to selection do not occur.

Rather, they have been saying that there is a class of mutations with fitness effects so tiny that they are effectively invisible to selection, and that deleterious mutations in this class make up an overwhelming majority of all mutations that occur.

The GE proponent is likely then to just respond that the MAW sequences identified simply have large-enough deleterious effects to be visible to selection, and hence their non-accumulation in genomes isn’t evidence against GE.


That’s really surprising.

I notice that two of the DNA MAWs are almost identical, but they’re the only ones with TTTCGCGAAA. Does that mean that, say, CTTTCGCGAAAG does occur? It would be interesting to see a frequency distribution of oligomers that do occur but have similar though not identical motifs to the observed MAWs.

Are these MAWs all leading strand sequences? Do their reverse complements also not occur?
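The reverse-complement question can at least be checked against the table excerpt itself. A quick sketch (the list is copied from panel A as quoted in this thread; the helper names are mine, not from the paper or its database):

```python
# Genomic MAWs for Homo sapiens as quoted from Table 3, panel A.
HUMAN_GENOMIC_MAWS = [
    "TATTATGCGCG", "TTTCGCGAAATT", "AATTTCGCGAAA", "CGCGCATAATA",
    "AAATTGGCGCAGG", "CCTGCGCCAATTT", "GGCGATTTTTGGG", "TTTGGGCGCAACA",
    "TGTTGCGCCCAAA", "ATTTTTTACGGGC", "ATTTCGCGAAAT", "CCCAAAAATCGCC",
    "GCCCGTAAAAAAT",
]

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]


def revcomp_partners(words):
    """Map each word to True if its reverse complement is also in words."""
    word_set = set(words)
    return {w: reverse_complement(w) in word_set for w in words}


partners = revcomp_partners(HUMAN_GENOMIC_MAWS)
print(all(partners.values()))
print([w for w in HUMAN_GENOMIC_MAWS if reverse_complement(w) == w])
```

Run on the thirteen entries quoted above, every one has its reverse complement in the same list, and ATTTCGCGAAAT is its own reverse complement. That only says something about the list, though; whether the analysis was done on one strand or both is a question for the paper itself.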

I think that @Witchdoc’s point was that if GE is a scientific hypothesis, it should be making empirical predictions; we all know that they won’t do that and that they can concoct endless excuses for what is known.
