Inferring indel divergence patterns

Hello everyone. One of the most striking results of the chimp genome analysis is that single-nucleotide differences between humans and chimps match the substitution patterns observed within the human species.
In the same vein, inferred indels for the human-chimp divergence are also reported to follow the mutational patterns observed between populations: indels are rarer than substitutions, and short indels are more frequent than long ones [1].
However, sequence alignment algorithms seem to favor exactly those results for indels, because opening a gap is penalized more heavily than a substitution, and long gaps are penalized more heavily than short ones. That raises the following question, which I have been asking myself for a while: how can we know that the aforementioned indel patterns aren’t an artefact of alignment methods?

I’d say that we either try different gap penalties and rerun the alignments or trust the author to have done that, at least for representative samples.
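To make the "try different gap penalties" idea concrete, here is a toy sketch rather than anything a real genome pipeline would use. It is a plain Needleman-Wunsch global aligner with a single linear gap penalty (an assumption on my part; real aligners typically use separate gap-open and gap-extend costs), and the sequences, scores, and function names are all made up for illustration.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty; returns the two gapped strings."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # pair a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def count_gap_columns(aligned_a, aligned_b):
    """Number of alignment columns that contain a gap in either sequence."""
    return sum(x == "-" or y == "-" for x, y in zip(aligned_a, aligned_b))


# Hypothetical toy sequences: the second is the first shifted by one base.
seq1, seq2 = "ACGTACGT", "CGTACGTA"
for gap_penalty in (-1, -10):
    al1, al2 = needleman_wunsch(seq1, seq2, gap=gap_penalty)
    print(gap_penalty, al1, al2, count_gap_columns(al1, al2))
```

With these toy inputs, the mild penalty explains the difference with two single-base gaps, while the harsh penalty prefers eight mismatches and no gaps at all. Same sequences, different inferred indel count, which is exactly why rerunning with several penalty settings is informative.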

Not a geneticist but this seems pretty simple.

The sequence of a particular species' genome (or of a sub-population within a species) is usually the product of sequencing one or more individuals from that population, not of alignment methods.

So you sequence multiple individuals, then compare their sequences to see where there are indels. If you’ve sequenced three different people, you can count the indels between them.

Then you do the same for another species (or another sub-population of the same species) and see where there are indels. After that you can do cross-species (or between-sub-population) comparisons to see how they compare, e.g. whether the indel frequencies are the same.
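As a rough sketch of the counting step, assuming the two sequences have already been lined up column by column with "-" marking missing bases (the function name and the example strings are hypothetical):

```python
from collections import Counter
from itertools import groupby


def indel_lengths(aligned_a, aligned_b):
    """Return the length of each indel (run of gap columns) in a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    labels = []
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" and y != "-":
            labels.append("gap_in_a")
        elif y == "-" and x != "-":
            labels.append("gap_in_b")
        else:
            # Both bases present (or both missing): not an indel column.
            labels.append("pair")
    # Consecutive columns with the same gap label form one indel event.
    return [len(list(run)) for label, run in groupby(labels) if label != "pair"]


lengths = indel_lengths("ACGT--ACGTA", "ACGTTTAC-TA")
print(lengths)           # [2, 1]
print(Counter(lengths))  # one indel of length 2, one of length 1
```

Running this per pair of individuals and then per pair of species would give the indel counts and length distributions to compare.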

What am I missing?


I haven’t (yet) read this article from last year, but it’s probably the best place to start with your question:

https://academic.oup.com/mbe/article/41/9/msae177/7739074

