Hi all,
Sorry to start yet another new thread, but I have a question about nested hierarchies and bootstrap values.
I created a simple program that generates DNA sequences via an evolutionary algorithm, where each generation, every sequence splits into two ‘daughter’ sequences that gain several mutations. I then input the sequences into FastME, a program that infers phylogenetic trees from sequence data (Lefort et al. 2015).
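For anyone who wants to reproduce the setup, here is a minimal sketch of the kind of generator described above. It is not my exact program: the function names, the number of generations, the mutations per daughter, and the seed are all illustrative assumptions.

```python
import random

BASES = "ACGT"

def mutate(seq, n_mutations, rng):
    """Return a copy of seq after n_mutations point substitutions.

    Sites are drawn independently, so the same site can be hit more than once.
    """
    seq = list(seq)
    for _ in range(n_mutations):
        pos = rng.randrange(len(seq))
        # Substitute with a different base so each event changes the site.
        seq[pos] = rng.choice([b for b in BASES if b != seq[pos]])
    return "".join(seq)

def evolve(length=100, generations=3, n_mutations=5, seed=0):
    """Simulate a fully bifurcating history; return {lineage_label: sequence}.

    All parameter values are assumptions chosen for illustration.
    """
    rng = random.Random(seed)
    root = "".join(rng.choice(BASES) for _ in range(length))
    population = {"": root}
    for _ in range(generations):
        next_gen = {}
        for label, seq in population.items():
            # Each sequence splits into two daughters that mutate independently.
            next_gen[label + "0"] = mutate(seq, n_mutations, rng)
            next_gen[label + "1"] = mutate(seq, n_mutations, rng)
        population = next_gen
    return population
```

The labels record the true history: two leaves that share a longer label prefix share a more recent common ancestor, which gives a known tree to compare the inferred one against.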
I also created a different program that generates random sequences, and input those into FastME as well.
I did this to see for myself how accurate phylogenetic algorithms are, and whether they could correctly identify the true relationships among the descendant sequences. (The random sequences were basically meant to be a control group.) This was partially inspired by Holloway’s claim of “the fallacy of the phylogenetic signal,” since he suggested that non-evolutionarily produced sequences have just as much phylogenetic signal as evolutionarily produced sequences, and I wanted to check that claim.
What I found was, unsurprisingly, that the random sequences had no discernible phylogenetic signal, and the resulting tree had bootstrap values of 0 at every single node. Furthermore, FastME was able to correctly reconstruct the relationships between the ‘descendant’ sequences produced by the evolutionary algorithm, with much higher bootstrap values than the tree produced from the random sequences.
However, despite the fact that FastME was able to reconstruct the correct relationships between the ‘descendant’ sequences every time, some of the bootstrap values were very low – even zero. For example, here are two of the trees produced by FastME:
I thought that maybe the problem was just that there weren’t enough characters (the sequences were only 100 nucleotides long), so I then generated sequences that were 10,000 nucleotides long. But I still had the same problem:
[Image: FastME tree for the 10,000-nucleotide sequences, drawn with ATGC PRESTO]
As you can see, although the bootstrap values were higher overall, some nodes still had bootstrap values of zero.
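For reference, this is my understanding of what a bootstrap value measures, as a rough sketch. It is the standard column-resampling idea, not FastME's actual code, and the function names and the tree representation (a set of clades per tree) are assumptions made for illustration.

```python
import random

def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement.

    alignment is {name: sequence}, with all sequences the same length.
    """
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols)
            for name, seq in alignment.items()}

def support(clade, replicate_clades):
    """Fraction of replicate trees that contain the given clade.

    Each replicate tree is represented as a set of frozensets of leaf names.
    """
    hits = sum(clade in clades for clades in replicate_clades)
    return hits / len(replicate_clades)
```

Under this definition, a value of zero at a node would mean that none of the trees inferred from the resampled alignments contained that exact grouping.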
Can someone explain what this might mean? Does this reflect on the validity of bootstrapping techniques for phylogenetics? Or was there something wrong with my analysis?