Genealogical Adam and Eve software

tdarr-okbu · December 22, 2021, 7:51pm

Hello - I am a professor of computer science at Oklahoma Baptist University and am interested in introducing my students to the simulation in the Genealogical Adam and Eve book. I’d rather not reinvent the wheel.

Is the software available? I did a search in several places (including this forum), but did not see anything.

Regards,

Jordan · December 22, 2021, 10:05pm

That would be very interesting indeed. I’ve not seen any code for the simulation out in the wild.

Chris_Falter · December 22, 2021, 10:12pm

@swamidass - This would be a question you might be able to answer.

@tdarr-okbu - Great question! A student team could learn a lot of important computer science concepts, and more importantly, gain the skills to implement them, by writing the appropriate code.

In fact, speaking as a senior lead data scientist at a super-huge consulting firm, I would be impressed to see such a project on the resume of an interviewee for an entry-level data scientist position, and even more impressed to see his/her code in a public GitHub repo.

Pax Christi,
Chris

swamidass · December 23, 2021, 2:06am

That’s really cool! I’d be really curious to see the lesson plans for this. Perhaps we could publish them at PS.

The software from the Nature 2004 paper is not available. I’ve written some software of my own, but it’s not in any shape to share at this time.

My suggestion is to use SLiM.

They have an excellent forward simulator and an active community. While it doesn’t usually track genealogical ancestry, it might be flexible enough to configure for that purpose. I’d check with their discussion group.

The other option is to hold off on a full-fledged simulator and simulate focused questions. For example,

In a well-mixed population of size N, how many generations pass before a randomly chosen individual becomes a universal ancestor?

That’s pretty easy to code up in Python. Here it is in pseudocode:

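import numpy as np

N = 10000               # population size: first half women, second half men
num_generations = 1000  # upper bound on the number of generations to simulate
rng = np.random.default_rng()

generation = np.zeros(N)
generation[0] = 1       # label one individual (index 0, a woman)

for x in range(num_generations):
  # each of the N offspring gets a random father from the N/2 men
  men = generation[N//2:]
  father = rng.integers(0, N//2, size=N)
  father = np.take(men, father)

  # ...and a random mother from the N/2 women
  women = generation[:N//2]
  mother = rng.integers(0, N//2, size=N)
  mother = np.take(women, mother)

  # a label > 0 means at least one parent descends from the labeled individual
  generation = mother + father
  if np.all(generation > 0):
    print("Generations to universal ancestor: ", x+1)
    break

  if np.all(generation == 0):
    print("Generations till extinction: ", x+1)
    break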

I’m calling this pseudocode because I haven’t tested it yet, but something very close to this will work.

You can have them compare that with the theoretical expectation (it is very close). They can also measure the variance between multiple runs (it is exceedingly low). You can also ask what the chance is of a lineage going extinct (very close to the theoretical value). You can also ask how this changes if you change the mating algorithm (usually not much effect at all). You can ask what the distribution of times is for multiple runs (it will look like a truncated and skewed Gaussian).
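A minimal sketch of how those comparisons could be set up, not swamidass's own code. It uses boolean flags instead of summed labels, and the names `simulate_once`, `summarize`, and `poisson2_extinction_probability` are invented for illustration. The "theoretical" extinction value here rests on an assumption: each individual has roughly Poisson(2) children, so the lineage is treated as a branching process whose extinction probability q solves q = exp(2(q - 1)).

```python
import numpy as np


def simulate_once(N, rng, max_generations=1000):
    """Follow one marked woman; return (outcome, generations)."""
    half = N // 2
    marked = np.zeros(N, dtype=bool)
    marked[0] = True  # indices below `half` are women, the rest are men
    for g in range(1, max_generations + 1):
        mothers = rng.integers(0, half, size=N)  # one random mother per child
        fathers = rng.integers(half, N, size=N)  # one random father per child
        marked = marked[mothers] | marked[fathers]
        if marked.all():
            return "universal", g
        if not marked.any():
            return "extinct", g
    return "undecided", max_generations


def summarize(runs, N, seed=0):
    """Repeat the simulation and collect the quantities discussed above."""
    rng = np.random.default_rng(seed)
    results = [simulate_once(N, rng) for _ in range(runs)]
    times = np.array([g for outcome, g in results if outcome == "universal"])
    extinct = sum(outcome == "extinct" for outcome, _ in results)
    return {
        "mean_time": times.mean(),
        "variance": times.var(),
        "extinct_fraction": extinct / runs,
        "times": times,  # raw times, e.g. for a histogram
    }


def poisson2_extinction_probability(iterations=200):
    """Smallest root of q = exp(2 * (q - 1)), by fixed-point iteration from 0."""
    q = 0.0
    for _ in range(iterations):
        q = np.exp(2.0 * (q - 1.0))
    return float(q)


if __name__ == "__main__":
    stats = summarize(runs=200, N=1000)
    print("mean generations:", stats["mean_time"])
    print("variance:", stats["variance"])
    print("extinct fraction:", stats["extinct_fraction"])
    print("branching-process approximation:", poisson2_extinction_probability())
```

Swapping the two `rng.integers` lines for a different parent-drawing rule is one way to try the "change the mating algorithm" question without touching the rest.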

One hint is that computing the most recent universal ancestor of a given population is much harder. That requires you to track every single person in the whole simulation. I do NOT recommend this, as it isn’t the most interesting quantity and it is technically challenging to compute. Instead, the average time to universal ancestry is a better place to focus.

Perhaps @evograd, @davecarlson or @Joe_Felsenstein has some suggestions I don’t know about.

deuteroKJ · December 23, 2021, 4:09am

Welcome @tdarr-okbu! I know the prez and several profs at OBU. Good things going on there!

Joe_Felsenstein · December 23, 2021, 9:25pm

I sort of see how your pseudocode works. It calculates how long until one particular individual among 10,000 starting individuals has everybody in a 10,000-individual population as its genealogical descendant, when each individual has one of the 5,000 females and one of the 5,000 males in the previous generation as its immediate ancestor. (But there might be another one of the initial individuals who does have everyone as one of its descendants, and the code does not check for that.)
Recall that Josh is interested in who is whose ancestor genealogically, not genetically. So they might all be descended from this individual but have no genes whatsoever from that ancestor. Thus the question asked is one for theology or for genealogical right-of-bragging, and not one of any interest to evolutionary biologists. (I speak in my role as a descendant of Charlemagne, Emperor of the Franks).

Chris_Falter · December 29, 2021, 5:10am

Hi Joshua,

I appreciate your efforts with the Python code. You never claimed it was bullet-proof or even tested, so your generosity is what shines through.

The code does have a couple small issues. The men (right half of the array) start out as all zeros, so there’s no Adam. When you apply np.take() with an index of father, the result will always be an array of zeros at each and every position. So no men in this simulation will ever be a descendant.

Perhaps a greater concern is that there are essentially 2 separated populations (male and female) that reproduce asexually. This does not seem entirely realistic.

I have uploaded a Python notebook here that looks basically right, to my eye. Would you (and @Joe_Felsenstein ) agree? It has not been properly optimized; OTOH some of the functions were designed to accommodate more flexible scenarios such as fluctuations in generation size or in male/female ratio, so it might prove useful as a starting point to those who want to go deeper.

Here is the output of 1 simulation:

Initial ancestral couples: 5000
Number of generations until A/E couple emerged: 13
Number of ancestral couples still in gene pool at simulation end: 3934
Number of A/E couples at simulation end: 16
Number of ancestral couples who are ancestors of 90+% of final generation: 1416
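For readers who want to reproduce statistics of this kind, here is a rough sketch. It is not the notebook's code. It assumes a fixed number of monogamous couples, each child choosing one couple at random as parents, and random pairing of daughters with sons. The function name `simulate_couples` and the dictionary keys are made up for this example.

```python
import numpy as np


def simulate_couples(n_couples, rng, max_generations=500):
    """Monogamous sketch: run until some founding couple is everyone's ancestor."""
    n = 2 * n_couples
    # couple_anc[c, f] is True when current couple c descends from founding couple f.
    couple_anc = np.eye(n_couples, dtype=bool)
    for g in range(1, max_generations + 1):
        parents = rng.integers(0, n_couples, size=n)  # each child picks one couple
        child_anc = couple_anc[parents]  # child inherits that couple's founder set
        if child_anc.all(axis=0).any():
            break
        # First half of the children are daughters, second half sons; pair at random.
        # Simplification: siblings are not prevented from pairing.
        sons = child_anc[n_couples:][rng.permutation(n_couples)]
        couple_anc = child_anc[:n_couples] | sons
    share = child_anc.mean(axis=0)  # fraction of final generation descended from each founder couple
    return {
        "generations": g,
        "surviving": int((share > 0).sum()),
        "universal": int(child_anc.all(axis=0).sum()),
        "ancestors_of_90pct": int((share >= 0.9).sum()),
    }


if __name__ == "__main__":
    print(simulate_couples(500, np.random.default_rng(0)))
```

The boolean matrix grows as individuals × founding couples, so 5,000 couples needs tens of megabytes; smaller values are enough to see the pattern.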

As a pop-gen noob, I found the simulation to be quite instructive! I hope others (such as @tdarr-okbu’s students) will find it so as well.

Thanks for kicking off the code rodeo, Joshua!

Grace and peace,
Chris Falter

swamidass · December 29, 2021, 5:14am

You are right that the first generation does not include a man labeled 1. The first labeled individual is always a woman.

However, that woman will often become an ancestor of everyone, including all the men. So all men will become descendants of hers, at least sometimes.

That’s not the case. They are a well-mixed group, with every offspring having one mother and one father. You may have misread the lines that draw the fathers and mothers:

Remember, there are N/2 men, but N fathers. So men are, on average, father of two individuals. The same is true of women. They are, on average, mother of two individuals.

The line `generation = mother + father` adds the labels of the mother and father to give the label of each of the N offspring. So each offspring has one mother and one father. If their label > 0, then they are a descendant of the first marked individual.
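A small illustration of the two points above, using a toy population of 10 so the arrays can be read by eye. This is a sketch with a helper name (`one_generation`) invented for the example; it follows the same draw-then-add logic as the pseudocode.

```python
import numpy as np


def one_generation(generation, rng):
    """Draw parents for N offspring; return their labels and the parent indices."""
    N = len(generation)
    half = N // 2
    women, men = generation[:half], generation[half:]
    mother_idx = rng.integers(0, half, size=N)  # N mothers drawn from N/2 women
    father_idx = rng.integers(0, half, size=N)  # N fathers drawn from N/2 men
    offspring = np.take(women, mother_idx) + np.take(men, father_idx)
    return offspring, mother_idx, father_idx


rng = np.random.default_rng(0)
N = 10
generation = np.zeros(N, dtype=int)
generation[0] = 1  # the marked woman

offspring, mother_idx, father_idx = one_generation(generation, rng)
children_per_man = np.bincount(father_idx, minlength=N // 2)
print("children per man:", children_per_man, "mean:", children_per_man.mean())
print("offspring labels:", offspring)
```

Every offspring gets exactly one entry in `mother_idx` and one in `father_idx`, and in this first generation an offspring's label is positive exactly when its mother is the woman at index 0.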

The output looks about right. It would be great to see a link to a JupyterLab or Google Colab notebook.

Chris_Falter · December 29, 2021, 7:13am

It was. Thanks for helping me understand.

Also, I did intend to share this link:

github.com

chrisfalter/DataScience/blob/master/PopulationGenetics/AeSim.ipynb

{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [],
   "source": [
    "from collections import Counter\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Each individual in a generation is represented as the set of ancestors\n",
    "# from the initial generation. \n",

On a final note, it seems like the mechanisms in your simulation are perhaps a bit obscure, though they yield the correct mathematical result. In a random distribution of genealogical lineage, it is true that we would expect 2 of 10,000 children to have Eve as mother and 2 of 10,000 to have Adam as father. However, Eve’s children and Adam’s children would be identical—the same 2 children. We would not expect Eve’s 2 children to be, for example, at index 97 and index 1245 in the array, while Adam’s are at index 4756 and index 8254. Instead, we would expect Eve’s children and Adam’s children to be the same 2 children at the same 2 indices.

The expected doubling of A and E’s lineage in each generation is in fact due to intermarriage with the descendants of other ancestors. The merging of the genetic trees in each generation is what yields the growth/spread of A and E’s genealogy.
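The doubling argument can be checked with a few lines of arithmetic. This is a deterministic, expectation-only sketch under assumptions of my own choosing: monogamous couples, each child picking one couple at random, and spouses paired at random. It ignores chance entirely, including the possibility that the lineage dies out early. The function name `mean_field_fractions` is hypothetical.

```python
def mean_field_fractions(n_couples, max_generations=100):
    """Expected fraction of couples descended from one founding couple, per generation."""
    p = 1.0 / n_couples  # generation 0: only the founding couple itself
    fractions = [p]
    for _ in range(max_generations):
        # A child is a descendant with probability p (its parents are one random couple).
        # A new couple is a descendant unless both spouses are non-descendants.
        p = 1.0 - (1.0 - p) ** 2
        fractions.append(p)
        if 1.0 - p < 1.0 / (2 * n_couples):  # fewer than one non-descendant expected
            break
    return fractions


fractions = mean_field_fractions(5000)
ratios = [b / a for a, b in zip(fractions, fractions[1:])]
print("early growth ratios:", [round(r, 3) for r in ratios[:5]])
print("generations until saturation:", len(fractions) - 1)
```

While p is small, 1 - (1 - p)^2 = 2p - p^2 is almost exactly 2p, which is the doubling. The p^2 term is the overlap that appears once two descendants start marrying each other.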

The beauty of your book is that it shows how, given normal pop gen dynamics, an A and E from 10 millennia ago are pretty much guaranteed to become the genealogical ancestors of every human by roughly 0 AD.

The goal of my code was to capture and illustrate the pop gen dynamics, in particular the merging of genealogical trees, in a way that makes this perhaps counterintuitive concept easy to grasp. I hope I have succeeded, and I would welcome your feedback, along with that of @Joe_Felsenstein and @tdarr-okbu. And why not feedback from @jammycakes, who is a software professional? I hope I have not left anyone out!

Grace and peace,

Chris

EDIT: Corrected link. HT to @swamidass

swamidass · December 29, 2021, 7:15am

Honestly, this seems perfect for @AndyWalsh to dive into.

This isn’t accessible though…

swamidass · December 29, 2021, 7:21am

I think you meant this link:

github.com

chrisfalter/DataScience/blob/master/PopulationGenetics/AeSim.ipynb

Joe_Felsenstein · December 30, 2021, 1:57am

When I agreed roughly with Josh’s code I was considering it as computing how many descendants there were of Adam, not of the couple. In the classical Wright-Fisher model each offspring has two parents drawn at random from among all parents with replacement. Each such parent is drawn at random from the parent generation, including allowing self-fertilization. If you maintain separate populations of females and males, that is a slightly more complicated variant of the WF model. It does not enforce monogamy. One can also model a population in which there are only monogamous pairs, but then you have to pair up everybody at random before drawing offspring each from a random one of those pairs. That would be the monogamous version of the Wright-Fisher model. If you want to also allow for some lack-of-monogamy, then things get lots messier. So I just asked the descended-from-Adam question, which is what the program seemed to be doing.
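The three variants described above differ only in how each offspring's two parents are drawn. A hedged sketch follows: the function name `draw_parents` and the model labels are invented here, and the two-sex and monogamous versions assume an even population with the first half female and the second half male.

```python
import numpy as np


def draw_parents(n, model, rng):
    """Return two index arrays (one entry per offspring) into a parent generation of size n."""
    half = n // 2
    if model == "classical":
        # Both parents drawn from everyone, with replacement; selfing can occur.
        return rng.integers(0, n, size=n), rng.integers(0, n, size=n)
    if model == "two_sex":
        # Mothers from the first half, fathers from the second half; no monogamy.
        return rng.integers(0, half, size=n), rng.integers(half, n, size=n)
    if model == "monogamous":
        # Pair everyone up at random first, then each offspring picks one random pair.
        husband_of = half + rng.permutation(half)
        mothers = rng.integers(0, half, size=n)
        return mothers, husband_of[mothers]
    raise ValueError(f"unknown model: {model}")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for model in ("classical", "two_sex", "monogamous"):
        a, b = draw_parents(10, model, rng)
        print(model, list(zip(a.tolist(), b.tolist())))
```

In the monogamous case the father is fully determined by the mother, so full siblings always share both parents. In the other two cases half-siblings are common.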

Chris_Falter · December 30, 2021, 1:57am

Yes, that’s right! Thanks for the help; it’s hard to do links from a phone sometimes.

Chris

Chris_Falter · December 30, 2021, 6:25am

Based on your descriptions, what I coded was a monogamous version of the Wright-Fisher model.

tdarr-okbu · December 30, 2021, 9:37pm

@Chris_Falter are you freely sharing this code? I’d like to contribute to this either via your GitHub project or via a project that I will set up under the Oklahoma Baptist University GitHub account.

tdarr-okbu · December 30, 2021, 9:37pm

Great! I am in my second year teaching here and am excited about the future!

tdarr-okbu · December 30, 2021, 9:37pm

swamidass:

You can have them compare that with the theoretical expectation (it is very close). They can also measure the variance between multiple runs (it is exceedingly low). You can also ask what the chance is of a lineage going extinct (very close to the theoretical value). You can also ask how this changes if you change the mating algorithm (usually not much effect at all). You can ask what the distribution of times is for multiple runs (it will look like a truncated and skewed Gaussian).

One hint is that computing the most recent universal ancestor of a given population is much harder. That requires you to track every single person in the whole simulation. I do NOT recommend this, as it isn’t the most interesting quantity and it is technically challenging to compute. Instead, the average time to universal ancestry is a better place to focus.

Excellent suggestions. I will put those on my to-do list.

tdarr-okbu · December 30, 2021, 9:37pm

I will look into this when I have some spare cycles.

tdarr-okbu · December 30, 2021, 9:37pm

I’d be happy to do that. We are still in the initial stages. See some of my prior responses for some ideas about my “vision” for looking at the intersection of computation and theology.

tdarr-okbu · December 30, 2021, 9:37pm

Yes! I am on the lookout for any intersections between computation and theology. There are things called “analytic philosophy” and “analytic theology” … what about a field called “computational theology”?

BTW - what kind of data science are you involved in? The CIS Department at Oklahoma Baptist University is trying to carve out a niche as a data science focused program.
