By Patrick Shaw Stewart, October 2022
The more genes are involved in the mechanism that transfers genetic material between individuals – including genes that usually do something else – the lower the chance that defective DNA polymerase genes with mutator mutations will be transferred.
Mutations must occasionally occur in all genes, including the genes that encode the proteins that replicate genetic material – the nucleic acid polymerases and their associated proteins. Some of these mutations must inevitably reduce the fidelity of DNA replication (or RNA replication in RNA viruses), without actually being fatal. Lineages that replicate with reduced fidelity must therefore exist in all biological kingdoms, although we do not know how common they are. As far as we know, error-prone polymerases have no phenotype other than an increased mutation rate. A fundamental problem at the heart of evolutionary biology is, therefore, to explain how high-fidelity replication originally evolved, and what selective pressures now maintain it. As part of a solution to this problem, I present a simple “Everest hypothesis”. This proposes that natural selection consistently adds unnecessary complexity to the mechanisms that transfer genetic material between individuals. This increases the number of genes involved in genetic transfer, including genes that have other uses (for example, genes participating in the development of intelligent brains). Individuals with defective DNA polymerases are likely to have more mutations in these (and all other) genes, and, since most mutations are deleterious, the chance of transferring genes that encode error-prone polymerases is reduced. (The chance of transferring other defective “housekeeping” genes is also likely to be reduced.) Many sexual organisms therefore choose their partners (I suggest) by monitoring a variety of complex behaviours, physical displays or biochemical mechanisms, often generated by the interaction of many gene products acting in sequence, so that a defect in a single gene can result in failure to generate the sought-after character. Many puzzling biological phenomena among sexual organisms can be explained along these lines.
The migration and spawning of Atlantic salmon and the complex displays of birds of paradise may, for example, be best understood as “tests” to establish whether potential sexual partners are capable of high-fidelity genetic replication. (Other explanations of these phenomena in the scientific literature may be correct but less important.) Animals that have developed physical handicaps that appear to be harmful, such as peacocks, may be special (extreme) cases.
Imagine a woman who announces publicly that she will have sex with any man, but only on the summit of Mount Everest. Moreover, the potential partners must have solved a difficult sudoku puzzle that they pick up on the way up and (so that she can choose quickly) they must write – display – their solutions in large numerals on a banner that they bring along with them. If it were practical this would be a reasonable mating strategy for both partners: both mother and father are likely to have better-than-average genes. I suggest in this essay that many plants and animals use similar strategies: they set up challenging practical “obstacle courses” for potential mates, and may also demand complicated physical displays that can only be generated by the interaction of many genes. These strategies can show that potential partners’ genes are stable before they agree to mate with them. In Part 2 this essay will consider simple entities such as viruses, which need to use a different approach because mate selection is not available to them.
No molecular replication system is error free. The mutation rate in humans has been estimated to be around 2.5 × 10⁻⁸ mutations per nucleotide site per generation [Nachman] and bacteria have similar mutation rates (approximately one error per billion base pairs copied). DNA replication (and RNA replication in RNA viruses) is carried out by proteins, which are themselves encoded by DNA or RNA sequences. There is therefore a chance that mutations will occur in the genes encoding DNA polymerases and their many “helper” proteins. While some of these mutations will have no effect, and some will reduce fidelity to an extent that is fatal, some must inevitably reduce fidelity to a small degree. Note that such mutator mutations must exist, and, of course, they increase the chance of other mutator mutations appearing during development or in future generations. This might result in a slow (or rapid) increase in mutation rates, eventually causing an “error catastrophe” that would kill descendants or prevent their replication. Biological strategies are therefore needed to avoid this fate, so that simple life-forms can persist and complex life-forms can evolve from them. This essay seeks to identify behavioural, physiological and biochemical strategies that can reduce the number of low-fidelity lineages in a population.
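To give a feel for the numbers, the short sketch below multiplies the per-site rate quoted above by a genome size to estimate the number of new mutations per generation. The genome size and the ten-fold "mutator factor" are illustrative assumptions of mine and are not taken from the essay or from Nachman and Crowell.

```python
# Per-site, per-generation mutation rate quoted in the text [Nachman].
MUTATION_RATE_PER_SITE = 2.5e-8

# ASSUMPTION (illustrative only): a round figure for the number of
# nucleotide sites in a diploid human genome.
ASSUMED_DIPLOID_GENOME_SITES = 6e9

# ASSUMPTION (hypothetical): how much a mutator mutation raises the rate.
ASSUMED_MUTATOR_FACTOR = 10.0


def expected_new_mutations(rate_per_site: float,
                           genome_sites: float,
                           mutator_factor: float = 1.0) -> float:
    """Expected number of new mutations per genome per generation."""
    return rate_per_site * mutator_factor * genome_sites


if __name__ == "__main__":
    normal = expected_new_mutations(MUTATION_RATE_PER_SITE,
                                    ASSUMED_DIPLOID_GENOME_SITES)
    mutator = expected_new_mutations(MUTATION_RATE_PER_SITE,
                                     ASSUMED_DIPLOID_GENOME_SITES,
                                     ASSUMED_MUTATOR_FACTOR)
    print(f"normal lineage:  about {normal:.0f} new mutations per generation")
    print(f"mutator lineage: about {mutator:.0f} new mutations per generation")
```

The point of the sketch is only that a modest loss of fidelity scales the whole mutational load in proportion, across every gene in the genome.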
Sexual reproduction is almost ubiquitous – different forms of it are used by virtually all plants and animals, most fungi, and many protists. Some biologists regard sexual reproduction as paradoxical [Otto]. Many theories have been put forward to explain the ubiquity and persistence of sex. Hill and Robertson suggested that sex allows the combination of two or more beneficial mutations in one individual, allowing more effective selection [Hill]. Similarly, two or more deleterious mutations can be combined, accelerating their removal from the population [references in this section are mainly from https://en.wikipedia.org/wiki/Evolution_of_sexual_reproduction ]. Ronald Fisher suggested that sex might allow advantageous genes to escape their genetic surroundings if they happen to arise on a chromosome with deleterious genes. Heng suggested that sex can weed out major genetic changes such as chromosomal rearrangements, but permit minor variations such as nucleotide alterations. A widely-discussed theory, known as the Red Queen Hypothesis, suggests that sexual lineages are better able to resist parasites, because the combination of parasitic resistance alleles of offspring will differ from their parents’ [William D. Hamilton]. Others see sexual reproduction as a DNA repair mechanism. On the other hand, sex has disadvantages. John Maynard Smith pointed out, for example, that an asexual population can grow much faster than a population with two sexes because males do not produce eggs or bear offspring. Serious problems with the conventional explanations of sex have led many biologists to conclude that the benefit of sex is a major unsolved problem in evolutionary biology [so says Wikipedia; reference? Maybe William D. Hamilton again].
I suggest that many of the theories of sex discussed above are important and correct, but that the most fundamental and important aspect has been overlooked. So, to all these explanations I would like to add one more: that sexual reproduction is a very effective way to recombine low-fidelity lineages that have acquired beneficial mutations with lineages that have preserved something that is very valuable but easily lost – high-fidelity replication.
Mate selection in sexual reproduction allows low-fidelity lineages to be avoided
The only phenotype of mutator mutations is (presumably, see below) an increased number of mutations, which appear randomly throughout the genome. How can this phenotype be detected? I suggest that organisms use a variety of approaches, which monitor complex behaviours or structures that are generated by many genes. For example, many finely-tuned gene products must be required to make the feathers of a cock bird of paradise, while other genes generate the complex behaviour to display them effectively. Still other genes allow a female bird of paradise to identify the “correct” feathers and display. Mutations in any of these genes could prevent mating. Some animals have complex features that attract mates but are positive encumbrances, such as the tail feathers of a peacock. Biologists put these features, which seem to be harmful for the species as a whole, down to sexual selection, meaning that they are the result of a self-reinforcing “fashion” among the females – it is said that any particular female can’t easily break away from this harmful fashion because if she produces male offspring without these features they will not be able to find mates. Note, however, that not all birds have exotic feathers – many are plain, with both sexes looking alike (for example blackbirds). However, birds that lack complex plumage often have complex vocalisations, which may serve a similar purpose. Humans are attracted to partners with athleticism, pretty faces (which are close to, but not identical to, average faces [Nature paper]), and intelligence and/or a sense of humour, both of which are the product of an extraordinarily complex organ – the human brain. Other animals go to extraordinary lengths to migrate in order to breed. For example, birds often undertake dangerous migrations to breed in locations where they would be unable to over-winter. 
Atlantic salmon are able to migrate from fresh water to the ocean, and then return, with both sexes undertaking dangerous journeys, including adapting to changing salinity, leaping up waterfalls, avoiding predators, and swimming in shallow water, to return to the streams where they were bred in order to mate. (This really is similar to the fictitious woman who would be willing to mate on the summit of a mountain.) Presumably lineages have appeared in the past that bred in less demanding freshwater or saltwater locations, but, I suggest, they didn’t thrive because they lacked this very effective strategy for eliminating individuals with more mutator mutations and slightly higher mutation rates. A similar argument can be applied to bird migrations. Invertebrates may also use complex features and behaviours to attract mates. For example, fireflies receive and transmit flashed encoded messages to attract mates, while medflies and some spiders perform complex dances. Insects such as cicadas, mayflies and ants lack wings throughout most of their life-times, but grow wings – very complex structures – in order to mate. (I appreciate that wings also allow such insects to disperse themselves, but I suggest that they serve a double purpose – allowing dispersal while also acting as a filter that removes low-fidelity lineages.) Corals monitor water temperatures, light, and the cycles of the moon (or tides) in order to synchronize their spawning.
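The argument of this section can be put in toy quantitative form. The sketch below assumes (my simplification, not a claim from the literature) that a display is produced only if every one of n contributing genes is intact, and that each gene is independently disabled with a small probability that is higher in a mutator lineage. All numerical values are hypothetical.

```python
def prob_display_intact(n_genes: int, p_disabled: float) -> float:
    """Chance that all n_genes are functional, assuming independent genes."""
    if not 0.0 <= p_disabled < 1.0:
        raise ValueError("p_disabled must be in [0, 1)")
    return (1.0 - p_disabled) ** n_genes


def pass_ratio(n_genes: int, p_baseline: float, mutator_factor: float) -> float:
    """How many times more likely a high-fidelity individual is to produce
    the display than an individual from a mutator lineage."""
    p_mutator = p_baseline * mutator_factor
    return (prob_display_intact(n_genes, p_baseline)
            / prob_display_intact(n_genes, p_mutator))


if __name__ == "__main__":
    # HYPOTHETICAL values chosen only to illustrate the trend.
    P_BASELINE = 0.001      # per-gene chance of a disabling mutation
    MUTATOR_FACTOR = 10.0   # fold increase in that chance for mutators
    for n in (10, 100, 1000):
        ratio = pass_ratio(n, P_BASELINE, MUTATOR_FACTOR)
        print(f"{n:5d} genes: high-fidelity advantage = {ratio:.3g}-fold")
```

Under these assumptions the advantage of high-fidelity individuals grows with the number of genes that the display depends on, which is the core of the Everest hypothesis; real genes are of course neither independent nor equally important.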
Zahavi’s handicap principle
In 1975 Amotz Zahavi suggested that characteristics, behaviours and structures in animals that confer handicaps may evolve by sexual selection because they “test the quality” of the animals that possess them [Zahavi]. The characters selected in this way must (according to Zahavi) reduce the fitness of the individuals that are subsequently selected as mates. Like the Everest hypothesis, the handicap principle suggests that sought-after characters are used to advertise the quality of genes whose effects would otherwise be hidden. There are, however, fundamental differences between these two hypotheses, which I have listed in table 1. As Zahavi pointed out, the evolution of these sought-after characters may be explained by more than one hypothesis [Zahavi]. I’m certainly not saying that the handicap principle is wrong: in the example given in the Introduction, above, if a man arrived on the summit of Mount Everest with a bunch of flowers, he would be applying the handicap principle. It might work!
|Zahavi’s handicap principle||The Everest hypothesis|
|The principle proposes evaluation by individuals of an indirect signal: mates that squander scarce resources show that they are of good quality and are selected.||The principle proposes direct evaluation of characters that are correlated with low mutation: individuals simply select mates that have a sought-after character that can only be created by an intact set of genes.|
|This eliminates bluffing by giving a reliable signal that cannot be faked because it requires the consumption of a scarce resource.||Bluffing is impossible because the sought-after character can only be generated by the possession of the appropriate genes.|
|Seeks to explain the puzzling appearance and behaviour of some animal species.||Seeks to explain the ubiquity of sexual reproduction and the existence of high-fidelity genetic replication.|
|Applies to a subset of animals that have developed handicaps. Major handicaps seem to appear randomly in a subset of species.||Applies to all complex organisms. In some cases, the sought-after characters may be handicaps, but in other cases they may be useful in themselves (e.g. intelligent brains).|
|The signal evaluated should lower the fitness of selected individuals in relation to the main ecological problems of the species.||The sought-after character may be beneficial to the species in spite of being costly to the individual if it effectively identifies individuals with high mutation rates (as well as deleterious mutations in other housekeeping genes).|
|Patterns such as the eyes on a peacock’s tail are incidental – the cost of the sought-after character is the important factor.||Sought-after characters include (but are not limited to) patterns that can only be generated by the interaction of many genes. The extraordinary symmetry of the peacock’s tail suggests that symmetry is strongly selected by peahens (figure XX).|
|The principle focuses on biological fitness, which is difficult for scientists to define or quantify. For example, if the environment changes a new set of alleles may confer greater fitness.||The principle focuses on mutation rates, which are well-defined and can be measured directly and accurately by scientists.|
|Species and populations that have greater handicaps are expected to be at a disadvantage compared to other comparable groups with more modest handicaps.||Species and populations with more extreme handicaps may be at a long-term selective advantage compared to other comparable groups, and may thrive, if the handicaps successfully reduce transmission of error-prone polymerases.|
At first sight it might appear that flowering plants are a problem for the theory. However, it turns out that plant fertilization is complex: pollen tube elongation in the maternal tissue and navigation to the ovule require intimate successive cell–cell interactions between the tube and female tissues. This process can create complex “tests” for pollen grains (which should be thought of as haploid organisms that are capable of producing sperm) using multi-layered signalling pathways, involving many gene products, which may weed out error-prone lineages (figure x, from Li et al.). The Everest Hypothesis can explain why complex multi-layered mechanisms are beneficial.
Other benefits of mate selection
Note that the mate-selection strategies mentioned above can also filter out other defects that might otherwise be hidden. These complex behaviours, displays and biochemical mechanisms may also show up mutations in “house-keeping” genes that are active in all cell-types such as ribosomal and cell-cycle proteins, histones, mitochondrial proteins, as well as transcription, protein processing, RNA splicing, and translation factors.
The evolution of high-fidelity polymerases
The analysis above cannot fully explain how fidelity is maintained. For example, no matter how effective our heroine’s scheme at the summit of Everest may be, there remains a chance that a novel mutation may arise in one of the polymerase genes (humans have at least five DNA polymerases) in the sperm cell that fertilizes her egg. This suggests that fidelity would slowly decrease. To develop and maintain high fidelity replication in the long term, we also need active selection of mutations that enhance polymerase fidelity. How can such selection be achieved? I suggest increased fidelity can only evolve in special conditions – namely situations where an organism is very well-adapted to its environment. For example, the deep beds of fossilised seashells suggest that some species were virtually unchanged during extended geological periods [example]. In these conditions almost any mutation would be deleterious, and high-fidelity polymerases would be beneficial immediately. Having evolved, high-fidelity polymerases would be valuable even to species that need to adapt more rapidly.
Plant and animal breeding
It would be very interesting to talk to breeders. Do they notice that certain strains (although they may have desirable characteristics) are “weak”? Can the weakness be eliminated by crossing with more vigorous strains? I know that foresters take a lot of trouble to acquire good-quality seed. Rose breeders sometimes refer to “effete” lineages, which have difficulty reproducing. Isolated animal communities in zoos could be examined, including sequencing individuals’ polymerases. Can isolated communities maintain their fidelity by preventing low-fidelity individuals from breeding?
Early-expressed suicide genes etc.
I can imagine a complex biochemical mechanism that is highly sensitive to mutation, operating in early development. Mutation would result in the death of the organism. This would be the biochemical equivalent of asking a baby to walk across a high tight-rope (if that were possible). Do things like that exist? Just an idea – something to look for.
Chromosomal abnormalities are found in more than half of embryos miscarried in the first 13 weeks [Kajii 1980] – this may be related to my suggestion. And why did the other half miscarry?
I said in the abstract that, as far as we know, error-prone polymerases have no phenotype other than an increased mutation rate. But maybe they do. What would/could that look like?
Part 2: the Elimination of Mutator Mutations from Quasispecies (First Draft)
Mate selection is not available to simple biological entities such as viruses, but recombination between strains can still eliminate many undesirable mutator mutations. The periodic surges of cases of viral diseases such as influenza and Covid-19 may reflect the emergence of error-prone strains, which adapt to new conditions quickly, but then collapse due to the accumulation of mutations in essential viral genes, including polymerases. New variants (such as Alpha, Delta, Omicron etc in SARS-CoV-2) may emerge as a result of recombination between low-fidelity strains with desirable mutations (including mutations in surface protein genes) and high-fidelity strains that are capable of accurate replication. Finally, a thought-experiment suggests that individual polymerase genes with greater fidelity may evolve during extended periods of low evolutionary change i.e. in populations whose genes are close to equilibrium.
Mutator mutations in viruses
Biological entities such as bacteria, archaea and viruses are relatively simple, and they are often asexual, existing as quasispecies (large groups or “clouds” of related genotypes). They therefore have limited or non-existent opportunities for mate selection. This essay will focus on the simplest group, viruses, including SARS-CoV-2. Although asexual, viruses can recombine when two virions infect a cell simultaneously, with the result that one part of the sequence of progeny virions comes from one lineage, the rest from another. New variants are therefore often the result of one or more recombination events [Bill Gallaher].
Coronaviruses have their own polymerases, which replicate their genetic material, RNA. They also have some of the largest RNA virus genomes (around 30,000 nucleotides) and they are said to exist close to “error catastrophe”, where a small increase in mutation would destroy the virus. They need high-fidelity polymerases to maintain such a complex genome. For example, SARS-CoV-2 uses five non-structural proteins to construct the complex that replicates its RNA (figure xx). This complex includes a proof-reading function (comprising NSP14 and NSP10) that reduces mutation by a factor of around 20. All these proteins can mutate, which must occasionally give rise to low-fidelity lineages. The existence of low-fidelity lineages is thus not a matter of conjecture – they must exist. It should be a high priority to find out how common low-fidelity SARS-CoV-2 lineages are, and to determine their role in viral evolution.
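A rough calculation shows why the loss of proof-reading matters for a genome of this size. The sketch below uses the genome length and the 20-fold proof-reading factor given above; the per-nucleotide error rate is a hypothetical value that I have chosen for illustration, not a measured figure.

```python
GENOME_LENGTH_NT = 30_000   # approximate genome length, from the text
PROOFREADING_FACTOR = 20    # fold reduction in mutation, from the text

# ASSUMPTION (hypothetical): error rate per nucleotide per genome copy
# when proof-reading is working.
ASSUMED_ERROR_RATE_WITH_PROOFREADING = 1e-6


def mutations_per_copy(error_rate: float, genome_length: int) -> float:
    """Expected number of errors each time the genome is copied."""
    return error_rate * genome_length


def prob_error_free_copy(error_rate: float, genome_length: int) -> float:
    """Chance that a copy has no errors, assuming independent sites."""
    return (1.0 - error_rate) ** genome_length


if __name__ == "__main__":
    rate_without = ASSUMED_ERROR_RATE_WITH_PROOFREADING * PROOFREADING_FACTOR
    for label, rate in (("with proof-reading", ASSUMED_ERROR_RATE_WITH_PROOFREADING),
                        ("without proof-reading", rate_without)):
        u = mutations_per_copy(rate, GENOME_LENGTH_NT)
        p0 = prob_error_free_copy(rate, GENOME_LENGTH_NT)
        print(f"{label}: {u:.2f} errors per copy, "
              f"{p0:.1%} of copies error-free")
```

Whatever the true baseline rate is, a defect in NSP14 or NSP10 would multiply the number of errors per genome copy by about 20, so a lineage that was comfortably below one error per copy could move much closer to it.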
Unexplained features of Covid-19 epidemics
The extraordinary surges and rapid collapses of cases that we have sometimes seen in Covid-19 are some of the pandemic’s most puzzling features. For example, dramatic surges and collapses were seen in both South Africa and India (figure xxA), two countries where lockdowns may be relatively ineffective. Similar patterns can be seen in other countries, including the UK, Austria and France (figure xxB-D). I suggest that these surges mainly comprise low-fidelity strains, because their higher mutation rates allow them to evolve faster than the ancestral high-fidelity lineages, and to out-compete them. According to this view a high mutation rate in the presence of strong selection may be favourable at first, then detrimental. It should be borne in mind that mutations that increase the chance of transmission will often not be selected within the host – in fact the opposite may be the case – so many advantageous mutations can only be selected during transmission. Moreover, most infections are caused by the transfer of a relatively small number of virions that are airborne. So, with more mutations, and strong selection, it seems likely that our error-prone mutant could adapt to new opportunities more quickly than high-fidelity lineages. Such lineages have (I suggest) no long-term future in isolation, but their advantageous characteristics can be rescued by recombination of their genomes with those of high-fidelity lineages (figure xx). The overall effect would be alternating surges and collapses of cases, as shown in figure xx.
According to this analysis, we expect to see an increased number of mutations in isolates during case surges in the data produced by organizations such as http://Nextstrain.org, with a reduction at the end of the surge when high-fidelity strains reappear. This is not seen (although, as predicted, new variants tend to start with more mutations than their predecessors, shown schematically by the red arrow on figure xx). There is, however, a plausible explanation for the observed lack of excess mutations during surges: Nextstrain has a policy of excluding any strain with an unusual number of mutations that differ between the query sequence and the nearest neighbour sequence (they refer to these as “private” mutations). It is nevertheless often noticed that more mutations are seen when new variants appear [R Neher, communication on the Nextstrain discussion forum]. This is generally put down to problems with the amplification schemes used, but it could also reflect a real increase in the number of mutations that arise before recombination takes place. In other words, scientists may have seen the effect that I am postulating, but misinterpreted it.
Several other explanations of the strange peaks and falls in cases have been proposed, such as increasing immunity, behavioural changes in hosts, and non-linear percolation effects. Taking these in turn, (1) increasing immunity cannot plausibly explain, for example, the collapse of cases at the end of the first surge in South Africa: it is clear that immunity in the population at the end of the surge was not high because it was followed by several larger surges (figures xx and xx). Moreover, we would expect the curves to flatten off gradually and the tails to be longer if the shape was driven by the slow increase of immunity. (2) Behavioural changes are not well-correlated with cases. For example, in December 2021 cases in South Africa first surged, then collapsed, although mobility was found to be steadily increasing for x weeks after the peak (figure xx). A similar pattern was seen in Europe (figure xx: cases and mobility in Austria are shown because the surge was not close to the Christmas holiday; cases close to holidays are difficult to interpret). (3) It has been suggested that non-linear percolation effects can explain the peaks seen, but it’s hard to see how they could routinely generate monotonic rises, followed by monotonic falls. Figure xx shows the patterns that would be expected in an extreme case of non-linearity. Here (red curve), data from the London Stock Exchange have been increased or decreased by 500 points whenever they cross a threshold. In practice hysteresis would be expected to reduce the number of transitions.
The Omicron variant and other SARS-CoV-2 variants of concern
Omicron has puzzled virologists because it has 29 non-synonymous mutations in the spike gene, but only 15 non-synonymous mutations in the whole of the rest of the genome. Moreover, an anomalously low proportion of the mutations in the spike of Omicron and other variants were caused by C-to-T nucleotide transitions. Most C-to-T transitions in SARS-CoV-2 are thought to be generated by host modifications to viral RNA, and we can get an idea of the underlying frequency of C-to-T mutations by looking at synonymous mutations, which are not expected to be strongly selected. Combining the totals of all synonymous “defining mutations” of Alpha, Beta, Gamma, Delta and Omicron variants, the majority (53%) were C-to-T transitions [data from Nextstrain.org]. However, only 11 out of 92 non-synonymous spike mutations (i.e. 12%) were C-to-T transitions [data for Beta, Delta, Lambda and Omicron came from a Twitter acquaintance. Can anyone help me to get more/better data – e.g. filling in the question marks in the table?]. These data are summarized in Table xx.
|Length||Synonymous mutations~||Of which, C-to-T~||Synonymous C-to-T mutations, %~||Mutations*||Of which, C-to-T*||C-to-T mutations, %*|
|~From “defining mutations” of Alpha, Beta, Gamma, Delta, Omicron, from Nextstrain.org|
|*From sequence data from Beta, Delta, Lambda, Omicron|
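As a rough check on how unusual the spike figures are, the sketch below asks how often 11 or fewer C-to-T transitions would be seen among 92 mutations if each mutation were independently C-to-T with the 53% frequency seen among synonymous mutations. Treating the mutations as independent, and treating the synonymous frequency as the right baseline for a different set of variants, are simplifying assumptions of this sketch.

```python
from math import comb


def binomial_cdf(k: int, n: int, p: float) -> float:
    """Probability of at most k successes in n independent trials,
    each succeeding with probability p."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


# Figures quoted in the text.
SPIKE_NONSYN_TOTAL = 92
SPIKE_NONSYN_C_TO_T = 11
SYNONYMOUS_C_TO_T_FRACTION = 0.53

observed_fraction = SPIKE_NONSYN_C_TO_T / SPIKE_NONSYN_TOTAL

# Chance of seeing 11 or fewer C-to-T mutations out of 92 if the
# underlying frequency were the same as for synonymous mutations.
tail_probability = binomial_cdf(SPIKE_NONSYN_C_TO_T,
                                SPIKE_NONSYN_TOTAL,
                                SYNONYMOUS_C_TO_T_FRACTION)

if __name__ == "__main__":
    print(f"observed C-to-T fraction in spike: {observed_fraction:.1%}")
    print(f"binomial tail probability: {tail_probability:.2e}")
```

A very small tail probability would indicate that chance alone is an unlikely explanation for the shortage of C-to-T transitions in spike, but it cannot by itself distinguish between selection on spike and the recombination mechanism suggested in this essay.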
I suggest that these variants were generated by recombination between error-prone strains (which had acquired many beneficial, random, non-C-to-T mutations via defective polymerases) and more stable high-fidelity strains. I suggest that much of the right-hand-end “structural protein” sections of the genomes of these variants, including the spike genes, came from error-prone partners, while much of the non-structural protein sections, including the polymerases, came from high-fidelity partners.
A model in the form of a thought-experiment may shed light.
Imagine a laboratory with three fermenters that are continuously fed with a suspension of monkey cells that never varies, and from which waste products are continuously removed. The scientist running the lab adds a human RNA virus to the fermenters. This virus has (say) ten genes, including its own RNA polymerase.
The scientist monitors the fermenters every day, and sequences strains as appropriate. During the first week the number of virions increases steadily in all fermenters, as the virus adapts to the cells. The scientist now adds a frozen sample of the original virus to fermenters 2 and 3. After a second week, another sample of the original virus is added to fermenter 3.
The virion count in fermenter 1 starts to fall and by the end of the first month the virus in fermenter 1 is extinct. The virion count in fermenter 2 also falls to low numbers but the virus lingers for three weeks, before becoming stable and increasing again. After this the scientist takes a sample, and finds that the number of genes in this strain has decreased to eight.
The virus in fermenter 3 grows well and is stable for at least three months. At the end of three months the scientist takes a sample, and finds that part of the genome has been duplicated and the number of genes has increased to eleven.
The scientist now sits down to write his report, which he drafts as follows:
The human virus was not at first well-adapted to monkey cells, and it experienced strong selection in all fermenters during the first week. This seems to have given strains with error-prone polymerases an advantage, because mutations increased and virion counts rose rapidly in all fermenters. The virus in fermenter 1 (where fresh virus was not added) accumulated many mutations in essential genes, and became unviable. When fresh virus with an intact polymerase was added to fermenter 2, a less error-prone strain was created by the recombination of a partially-adapted strain with the original high-fidelity strain. This evolved into a stable strain in fermenter 2 by reducing the size of its genome. Fresh virus was added twice to fermenter 3, which allowed recombination of a fully-adapted strain with the original high-fidelity strain. In the following months, the virus, now well-adapted to its host and in the absence of strong selection, acquired a rare beneficial mutation that increased the fidelity of the polymerase, allowing the virus to increase its genome size to eleven genes.
Is the scientist’s interpretation correct?
Patrick Shaw Stewart, 16 October 2022. Revised 17-31 October 2022.
Nachman, Michael W., and Susan L. Crowell. “Estimate of the mutation rate per nucleotide in humans.” Genetics 156.1 (2000): 297-304. https://doi.org/10.1093/genetics/156.1.297
Otto, Sarah P., and Scott L. Nuismer. “Species interactions and the evolution of sex.” Science 304.5673 (2004): 1018-1020. https://www.science.org/doi/10.1126/science.1094072
Hill, W. G.; Robertson, Alan (1966). “The effect of linkage on limits to artificial selection”. Genetical Research. 8 (3): 269–294. doi:10.1017/S0016672300010156. PMID 5980116.
Li et al. “Multilayered signaling pathways for pollen tube growth and guidance.” Plant Reproduction 31 (2018): 31–41. https://doi.org/10.1007/s00497-018-0324-7
Neher, R, communication on the NextStrain discussion forum: https://discussion.nextstrain.org/t/trends-in-the-prevalence-of-private-mutations/1147
Zahavi A. Mate selection—a selection for a handicap. Journal of theoretical Biology. 1975 Sep 1;53(1):205-14. https://doi.org/10.1016/0022-5193(75)90111-3
Kajii T, Ferrier A, Niikawa N, Takahara H, Ohama K, Avirachan S. Anatomic and chromosomal anomalies in 639 spontaneous abortuses. Human Genetics. 1980 Jul;55(1):87-98. https://link.springer.com/article/10.1007/BF00329132
Hamilton, William D., Robert Axelrod, and Reiko Tanese. “Sexual reproduction as an adaptation to resist parasites (a review).” Proceedings of the National Academy of Sciences 87.9 (1990): 3566-3573. https://doi.org/10.1073%2Fpnas.87.9.3566