Thursday, August 8, 2019

Tuesday, March 12, 2019

What if light was important for the origin and early evolution of life?

I have become very interested in the idea that photosynthesis and photosynthetic water oxidation were important for the origin and early evolution of life. Call me crazy, but I have reasons!

In this brief post, I will try to present the reasons why I suspect that the emergence of photosynthesis may predate the last universal common ancestor (LUCA).

I have not always thought this way; the results of my research have led me here. I think the evolution of photosynthetic reaction centres strongly suggests that light was involved in the early evolution of life.

But how?

Please, bear with me.

The reader should know that I am not an expert on Origin of Life research, and I am only superficially familiar with a couple of the different scenarios. I know for example that today the idea that life arose in hydrothermal vents is very popular, although I also know that it is not the only competing hypothesis. In a couple of years, I might be an expert (I am studying hard).

I also know that quite some time ago, it was speculated and considered that the origin of life was somehow photosynthetic, even oxygenic.

For example, Sam Granick wrote in his famous 1957 paper: “It seems more reasonable to consider that the functions of oxidation and photosynthesis were so fundamental that they were part of the first beginnings of protoplasm that arose from inorganic origins.” Then he went on to say: “I propose, as speculation, that the earliest unit around which any living entity arose was an energy-conversion unit. This unit of mineral origin would contain an organization of atoms that would serve as a photocatalyst, at first perhaps in the decomposition of water by UV radiation.”

Today things have changed and scientists do not think this way anymore. Why is that? It is due to a number of reasonable but unproven assumptions:

1) Photosynthesis has only been discovered in the domain Bacteria, therefore it appears reasonable that the origin of photosynthesis likely occurred after the divergence of Archaea and Bacteria.

2) Oxygenic photosynthesis evolved in Cyanobacteria, so it appears reasonable that the origin of water oxidation is a late invention relative to the origin of life.

In reality, it is a bit more complicated than that as I have recently discussed. This is mainly because the origin of photosynthesis cannot be determined based on a species tree alone. What I mean is that a gene tree and a species tree do not always correspond. So, to understand at what point in the history of life photosynthesis arose we must understand how and when photosynthetic reaction centres and the chlorophyll synthesis pathway arose.

OK, so what is the evidence that suggests photosynthesis is a pre-LUCA innovation?

Allow me to recapitulate several aspects regarding the evolution of photosynthesis.

Firstly, I have concluded that the divergence of Type I and Type II reaction centres predates the divergence of the major groups of bacteria. This is true regardless of the specific evolutionary processes that led to the current distribution of photosynthesis across the tree of life. In other words, the earliest events in the origin of photosynthesis predate the evolution of most groups of bacteria that we know of, including all phototrophs.

There are several reasons why this can be concluded with a good level of confidence. I cannot discuss them here in huge detail because it is not the point of this post, but if you want to know more please see this, this, or just message me for more details. The most important reason, however, is that both Type I and Type II reaction centres form monophyletic clades. Therefore, before we have the Type II reaction centre of purple bacteria or the homodimeric Type I reaction centre of the green sulfur bacteria, we first need to have the processes that led to the ancestor of all Type I reaction centres and the ancestor of all Type II reaction centres.

From this it can also be concluded that at the point in time of the most recent common ancestor of all phototrophs, whatever this was, Type I and Type II reaction centres had already appeared.

Secondly, I have shown that to explain the structural characteristics of Photosystem II, including the coordination sphere of the Mn4CaO5 cluster (the oxygen evolving complex), water oxidation must have appeared before, at, or immediately after the divergence of Type I and Type II reaction centres.

Putting these two points together, we then get that water oxidation chemistry originated before the diversification of most groups of bacteria, including Cyanobacteria.

Thirdly, I attempted to understand the evolution of Photosystem II as a function of time. What I discovered is that the roots of Photosystem II, as determined by the gene duplication leading to the heterodimerisation of the photochemical core (D1 and D2), trace back to long before the most recent common ancestor of Cyanobacteria. This boils down to the fact that the rates of evolution of Photosystem II are tremendously slow. It is a bit more complicated than that, but this should suffice for the moment.

At this point we have traced photosynthetic water oxidation to an early stage in the evolution of the domain Bacteria.

But how do we go from there to before the LUCA?

Warning! I am not trying here to explain the origin of life. I am not trying to come up with a reasonable evolutionary scenario. I am only following the evidence at hand, which is directly derived from the study of the molecular evolution of the reaction centres.

About two years ago, I was at a local meeting at Imperial. I presented my research on the evolution of Photosystem II and a well-known Nobel Prize winner mentioned that the evolution of ATP synthase seemed to share some similarities with Photosystem II.

Basically, the photochemical core of Photosystem II is made of two homologous subunits, D1 and D2. Catalysis occurs in D1. The catalytic core of ATP synthase is made of two homologous subunits, the alpha and beta subunits: they make the hexameric head. The beta subunit has the catalytic active site.

To provide further support that Photosystem II and water oxidation are as old as I suggested in the Geobiology paper, I thought that it would be a good idea to compare it to the evolution of other enzymes. I wanted to compare the D1/D2 and CP43/CP47 duplication events with one duplication that is known to be very ancient and with a duplication that is known to be very recent.

ATP synthase is a perfect point of reference for the very ancient duplication, not only because of those similarities with Photosystem II, but also because we know that the duplication leading to alpha and beta predates the LUCA.

Therefore, if Photosystem II emerged long after the LUCA, then, given the slow and very predictable rates of evolution of these complexes, major differences in evolutionary patterns should be absolutely clear.

What I found is that Photosystem II evolves at a slower rate than ATP synthase.

I am talking here of some of the slowest rates of evolution in all biology.

ATP synthase evolves so slowly that even though the duplication leading to alpha and beta occurred before the LUCA, they still retain about 20% sequence identity and they are still structurally very similar. That is slow enough so that after billions of years of evolution strong sequence and structural identity is retained. Because the duplication is so old, then it makes sense that after billions of years the level of sequence identity between alpha and beta is relatively low.

Well, Photosystem II evolves slower than ATP synthase! And the core subunits, D1 and D2, show 29% sequence identity. The antenna proteins of Photosystem II, CP43 and CP47, which also originated from a gene duplication event, have about the same level of sequence identity as alpha and beta: 18%. And guess what, the rate of evolution of CP43 and CP47 is only slightly slower than the rate of alpha and beta.

From this reference.

Under similar conditions D1 and D2 are evolving at about 0.12 ± 0.04 amino acid changes per site per billion years (Cardona et al. 2019). CP43 and CP47 at about 0.19 ± 0.04 amino acid changes per site per billion years (unpublished) and alpha and beta at about 0.28 ± 0.06 amino acid changes per site per billion years (unpublished).
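To make the comparison concrete, here is a minimal Python sketch that ranks the three rates quoted above and checks whether their ± ranges overlap. The dictionary and helper names are mine, and treating each ± value as a plain symmetric range is an assumption made only for illustration. It is not the statistical treatment used in the molecular clock analyses.

```python
# Rates quoted in the post, in amino acid changes per site per billion years.
# Each entry is (mean, plus_minus). The +/- value is treated here as a simple
# symmetric range, which is an illustrative assumption.
RATES = {
    "D1/D2": (0.12, 0.04),
    "CP43/CP47": (0.19, 0.04),
    "alpha/beta": (0.28, 0.06),
}


def slowest_first(rates):
    """Return the subunit pairs ordered from slowest to fastest mean rate."""
    return sorted(rates, key=lambda name: rates[name][0])


def ranges_overlap(a, b):
    """Return True if the mean +/- range of a overlaps the range of b."""
    a_low, a_high = a[0] - a[1], a[0] + a[1]
    b_low, b_high = b[0] - b[1], b[0] + b[1]
    return a_low <= b_high and b_low <= a_high


if __name__ == "__main__":
    for name in slowest_first(RATES):
        mean, spread = RATES[name]
        print(f"{name}: {mean - spread:.2f} to {mean + spread:.2f}")
```

With these numbers the ordering from slowest to fastest is D1/D2, then CP43/CP47, then alpha/beta, which is the ordering described in the text.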

This means that there are no differences in the evolutionary patterns of the ATP synthase catalytic unit when compared to the core of Photosystem II! No matter how I model their evolution, I will not be able to place the origin of Photosystem II after the origin of ATP synthase.

The rate of evolution is strongly related to the complexity of the system. A case could be made to argue that all reaction centres show greater complexity than ATP synthase.

Therefore, the earliest stages of Photosystem II evolution could coincide with, or might slightly predate, those leading to V-/F-type ATP synthases. If this is the case, then water oxidation and photosynthesis predate the LUCA.

Again, I want the reader to understand that I am not trying to come up with an origin of life scenario based on a collection of reasonable assumptions.

This is the path that the evidence has pointed towards…

Imagine the ribosome. Kind of in between the origin of information processing and protein synthesis. A complex molecular machine made of protein and RNA.

Imagine now reaction centres. Forget everything you know about reaction centres and look at them with fresh eyes. A bag of cofactors and proteins unlike anything else in biology. What if they emerged at the interface between the pre-biotic synthesis of porphyrin-derived compounds and the very first proteins involved in photochemical energy conversion and electron transfer?

I find beauty and harmony in this view.


Friday, March 1, 2019

Two phototrophic strains of Deltaproteobacteria (Myxococcota)

Phototrophy has not previously been reported in Deltaproteobacteria. Using bioinformatics, I show that two distant strains of Deltaproteobacteria probably acquired phototrophy via a single event of horizontal gene transfer from Alphaproteobacteria into the most recent common ancestor of the proposed class of Deltaproteobacteria, Polyangia.

I have uploaded a short document to Researchgate with the details of this. Please have a look if you're interested and leave some feedback.

https://www.researchgate.net/publication/331453324_Two_phototrophic_strains_of_Deltaproteobacteria_Myxococcota


Friday, January 11, 2019

Has scientific output in photosynthesis research peaked?

I have bookmarked search queries for "photosystem", "cyanobacteria", and "photosynthesis" on the PubMed database to keep up to date with the literature. I have done that for quite a few years now and I have noticed a trend in the "results per year" box that the search usually shows, in the right corner...

It looks like scientific output in photosynthesis research has peaked. See the graph below that shows the number of papers found for each keyword per year. The trend is clear:


In the years 2000 and 2001 there was a big rise in the number of publications on "photosynthesis" and "cyanobacteria"... and then it kept increasing non-stop. There is a tiny slow-down around the 2008 economic crisis, but since 2015/2016 the output has reached a plateau.
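For anyone who wants to reproduce this kind of count, here is a minimal Python sketch that asks PubMed for the number of hits per publication year through the NCBI E-utilities ESearch endpoint. This is not how I made the graph (I simply read the "results per year" box); the function names are mine, and the numbers returned by the API may not match the website exactly.

```python
import json
import time
import urllib.parse
import urllib.request

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_count_url(term, year):
    """Build an ESearch URL restricted to a single publication year."""
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",   # filter on publication date
        "mindate": str(year),
        "maxdate": str(year),
        "retmax": "0",        # we only need the total, not the record IDs
        "retmode": "json",
    }
    return EUTILS_ESEARCH + "?" + urllib.parse.urlencode(params)


def parse_count(payload):
    """Extract the hit count from an ESearch JSON response body."""
    return int(json.loads(payload)["esearchresult"]["count"])


def papers_per_year(term, years, pause_seconds=0.4):
    """Query PubMed once per year. Needs network access."""
    counts = {}
    for year in years:
        with urllib.request.urlopen(build_count_url(term, year)) as response:
            counts[year] = parse_count(response.read().decode("utf-8"))
        time.sleep(pause_seconds)  # stay well below NCBI's request limits
    return counts

# Example (uncomment to run; performs live requests):
# print(papers_per_year("photosynthesis", range(1995, 2019)))
```

The pause between requests is a conservative choice on my part, since NCBI limits how many requests per second it accepts from clients without an API key.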

Is this reflecting the economy?

I don't think it is just photosynthesis research. Have a look at this, using "mice", "cancer", and "neuron" as search queries:



You can see similar trends... What does this mean? Have we reached the maximum capacity of our intellectual potential as humans?

Well, I do not think so... while the number of PhD graduates and postdocs has increased massively, the number of tenure-track positions at universities and other academic institutions has not changed at all for decades. So, I don't think it has anything to do with capacity for output, but is a reflection of the amount of cash that is invested in research.

It is a problematic trend, however, if one is counting on scientific innovations to overcome the greatest challenge we have ever faced: climate change!

Let me know what you think.

Friday, December 14, 2018

Evolution of the CP43 and CP47 antenna proteins of Photosystem II and the link to water oxidation

In our recent paper in Geobiology we made a strong case for the process of water oxidation to oxygen having originated before the duplication leading to D1 and D2.

Article Early Archean origin of Photosystem II

As you may know by now (if you follow my posts or work), the core of Photosystem II is not just made of D1 and D2, but these also have an intimate relationship with the antenna proteins CP43 and CP47. Why is it intimate? Because the CP43 binds the Mn4CaO5 cluster together with D1.

CP43-E354 coordinates two Mn atoms, and CP43-R357 offers a hydrogen bond to one of the Mn-bridging oxygen atoms and it is within 4 Å from the calcium in the cluster.

We have seen now that D2 does not bind a cluster but instead a number of phenylalanine residues seem to replace the ligands and block access to Mn and water. What is remarkable is that CP47 also reaches within D2, as if to provide ligands to a long-gone cluster, but instead it inserts a few phenylalanine residues: one of them within less than 4 Å of the redox tyrosine, YD. Have a look at Figure 7H in the paper.

How? Why? What does this mean? Does it mean that in the homodimeric Photosystem II, before the D1/D2 duplication, the water-oxidising cluster was also coordinated by the antenna domain? Like CP43 does today?

When the crystal structure of the homodimeric Type I reaction centre of heliobacteria was released in 2017, I found a Ca2+ bound to the place where the Mn4CaO5 cluster would be, and these Ca2+-binding sites had a number of structural similarities with the water-oxidising cluster that I thought could not possibly be just coincidence. In particular, the putative Ca2+-binding site interacted with the antenna domain in a manner similar to Photosystem II.

I discussed this in an early and hasty version of a manuscript that I should be submitting for publication soon. Have a look:

Working Paper Origin of water oxidation at the divergence of Type I and Ty...

Funnily enough, Prof. Bob Blankenship said in a news article that he didn't believe it. Well, he should believe it, because I'm right! :D haha

https://www.quantamagazine.org/simple-bacteria-offer-clues-to-the-origins-of-photosynthesis-20171017/

I jest.

Anyways, I have now taken a closer look at the antenna's extrinsic domains. And I found something AMAZING.

Have a look at the attached figure with the structural comparisons.


A, B, and C, are the antenna of heliobacteria, CP43, and CP47 respectively. In four different views. In grey you see the transmembrane helices and in colours the extrinsic domain between the 5th and 6th helices. In panel D you can see a schematic view.

I have split the extrinsic domain of CP43 into three bits: EF2, EF3, and EF1.

EF1 is retained in all Type I reaction centres (except PsaA and PsaB) and in CP43 and CP47.

EF3 binds the manganese cluster in CP43. This EF3 region is also found in CP47, but it is at a different location! A change of place occurred!

There is sequence identity in all of the matching domains once they are compared to each other.

Have a look at the attached alignment comparing only the EF3. Sequence identity is unambiguous.


The green arrows indicate the positions where EF3/EF4 are “inserted” in both subunits.

The two residues at homologous positions in the CP47-EF3 region bind a calcium! Yeah, that is right! They bind a calcium!

CP43-E354 is CP47-E435, and CP43-R357 is CP47-N438 as shown in the figure. The Ca2+ is not found in the CP47 of photosynthetic eukaryotes (I did not see it in the structure of the red algae PSII). Except perhaps for the PSII of Cyanophora paradoxa and relatives: early-branching algae.

In CP47, EF1 which in heliobacteria binds the Ca2+, interacts with the CP47-N438 via K332.

The phenylalanine residues that in CP47 insert themselves into D2 are found in the region marked as EF4, which does not exist in CP43.

The level of sequence identity between CP43 and CP47 is about 20%. But this falls to virtually 0% in the extrinsic domain if these are compared in their current order. If you remove EF4, and align the homologous bits together, the sequence identity is back to 20%! Unbelievable.

You might think that 20% overall sequence identity is too low, but the level of sequence identity between the alpha and beta subunits of ATP synthase is also 20%. Just to give you context.
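Since so much of the argument rests on percent sequence identity, here is a minimal Python sketch of how that number can be computed from two already-aligned sequences, and of why the order of the domains matters so much. The function name is mine and the two "domains" are made-up toy strings, not real CP43 or CP47 sequences. I count identical residues over the aligned, non-gap columns; other tools use other denominators.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Toy example: two made-up "domains" that appear in a different order
# in the two sequences (hypothetical strings, for illustration only).
DOMAIN_X = "MKTAYIAK"
DOMAIN_Y = "GSWLDEFR"
seq_a = DOMAIN_X + DOMAIN_Y
seq_b = DOMAIN_Y + DOMAIN_X

# Compared in their current order, nothing matches.
print(percent_identity(seq_a, seq_b))

# After putting the homologous bits of seq_b back in matching order,
# the identity reappears.
seq_b_reordered = seq_b[len(DOMAIN_Y):] + seq_b[:len(DOMAIN_Y)]
print(percent_identity(seq_a, seq_b_reordered))
```

In the toy case the identity jumps from 0% to 100% because the domains are exact copies. In the real extrinsic domains the jump is from virtually 0% to about 20%, as described above.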

You might think that the CP43 and CP47 have evolved very fast… the opposite is true. Currently after D1 and D2, the second slowest evolving reaction centre subunits are the CP43 and CP47, evolving even slower than ATP synthase today (unpublished data).

All in all it means that EF2, EF3, and EF1 were already present at the moment of duplication!

Given that EF4 only exists in CP47, we can then argue that this was not present before duplication, and therefore the phenylalanine residues that today get inserted into D2 and interact with YD could not have been in the homodimer. So the D2 and CP47 phenylalanine patch could not have been the ancestral state, as it is of course obvious from everything we discussed in the Geobiology paper and what had been described by Bill Rutherford and Wolfgang Nitschke in the 90s (see references in the paper).

Given that EF3 is found in both CP43 and CP47, and that CP43-E354 is conserved as CP47-E435, and similar for position CP43-357 (CP47-438), and given that they still bind something (manganese/calcium), we can then argue that these residues were also available for metal-binding before duplication.

It is consistent with a homodimer photosystem, with clusters on both sides, and with ligands from the antenna. It also strengthens the notion that the Ca2+-binding site in the homodimeric Type I reaction centre is a real thing, and that the structural divergence of Type I and Type II reaction centres is indeed linked to the evolution of the Mn4CaO5 cluster and water oxidation to oxygen.

What this means you can read here:

Article Photosystem II is a Chimera of Reaction Centers

And here:

Preprint Thinking Twice about the Evolution of Photosynthesis

I think that originally manganese and water oxidation started with the help of a small domain similar to that in heliobacteria. A metal-binding site exposed to the media and soluble ions. Once manganese oxidation and an early version of water oxidation got started, the extrinsic domain in the ancestral protein to CP43 and CP47 then increased in complexity, evolving EF2 and EF3 in a drive to provide proton and water channels, to shield the cluster, and to provide a site of interaction with extrinsic polypeptides.

Then the swap of position of EF3 and the evolution of EF4 in the ancestral CP47 contributed to heterodimerization and the loss of water oxidation in D2.

This happened immediately after the divergence of Type I and Type II reaction centres, LONG before the most recent common ancestor of Cyanobacteria.

Did you know that at the gene level, the N-terminus of the CP43 gene overlaps with the C-terminus of the D2 gene contributing a few additional amino-acids to the latter? This is a trait shared by most cyanobacteria, including the earliest branching, and explains how D2 lost the ligands to the cluster located at the C-terminus.

Beautiful, just beautiful.

Sunday, December 2, 2018

Early Archean origin of Photosystem II: materials for the press office


An integral part of research is outreach and dissemination. I like my papers to be accompanied by a press release, if possible, to make them more visible to the public. Sometimes, what I do is send some materials to the press officer in our faculty and ask whether a press release can be written based on them.

Below you find those materials, which I think could help some interested readers digest some of the information in the paper. This is the official press release from the college: https://www.imperial.ac.uk/news/189232/oxygen-could-have-been-available-life/

This is our recent paper: Early Archean origin of Photosystem II

Summary of the paper

The problem
When or how oxygenic photosynthesis originated remains controversial. Understanding how and when oxygenic photosynthesis emerged is fundamental to understanding how life has evolved through the long history of the planet. For example, it is important to understand when oxygen was available to life for the first time. Oxygen permitted the evolution of aerobic respiration, which is the main energetic process that powers most life on Earth and is essential to sustain the complexity of animals and humans. It is also important for understanding the probability of complex life evolving in other solar systems. For example, if oxygenic photosynthesis is a very difficult process to evolve, then the probability of complex life emerging on a distant exoplanet may be very low.

The controversy is the result of the difficulty of unambiguously detecting oxygen in the rock record or figuring out when the first oxygen-producers evolved.

The older the rocks, the rarer they are, and the harder it is to prove conclusively that any fossil microbes found in these ancient rocks used or produced any amount of oxygen.

Today, the oldest known oxygen-producers are called cyanobacteria. These bacteria became the chloroplast of algae and plants, but all cyanobacteria that we know of use a very sophisticated form of oxygenic photosynthesis. So figuring out when cyanobacteria originated does not really tell us when oxygenic photosynthesis appeared for the first time, but only tells us when a very sophisticated form of oxygenic photosynthesis was already possible.

Therefore, it cannot tell us when oxygenic photosynthesis really got started and what ancestral forms of oxygenic photosynthesis looked like.

What we did
To overcome these difficulties, we studied the evolution of Photosystem II, nature’s solar panels that use the energy of light to break water molecules into their components: protons, electrons, and oxygen. If we can understand when and how Photosystem II evolved the capacity to oxidize water, then we may have a better idea of when and how oxygenic photosynthesis got started, even before there was enough oxygen on the planet to leave a trace in the rock record.

The core of Photosystem II is made of two evolutionarily related proteins called D1 and D2, which originated from a gene duplication. D1 and D2 are very similar to each other at a structural level but they differ at the sequence level, that is, at the amino acid level. In other words, they look the same but the basic building blocks have changed. Today D1 and D2 share about 30% amino acid sequence identity. That means that of the approximately 350 building blocks that make up D1 and D2, slightly over a hundred are perfectly identical between D1 and D2, but at some point in time they were 100% identical.

Fortunately, the function and structure of Photosystem II has been studied in great detail, so we can tell from what D1 and D2 look like, and from the remaining ~100 identical building blocks, that before the duplication that allowed the evolution of D1 and D2, water oxidation was possible.
Oxygen is a very reactive molecule: that is why it is so important to life, because it can drive many chemical reactions that are essential to life. Oxygen can also react with chlorophyll, leading to the formation of what are called reactive oxygen species. These reactive forms of oxygen are very toxic to life. So all photosynthetic organisms have evolved mechanisms to protect against reactive oxygen species and to prevent oxygen molecules from interacting with chlorophyll. By comparing D1 and D2 we can also tell that before the duplication, the ancestral Photosystem II had already evolved mechanisms to protect against damage caused by oxygen.

What needed to be done now is to find out the span of time between the duplication event (when D1 and D2 were 100% identical) to the ancestor of all cyanobacteria, which inherited a standard sophisticated Photosystem II (when D1 and D2 had left only about 30% identical building blocks).
To do that we need to find out how fast D1 and D2 are changing: that is, the rate of evolution. We can find out using a technique called Bayesian relaxed molecular clock analysis. The method uses the power of statistics and known events in the evolution of photosynthetic organisms from the fossil record to calculate the rates of change.

The results
We found out that D1 and D2 are evolving at a very slow rate. The rate is so slow that it would take about 8 billion years for two identical D1 sequences today to diverge beyond recognition in the future. For example, we know that the ancestor of flowering plants and most algae is more than 1 billion years old, but if I compare D1 in an alga and D1 in the banana tree, they will be about 87% identical. So in more than 1 billion years of evolution, out of approximately 350 building blocks, fewer than 50 have changed in all plants and algae. If you compare the D1 in all flowering plants, which appeared around the time of the dinosaurs, they’ll be over 98% identical: that is fewer than 10 changes in more than 130 million years!
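The arithmetic behind these figures is easy to check. Here is a minimal Python sketch, assuming the round length of 350 building blocks used in this summary (the function names are mine, and real D1 and D2 sequences are not exactly 350 residues long).

```python
SEQUENCE_LENGTH = 350  # approximate number of building blocks, as quoted in the text


def identical_sites(length, percent_identical):
    """Number of positions that are the same in two sequences."""
    return length * percent_identical / 100


def changed_sites(length, percent_identical):
    """Number of positions that differ between two sequences."""
    return length * (100 - percent_identical) / 100


if __name__ == "__main__":
    # D1 versus D2 today: about 30% identical.
    print(identical_sites(SEQUENCE_LENGTH, 30))
    # D1 in an alga versus D1 in a banana tree: about 87% identical.
    print(changed_sites(SEQUENCE_LENGTH, 87))
    # D1 across flowering plants: over 98% identical.
    print(changed_sites(SEQUENCE_LENGTH, 98))
```

These give roughly 105 identical positions between D1 and D2, about 45 changes between algae and plants, and about 7 changes among flowering plants, in line with "slightly over a hundred", "fewer than 50", and "fewer than 10".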

It is not strange at all that Photosystem II evolves so slowly: all complex enzymes that can be traced to the earliest forms of life evolve at similar rates. Because they fulfil important functions, most changes are likely to result in a worse enzyme rather than a better one, so most mutations are naturally wiped out. That is why we can tell that all life on Earth originated from a single origin, because many of the enzymes important for function have evolved at a really slow pace so that even after 4 billion years of evolution, they still look the same and work in similar ways in all groups of life.

We found out that because D1 and D2 are evolving so slowly, the span of time between the duplication and the ancestor of cyanobacteria is likely to be a billion years or more! We cannot tell, however, with perfect exactitude when the ancestor of cyanobacteria appeared for the first time, but if it existed about 2.5 billion years ago, then the duplication could have easily occurred more than 3.5 billion years ago. The important discovery is that it does not matter when the ancestor of cyanobacteria appeared, because the span of time between the duplication (the dawn of oxygenic photosynthesis) and this ancestor will always be very large.

Another amazing thing we discovered is that even when the span of time is one billion years, the rate of change at the moment of duplication had to be about 40 times greater than the observed rates in the past 2.0 billion years. Forty times the current speed of change is about the limit of what is possible for molecular machines of such a level of complexity. In fact, it is already above any measured rate for this kind of complex, highly conserved molecular machine. Then, knowing that, we can calculate that if this gap of time were to be smaller, the rate at the duplication would have to be faster, and quickly enough the rates would be so large that they would be outside the realms of biology.

Imagine a car going from Paris to Berlin, a journey of about 1000 km. It would take about 10 hours to drive that distance at about 100 km per hour. If we want to arrive in 5 hours, we would need to drive at about twice the speed, but if we want to arrive in 1 hour, we would need to go at 10 times the speed, at almost the speed of sound. Not possible even for the fastest Formula 1 car. It is the same for the speed of evolution.
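The car analogy is just speed equals distance divided by time, and it can be written down in a few lines of Python. This is only a sketch of the analogy: the function name and the rounded speed of sound are my own choices, and the actual molecular clock analysis models rates that change over time rather than a single constant speed.

```python
SPEED_OF_SOUND_KMH = 1235.0  # approximate value in air at about 20 °C


def required_speed(distance_km, hours):
    """Average speed needed to cover a distance in a given time."""
    if hours <= 0:
        raise ValueError("travel time must be positive")
    return distance_km / hours


if __name__ == "__main__":
    distance_km = 1000.0  # roughly Paris to Berlin, as in the text
    for hours in (10, 5, 1):
        speed = required_speed(distance_km, hours)
        fraction = speed / SPEED_OF_SOUND_KMH
        print(f"{hours:>2} h -> {speed:6.0f} km/h ({fraction:.0%} of the speed of sound)")
```

Halving the travel time doubles the required speed, and the same inverse relation is why squeezing the same amount of sequence change into a shorter span of time demands ever faster rates of evolution.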

This is also important because it tells us in a very straightforward manner that evolutionary scenarios in which oxygenic photosynthesis originated very quickly before the ancestor of cyanobacteria can be ruled out with confidence. Even if we don’t know when exactly cyanobacteria originated.

The bigger picture
The main implication of the paper is that oxygen was available to life long before it started to accumulate in the air at about 2.4 billion years ago. This is in agreement with current geological data that suggests that whiffs of oxygen or localized accumulations of oxygen were possible before 3.0 billion years ago.

There have been debates on whether aerobic respiration evolved before or after cyanobacteria, and therefore before or after oxygenic photosynthesis. This is because the enzymes used for aerobic respiration appear to be much older than cyanobacteria. But how can aerobic respiration have evolved before oxygen was available to life? In the absence of oxygenic photosynthesis it is expected that the amount of oxygen available to life would be virtually negligible. So scientists have had to come up with convoluted scenarios to explain this. Our data help understand how this is possible, because oxygenic photosynthesis likely got started long before the ancestor of cyanobacteria. Today oxygenic photosynthesis is only found in cyanobacteria, but our data suggests that it is likely that many other forms of microbes that today do not do photosynthesis may have had old ancestors with the capacity to split water using light.

In fact, recent data hints at the possibility that oxygen was important for the development of the genetic code, and reconstructions of the genetic capabilities of the earliest forms of life always retrieve enzymes to protect against reactive forms of oxygen, but the latter are usually dismissed as artefacts or anomalies. Our work can help understand how this is actually possible, because the older cyanobacteria are found to be, the more likely it is that oxygenic photosynthesis started at the earliest stages in the history of life and soon after the earliest forms of photosynthesis.

What’s next
We are trying now to bring back to life what the ancestral photosystem before the duplication looked like using a method called Ancestral Sequence Reconstruction. This is a well-established method that allows us to predict the basic building blocks of the ancestral enzyme using the known variation across all extant species. We cannot travel back in time to 3.0 billion years ago, but we can make the ancestral enzyme travel from the distant past into our test tube in the lab today.

Because the enzyme is evolving so slowly, its structure has not changed much since its origin; what has changed is the particular building blocks along the different positions of the preserved structure. That makes it a very suitable system for Ancestral Sequence Reconstruction, or targeted site-directed mutagenesis, although that does not mean it is easy. Nevertheless, we have now modified strains of cyanobacteria expressing some of the ancestral genes and we will soon attempt to validate our predictions experimentally. This is a three-year project funded by the Leverhulme Trust.

Thursday, November 15, 2018

Answer to Dawn Summer's comments and questions regarding the evolution of oxygenic photosynthesis

Regarding our paper published recently in Geobiology, titled "Early Archean origin of Photosystem II"

I wrote "undescribed assumptions" because usually the papers read really well and describe many of their assumption in ways that are convincing, but results vary significantly. I've identified a couple of things that aren't justified, but I don't know if they are reasonable.
Example: It doesn't make sense to me that molecular evolution rates in chloroplasts should be the same as in free-living cyanobacteria given the significantly different "environmental" contexts, including pigments to absorb damaging radiation. Has anyone looked at this?

You are absolutely right. There are differences in the rates of evolution between chloroplast and cyanobacteria, and overall plastid proteins evolve at a faster rate than those in cyanobacteria. But that is not true for every protein. For example, proteins involved in information processing (e.g. ribosomal proteins, RNA polymerase) are evolving significantly faster in plastids. On the other hand, proteins of bioenergetics and photosynthesis metabolism, like ATP synthase, Rubico large subunit, the core subunits of the photosystems, are evolving at about similar rates in cyanobacteria and plastids.

It has to do with the different evolutionary pressures. The proteins of bioenergetics are under strong purifying selection (slow rates), but those of information processing have undergone periods of positive selection (accelerations of the rates) because they had to be put under the control of the eukaryotic replication/gene expression/translation systems. I don’t know much about it, but I have now been systematically comparing the rates of evolution between a bunch of these proteins. I am trying to establish what is a reasonable time for the emergence of the most recent common ancestor of Cyanobacteria... but of course, it is not so straightforward.

In our analysis, we used D1, one of the slowest evolving proteins in all life. We found that there is hardly any difference in the overall rates of evolution between D1 in all photosynthetic eukaryotes and in Cyanobacteria. In fact, the G4-D1 that cyanobacteria use to do oxygenic photosynthesis with chlorophyll f has experienced faster rates of evolution than those in the chloroplast.

That is why we presented Figure 2 in our paper: to try to show that the rates of evolution of D1 and D2 are quite slow, both in plants and cyanobacteria, and that if cyanobacteria turn out to be much older than we anticipate, that would imply even slower rates, which would then push the duplication that led to D1 and D2 to even older times.

To give you an idea of how slow D1 and D2 are evolving... They are evolving slower than the alpha and beta subunits of ATP synthase. Alpha and beta originated from a gene duplication event that occurred before the LUCA. D1 and D2 are under tremendous evolutionary pressure, because they bind so many cofactors and they have to be maintained at the right orientations, plus they also interact with a bunch of other subunits, and in addition they have to incorporate protection mechanisms. Therefore, when primary endosymbiosis occurred, this had virtually no effect on the rates of evolution of D1 and D2. Unlike the ribosome for example.

If they do evolve at different rates on average, almost none of the fossil record calibrations will be effective without a deep dive into these variations.

I agree 100%! That is something I am exploring at the moment. In the case of cyanobacteria/chloroplast trees, calibrations have to be placed on either side of the node you are most interested in. That is why timing the most recent common ancestor of cyanobacteria is so difficult. If we only put calibrations on fast-evolving branches, then the rates in the slowest-evolving uncalibrated clades will be overestimated, resulting in younger calculated ages. On the other hand, if we place calibrations only on slower-evolving clades, then the rates in the fast-evolving clades will be underestimated, resulting in older calculated ages.
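To make the calibration problem concrete, here is a minimal Python sketch. It is a deliberate simplification: it applies one rate to a whole clade, which is not how relaxed molecular clocks work, and every number in it is made up for illustration. It only shows the direction of the bias. A rate learned from a fast clade is too high for a slow clade and dates it too young, while a rate learned from a slow clade is too low for a fast clade and dates it too old.

```python
# Toy illustration only: all rates and ages below are made-up values.

def implied_age(branch_length, rate):
    """Age (Gyr) implied by a branch length (changes/site) at a rate (changes/site/Gyr)."""
    return branch_length / rate

TRUE_AGE = 2.0      # assumed true age of both clades, in Gyr
FAST_RATE = 0.25    # hypothetical rate of the fast-evolving clade
SLOW_RATE = 0.125   # hypothetical rate of the slow-evolving clade

# Branch lengths that each clade would accumulate over its true age.
fast_branch = FAST_RATE * TRUE_AGE
slow_branch = SLOW_RATE * TRUE_AGE

# Rate learned only from a calibration on the fast clade, applied to the slow clade.
slow_clade_age = implied_age(slow_branch, FAST_RATE)
# Rate learned only from a calibration on the slow clade, applied to the fast clade.
fast_clade_age = implied_age(fast_branch, SLOW_RATE)

print(f"slow clade dated with the fast rate: {slow_clade_age} Gyr (true age {TRUE_AGE})")
print(f"fast clade dated with the slow rate: {fast_clade_age} Gyr (true age {TRUE_AGE})")
```

With these toy values the slow clade comes out at half its true age and the fast clade at twice its true age, which is why calibrations on both sides of the node of interest matter.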

Therefore, when running a molecular clock analysis it is important to maximize calibrations and to place them strategically. However, the changes in the rates between clades should not be a big problem. The molecular clock algorithms can cope with differences in the rates that are orders of magnitude apart; believe me, I have tested this. But the only way the software can infer accurate dates is with the appropriate use of calibrations.

There is no perfect dataset, and there is no perfect molecular clock, but we tried to do the best we could. We tried to model every possible scenario. The point of the paper is not to find out when cyanobacteria originated, but to find out what is the span of time between the duplication leading to D1 and D2, and standard Photosystem II (inherited by all cyanobacteria). And we find that that span of time is likely to be pretty substantial…

Think about this: the origin of ATP synthase (the duplication leading to the alpha and beta subunits) does not depend on the age of any particular group of bacteria. The same goes for Photosystem II: its origin does not depend on the age of the most recent common ancestor of cyanobacteria, but on when the duplication that led to D1 and D2 occurred. And that photosystem, before the duplication, even if it didn’t oxidize water, was already a pretty special photosystem unlike any of the known anoxygenic ones.

Example 2: Atm O2 was lower pre-late Ediacaran, so there was less O3 & more UV. Even more pre-GOE. And w/ more Fe2+ in seawater, more free radicals are produced from light. How do environmental conditions such as these affect mutation rates? Different in cyanos vs chloroplasts?
Different for organisms living in different environments? E.g. Nostoc in super high light vs new cyanos found living in subsurface? Phormidium living at light limit w/HS-? How do ecological variations feed into long term mutation accumulation?

From the patterns that I have seen, it appears that overall, chloroplast proteins (eukaryotes in general) are evolving faster than those of cyanobacteria. But as I was mentioning above, the rates of evolution vary a lot between proteins. What scientists have tried to do is to measure the background rates of evolution in non-coding regions of the genome, and compare them to the coding regions. The change in the ratio of these rates reflects different evolutionary pressures.

There are no systematic studies of the changes of the rates of evolution across geological time. Your questions are super interesting, and it is something that needs to be explored in more detail.

Have a look at the figure below. That is a comparison of the level of sequence divergence between pairs of cyanobacteria (a measurement of phylogenetic distance). What you see is a total of 703 comparisons. And I am plotting that for RpoB (RNA polymerase subunit B) and for the beta subunit of the ATP synthase. For example, if I compare the sequence of beta from Nostoc punctiforme with that of Chroococcidiopsis thermalis, they’ll be about 10% different. If I compare against Gloeobacter violaceus, it would be about 30% different.


The dots in blue are comparisons between heterocystous cyanobacteria, and the orange dots are the comparisons against Gloeobacter, the earliest branching cyano. There is a big scatter, but it follows an overall linear trend; the slope of the trend line is 1.06. It means that RpoB and beta are evolving at pretty much the same rate across the core diversity of cyanobacteria.
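For readers who want to try this kind of comparison themselves, here is a minimal Python sketch of the idea: compute the percentage difference between every pair of aligned sequences for two proteins, then fit a trend line through the paired values. The sequences below are invented toy strings, not real RpoB or ATP synthase beta sequences, and the plain least-squares fit is an assumption; I am not claiming this is exactly how the figure was made. As a side note, 703 is the number of distinct pairs among 38 sequences, so 38 genomes is a plausible guess for the dataset size, but it is only a guess.

```python
from itertools import combinations
from math import comb

def p_distance(seq_a, seq_b):
    """Fraction of aligned, ungapped positions at which two sequences differ."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    return sum(a != b for a, b in pairs) / len(pairs)

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Hypothetical toy alignments keyed by strain label (not real sequences).
protein_1 = {"A": "MKTLLVAGGS", "B": "MKTLLVAGGT", "C": "MRTLIVAGAT"}
protein_2 = {"A": "GDVEKAIRLA", "B": "GDVEKAIRLS", "C": "GEVEKSIRMS"}

# Every pair of strains gives one dot: x from protein 1, y from protein 2.
strain_pairs = list(combinations(sorted(protein_1), 2))
x = [p_distance(protein_1[a], protein_1[b]) for a, b in strain_pairs]
y = [p_distance(protein_2[a], protein_2[b]) for a, b in strain_pairs]
slope = ols_slope(x, y)

print(f"{len(strain_pairs)} comparisons, slope = {slope:.2f}")
print(f"pairs among 38 sequences: {comb(38, 2)}")
```

A slope close to 1 means the two proteins accumulate differences at about the same pace across the strains compared; a slope well above or below 1 would mean one of them is evolving faster.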

The figure also shows that the distance between Gloeobacter and the rest of cyanobacteria is about three times as great as that among heterocystous cyanobacteria. If it can be established that the rates of evolution across most cyanobacteria follow approximately uniform patterns, we can then be more confident of a time for their most recent common ancestor. We will only need a good fossil to calibrate it all.

Let us assume that we have identified a number of proteins that have evolved at a constant rate across cyanobacteria (say those in the figure). Now, there was a recent paper showing fossil heterocystous cyanobacteria in the Tonian period, did you see it? The lower age is 720 Ma. That would imply that the branch leading to Gloeobacter split off at about 2.1 Ga. If instead we think that heterocystous cyanobacteria appeared about 1.0 Ga, then that would make the branching of Gloeobacter about 3.0 Ga. Molecular clocks also behave in a similar way, depending of course on the calibration choices.
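The back-of-the-envelope arithmetic behind those two numbers can be written as a tiny Python sketch. It assumes a strictly constant rate and takes the "about three times" distance ratio at face value as exactly 3.0, which is a simplification of my own for the example.

```python
def implied_split_age(calibrated_age_ga, distance_ratio):
    """Scale a calibrated age by a ratio of sequence distances (constant-rate assumption)."""
    return calibrated_age_ga * distance_ratio

# "About three times", as described in the text; exactly 3.0 is assumed here.
DISTANCE_RATIO = 3.0

tonian = implied_split_age(0.72, DISTANCE_RATIO)  # 720 Ma fossil calibration
older = implied_split_age(1.0, DISTANCE_RATIO)    # alternative 1.0 Ga calibration

print(f"{tonian:.2f} Ga, {older:.2f} Ga")
```

With a ratio of exactly 3.0 the first value comes out near 2.2 Ga; a ratio slightly below three gives the figure of about 2.1 Ga quoted above.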

Example 3: Gene exchange among closely related organisms, including via viruses. Is it possible that D1 G4 (and assoc genes) evolved in one sp of cyanos, was better, and was transferred to a bunch of others post GOE with those who didn't get the transfer dying out?

What I found out in my study of the evolution of D1 is that G4 is found in all Cyanobacteria; see Figure 1 of our paper. And when you focus on G4 only, it appears to follow a species tree of cyanobacteria. Bear in mind that even D1 G4 has duplicated several times (e.g. low-light vs high-light forms, the one in the far-red light gene cluster). Nevertheless, it seems as if at least G4 has mostly been inherited vertically. That is not to say that horizontal gene transfer has not occurred; it certainly has, but I don’t think to such an extent that it would dominate the topology of the tree.

Because of that, we also concluded that the atypical D1 forms branched out before the most recent common ancestor of Cyanobacteria, including the so-called microaerobic forms.

I do think that a post-GOE ancestor of cyanobacteria is likely an artefact resulting from an overestimation of the rates of evolution, and I think there are a number of reasons for this. It turns out however that D1 and D2 are very susceptible to that because they are so slowly evolving. That is why we focused on the concept of delta-T instead.

We did not focus on trying to figure out if cyanobacteria originated after or before the GOE, but on the span of time between the duplication leading to D1 and D2, and standard PSII. We concluded therefore that regardless of the exact timing for the MRCA of cyanobacteria, delta-T will always be very large (1.0 billion years). We also found out that if delta-T is made smaller, the rates of evolution increase beyond what is likely for this type of protein, and quickly enough beyond what is possible for any kind of protein.

So if the MRCA of cyanobacteria is found to be 2.5 Ga old, I think it would be reasonable to assume that the duplication leading to D1 and D2 occurred about 3.5 Ga... see what I mean?
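The logic can be sketched in a few lines of Python. The divergence value is a made-up placeholder, not a number from our paper; the point is only that the duplication age is the MRCA age plus delta-T, and that squeezing delta-T forces the average rate up in inverse proportion.

```python
def duplication_age(mrca_age_ga, delta_t_gyr):
    """Age of the D1/D2 duplication given the MRCA age and the span delta-T."""
    return mrca_age_ga + delta_t_gyr

def required_rate(divergence, delta_t_gyr):
    """Average rate (changes/site/Gyr) needed to accumulate `divergence` within delta-T."""
    return divergence / delta_t_gyr

# Hypothetical amount of change between the duplication and standard PSII.
HYPOTHETICAL_DIVERGENCE = 1.0

print(f"duplication at {duplication_age(2.5, 1.0)} Ga")
for delta_t in (1.0, 0.5, 0.1):
    rate = required_rate(HYPOTHETICAL_DIVERGENCE, delta_t)
    print(f"delta-T = {delta_t} Gyr -> required rate = {rate} changes/site/Gyr")
```

Halving delta-T doubles the rate that has to be explained, and a tenfold smaller delta-T needs a tenfold faster protein.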

In any case, I think that most of the diversity of oxygenic phototrophs that have ever existed actually predated the MRCA of cyanobacteria. That does not mean that such diversity had to be abundant or globally distributed though.

Or being present only in environments where they can compete with relatively ineffective D1s?
I'm not saying I think these necessarily happened. It just leaves me with the feeling that we are missing something really big and important in our assumptions.

I agree. Think about this:

There are three gene duplication events that are exclusive to oxygenic photosynthesis. D1 and D2, the core of PSII. CP43 and CP47, the core antenna of PSII. And PsaA and PsaB, the core of Photosystem I.

All cyanobacteria today have a form of oxygenic photosynthesis that has remained basically unchanged from Gloeobacter to avocados. In fact, most of the sequence change in the evolution of Photosystem II and Photosystem I that has ever occurred in the history of life happened before the MRCA of cyanobacteria. From the moment those key duplications occurred, countless forms of oxygenic phototrophic bacteria should have appeared, spanning all of those changes that are not accounted for in the known diversity. And given that these enzymes are some of the slowest evolving enzymes we know of, the roots of oxygenic photosynthesis are likely placed deep in time... early Archean deep. We are oblivious to such huge diversity. By the time cyanobacteria enter the scene, when Gloeobacter split from the rest, oxygenic photosynthesis had already reached a pretty sophisticated stage.

So yeah, we are missing so much, in fact, we’re probably missing most of it.