“Symptom/Metabolome Directed Genomics for ME/CFS” by Neil McGregor (written transcription)

This is a loose transcription of Neil McGregor’s presentation at Open Medicine Foundation’s Community Symposium, “Molecular Basis of ME/CFS” in August 2017. Some additional notes have been added.

This research has been undertaken by the Melbourne Bioanalytics research group, which consists of both scientists and clinicians. Many of the scientists are associated with the Bio21 Institute in Melbourne. The clinicians are the most valuable part of our group, as they ensure that we have an appropriately diagnosed patient cohort to study.

In my 25 years of researching ME/CFS, I’ve noticed that people with ME/CFS don’t all have the same outcomes. They may have similar onsets, but factor analysis of patient data reveals different clusters: for example, a depressive cluster, a pain cluster, a sensory cluster (e.g. heightened taste and smell). What this suggests is that we are looking at a heterogeneous event, not a single entity. Analysing our symptom/metabolome data revealed some clusters (e.g. a fatigue cluster, a pain cluster). If we analyse fatigue, we find clusters of other symptoms which go along with that. In our sample, 50% of patients have pain syndromes/Fibromyalgia, but the other 50% do not. A cluster of patients are cognitively impaired, but others don’t have that symptom.

So what we really want to know is: why are there these differences? If there is a similar onset, but different outcomes, that can only be explained by a different genetic response, or some change to the pathogen, or other epigenetic-type issues.

Using statistical analysis, we looked at how patients rated their fatigue, and compared it with their metabolome (metabolites present in the body). We found that elevated glucose was most prominently associated with fatigue. We took our samples first thing in the morning, which might explain why our finding is different to some others. Perhaps if we had taken our samples postprandially (after meals), we might have seen some variation to this, because glucose levels will be impacted by eating food, which will be reflected in the metabolome.

So, we found that fatigue is related to a glycolytic change, and predominantly related to the glucose level in the serum. For the patients with Fibromyalgia-type pain syndromes, we again looked at the amount of widespread pain they have and compared that with their metabolome, and we found that it is quite a different event metabolically. In patients with pain, there’s a change in kidney function. When the pain is worse, they have lost a lot of amino acids and electrolytes through the urine. One of the problems with working with ME/CFS is that we are only seeing patients after at least 6 months of symptoms (as per the definitions). But in other pain syndromes, when we have looked at data closer to the time of onset, we can see these changes quite clearly. When you put them together, you can start to understand the relationship between the pain symptoms and metabolic changes.

We can see metabolic changes which relate to symptom clusters. Now, we have turned to genetics, to see if there are some genetic factors which may help identify some of these clusters.

We are currently undertaking a pilot study, consisting of 56 patients and 32 controls. The data was analysed using a DNA chip (microarray) which searches for changes in general chemistry (there are many different types of DNA chips, which search for different things, and which will result in different datasets). We ended up with an initial set of 316 snps* which were different between the two groups. Of these, 162 were in known genes, or protein production; 49 snps were in RNA genes, which regulate how the DNA is transcribed in a cell. The remaining 142 were potentially junk DNA, according to the old definition, though we now know that some of this potentially junk DNA actually does things: it promotes or enhances the function of other genes, when those genes are activated.

* There are four nucleotides that form the basis of DNA: guanine (G), cytosine (C), adenine (A) and thymine (T). In RNA, uracil (U) takes the place of thymine. A snp (single nucleotide polymorphism) is a variation in a gene, in which one nucleotide has been changed. Some snps may underlie predisposition to certain illnesses, though many snps occur without any discernible impact on the individual.

One issue with these sorts of studies is that, because the control groups are so small, we need to make sure that they aren’t biased, but reflect the general population. To do that, we compared our data to the 1000 Genomes data set. We eliminated any of our identified snps where our control group didn’t match the general population, and we also eliminated any snps where our ME/CFS group did match the general population. After doing that, we were left with 111 snps which differed between the two groups.
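The two elimination rules described above can be sketched in a few lines of Python. This is only an illustration, not the group's actual pipeline: the field names (`reference_freq`, `control_freq`, `patient_freq`), the snp ids, the frequencies and the 0.05 tolerance used to decide whether a group "matches" the general population are all assumptions made up for the example.

```python
def filter_against_reference(snps, tolerance=0.05):
    """Keep snps whose controls resemble the reference population
    while the patient group does not.

    Each snp is a dict of hypothetical frequencies; `tolerance` is an
    assumed cut-off for "matches the general population".
    """
    kept = []
    for snp in snps:
        control_matches = abs(snp["control_freq"] - snp["reference_freq"]) <= tolerance
        patient_matches = abs(snp["patient_freq"] - snp["reference_freq"]) <= tolerance
        # Rule 1: drop snps where the controls do not match the reference.
        # Rule 2: drop snps where the patients do match the reference.
        if control_matches and not patient_matches:
            kept.append(snp["id"])
    return kept


# Made-up example data (not from the study).
example_snps = [
    {"id": "snp_a", "reference_freq": 0.10, "control_freq": 0.12, "patient_freq": 0.30},
    {"id": "snp_b", "reference_freq": 0.10, "control_freq": 0.30, "patient_freq": 0.35},
    {"id": "snp_c", "reference_freq": 0.20, "control_freq": 0.21, "patient_freq": 0.22},
]

print(filter_against_reference(example_snps))
```

Here only `snp_a` survives: `snp_b` is dropped because the controls look unlike the reference population, and `snp_c` because the patients look just like it.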

Next we eliminated any snps which had an odds ratio of less than two. (An odds ratio of 2 means that the odds of carrying the snp are twice as high in the ME/CFS group as in the controls.) An odds ratio less than 2 means that they are probably not as important. We can go back and look at these again later but, for now, we wanted to make sure we were not biased in the beginning.
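As a worked illustration of the odds ratio cut-off, the short Python sketch below computes an odds ratio from a 2x2 table of carrier counts and applies the threshold of 2. The carrier counts are invented for the example (only the group sizes of 56 and 32 come from the talk), and the function names are hypothetical.

```python
def odds_ratio(case_with, case_without, control_with, control_without):
    """Odds ratio from a 2x2 table, using the cross-product form
    (case_with * control_without) / (case_without * control_with)."""
    denominator = case_without * control_with
    if denominator == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    return (case_with * control_without) / denominator


def passes_threshold(or_value, threshold=2.0):
    """True if the snp would be kept under an odds ratio cut-off."""
    return or_value >= threshold


# Invented counts: 28 of 56 patients and 8 of 32 controls carry the snp.
example_or = odds_ratio(case_with=28, case_without=28,
                        control_with=8, control_without=24)
print(example_or, passes_threshold(example_or))
```

With these invented counts the patient odds are 28/28 = 1 and the control odds are 8/24 = 1/3, so the odds ratio is 3 and the snp would be kept.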

At the end, we were left with 38 genes, which isn’t a lot, out of the huge number which we started with.

Once we had identified these 38 genes, we then wanted to see if there was a relationship between them. To do this, we used factor analysis (a statistical process used to identify underlying clusters within data). We found that there were at least 7 major clusters within the data. The first cluster was an anomaly in G-protein coupled receptor protein. The 2nd and 3rd clusters also had the anomaly in the G-protein coupled receptor protein, in combination with variations in a couple of additional genes. These other genes are probably amplifiers or modifiers of other genes, and we think they may be related to symptom expression. The 4th, 5th, 6th, & 7th clusters all involve RNA helicases. RNA helicases remove viral and bacterial RNA, and our host RNA, from the cell. These four clusters, which had anomalies in RNA helicases, also had anomalies in some additional genes.

When we looked at some of the individual snps associated with these clusters, we found quite strong differences between ME/CFS and controls. The snps were 5 or 6 times more likely to be present in the ME/CFS group than in the control group. You can see from the table that some of them were present in heterozygotes (meaning the individual only had one copy of the snp), which suggests that it may be an autosomal dominant gene. Others (like the Langerin snps) were present in homozygotes (meaning the individual had two copies), which means that they’re more likely to be autosomal recessive.
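The heterozygote/homozygote distinction above comes down to counting copies of the variant nucleotide in a person's genotype. A minimal Python sketch, assuming a genotype is written as a two-letter string such as "AG" (a representation chosen for this example, not taken from the study):

```python
def zygosity(genotype, variant_allele):
    """Classify a two-letter genotype by copies of the variant allele."""
    if len(genotype) != 2:
        raise ValueError("expected a two-letter genotype such as 'AG'")
    copies = genotype.count(variant_allele)
    labels = {0: "non-carrier", 1: "heterozygote", 2: "homozygote"}
    return labels[copies]


# Hypothetical genotypes, with G as the variant nucleotide.
for genotype in ("AA", "AG", "GG"):
    print(genotype, zygosity(genotype, "G"))
```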

Next, we took all the mutations within the genes that had low prevalence within the community, and looked at the number of snps on these genes that each of our participants had. Again, we found a striking contrast between the ME/CFS and control groups in the number of snps (variations) on these genes. We found that Langerin snps were 3-fold more likely in ME/CFS patients than controls. G-protein snps were 21.5-fold more likely and RNA helicase snps were 12.6-fold more likely in ME/CFS than in controls. Scientists are usually pretty happy with odds ratios of 2, so these are very large odds ratios.

Looking broadly at the functions which some of these snps influence:

G-proteins function as molecular switches within the cell, and consist of 3 components (alpha, beta and gamma subunits). There are multiples of each type of subunit in each G-protein. The function of each particular G-protein depends on the number, and combination, of subunits it has.

In the inactive state, G-proteins consist of the 3 subunits (alpha, beta, and gamma) plus one molecule of GDP on the alpha subunit. In this inactive state, G-proteins are bound to the G-protein coupled receptor (GPCR) and the cell membrane (State 1). Once activated by cell signalling, the G-protein releases GDP, binds GTP in its place (State 2) and separates from the GPCR. The binding of GTP to the alpha subunit triggers the G-protein to separate into two components: the alpha (with the GTP molecule) and the beta/gamma unit, which are both active (State 3). These two activated units then go and perform separate functions in the cell.

G-proteins are involved in a wide variety of processes in the body, many of which involve common symptoms in people with ME/CFS. Looking at the table on this slide, you can see the percentage of ME/CFS patients in our sample who reported these symptoms. We think that some of these G-proteins are critically important in symptom expression in ME/CFS.

RNA helicases are enzymes which remove double-stranded DNA, viral RNA and host RNA from the cytoplasm. They are inhibited by viruses, and could potentially be the major reason for viral triggers in ME/CFS because, if there is a problem with RNA helicases, the body will have difficulty removing the viral RNA from the cell.

Dendritic cells are found in those parts of the body which have contact with the external environment (skin, and mucosa like the nose, throat and gut). They serve to alert the immune system to the presence of pathogens. The CD207 gene codes for Langerin, which is a protein found in dendritic cells. If there is a problem with the CD207 gene, then the body will likely have problems detecting certain viruses.

From HIV studies, we know that when Langerin is inhibited, patients have greater viral loads, and more infections in the body.

When we look at the viruses which are transported or acquired by this receptor, we find many viruses which are associated with ME/CFS.

We also did an analysis on the mitochondrial DNA (mtDNA). We found one mutation which indicated Haplogroup* H5. When we cross-checked the mutation to see whether it was a mitochondrial anomaly, we found that all of those individuals with the mtDNA mutation were also carriers of the Langerin mutation set. So we think that the mtDNA mutation may be an indicator of a line of transmission for that particular gene group.

* Mitochondrial DNA (mtDNA) is located in the mitochondria. Because it is inherited from the mother (and not combined with genes from the father), it remains largely unchanged for many generations. It is sometimes known as the “Eve gene”. This matrilineal line has allowed geneticists to trace something of a human family tree through the history of human migration. Haplogroups are used to indicate where the family tree has branched. Haplogroups are signified with a letter and number.

In summary, by using a slightly different approach, we have been able to identify a number of potential genes, which factor analysis grouped into 7 clusters. We have found gene clusters related to G-protein coupled receptors, gene clusters related to RNA helicases, and a gene cluster related to Langerin.

This is preliminary data. Clearly, this data may still be nothing. But we have to make sure that this makes sense, and that it’s not “no-sense”. Thank you.