DNA/protein function finder from the Wellcome Trust, Sanger Institute, EMBL-EBI and YourGenome

When is a rare disease not a rare disease? The answer: when big data gets involved. An ambitious new research project aims to show patients that they are not alone.

At some point in their career, every doctor will encounter a patient whose condition perplexes them, requiring detailed investigation and discussion with colleagues before diagnosis is possible. After all, not every disease is as common as cancer, which affects around one in three of us, or depression, which affects one in 10.

Dr Lucy Raymond from the Department of Medical Genetics specialises in rare diseases. Technically, this means diseases that affect fewer than one in 2,000 people, but in fact Raymond sees children with learning disabilities so rare that a child may be the only person in the UK to be affected.

These conditions usually arise in one of two ways: a spontaneous change to the child’s DNA that is not inherited, or a ‘recessive disorder’, in which two copies of the same rare variant are necessary for the disease and each parent unwittingly passes on a copy. Comparing the child’s genome with those of their parents enables the researchers to pinpoint the gene responsible. In extremely rare cases – where the patient appears to be truly unique – the researchers need to study whether the same variant in mice or zebrafish creates a similar condition.
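The logic of comparing a child with both parents can be sketched in a few lines of Python. This is only an illustration of the two scenarios described above, not the researchers' actual method: the gene names and genotype values are invented, and each genotype is assumed to be recorded simply as the number of copies (0, 1 or 2) of a variant that a person carries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrioGenotype:
    """Copies (0, 1 or 2) of one variant carried by each family member."""
    gene: str      # hypothetical gene label, for illustration only
    child: int
    mother: int
    father: int


def classify(g: TrioGenotype) -> str:
    """Label a variant by the two inheritance patterns described in the text."""
    # Spontaneous change: the child carries the variant, neither parent does.
    if g.child >= 1 and g.mother == 0 and g.father == 0:
        return "de novo"
    # Recessive: the child has two copies, one passed on by each carrier parent.
    if g.child == 2 and g.mother == 1 and g.father == 1:
        return "recessive"
    return "other"


def candidates(trio: list[TrioGenotype]) -> dict[str, str]:
    """Keep only the variants that fit one of the two patterns."""
    labelled = {g.gene: classify(g) for g in trio}
    return {gene: label for gene, label in labelled.items() if label != "other"}


# Invented example data: three variants found in one child and both parents.
variants = [
    TrioGenotype("GENE_A", child=1, mother=0, father=0),
    TrioGenotype("GENE_B", child=2, mother=1, father=1),
    TrioGenotype("GENE_C", child=1, mother=1, father=0),
]

print(candidates(variants))
```

Here GENE_A fits the spontaneous pattern and GENE_B the recessive one, while GENE_C, inherited as a single copy from one parent, is set aside.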

“Or,” Raymond explains, “we might essentially generate a ‘dating agency’ to try to match our patient with a similar case somewhere else in the world.” With these diseases as rare as they are, the only way for this to be viable would be to have access to tens, possibly hundreds, of thousands of potential matches: something the era of ‘big data’ makes possible.

But this presents a potential problem: how to share information about the patient without breaking their confidentiality. Unlike in the USA, where projects such as the Broad Institute’s Exome Aggregation Consortium (ExAC) place genome data in the public domain, data in the UK is deposited in a ‘managed-access’ database: bona fide researchers with a clear research proposal are allowed access, but only after signing a commitment that they will not attempt to identify individual patients.

“We have to remember that big data is great, but it isn’t our data: it’s people’s data and we need to be respectful of this. People in the UK are often altruistic; we have free blood donation, we have a tremendous tradition of patients giving to help others. We must not jeopardise this relationship.

“Parents know that even if finding the gene abnormality that is responsible will not immediately help their child, it may help ensure that others don’t have to wait 20 years before their child receives a diagnosis. They’re happy to share the data on that basis, but are less keen on the idea that they’ll lose control of the information.”

For several years, Raymond, Professor Willem Ouwehand and Dr John Bradley have been leading the National Institute for Health Research BioResource for Rare Diseases in Cambridge, which has recruited some 5,800 patients. They are now part of a major initiative launched by Prime Minister David Cameron: the 100,000 Genomes Project. Cambridge University Hospitals NHS Foundation Trust will lead the East of England Genomic Medicine Centre, one of 11 centres across the UK aimed at realising this project and sequencing the genomes of patients affected by cancer or rare diseases.

“The 100,000 Genomes Project is about going forward to having a truly national health service, not a provincial, regional health service,” explains Raymond. “The data will be central, will be national, will be available to researchers and healthcare professionals across the country.”

The sheer number of people recruited will create a powerful dataset and ensure that clinicians and researchers don’t have to start from scratch each time they encounter a new case. In fact, the value of a patient’s genome extends beyond just helping identify the cause of their disease: it’s also important as a ‘control’ to compare against and help find the cause of another patient’s disease. “It’s a form of ‘enforced altruism’. Having all the data stored in a central place means that everybody’s data acts as a control for everybody else’s. It has a multiplying effect.”

Big data also reveals a fact that ought to be glaringly obvious but which the name ‘rare diseases’ obscures: one in 2,000, even in a population of 64 million, is not an insignificant number of people. A condition at that threshold would affect some 32,000 people. “Ten years ago people used to ask ‘Why study rare diseases when they’re so rare?’ It’s only recently that people are coming round to see that, with big data, rare is common.

“Rare diseases are becoming increasingly tractable, too, so now there’s a huge interest in them, which is good: it’s not your fault if your disease is rare. Solving these problems is the next big challenge,” says Raymond with a glint in her eye. “If it was all easy, we wouldn’t be doing it – in typical Cambridge style.”


Trust me, I’m an e-doctor

Big data ‘dating agencies’ are not just for people with rare conditions. A similar concept could help patients with far more common conditions receive the best possible hospital treatment.

Addenbrooke’s Hospital in Cambridge is one of the first ‘eHospitals’ in England, explains Dr Lydia Drumright from the Department of Medicine. Everything that happens to you within the hospital – every test result, every diagnosis, every drug prescribed – is captured in an electronic record. Drumright and her colleague Dr Afzal Chaudhry believe that the wealth of information in these records can be used to better inform the treatments of individuals.

“Around 10–20% of our patients may have diabetes or acute kidney injury, but that’s not necessarily why they’re here,” explains Drumright. “They might have had a heart attack, so they’re being cared for by the cardiology team, but the drugs they’re prescribed might have an impact on their other conditions. Added to that, they’re now more susceptible to infection.

“It’s the junior doctors that have to look after the patients and do the basic prescribing. They’re still learning, but need to know which drugs work best and the hospital’s policy for prescribing antibiotics.”

Could a patient ‘dating agency’ not dissimilar to that suggested by Raymond, based on everyone’s medical records, help these junior doctors? “The doctor can search for other patients that look like their own. They can go back historically and see what drugs were prescribed and what their outcomes looked like.”
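A minimal sketch of what such a search might look like is given below. The article does not say how similarity would be measured, so the choice here is an assumption: patients are compared by the overlap of their recorded conditions (Jaccard similarity). The patient records, drug names and outcomes are all invented for illustration.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two sets of conditions (0 = nothing shared, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_patients(query: set[str], records: list[dict], top_n: int = 2):
    """Return up to top_n past patients who share a condition, most similar first."""
    scored = [(jaccard(query, r["conditions"]), r) for r in records]
    matches = [(score, r) for score, r in scored if score > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return matches[:top_n]


# Invented historical records: conditions, drug prescribed and recorded outcome.
records = [
    {"id": "P1", "conditions": {"heart attack", "diabetes"},
     "drug": "drug X", "outcome": "recovered"},
    {"id": "P2", "conditions": {"heart attack", "acute kidney injury"},
     "drug": "drug Y", "outcome": "readmitted"},
    {"id": "P3", "conditions": {"fracture"},
     "drug": "drug Z", "outcome": "recovered"},
]

# The doctor's current patient.
query = {"heart attack", "diabetes", "infection"}

for score, r in similar_patients(query, records):
    print(f"{r['id']}: similarity {score:.2f}, given {r['drug']}, {r['outcome']}")
```

The search only surfaces what was prescribed to comparable patients and what happened next; the decision stays with the doctor.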

Drumright is wary of setting up a system that tells doctors what to prescribe; the literature about how we interface with technology suggests that people can too easily surrender their responsibility. Instead, it’s about building on collective knowledge. “What we’re trying to do is enhance the doctor’s experience so that it’s not ‘my experience as me’, it’s the experience of every prescriber in the hospital.”


The text in this work is licensed under a Creative Commons Attribution 4.0 International License. For image use please see separate credits above.