Big data – Where to start!
Old Members Trust Travel Grant Report – Heather Jeffery
“Big data” is a commonly used phrase in the media, but what is its relevance to scientific research? With the help of a University College Old Members’ Trust Graduate Conference and Academic Travel Grant, I attended a ‘Big Data in Health Research’ summer school at Utrecht University in the Netherlands to find out.
Our first lecture was a brief but stimulating introduction to the concept of big data in clinical research and the forms it can take, followed by an overview of the legal aspects surrounding it. Here, we discussed anonymity, privacy and data ownership. It appeared to be a constant balancing act: the more anonymous the data is, the more privacy the patient has, but the less useful the data is to the researcher. For example, a patient ID is required to integrate databases, which would allow research into the effects of environmental risk factors on health. Similarly, the patient’s age is needed to compare age ranges, which is often informative because the prevalence of certain diseases varies greatly with age.
The technical aspects of dealing with such complex datasets were covered in a day devoted to machine learning, particularly supervised learning algorithms. In addition to learning the theory, we had the opportunity to develop our own machine learning pipelines in groups. This involved using the Weka software to formulate and answer research questions with real patient datasets. My group’s project tested how well a computer could predict patient survival one year after thoracic surgery for primary lung cancer. It turned out that this was not very effective, and misclassification occurred very often. For other datasets, however, where the measured risk factors were a good indication of disease, it worked extremely well. It was also interesting to see how differently people think: groups working on the same dataset all selected different classifiers as their method of analysis and posed slightly different research questions. In the minefield of big data, therefore, there is no single correct analysis. Naturally, we could only skim the surface of machine learning in a day, but it has given me a strong starting point for applying these techniques in my DPhil studies at the University of Oxford, where I will use them to classify protein binding sites on DNA.
Have you ever considered displaying your data in the form of tropical fish? I hadn’t until a talk by Dr Jason Moore, Director of the Institute of Biomedical Informatics at the University of Pennsylvania, introduced the concept of integrating advances in virtual reality and gaming with the analysis of biological datasets. The sense of touch and the feeling of being surrounded by our data activate different parts of the brain, which may help us to think about the data in novel ways and to develop more interactive ways to communicate it.
Several clinicians gave insightful talks on the importance of big data in their fields, which included anaesthesia, cardiovascular disease and the associations between environmental factors and disease on a national or worldwide scale. The availability of a wealth of data from many sources, or over time, provides many opportunities to gain new knowledge, but as the complexity increases, so does the computational burden. After analysis, it is important to consider how much confidence can be placed in the result, given the difficulty of isolating individual factors and of distinguishing correlation from causation.
It was a busy but very enjoyable week, filled with learning in both formal and informal settings. The course was a fantastic springboard which has inspired me to delve deeper into machine learning. Furthermore, it was the perfect opportunity to meet like-minded researchers from a variety of fields and a wide range of countries, including America, Australia, Honduras, Italy, Denmark, Hungary and the Netherlands. I hope to keep in contact with many of them. I would like to thank the Biochemical Society and University College, Oxford for funding me, and Dr Rolf H.H. Groenwold for organising the event.
Find out more about the range of travel grants and scholarships available to assist Univ students on our Travel Grants page or read further travel reports.
Published: 18 August 2017