UKRI Centre for Doctoral Training in Artificial Intelligence, Machine Learning & Advanced Computing


Research projects

Our doctoral training programme is constructed around three research themes:
  • T1: data from large science facilities (particle physics, astronomy, cosmology)
  • T2: biological, health and clinical sciences (medical imaging, electronic health records, bioinformatics)
  • T3: novel mathematical, physical, and computer science approaches (data, hardware, software, algorithms)
Research projects are placed in one of the three themes. The CDT particularly encourages the development of synergies between the themes, through the sharing of common methods and an interdisciplinary supervisory team. Not all themes are available at all of the partner universities.

A sample of research projects is given further down on this page.


Research contacts

In order to discuss a PhD position/project at one of the partner universities, please contact:

T1: data from large science facilities

T2: biological, health and clinical sciences

T3: novel mathematical, physical, and computer science approaches


Example research projects, organised by host university, 2019 cohort

Aberystwyth University

Project title: Big Data algorithmics for efficient search and analysis of large collections of genomes

1st supervisor: Dr Amanda Clare
2nd supervisor: TBD
Department/Institution: Department of Computer Science, Aberystwyth University
Research theme: T2 - biological, health and clinical sciences

Project description: This project addresses novel approaches to the analysis of big data applied to genomic health challenges. Comparative studies of microbiome data yield advances in understanding the underlying causes and consequences of diseases (examples include Crohn's, Parkinson's and IBS). The computational resources required to analyse microbiome data demand more efficient algorithms for search, matching and compression. The proposed research is focused on advanced algorithmics and versatile data structures in stringology that address this need, including the Burrows-Wheeler Transform, suffix arrays and the Lyndon factorization. The aim of the project is to investigate innovative algorithmic approaches including various alphabet and string ordering methods, divide-and-conquer techniques, and bio-inspired genetic search operators for optimization. Artificial intelligence approaches, including machine learning, will be applied to search for and optimize results. Sequential and parallel implementations will be explored. Computational efficiency will be evaluated theoretically and experimentally using Supercomputing Wales HPC facilities and publicly available metagenome data sets. This will be a highly interdisciplinary project at the crossroads of computer science, biology, data science and informatics, thus affording transferable skills to related domains.
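
To make the central data structure concrete, the toy Python sketch below computes (and inverts) the Burrows-Wheeler Transform. It is illustrative only: genome-scale tools build the BWT via suffix arrays and succinct indexes rather than the naive rotation sort used here.

    # Toy Burrows-Wheeler Transform: illustrative only; real genome-scale
    # tools use suffix arrays and succinct data structures instead of the
    # quadratic rotation sort below.

    def bwt(text: str, sentinel: str = "$") -> str:
        """Return the BWT of `text` (the sentinel terminates the string)."""
        s = text + sentinel
        rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
        return "".join(rot[-1] for rot in rotations)

    def inverse_bwt(transformed: str, sentinel: str = "$") -> str:
        """Invert the BWT by repeated prepend-and-sort (naive method)."""
        table = [""] * len(transformed)
        for _ in range(len(transformed)):
            table = sorted(transformed[i] + table[i] for i in range(len(transformed)))
        return next(row for row in table if row.endswith(sentinel))[:-1]

    print(bwt("ACGTACGT"))                # runs of equal characters aid compression
    print(inverse_bwt(bwt("ACGTACGT")))   # -> ACGTACGT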


Project title: Modelling the Development of Breast Cancer Abnormalities

1st supervisor: Prof Reyer Zwiggelaar
2nd supervisor: TBD
Department/Institution: Department of Computer Science, Aberystwyth University
Research theme: T2 - biological, health and clinical sciences; T3 - novel mathematical, physical and computer science approaches

Project description: Breast cancer is the most common cancer in women worldwide, with about 8% of women developing breast cancer in their lifetime. Early detection through screening programmes has been shown to be beneficial, and computer aided diagnosis (CAD) is starting to play a more significant role in this. However, both breast screening experts and CAD miss potential abnormalities, especially at an early stage, and histology is needed to determine malignancy/treatment. This project will have three aspects. In the first instance it will develop a model of the development of various types of breast abnormalities (e.g. using local clustering techniques). For this a very large set of examples will be used, which will be obtained from our clinical collaborators. This will take the morphology of abnormalities (and the associated histology) into account and will order them by their developmental stage. Morphology can be represented by traditional hand-crafted features, by deep learning based features, and/or other novel approaches such as evolutionary algorithms, random projection forests, or graph matching. Secondly, directly linked to the cancer cases used above, we will extend the modelling to include pre-cancer cases, based on previous screening rounds of the abnormalities. This will concentrate on the mammographic morphology; it will be of interest to investigate what mammographic morphology leads to specific mammographic abnormalities. Finally, manifold modelling will be used to develop (using techniques such as learning without forgetting) an overall model of breast cancer development, which is expected to provide pathways from normal tissue to a range of mammographic abnormalities. This model can be used for unseen abnormalities to estimate how they might develop over time, and will also provide a probability of mammographic abnormality development based on normal (pre-cancerous) tissue. The former can be used to contribute to treatment decisions, whilst the latter could inform individual screening intervals.
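
As a flavour of the manifold-modelling step only, the sketch below embeds hypothetical morphology feature vectors with Isomap (scikit-learn) and orders cases along the embedding. The feature vectors are random stand-ins; nothing here should be read as the project's actual pipeline.

    # Hypothetical sketch: embed morphology feature vectors on a
    # low-dimensional manifold and order cases along it as a crude proxy
    # for developmental stage. Features here are random stand-ins.
    import numpy as np
    from sklearn.manifold import Isomap

    rng = np.random.default_rng(0)
    features = rng.normal(size=(200, 64))      # 200 cases x 64 morphology features

    embedding = Isomap(n_components=2, n_neighbors=10).fit_transform(features)
    stage_order = np.argsort(embedding[:, 0])  # order cases along first coordinate
    print(embedding.shape, stage_order[:5])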


Project title: An Intelligent Bee-Inspired Framework for Data Collection and Management in Health Care

1st supervisor: Dr Alexandros Giagkos
2nd supervisor: TBD
Department/Institution: Department of Computer Science, Aberystwyth University
Research theme: T2 - biological, health and clinical sciences; T3 - novel mathematical, physical and computer science approaches

Project description: A bee colony, numbering 20,000 or so workers, operates as a thoroughly integrated unit in gathering its food. Bees work in a cooperative manner; they play different roles while exercising their individual capabilities to contribute to the overall benefit and well-being of the hive. Studies on bee ethology provide detailed documentation of the division of labour, methods for acquiring information about food, and strategies for optimising the coordination of collecting and processing different commodities (i.e., water, pollen and nectar). Models that replicate the hive's underlying behaviours not only offer a better understanding of the colony from a biological point of view but also support the investigation of new multi-agent optimisation techniques applicable to a variety of NP-hard decision-making problems. The aim of this research is to explore the dynamics, interactions and decision-making processes in the hive, and to use the findings as metaphors for a new artificial intelligence framework for multi-agent systems (MAS). The framework will be applied to and evaluated in the domain of distributed health care, where the ability to constantly monitor patients at remote locations and to transfer medical data reliably and in a timely manner is of paramount importance. Wireless Body Area Networks and Wireless Sensor Networks are integrated to form the network infrastructure by which medical data is collected, accumulated and processed centrally. A bee-inspired MAS will serve as the low-level system responsible for ensuring the adaptive discovery of optimised communication paths, which can reliably facilitate i) the collection of data from multiple sources and ii) its delivery to appropriate destinations. Inspired by bees, the proposed MAS will support Quality of Service and application-driven requirements. For instance, doctors may want to prioritise the data from certain sensors with respect to changes in the condition of a patient, or it may be necessary to temporarily accumulate data in certain sinks to save power before it is ultimately delivered to the destination. The effectiveness of the proposed bee-inspired MAS depends on the underlying machine learning mechanisms to adaptively acquire network parameters, build performance-related data and utilise it to detect transmission anomalies, and to predict impediments that impose negative effects on medical data delivery over time.
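
For orientation, the best-known bee-inspired metaheuristic is the artificial bee colony (ABC) algorithm; the heavily simplified sketch below minimises a toy function. It is offered purely as an illustration of the metaphor (division of labour between "employed" bees and "scouts"), not as the project's framework.

    # Minimal artificial-bee-colony-style optimiser (heavily simplified):
    # employed bees refine known food sources; scouts replace exhausted ones.
    import numpy as np

    def sphere(x):                       # toy objective: minimise ||x||^2
        return float(np.sum(x ** 2))

    rng = np.random.default_rng(1)
    n_sources, dim, limit = 10, 5, 20
    sources = rng.uniform(-5, 5, (n_sources, dim))
    fitness = np.array([sphere(s) for s in sources])
    trials = np.zeros(n_sources, dtype=int)

    for _ in range(500):
        for i in range(n_sources):
            k = rng.integers(n_sources)           # random partner source
            phi = rng.uniform(-1, 1, dim)
            candidate = sources[i] + phi * (sources[i] - sources[k])
            if (f := sphere(candidate)) < fitness[i]:
                sources[i], fitness[i], trials[i] = candidate, f, 0
            else:
                trials[i] += 1
            if trials[i] > limit:                 # exhausted source: send a scout
                sources[i] = rng.uniform(-5, 5, dim)
                fitness[i], trials[i] = sphere(sources[i]), 0

    print("best value found:", fitness.min())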


Project title: Prediction of facial growth for children with cleft lip and palate using 3D data mining and machine learning

1st supervisor: Dr Richard Jensen
2nd supervisor: TBD
Department/Institution: Department of Computer Science, Aberystwyth University
Research theme: T2 - biological, health and clinical sciences; T3 - novel mathematical, physical and computer science approaches

Project description: Approximately 150 children are born in England and Wales each year with complete unilateral cleft lip and palate (cUCLP). Despite improvements in clinical outcomes in the UK over the past 15 years, between 20% and 25% of children with cUCLP have poor facial growth, compared to 3% of the non-cleft Caucasian population. Poor facial growth results in poor aesthetic appearance and poor dental occlusion, which can negatively impact a child's psychosocial development with long-lasting effects. It is not clear why only some children with cUCLP have poor growth, nor why facial growth outcomes vary between surgeons and centres. A number of explanations have been advanced, including extrinsic factors such as poor surgery in cleft palate repair during infancy, surgical technique and timing, and intrinsic factors such as the congenital absence of the upper lateral incisor or the shape of the infant's upper arch, indicating a genetic cause. The relationship of the upper dental arch to the lower arch reflects mid-face growth and can be assessed as early as 5 years using the 5 year index. Children with cleft lip and palate in the UK have been treated in regional specialist centres since 2000, and facial growth is routinely assessed between the ages of 5 and 6 years in this way. It is also routine for cleft centres to take and keep a dental model of the upper arch of infants with complete UCLP before they have any surgery. This project would involve the development of techniques for both 3D data mining and machine learning on the scanned models of infants with cUCLP, in order to determine which features are most predictive of facial growth outcome and whether a predictive model can be learned. The maxillary arch models taken from infants prior to their first surgical procedure will be used along with the 5 year index score to develop models via machine learning and identify important regions. In particular, the identification of an intrinsic neonatal arch shape that is predictive of detrimental facial growth would give an opportunity to explain prognosis and manage expectations more easily with parents. It would also facilitate research on the development of new techniques for earlier treatment of poor facial growth and more personalised care for individual patients.


Project title: Principled Application of Evolutionary Algorithms

1st supervisor: Dr Christine Zarges
2nd supervisor: TBD
Department/Institution: Department of Computer Science, Aberystwyth University
Research theme: T3 - novel mathematical, physical and computer science approaches

Project description: Evolutionary algorithms are general and robust problem solvers inspired by the concept of natural evolution. Over the last decades, they have successfully been applied to a wide range of optimisation and learning tasks in real-world applications. Recently, some researchers [1] have argued that evolutionary computation now has the potential to become more powerful than deep learning: while deep learning focuses on models of existing knowledge, evolutionary computation has the additional ability to discover new knowledge by creating novel and sometimes even surprising solutions through massive exploration of the search space. This project will build upon recent momentum and progress in both the theory and applications of evolutionary algorithms and related randomised search heuristics. While such heuristics are often easy to implement and apply, achieving good performance usually requires adjusting them to the problem at hand. Thus, the main goal of the project is to provide mathematically founded insights into the working principles of different randomised search heuristics to improve their applicability. This will include the development of novel mathematical approaches to facilitate their analysis as well as the development of new randomised search heuristics in a principled, theory-driven way. Interdisciplinary collaboration and the involvement of industry partners will support recent efforts to bridge the gap between theory and practice in this important research area [2]. (A toy example of the simplest heuristic studied in this literature is sketched below the references.)
[1] Sentient Labs. Evolution is the new deep learning. https://www.sentient.ai/labs/ea/
[2] COST ACTION CA15140: http://imappnio.dcs.aber.ac.uk
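
By way of illustration, the (1+1) evolutionary algorithm is the simplest randomised search heuristic analysed in this theory literature; on the OneMax benchmark its expected optimisation time is Theta(n log n). A minimal sketch, with no claim of reflecting the project's eventual algorithms:

    # (1+1) EA on OneMax: flip each bit independently with probability 1/n,
    # keep the offspring if it is at least as good as the parent.
    import random

    n = 100
    x = [random.randint(0, 1) for _ in range(n)]
    steps = 0
    while sum(x) < n:
        y = [b ^ (random.random() < 1 / n) for b in x]   # standard bit mutation
        if sum(y) >= sum(x):                             # elitist selection
            x = y
        steps += 1
    print("optimum reached after", steps, "iterations")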


Bangor University

Project title: Classification of Wide Data and Weakly Supervised Data

1st supervisor: Prof Lucy Kuncheva
2nd supervisor: TBD
Department/Institution: School of Computer Science and Electronic Engineering, Bangor University
Research theme: T3 - novel mathematical, physical and computer science approaches

Project description: Modern data confronts the analyst with severe challenges. In addition to coming in massive volumes, data can be streaming, drifting, partially labelled, multi-labelled, contaminated, imbalanced, wide, and so on. Rightfully held in high esteem, deep learning has stepped in to address some of these challenges. However, the majority of them are beyond the reach of this approach. For example, wide data sets are characterised by a very small sample size and an exceedingly large dimensionality. This type of data is unsuitable for deep learning, which requires thousands, even millions, of labelled training examples. This project will seek to develop novel and effective solutions for challenges where standard approaches are insufficient. We will aspire to offer theoretical grounds for those solutions to ensure transferability across application domains. Our focus will be on wide data and weakly supervised data. Examples of weak supervision are semi-supervised learning, transductive learning, and restricted set classification. A curious possible application of both areas of interest is the identification of individual animals in a herd or group for the purposes of non-invasive monitoring. Experiments will be carried out to verify our hypotheses (subject to data availability).
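
As a concrete (if simplistic) instance of weak supervision, the self-training loop below grows a labelled set from a handful of labelled examples. It is a sketch with synthetic data, not the project's method; all sizes and thresholds are arbitrary assumptions.

    # Self-training on a mostly-unlabelled dataset: fit on the labelled pool,
    # pseudo-label the unlabelled points the model is most confident about,
    # and repeat. Synthetic data stands in for a real "wide" dataset.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 50))
    y_true = (X[:, 0] + X[:, 1] > 0).astype(int)

    labelled = rng.choice(300, size=15, replace=False)    # only 15 labels known
    mask = np.zeros(300, dtype=bool); mask[labelled] = True
    y_work = np.full(300, -1); y_work[labelled] = y_true[labelled]

    for _ in range(10):
        clf = LogisticRegression(max_iter=1000).fit(X[mask], y_work[mask])
        proba = clf.predict_proba(X[~mask])
        confident = proba.max(axis=1) > 0.95
        if not confident.any():
            break
        idx = np.where(~mask)[0][confident]
        y_work[idx] = clf.predict(X[idx])                 # adopt pseudo-labels
        mask[idx] = True

    print("training set grew to", int(mask.sum()), "examples;",
          "pseudo-label accuracy:", float((y_work[mask] == y_true[mask]).mean()))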


Project title: Artificial Intelligence for Immersive Analytics

1st supervisor: Dr Panos Ritsos
2nd supervisor: Prof Jonathan Roberts
Department/Institution: School of Computer Science and Electronic Engineering, Bangor University
Research theme: T3 - novel mathematical, physical and computer science approaches

Project description: We are gradually being immersed in a technology-mediated world, within which the omnipresence of data places increased demands on cognition, reasoning and sensemaking mechanisms. Such mechanisms become increasingly dependent on the synergy of artificial intelligence, human-data interface evolution and human cognition. This doctoral research builds upon this notion and explores the application of Artificial Intelligence (AI) in Immersive Analytics (IA), within Virtual Reality and Mixed Reality (collectively denoted Extended Reality, XR) environments. This project is concerned with the design, development, application and evaluation of AI-driven mechanisms and models for context-aware, context-adaptive and predictive interfaces for IA. It seeks to create novel analytical interfaces that: a) reason about, b) learn from and c) adapt to a user's exploration habits, requirements and sensemaking activities.


Project title: A grammatical approach to neuroevolution

1st supervisor: Dr Bill Teahan
2nd supervisor: Dr Franck Vidal
Department/Institution: School of Computer Science and Electronic Engineering, Bangor University
Research theme: T3 - novel mathematical, physical and computer science approaches

Project description: Neuroevolution is a form of AI that uses evolutionary algorithms to generate some form of artificial neural network. Grammatical evolution is a novel approach to genetic programming that uses a context-free grammar to constrain the search space, and has the ability to evolve executable programs in any programming language. The goal of this project is to develop novel solutions to neuroevolution by building on research into grammatical evolution conducted at Bangor University. In particular, the project will aim to overcome some of the limitations of both artificial neural networks and evolutionary programming (specifically grammatical evolution). For artificial neural networks, two limitations are the fixed nature of the network (e.g. a fixed number of layers or inputs/outputs) and its "black-box" nature, which makes interpretation of the hidden layers very difficult. For grammatical evolution, major limitations include the need for a fitness function, the need to define many parameters (e.g. mutation and crossover probabilities), the repeated execution of the same components in different solutions, and the lack of learning about which components help to produce good solutions. To overcome these limitations, this project will explore how to evolve a neural network that performs effectively on standard classification tasks. The evolved network will take the form of a subsumption architecture, and the work will therefore also draw on work from the field of evolutionary robotics involving the evolution of this type of architecture.
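
The core of grammatical evolution is its genotype-to-phenotype mapping: a string of integers ("codons") repeatedly selects production rules from a context-free grammar. A toy mapping, illustrative only (the grammar and genome are made up):

    # Grammatical evolution's genotype->phenotype mapping: each codon picks a
    # production for the left-most non-terminal, modulo the number of choices.
    GRAMMAR = {
        "<expr>": ["<expr> <op> <expr>", "x", "1.0"],
        "<op>":   ["+", "-", "*"],
    }

    def derive(genome, start="<expr>", max_wraps=2):
        out, codons = [start], list(genome) * max_wraps   # allow genome "wrapping"
        i = 0
        while i < len(codons):
            nts = [j for j, tok in enumerate(out) if tok in GRAMMAR]
            if not nts:                                   # no non-terminals left
                break
            j = nts[0]                                    # left-most non-terminal
            rules = GRAMMAR[out[j]]
            out[j:j + 1] = rules[codons[i] % len(rules)].split()
            i += 1
        return " ".join(out)

    print(derive([0, 1, 2, 2, 0, 1, 1]))   # -> "x * 1.0"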


Project title: Smart storytelling for scientific data visualisation

1st supervisor: Prof Jonathan Roberts
2nd supervisor: Dr Panos Ritsos
Department/Institution: School of Computer Science and Electronic Engineering, Bangor University
Research theme: T3 - novel mathematical, physical and computer science approaches

Project description: Displaying scientific data is often difficult. Showing the most important results, highlighting the most significant correlations, and displaying them in an effective way can be challenging for any researcher. When teaching best practice, educators often use "good examples", so there is a large body of good examples that a computer could learn from, and many established "good" design principles that could guide the user in effective storytelling. Using AI and machine learning, this research will focus on the automatic design and layout of scientific visualisation results; learning from current exemplars, it will develop algorithms, metrics and methods to help researchers tell effective stories with their data.


University of Bristol

Project title: Multi-channel waveform reconstruction for dark matter searches with LUX-ZEPLIN

1st supervisor: Dr Henning Flaecher
Department/Institution: Particle Physics, University of Bristol
2nd supervisor: Prof Stephen Fairhurst (Cardiff University)
Research theme: T1 - data from large science facilities

Project description: LUX-ZEPLIN (LZ) is a next generation direct dark matter detection experiment that will start collecting data in the first half of 2020. It will search for evidence of dark matter particles scattering off xenon nuclei, which results in ionization and scintillation signals that can be recorded with photomultiplier tubes (PMTs). Each PMT produces waveform data: a sampled, time-series representation of the response of an individual detector channel. Typical particle physics analysis pipelines proceed by first summarizing each individual waveform (e.g., as the area under a pulse, time above a threshold, etc.), thereby significantly reducing the quantity of information. Subsequent stages of the analysis then operate on these reduced quantities whilst combining multiple channels together in order to reconstruct the underlying event that occurred within the entire detector. The goal of this project is to develop a more generic approach that connects the final physics analysis directly to the original waveform data, using modern high-performance computing and deep learning techniques, including the combination of convolutional neural networks (CNNs) and graph convolutional neural networks (GNNs), to produce a new method for reconstructing underlying event parameters from the raw waveform inputs. While the project is to be carried out within the context of dark matter searches at LZ, the general method should also be of interest and applicable at other particle physics experiments (e.g., DUNE or Mu3e) and possibly beyond, for any network of connected devices.
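
One plausible starting architecture (an assumption for illustration, not the project's design) is a shared 1D convolutional encoder applied per PMT waveform, with a permutation-invariant pool over channels; graph convolutions over the PMT geometry would be the natural next step. A PyTorch sketch with synthetic shapes and hypothetical layer sizes:

    # Sketch: a shared 1D CNN encodes each PMT waveform; a mean-pool over
    # channels feeds a regression head for event parameters (e.g. energy,
    # position). All shapes and sizes are illustrative assumptions.
    import torch
    import torch.nn as nn

    class WaveformNet(nn.Module):
        def __init__(self, n_outputs=4):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv1d(1, 16, kernel_size=7, padding=3), nn.ReLU(),
                nn.MaxPool1d(4),
                nn.Conv1d(16, 32, kernel_size=7, padding=3), nn.ReLU(),
                nn.AdaptiveAvgPool1d(1),
            )
            self.head = nn.Linear(32, n_outputs)

        def forward(self, x):                  # x: (batch, n_pmts, n_samples)
            b, c, t = x.shape
            z = self.encoder(x.reshape(b * c, 1, t)).squeeze(-1)   # (b*c, 32)
            z = z.reshape(b, c, -1).mean(dim=1)                    # pool over PMTs
            return self.head(z)

    events = torch.randn(8, 100, 256)          # 8 toy events, 100 PMTs, 256 samples
    print(WaveformNet()(events).shape)         # -> torch.Size([8, 4])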


Project title: FPGA Implementation of Machine Learning for Big Science

1st supervisor: Dr Jim Brooke
2nd supervisor: TBD
Department/Institution: Particle Physics, University of Bristol
Research theme: T1 - data from large science facilities

Project description: This project will study the implementation of machine learning algorithms in programmable logic technology, for potential applications in experimental particle physics and astrophysics. Experiments in these science areas produce increasingly large volumes of data, with increasingly sophisticated online data acquisition systems that are often required to perform fast, low-latency, online processing to properly handle and store the data. Machine learning algorithms have thus far been generally restricted to extracting information in offline data analysis. However, the implementation of machine learning algorithms in Field Programmable Gate Array (FPGA) technology may bring high-performance image recognition and classification to online data acquisition systems, thereby extending the reach of the next generation of particle and astrophysics experiments. The implementation of ML algorithms in FPGA logic will be studied, together with near-term applications in the CMS experiment at the LHC and the Deep Underground Neutrino Experiment, as well as next-generation colliders and telescopes.
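
One recurring practical issue is that FPGA datapaths use fixed-point arithmetic. As a back-of-the-envelope illustration (not tied to any particular toolflow), the sketch below quantises trained weights to 8 bits and measures the error this would introduce:

    # Fixed-point quantisation sketch: map float32 weights to signed 8-bit
    # integers and measure the error an FPGA datapath would inherit.
    import numpy as np

    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.3, size=1000).astype(np.float32)   # stand-in weights

    bits = 8
    scale = np.max(np.abs(w)) / (2 ** (bits - 1) - 1)
    w_q = np.clip(np.round(w / scale), -(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    w_hat = w_q * scale                        # dequantised values

    print("max abs error:", float(np.max(np.abs(w - w_hat))))
    print("rms error:", float(np.sqrt(np.mean((w - w_hat) ** 2))))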


Project title: Machine Learning to Find New Physics in Muon Decays

1st supervisor: Prof Joel Goldstein
2nd supervisor: TBD
Department/Institution: Particle Physics, University of Bristol
Research theme: T1 - data from large science facilities

Project description: The Mu3e experiment at PSI will look for extremely rare muon decays; in particular it is designed to try to identify the lepton flavour-violating decay of a muon to three electrons at the level of one event in 10^16. The experiment will use the latest advances in detector technology to identify electrons with high spatial and temporal resolution, and advanced pattern recognition algorithms will be implemented electronically to filter the data in real time. In this project the student will apply the latest developments in machine learning to Mu3e event reconstruction and filtering, developing new techniques that could be faster, more flexible and/or more effective than conventional algorithms. This could lead not only to the optimisation of the physics reach for the three-electron channel, but also the capability to perform real-time detailed analysis to look for different signatures. The student will start by developing and optimising algorithms in simulation, and then will have the opportunity to commission and test them in early data from the running experiment.


Project title: New Physics searches in B and D meson decays with Machine Learning

1st supervisor: Dr Kostas Petridis
2nd supervisor: TBD
Department/Institution: Particle Physics, University of Bristol
Research theme: T1 - data from large science facilities

Project description: This project aims to discover physics beyond the Standard Model (SM) by using advanced machine learning techniques to study a vast number of B- and D-hadrons (bound states of beauty or charm quarks, respectively) with unprecedented precision at current and future CERN facilities. The proposed research has two main branches:

  • Development of GPU-based 4-body amplitude fits of decays of B- and D-hadrons using TensorFlow (a toy likelihood fit is sketched below)
  • Development of fast simulation of both the collisions and the response of particle physics detectors using generative networks
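
To illustrate the first branch only: a minimal unbinned maximum-likelihood fit in TensorFlow, fitting one parameter of a toy exponential model via automatic differentiation. A real amplitude fit would replace the pdf with a coherent sum of interfering 4-body amplitudes evaluated on the GPU.

    # Toy unbinned maximum-likelihood fit in TensorFlow: recover the decay
    # constant of an exponential by gradient descent on the negative
    # log-likelihood, NLL(tau) = sum_i [ log(tau) + x_i / tau ].
    import numpy as np
    import tensorflow as tf

    data = np.random.default_rng(0).exponential(scale=2.0, size=10_000)
    data = tf.constant(data, dtype=tf.float64)

    tau = tf.Variable(1.0, dtype=tf.float64)   # decay-constant estimate

    for step in range(500):
        with tf.GradientTape() as tape:
            nll = tf.reduce_sum(tf.math.log(tau) + data / tau)
        grad = tape.gradient(nll, tau)
        tau.assign_sub(1e-5 * grad)            # plain gradient step

    print("fitted tau:", float(tau))           # ~2.0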


Project title: AI for SKA Data Challenges

1st supervisor: Prof Mark Birkinshaw
2nd supervisor: TBD
Department/Institution: Astronomy, University of Bristol
Research theme: T1 - data from large science facilities

Project description: The Square Kilometre Array (SKA) will be constructed in Australia and South Africa in two phases over the next couple of decades. This large international project will provide two (and later three) interferometers operated as a single observatory. It will be the predominant radio observatory, and promises both qualitative and quantitative changes in the science that can be obtained. The SKA will require new modes of observation and will generate enormous data volumes. In recognition of this, the pre-construction and early construction phases will be marked by a series of data challenges (DCs), where model datasets, and later preliminary real datasets, will be released for analysis by the community. SKA DC1 was recently launched. It provides sample model sky survey images at three frequencies with exposures of 8, 100, and 1000 hours. Under DC1, analysts are to locate and characterise sources, and can then attempt to use the source population to study cosmology and the evolution of structure. The high source density in SKA images means that extended sources often overlap, and there is high source crowding ("confusion"). Normal data analysis techniques are futile and AI methods are needed (a naive baseline source finder is sketched after the list below). AI techniques such as topological data analysis will become even more critical as further data dimensions are added: fully-sampled spectral data (the third dimension), polarisation data (fourth and fifth dimensions), and variability (the sixth dimension). This studentship will investigate AI techniques for extracting source information from the increasingly complicated SKA datasets over the preparatory and early construction phases of the project. This will involve:

  • recognising unresolved source populations, and finding incompleteness and other corrections to source counts
  • filtering extended sources, and identifying the central components which will be associated with the optical counterparts
  • investigating the effects of spectral and Faraday complexity in data analysis at higher dimensionalities
  • studying how imaging imperfections limit the ability to extract time-series data, with a view to applying these techniques to early-release SKA data, especially in relation to the Magnetism Key Science Projects.
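
The naive baseline promised above can be written in a few lines; its failure mode on confused fields is precisely what motivates the AI techniques. A sketch with synthetic data:

    # Naive source extraction: threshold at 5 sigma and label connected blobs.
    # Overlapping (confused) sources merge into single labels -- the failure
    # mode that motivates the AI methods discussed above.
    import numpy as np
    from scipy import ndimage

    rng = np.random.default_rng(0)
    image = rng.normal(0.0, 1.0, (512, 512))           # noise-only background
    for _ in range(50):                                # inject 50 bright sources
        y, x = rng.integers(0, 512, 2)
        image[max(0, y - 2):y + 3, max(0, x - 2):x + 3] += 8.0

    labels, n_found = ndimage.label(image > 5.0)
    sizes = ndimage.sum(image > 5.0, labels, range(1, n_found + 1))
    print("sources found:", n_found, "(injected 50; blends reduce the count)")
    print("largest blob, pixels:", int(sizes.max()))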


Project title: Real time radiotherapy verification

1st supervisor: Dr Jaap Velthuis
Department/Institution: Particle Physics/Detectors, University of Bristol
2nd supervisor: Dr Richard Hugtenburg (Swansea University)
Research theme: T2 - biological, health and clinical sciences

Project description: We are currently developing a device that will be operated upstream of the patient during radiotherapy treatment, verifying in real time both the beam shape, which changes continuously, and the dose map. This requires fast online analysis and a large number of Monte Carlo simulations to verify the treatment. The MC generation is fairly "inefficient", as the photon cross section is very low, so we are looking at alternative ways to do this. In addition, we expect that the device will be installed in many radiotherapy centres. Clever data mining will allow the systems to use anomaly detection for preventive maintenance; more interestingly, by combining the data from several centres we can disentangle misbehaving sensor systems from misbehaving X-ray linacs. The key challenge is to get the individual systems to learn to signal these faults while sharing as little data as possible, due to, e.g., privacy reasons. This is very important, as wrongly delivered radiotherapy treatments are extremely dangerous. As such, this project combines three different but closely linked big data and machine learning challenges.
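
The minimal-data-sharing constraint points towards federated approaches. The sketch below is an assumption about one possible direction, not the project's design: each centre shares only summary statistics, from which a global anomaly threshold is derived.

    # Federated-style anomaly detection sketch: each centre computes local
    # summary statistics of a sensor reading; only the summaries (never the
    # raw, potentially sensitive data) are pooled into a global threshold.
    import numpy as np

    rng = np.random.default_rng(0)
    centres = [rng.normal(10.0, 1.0, size=int(rng.integers(200, 500)))
               for _ in range(6)]                      # local monitor readings

    # Each centre shares only (count, sum, sum of squares).
    stats = [(len(c), c.sum(), (c ** 2).sum()) for c in centres]
    n = sum(s[0] for s in stats)
    mean = sum(s[1] for s in stats) / n
    var = sum(s[2] for s in stats) / n - mean ** 2
    threshold = mean + 4 * np.sqrt(var)                # global 4-sigma alarm level

    for i, c in enumerate(centres):
        print(f"centre {i}: {int(np.sum(c > threshold))} anomalous readings")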


Cardiff University

Project title: Empowering time-domain astronomy with Artificial Intelligence

1st supervisor: Dr Cosimo Inserra
2nd supervisor: TBD
Department/Institution: School of Physics and Astronomy, Cardiff University
Research theme: T1 - data from large science facilities

Project description: Supernovae are catastrophic stellar explosions shaping the visible Universe and affecting many diverse areas of astrophysics. Supernovae arising from massive stars, referred to as core-collapse supernovae, play a major role in many intriguing astronomical problems since they produce neutron stars, black holes, and gamma-ray bursts. We are now living in the golden era of transient astronomy, with roughly 11,000 transients discovered per year. The advent of the Large Synoptic Survey Telescope will boost the number of yearly discoveries by a factor of 100. The task-specific algorithms employed until now for transient classification have limitations in taming the zoo of transients. The main project goal is to develop an Artificial Intelligence tool (a deep learning algorithm) that can process time-series data (e.g. luminosity evolution) and non-time-series data (e.g. environment information), and that can identify core-collapse supernovae within 15 days of explosion, which is when we can retrieve crucial information about the progenitor nature. A secondary goal is to build the AI tool in a way that is scalable enough to be applied to the environments of compact star mergers producing gravitational waves. This application can predict the merger type (what objects are merging and their masses) and allow for rate and population studies at far distances.
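
A mixed-input architecture of the kind described (time-series plus contextual features) can be prototyped in a few lines of Keras. All dimensions and inputs below are placeholders, not the project's model:

    # Sketch of a two-branch classifier: a recurrent branch reads the light
    # curve, a dense branch reads environment features, and the merged state
    # predicts the transient class. All dimensions are illustrative.
    import tensorflow as tf
    from tensorflow.keras import layers

    lc_in = tf.keras.Input(shape=(50, 2), name="light_curve")   # (epochs, bands)
    env_in = tf.keras.Input(shape=(8,), name="environment")     # host features

    x = layers.GRU(32)(lc_in)                       # summarise the time series
    y = layers.Dense(16, activation="relu")(env_in) # encode context features
    z = layers.concatenate([x, y])
    out = layers.Dense(5, activation="softmax", name="class")(z)

    model = tf.keras.Model([lc_in, env_in], out)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    model.summary()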


Project title: Investigating the Epoch of Galaxy Formation Using Artificial Intelligence

1st supervisor: Prof Steve Eales
2nd supervisor: TBD
Department/Institution: School of Physics and Astronomy, Cardiff University
Research theme: T1 - data from large science facilities

Project description: We recently completed the largest far-infrared survey of the extragalactic sky, the Herschel ATLAS, which detected almost 500,000 sources, ranging from nearby galaxies to dust-enshrouded galaxies at redshifts > 4 seen during their initial galaxy-building burst of star formation. NASA and ESA currently have no plans for a future far-infrared space telescope, and so our survey is likely to remain the main source of information about the far-infrared and submm sky for several decades. The poor angular resolution of the Herschel Space Observatory meant that we faced a major challenge in identifying the optical counterparts to the far-infrared sources. We used a simple Bayesian technique that took account of the distance of the possible counterpart from the far-infrared source and the optical magnitude of the counterpart (a fainter counterpart is more likely to be close to the far-infrared source by chance). The H-ATLAS team (160 members in 16 countries, led from Cardiff) released all their catalogues last year, but there is still a huge amount to be done. First, lack of time meant that we never looked for counterparts at all in our largest field (~200,000 sources). Second, there are several new, deeper sets of images available on which to look for counterparts. Third, the rapid development of machine-learning techniques means that we should be able to develop a method that uses all the properties of the potential counterpart (its flux densities in all the available photometric bands, not just the flux density in a single band) to estimate the probability that it is associated with the far-infrared source. The student will initially produce a set of training data for the identification analysis using the much deeper and smaller (in area) COSMOS field, where we can use deep radio data to identify all the counterparts. The student will then use a neural network, trained on the COSMOS data, to find the most probable counterparts to all the far-infrared sources. The student will write this up as a paper and release the catalogues of counterparts to the worldwide astronomical community. If time permits, we will proceed to deep learning techniques.
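
Schematically, the Bayesian technique described above is a likelihood-ratio method: each candidate is scored by how probable its offset and magnitude are for a true counterpart versus a chance alignment. A toy numpy version for one far-infrared source, with made-up numbers and toy priors throughout:

    # Likelihood-ratio counterpart matching, schematically: weigh how far each
    # optical candidate lies from the FIR position against how common objects
    # of its magnitude are (fainter objects match by chance more often).
    import numpy as np

    sigma = 2.4                             # FIR positional uncertainty, arcsec
    r = np.array([1.1, 3.0, 4.2])           # candidate offsets, arcsec
    m = np.array([19.5, 17.0, 22.0])        # candidate magnitudes

    def q(mag):   # toy magnitude distribution of true counterparts
        return np.exp(-0.5 * ((mag - 19.0) / 2.0) ** 2)

    def n(mag):   # toy surface density of background objects, per sq. arcsec
        return 10 ** (0.3 * (mag - 15.0)) / 1e4

    f = np.exp(-r ** 2 / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2)
    LR = q(m) * f / n(m)
    print("likelihood ratios:", LR)         # highest LR = most probable match
    print("best candidate:", int(np.argmax(LR)))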


Project title: Developing an automatic supernova and star-forming region detector

1st supervisor: Mikako Matsuura
2nd supervisor: TBD
Department/Institution: School of Physics and Astronomy, Cardiff University
Research theme: T1 - data from large science facilities

Project description: AI is a very powerful tool for investigating and processing large quantities of astronomical data. Using image recognition software supported by AI, we will develop software for the automatic identification of supernovae and star-forming regions. The project uses the existing catalogue of the Herschel Space Observatory's Galactic plane survey as a starting point to find further supernovae and star-forming regions. We anticipate finding differences in dust properties between these two types of region, and hence understanding the evolution of dust in the interstellar medium. It is also expected that the project can capture events of supernovae triggering star formation.


Project title: Deep Learning for Real-Time Gravitational-Wave Detection

1st supervisor: Prof Patrick Sutton
2nd supervisor: TBD
Department/Institution: School of Physics and Astronomy, Cardiff University
Research theme: T1 - data from large science facilities

Project description: Joint observations of the binary neutron star merger GW170817 by LIGO/Virgo and electromagnetic telescopes produced a wealth of information not accessible to gravitational waves alone, such as evidence for the origin of heavy elements and a direct measurement of the cosmic expansion. Future joint observations of systems such as supernovae, long gamma-ray bursts, or as-yet unknown phenomena could produce equally important insights. Such observations rely on rapid (minute-latency) analysis of the gravitational-wave data to identify signals. Deep neural networks are a promising technology; preliminary studies indicate they are capable of robustly detecting a wide variety of signal morphologies with very low processing time. The goal of this project is to construct, characterise, and deploy a deep neural network signal detection algorithm for the LIGO-Virgo observatory network. This work will involve gravitational physics, astrophysics, the statistical theory of signal detection, advanced data mining techniques, and large-scale computing.


Project title: ST-AI: Application of AI approaches to improve patient outcomes for sexually transmitted infections

1st supervisor: Dr Thomas Connor
2nd supervisor: TBD
Department/Institution: School of Biosciences, Cardiff University
3rd supervisor: Dr Zoe Couzens (Public Health Wales, Health Protection)
Research theme: T2 - biological, health and clinical sciences

Project description: Neisseria gonorrhoeae (NG) poses a major public health challenge. It is the second most frequently diagnosed sexually transmitted infection in Europe, and isolates are increasingly resistant to key treatments - ceftriaxone and azithromycin. Increasing resistance is driven by a multitude of factors. The nature of sexual behaviour and of the way that patients present for care for sexually transmitted infections, combined with variable provision of care for STIs across the UK, complicates the delivery of targeted treatments and may affect the increase in resistance that is being observed. This study seeks to utilise AI approaches to improve our ability to type, track and treat NG infections in Wales. It builds upon work already being undertaken within Public Health Wales, and seeks to extend what is currently possible. Broadly, it has three interrelated elements, the outcomes from which will provide a route to improved patient care:

  • The interrogation of population-level health data to identify complex risk factors that relate to NG disease
  • The linking of genomic sequence data to population-level health data to inform the development of molecular tests and to gain increased resolution on NG risk factors
  • The development of systems to perform risk assessment of a patient in real time, using information collected from patients via an online client, to either reassure a patient or trigger the sending out of a self-test kit and a visit to an STI clinic.
Depending on background, the successful candidate will begin by examining the population-level patterns and risk factors of NG disease using SAIL, and then move on either to utilising genomic sequence data or to undertaking the research that will underpin the potential patient-facing system.


Project title: Harnessing Spiking Neural Networks for Enhanced Situational Awareness

1st supervisor: Prof Alun Preece
2nd supervisor: Prof Roger Whitaker
Department/Institution: Cardiff University Crime and Security Research Institute
Research theme: T3 - novel mathematical, physical and computer science approaches

Project description: Spiking Neural Networks (SNNs) are a kind of artificial neural network (ANN) intended to model natural neural networks in human brains more closely. Information flows through the network based on the gradual accumulation of 'electrical charge' over time, giving SNNs a way of modelling temporal processes as well as a kind of embedded memory. Importantly, SNNs can be implemented on hardware with far lower power consumption than 'traditional' kinds of ANN, and can be trained on much smaller datasets. The aim of this PhD is to explore how SNN approaches may be used to model tasks involving machine understanding of situations and prediction of future states. Starting with relatively simple temporal problems, there is a lot of scope in the project to explore specific domains, including understanding the dynamics of human social networks and social media, performing activities autonomously, and engaging in human-machine collaboration. The project will work in close collaboration with Crime and Security Research Institute researchers, using large-scale data sources to investigate the potential of SNNs for situational awareness.
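
The basic unit of an SNN is the leaky integrate-and-fire (LIF) neuron; the sketch below simulates one neuron's membrane potential and spike train under random input current (all constants are arbitrary illustrative choices):

    # Leaky integrate-and-fire neuron: charge accumulates, leaks away over
    # time, and a spike is emitted (with a reset) when a threshold is crossed.
    import numpy as np

    dt, tau, v_th, v_reset = 1.0, 20.0, 1.0, 0.0      # time step, leak, threshold
    rng = np.random.default_rng(0)
    current = rng.uniform(0.0, 0.12, size=500)        # random input, 500 steps

    v, spikes = 0.0, []
    for t, i_in in enumerate(current):
        v += dt * (-v / tau + i_in)                   # leak + integration
        if v >= v_th:
            spikes.append(t)                          # emit a spike...
            v = v_reset                               # ...and reset the membrane
    print(f"{len(spikes)} spikes; first few at steps {spikes[:5]}")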


Project title: Exploiting network motifs to enhance prediction of contagion in complex networks

1st supervisor: Prof Roger Whitaker
2nd supervisor: Prof Alun Preece
Department/Institution: Cardiff University Crime and Security Research Institute
Research theme: T3 - novel mathematical, physical and computer science approaches

Project description: Network motifs are small induced sub-structures that are over-represented in a directed network compared to what would be expected under some baseline (e.g., a randomised network). Motifs are useful for characterising complex networks that may be too large or dynamic for other types of network analysis, and they have been established as a useful methodology for determining the underlying, often hidden, characteristics of a network. This project will consider using motifs as a basis for predicting the susceptibility of a network to different forms of contagion (e.g., both simple and complex contagion). The work will be undertaken in close collaboration with Crime and Security Research Institute researchers, using large-scale data sources to investigate the potential of motifs to offer advance warning of different forms of social contagion in a variety of networks and scenarios. These will centre on, but will not be restricted to, social media, and will consider the potential to address dynamic and (near-) real-time scenarios. The project will involve considering a range of prediction strategies, based on supervised (and potentially other types of) learning.
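
For directed networks the smallest non-trivial sub-structures are the 16 triad types, and their census is one line in networkx; comparing the counts against a degree-preserving randomised baseline is what elevates a count to a "motif". A sketch on a random toy graph:

    # Triad census of a directed network versus a rewired baseline: a triad
    # type over-represented relative to randomised copies is a candidate motif.
    import networkx as nx

    G = nx.gnp_random_graph(60, 0.08, directed=True, seed=1)
    observed = nx.triadic_census(G)

    # Degree-preserving baseline: randomise edges, keep in/out degree sequences.
    R = nx.directed_configuration_model(
        [d for _, d in G.in_degree()], [d for _, d in G.out_degree()], seed=1)
    R = nx.DiGraph(R)                          # collapse parallel edges
    R.remove_edges_from(nx.selfloop_edges(R))  # drop self-loops
    baseline = nx.triadic_census(R)

    for triad in ("030T", "120D", "300"):      # a few of the 16 triad types
        print(triad, "observed:", observed[triad], "baseline:", baseline[triad])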


Swansea University

Project title: Analysing lattice QCD data with machine learning

1st supervisor: Prof Gert Aarts
2nd supervisor: Prof Chris Allton
Department/Institution: Physics Department, Swansea University
Research theme: T1 - data from large science facilities

Project description: Simulations of the strong nuclear force (using lattice Quantum Chromodynamics, or lattice QCD) produce a large amount of data. While some physical observables can be extracted in a quite straightforward manner, e.g. the masses of stable hadrons in vacuum, this is not the case for more complicated observables, especially those relevant for QCD under the extreme conditions of nonzero temperature and/or density. Nevertheless, there are many outstanding questions here, which are linked to the physics of the early Universe and of heavy-ion collision experiments at CERN and elsewhere. Examples include transport coefficients in the quark-gluon plasma and in-medium modification of the hadronic spectrum. In this project, we will apply deep learning techniques to access those observables, by connecting numerically computed lattice QCD correlators with the spectral functions containing this information. By generating large sets of mock data, deep learning algorithms will be assessed and subsequently applied to actual lattice QCD data. The goal is to obtain reliable spectral information in regions where other methods have so far given limited information.
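
The forward problem is easy to state and is what supplies the mock training data: a Euclidean correlator is the spectral function folded with a known kernel, G(tau) = integral of K(tau, omega) rho(omega) d omega. A numpy sketch of mock-data generation, with a toy spectral function and the standard finite-temperature bosonic kernel:

    # Generate one mock lattice correlator: G(tau) = int K(tau,omega) rho(omega).
    # Networks are trained to invert this map, i.e. recover rho from noisy G.
    import numpy as np

    n_tau, beta = 32, 1.0
    tau = (np.arange(n_tau) + 0.5) * beta / n_tau
    omega = np.linspace(1e-3, 20.0, 500)

    # Toy spectral function: a single Breit-Wigner-like peak at omega = 5.
    rho = 1.0 / ((omega - 5.0) ** 2 + 0.5 ** 2)

    # Finite-T kernel: K(tau,omega) = cosh(omega (tau - beta/2)) / sinh(omega beta/2)
    K = np.cosh(np.outer(tau - beta / 2, omega)) / np.sinh(omega * beta / 2)
    G = K @ rho * (omega[1] - omega[0])              # discretised integral

    G_noisy = G * (1 + 1e-3 * np.random.default_rng(0).normal(size=n_tau))
    print(G_noisy[:4])    # one training example; repeat over many rho's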


Project title: Machine learning with anti-hydrogen

1st supervisor: Prof Niels Madsen
2nd supervisor: TBD
Department/Institution: Physics Department, Swansea University and CERN
Research theme: T1 - data from large science facilities

Project description: The ALPHA antihydrogen experiment makes use of several particle detector technologies, including a Silicon Vertex Detector, a Time Projection Chamber, and a barrel of scintillating bars. One of the key challenges for these detector systems is to distinguish between antihydrogen annihilations and cosmic rays, a classification problem at which machine learning excels. Presently this task is performed using cuts on two high-level variables from the detectors in the online analysis, and boosted decision trees built on high-level variables in the offline analysis. This project would take a student into the future of machine learning. High-level variables are a powerful tool for discrimination; however, they are slow to pre-process. The challenge of this PhD project would be to build both online and offline analyses that have different processing budgets. Initially the plan is to investigate the application of modern machine learning techniques, such as deep learning, to attempt to beat the current cutting-edge decision tree analysis used by the collaboration. Subsequently the project will expand to look at replacing the high-level variables with lower-level variables to reduce pre-processing time. Ultimately, a model small enough to interpret raw detector output could enable a real-time online analysis, with the final goal of programming an FPGA or microcontroller to perform accurate, real-time classification of detector events. The combination of these strands would build a robust and comprehensive thesis investigating machine learning applied to particle detectors. It will clearly illustrate that good data preparation is the key to accurate classification models, and demonstrate the speed that can be achieved using simple models to handle low-level data. Demonstration of microcontroller- and FPGA-level classification would have a large impact on the particle detector community, contributing to detector trigger systems and live diagnostics beyond the scope of the ALPHA experiment.


Project title: Deep Learning and natural language processing for early prediction of cancer from electronic health records

1st supervisor: Dr Shang-Ming Zhou
2nd supervisor(s): Prof Ronan Lyons, Dr Martin Rolles
Department/Institution: Medical School, Swansea University
Research theme: T2 - biological, health and clinical sciences

Project description: This project concerns the development of deep learning and natural language processing techniques for early prediction of cancer occurrence and progression from routine electronic health records. Early detection of cancer can vastly improve outcomes for individuals and facilitates subsequent clinical management. Due to the lack of tools to accurately predict which individuals will actually progress to malignancy, current healthcare systems rely heavily on frequent and invasive surveillance of the entire at-risk population via screening, which leads to significant financial, physical and emotional burdens. In this project, the doctoral researcher will develop AI and machine learning methods for early prediction of cancer occurrence and progression by identifying useful clinical signals and possible warning signs of cancer from linked primary care and secondary care electronic health records, using a large anonymised electronic cohort in Wales (population ~3.1M). The findings of this study will increase awareness of possible warning signs of cancer among health professionals, policy-makers and the general public, with potential for great impact on outcomes. Because of the intensive computing involved (particularly data-driven model training) and the large number of records and extremely high dimensionality of the datasets, the doctoral researcher will become highly skilled in HPC/HPDA, using the infrastructure offered by the CDT and HDR UK.


Project title: Advanced machine learning for unravelling unknown patterns of polypharmacy and multi-morbidity from social media and electronic health records to enhance care of patients with chronic conditions

1st supervisor: Dr Shang-Ming Zhou
2nd supervisors: Prof Andrew Morris, Prof Sinead Brophy
Department/Institution: Medical School, Swansea University
Research theme: T2 - biological, health and clinical sciences

Project description: As the population ages, caring for patients with multimorbidity, and the extent to which their needs are met, is among the most important tasks facing healthcare services across the world in the 21st century. This project is intended to contribute to the solution of two of the greatest challenges currently confronting healthcare: the linked problems of multimorbidity and polypharmacy. The project will develop and use advanced machine learning and AI techniques to discover previously unknown patterns of polypharmacy and multimorbidity from electronic health records and social media, to predict patient cohorts at risk, to detect adverse drug events caused by combinations of drugs, and to identify patterns of prescriptions for intervention to facilitate drug verification. In doing so, the project will help gain in-depth knowledge of pharmacovigilance for patient safety, and more insight into what constitutes "best care" for patients with multimorbidity.
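
At its simplest, a polypharmacy pattern is a drug combination prescribed together far more often than chance. A toy co-occurrence count on synthetic prescriptions (hypothetical drug names; real work would add temporal ordering, dosage, and statistical control):

    # Toy polypharmacy pattern mining: count how often drug pairs co-occur
    # across patients' prescription lists. Data here is entirely synthetic.
    from collections import Counter
    from itertools import combinations
    import random

    drugs = ["statin", "aspirin", "metformin", "ppi", "ssri", "nsaid"]
    random.seed(0)
    patients = [set(random.sample(drugs, k=random.randint(1, 4)))
                for _ in range(1000)]

    pair_counts = Counter()
    for scripts in patients:
        pair_counts.update(combinations(sorted(scripts), 2))

    for pair, count in pair_counts.most_common(3):
        print(pair, count)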


Project title: Computationally-bounded agents in game theory

1st supervisor: Dr Arno Pauly
Department/Institution: Department of Computer Science, Swansea University
2nd supervisor: Dr Jeffrey Giansiracusa
Department/Institution: Department of Mathematics, Swansea University
Research theme: T3 - novel mathematical, physical and computer science approaches

Project description: Game theory is concerned with the strategic interactions of agents, typically assumed to be rational. It underlies a significant part of economics, but is also central to AI in the context of heterogeneous or multi-agent systems. In its traditional incarnation, game theory puts no a priori limits on the information-processing abilities of the agents. This is problematic, because it leads to results predicting behaviour that is computationally intractable, or even non-computable, to determine, which obviously limits any applicability to real-world agents. This project starts from a different position: if agents are computationally bounded (but otherwise "as rational as possible"), what types of interaction would emerge? Potential setups for this could be rooted in theoretical computer science, limiting the agents to executing algorithms in certain complexity classes; in functional analysis, restricting agents to determine their actions by e.g. Lipschitz functions, and using higher-order fixed-point theorems to obtain equilibria; or in machine learning, where an agent selects the parameters of its machine learning model in a rational fashion, but is subsequently bound to the chosen learning model. Which approaches to focus on would primarily be determined by the interests and qualifications of the student. A key application area could be the attempt to model stock market interactions better than orthodox approaches do.
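
One way to bound agents computationally is to replace equilibrium computation with a cheap iterative procedure such as best-response dynamics, where each agent only performs an argmax per round. A sketch on a two-player bimatrix game, illustrative only:

    # Best-response dynamics in a 2-player bimatrix game: each agent computes
    # a cheap best response each round instead of solving for an equilibrium.
    # Convergence is not guaranteed in general -- which is rather the point.
    import numpy as np

    A = np.array([[3, 0], [5, 1]])      # row player's payoffs (Prisoner's Dilemma)
    B = A.T                             # symmetric game: column player mirrors

    i, j = 0, 0                         # initial pure strategies
    for step in range(20):
        i_new = int(np.argmax(A[:, j]))     # row player's best response to j
        j_new = int(np.argmax(B[i_new]))    # column player's best response
        if (i_new, j_new) == (i, j):
            print(f"converged to ({i}, {j}) after {step} rounds")
            break
        i, j = i_new, j_new
    else:
        print("no convergence within 20 rounds")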


Project title: Mathematical and Computational Modelling of Brain Evolution, Development and Disease

1st supervisor: Dr Noemi Picco
2nd supervisor: Dr Gibin Powathil
Department/Institution: Department of Mathematics, Swansea University
Potential supervisor/collaborators: Dr Fernando García-Moreno (Achucarro Basque Center for Neuroscience)
Research theme: T3 - novel mathematical, physical and computer science approaches

Project description: The brain is the seat of our highest cognitive functions. Critically, the precise composition and positioning of neurons are determined during development and are key to the emergence of these cognitive functions. Variations in the developmental programme can lead to speciation as well as to disorders and malformations such as schizophrenia, epilepsy, and microcephaly. The recent Zika virus epidemic exposed our lack of basic understanding of the fundamental mechanisms of neural development. The developmental programme leading to the formation of the brain is the result of a complex regulation of cellular processes in space and time. To date, brain development has been studied through analysis of sparse temporal data that may miss crucial information. The project aims to develop novel mathematical and computational approaches that account for both the spatial and temporal aspects of this process, leading to the vast array of brain architectures, shapes and sizes that we see in different animal species. The project will explore the hypothesis that this variety emerged from a trade-off between proliferative and spatial constraints and the preferential expansion of certain proliferative zones of the developing brain. Drawing on techniques from machine learning and optimisation, the project aims to map all the possible evolutionary pathways of the brain, to highlight the evolutionary principles and fundamental mechanisms of normal brain development shared across species, and to provide insight into disease and malformations.
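
A minimal, spatially-unresolved caricature of such a developmental model is a pair of ODEs for progenitors and neurons; the sketch below (toy rates, solved with scipy) shows the proliferation/differentiation trade-off that the project would extend in space and across species:

    # Two-compartment toy model of neurogenesis: progenitors P divide at rate
    # lam and differentiate into neurons N at rate d. Varying lam against d
    # mimics the proliferation/differentiation trade-off discussed above.
    import numpy as np
    from scipy.integrate import solve_ivp

    lam, d = 0.5, 0.3                      # per-day rates (illustrative)

    def rhs(t, y):
        P, N = y
        return [(lam - d) * P, d * P]      # dP/dt, dN/dt

    sol = solve_ivp(rhs, t_span=(0, 20), y0=[100.0, 0.0])
    P_end, N_end = sol.y[:, -1]
    print(f"after 20 days: {P_end:.0f} progenitors, {N_end:.0f} neurons")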


Project title: Visualising Extremely Large Dynamic Networks through AI and HPC

1st supervisor: Dr Daniel Archambault
Department/Institution: Department of Computer Science, Swansea University
2nd supervisor: Prof Jonathan Roberts
Department/Institution: Department of Computer Science, Bangor University
Research theme: T3 - novel mathematical, physical and computer science approaches

Project description: Until recently, state-of-the-art dynamic graph drawing algorithms used the timeslice as the basis for drawing the network. New event-based approaches [1,2] have changed this and are designed to draw such networks directly in 2D+t. In either case, all algorithms use metrics computed on small, local areas of the network as a basis. Many of these localised structures are considered in parallel, propagating upwards to realise an overarching global behaviour and allowing domain scientists to visualise the network. However, when humans try to understand large graphs, we use a top-down approach, looking at the global, high-level features first, followed by individual details (Overview First, Zoom and Filter, Details on Demand). Artificial intelligence provides an opportunity to manage these two perspectives simultaneously, by considering top-down examples validated by a human and applying scalable bottom-up algorithms in the background for localised detail refinement. This PhD would consider the following goals:

  1. Produce novel visualisation approaches for extremely large dynamic graphs. These approaches would consider top-down cases suggested by supervised learning techniques while simultaneously using the more localised refinement frequently seen in the dynamic graph drawing literature.
  2. Dynamic graph drawing and visualisation are highly parallelisable. We will use HPC technology to further scale the visualisation and drawing process to larger data sets. (A simple warm-started timeslice layout is sketched below the references.)

[1] Paolo Simonetto, Daniel Archambault, and Stephen Kobourov. Event-Based Dynamic Graph Visualisation. IEEE Transactions on Visualisation and Computer Graphics, accepted and in press, 2018.
[2] Paolo Simonetto, Daniel Archambault and Stephen Kobourov. Drawing Dynamic Graphs Without Timeslices. Graph Drawing 2017, 394--409, 2018.
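
A common baseline for drawing a timesliced dynamic graph is to seed each frame's layout with the previous frame's positions, trading per-frame layout quality for temporal stability; event-based methods [1,2] avoid slicing altogether. A networkx sketch (synthetic slices, assumed shared node set):

    # Timeslice layout with positional continuity: each slice's spring layout
    # starts from the previous slice's coordinates, so nodes drift rather
    # than jump between frames.
    import networkx as nx

    slices = [nx.gnp_random_graph(40, 0.08, seed=s) for s in range(5)]

    pos = None
    for t, G in enumerate(slices):
        pos = nx.spring_layout(G, pos=pos, seed=0)   # warm-start from last frame
        print(f"slice {t}: laid out {G.number_of_nodes()} nodes")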


Project title: Multi-dimensional time series analysis with large scale hypothesis testing and geometric dimension reduction

1st supervisor: Dr Farzad Fathizadeh
Department/Institution: Department of Mathematics, Swansea University; guest scientist at the Max Planck Institute for Biological Cybernetics
2nd supervisor: Prof Biagio Lucini
Department/Institution: Department of Mathematics, Swansea University
Research theme: T3 - novel mathematical, physical and computer science approaches

Project description: Advances in the concurrent recording of behaviours and neural activities in the intact brain have led to an invaluable source of information for fathoming the properties of brain networks, and for determining the statistical properties of animal cognition and social behaviour. Multi-modal recordings of neural activities at different spatiotemporal scales are accompanied by considerable noise and require advanced and novel analytical and statistical techniques for signal detection in the corresponding time series. In previous work, a statistical method for the detection and sorting of neuronal signals in noisy time series has been devised, based on large-scale hypothesis testing and so-called geometric learning. Geometric learning is a method that associates a graph with a given data set; one can then read off the local geometry of the data from the heat kernel of the Laplacian of the graph (viewed as an approximation of the Laplacian of a curved geometry). In this project, this analysis technique will be generalised to multi-dimensional (correlated) time series. Part of the project will involve detecting decision boundaries for hypothesis rejection via simulation, and working out theoretical aspects of the observed boundaries.
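
The graph-side object at the heart of geometric learning is straightforward to compute: the heat kernel exp(-tL) of the graph Laplacian L. A small scipy sketch on a toy graph:

    # Heat kernel of a graph Laplacian: H(t) = expm(-t L). Entry H[i, j] says
    # how much "heat" diffuses from node j to node i in time t, exposing the
    # local geometry of the data graph.
    import numpy as np
    import networkx as nx
    from scipy.linalg import expm

    G = nx.path_graph(6)                         # toy data graph
    L = nx.laplacian_matrix(G).toarray().astype(float)

    for t in (0.1, 1.0, 10.0):
        H = expm(-t * L)
        print(f"t={t}: heat from node 0 ->", np.round(H[0], 3))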


Project title: Graph Matching for Big Data - an AI and Machine Learning approach

1st supervisor: Dr Gary KL Tam
Department/Institution: Department of Computer Science, Swansea University
2nd supervisor: Dr Yukun Lai, Prof Paul Rosin
Department/Institution: Department of Computer Science, Cardiff University
Research theme: T3 - novel mathematical, physical and computer science approaches

Project description: Graph matching is a fundamental problem in many research areas, such as computer vision, image and geometry processing, robotics, and medical imaging. It is also related to many disciplines (e.g. bioinformatics, psychology, dentistry, physics) and supports many downstream applications (intelligent image editing, image morphing, biometrics, evaluation of surgical outcomes, high-performance data analysis and knowledge discovery, studies of phylogenetic evolution, and even drug discovery). Graph matching, however, is computationally very hard, and traditional techniques use approximation algorithms to seek the best solution. These techniques often require specific domain knowledge and tailored constraints to drive the search for a solution. However, when the dataset is highly complex (e.g. exhibiting some form of hierarchical or temporal correlation) or there is little knowledge about the dataset, existing generic graph matching techniques struggle to perform. In this project, we will use AI and big data, together with a new mathematical formulation (based on spectral graph theory and cycle-consistency constraints), to explore a special class of deep graph matching algorithms for such challenging tasks. The interested candidate is expected to have a sound mathematical background and programming skills. Knowledge of general machine learning is desirable.
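
For context, a classical spectral baseline (in the spirit of Umeyama's method, not the project's algorithm) matches two graphs by comparing Laplacian eigenvector embeddings and solving an assignment problem:

    # Spectral graph matching baseline: embed both graphs via Laplacian
    # eigenvectors, then recover a node correspondence with the Hungarian
    # algorithm. Eigenvector sign ambiguity is handled crudely via abs().
    import numpy as np
    import networkx as nx
    from scipy.optimize import linear_sum_assignment

    G1 = nx.gnp_random_graph(15, 0.3, seed=2)
    perm = np.random.default_rng(0).permutation(15)
    G2 = nx.relabel_nodes(G1, {i: int(perm[i]) for i in range(15)})  # isomorphic copy

    def spectral_embedding(G, k=5):
        L = nx.laplacian_matrix(G, nodelist=sorted(G)).toarray().astype(float)
        _, vecs = np.linalg.eigh(L)
        return np.abs(vecs[:, 1:k + 1])      # skip the constant eigenvector

    cost = np.linalg.norm(
        spectral_embedding(G1)[:, None, :] - spectral_embedding(G2)[None, :, :],
        axis=2)
    rows, cols = linear_sum_assignment(cost)
    print("recovered matches:", int((cols == perm).sum()), "/ 15 correct")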


More projects based at Swansea University can be found here.



For details on how to submit your application, see the Applications page.