UKRI Centre for Doctoral Training in Artificial Intelligence, Machine Learning & Advanced Computing

Research projects based at Swansea University

Project title: Analysing lattice QCD data with machine learning

1st supervisor: Prof Gert Aarts
2nd supervisor: Prof Chris Allton
Department/Institution: Physics Department, Swansea University
Research theme: T1 - data from large science facilities

Project description: Simulations of the strong nuclear force (using lattice Quantum Chromodynamics, or lattice QCD) produce a large amount of data. While some physical observables, e.g. the masses of stable hadrons in vacuum, can be extracted in a fairly straightforward manner, this is not the case for more complicated observables, especially those relevant for QCD under the extreme conditions of nonzero temperature and/or density. Nevertheless, there are many outstanding questions here, which are linked to the physics of the early Universe and of heavy-ion collision experiments at CERN and elsewhere. Examples include transport coefficients in the quark-gluon plasma and in-medium modification of the hadronic spectrum. In this project, we will apply deep learning techniques to access those observables by connecting numerically computed lattice QCD correlators with the spectral functions that contain this information. Deep learning algorithms will first be assessed on large sets of generated mock data and subsequently applied to actual lattice QCD data. The goal is to obtain reliable spectral information in regions where other methods have given limited information so far.
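As a rough illustration of what "mock data" can mean here, the sketch below builds a Euclidean correlator from a chosen spectral function through the standard finite-temperature relation G(τ) = ∫ dω K(τ, ω) ρ(ω), with K(τ, ω) = cosh(ω(τ − β/2)) / sinh(ωβ/2) and β = 1/T. The Gaussian-peak form of ρ, the grid sizes, the noise model and all function names are assumptions made for this example only; they are not taken from the project.

```python
import math
import random


def spectral_function(omega, peaks):
    """Toy spectral function: a sum of Gaussian peaks (amplitude, position, width)."""
    return sum(a * math.exp(-0.5 * ((omega - m) / w) ** 2) for a, m, w in peaks)


def kernel(tau, omega, beta):
    """Finite-temperature kernel cosh(omega * (tau - beta/2)) / sinh(omega * beta/2)."""
    return math.cosh(omega * (tau - 0.5 * beta)) / math.sinh(0.5 * omega * beta)


def mock_correlator(peaks, n_tau=16, beta=1.0, omega_max=20.0,
                    n_omega=400, noise=0.0, rng=None):
    """Return (taus, G) with G(tau) approximated by a midpoint rule in omega.

    `noise` is the relative size of optional Gaussian noise (an assumed,
    deliberately simple stand-in for statistical errors on lattice data).
    """
    rng = rng or random.Random(0)
    d_omega = omega_max / n_omega
    # Midpoint grid: avoids omega = 0, where the kernel diverges.
    omegas = [(k + 0.5) * d_omega for k in range(n_omega)]
    rho = [spectral_function(w, peaks) for w in omegas]
    taus = [beta * i / n_tau for i in range(1, n_tau)]
    values = []
    for tau in taus:
        g = sum(kernel(tau, w, beta) * r for w, r in zip(omegas, rho)) * d_omega
        values.append(g * (1.0 + noise * rng.gauss(0.0, 1.0)))
    return taus, values


# One (spectral function, correlator) training pair with two peaks.
taus, correlator = mock_correlator([(1.0, 3.0, 0.5), (0.5, 8.0, 1.0)])
```

Repeating the last step with randomly drawn peak parameters would give a set of (ρ, G) pairs on which a reconstruction method can be trained and tested before it is applied to real correlators.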

Project title: Machine-Learning symmetries and phases of matter

1st supervisor: Prof Biagio Lucini
2nd supervisor: Prof Simon Hands
Department/Institution: Mathematics & Physics, Swansea University
Research theme: T1 - data from large science facilities; T3 - novel mathematical, physical and computer science approaches

Project description: Symmetries play an important role in determining the properties of a many-body system. In particular, the different possible realisations of a symmetry (e.g. linear vs. non-linear) determine the existence of different phases of matter. For instance, in a ferromagnet, the existence of spontaneous magnetisation at low temperature is a clear signal of a non-linear realisation (or breaking) of a symmetry that is manifest in its Hamiltonian, while the absence of spontaneous magnetisation above a critical temperature signals the restoration (or linear realisation) of that symmetry. While there is a standard procedure for investigating the phases of a system when the relevant symmetry is known, there are important models and real-world systems for which the symmetry is unknown. Examples include the theory of the strong interactions, Quantum Chromodynamics (QCD), in which matter can either be confined into bound states called hadrons or exist as a plasma of elementary particles, quarks and gluons. Another model that cannot be treated by conventional phase classification analysis is the topological superconductor, a largely unexplored class of novel materials with potentially revolutionary applications in, e.g., robust quantum computation. The project aims to characterise phases of model matter using machine learning classification applied to Monte Carlo generated data. At an early stage, systems for which the conventional approach to phase classification is known will be studied, in order to develop the corresponding machine learning methods. Subsequently, building on these investigations, more complex systems such as QCD and topological superconductors will be characterised using machine learning tools.
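The ferromagnet example can be made concrete with a small sketch. It assumes the two-dimensional Ising model (coupling J = 1, Boltzmann constant 1) sampled with single-spin Metropolis updates, and uses the average absolute magnetisation as the conventional order parameter. The model choice, lattice size, thresholds and function names are illustrative assumptions; the project itself concerns learned classifiers rather than this hand-picked observable.

```python
import math
import random


def ising_magnetisation(temperature, size=8, sweeps=300, burn_in=100, seed=0):
    """Average |magnetisation| per spin of a size x size periodic Ising lattice."""
    rng = random.Random(seed)
    spins = [[1] * size for _ in range(size)]  # ordered starting configuration
    total, samples = 0.0, 0
    for sweep in range(sweeps):
        for _ in range(size * size):
            i, j = rng.randrange(size), rng.randrange(size)
            neighbours = (spins[(i + 1) % size][j] + spins[(i - 1) % size][j]
                          + spins[i][(j + 1) % size] + spins[i][(j - 1) % size])
            delta_e = 2 * spins[i][j] * neighbours  # energy cost of flipping (i, j)
            if delta_e <= 0 or rng.random() < math.exp(-delta_e / temperature):
                spins[i][j] = -spins[i][j]
        if sweep >= burn_in:
            total += abs(sum(map(sum, spins))) / (size * size)
            samples += 1
    return total / samples


def label_phase(magnetisation, threshold=0.5):
    """Hand-made classifier based on the known order parameter (assumed threshold)."""
    return "ordered" if magnetisation > threshold else "disordered"


m_cold = ising_magnetisation(1.5)  # below the critical temperature (about 2.27)
m_hot = ising_magnetisation(4.0)   # above the critical temperature
```

Configurations labelled in this way, for a system whose symmetry is known, are the kind of data on which a machine learning classifier could be developed before it is applied to systems where no such order parameter is available.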

Project title: Machine learning with anti-hydrogen

1st supervisor: Prof Niels Madsen
2nd supervisor: TBD
Department/Institution: Physics Department, Swansea University and CERN
Research theme: T1 - data from large science facilities

Project description: The ALPHA Antihydrogen experiment makes use of several particle detector technologies, including a Silicon Vertex Detector, a Time Projection Chamber, and a barrel of scintillating bars. One of the key challenges for these detector systems is to distinguish between antihydrogen annihilations and cosmic rays, a classification problem to which machine learning is well suited. Presently this task is done with cuts on two high-level variables from the detectors for online analysis, and with boosted decision trees using high-level variables in offline analysis. High-level variables are a powerful tool for discrimination; however, they are slow to pre-process. The challenge of this PhD project is to build both online and offline analyses, which have different processing budgets. Initially the plan is to investigate the application of modern machine learning techniques, such as deep learning, to attempt to beat the boosted decision tree analysis currently used by the collaboration. Subsequently the project will expand to look at replacing the high-level variables with lower-level variables to reduce pre-processing time. Ultimately, a model small enough to interpret raw detector output could enable real-time online analysis, with the final goal of programming an FPGA or micro-controller to perform accurate, real-time classification of detector events. The combination of these projects would build a robust and comprehensive thesis that investigates machine learning applied to particle detectors. It will illustrate that good data preparation is the key to accurate classification models, and demonstrate the speed that can be achieved using simple models to handle low-level data. Demonstration of micro-controller and FPGA level classification would have a large impact on the particle detector community, contributing to detector trigger systems and live diagnostics beyond the scope of the ALPHA experiment.

Project title: Deep Learning and natural language processing for early prediction of cancer from electronic health records

1st supervisor: Dr Shang-Ming Zhou
2nd supervisor(s): Prof Ronal Lyons, Dr Martin Rollers
Department/Institution: Medical School, Swansea University
Research theme: T2 - biological, health and clinical sciences

Project description: This project concerns the development of deep learning and natural language processing techniques for early prediction of cancer occurrence and progression from routine electronic health records. Early detection of cancer can vastly improve the outcome for individuals and facilitates the subsequent clinical management. Because there are no accurate tools to predict which individuals will actually progress to malignancy, current healthcare systems rely heavily on frequent and invasive surveillance of the entire at-risk population via screening, which leads to significant financial, physical and emotional burdens. In this project, the doctoral researcher will develop AI and machine learning methods for early prediction of cancer occurrence and progression by identifying useful clinical signals and possible warning signs of cancer from linked primary care and secondary care electronic health records, using a large anonymised electronic cohort in Wales (pop ~3.1M). The findings of this study will increase awareness of possible warning signs of cancer among health professionals, policy-makers and the general public, and thereby improve outcomes. Because the work involves intensive computing (particularly data-driven model training) and big datasets with a large number of records and extremely high dimensions, the doctoral researcher will become highly skilled in HPC/HPDA, using the infrastructure offered by the CDT and HDR UK.

Project title: Advanced machine learning for unravelling unknown patterns of polypharmacy and multi-morbidity from social media and electronic health records to enhance care of patients with chronic conditions

1st supervisor: Dr Shang-Ming Zhou
2nd supervisor: TBD
Department/Institution: Medical School, Swansea University
Research theme: T2 - biological, health and clinical sciences

Project description: As the population ages, caring for patients with multimorbidity, and ensuring that their needs are met, is among the most important tasks facing healthcare services across the world in the 21st century. This project is intended to contribute to solutions for two of the greatest challenges currently confronting healthcare: the linked problems of multimorbidity and polypharmacy. This project will develop and use advanced machine learning and AI techniques to discover previously unknown patterns of polypharmacy and multimorbidity from electronic health records and social media, predict patient cohorts at risk, detect adverse drug events caused by a combination of drugs, and identify patterns of prescriptions for intervention to facilitate drug verification. This project will therefore help to gain in-depth knowledge of pharmacovigilance for patient safety, and more insight into what constitutes "best care" for patients with multimorbidity.

Project title: Towards Automated and Explainable Computational Aerospace Design

1st supervisor: Dr Sean Walton
Department/Institution: Computer Science, Swansea University
2nd supervisor: Dr Ben Evans, College of Engineering, Swansea University
Research theme: T3 - novel mathematical, physical and computer science approaches

Project description: Automatic computational optimisation methods have become increasingly sophisticated and powerful in recent years. Despite this, the uptake of automatic methods using AI in industry has been slow for two key reasons: (1) engineers fear the loss of human input, or intuition, in the design process, and (2) it is difficult to understand the high-dimensional data generated by the process and hence to understand the decisions the algorithm makes. Can we build tools and design new algorithms which address these two issues? Working on this project, a PhD student will have the opportunity to work with engineers in Airbus Group, BAE Systems, Jaguar and Bloodhound SSC to explore and find solutions to this problem. These solutions will involve developing new optimisation algorithms which allow human intervention, whilst visualising the almost unlimited stream of data produced by optimisation algorithms. This project is aligned with the current movement towards explainable AI and will make use of high-performance computing.

Project title: Improving Precision and Convergence of Machine Learning Algorithms with application to Lattice QCD and GPUs

1st supervisor: Dr Benjamin Mora
Department/Institution: Computational Foundry, Swansea University
2nd supervisor: Prof Biagio Lucini
Department/Institution: Mathematics, Swansea University
Research theme: T3 - novel mathematical, physical and computer science approaches

Project description: Lattice QCD researchers are used to improving numerical algorithms in order to understand particles like quarks and gluons better. The algorithms are at the crossroads between Monte Carlo techniques and inverse problem solvers, and are usually improved variants of the conjugate gradient (CG) algorithm. Similarly, Machine Learning (ML) researchers try to solve (or at least minimise) inverse problems using randomised methods like Stochastic Gradient Descent (SGD). With the advent of better algorithms and concepts (e.g. Generative Adversarial Networks, or GANs), high performance graphics cards (GPUs) and specialised accelerators (e.g. Google's TPUs), reasonably good levels of AI can nowadays be obtained with current methods in specific applications. Hence, Physics and ML clearly share common areas of research, and there has never been a more exciting time to combine Physics and ML knowledge. This project will therefore address the following questions:

  • Can we ensure faster learning with Machine Learning? More precisely, are there efficient methods that can replace algorithms based on SGD? This is an extremely important problem due to the huge quantity of calculations needed to train a neural network.
  • What is the influence of arithmetic precision in ML applications? One aim is to provide new algorithms that are more robust at calculating dot products from standard types (e.g. floats or doubles), especially on GPUs. The new techniques will possibly improve the stability of CG methods and reduce the number of times the algorithm needs to be restarted.
  • In the opposite direction, can (low complexity) approximation methods for linear algebra operators be useful to neural networks, in particular in the context of GANs? Can approximation techniques also be useful to lattice QCD?
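To give a feel for the second question, the sketch below compares a plain dot product with one that uses Kahan (compensated) summation, a well-known textbook technique chosen here purely as an illustration; it is not claimed to be one of the algorithms the project will develop. Python floats are IEEE double precision, so the effect shown is smaller than it would be for single-precision floats on a GPU. The function names and test vectors are assumptions of this example.

```python
import math


def naive_dot(xs, ys):
    """Dot product accumulated left to right in working precision."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total


def kahan_dot(xs, ys):
    """Dot product with Kahan compensation of the accumulated rounding error."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for x, y in zip(xs, ys):
        term = x * y - compensation
        new_total = total + term
        compensation = (new_total - total) - term
        total = new_total
    return total


xs = [0.1] * 100_000
ys = [1.0] * 100_000
# math.fsum returns the correctly rounded sum of the (already rounded) products.
reference = math.fsum(x * y for x, y in zip(xs, ys))
naive_error = abs(naive_dot(xs, ys) - reference)
kahan_error = abs(kahan_dot(xs, ys) - reference)
```

The compensated version costs a few extra floating-point operations per term, which is the kind of trade-off between robustness and throughput that matters for iterative solvers.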

Project title: Computationally-bounded agents in game theory

1st supervisor: Dr Arno Pauly
Department/Institution: Department of Computer Science, Swansea University
2nd supervisor: Dr Jeffrey Giansiracusa
Department/Institution: Department of Mathematics, Swansea University
Research theme: T3 - novel mathematical, physical and computer science approaches

Project description: Game theory is concerned with the strategic interactions of agents, typically assumed to be rational. It underlies a significant part of economics, but is also central to AI in the context of heterogeneous or multi-agent systems. In its traditional incarnation, game theory puts no a priori limits on the information-processing abilities of the agents. This is problematic, because it leads to results predicting behaviour which is computationally intractable or even non-computable to determine, which obviously limits any applicability to real-world agents. This project is about starting from a different position: if agents are computationally bounded (but otherwise "as rational as possible"), what types of interaction would emerge? Potential setups for this could be rooted in theoretical computer science, limiting the agents to executing algorithms in certain complexity classes; in functional analysis, restricting agents to determine their actions by e.g. Lipschitz functions, and using higher-order fixed point theorems to obtain equilibria; or in machine learning, where an agent selects the parameters of the machine learning model in a rational fashion, but is subsequently bound to the chosen learning model. Which approaches to focus on would primarily be determined by the interests and qualifications of the student. A key application area could be the attempt to model stock market interactions better than orthodox approaches do.

Project title: Mathematical and Computational Modelling of Brain Evolution, Development and Disease

1st supervisor: Dr Noemi Picco
2nd supervisor: Dr Gibin Powathil
Department/Institution: Department of Mathematics, Swansea University
Potential supervisor/collaborators: Dr Fernando García-Moreno (Achucarro Basque Center for Neuroscience)
Research theme: T3 - novel mathematical, physical and computer science approaches

Project description: The brain is the seat of our highest cognitive functions. Critically, the precise composition and positioning of neurons are determined during development and are key to the emergence of these cognitive functions. Variations of the developmental programme can lead to speciation as well as to disorders and malformations such as schizophrenia, epilepsy, and microcephaly. The recent Zika virus epidemic exposed our lack of basic understanding of the fundamental mechanisms of neural development. The developmental programme leading to the formation of the brain is the result of a complex regulation of cellular processes in space and time. To date, brain development has been studied through analysis of sparse temporal data that may miss crucial information. The project aims to develop novel mathematical and computational approaches that account for both the spatial and temporal aspects of this process, which leads to the vast array of brain architectures, shapes and sizes that we see in different animal species. The project will explore the hypothesis that this variety emerged from a trade-off between proliferative and spatial constraints and preferential expansion of certain proliferative zones of the developing brain. Drawing from techniques of machine learning and optimisation, the project aims to map all the possible evolutionary pathways of the brain, to highlight the evolutionary principles and fundamental mechanisms of normal brain development shared across species, and to provide insight into disease and malformations.

Project title: Visualising Extremely Large Dynamic Networks through AI and HPC

1st supervisor: Dr Daniel Archambault
Department/Institution: Department of Computer Science, Swansea University
2nd supervisor: TBD, Bangor University
Research theme: T3 - novel mathematical, physical and computer science approaches

Project description: Until recently, state-of-the-art dynamic graph drawing algorithms used the timeslice as a basis for drawing the network. New event-based approaches [1,2] have changed this approach and are designed to draw such networks directly in 2D+t. In either case, all algorithms use small, local areas of the network metrics as a basis. Many of these localised structures are considered in parallel propagating upwards to realise an overarching global behaviour, allowing domain scientists to visualise the network. However, when humans try to understand large graphs, we use a top down approach, looking at the global, high level features first followed by individual details (Overview First, Zoom and Filter, Details on Demand). Artificial intelligence provides an opportunity to manage these two simultaneously, by considering top down examples validated by a human and applying scalable bottom up algorithms in the background for localised detail refinement. This PhD would consider the following goals:

  1. Produce novel visualisation approaches for extremely large dynamic graphs. These approaches would consider top down cases suggested by supervised learning techniques while simultaneously using more localised refinement frequently seen in the dynamic graph drawing literature.
  2. Dynamic graph drawing and visualisation are highly parallelisable. We will use HPC technology to further scale the visualisation and drawing process to larger data sets.

[1] Paolo Simonetto, Daniel Archambault, and Stephen Kobourov. Event-Based Dynamic Graph Visualisation. IEEE Transactions on Visualization and Computer Graphics, accepted and in press, 2018.
[2] Paolo Simonetto, Daniel Archambault and Stephen Kobourov. Drawing Dynamic Graphs Without Timeslices. Graph Drawing 2017, 394--409, 2018.

Project title: Multilevel Monte Carlo numerical methods for SDDEs with small noise and applications to biological sciences

1st supervisor: Prof Chenggui Yuan
Department/Institution: Department of Mathematics, Swansea University
2nd supervisor: Dr Mike Fowler
Department/Institution: Department of Biosciences, Swansea University
Research theme: T3 - novel mathematical, physical and computer science approaches

Project description: Multilevel Monte Carlo path simulation was proposed by Giles for stochastic differential equations. Let ε be a very small positive number. To achieve a mean-square error of O(ε^2) between the true solution and the numerical solution, the computational complexity (cost) is O(ε^{-3}) for the standard Monte Carlo method with the Euler discretisation, whereas it is O(ε^{-2} (log ε)^2) for the Multilevel Monte Carlo method, a substantial reduction in cost. Small noise stochastic differential equations (SDEs) are widely used in econometrics, finance, computational fluid dynamics, ecology and population dynamics, among other fields, and many numerical methods have been developed for small noise SDEs with the aim of improving efficiency. On the other hand, stochastic differential delay equations (SDDEs) describe processes that depend on the past states of the system, and they play an important role in theoretical and practical analysis. In this project, we will develop numerical simulations that combine the multilevel Monte Carlo method with the Euler-Maruyama (EM) scheme for SDDEs with small noise, and apply our new theory to biological models.
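The multilevel idea can be sketched in a few lines. The example below assumes a simple linear SDE with small multiplicative noise, dX = aX dt + εX dW with X(0) = 1, which has no delay term and is used only because its mean E[X(T)] = exp(aT) is known exactly. It sums, over levels, the average difference between Euler-Maruyama paths with step T/2^l and T/2^(l-1) driven by the same Brownian increments. The parameter values, the rule for the number of samples per level and all names are assumptions of this sketch, not the scheme to be developed in the project.

```python
import math
import random


def coupled_sample(level, rng, a=0.05, eps=0.1, x0=1.0, T=1.0):
    """Return Euler-Maruyama endpoints (fine, coarse) driven by one Brownian path.

    The fine path uses 2**level steps and the coarse path half as many.
    On level 0 there is no coarser path, so the coarse value is 0.
    """
    n_fine = 2 ** level
    h = T / n_fine
    x_fine, x_coarse, dw_pair = x0, x0, 0.0
    for step in range(n_fine):
        dw = rng.gauss(0.0, math.sqrt(h))
        x_fine += a * x_fine * h + eps * x_fine * dw
        dw_pair += dw
        if level > 0 and step % 2 == 1:
            # One coarse step uses the sum of two fine Brownian increments.
            x_coarse += a * x_coarse * (2 * h) + eps * x_coarse * dw_pair
            dw_pair = 0.0
    return (x_fine, x_coarse) if level > 0 else (x_fine, 0.0)


def mlmc_estimate(max_level=5, n0=4000, seed=1):
    """Telescoping-sum estimate of E[X(T)] with fewer samples on finer levels."""
    rng = random.Random(seed)
    estimate = 0.0
    for level in range(max_level + 1):
        n_samples = max(n0 // 2 ** level, 100)
        total = 0.0
        for _ in range(n_samples):
            fine, coarse = coupled_sample(level, rng)
            total += fine - coarse
        estimate += total / n_samples
    return estimate


estimate = mlmc_estimate()
exact = math.exp(0.05)  # E[X(1)] for a = 0.05
```

Because the fine and coarse paths share their noise, the level differences have small variance, so most samples can be spent on the cheap coarse levels; this is the source of the cost reduction quoted above.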

Project title: A Multiscale Modelling Approach to Study the Effects and Responses of DNA Damage Response (DDR) Inhibitor Drugs

1st supervisor: Dr Gibin Powathil
2nd supervisor: Dr Noemi Picco
Department/Institution: Department of Mathematics, Swansea University
Potential supervisor/collaborators: Professor Mark Chaplain (University of St Andrews), Dr James Yates (AstraZeneca)
Research theme: T3 - novel mathematical, physical and computer science approaches

Project description: The increasing complexity of the clinical and biological effects of multimodality therapies often results in substantial challenges to the clinical and preclinical development of novel therapeutic drugs. Mathematical modelling, based on a systems approach and informed by experimental data, can often be very helpful in understanding and studying the multiple (nonlinear) therapeutic effects and responses of these drugs, helping their preclinical design and development and their clinical implementation. The multiscale complexity of cancer as a disease necessitates the adoption of a multiscale approach, incorporating appropriate mechanisms to obtain meaningful and predictive mathematical models to study the therapeutic effects and outcomes. This highly interdisciplinary project aims to develop multiscale, experimental data-driven mathematical and computational models to study and analyse the effects and efficacy of DNA damage response inhibiting drugs. Once the model is developed, fully calibrated and validated, it will be used to study optimal sequencing, scheduling and dosing, alone and in combination with multimodality therapies. It will also be used to inform in vivo and preclinical studies, moving a step closer to potential drug development and clinical trial designs.

Project title: Multi-dimensional time series analysis with large scale hypothesis testing and geometric dimension reduction

1st supervisor: Dr Farzad Fathizadeh
Department/Institution: Department of Mathematics, Swansea University and Guest scientist at the Max Planck Institute for Biological Cybernetics
2nd supervisor: Prof Biagio Lucini
Department/Institution: Department of Mathematics, Swansea University
Research theme: T3 - novel mathematical, physical and computer science approaches

Project description: Advances in concurrent recording of behaviours and neural activities in the intact brain have led to an invaluable source of information for probing the properties of brain networks, and for determining the statistical properties of animal cognition and social behaviour. Multi-modal recordings of neural activities at different spatiotemporal scales are accompanied by considerable noise and require advanced and novel analytical and statistical techniques for signal detection in the corresponding time series. In previous work, a statistical method was devised for the detection and sorting of neuronal signals in noisy time series through large scale hypothesis testing and so-called geometric learning. Geometric learning is a method that associates a graph to a given data set; one can then read off the local geometry of the data in the heat kernel of the Laplacian of the graph (viewed as an approximation of the Laplacian of a curved geometry). In this project, this analysis technique will be generalised to multi-dimensional (correlated) time series. Part of the project will be about detection of decision boundaries for hypothesis rejections by simulations, and working out theoretical aspects of the observed boundaries.
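The graph construction described above can be sketched on a toy data set. The example assumes one-dimensional data points, Gaussian edge weights, the unnormalised Laplacian L = D - W, and a truncated Taylor series for the heat kernel exp(-tL), which is adequate only for very small matrices. These choices and all names are assumptions of this illustration and are not taken from the previous work mentioned in the description.

```python
import math


def graph_laplacian(points, sigma=1.0):
    """Unnormalised Laplacian L = D - W with Gaussian weights between points."""
    n = len(points)
    w = [[0.0 if i == j
          else math.exp(-((points[i] - points[j]) ** 2) / (2.0 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    return [[sum(w[i]) if i == j else -w[i][j] for j in range(n)]
            for i in range(n)]


def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def heat_kernel(laplacian, t, terms=40):
    """Approximate exp(-t L) by the first `terms` terms of its Taylor series."""
    n = len(laplacian)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        scaled = [[-t * x / k for x in row] for row in laplacian]
        term = mat_mul(term, scaled)  # term now equals (-t L)^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


# Two well-separated clusters on a line.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
kernel_matrix = heat_kernel(graph_laplacian(points), t=1.0)
```

In this toy case the heat kernel entries between points of the same cluster are much larger than those between clusters, which is one simple sense in which the kernel encodes the local geometry of the data.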

For details on how to submit your application, see the Applications page.