Leandro L. Minku

Office UG39
School of Computer Science
University of Birmingham
Edgbaston
Birmingham, B15 2TT
UK

L.L.Minku ._at_. cs.bham.ac.uk
+44 (0)121 414 6822

Projects

Stable Prediction of Defect-Inducing Software Changes (SPDISC)

Principal investigator: Dr. Leandro L. Minku.
Keywords: software defect prediction, concept drift, ensembles of learning machines.
Funding: Engineering and Physical Sciences Research Council (EPSRC).

Context: software systems have become ever larger and more complex. This inevitably leads to software defects, whose debugging is estimated to cost the global economy 312 billion USD annually. Reducing the number of software defects is a challenging problem, and it is particularly important given the strong pressure towards rapid delivery. Such pressure prevents all parts of the source code from receiving an equally large amount of inspection and testing effort.

With that in mind, machine learning approaches have been proposed for predicting defect-inducing changes in the source code as soon as these changes are implemented. Such approaches could enable software engineers to focus special testing and inspection attention on the parts of the source code most likely to induce defects, reducing the risk of committing defective changes.

Problem: the predictive performance of existing approaches is unstable, because the underlying defect-generating process being modelled may vary over time (i.e., there may be concept drift). This means that practitioners cannot be confident about the prediction ability of existing approaches: at any given point in time, predictive models may be performing very well or failing dramatically.

Aim and vision: SPDISC aims at creating more stable models for predicting defect-inducing changes, through the development of a novel machine learning approach for automatically adapting to concept drift. When integrated with software versioning systems, the models will provide early, reliable and automated defect-inducing change alerts throughout the lifetime of software projects.

Impact: SPDISC will enable a transformation in the way software developers review and commit their changes. By creating stable models to make software developers aware of defect-inducing changes as soon as these are implemented, it will allow targeted inspection and testing attention towards defect-inducing code throughout the lifetime of software projects. This will reduce the debugging cost and ultimately lead to better software quality.

Proposed approach: an online learning algorithm will be developed to process incoming data as they become available, enabling fast reaction to concept drift. Concept drift will be detected using methods designed to cope with class imbalance, which typically occurs in prediction of defect-inducing software changes. Class imbalance refers to the issue of having a much smaller number of defect-inducing changes than the number of safe changes. The proposed approach will also make use of data from different projects (i.e., transfer learning between domains) to speed up adaptation to concept drift.
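To illustrate why drift detection has to take class imbalance into account, the sketch below monitors a time-decayed recall for each class separately, so that a collapse in recall on the rare defect-inducing class is not masked by the many safe changes. It is a minimal, hypothetical monitor written for illustration, not SPDISC's method; the class name `ClassRecallMonitor` and the `decay`, `drop` and `warmup` values are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ClassRecallMonitor:
    """Hypothetical drift monitor based on time-decayed recall per class.

    Tracking recall separately for each class keeps a rare class (such as
    defect-inducing changes) from being masked by the majority class, which
    is what would happen if only overall accuracy were monitored.
    """
    decay: float = 0.99  # assumed decay factor: older examples count less
    drop: float = 0.3    # assumed tolerated drop below the best recall seen
    warmup: int = 20     # assumed number of examples per class before monitoring
    recall: dict = field(default_factory=dict)
    best: dict = field(default_factory=dict)
    seen: dict = field(default_factory=dict)

    def update(self, y_true, y_pred) -> bool:
        """Process one labelled example; return True if drift is suspected."""
        hit = 1.0 if y_true == y_pred else 0.0
        # Only the recall of the example's own class is updated.
        r = self.recall.get(y_true, hit)
        r = self.decay * r + (1.0 - self.decay) * hit
        self.recall[y_true] = r
        self.seen[y_true] = self.seen.get(y_true, 0) + 1
        if self.seen[y_true] < self.warmup:
            return False
        self.best[y_true] = max(self.best.get(y_true, r), r)
        # Drift is flagged when a class's recall falls well below its best level.
        return self.best[y_true] - r > self.drop
```

In an online learner, a `True` result would typically trigger adaptation, for example resetting or reweighting the model; that step is left out of this sketch.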

Novelty: SPDISC is the first proposal to look into the stability of predictive performance over time in the context of defect-inducing software changes. Most previous work ignored the fact that predictions are required over time, overlooking the instability of predictive performance in this problem. To deal with instability, SPDISC will develop the first online transfer learning approach for predicting defect-inducing software changes.

Ambitiousness: online transfer learning between domains with concept drift is not only a very new area of research in software engineering, but also in machine learning. Very few approaches exist for that, and none of them can deal with class-imbalanced problems. Therefore, SPDISC will not only advance software engineering by enabling a transformation in the way software developers review and commit their changes, but also advance the area of machine learning itself.

Timeliness: given the current size and complexity of software systems, the increased number of life-critical applications, and the high competitiveness of the software industry, approaches for improving software quality and reducing the cost of producing and maintaining software are currently of utmost importance.


Dynamic Adaptive Automated Software Engineering (DAASE)

Principal investigator: Prof. Xin Yao.
Keywords: software project estimation, project scheduling problem, ensembles of learning machines, online learning, concept drift, evolutionary algorithms.
Funding: Engineering and Physical Sciences Research Council (EPSRC).
My work on this project ended in August 2015.

DAASE aims to create a new approach to software engineering which places computational search at the heart of the process and products it creates and embeds adaptivity into both. This new approach will produce software that is dynamically adaptive, being not only able to respond to and fix problems that arise before deployment and during operation, but also to continually optimise, re-configure and evolve to adapt to new operating conditions, platforms and environmental challenges. DAASE will create an array of new processes, methods, techniques and tools for this new kind of software engineering, radically transforming both theory and practice of software engineering. As part of it, DAASE will develop a hyper-heuristic approach to adaptive automation. A hyper-heuristic is a methodology for selecting or generating heuristics. Most heuristic methods in the literature operate on a search space of potential solutions to a particular problem. However, a hyper-heuristic operates on a search space of heuristics.
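The distinction between heuristics and hyper-heuristics can be made concrete with a small selection hyper-heuristic. In the sketch below, a generic illustration rather than DAASE's approach, three hypothetical low-level heuristics modify a bit string, and the hyper-heuristic searches over those heuristics: it picks one by roulette wheel in proportion to a score, keeps non-worsening results, and rewards heuristics that produce improvements. The OneMax objective (number of ones) and all function names are assumptions made for illustration.

```python
import random
from typing import Callable, List

Solution = List[int]
Heuristic = Callable[[Solution, random.Random], Solution]


# Hypothetical low-level heuristics: each one operates on a candidate solution.
def flip_one(s: Solution, rng: random.Random) -> Solution:
    t = s[:]
    t[rng.randrange(len(t))] ^= 1
    return t


def flip_two(s: Solution, rng: random.Random) -> Solution:
    t = s[:]
    for i in rng.sample(range(len(t)), 2):
        t[i] ^= 1
    return t


def swap_two(s: Solution, rng: random.Random) -> Solution:
    t = s[:]
    i, j = rng.sample(range(len(t)), 2)
    t[i], t[j] = t[j], t[i]
    return t


def selection_hyper_heuristic(objective: Callable[[Solution], float],
                              start: Solution,
                              heuristics: List[Heuristic],
                              iterations: int = 2000,
                              seed: int = 0):
    """Search over heuristics (rather than solutions directly) to maximise objective."""
    rng = random.Random(seed)
    scores = [1.0] * len(heuristics)  # one selection weight per heuristic
    current, value = start[:], objective(start)
    for _ in range(iterations):
        # Choose a heuristic with probability proportional to its score.
        k = rng.choices(range(len(heuristics)), weights=scores)[0]
        candidate = heuristics[k](current, rng)
        new_value = objective(candidate)
        if new_value >= value:  # accept non-worsening moves
            if new_value > value:
                scores[k] += 1.0  # reward heuristics that improve the solution
            current, value = candidate, new_value
    return current, value, scores
```

For example, `selection_hyper_heuristic(sum, [0] * 30, [flip_one, flip_two, swap_two])` maximises the number of ones. Swapping never changes that number, so `swap_two` is never rewarded and ends up being chosen less often than the flipping heuristics.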

In this project, I researched adaptive software prediction. Software prediction tasks are of strategic importance for software development companies. An example of such a task is software effort estimation. Overestimates may result in a company losing contracts or wasting resources, whereas underestimates may result in poor-quality, delayed or unfinished software systems. Most software prediction research neglects the fact that software prediction tasks operate in changing online environments. Models are typically trained on one set of projects and evaluated on another, without considering whether the training projects were really available before the testing projects. Besides possibly leading to incorrect conclusions, this results in inflexible prediction approaches that become obsolete over time. I investigated the types of changes that affect software prediction tasks and worked on new approaches to adapt quickly to these changes.
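The evaluation issue described above can be illustrated with a chronological (online) protocol, in which each project is estimated using only projects whose actual effort was known before it started. In this hypothetical sketch, the `Project` fields and the mean-of-past-projects estimator are assumptions made for illustration, not the approaches developed in this research; the optional `window` shows one simple way of letting old projects be forgotten after a change.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class Project:
    start: int      # hypothetical time at which an estimate is needed
    end: int        # hypothetical time at which the actual effort became known
    effort: float   # actual effort, e.g. in person-hours


def chronological_estimates(projects: List[Project],
                            window: Optional[int] = None) -> List[Optional[float]]:
    """Estimate each project's effort (in start order) from earlier projects only.

    Only projects completed before the target project started are used, so no
    information from the "future" leaks into training. With window=None all of
    them are used; with an integer, only the most recently completed ones are.
    """
    estimates = []
    for p in sorted(projects, key=lambda q: q.start):
        known = sorted((q for q in projects if q.end < p.start), key=lambda q: q.end)
        if window is not None:
            known = known[-window:]
        estimates.append(mean(q.effort for q in known) if known else None)
    return estimates
```

If the effort level of six consecutive projects jumps from 100 to 300 after the third one, the all-history mean yields estimates of 150 and 180 for the last two projects, while `window=1` yields 300 for both, reacting as soon as one post-change project has completed.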


Software Engineering by Automated Search (SEBASE)

Principal investigator: Prof. Xin Yao.
Keywords: software effort estimation, project scheduling problem, ensembles of learning machines, evolutionary algorithms, online learning, concept drift.
Funding: Engineering and Physical Sciences Research Council (EPSRC).
Project completed in December 2011.


Online Ensemble Learning in the Presence of Concept Drift

Supervisor: Prof. Xin Yao.
Keywords: concept drift, online learning, ensembles of learning machines.
Funding: Overseas Research Students (ORS) Award and School Research Studentship (School of Computer Science, The University of Birmingham).
Completion: 2010.
Degree congregation: 2011.

Most machine learning algorithms operate in offline mode: they first learn how to perform a certain task and are then used to perform it. However, most practical problems change over time, i.e., they suffer concept drift. For example, the problem of predicting users' preferences in information filtering systems may involve changes in users' preferences; the problem of classifying webpages may involve changes in the most representative words of different webpage categories; the problem of credit card approval may involve changes in customers' reliability. Unlike offline learning algorithms, online learning algorithms can adapt to concept drift based on newly incoming training examples. These algorithms do not have separate training and testing phases; instead, they learn throughout their lifetime as they perform the task, processing each new training example separately and then discarding it.
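The online setting described above, where each example is processed once and then discarded, is often evaluated prequentially (test-then-train): every new example is first used to test the current model and then to train it. The sketch below uses a plain online perceptron as the learner and a synthetic stream whose concept is reversed halfway through; the learner, the stream and all names are assumptions chosen for illustration, not algorithms from the thesis.

```python
import random
from typing import Iterator, List, Tuple


class OnlinePerceptron:
    """Minimal online learner: each example updates the model once, then is discarded."""

    def __init__(self, n_features: int, lr: float = 0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x: List[float]) -> int:
        s = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return 1 if s >= 0 else 0

    def learn(self, x: List[float], y: int) -> None:
        err = y - self.predict(x)  # 0 if correct, +1 or -1 if wrong
        if err:
            self.w = [wi + self.lr * err * xi for wi, xi in zip(self.w, x)]
            self.b += self.lr * err


def drifting_stream(n: int, drift_at: int, seed: int = 0) -> Iterator[Tuple[List[float], int]]:
    """Hypothetical stream: the label is x0 > x1 before drift_at and x0 < x1 afterwards."""
    rng = random.Random(seed)
    for t in range(n):
        x = [rng.random(), rng.random()]
        y = int(x[0] > x[1]) if t < drift_at else int(x[0] < x[1])
        yield x, y


def prequential(model: OnlinePerceptron, stream) -> List[bool]:
    """Test-then-train: predict each example first, then learn from it."""
    hits = []
    for x, y in stream:
        hits.append(model.predict(x) == y)
        model.learn(x, y)  # the example is not stored anywhere
    return hits
```

Running `prequential(OnlinePerceptron(2), drifting_stream(4000, 2000))` gives high accuracy before the drift; right after it the model still reflects the old concept, and accuracy recovers as new examples arrive, without any separate retraining phase.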

Due to the practical need for adaptive learning systems, a growing number of works have proposed online learning algorithms able to deal with concept drift. In particular, online ensembles of learning machines have been used. However, there has been no in-depth study of why they can help in dealing with drifts and which of their features contribute to that. This thesis investigates not only how ensemble diversity affects accuracy in online learning in the presence of concept drift, but also how to use diversity to significantly improve accuracy in changing environments. This is the first diversity study in the presence of concept drift. The main contributions of the thesis are:


EFuNN Parameters Optimisation and EFuNN Ensembles Construction

Supervisor: Prof. Teresa B. Ludermir.
Keywords: online parameters optimisation, numeric parameters optimisation, fuzzy neural networks, ensembles of neural networks.
Funding: Brazilian Council for Scientific and Technological Development (CNPq).
Degree congregation: 2006.

Evolving Connectionist Systems (ECoSs) are systems composed of one or more neural networks whose structures adapt to the data through continuous interaction with the environment. Evolving Fuzzy Neural Networks (EFuNNs) are ECoSs that combine the functional characteristics of neural networks with the power of fuzzy logic. Fuzzy systems have been shown to be very efficient at representing and reasoning about uncertain knowledge. This is very important because human knowledge is often uncertain.
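As a small illustration of the fuzzy side of EFuNNs, the sketch below fuzzifies a crisp input into degrees of membership in overlapping fuzzy sets such as low, medium and high. Evenly spaced triangular membership functions are assumed here for simplicity; the function name and parameters are hypothetical and not taken from an EFuNN implementation.

```python
from typing import List


def triangular_memberships(x: float, lo: float, hi: float, n_sets: int = 3) -> List[float]:
    """Degrees of membership of x in n_sets evenly spaced triangular fuzzy sets.

    The sets are centred at evenly spaced points of [lo, hi] (e.g. 'low',
    'medium', 'high'). Each triangle reaches zero at the neighbouring centres,
    so inside [lo, hi] the degrees of adjacent sets sum to 1.
    """
    if n_sets < 2 or hi <= lo:
        raise ValueError("need n_sets >= 2 and hi > lo")
    step = (hi - lo) / (n_sets - 1)
    centres = [lo + i * step for i in range(n_sets)]
    return [max(0.0, 1.0 - abs(x - c) / step) for c in centres]
```

For instance, with `lo=0.0` and `hi=1.0`, the input 0.25 belongs equally (0.5) to the "low" and "medium" sets and not at all to "high", so an uncertain, in-between value is represented explicitly rather than forced into a single category.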

A key challenge in Artificial Intelligence is to create systems that are able not only to represent human knowledge and reason about it, but also to evolve and adapt their structures in a changing environment. This kind of system can model processes that continually develop and change over time, e.g., biological data processing, electricity load forecasting and adaptive speech recognition. Such a system needs to be able to tune its parameters online, according to the environment. EFuNNs have some adaptable parameters, and their structures can also adapt to incoming data. However, they still have many parameters that are fixed before learning and that strongly influence their results. The problem with using a fixed set of parameters is that a set that is optimal for a particular state of the environment can become unsuitable when that state changes.

In this work, two new techniques were developed that use evolutionary algorithms to evolve EFuNN parameters online. These techniques can create fuzzy systems that are completely tunable according to unpredictable and unknown environments. They achieved better accuracy than the existing techniques in the literature for evolving EFuNN parameters online.
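The idea of tuning parameters while data keeps arriving can be sketched with a simple (1+1)-style evolutionary loop: after each new value, the current parameter is mutated, and the mutant replaces it only if it performs better on a window of recent data. To keep the example self-contained, the parameter being evolved is the smoothing factor `alpha` of an exponential-smoothing forecaster, standing in for an EFuNN parameter. This stand-in model, the window size and the mutation strength are assumptions, and the sketch is not one of the two techniques developed in this work.

```python
import random
from collections import deque
from typing import Iterable, List


def smoothing_error(alpha: float, window) -> float:
    """Mean absolute one-step-ahead error of exponential smoothing on the window.

    The model here is a stand-in for an EFuNN whose parameter is being tuned.
    """
    values = list(window)
    level, total = values[0], 0.0
    for y in values[1:]:
        total += abs(y - level)                  # forecast is the current level
        level = alpha * y + (1 - alpha) * level  # then update the level
    return total / (len(values) - 1)


def online_parameter_evolution(stream: Iterable[float], window_size: int = 30,
                               sigma: float = 0.1, seed: int = 0) -> List[float]:
    """Evolve alpha while data arrives; return the value of alpha after each step."""
    rng = random.Random(seed)
    alpha, window, history = 0.5, deque(maxlen=window_size), []
    for y in stream:
        window.append(y)  # only recent data is kept, so old conditions are forgotten
        if len(window) >= 3:
            # Gaussian mutation, clipped to the valid range [0, 1].
            child = min(1.0, max(0.0, alpha + rng.gauss(0.0, sigma)))
            # The mutant survives only if it is strictly better on recent data.
            if smoothing_error(child, window) < smoothing_error(alpha, window):
                alpha = child
        history.append(alpha)
    return history
```

On a stream that alternates between +1 and -1 and then becomes a steadily increasing ramp, `alpha` is driven towards small values in the first phase, where heavy smoothing helps, and towards 1 in the second, where the latest value is the best predictor, without learning being restarted.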

Besides the need for new techniques that allow changing environments to be represented, it is always important to develop approaches with better generalisation capabilities and lower execution time. Ensembles of neural networks have been shown, both formally and empirically, to outperform systems composed of only one neural network. Thus, this work also proposes a new approach to create ensembles of neural networks, such as ensembles of EFuNNs. The approach uses a clustering method and co-evolutionary algorithms to create the ensembles in an innovative way, explicitly partitioning the input space so that the networks composing the ensemble can specialise in different parts of it and work in a divide-and-conquer manner. The approach improved on the accuracy of single EFuNNs generated by evolutionary algorithms similar to the co-evolutionary algorithms used in the approach. Furthermore, its execution time is lower than that of the evolutionary algorithms used to generate single EFuNNs.
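The divide-and-conquer idea behind this ensemble approach can be sketched as follows: the input space is partitioned by clustering, one simple model is trained per region, and each new input is answered by the model of its nearest cluster. In this hypothetical sketch, k-means on a single input variable and least-squares lines stand in for the clustering method and the EFuNNs, and the co-evolutionary part of the approach is omitted; all names are assumptions.

```python
from statistics import mean
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans_1d(xs: List[float], k: int, iters: int = 20) -> List[float]:
    """Plain k-means on scalar inputs, starting from evenly spaced sorted points."""
    s = sorted(xs)
    centroids = [s[i * (len(s) - 1) // (k - 1)] for i in range(k)] if k > 1 else [mean(s)]
    for _ in range(iters):
        groups: List[List[float]] = [[] for _ in range(k)]
        for x in xs:
            groups[min(range(k), key=lambda j: abs(x - centroids[j]))].append(x)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids


def fit_line(points: List[Point]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = a * x + b within one region."""
    mx = mean(x for x, _ in points)
    my = mean(y for _, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    a = (sum((x - mx) * (y - my) for x, y in points) / sxx) if sxx else 0.0
    return a, my - a * mx


class PartitionedEnsemble:
    """Hypothetical ensemble: one local expert per cluster of the input space."""

    def __init__(self, k: int):
        self.k = k

    def _nearest(self, x: float) -> int:
        return min(range(len(self.centroids)), key=lambda j: abs(x - self.centroids[j]))

    def fit(self, data: List[Point]) -> "PartitionedEnsemble":
        self.centroids = kmeans_1d([x for x, _ in data], self.k)
        regions: List[List[Point]] = [[] for _ in self.centroids]
        for x, y in data:
            regions[self._nearest(x)].append((x, y))
        # Each expert only ever sees the examples from its own region.
        self.experts = [fit_line(r) if r else (0.0, 0.0) for r in regions]
        return self

    def predict(self, x: float) -> float:
        a, b = self.experts[self._nearest(x)]
        return a * x + b
```

For data that follow y = 2x on one region of the input space and y = 20 - x on another, `PartitionedEnsemble(k=2)` recovers each line exactly within its own region, which a single global line could not do.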