Research

Research Interests

I am loosely interested in all aspects of the "data->information->knowledge->'wisdom' lifecycle", information extraction, storage and retrieval.

My Ph.D. has focussed on Process Mining, specifically on developing a probabilistic framework for the analysis and comparison of process mining algorithms. How do different algorithms learn? How much data should we use? What does noise mean and what should we do about it? What happens if the process evolves? How can we make it more general or easier to understand? Such a framework could provide the basis for objectively answering some of these questions.

Completing a Ph.D. takes a long time! My interests have evolved, so I am now less interested in answering specific questions from data, and more interested in building models of the underlying phenomena that gave rise to the data. Machine learning is partly about this: using the data as evidence to draw conclusions about the real world and build useful models of it. My next step, in Automatic Speech Recognition, will build on these interests.

[Photo: Phil Weber]

[Figure: A process model]

Process Mining - an introduction

Process Mining is the extraction of business process models from the log files of businesses' information systems, although the techniques are also applicable to software processes, operating system processes, general network data flow, enterprise backup system traffic, networked storage, understanding IT infrastructure/support environment interactions, robotic interactions ...

Process Mining extracts (business) process models, which can be represented as ad-hoc directed graphs, in a business process modelling (BPM) language such as BPMN or BPEL, or formally using Petri Nets or Finite State Automata. Petri Nets are often used because they can concisely capture complex models, including parallelism, while remaining rigorously mathematically analysable.
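As a rough illustration of the 'ad-hoc directed graph' representation, the sketch below counts how often one activity is directly followed by another within the same case of an event log. The log format (pairs of case id and activity, already in time order), the activity names and the function name `directly_follows` are assumptions made for this example; real process mining algorithms do considerably more, for instance to detect parallelism or handle noise.

```python
from collections import Counter, defaultdict


def directly_follows(event_log):
    """Count how often activity a is immediately followed by b in a case.

    event_log: iterable of (case_id, activity) pairs, assumed to be
    ordered by time within each case (a hypothetical, simplified format).
    Returns a Counter mapping (a, b) edges to their frequency.
    """
    # Group the activities of each case into a trace, preserving order.
    traces = defaultdict(list)
    for case_id, activity in event_log:
        traces[case_id].append(activity)

    # Every pair of consecutive activities in a trace is an edge.
    edges = Counter()
    for trace in traces.values():
        for a, b in zip(trace, trace[1:]):
            edges[(a, b)] += 1
    return edges


# Made-up example log with three cases.
log = [
    ("c1", "register"), ("c1", "check"), ("c1", "approve"),
    ("c2", "register"), ("c2", "check"), ("c2", "reject"),
    ("c3", "register"), ("c3", "approve"),
]

graph = directly_follows(log)
for (a, b), count in sorted(graph.items()):
    print(f"{a} -> {b}: {count}")
```

The resulting edge counts form a weighted directed graph over activities, which is about the simplest kind of process model that can be read from a log.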

The goals of process mining are to capture the 'reality' of the (business) process by looking at what is actually happening, and to compare it with the 'believed' process which may be held by management or analysts. Analysis of the process flow can include comparison between models, identification of bottlenecks, improvements, how decisions are made, and so on. Other 'perspectives' can also be mined, such as social or organisational interactions.

Further Reading

See my publications. The biggest community is at http://www.processmining.org/; Process Mining in its current form probably started there.

I came to research after several years in industry doing systems analysis and design, development, and administration of Unix and storage systems. There was always too much of:

  • Complexity and data overload. Distributed and networked systems are far too complex, so they just get 'managed', never understood, e.g. enterprise backup infrastructure or networked data storage. Yet information is available in log files etc., explaining what is occurring and where/when/how/why things are going wrong. How to extract useful information and act on it?
  • Information loss. The same problems are tackled and solved again and again, and any learning is lost as different people tackle the problem. Attempts to solve this include documentation, 'training', SharePoint, bespoke scripting. How to learn, remember, and re-use the information/knowledge --> wisdom?

Of course, there is lots of research in these areas! Big Data, Cloud, model-based engineering, information retrieval, ... and still lots of opportunities!