Advanced Interaction:
Synergistic Data Mining

Investigators: Russell Beale

In data mining, or knowledge discovery, we are faced with a mass of data that we are trying to make sense of. We are looking for something “interesting”. Quite what “interesting” means is hard to define, however: one day we are intrigued by the general trend that most of the data follows; the next, by why a few outliers depart from that trend. “Interesting” is an essentially human construct, a perspective on relationships between data that is influenced by tasks, personal preferences, past experience and so on. Interest, like beauty, is in the eye of the beholder. For this reason, we cannot leave the search for knowledge to computers alone. We have to be able to guide them as to what we are looking for and which areas to focus their phenomenal computing power on. For a data mining system to be generically useful, it must therefore give us some way to indicate what is interesting and what is not, and that indication must be dynamic and changeable. Many data mining systems do not offer this flexibility: they are one-shot systems that use their inbuilt techniques to theorise about and analyse data, but they address it blindly, unable to incorporate domain knowledge or insights into what is being looked for. They have only one perspective on what is interesting, and report only on data that fit that view.

For the user to indicate what is of interest, we need to give them some representation of the data that they can interact with. We use visualisation techniques to present an abstract representation of the data for this purpose. The human visual system is exceptionally good at clustering and at recognising patterns and trends, even in the presence of noise and distortion. By interacting with the raw data presented visually, the user can show the system the areas of interest and focus the data mining on exploring that part of the dataset.
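The idea of selecting a region in the visualisation and focusing the mining on it can be illustrated with a minimal sketch. This is not the system described here: the names `Record`, `Selection`, `focus` and `contrast` are hypothetical, the selection is assumed to be a simple rectangle in a 2D view, and a comparison of attribute means stands in for whatever mining technique is actually applied.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Record:
    """One data item: its position in the 2D view plus its raw attributes."""
    x: float
    y: float
    attributes: dict  # attribute name -> numeric value (assumed numeric)


@dataclass(frozen=True)
class Selection:
    """A rectangular region the user has marked as interesting (assumption)."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float

    def contains(self, record: Record) -> bool:
        return (self.x_min <= record.x <= self.x_max
                and self.y_min <= record.y <= self.y_max)


def focus(records, selection):
    """Return only the records that fall inside the user's selection."""
    return [r for r in records if selection.contains(r)]


def contrast(records, selection):
    """Mean of each attribute inside the selection minus its mean outside.

    A stand-in for the mining step: it reports how the selected records
    differ from the rest of the dataset.
    """
    inside = focus(records, selection)
    outside = [r for r in records if not selection.contains(r)]
    if not inside or not outside:
        raise ValueError("selection must split the data into two non-empty parts")
    names = inside[0].attributes.keys()
    return {
        name: mean(r.attributes[name] for r in inside)
              - mean(r.attributes[name] for r in outside)
        for name in names
    }
```

Changing the `Selection` and calling `contrast` again corresponds to the user shifting their notion of what is interesting; the mining step itself is left untouched.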

Once we can ask the question appropriately, we then need to be able to understand the responses that the system gives us. The data mining system produces some information, be it a classification of the data, association rules or other such output. Whilst complex statistical measures of the dataset may be accurate, if they are not comprehensible to the users they do not offer insight, only description. A data mining system should therefore present comprehensible results in an accessible manner.
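As a small illustration of the gap between an accurate result and a comprehensible one, the sketch below turns an association rule and its statistics into a plain sentence. The `AssociationRule` type and the `describe` function are hypothetical names introduced only for this example, and the wording of the sentence is an assumption, not the presentation used by the system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssociationRule:
    """An 'if antecedent then consequent' rule with its usual statistics."""
    antecedent: tuple   # conditions, already written as readable phrases
    consequent: tuple   # outcomes, already written as readable phrases
    support: float      # fraction of all records matching both sides
    confidence: float   # fraction of antecedent matches that also match the consequent


def describe(rule: AssociationRule) -> str:
    """Render the rule as a sentence rather than a row of numbers."""
    condition = " and ".join(rule.antecedent)
    outcome = " and ".join(rule.consequent)
    return (f"When {condition}, then {outcome} in {rule.confidence:.0%} of cases "
            f"(the pattern covers {rule.support:.0%} of all records).")
```

The statistics are unchanged; only their presentation differs, which is the distinction between description and insight drawn above.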

We have developed a data mining system that satisfies these criteria, and are expanding and evaluating its capabilities.