Finding the Right Stuff

Russell Beale

School of Computer Science
University of Birmingham
Edgbaston, Birmingham B15 2TT, UK
+44 121 414 3729

R.Beale@cs.bham.ac.uk

Extended Abstract

This position paper presents a perspective on the problems of finding information on the internet. The context of the interaction and the nature of the internet make the search activity very different in the electronic space compared to the physical world. We identify some of the behaviours commonly encountered in search situations, and identify the issues that arise. We also present an overview of a system developed to help resolve those problems.

 

Analysing search behaviour

"Looking for something" takes on a whole new set of meanings when applied to the electronic world. To appreciate why looking for things is more complex than we might initially imagine, we need to understand more about the context and nature of the interaction. The complexity has to be more than simply the vastness of the resource - libraries can have huge quantities of information in them too. Some blame for the difficulty must lie with the unordered, unstructured nature of the internet - and yet with items often linked to related items, there should also be some benefit here, and in any case the massive computing resources most people can call upon ought to have some ability to impose some sort of order onto this chaos. There is more to it than that - part of the problem is in how people perceive, use and react to the internet itself.

 

In a real library

In the non-electronic world, searching for information usually takes place in one of a few specific locations, most notably in a library (personal, community or commercial). Situated within that specific context, the task at hand is defined and restricted by the resources available. Organised index cards provide a global view of the catalogued and ordered information, allowing a search to have a defined starting point. If the aim is to find a specific atomic item - a book - then the index will allow the user to head straight there. If the aim is to find information on a specific subject, which may be contained within one book or more likely spread throughout many, then the search is developed through refinement and feedback until the correct stack and shelves and then books are found. The physical structure of the library also offers subtle supporting assistance to users. It supports physical memory - where things are and their spatial relationships - and visual memory - what a book looks like, how big or how old it is.

The physical reality of the library - its architecture, layout and quiet air - lends itself to focussing on the goal of retrieving knowledge and information.

But no-one would think of going into a library to talk to their friends, or to watch a movie, or to listen to the radio. Precisely because the library is task-specific, the thought of these things is laughable.

 

Online

However, in the electronic world, we use the internet for a whole host of things. We use it as our information resource, which we wish to search and find things in. We also use it as an entertainment resource, and a communication tool. It therefore doesn't "feel" much like a library. Because the internet has to support this multiplicity of roles, it is unlikely to be particularly good at one of them if that is at the expense of the others. So from a practical perspective, the fact that it impacts our lives in many more areas than a library does makes it harder for it to fulfil information retrieval tasks as effectively. This generality of usage also means that, from a psychological perspective, the user is not working in such a situated, focussed context, and so their searching behaviour is often mixed in with diversions into other forms of behaviour. This easy distraction also makes the effective finding of information difficult.

Coupled with this diversity of content is the new functionality of the internet: cross-linked information, multimedia, smaller atomic units, keyword and meaning-based search abilities, and so on. Because of these, users are modifying their search behaviours, and our understanding, and hence the tools we develop, need to keep up with these social changes.

 

I'm sure I've seen…

Following this line of argument, we can identify a further behavioural aspect of user search. How often have you, without real direction, stumbled across a whole load of material that you scan and then move on from, only to find that a few weeks later you're reminded of it and want to look at it in detail? Rediscovering this material can be very difficult, since it's not entirely clear what it is you are searching for in the first place. Often the only way to do it is to return to the original activity and try to recreate the unstructured approach that led you to it the first time. It is the very nature of the internet that potentially unrelated information is encountered when searching for something entirely different; it is the very nature of people that we are often distracted, entertained, amused, shocked or whatever by this, spend some time with it, then return to our previous activity. Simply finding things is not the issue; it's finding things that are directly relevant, without offering too much in the way of distractions.

 

Intelligent search

This is an example of the wider issue of returning things that are of relevance to the user, whilst realising that the user works in many different situations and the notion of relevance varies from context to context. One approach to this problem is to employ intelligent user modelling within the search systems. The Mitsikeru system we are developing addresses these issues in two ways. Firstly, browsing is done via a proxy server that acts as a cache for pages, and we bias any search towards pages that have been recently browsed. This means that pages that are an equally good keyword match but have been recently looked at are ranked much higher in the returned results. Secondly, we augment this approach by building up a task-sensitive user model to determine the relevance of particular material, loosely based on latent semantic indexing and Bayesian statistics. This is used to cross-match pages and provide metrics on their similarity. This information is then clustered over time, building up areas of interest that the user has. These clusters are often closely related to tasks, but labelling them as such is more problematic since they also contain the elements of distraction. These clusters provide us with information about the tasks in themselves; together they form a profile of the user's interests.
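
To make the two mechanisms concrete, the following is a minimal sketch rather than Mitsikeru's actual implementation. It assumes a simple term-overlap keyword score, an exponential recency boost with a one-week half-life, and a greedy clustering over bag-of-words cosine similarity, which stands in here for the latent semantic indexing and Bayesian model described above. All function names, parameters and thresholds are hypothetical.

```python
import math
from collections import Counter


def keyword_score(query, text):
    """Fraction of query terms that occur in the page text (assumed scoring)."""
    terms = query.lower().split()
    words = set(text.lower().split())
    return sum(t in words for t in terms) / len(terms) if terms else 0.0


def recency_boost(age_seconds, half_life=7 * 24 * 3600.0):
    """Exponential decay: 1.0 for a page viewed just now, 0.5 after one half-life."""
    return 0.5 ** (age_seconds / half_life)


def rank(query, pages, now, weight=1.0):
    """Rank pages by keyword match, biased towards recently browsed pages.

    pages: list of dicts with 'url', 'text' and 'last_viewed'
    (a timestamp in seconds, or None if the page was never browsed).
    """
    scored = []
    for page in pages:
        base = keyword_score(query, page["text"])
        viewed = page["last_viewed"]
        boost = 0.0 if viewed is None else recency_boost(now - viewed)
        scored.append((base * (1.0 + weight * boost), page["url"]))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [url for score, url in scored if score > 0]


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(texts, threshold=0.3):
    """Greedy clustering: a page joins the first cluster whose seed page it resembles."""
    vectors = [Counter(t.lower().split()) for t in texts]
    clusters = []  # each cluster is a list of indices into texts
    for i, vector in enumerate(vectors):
        for members in clusters:
            if cosine(vector, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters
```

With two pages that match a query equally well, the one browsed an hour ago is returned ahead of the one never browsed, and pages on the same topic fall into the same cluster. A multiplicative boost is used so that recency can only promote pages that already match the query; it never surfaces a page with no keyword match.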

 

Into the future

Mitsikeru is interesting in that it is also forward-looking. It looks at current pages and pre-fetches and analyses subsequent ones. The analysis of these future pages allows us to determine which are relevant to the task in hand (i.e. which fall into the current cluster), which are relevant to other tasks the user may have (i.e. those that fall into recent clusters), and so on. We then use this information to annotate the current page to provide guidance about which links are directly related to the current subject and which are relevant to the task in hand. This is achieved in a non-intrusive manner, through colour-coding the links, which provides subtle but immediate feedback.
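
The link annotation step can be sketched as follows. This is an illustration under stated assumptions, not the system's code: the pre-fetched page text is passed in directly rather than retrieved over the network, similarity is plain bag-of-words cosine, and the label names, threshold and colour choices are all hypothetical.

```python
import math
from collections import Counter


def vectorise(text):
    """Bag-of-words term counts for a page (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def matches(text, cluster_texts, threshold):
    """True if the page resembles any page already in the cluster."""
    vector = vectorise(text)
    return any(cosine(vector, vectorise(m)) >= threshold for m in cluster_texts)


def classify_links(link_texts, current_cluster, recent_clusters, threshold=0.3):
    """Label each pre-fetched link as 'current', 'recent' or 'other'.

    link_texts: {url: text of the pre-fetched page}
    current_cluster: list of page texts for the task in hand
    recent_clusters: list of such lists, for the user's other recent tasks
    """
    labels = {}
    for url, text in link_texts.items():
        if matches(text, current_cluster, threshold):
            labels[url] = "current"
        elif any(matches(text, c, threshold) for c in recent_clusters):
            labels[url] = "recent"
        else:
            labels[url] = "other"
    return labels


# Hypothetical colour scheme; None means the link is left unstyled.
COLOURS = {"current": "green", "recent": "orange", "other": None}


def annotate(labels):
    """Map each link's label to the colour used to render it."""
    return {url: COLOURS[label] for url, label in labels.items()}
```

Checking the current cluster before the recent ones means a link relevant to both is coloured for the task in hand, which keeps the feedback focussed on what the user is doing now.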

 

Conclusions

What can we learn from all of this? I think that one of the big insights is that searching, per se, is not always the real issue. In fact, some would argue that searching on the internet is hardly a problem at all. There are a host of search engines, each with different characteristics, strengths and weaknesses, and for many people the internet is now the first choice of resource when it comes to looking something up. To use it effectively you need to understand something about the search engine, and to be aware that the veracity of information found cannot always be determined, yet it is still a major component of people's questing lives. Indeed, a new phrase has found its way into many commercial organisations: in response to a generally-emailed question, "DAFGS" is the reply (pronounced "daff-gus"). It stands for "Do a Flippin' Google Search" (or similar, depending on the company). The internet is the resource of choice for many, who access what they need through a combination of the available search tools, iterative refinement of their search terms, and an intuitive understanding of their own behaviour.

Tools such as Mitsikeru show us that by understanding some of the issues more clearly, it is not complex to provide more effective support for user behaviour.

It is the enhanced functionality of the internet, coupled with its diversity of context, and its freedom from adhering to one classification system or structure that makes things more complex. What we need to understand more completely is the way that people behave when searching, and how we can support that behaviour to provide more effective information retrieval mechanisms.