Dr Peter Hancox

Senior Lecturer in Computer Science


Peter Hancox and Nikolaos Polatidis (2013). An evaluation of keyword, string similarity and very shallow syntactic matching for a university admissions processing infobot. Computer Science and Information Systems, vol. 10(4), pp. 1703-1726. DOI:10.2298/CSIS121202065H

"Infobots" are small-scale natural language question answering systems drawing inspiration from ELIZA-type systems. Their key distinguishing feature is the extraction of meaning from users' queries without the use of syntactic or semantic representations. Three approaches to identifying the users' intended meanings were investigated: keyword-based systems, Jaro-based string similarity algorithms and matching based on very shallow syntactic analysis. These were measured against a corpus of queries contributed by users of a WWW-hosted infobot for responding to questions about applications to MSc courses. The most effective system was Jaro with stemmed input (78.57%). It also was able to process ungrammatical input and offer scalability.

Peter Hancox (2012) Lodvick Verelst (1668-1704), "limner". Transactions of Worcestershire Archaeological Society, 3rd series, vol. 23, pp. 223-226.

Peter Hancox and Nikolaos Polatidis (2012) Query Matching Evaluation in an Infobot for University Admissions Processing. In: SLATE 2012: 1st Symposium on Languages, Applications and Technologies, Universidade do Minho, Braga, 21-22 June 2012. Schloss Dagstuhl-Leibniz-Zentrum für Informatik. ISBN: 978-3-939897-40-8; ISSN: 2190-6807. pp. 149-161. DOI:10.4230/OASIcs.SLATE.2012.149

"Infobots" are small-scale natural language question answering systems drawing inspiration from ELIZA-type systems. Their key distinguishing feature is the extraction of meaning from users' queries without the use of syntactic or semantic representations. Two approaches to identifying the users' intended meanings were investigated: keyword-based systems and Jaro-based string similarity algorithms. These were measured against a corpus of queries contributed by users of a WWW-hosted infobot for responding to questions about applications to MSc courses. The most effective system was Jaro with stemmed input (78.57%). It also was able to process ungrammatical input and offer scalability.

Peter Hancox (2005) Lexical Functional Grammar constraints and concurrent constraint programming. In: AI and Cognitive Science '05: proceedings of the 16th Annual Conference, University of Ulster, Coleraine, 7-9 September 2005. Coleraine: University of Ulster. ISBN 1-85923-197-7. pp. 309-318.

Lexical Functional Grammar allows grammar writers to use linguistic constraints to specify attributes and their values without using unification. The satisfaction algorithm for these constraints falls within the generate-and-test paradigm and has the disadvantage of not being able to detect, at minimal cost, violations of the constraints as early as native speakers do. Concurrent constraint languages, of which CHR is an example, allow searches to be incrementally constrained, with goals delayed until they can be properly discharged. It is shown that linguistic constraints can be implemented in CHR to give early detection of satisfaction or violation of constraints, and also to allow some further detection of redundancy and inconsistency.

Elliot Smith and Peter Hancox (2001) Representation, Coherence and Inference. Artificial Intelligence Review 15(4), pp. 295-323.

Approaches to story comprehension within several fields (computational linguistics, cognitive psychology, and artificial intelligence) are compared. Central to this comparison is an overview of much recent research in cognitive psychology, which is often not incorporated into simulations of comprehension (particularly in artificial intelligence). The theoretical core of this experimental work is the establishment of coherence via inference-making.

The definitions of coherence and inference-making in this paper incorporate some of this work in cognitive psychology. Three major research methodologies are examined in the light of these definitions: scripts, spreading activation, and abduction.

This analysis highlights several deficiencies in current models of comprehension. One deficiency of concern is the 'one-track' behaviour of current systems, which pursue a monostratal representation of each story. In contrast, this paper emphasises a view of adaptive comprehension which produces a 'variable-depth' representation. A representation is pursued to the extent specified by the comprehender's goals; these goals determine the amount of coherence sought by the system, and hence the 'depth' of its representation. Coherence is generated incrementally via inferences which explain the co-occurrence of story elements.

Shun Ha Sylvia Wong and Peter Hancox (1999) What is the lexical form of 'bei'? In: Jhing-Fa Wang and Chung-Hsien Wu eds. Language, Information and Computation: proceedings of the 13th Pacific Asia Conference, Taipei, 10-11 February 1999. Taipei: National Cheng Kung University, 1999. pp 169-176.

The lexical representation of the Chinese word 'bei' has been an issue of on-going debate. The lexical form suggested by Her seemed to provide a complete representation of the different syntactic behaviours of 'bei' within a Lexical-Functional Grammar (LFG) account. However, when applied in conjunction with the argument structure (a-structure) and the lexical mapping theory in LFG, this representation conflicts with the lexical mapping theory. This paper examines this problem and proposes a solution for the lexical representation of 'bei'.

Shun Ha Sylvia Wong and Peter Hancox (1998) An Investigation into the use of Argument Structure and Lexical Mapping Theory for Machine Translation. In: Jin Guo, Kim Teng Lua and Jie Xu eds. Language, Information and Computation: proceedings of the 12th Pacific Asia Conference, Singapore, 18-20 February 1998. Singapore: Chinese and Oriental Languages Processing Society, 1998. pp 334-339.

Lexical Functional Grammar (LFG) has been quite widely used as the linguistic backbone for recent Machine Translation (MT) systems. The relative order-free functional structure (f-structure) in LFG is believed to provide a suitable medium for performing source-to-target language transfer in a transfer-based MT system. However, the linguistic information captured by traditional f-structure is syntax-based, which makes it relatively language-dependent and thus inadequate to handle the mapping between different languages. Problems are found in the lexical selection and in the transfer from some English passive sentences to Chinese. The recent development of the relatively language-independent argument structure (a-structure) and the lexical mapping theory in the LFG formalism seems to provide a solution to these problems. This paper shows how this can be done and evaluates the effectiveness of the use of a-structures for MT.

Helen Gaylard and Peter Hancox (1995) A top-down approach to lexical acquisition and segmentation. Manchester: Manchester Metropolitan University, Centre for Policy Modelling. CPM report no: 95-18.

A major objection to top-down accounts of lexical recognition has been that they are incompatible with an account of acquisition, it being argued that bottom-up segmentation must precede lexical acquisition. We counter this objection by presenting a top-down account of lexical acquisition. This is made possible by the adoption of a flexible criterion as to what may constitute a lexical item during acquisition, this being justified by the extensive evidence of children's under-segmentation. Advantages of the top-down account offered over the bottom-up alternatives are that it presents a unified account of the acquisition of a lexicon and segmentation abilities, and is wholly driven by the requirements of comprehension. The approach described has been incorporated into an integrated model of acquisition processes, the incremental learning of which captures the gradual nature of child language development.

Helen Seville and Peter Hancox (1994) Phrase structure in a computational model of child language acquisition. In: AI and Cognitive Science '94: proceedings of the 7th Annual Conference, Trinity College, Dublin, 8-9 September 1994. Dublin: Dublin University Press. pp. 193-206.

The problem of the acquisition of morpho-syntactic rules, as addressed by a number of existing computational models, is introduced. A distinction is made between 'innatist' models which presuppose the importance of innate linguistic knowledge (specifically, syntactic categories and X-Bar Theory), and 'empiricist' models, which reject such assumptions. It is argued that 'empiricist' models better account for such features of child language acquisition as gradual development. However, existing 'empiricist' models are inadequate insofar as the grammars acquired are finite-state, as opposed to phrase-structure, grammars. The problem of phrase structure acquisition in an 'empiricist' model is addressed. The problem is characterized as that of developing a model of acquisition in which the kind of information embodied in X-Bar Theory can be directly derived from the semantic inputs to learning. The solution offered closely identifies phrase structure acquisition with lexical acquisition and segmentation. A computational model in which these processes have been implemented is described. This acquires a finite-state grammar and later a phrase-structure grammar, thereby providing a gradual and continuous model of child language development. The account given of lexical acquisition and segmentation is novel, in that lexical acquisition precedes the acquisition of segmentation.

Helen Seville and Peter Hancox (1994) The role of lexical acquisition and segmentation in the acquisition of phrase structure. In: International Workshop on Cognitive Models of Language Acquisition, Tilburg University, 21-23 April 1994. Tilburg: University of Tilburg, 1994. pp 63-65.

Peter Hancox (1994) The uniform treatment of constraints, coherency and completeness in a Lexical Functional Grammar compiler. In: AI and Cognitive Science '94: proceedings of the 7th Annual Conference, Trinity College, Dublin, 8-9 September 1994. Dublin: Dublin University Press. pp. 129-142.

Lexical Functional Grammar (LFG) describes linguistic structure in terms of a Functional structure (F-structure) that may be computed by the evaluation of equations written in the grammar and lexicon. Constraints can be placed on F-structure features to restrict the instantiation or non-instantiation of features and ranges of possible values assigned to features. F-structures must have qualities of coherency and completeness so as to be well-formed. An LFG compiler is briefly described and it is shown how constraints may be compiled so as to improve post-parse checking speed. Coherency and completeness are shown to be re-interpretable as constraints checks in their own right (although this is not explicit in LFG) and thus they may be treated in a way uniform with LFG's own constraints.

Jane Littlehales and Peter Hancox (1992) Integrated interfaces to publicly available databases. In: Cooper, Richard, ed. Interfaces to database systems (IDS92): proceedings of the first International Workshop on Interfaces to Database Systems, Glasgow, 1-3 July 1992. London: Springer, 1993. (Workshops in computing). ISBN: 3-540-19802-4. pp 41-55.

This paper looks at the drawbacks of existing user interface mechanisms for accessing databases, namely query languages, natural language and Query by Example, in relation to publicly available database systems. An integrated, multimodal interface is proposed, in which natural language and pointing gestures at a graphical display can be combined into a single sentence using deictic words such as 'this' and 'here', or can be used individually. This is seen as closely imitating human dialogue. The application chosen is a geographical database, taking advantage of a familiar atlas metaphor. The progress to date is described with results from the first stage of integration. A number of existing combined mode systems are analysed, particularly with respect to their degree of integration, gestural ambiguity and other error handling facilities.

Neil K. Simpkins and Peter Hancox (1990) Chart parsing in Prolog. New Generation Computing 8(2), pp. 113-138.

Several differing approaches to parsing using Prolog are discussed and their characteristics outlined, in particular Definite Clause Grammar (DCG), the Bottom-Up Parser (BUP) and the Active Chart Parser. Attention is paid to the conflict that arises between the simplicity and efficiency of the parsing algorithm when using a grammar specified as a linguistic, rather than computationally efficient, description of a sublanguage. A simple and efficient parsing algorithm called "Word Incorporation" is described. Its efficient implementation in Prolog and extensions for handling literals, the Kleene star operator and gaps in grammar rules are described using experience gained with the unification-based formalism, Lexical Functional Grammar (LFG).
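As general background to the chart-based approaches compared above (and emphatically not the paper's Word Incorporation algorithm, which is described in the article itself), a minimal bottom-up chart recogniser in the CYK style can be sketched in a few lines of Python for a toy grammar in Chomsky normal form; the grammar and lexicon here are invented for illustration:

```python
from itertools import product

# Toy grammar in Chomsky normal form: each rule rewrites a pair of categories.
RULES = {
    ("NP", "VP"): "S",
    ("Det", "N"): "NP",
    ("V", "NP"): "VP",
}
LEXICON = {"the": "Det", "a": "Det", "dog": "N", "cat": "N", "saw": "V"}

def recognise(words):
    """CYK recogniser: chart[i][j] holds categories spanning words[i..j]."""
    n = len(words)
    chart = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):
        chart[i][i].add(LEXICON[w])
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):          # split point between two sub-spans
                for b, c in product(chart[i][k], chart[k + 1][j]):
                    if (b, c) in RULES:
                        chart[i][j].add(RULES[(b, c)])
    return "S" in chart[0][n - 1]
```

The chart's role, storing every constituent found over every span so that no sub-analysis is recomputed, is the feature shared by all the chart-parsing variants the article discusses.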

Richard A. Gatward and Peter Hancox (1990) Functional Grammar as a unification grammar: is it a worthwhile investigation? In: Hannay, M. and Vester, E. eds. Working with Functional Grammar: descriptive and computational applications. Amsterdam: Foris, 1990. ISBN: 90-6765488-4. pp. 117-127.

Peter J. Hancox; William J. Mills and Bruce J. Reid (1990) Keyguide to artificial intelligence and expert systems. London: Mansell, 1990. ISBN: 0-7201-2007-1.

Neil K. Simpkins and Peter Hancox (1989) An efficient, primarily bottom-up parser for unification grammars. In: International Workshop on Parsing Technologies, Pittsburgh, PA, 28-31 August 1989. Pittsburgh, PA: Carnegie-Mellon University, 1989. pp 399-400.

Peter Hancox (1989) A recursive algorithm for generating SLIC index entries. Program: electronic library and information systems 23(3), pp. 311-317.

Combination indexing is a method of ordering a set of terms or keywords on the syntagmatic axis rather than the paradigmatic axis. All terms applied to the document by the indexer are listed in all possible combinations to produce the index entries, and therefore the user should be able to search the index with no more knowledge of its structure beyond alphabetization. The disadvantage of combination indexing is the large number of entries produced. For n keywords, n! entries are produced. With a very few terms in each string this is not a serious problem, but when several terms are included in a string the number of entries soon reaches excessive proportions. Five terms would produce one hundred and twenty entries; seven terms, five thousand and forty entries.
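The factorial growth described above is straightforward to reproduce. A hypothetical sketch in Python (the sample terms are invented for illustration; this is not the paper's SLIC generation algorithm):

```python
from itertools import permutations
from math import factorial

def combination_index_entries(terms):
    """Full combination indexing: every ordering of the terms is one entry."""
    return [" / ".join(p) for p in permutations(terms)]

# Five terms yield 5! = 120 index entries, as the abstract notes.
terms = ["parsing", "Prolog", "grammars", "unification", "charts"]
entries = combination_index_entries(terms)
assert len(entries) == factorial(5)
```

The paper's recursive SLIC algorithm addresses exactly this explosion by generating a selective subset of the combinations rather than all n! orderings.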

Jacqueline Archibald and Peter Hancox (1988) A survey of deterministic parsers. In: Proceedings of the IASTED International Symposium, Grindelwald, Switzerland, 16-18 February 1988. Anaheim: Acta Press. ISBN: 0-88986-097-1. pp. 143-146.

Mitchell Marcus' motivation for developing the Deterministic Parser was to model an intuition of how humans process language. His PhD thesis (1977) describes PARSIFAL, an implementation of his ideas, especially the determinism hypothesis.

Other designers' implementations, which we examine, cover lexical ambiguity and computational aspects of the human sentence processing mechanism. Current work is directed toward the use of the Deterministic Parser in the limited text Machine Translation environment.

Peter Hancox and Frederick Smith (1985) A case system processor for the PRECIS indexing language. In: Informatics 8, advances in intelligent retrieval: proceedings of a conference..., Oxford, 16-17 April 1985. London: Aslib. ISBN: 0-85142-195-4. pp. 120-147.

The PRECIS indexing system has been successfully applied to languages other than English, and the British Library PRECIS Translingual Project devised an interlingual system (not implemented) to translate between English, French and German. An analysis of the system was used in the design of a transfer indirect machine translation system presented here to translate from English to French. To resolve the lack of one-to-one correspondence between prepositions, a semantic analysis of the case system or grammar type was imposed on a syntactic analysis of the string. Strings drawn from BLAISE were used to test the performance of the system. The results are presented and the weaknesses analysed.