Finite State Automata are the most widely known techniques for implementing morphological analysis, but not the only algorithm. Another seen in the literature is the stripping system.
The idea of the stripping system is simple. It is based on two kinds of knowledge.
/* ************************************************ */
/* */
/* Knowledge */
/* */
/* ************************************************ */
/* ************************************************ */
/* */
/* dictionary/3 */
/* Arg 1: Morpheme */
/* Arg 2: Syntactic category */
/* Arg 3: Grammatical features */
/* Summary: Base form dictionary for morpho- */
/* logical stripping algorithm. */
/* Author: P J Hancox */
/* Date: 9 February 1995 */
/* */
/* ************************************************ */
dictionary(navigate, verb,
features(trans('transitive+'), form('infinitive+'))).
/* ************************************************ */
/* */
/* rule/3 */
/* Arg 1: Inflected ending */
/* Arg 2: No. of letters to be stripped */
/* from end of word */
/* Arg 3: Ending of base form */
/* Summary: Morphological stripping rule. */
/* Author: P J Hancox */
/* Date: 9 February 1995 */
/* */
/* ************************************************ */
rule(form1(suffix(gable), adjective,
features(trans('transitive-'), form(_))),
remove(3),
form2(suffix(te), verb,
features(trans('transitive+'),
form('infinitive+')))).
The dictionary contains knowledge either about a stem or (as here) a word chosen to be the reference form (onto which inflected forms are matched). This dictionary contains information about the syntactic category and the grammatical features of the word.
The rule contains knowledge about the morphology of the language. This example program has only one inflection rule. Its first argument says that the suffix "gable" indicates an adjective with the grammatical feature "transitive-". The second argument says that the stripping algorithm should remove the last three letters of the word to be analysed. The third argument says that ending "te" should be added to the stripped form to give a verb that is "transitive+" and "infinitive+".
As an example, the word "navigable" could be stripped to "naviga" and then the ending "te" added to form "navigate".
The algorithm for a stripping system is more complex than that for a Finite State Automaton. At the top level it consists of finding a rule to use, matching the suffix given in the rule with the suffix of the input word, and then substituting the new suffix to give a base word to look up in the dictionary.
/* ************************************************ */ /* */ /* Algorithm */ /* */ /* ************************************************ */ /* ************************************************ */ /* */ /* stripping_alg/4 */ /* Arg 1: Word to be analysed */ /* Arg 2: Base word */ /* Arg 3: Category of Word (Arg 1) */ /* Arg 4: Features of Word (Arg 1) */ /* Summary: Morphological stripping algorithm. */ /* Author: P J Hancox */ /* Date: 9 February 1995 */ /* */ /* ************************************************ */ stripping_alg(Word, Base_Word, Category, Features) :- % retrieve a rule rule(form1(suffix(Suffix), Category, Features), remove(Count), form2(suffix(Suffix1), Category1, Features1)), % match Suffix from rule/3 to Word to obtain Base_Word reform(Count, Word, Suffix, Suffix1, Base_Word), % retrieve a dictionary entry for Base_Word dictionary(Base_Word, Category1, Features1).
The next section is more complicated. The reason is simple. In Prolog, it is difficult to perform simple string matching operations. To do them, we have to turn atoms into list of ASCII characters using the in-built procedure name/2. (Try the example name(atom, List). to see what this procedure does.) Matching the ends of lists is difficult, but matching the beginnings of lists is easy. In the suffix analyser we have to match the ends of words, so we use a neat trick: reverse the lists of characters so that we've changed it into a matching prefixes problem. We do have to remember to reverse our results again to get back to the original order - otherwise we'd produce nonsense.
In this procedure, reverse_ascii/2 simply changes a word into a reversed list of ASCII characters (very messy). match/2 matches the input word with the suffix; remove/3 removes the unwanted characters (ie the three characters "ble" in the example of "navigable"), and app/3 adds on the new suffix (to make "navigate").
This sounds complicated, but you don't have to have a detailed understanding of the program, line-by-line: just sufficient to understand what is happening.
/* ************************************************ */ /* */ /* reform/5 */ /* Arg 1: Counter (no. of letters to be */ /* stripped from end of Word */ /* Arg 2: Word to be analysed */ /* Arg 3: Suffix of Word (Arg 2) */ /* Arg 4: Suffix of Base_Word (Arg 5) */ /* Arg 5: Base Word to be looked up in dict. */ /* Summary: Removes Suffix from Word, adds suffix */ /* from rule and returns the Base_Word */ /* to be looked-up in dictionary. */ /* Author: P J Hancox */ /* Date: 9 February 1995 */ /* */ /* ************************************************ */ reform(Count, Word, Suffix1, Suffix2, Base_Word) :- reverse_ascii(Word, ASCII_Word_Rev), reverse_ascii(Suffix1, ASCII_Suffix1_Rev), % match Suffix1 with end of Word match(ASCII_Suffix1_Rev, ASCII_Word_Rev), reverse_ascii(Suffix2, ASCII_Suffix2_Rev), % remove 'Count' no of letters from end of Word remove(Count, ASCII_Word_Rev, Stem_Rev), % add Suffix2 to Stem and make into the Base_Word app(ASCII_Suffix2_Rev, Stem_Rev, ASCII_Base_Word_Rev), % turn word around reverse(ASCII_Base_Word_Rev, ASCII_Base_Word), % change it back into an atom name(Base_Word, ASCII_Base_Word). /* ************************************************ */ /* */ /* reverse_ascii/2 */ /* Arg 1: Atom */ /* Arg 2: List of Integers */ /* Summary: Arg 2 is a list of integers which */ /* correspond to the ASCII characters */ /* of Atom, but in reverse order. */ /* Author: P J Hancox */ /* Date: 9 February 1995 */ /* */ /* ************************************************ */ reverse_ascii(Atom, ASCII_Rev) :- name(Atom, ASCII_List), reverse(ASCII_List, ASCII_Rev). /* ************************************************ */ /* */ /* remove/3 */ /* Arg 1: Counter (integer) */ /* Arg 2: A list */ /* Arg 3: A list */ /* Summary: Removes a given number of elements */ /* from end of list. */ /* Author: P J Hancox */ /* Date: 9 February 1995 */ /* */ /* ************************************************ */ % 1 - terminating condition remove(0, List, List). % 2 - recursive condition remove(Count, [Head|Tail], List) :- Count > 0, % don't count below 1 !!! Count1 is Count - 1, remove(Count1, Tail, List). /* ************************************************ */ /* */ /* match/2 */ /* Arg 1: A list */ /* Arg 2: A list */ /* Summary: Matches elements of first list with */ /* elements of second list. */ /* Author: P J Hancox */ /* Date: 9 February 1995 */ /* */ /* ************************************************ */ % 1 - terminating condition match([], Stem). % 2 - recursive condition match([Letter|Letters], [Letter|Word]) :- match(Letters, Word).
The remainder of the program consists of utilities that are defined in standard textbooks. There is no claim made for them being the most efficient way of programming these procedures: they work!.
/* ************************************************ */ /* */ /* reverse/2 */ /* Arg 1: A list */ /* Arg 2: A list */ /* Summary: Arg 1 is Arg 2 in reverse order. */ /* Author: P J Hancox */ /* Date: 9 February 1995 */ /* */ /* ************************************************ */ % 1 - terminating condition reverse([], []). % 2 - recursive condition reverse([Head|Tail1], Reversed) :- reverse(Tail1, Tail2), app(Tail2, [Head], Reversed). /* ************************************************ */ /* */ /* A standard definition of append/3 */ /* */ /* ************************************************ */ % 1 - terminating condition app([], L, L). % 2 - recursive condition app([H|L1], L2, [H|L3]) :- append(L1, L2, L3). /* ************************************************ */ /* */ /* End of program */ /* */ /* ************************************************ */
Finite State techniques have the edge over Morphological Stripping for several reasons. The networks allow the expression of control: that is the network specifies the order in which tests will be made. If the network is deterministic, there will never be any point at which there could be one than one possible rule to follow. This contrasts with Stripping Systems which can have conflicts of rules to be applied. Moreover, because the rules are declarative, it is difficult to impose a preferred order of application of rules without adding a complicated extra level of non-declarative control.
Finite State Automata require extensive effort in constructing networks, but this is often done using software to automatically create the network. Stripping systems require a dictionary on which to base its rules, but if such a dictionary is available in machine-readable form, this makes the method rather more attractive.