A transducer is a piece of software that maps one stream of symbols on to another stream of symbols. We say that the program transduces one stream of symbols into another.
This section uses a spelling corrector as an example of a Finite State Transducer. The software used is taken from the Poplog library of NLP software as described by Gazdar and Mellish (1989).
With the FSTN given in previous sections, the software recognised the input. So the more elementary software simply stated that a word such as diva was a word defined by the FSTN, and in the more advanced system, a FSTN was organised to give syntactic category and grammatical feature information. A Finite State Transducer (FST) can be made to output symbols while the network is being traversed.
A conventional view of a FST is of a machine that has two tapes which pass under a reading head.
In its basic mode, if one tape is advanced by one symbol, then the other tape is advanced by the same amount. This statement doesn't say which tape is the "input tape" and which is the "output tape". An FST can be used in two ways. First, to check that one tape corresponds with another, for instance to check that two files are the identical. Second, to read a symbol from one tape and to output another symbol onto the other tape. This is what the example given above does. The left-hand tape is a string of letters: the right-hand tape outputs an ASCII code corresponding to each letter. Or you can view it the other way round and think of the input as being a stream of ASCII codes, each one of which is transduced into a letter.
In this example, we will be attempting to check the word train. We start by encoding the basic network for the valid sequence of letters:
arc( 1, 2, t). arc( 2, 3, r). arc( 3, 4, a). arc( 4, 5, i). arc( 5, 6, n). word(a, [a, a]). word(i, [i, i]). word(n, [n, n]). word(r, [r, r]). word(t, [t, t]).
The network has a more complicated form than used for the FSTN examples. The arc/3 facts give the start and end states and a key to another fact which acts as a dictionary. The word/2 facts acts as a dictionary. The first argument is the key and the second is a pair of symbols, for instance, [a, a]. The left-hand symbol represents the left-hand tape: the right-hand symbol represents the right-hand tape. So when the FST traverses from state 1 to state 2, it might read r on the left-hand tape and output r on the right-hand tape.
Next we need to add arc/3 and word/2 facts to represent some of the possible mis-spellings. For this example, we'll assume that mis-spellings consist only of letters being mistyped in pairs, so we would need to correct rtain to train. The extra arcs and "words" are:
arc( 1, 9, r_t). arc( 2, 10, a_r). arc( 3, 8, i_a). arc( 4, 11, n_i). arc( 8, 5, a_i). arc( 9, 3, t_r). arc(10, 4, r_a). arc(11, 6, i_n). word(a_i, [a, i]). word(a_r, [a, r]). word(i_a, [i, a]). word(i_n, [i, n]). word(n_i, [n, i]). word(r_a, [r, a]). word(r_t, [r, t]). word(t_r, [t, r]).
As an example, the arc from 1 to 9 can be traversed if the left-hand tape has r and the right-hand tape has t . This is followed by a transition from 9 to 3 which has requires that the left-hand tape has t and the right-hand tape has r . So the left-hand tape represents the mis-spelling and the right-hand tape gives the correct spelling.
We start with some constants. The special character '#' represents a jump label, which we don't use in this application.
initial(1).
final(6).
special('#').
The remainder of the code is the controller proper. transduce/3 moves from a start node to the next node with the two "tapes" represented by lists.
/* ************************************************ */
/* */
/* transduce/3 */
/* Arg 1: Start state */
/* Arg 2: Left-hand "tape" (list) */
/* Arg 3: Right-hand "tape" (list) */
/* Summary: defines a FST. */
/* Author: Taken from Gazdar, G and Mellish, C. */
/* (1989). Natural language processing in */
/* Prolog. Addison-Welsey. pp. 440-441. */
/* */
/* ************************************************ */
% 1 - terminating condition
transduce(Node, [], []) :-
final(Node).
% 2 - recursive condition
transduce(Node_1, String1, String2) :-
% there is an arc/3 fact
arc(Node_1, Node_2, Label),
% traverse the arc
traverse2(Label, String1, NewString1, String2, NewString2),
% now see if you can recursively get to the end of the net
transduce(Node_2, NewString1, NewString2).
/* ************************************************ */
/* */
/* traverse2/5 */
/* Arg 1: Pair of symbols on the tape */
/* Arg 2: Left-hand "tape" (input) */
/* Arg 3: Left-hand "tape" (output) */
/* Arg 2: Right-hand "tape" (input) */
/* Arg 3: Right-hand "tape" (output) */
/* Summary: traverses a FST arc. Arg 3 is Arg 2 */
/* without left-hand symbol from Arg 1. */
/* Arg 5 is Arg 4 without right-hand */
/* symbol from Arg 1. */
/* Author: Taken from Gazdar, G and Mellish, C. */
/* (1989). Natural language processing in */
/* Prolog. Addison-Welsey. pp. 440-441. */
/* */
/* ************************************************ */
% 1 - terminating condition - read symbol from both tapes
traverse2([Word1, Word2], [Word1|RestString1], RestString1,
[Word2|RestString2], RestString2) :-
\+ special(Word1),
\+ special(Word2).
% 2 - retrieve Symbol pair and recurse
traverse2(Abbrev, String1, NewString1, String2, NewString2) :-
word(Abbrev, NewLabel),
traverse2(NewLabel, String1, NewString1, String2, NewString2).
% 3 - left-hand symbol is a special character
traverse2(['#', Word2], String1, String1,
[Word2|RestString2], RestString2).
% 4 - right-hand symbol is a special character
traverse2([Word1, '#'], [Word1|RestString1],
RestString1, String2, String2).
% 4 - symbol is a special character
traverse2('#', String1, String1, String2, String2).
As an example of the use of the FST network and controller, consider the following queries. The first one is for a mis-spelling of train (starting from state 1), while the second is for a mis-spelling of rain (starting from state 2).
| ?- transduce(1, [r,t,a,i,n], S). S = [t,r,a,i,n] ; no | ?- transduce(2, [a,r,i,n], S). S = [r,a,i,n] ; no
For this example, we need a procedure to call the transducer software. This simply has two arguments: one for the left-hand tape, the other for the right-hand tape. When calling this procedure, it is necessary to have either one or other arguments instantiated.
test(String_A, String_B) :-
initial(Node),
transduce(Node, String_A, String_B),
write(String_B), nl.