HELP CREF                                           A.Sloman March 1988

The use of the name CREF is provisional. It may be changed to avoid a
clash with the browser in KEATS. A possible alternative name is TAT
(text analysis tool).

         CONTENTS - (Use g to access sections)

 -- CREF - an introduction
 -- Explicit and implicit keys
 -- What CREF does
 -- Logical subsets
 -- How to use CREF
 -- A tutorial example, using TEACH VED
 -- Some options (chatty, showentries)
 -- Restarting: clearentries
 -- Utilities operating on fragments
 -- Browsing with logical subsets
 -- Setting up a logical subset: GET and conditions
 -- Examples of conditions
 -- Semantics of conditions
 -- Use of ^^ in conditions
 -- GET and STORE
 -- Traversing logical subsets: FIRST, NEXT, PREV, LAST
 -- Move or copy all fragments in a subset to another file
 -- Marking a logical subset: mark
 -- Structured keys
 -- Summary of facilities
 -- Global variables
 -- Utility procedures
 -- Summary of CREF commands
 -- ved_clearentries
 -- ved_copy_subset
 -- ved_cref
 -- ved_df
 -- ved_first
 -- ved_get
 -- ved_last
 -- ved_mark
 -- ved_mf
 -- ved_mfo
 -- ved_move_subset
 -- ved_next
 -- ved_nf
 -- ved_pf
 -- ved_prev
 -- ved_sf
 -- ved_store
 -- ved_tfo
 -- Acknowledgements
 -- [TO BE CONTINUED - SUGGESTIONS WELCOME]

-- CREF - an introduction ---------------------------------------------

CREF is an experimental VED-based document analyser/browser/
cross-referencer.

The assumption is that you have a file containing information that you
wish to analyse, for example a transcript of an interview, or a set of
notes for a book. You wish to re-organise the material in any one of a
variety of different forms, for instance:

 1. A statistical summary of different kinds of texts.

 2. A theory about the concepts, beliefs, and rules used by the person
    interviewed.

 3. A new draft of the book, with the material re-organised more
    logically.

The CREF package does not automate any of this, but provides a tool to
help you in the process.
It assumes the text is composed of fragments, each of which is
associated with a set of keys. Keys can either be explicit artificial
symbols that you have associated with the fragments as a result of
reading them and labelling them in various ways, or else words or
portions of words that occur in the original text. The former we call
explicit keys, the latter implicit keys. (At present CREF doesn't handle
phrases sensibly, so you need to map significant phrases into labels
associated with fragments, then use the labels as explicit keys.)

CREF builds indexes mapping text fragments onto keys and keys onto
fragments.

Collections of fragments can be defined as logical subsets, in terms of
boolean combinations of keys. E.g. a logical subset may be defined as
the set of fragments containing certain keys and not others.

Commands are provided for traversing logical subsets, e.g. going to the
first or last element, or the next or previous element in a subset.

Fragments in a logical subset can be automatically marked with the name
of that subset, so that in future the subset will be found directly and
need not be defined explicitly.

Facilities are provided for doing things to the current fragment (e.g.
mark, move or copy it), move to its start, move to the next or previous
fragment, etc.

Because the package is simply a collection of POP-11 global variables
and procedures, based on the editor procedures, it is very easily
extendable and tailorable.

-- Explicit and implicit keys -----------------------------------------

You can associate explicit keys with a fragment by inserting them
between square brackets in the text (starting on a new line), e.g.

    [hypothesis premiss model-change]

Note that the only delimiters are spaces, tabs and newlines, so that
explicit keys can contain arbitrary characters apart from these, or "]".

You can give CREF a list of words to search for as implicit keys.
In a preliminary sweep it finds all the labels you have inserted into
the text as explicit keys, and adds them to the list.

When it has found which fragments contain which keys, CREF enables you
to define logical subsets of fragments and to explore them.

-- What CREF does -----------------------------------------------------

When requested to build its indexes, CREF does the following, using the
list of implicit and explicit keys.

 1. It numbers all the fragments in sequence, inserting before each
    fragment a label containing the number, e.g. <|25|> for fragment
    number 25.

 2. It builds up two indexes:

    The key-to-fragment (KTF) index: this associates with each key a
    list of the fragments tagged with that key (actually it uses the
    fragment numbers).

    The fragment-to-key (FTK) index: this associates with each fragment
    number a list of the keys found there.

 3. If required, it will insert in braces before each fragment,
    immediately after the label, an explicit alphabetically sorted list
    of the keys found associated with the fragment, e.g.:

        <|25|> {check electric engine fault hypothesis}
        I needed to check the hypothesis that the fault in the engine
        was due to a poor electrical connection.

    (Even if this is not inserted in the text, the FTK index has all
    the information immediately accessible. But sometimes the visual
    presentation is helpful: you can then discover that there ought to
    be a new label associated with the fragment, and extend your list
    of keys.)

 4. At the top of the file it inserts an index mapping all the keys to
    the numbers of the fragments containing them. This enables you to
    see which keys have been found in lots of fragments, which in few,
    etc. For example given the list of keys:

        [key number FTK KTF fragment organise explicit implicit list]

    it built the following index for a draft of a portion of this text:

    66 FRAGMENTS FOUND.
    9 USED KEYS:
    INDEX:
    [[FTK 20 22 23 25 35]
     [KTF 20 23 25 35]
     [explicit 12 13 15 16 18 21 23 25 31 35 37]
     [fragment 12 13 15 16 19 20 21 22 23 25 27 29 31 34 35 37 39 43 45
         46 47 49 51 54 56 59 60 64]
     [implicit 4 12 14 16 18 23 25 35]
     [key 4 12 13 14 15 16 18 20 21 22 23 25 29 30 31 35 39 41 46 49 60
         65]
     [list 16 18 20 21 22 23 25 35 37 39 64 65]
     [number 19 20 23 25 30 35 36 41 43 51 54 64 65]
     [organise 7 10 23 25 35]]
    --------------------------------------

-- Logical subsets ----------------------------------------------------

Various utilities are then available for forming different logical
subsets of the fragments and scanning through them or doing other
operations. A subset can be given a name, so that you can later refer
to it.

For example, you can define a logical subset by means of a
specification like:

    [and [not a] [or b c d] e f [not [or g h]]]

This will define a subset consisting of all the fragments whose
associated set of keys satisfies the condition:

    "a" is not in the set
    At least one of "b" "c" "d" is in the set
    "e" and "f" are in the set
    Neither "g" nor "h" is in the set

Effectively any boolean combination of keys can be specified using
"and", "or" or "not", where "and", "or" and "not" are allowed any
number of arguments. Additional operators are explained below.

Together with the provision for associating explicit keys with
fragments to represent concepts that cannot be defined as boolean
combinations of existing keys, this provides a powerful analytic tool.

-- How to use CREF ----------------------------------------------------

First of all prepare the file to be analysed. This preparation consists
of three main activities.

 a. Divide the file into fragments separated by one or more blank
    lines. Do not leave any blank lines within a piece of text you wish
    to be regarded as ONE fragment. It suffices to put a dot or a
    hyphen at the beginning of the otherwise blank line to indicate to
    CREF that the fragment continues on the next line.

 b. Collect a list of words to be used as implicit keys and in a
    separate program file assign the list to the variable
    cref_keywords, e.g.

        [key number FTK KTF fragment organise explicit implicit list]
            -> cref_keywords;

    Note that these will be found even if they occur as sub-words of
    other words, so "number" will be found in "numbering". (This gives
    more flexibility, but might be made a switchable option.)

    [At present there is no mapping between upper and lower case, but
    this will be added.]

    You need not specify a list of implicit keywords if you provide
    explicit keywords, as explained below.

 c. Associate with each fragment a list of explicit labels in square
    brackets starting at the beginning of a line. (Should this
    restriction be removed? It speeds up the preliminary survey.)

It may be advisable to keep a copy of the original file, in case you
later wish to start again. CREF does, however, provide a utility to
restore the file to its state after (a) to (c) and before CREF runs.

Decide whether you wish to have the keywords found inserted in curly
brackets (braces) before each fragment. This happens by default. It is
turned off by

    false -> listfoundwords

(for which a VED command can be defined).

Now run the program with the command:

    cref

Building the index takes a time that depends on the number of keywords
and the length of the file. On a 635 line file with 136 entries and 51
keywords it took 6.2 seconds of CPU time on a Sun3/280 and about 7
seconds of real time. (The program could be optimised if required.)

If you wish the index not to be printed at the top of the file, but in
another file, then add the name of that file as an extra argument to
the cref command.

-- A tutorial example, using TEACH VED --------------------------------

TEACH VED is an online file explaining how to use VED. It can be
analysed using CREF as follows.
First mark and load the following assignment, to set up the
cref_keywords for CREF:

    [beginning break buffer character column command compile condition
     define delete end END ENTER file function graphic help HELP insert
     introduction key length left line load mark mode move position
     quit range refresh right screen scroll search size start
     substitution TEACH terminal TOP type vertical width window write]
        -> cref_keywords;

Then do

    teach ved

to get the file into VED. Then do:

    cref

This will get all the fragments labelled with the keys they contain. An
index will be inserted at the top of the file. You can now search for a
fragment by number, e.g. for fragment number 53:

    /|53|

If you wish to examine the subset of fragments containing the word
"ENTER" do

    get ENTER

Then give the following commands to move around the subset:

    first, last, next, prev

Logical subsets can be defined in terms of boolean combinations of
other subsets. For example, you can define a new logical subset of all
the fragments containing either the word "start" or the word
"beginning" thus:

    get [or start beginning]

That collection of fragments is now the current logical subset, and
various operations are available on the subset. E.g.

    first

takes you to the first fragment in the subset. Then try, a few times

    next

Check that each fragment found includes one of the keys specified, i.e.
"start" or "beginning". Then try, a few times:

    prev

and again check that the fragment found includes one of the keys
specified.

    store TOP

makes "TOP" the name of the logical subset defined above using "get".

    :TOP

prints the list of fragment numbers corresponding to the logical
subset. TOP can now be used as the argument for various commands.

    first TOP
    last TOP

If you just wish to move to the immediately preceding fragment or the
next one use the pf and nf commands. Try, a few times,

    pf

to go to the previous fragment, i.e. the one immediately preceding the
current fragment. By contrast

    pf TOP

goes to the previous fragment in the set TOP. I.e.
with an argument 'pf' works like 'prev'. Similarly, try a few times:

    nf

to go to the next fragment. With an argument 'nf' works like 'next':

    nf TOP

The commands

    prev
    next

go to the previous or next fragment in the "current" logical subset,
whereas pf and nf without arguments go to the numerically adjacent
(previous or next) fragment.

    mark TOP

This adds "TOP" as an explicit key for all the fragments in the logical
subset. You can search for occurrences of TOP to check that it has been
added to all the relevant fragments. It will occur between square
brackets, at the top of the fragment.

    clearentries

This removes all the fragment headers, and the index put at the top of
the file. However, it leaves things marked with [TOP] -- i.e. the
explicit key is not removed, and this is useful if you then re-process
the file, thus:

    cref

Now the set of fragments with keyword "TOP" includes all those that
previously had "start" or "beginning", so it is no longer necessary to
define it as a logical subset. Now

    get TOP

will get all the fragments with TOP as implicit or explicit keyword,
and the commands first, last, prev, next will work as before.

    get file

makes the current logical subset all the fragments containing "file".

    store FILE

makes "FILE" a name of the logical subset.

    mark

will mark it as an explicit keyword for every fragment in the subset,
and this, like TOP, will not be removed by clearentries.

-- Some options (chatty, showentries) ---------------------------------

While analysing the file, CREF announces every twentieth fragment on
the command line, to show it is doing something. If you wish every
fragment number to be announced do

    true -> chatty;

This slows things down a bit. If you wish each fragment to be shown on
the screen as it is processed, do

    true -> showentries;

This also shows the keywords found for each fragment, if insertion is
switched on. Naturally showing each fragment slows things down a lot,
depending on the speed of your terminal.
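
Because these options are ordinary POP-11 variables, a VED command can
be defined to flip one of them. The following is a minimal sketch, not
part of CREF. It assumes the usual VED convention that a procedure
named ved_xxx is run by the command xxx; the names toggle_chatty and
ved_togglechatty are hypothetical.

```pop11
;;; -chatty- is declared by CREF. It is redeclared here only so that
;;; the sketch compiles on its own; redeclaring does not alter its value.
vars chatty;

;;; Hypothetical helper, not part of CREF: flip the option and
;;; return the new setting.
define toggle_chatty() -> state;
    not(chatty) -> chatty;
    chatty -> state
enddefine;

;;; Hypothetical VED command: giving the command togglechatty flips
;;; the option and reports the new setting on the status line.
define ved_togglechatty();
    if toggle_chatty() then
        vedputmessage('chatty on')
    else
        vedputmessage('chatty off')
    endif
enddefine;
```

The same pattern would serve for showentries, or for the
listfoundwords variable mentioned under "How to use CREF".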
-- Restarting: clearentries -------------------------------------------

If you don't like the result of CREF and wish to start again using a
different set of keys, or with a different decomposition into
fragments, you can restore the file to its pre-CREF state with the
command:

    clearentries

However, this will not remove explicit keywords added by mark.

If you have saved a copy of the original file, it may be quicker to
clear the file and read in the saved copy.

CLEARENTRIES will be useful if you have done some re-ordering of
fragments, and would like to have the indexes re-built using the new
ordering as a basis for numbering, or if you have added new explicit
keys.

-- Utilities operating on fragments -----------------------------------

Once the index is built you can browse using a collection of utilities
provided. Normal VED searching, moving, copying commands are available
unchanged.

To find fragment number 23, you can search for it as normal, e.g.

    /23

or, to make sure you don't find fragment 223 or 236, do:

    /|23|

There is also a procedure cref_goto_num defined below.

Other facilities

    nf
        move to the next fragment

    pf
        move to the previous fragment

    NOTE: if given an argument then nf and pf work like next and prev,
    defined below.

    sf
        move to the start of the current fragment

    mf
        This command will Mark the current Fragment. It can then be
        copied, deleted, etc.

    mfo
        This Moves the current Fragment Out. It assumes there is
        another VED file that you edited before the current one, and
        the fragment will be moved into that file. Optionally an
        argument can specify the other file to be used. It works like
        mo, as described in HELP * INOROUT.

    tfo
        This Transcribes the Fragment Out. I.e. a copy is made in the
        other file, which can be optionally specified as an argument.
        This works like to, as described in HELP * INOROUT.

-- Browsing with logical subsets --------------------------------------

The browsing facilities enable you to link different threads through
the text fragments in the file. Then by pulling on a particular thread
you get all the fragments associated with it. The threads are called
logical subsets, and they can be given names. The logical subsets are
defined by boolean conditions on keywords associated with fragments.

The program supports a notion of the "current" subset, so commands
which don't mention a subset by name refer to the current subset.

The GET command is used to define a subset, STORE to give it a name,
FIRST, NEXT, PREV and LAST to traverse it, and MARK to mark all the
fragments in the subset with an explicit keyword.

-- Setting up a logical subset: GET and conditions --------------------

    get

Given a condition as argument, this finds all fragments for which the
condition is true, using the indexing information last built up by
CREF. The fragment numbers are stored in a list, which is then
available for subsequent manipulation.

A condition is either a keyword or a list starting with one of the
operators:

    "not" "or" "and" "is"

and followed by arguments which are conditions. There is no limit to
the number of arguments used with an operator. Each argument is either
a keyword or a condition, or, in the case of "is", a name of a
previously defined logical subset. For example:

 1. The word "foo" defines a logical subset containing all those
    fragments which include the keyword "foo" (which must either have
    been in the list -cref_keywords- used when ved_cref was run, or
    must have been an explicit keyword added to the list by CREF).

 2. [not a b c d] defines a logical subset of fragments containing none
    of the keywords "a", "b", "c" or "d".

 3. If s1, s2 and s3 are already defined, then [is s1 s2 s3] defines a
    logical subset of fragments common to all of them.
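
The way a condition picks out fragments can be sketched in POP-11. This
is an illustration under stated assumptions, not the code CREF uses.
The procedure names cond_holds and matching_fragments are hypothetical.
Keys are assumed to be POP-11 words. The index is assumed to have the
shape described for fragment_key_index in the summary of global
variables (a vector whose kth element lists the keys of fragment k).
The operators are given the meanings set out in the section on the
semantics of conditions.

```pop11
;;; Hypothetical sketch, not CREF's own code.
;;; -condition- is a keyword or a list [operator arg1 arg2 ...],
;;; -keys- is the list of keys found in one fragment,
;;; -num- is that fragment's number (needed only for "is").
define cond_holds(condition, keys, num) -> result;
    lvars op, args, arg;
    unless ispair(condition) then
        ;;; a bare keyword: true if the fragment includes it
        member(condition, keys) -> result;
        return
    endunless;
    hd(condition) -> op;
    tl(condition) -> args;
    if op == "and" then
        ;;; every argument must hold
        true -> result;
        for arg in args do
            unless cond_holds(arg, keys, num) then
                false -> result; return
            endunless
        endfor
    elseif op == "or" then
        ;;; at least one argument must hold
        false -> result;
        for arg in args do
            if cond_holds(arg, keys, num) then
                true -> result; return
            endif
        endfor
    elseif op == "not" then
        ;;; every argument must fail
        true -> result;
        for arg in args do
            if cond_holds(arg, keys, num) then
                false -> result; return
            endif
        endfor
    elseif op == "is" then
        ;;; each argument names a variable holding a list of numbers
        true -> result;
        for arg in args do
            unless member(num, valof(arg)) then
                false -> result; return
            endunless
        endfor
    else
        mishap(op, 1, 'UNKNOWN OPERATOR IN CONDITION')
    endif
enddefine;

;;; Collect the numbers of all fragments satisfying -condition-.
define matching_fragments(condition, ftk) -> nums;
    lvars k;
    [% for k from 1 to length(ftk) do
           if cond_holds(condition, ftk(k), k) then k endif
       endfor %] -> nums
enddefine;

;;; A made-up index for four fragments.
vars demo_index;
{[a b] [b e] [c e f] [e]} -> demo_index;

matching_fragments([and e [or b c]], demo_index) =>
;;; prints ** [2 3]
```

Scanning every fragment for each condition keeps the sketch short. It
is in the spirit of the description of GET later in this file, which
says that GET goes through the whole list of fragment numbers.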
-- Examples of conditions ---------------------------------------------

The following are examples of conditions, where "a", "b", etc are
keywords and "s1" and "s2" are names of previously defined logical
subsets:

    b
    [and a ]
    [or a b c d]
    [not a b]
    [and a [or b c d] e [not f] g]
    [or [and a [or b c d] e [not f] g] [and k [not l]]]
    [is s1 s2]
    [or [is s1] [is s2]]

For complex conditions used frequently it is normally useful to define
new VED commands, to save typing.

-- Semantics of conditions --------------------------------------------

The semantics of a condition depends on the main operator and the
additional arguments. The following is an informal exposition.

 1. A keyword is a condition that is true of a fragment if the fragment
    includes the keyword.

 2. [not ...]
    This condition is true of a fragment if and only if all the
    arguments represent conditions that are false of that fragment.

 3. [and ...]
    This condition is true of a fragment if and only if all the
    arguments represent conditions satisfied by the fragment.

 4. [or ...]
    This condition is true of a fragment if and only if at least one of
    the arguments is true of the fragment.

 5. [is ...]
    This condition is true of a fragment if the arguments are all names
    of logical subsets and the fragment is in all the logical subsets.

-- Use of ^^ in conditions --------------------------------------------

Since conditions are POP-11 list structures, the normal POP-11 list
constructor facilities are available, in particular "^^". (See
TEACH * ARROW). So in order to form a logical subset containing all
fragments with at least one of the keywords, use the condition

    [or ^^cref_keywords]

For a subset of fragments that contain at least one of the keywords but
neither a nor b, use:

    [and [or ^^cref_keywords] [not a b]]

To form a subset containing fragments which include ALL the keywords
use

    [and ^^cref_keywords]

(this is usually an empty subset!). If the value of "cref_keywords" is
the list [a b c d e f ...] then this condition is equivalent to

    [and a b c d e f ...]
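
The effect of "^^" can be checked at the POP-11 top level before a
condition is given to get. In this small sketch demo_keys is an
illustrative variable standing in for cref_keywords, and its value is
made up. The last line shows, for contrast, that a single "^" inserts
the whole list as one element, which would not be a usable condition.

```pop11
;;; demo_keys is a made-up stand-in for cref_keywords.
vars demo_keys;
[key number list] -> demo_keys;

;;; "^^" splices the elements of the list into the condition
[or ^^demo_keys] =>
;;; prints ** [or key number list]

[and [or ^^demo_keys] [not a b]] =>
;;; prints ** [and [or key number list] [not a b]]

;;; a single "^" inserts the list itself as one element
[or ^demo_keys] =>
;;; prints ** [or [key number list]]
```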
Warning: if you use the VEDEXPAND mechanism with `^` as the expand
character (see TEACH VEDEXPAND from POPLOG Version 12.2), then "^" will
have to be represented by "^^" and "^^" by "^^^^" in the command line.

-- GET and STORE ------------------------------------------------------

GET goes through the whole list of fragment numbers looking at the
associated keywords, and checking the condition given as argument. It
makes a list of all the numbers of fragments that satisfy the
condition, and stores the list in the global variable "current_subset",
used by other utilities.

The STORE command allows a logical subset to be given a name. The FIRST
and NEXT commands allow you to traverse logical subsets. Commands can
easily be defined for copying a logical subset to another file.

    store

Given a name as argument, this assigns the current value of
current_subset to that name. E.g.

    store red

will give the name "red" to the list of fragments satisfying the last
condition given to get. It warns you if the name was already a variable
with a value, but nevertheless does the assignment.

The name is also remembered as the value of the variable current_name,
so that it can be accessed by other utilities, e.g. ved_mark used
without an argument.

-- Traversing logical subsets: FIRST, NEXT, PREV, LAST ----------------

    first

This moves you to the first fragment in a logical subset. It can be
given the name of a logical subset as argument. If no argument is
given, the logical subset is the one associated with the variable
current_subset, namely the last explicitly mentioned subset. So

    first red

will take you to the first fragment in the red logical subset. After
that this will be the current subset.

    next

This moves you to the NEXT fragment in a logical subset, which can
again be named as an argument. I.e. starting from the current fragment
(the one the cursor is in) it checks the list of numbers in the
required subset, and then moves to the next fragment in the file whose
number is in that list. (If the fragment has been moved to a different
part of the same file it will still work. I.e.
next uses only the number labels produced by CREF, not the ordering in
the file.)

If no argument is given, the logical subset is the one associated with
the variable current_subset, i.e. the last explicitly mentioned subset.

    prev

Like next, but goes to the previous fragment.

    last

This is similar, except that it goes to the last fragment in the
subset.

-- Move or copy all fragments in a subset to another file -------------

    copy_subset

Will copy all fragments in the current subset to the last file edited.
If given an argument, it takes that as the name of the file to which
they should be copied.

    move_subset

This is similar but the original entries are deleted.

-- Marking a logical subset: mark -----------------------------

If a logical subset has been given a name, it is possible to add that
name as a new explicit key at the top of every fragment in the logical
subset. Suppose you have defined a subset called "FRED". In order to
label its elements explicitly type:

    mark FRED

Then if, for example, fragment number 31 in subset FRED looked like:

    <|31|> {character file insert type}
    Printing characters are inserted into the file at the current
    location as they are typed.

After the mark command it would look like

    <|31|> {character file insert type}
    [FRED]
    Printing characters are inserted into the file at the current
    location as they are typed.

If there is already an explicit set of keywords at the top in square
brackets, then FRED is added immediately after the left square bracket.

The clearentries command will NOT remove explicit keywords fed back
into the text in this way, so the next time CREF is used on the file
these new labels will automatically become keywords and it will no
longer be necessary to define them in terms of logical combinations of
other keywords.

If mark is used without an argument, then the value of the variable
-current_name-, whose valof should be current_subset, will be used to
define the logical subset to be marked explicitly.
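
The rule for where the new key goes can be sketched as a small POP-11
procedure. This is only an illustration of the rule just described, not
the code used by mark. The name key_line_with is hypothetical, and the
single space used as separator is an assumption. The procedure is given
the existing line of explicit keys as a string, or false if the
fragment has none, and returns the line that should appear after
marking.

```pop11
;;; Hypothetical sketch, not part of CREF.
;;; -name- is the subset name (a word), -oldline- is the fragment's
;;; existing line of explicit keys (a string starting with "[") or
;;; false if there is no such line.
define key_line_with(name, oldline) -> newline;
    if oldline then
        ;;; put the name immediately after the left square bracket
        '[' >< name >< ' ' >< allbutfirst(1, oldline) -> newline
    else
        ;;; no explicit keys yet: make a new bracketed line
        '[' >< name >< ']' -> newline
    endif
enddefine;

key_line_with("FRED", false) =>
;;; prints ** [FRED]

key_line_with("FRED", '[hypothesis premiss]') =>
;;; prints ** [FRED hypothesis premiss]
```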
-- Structured keys ----------------------------------------------------

For many purposes it is necessary to have more complex descriptions
associated with fragments than a list of labels. For example, there
might be a set of themes running through the text, and you may wish to
associate with a fragment the fact that it is part of a theme, which
theme, and which part of the theme - opening, development, counterplot,
conclusion, etc. Or you may wish to indicate that a particular fragment
follows from some other particular fragment.

The existing mechanism supports a relatively primitive way of doing
this using structured keywords, though more sophisticated extensions
are possible. A structured keyword might be a complex label composed of
a set of keywords separated by dots, e.g.

    THEME.A.START
    THEME.A.DEV
    THEME.A.FIN

These could either be assigned by hand to individual fragments as
explicit keys (i.e. in square brackets in the text), or tools, perhaps
menu-based, could easily be provided to reduce typing.

It is then easy to define a logical subset that contains all the
fragments which are part of the DEVelopment of THEME A, or all the
fragments which are in THEME A, or all the fragments that are
conclusions of any theme, etc.

Note, however, that if you want to use ved_get to define logical
subsets containing these, then since THEME.A.START is not one word but
five according to POP-11 syntax, string quotes must be used. E.g.

    get [or 'THEME.A.START' 'THEME.A.DEV']

Similarly, to bring out the relationship of particular fragments to
other particular fragments, certain fragments can be given unique names
by adding the name before or after the fragment. Then in another
fragment the explicit keyword mechanism can be used to refer back to
this one. E.g.

    [P_23]
    All men are mortal

    ... more fragments ...

    [P_35]
    Socrates is a man

    ... more fragments ...

    [P_90]
    Socrates is mortal
    [conclusion From.P_23 From.P_35]

Ideally the machine should work out which conclusions follow from which
premisses, but that's beyond the state of the art, so users will have
to put in such links themselves.

-- Summary of facilities ----------------------------------------------

-- Global variables ---------------------------------------------------

chatty (default false)
    If true then print out more information during processing.

current_name
    A word, the name of the last subset created, e.g. by ved_store. It
    is used by ved_next and ved_first.

current_subset
    The last subset created. valof(current_name) is current_subset.

fragment_key_index
    A vector of size N, where N is the number of fragments found, and
    the kth component of the vector is the list of keywords found in
    the kth fragment.

key_fragment_index
    An ordered list of lists, each containing a keyword and a list of
    numbers of the fragments found to contain the keyword. All the
    explicit and implicit keywords are included.

cref_keywords (default [])
    A list of keywords to search for as implicit keywords in the text.
    The explicit keywords are added to this list when CREF runs.

listfoundwords (default true)
    If true insert before each fragment, in curly braces, the explicit
    and implicit keywords found in that fragment.

showentries (default false)
    If true display each fragment as it is processed by ved_cref.

-- Utility procedures -------------------------------------------------

cref_fragment_number() ->
    Returns the number of the current fragment.

cref_fragments() ->
    Returns the list of fragment numbers corresponding to the keyword
    given.

cref_goto_num() ->
    Go to the start of the fragment with the number given. Returns true
    if the fragment exists, otherwise false.

findfragments() ->
listfragments() ->
    The argument is the sort of condition that can define a logical
    subset of fragments. The first procedure simply returns all the
    numbers of fragments satisfying the condition. The second makes a
    list of them.

store_entries(,)
    Declare the word as a variable if necessary, and assign the list to
    it. Also make it the current_name.

mark_entries()
    Go through all the fragments in the logical subset associated with
    the word, giving them the word as an explicit keyword (i.e. in
    square brackets at the top of the entry).

-- Summary of CREF commands -----------------------------------

Some of the commands can be given an optional argument. Some of the
commands, if given no argument, refer to the "current" logical subset
(i.e. the last one created by ved_get, or the last one referred to
explicitly). If they are given an argument it is assumed to be the name
of a logical subset, which is made the current one.

-- ved_clearentries
    Remove all the fragment headers, and the index if it exists. The
    file can then be re-processed, e.g. after selecting a new list for
    cref_keywords.

-- ved_copy_subset
    Copy all fragments in current_subset to the file named in the
    argument. The file defaults to the last file edited (see
    HELP * INOROUT).

-- ved_cref
    Process the current file. If no argument is given, display the
    index at the top of the current file. If a file name is given then
    store the index in a file of that name.

-- ved_df
    Delete the current fragment. Equivalent to ved_mf then ved_d.

-- ved_first
    Go to the first fragment of the current logical subset. If an
    argument is given, take it to be the name of the current logical
    subset.

-- ved_get
    Make a list of the numbers of fragments satisfying the condition
    given as argument, assign the list to current_subset, and make
    "current_subset" the current_name. Conditions are defined above.

-- ved_last
    Go to the last fragment associated with the current subset. If an
    argument is given, use that as the name of the current subset.

-- ved_mark
    The argument defaults to the value of current_name. If supplied, it
    should be the name of a logical subset. All the fragments in that
    subset will be given the name as an explicit keyword. Invokes
    mark_entries().

-- ved_mf
    Mark the current fragment.

-- ved_mfo
    Move the current fragment out, to the last file edited, or to the
    file named by the argument if supplied.

-- ved_move_subset
    Move all fragments in current_subset to the file named in the
    argument. The file defaults to the last file edited (see
    HELP * INOROUT).

-- ved_next
    Like ved_first, but go to the next fragment in the file associated
    with the current subset (i.e. the next one after the current
    fragment). If an argument is supplied then make the corresponding
    subset the current one.

-- ved_nf
    Go to the beginning of the next fragment. If an argument is
    supplied then go to the next fragment in the logical subset of that
    name.

-- ved_pf
    Go to the previous fragment. If an argument is supplied go to the
    previous fragment in the logical subset of that name.

-- ved_prev
    Like ved_next, but go to the previous fragment in the current
    logical subset (i.e. the one before the current fragment). If an
    argument is supplied it should be the name of a logical subset.

-- ved_sf
    Go to the start of the current fragment.

-- ved_store
    Associate the current subset (created by ved_get) with the name
    given as argument, and make it the current_name.

-- ved_tfo
    Transcribe the current fragment out, to the last file edited, or to
    the file specified by the argument if provided. The fragment is
    copied and the current file is unchanged.

-- Acknowledgements ---------------------------------------------------

This browser was inspired partly by learning that something similar
existed in the KEATS system developed by Marc Eisenstadt and colleagues
at the Open University, in collaboration with British Telecom.

It was partly inspired by members of the Alvey UK AI Toolkit committee
who said that a tool such as this was most unusual and would be much in
demand.

It was partly inspired by conversations with Alex Morrison, of
Cognitive Applications Ltd, who pointed out the need for structured
keys to bring out the structure of the text.

Most of the detailed ideas came from experimenting with an early draft
of the program and finding how limited it was.

-- [TO BE CONTINUED - SUGGESTIONS WELCOME] ----------------------------

Some possible extensions

 a. Improved user interface (more things done with keystrokes or
    menus?)

 b. Links with graphics etc
        - different windows into the same file
        - one window per logical subset

 c. A collection of packaged sets of labels and higher order constructs
    for use in a variety of application areas, e.g. analysing
        - fault diagnosis protocols,
        - design protocols, etc...

 d. Tools for storing the results of analysis in useful format, and for
    printing things out.

 e. Some statistical tools

 f. Make it ignore case (will be much slower?). Could be a
    disadvantage. Perhaps just ignore capitalised initial letters?

 g. Is using a blank line as a fragment delimiter OK? Allow other
    options?

--- File: local/help/cref
--- Distribution: csuna
--- University of Sussex Poplog LOCAL File ------------------------------