Working with collections

Exercise 1

The exercise is to write the program DifferentWords, which prints the set of distinct words in a given file, followed by the number of distinct words. For example, if given the file containing

This is some sample text. Some text is sampled for this purpose, but this text is merely a sample.

then the output will be

a but for is merely purpose sample sampled some text this

Number of words: 11

Use the file sometext. There should be 2443 different words.

Notes

  1. Each word is output once only, regardless of how often it occurs in the file
  2. Punctuation should be ignored (i.e. stripped out)
  3. All words should be converted to lowercase

Hints

  1. Use the StreamTokenizer to read in the file word by word.
  2. Store the words in an appropriate collection, such as HashSet (an implementation of the Set interface).
  3. Using an iterator, print out the collection word by word.
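
The three hints can be sketched roughly as follows. This is only one possible outline, not the model answer: the class name DifferentWordsSketch and the method distinctWords are made up for illustration, and the tokenizer is reconfigured with resetSyntax() on the assumption that only the letters a-z and A-Z count as word characters. (With the default StreamTokenizer settings a trailing full stop can end up attached to a word.) A TreeSet is used instead of a HashSet so that the words come out in alphabetical order, as in the sample output.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class name; a sketch under the assumptions stated above.
final class DifferentWordsSketch {

    // Reads words from the reader and returns them lowercased, without duplicates.
    static Set<String> distinctWords(Reader in) {
        StreamTokenizer tokens = new StreamTokenizer(in);
        tokens.resetSyntax();                 // start with every character "ordinary"
        tokens.wordChars('a', 'z');           // assumption: only letters form words
        tokens.wordChars('A', 'Z');
        tokens.whitespaceChars(0, ' ');       // control characters and space separate tokens
        tokens.lowerCaseMode(true);           // word tokens are converted to lowercase

        Set<String> words = new TreeSet<>();  // sorted; a HashSet also works but is unordered
        try {
            while (tokens.nextToken() != StreamTokenizer.TT_EOF) {
                if (tokens.ttype == StreamTokenizer.TT_WORD) {
                    words.add(tokens.sval);   // punctuation is a non-word token and is skipped
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    public static void main(String[] args) {
        String sample = "This is some sample text. Some text is sampled for this purpose, "
                + "but this text is merely a sample.";
        Set<String> words = distinctWords(new StringReader(sample));
        Iterator<String> it = words.iterator();   // hint 3: print using an iterator
        while (it.hasNext()) {
            System.out.print(it.next() + " ");
        }
        System.out.println();
        System.out.println("Number of words: " + words.size());
    }
}
```

To read a real file, pass a java.io.FileReader for that file instead of the StringReader. How apostrophes and hyphens should be treated is not specified in the exercise, so the count for a larger file depends on that choice.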

Exercise 2

Write a class BarChart with methods
public void add(double value)
public void draw(Graphics g)

that displays a chart of the added values. You use it by: creating a new one; adding some values; calling the draw method to draw it. You can assume that all the values added are positive. Hint: you must figure out the maximum of the values. Set a coordinate system so that the x-range equals the number of bars and the y-range goes from 0 to the maximum.
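
The scaling described in the hint might look something like the sketch below. It is an outline under assumptions, not a prescribed design: the constructor taking a pixel width and height is hypothetical (the exercise only fixes add and draw), the bars are drawn in blue, and the usual AWT convention is assumed in which y grows downwards from the top-left corner.

```java
import java.awt.Color;
import java.awt.Graphics;
import java.util.ArrayList;
import java.util.List;

class BarChart {
    private final List<Double> values = new ArrayList<>();
    private final int width;   // assumption: drawing area width in pixels
    private final int height;  // assumption: drawing area height in pixels

    // Hypothetical constructor; the exercise does not say how the size is supplied.
    BarChart(int width, int height) {
        this.width = width;
        this.height = height;
    }

    public void add(double value) {
        values.add(value);
    }

    // Largest value added so far (values are assumed to be positive).
    double max() {
        double max = 0.0;
        for (double v : values) {
            max = Math.max(max, v);
        }
        return max;
    }

    public void draw(Graphics g) {
        if (values.isEmpty()) {
            return;                                       // nothing to draw
        }
        double xScale = (double) width / values.size();   // x-range = number of bars
        double yScale = height / max();                   // y-range = 0 .. maximum
        g.setColor(Color.BLUE);
        for (int i = 0; i < values.size(); i++) {
            int left = (int) Math.round(i * xScale);
            int right = (int) Math.round((i + 1) * xScale);
            int barHeight = (int) Math.round(values.get(i) * yScale);
            // y grows downwards, so a bar starts at (height - barHeight)
            g.fillRect(left, height - barHeight, right - left, barHeight);
        }
    }
}
```

With this scaling the tallest bar always fills the full height, and every bar gets an equal share of the width.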

Exercise 3

A problem with predictive text on mobiles is that there are clashes: for example, the words "good" and "home" clash because they both have the key signature 4663. Nokia has asked you to investigate this problem, and your first task is to figure out the largest class of words all of which clash together, starting from a given body of text. Roughly speaking, your program has to read in each word of the body of text, compute its key signature, and store it along with the other words it has encountered with the same key signature. Then, when all the words have been read, it looks for the key signature with the biggest set of words stored against it, and prints out that set.
Hint: think about how to organise the data. One way is to keep it as a TreeSet of WordSigs, where a WordSig is a word and its signature. Your WordSigs would be ordered by the signature field. Another more efficient way would be to use HashMaps, but you'd have to read beyond what was said in this lecture.
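
As an illustration of the HashMap approach mentioned in the hint, here is a minimal sketch. The class and method names are invented for this example, the letter-to-key table is the usual phone keypad layout (abc=2, def=3, ..., wxyz=9), and characters that are not letters a-z are simply skipped, which is an assumption. If several signatures tie for the largest set, which one is returned is not specified.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class name; a sketch of the HashMap idea only.
final class KeySignatures {
    // Index 0 is key 2, index 1 is key 3, and so on up to key 9.
    private static final String[] KEYS =
            {"abc", "def", "ghi", "jkl", "mno", "pqrs", "tuv", "wxyz"};

    // Computes the key signature of a word, e.g. "good" gives "4663".
    static String signature(String word) {
        StringBuilder sig = new StringBuilder();
        for (char c : word.toLowerCase().toCharArray()) {
            for (int k = 0; k < KEYS.length; k++) {
                if (KEYS[k].indexOf(c) >= 0) {
                    sig.append((char) ('2' + k));
                    break;
                }
            }
        }
        return sig.toString();
    }

    // Groups the words by signature and returns the largest group.
    static Set<String> largestClash(Iterable<String> words) {
        Map<String, Set<String>> bySignature = new HashMap<>();
        for (String word : words) {
            bySignature.computeIfAbsent(signature(word), s -> new TreeSet<>())
                       .add(word.toLowerCase());
        }
        Set<String> best = new TreeSet<>();
        for (Set<String> group : bySignature.values()) {
            if (group.size() > best.size()) {
                best = group;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(signature("good"));   // prints 4663
        System.out.println(largestClash(Arrays.asList("good", "home", "gone", "text", "the")));
        // prints [gone, good, home]
    }
}
```

Keeping a set of words per signature means that repeated words in the text are counted only once when comparing group sizes.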

Exercise 4

As in Exercise 1 we want to print out the words in a file, but for this program we also want to print out their count, in ascending order of occurrence. Thus, for the input file

This is some sample text. Some text is sampled for this purpose, but this text is merely a sample.

the output will be

[a=1, but=1, for=1, merely=1, purpose=1, sampled=1, sample=2, some=2, is=3, text=3, this=3]


Hints

  1. Use the Map interface. Study the program MapTest to see how it works.
  2. Similarly to what is done at the end of MapTest, you will need to define a comparator which orders the map entries by value (number of occurrences of the word), and then by key (word) when the number of occurrences is the same.
  3. Because collections can only store Object types, it is necessary to wrap primitive types like int with corresponding Object types, like Integer. Here is how it works:
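    int n = 6;
    Integer objInt = Integer.valueOf(n);   // wrapping an int (new Integer(n) is deprecated since Java 9)
    Integer objInt2 = Integer.valueOf(6);  // a second Integer to compare against
    if (objInt.equals(objInt2)) ...        // comparing two Integers by value
    int m = objInt.intValue();             // retrieving an int from an Integer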
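
The hints above could be combined along the following lines. This is a sketch, not the MapTest program (which is not reproduced here): the class and method names are hypothetical, the comparator is written from scratch, and the words are split with a simple regular expression as a stand-in for the StreamTokenizer used in Exercise 1.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class name; a sketch under the assumptions stated above.
final class WordCountSketch {

    // Counts each lowercased word, then orders the entries by count and then by word.
    static List<Map.Entry<String, Integer>> sortedCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.toLowerCase().split("[^a-z]+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);   // Integer wraps the int count
            }
        }

        // Order by value (occurrences) first, then by key (word) on ties.
        Comparator<Map.Entry<String, Integer>> byCountThenWord = (a, b) -> {
            int byCount = a.getValue().compareTo(b.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        };

        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(byCountThenWord);
        return entries;
    }

    public static void main(String[] args) {
        String sample = "This is some sample text. Some text is sampled for this purpose, "
                + "but this text is merely a sample.";
        System.out.println(sortedCounts(sample));
        // prints [a=1, but=1, for=1, merely=1, purpose=1, sampled=1, sample=2, some=2, is=3, text=3, this=3]
    }
}
```

A map cannot be sorted by value directly, which is why the entries are copied into a list before sorting.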

2001 Mark Ryan and Alan Sexton