Software System Components 2 - Year 2, Semester 2, Week 9

Concurrency - Dr Steve Vickers

3: Thread examples

This week we look at some particular aspects and examples of threads.

At the end of this week's lectures you should know -

Examples from the library

Now you have seen the basic tools - threads, runnables, locks and conditions - for dealing with concurrency. It is possible to use these in many different kinds of thread interaction. In older courses you would have been taught how to implement some standard examples, but in fact in the Java 1.5 library this is already done for you. In Java, it is much better to use those library utilities.
If you move on to a language other than Java you may have to write your own implementations; but you should not find this so hard if you first get used to the standard patterns. Also, the Java library source code is readable, so you can see what algorithms are used there.

When I describe these utilities, I shall also mention some new features of Java 1.5 that you may not have seen before.

Producer-consumer buffer

This is a classic pattern which you can find described in many texts about concurrency.

Imagine a program where a number of threads (the producers) are producing items, and a number of other threads (the consumers) are taking those items and processing them in some way. Each consumer can process any item - it doesn't matter which thread produced it.

When an item has been produced, it is stored temporarily in a buffer, until a consumer is available to process it.

All these threads share a common resource: the buffer. When a producer stores an item in the buffer, or when a consumer removes it, they must have exclusive access to the buffer. This is because the buffer must store various pieces of information to stay, for instance, how many items it holds, where the next one to read is, etc. The code that stores or removes an item is a critical region, since if it interfered with part way through, the buffer will probably be in an inconsistent state (the buffer has invariants that must be maintained). Hence the buffer must be protected with a lock.

However, there are also two conditions involved. Even if a producer gets the lock, it cannot store an item if the buffer is full - it must wait for an item to be consumed. And if a consumer gets the lock, it cannot remove an item if the buffer is empty - it must wait for an item to be produced.

Hence the standard pattern here is to have some kind of collection for the buffer, protected by a lock with two conditions. (Exercise: try this. It is not too hard. One solution is to use an array for the buffer, with extra fields to say how many items it has and where the next one is to be stored. You must imagine the array arranged in a circle.)

In Java this can be done using the interface java.util.concurrent.BlockingQueue<E>. This is a new part of the collections framework.

What does <E> mean? In older versions of Java, the elements of a collection were treated as of type Object. That meant that when you read an element you had to cast it to the type it actually was. In Java 1.5 you can tell the compiler what type to use for the elements of a collection, and it can then check that you only store elements of the right type. The interface BlockingQueue<E> is a generic type, which means it covers a range of more specific possibilities specified by replacing E (which is like a type parameter) by an actual type. For example, BlockingQueue<String> is the type of blocking queue whose elements are all strings.

Other collection types, such as ArrayList, have also been made generic. This is a very useful feature of Java 1.5, and you should use it if you can. However, if you are unsure about generic types you can still use collection types in the old way, so that they are collections of Objects.

API for BlockingQueue

The API says -

Interface BlockingQueue<E>


Type Parameters:
E - the type of elements held in this collection

The type parameter E will also be used in the method declarations. The most useful are put and take.

void put (E o) - adds the specified element o (which must be of type E) to the queue, waiting if necessary for space to become available. o must not be null. You cannot store null elements in the queue.

Notice how the type parameter is used. For example, if your blocking queue has specific type BlockingQueue<String>, then put is really void put (String o).

E take() - gets the head of the queue, removing it from the queue. Waits if necessary for an element to be available.

Note - Both put and take can throw InterruptedException, for the same reason as sleep and await.

BlockingQueue itself is an interface, and so you cannot instantiate it. There are various classes implementing it. The one that most closely matches the classic producer-consumer buffer is java.util.concurrent.ArrayBlockingQueue<E> (generic, again!). Its constructor can have an integer parameter to say how many elements the buffer can hold.
Example: producer-consumer
The example is in four classes NamedInt, IntProducer, IntConsumer and ProdConTest. NamedInt is for objects that store a string and an int. These pairs are the items that are produced and consumed. An IntProducer is a run object that repeatedly reads integers from an input frame, then pairs them with its own ID and stores them in a buffer. An IntConsumer is a run object that repeatedly takes NamedInts from the buffer, and then displays messages showing the producer and consumer IDs, the integer n, and the nth Fibonacci number. The Fibonacci algorithm used is very inefficient, so the time taken by the consumers can vary a lot. The main program (in ProdConTest) creates a buffer, of type BlockingQueue<NamedInt>, and starts two producer threads (1 and 2) and two consumer threads (A and B) using it.

When you run this, you get two input frames, for the two producers. Actually, that's fairly pointless, since you cannot possibly type in more than one at a time. However, you will see that they both work. They are both sending items to the same buffer, so the choice of consumer is independent of the choice of producer.

Normally the consumer threads alternate, but if you give one of them a large n (say 42 or more) the other consumer will take a succession of small n's. If you give two large n's then you can fill up the buffer because both consumers are busy.

The example uses an array blocking queue of capacity 5, created in the main method by

BlockingQueue<NamedInt> queue = new ArrayBlockingQueue<NamedInt> (5);

Algorithm for BlockingQueue

As I said, at present you don't actually need to know how to implement the algorithm for these buffers, since the BlockingQueue classes do that for you. However, it is a very standard algorithm and you may one day have need to implement it yourself. If you are interested you can find a short description on the API page for java.util.concurrent.locks.Condition. It uses one lock and two conditions, created as
   final Lock lock = new ReentrantLock();
final Condition notFull = lock.newCondition();
final Condition notEmpty = lock.newCondition();

For a more detailed description, you can look at the actual code for ArrayBlockingQueue. Find that in the projects pane of netBeans under Libraries>>JDK 1.5 (Default)>>rt.jar>>java.util.concurrent. You will find the basic operations are in the private methods extract and insert. The public methods such as take and put have been carefully optimised. For example, they use a method signal instead of signalAll. The difference is that signal only wakes up one thread waiting on a condition, instead of all of them. This is a little bit more efficient but requires more careful coding in case the thread woken up has been interrupted and so no longer wants to use the lock - so it has to signal the next thread.

Other examples from the library

Other useful classes can be found in the java.util.concurrent package. You do not need to know them all now, but be aware of their existence and try to find out more about them as you develop your programming skills.

The class java.util.concurrent.Semaphore is a classic construct (invented by E. Dijkstra) described in lots of text books. The best way to think of a semaphore is that it protects a shared resource, in the same way as a lock does, but that there may be several units of that resource available. The semaphore controls permissions to access a unit of that resource.

The class java.util.concurrent.CyclicBarrier is for situations where a number of threads, at a certain point, have to wait for all the others to catch up before they carry on.

Thread safety

Last week we saw different versions of a bank account class. The later versions were thread safe. The instances could safely be used by different threads running concurrently, because access to the balance field was protected with locks by all the methods.

The first version was not thread safe. If different threads, running concurrently, accessed a bank account instance then they might interfere with each other because of race conditions.

You might conclude that every class should be thread safe and protect its fields with locks. However, this is not always necessary. The problem arises only when more than one thread may share access to something. If you know for sure that only one thread needs access to that resource, then it is simpler and more efficient not to enforce thread safety.

Many classes in the Java library are not thread safe (though the API does not always make the situation clear). We shall look at two particular examples, collections and Swing.

Note that the problem arises only when updates are possible for a class. If a class is immutable, then it is automatically thread safe. Immutable means "cannot be changed". So the field values cannot be changed. What is more, if the field values are references to other objects, then those objects cannot be changed in any way either. String is an example of an immutable class. So is NamedInt, from an earlier example.

Collections

In general, the collection classes are not thread safe although they are thread compatible - you can make them thread safe by using them with locks. Hence if you don't need the thread safety, you can use the simpler implementation that is not thread safe. We shall look at ArrayList as an example.

There are three levels of problem for thread interaction on array lists. Each has a different kind of critical region.
  1. When using the methods get and set to read and write elements, the array list behaves like an array. There are the usual interference problems for threads reading and writing the same elements. The critical region, just as for the bank accounts, is likely to be a get-calculate-set sequence. If you have a get on its own, it is best to make that a critical region because of the memory model rules.
  2. The methods add and remove change the structure of the array list. These can interfere in a worse way, since potentially the interference can leave the array list in an inconsistent state. The critical region here is just the add or remove.
  3. If one thread gets the size of an array list, and another thread calls add or remove, the size will no longer be correct. Similarly, indexes into the array list, or iterators over it, will be interfered with by other threads calling add or remove. The critical region is the piece of code where you call size() and then use your knowledge of the result; or where you use your knowledge of which elements have which indexes; or where you get the iterator and then use it.
In general, to make a collection object thread safe you must protect it with a lock, using lock; try{critical region}finally{unlock}.

Wrappers

A "wrapper" class for array lists (or other kinds of collections) is one that has the array list as one of its fields, but also has the necessary locks and redefines the methods to be thread safe, as far as possible. For example, its add method would first get the lock and then call add on the array list.

The class java.util.Collections provides some useful factory methods for creating instances of wrapper classes.

Collections.synchronizedList takes as parameter any object o (such as an array list) that compatible with the List interface, and returns another object that is compatible with List, and stores its data in o, but whose methods are thread safe.

An example given in the API for synchronizedList is

List list = Collections.synchronizedList(new ArrayList());

You can also do this using generic types, e.g.

List<String> list = Collections.synchronizedList<String>(new ArrayList<String>(...));

Each method on list now works the same way as it would have done for the unwrapped array list, but in a thread safe way, taking ownership of a lock.
Getting the lock on a wrapper object
That deals with simple method calls, but you also need to know how to get the lock for larger critical regions such as using an iterator. Again, there is an example in the API.

  synchronized(list) {
Iterator i = list.iterator(); // Must be in synchronized block
while (i.hasNext())
foo(i.next());
}
To understand this, you must know that the lock used to protect the synchronized list is of the old kind, before the new java.util.concurrent.locks features of Java 1.5. Every object has an old-style lock associated with it. You never call lock() or unlock() with those, but instead say

synchronized( object ) {
    critical region
}

and this means

get old-style lock on object;
try {
    critical region
} finally {
    release old-style lock on object
}
Example
The example is in class ArrayListTest. It creates an array list of Integers, and starts three threads on it. One adds elements, one removes them, and one prints all the elements. Because ArrayList is not thread safe, there are various exceptions that can arise.

IndexOutOfBoundsException arises in the thread that adds elements, because that thread also calls set on the last element. If there is a remove between the add and the set, then that last element no longer exists.

ConcurrentModificationException arises in the thread that prints elements, which uses an iterator. Iterators can often detect if other threads have added or removed elements, and then the iterator throws an immediate exception. (This is not reliable for production versions of your software, but can be useful for debugging.)

The thread safe version is ArrayTestList1. It uses a wrapped array list, and the run methods have synchronized blocks.
Java 1.5 notes
The example uses some new features of Java 1.5.

1. Integer is (and has always been) a class for objects with a single int field, and so converts the primitive type int into a reference type Integer. It is useful in collections, because the elements of a collection have to be of a reference type. Java 1.5 now has automatic conversion between Integer and int - this is called "automatic boxing and unboxing" -, and you can see this used in the example.

2. Java 1.5 has a new form of for loop that makes it much easier to use iterators. In the example, the printing thread Lister has a for loop

for (Integer n: list) {
    System.out.print(n + ", ");
}

Read the colon ":" here as in, so "for Integer n in list". The new for loop is equivalent to

Iterator<Integer> it = list.iterator();
while (it.hasNext()) {
    System.out.print(it.next() + "");
}

Thread safe collections

You have already seen BlockingQueue, which is thread safe. There are some others in java.util.concurrent, such as ConcurrentLinkedQueue and ConcurrentHashMap.

Swing

Swing is not thread safe. This is deliberate. If Swing components used locks then the event dispatch thread might sometimes be blocked, and that risks freezing the GUI. In fact when people have designed thread safe GUIs, they have often found that it is easy to deadlock them. Instead, the general rule is to manipulate Swing components only on the event dispatch thread.

The repaint-paint pattern is a good example of this. Repaint can be called on any thread, but it doesn't directly do the repainting. Instead a paint is put on the queue of jobs for the event dispatch thread to execute.

The method javax.swing.JComponent.revalidate() is similar. It puts on the event dispatch queue a job to layout this component and its containers.

There are a few exceptions to this rule. setText can normally be called from any thread. Also, usually, if a component has not yet been set visible or added to a visible container, then any thread (typically the main thread when it creates objects) can work safely on it. [Though Sun now say even this is not completely reliable.  Their tutorial page at http://java.sun.com/docs/books/tutorial/uiswing/misc/threads.html recommends creating Swing components on the event dispatch thread.]

There are thus two opposing considerations.
There are many situations when a lengthy computation involves Swing components. The thread that is performing the lengthy computation therefore needs to be able to request the event dispatch thread to do the Swing manipulations. This is normally done using a static method invokeLater in class javax.swing.SwingUtilities. To use this, first create a run-object (its class implements Runnable) whose run method does the manipulation, and then use it as the parameter for a call of invokeLater. This will put the task into the queue of jobs for the event dispatch thread.

An alternative method to use is invokeAndWait. This is similar to invokeLater, but the thread calling it waits until the event dispatch thread has done the task.

Some examples of code can be found on the Sun tutorial page mentioned above. Here are some typical situations.

SwingWorker

A common pattern is for a lengthy computation on one thread to be followed up with a short computation on the event dispatch thread. This has been made into an abstract class SwingWorker. It is not in the libraries, but is available from Sun. To use it you extend it, overriding two methods construct and finished.

public Object construct() is executed on a thread specially created for each SwingWorker instance. The object it returns can be accessed by a method get().

public void finished() is executed on the event dispatch thread after construct has finished.

When you have created a SwingWorker instance, you must then call start() on it to start its thread executing construct().

Summary