Streams in general
Streams are one of the canonical data structures of computer science, like lists and trees. A stream is a potentially infinite list.
The two most important operations on a stream are reading its first element, and testing whether there is more data. A stream is only potentially infinite, so it may end at some point; we have to stop reading from it when that is the case.
An example of a stream that is finite is a file on disk, as the disk is only so big. An example of an infinite stream is user input or a network connection, which could in principle continue to produce data forever.
In some advanced programming languages, notably Haskell, we can program with streams as if they were lists; these are so-called lazy lists. The implementation is lazy in the sense that only as much of the list as is needed is computed, say the first n elements. Otherwise, an infinite list would always lead to an infinite loop and no output. For instance, standard Haskell programming examples include the list of all Fibonacci numbers. The ease of dealing with such infinite data structures is advocated as a major selling point of lazy functional languages.
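The same idea can be imitated in Java, the language used in the rest of these notes. The following is only a sketch: the class name FibonacciStream and the helper take are made up for illustration. It models a stream by the two operations named above, testing for more data (hasNext) and reading the next element (next), and it computes only as many Fibonacci numbers as are actually requested.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical illustration: the infinite stream of Fibonacci numbers. */
final class FibonacciStream implements Iterator<BigInteger> {
    private BigInteger current = BigInteger.ZERO;
    private BigInteger next = BigInteger.ONE;

    /** This particular stream is infinite, so there is always more data. */
    @Override
    public boolean hasNext() {
        return true;
    }

    /** Computes one more element on demand and advances the stream. */
    @Override
    public BigInteger next() {
        BigInteger result = current;
        BigInteger sum = current.add(next);
        current = next;
        next = sum;
        return result;
    }

    /** Reads at most n elements from any stream, stopping early if it ends. */
    static <T> List<T> take(int n, Iterator<T> stream) {
        List<T> prefix = new ArrayList<>();
        while (prefix.size() < n && stream.hasNext()) {
            prefix.add(stream.next());
        }
        return prefix;
    }

    public static void main(String[] args) {
        // prints [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
        System.out.println(take(10, new FibonacciStream()));
    }
}
```

Because take stops after n elements, the program terminates even though the stream itself never ends.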
Streams in Java
Streams in Java are similar to files in C and Unix. In C, one can read characters from a file until a value of -1 (EOF) is returned, indicating the end of the file. In Unix, programs can be combined using pipes, so that the output of one becomes the input of the next; this is similar to processing lazy lists.
Whereas Unix treats everything as a stream of bytes, streams in Java form an abstraction hierarchy. In the abstraction layers, more and more complex objects get assembled from underlying bytes.
The APIs for streams are so voluminous that there is little point in trying to cover them in a lecture. You need to consult the documentation as needed. There is a great deal of material available at Oracle (formerly Sun). See the tutorial on Java IO.
In an imperative language like Java, streams can be used for reading and writing. There is a class hierarchy below InputStream for byte input streams and below OutputStream for byte output streams. For Unicode characters, the corresponding class hierarchies are rooted at Reader and Writer. There are also DataInputStream and DataOutputStream for the primitive Java types, such as integers and floats.
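As a small sketch of the data streams mentioned above, the following writes an int and a double into an in-memory byte array and reads them back. The class name PrimitiveRoundTrip and its methods are invented for this illustration, and wrapping IOException in UncheckedIOException is a simplification that is only reasonable here because no real device is involved.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical illustration of DataOutputStream and DataInputStream. */
final class PrimitiveRoundTrip {
    /** Encodes an int (4 bytes) followed by a double (8 bytes). */
    static byte[] encode(int i, double d) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(i);
            out.writeDouble(d);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Decodes the values in the same order in which they were written. */
    static String decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int i = in.readInt();
            double d = in.readDouble();
            return i + " " + d;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = encode(42, 1.5);
        System.out.println(data.length + " bytes: " + decode(data));
    }
}
```

The reader has to know the order and types of the values; the byte stream itself carries no type information.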
For Reader and Writer, here is some of the class hierarchy:
Reader > BufferedReader
Reader > InputStreamReader > FileReader
Writer > BufferedWriter
Writer > OutputStreamWriter > FileWriter
Writer > PrintWriter
The constructors build Unicode character streams from byte streams, e.g.
new InputStreamReader(System.in)
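To make the step from bytes to characters concrete, here is a sketch that decodes an in-memory byte array instead of System.in. The class name DecodeBytes is made up, and the choice of UTF-8 is an assumption; the one-argument constructor shown above uses the platform default encoding instead.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration: an InputStreamReader turns bytes into characters. */
final class DecodeBytes {
    /** Reads all characters that the reader decodes from the given UTF-8 bytes. */
    static String decodeUtf8(byte[] data) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            int c;
            // read() returns one character as an int, or -1 at the end of the stream
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // "caf\u00e9" has 4 characters but needs 5 bytes in UTF-8
        byte[] data = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(data.length + " bytes, "
                + decodeUtf8(data).length() + " characters");
    }
}
```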
Files can be opened with
new FileWriter("filename.txt");
First example: echoing a stream in Java (from oracle.com)
FileReader inputStream = null;
FileWriter outputStream = null;
inputStream = new FileReader("xanadu.txt");
outputStream = new FileWriter("characteroutput.txt");
int c;
// read() returns the next character as an int, or -1 at the end of the stream
while ((c = inputStream.read()) != -1) {
    outputStream.write(c);
}
Buffering is a more efficient way to access IO devices. We use constructors to construct the buffered streams:
BufferedInputStream(InputStream in);
builds a buffered input stream. Analogously, we have
BufferedOutputStream(OutputStream out);
The example from oracle.com continued:
inputStream =
new BufferedReader(new FileReader("xanadu.txt"));
outputStream =
new BufferedWriter(new FileWriter("characteroutput.txt"));
Output buffers may have to be flushed explicitly by calling flush(); closing a buffered stream also flushes it.
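The effect of flushing can be observed with a sketch like the following, which writes through a BufferedWriter into an in-memory StringWriter. The class name FlushDemo is invented, and the demonstration assumes the text is shorter than the buffer, so that nothing is written through before flush() is called.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;

/** Hypothetical illustration: buffered output is held back until it is flushed. */
final class FlushDemo {
    /** Returns what the underlying writer has received before and after flush(). */
    static String[] beforeAndAfterFlush(String text) {
        StringWriter target = new StringWriter();
        // No operating system resource is involved, so the writer is not closed here.
        BufferedWriter buffered = new BufferedWriter(target);
        try {
            buffered.write(text);               // goes into the buffer only
            String before = target.toString();
            buffered.flush();                   // pushes the buffer to the target
            String after = target.toString();
            return new String[] { before, after };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] result = beforeAndAfterFlush("hello");
        System.out.println("before flush: \"" + result[0] + "\"");
        System.out.println("after flush:  \"" + result[1] + "\"");
    }
}
```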
A scanner builds another abstraction layer. We use the scanner as an iterator:
Scanner s = null;
try {
    s = new Scanner(new BufferedReader(new FileReader("xanadu.txt")));
    // by default, the scanner splits its input into tokens at whitespace
    while (s.hasNext()) {
        System.out.println(s.next());
    }
} finally {
    if (s != null) {
        s.close();
    }
}
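A scanner can also convert tokens to primitive types as it iterates. The following sketch scans a string rather than a file to stay self-contained; the class name ScannerSum and the sample input are made up for illustration.

```java
import java.util.Scanner;

/** Hypothetical illustration: using a Scanner as a typed iterator over tokens. */
final class ScannerSum {
    /** Adds up all whitespace-separated tokens that are integers. */
    static int sumOfInts(String input) {
        int sum = 0;
        try (Scanner s = new Scanner(input)) {
            while (s.hasNext()) {
                if (s.hasNextInt()) {
                    sum += s.nextInt();   // token is an integer: consume it as an int
                } else {
                    s.next();             // any other token: consume and ignore it
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // prints 17
        System.out.println(sumOfInts("3 apples and 4 pears, 10 total"));
    }
}
```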
Regular expression matching
Regular expressions were introduced by Stephen Kleene in 1956 as a purely theoretical model of computation. The theory of regular expressions will be covered in some more depth in Models of Computation.
Since Ken Thompson's work on Unix, regular expressions have been widely used in practice, particularly in Unix tools (grep, sed), compiler construction (lex), and scripting languages such as Perl.
In Java, we do not need any additional tools, as regular expression
matchers can be constructed using java.util.regex.
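As a minimal sketch of java.util.regex, the following compiles a pattern once and uses a matcher to find all occurrences in a string. The class name RegexDemo and the pattern for decimal numbers are chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of Pattern and Matcher. */
final class RegexDemo {
    // One or more digits, a literal dot, one or more digits.
    private static final Pattern DECIMAL = Pattern.compile("[0-9]+\\.[0-9]+");

    /** Collects every substring of the text that matches the pattern. */
    static List<String> findDecimals(String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = DECIMAL.matcher(text);
        while (m.find()) {           // advance to the next match, if any
            matches.add(m.group());  // the text of the current match
        }
        return matches;
    }

    public static void main(String[] args) {
        // prints [3.14, 2.718]
        System.out.println(findDecimals("pi is 3.14 and e is 2.718, not 3"));
    }
}
```

Note that the backslash has to be doubled inside a Java string literal, so the regular expression \. is written as "\\.".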
Oracle has tutorials on regular expression matching.
You may find this applet for testing reg exps useful.
Regular expressions can be compiled into finite automata which can then be run to match patterns very efficiently.
Strangely, the Java reg exp implementation uses an inefficient backtracking algorithm instead. If you are interested in efficiency, see this article.
Exception handling and IO operations
Stream operations can cause IO exceptions. A typical idiom is the
finally clause wrapped around the stream operations. The
finally clause makes sure that the streams are closed
whenever control leaves the block of code, even by way of an
exception. That way, all streams are closed. If too many streams are
left open, we are leaking operating system resources.
With the finally clause, our first example above becomes:
FileReader inputStream = null;
FileWriter outputStream = null;
try {
    inputStream = new FileReader("xanadu.txt");
    outputStream = new FileWriter("characteroutput.txt");
    int c;
    while ((c = inputStream.read()) != -1) {
        outputStream.write(c);
    }
} finally {
    // runs whether the block above completes normally or throws
    if (inputStream != null) {
        inputStream.close();
    }
    if (outputStream != null) {
        outputStream.close();
    }
}
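Since Java 7, the same guarantee can be obtained more compactly with the try-with-resources statement, which closes the declared streams automatically. This is an alternative to the idiom above, not the version from the Oracle tutorial; the class name CopyCharacters and the method copy are invented for this sketch.

```java
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

/** Hypothetical illustration: copying characters with try-with-resources. */
final class CopyCharacters {
    /** Copies the file named from to the file named to, one character at a time. */
    static void copy(String from, String to) throws IOException {
        // Both streams are closed automatically, in reverse order of declaration,
        // whether the loop finishes normally or an exception is thrown.
        try (FileReader in = new FileReader(from);
             FileWriter out = new FileWriter(to)) {
            int c;
            while ((c = in.read()) != -1) {
                out.write(c);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java CopyCharacters <from> <to>");
            return;
        }
        copy(args[0], args[1]);
    }
}
```

Calling copy("xanadu.txt", "characteroutput.txt") would correspond to the example above.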
Object serialization
We can write arbitrary objects using serialization. The main point is that the implementation generates "serial numbers" for references.
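The role of these reference numbers can be seen in a sketch like the following, which serializes a list containing the same object twice into an in-memory byte array. After reading the list back, both entries still refer to one shared object rather than two copies. The names SharingDemo, Point and roundTrip are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration: serialization preserves sharing between references. */
final class SharingDemo {
    /** Example class; implementing Serializable marks it as writable. */
    static final class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    /** Writes an object to a byte array and reads a copy back. */
    static Object roundTrip(Object original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Checks that a list holding one object twice is restored with one shared object. */
    static boolean sharingPreserved() {
        Point p = new Point(1, 2);
        List<Point> twice = new ArrayList<>();
        twice.add(p);
        twice.add(p);
        @SuppressWarnings("unchecked")
        List<Point> copy = (List<Point>) roundTrip(twice);
        return copy.get(0) == copy.get(1)   // still one shared object
                && copy.get(0) != p         // but a copy, not the original
                && copy.get(0).x == 1 && copy.get(0).y == 2;
    }

    public static void main(String[] args) {
        System.out.println("sharing preserved: " + sharingPreserved());
    }
}
```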
Sockets give access to the network via streams. Java objects can be sent over the network using serialization. Sockets will be covered in more detail in SSC2.
NB Some I/O topics addressed last year in SSC1, such as polling, will not be covered this year.