3rd Year UG, MSc in Advanced Computer Science
Imaging and Visualisation Systems
Course Material and Useful Links
P.Tino@cs.bham.ac.uk
Lecture Timetable and Handouts
Here is a preliminary outline of the structure and lecture timetable
for my part of the module.
I will develop most of the ideas on the blackboard.
You are encouraged to take notes during the lectures.
Any handouts used will be made available here as PDF files shortly after
the paper versions have been distributed.
| Week | Session 1 Tuesdays 10:00-11:00 | Session 2 Thursdays 12:00-13:00 |
| 1 | B. Hendley | B. Hendley |
| 2 | Dimensionality reduction of vectorial data. Basic concepts of vector and matrix algebras. | B. Hendley |
| 3 | Linear models. Principal Component Analysis I. | B. Hendley |
| 3 | Principal Component Analysis II. | B. Hendley |
| 4 | Nonlinear methods. Self-organizing topographic maps I. | B. Hendley |
| 5 | Self-organizing topographic maps II. | B. Hendley |
| 6 | Probabilistic approaches. Basic probability and statistics. | B. Hendley |
| 7 | Latent-space reformulations of topographic maps. Generative topographic mapping. | B. Hendley |
| 9 | Enhancing the information content of visualisation plots. | B. Hendley |
| 10 | Hierarchical visualisation | Fractal Images. Iterative function systems. |
| 11 | Mandelbrot and Julia sets. | Fractal modelling of real world objects. |
| 12 | Two Revision Lectures Covering the Whole Module |
Useful Links
Suggested reading + software
-
Principal Component Analysis
- Bishop: Section 8.6, Appendix E
- Tutorial by Lindsay Smith:
a gentle introduction to PCA with elementary vector and
matrix algebra
-
Self-Organizing Maps and Vector Quantization
-
Fractals
Assignment -
hand in during revision lectures on May 3
and May 5
Make yourself familiar with my implementation (in C) of SOM.
You are very welcome to use your own implementation, or
download working code from the web.
Un-tar the file som.tar.gz and go to the folder "SOM".
The subfolder "SOURCE" contains the C implementation of SOM, "som.c".
There is an example in the folder "GAUSS.2D".
Please consult the "read.me" file there.
You can choose any data set(s) from the list below.
Un-tar the relevant file and go to the corresponding folder.
The folder contains the data set, as well as additional information
about the data. Read the available information, especially the
description of the features (data dimensions).
You will need to clean the data, so that it contains
only numerical features (dimensions)
and the features are space-separated (not comma-separated).
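As a rough illustration of the cleaning step, here is a minimal Python sketch (the supplied SOM code is in C, but any language will do for preprocessing). It assumes comma-separated records with no header line and no missing values, and it keeps a column only if every record holds a number in it. The function names `is_number` and `clean_rows` and the two sample records are made up for this example; real data sets may need extra handling.

```python
def is_number(text):
    """Return True if text parses as a float (e.g. '3.5', '-1e3')."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def clean_rows(lines):
    """Turn comma-separated records into space-separated numerical records.

    A column is kept only if every record holds a number in it.
    """
    rows = [line.strip().split(",") for line in lines if line.strip()]
    if not rows:
        return []
    # Guard against ragged records: only look at columns present everywhere.
    n_cols = min(len(row) for row in rows)
    keep = [j for j in range(n_cols)
            if all(is_number(row[j].strip()) for row in rows)]
    return [" ".join(row[j].strip() for j in keep) for row in rows]


# Made-up records: two numerical features followed by a textual class name.
raw = [
    "5.1,3.5,setosa",
    "6.2,2.9,versicolor",
]
for line in clean_rows(raw):
    print(line)
```

Note that a textual class column dropped in this way is still useful: keep it separately as the basis for labelling the projected points.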
To make the plots informative, you should
come up with a labelling scheme for data points.
If the data can be classified into several classes
(find out in the data and feature description!), use that information
as the basis for your labelling scheme.
In that case exclude the class information
from the data dimensions.
Alternatively, you can make labels out of any dimension,
e.g. by quantising it into several intervals. For example,
if the data dimension represents age of a person,
you can quantise it into 5 labels (classes)
[child, teenager, young adult, middle age, old].
Associate the data labels with different markers and
use the markers to show what kind of data points get projected
to different regions of the visualization plot
(computer screen).
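The quantisation idea can be sketched in a few lines of Python. The interval boundaries, label names and marker symbols below are hypothetical choices for the age example; you should pick values that make sense for your own data.

```python
from bisect import bisect_right

# Hypothetical interval boundaries (in years) and label names.
AGE_BOUNDS = [13, 20, 35, 60]
AGE_LABELS = ["child", "teenager", "young adult", "middle age", "old"]
# One hypothetical plot marker per label.
MARKERS = {"child": "o", "teenager": "+", "young adult": "x",
           "middle age": "*", "old": "s"}


def quantise(value, bounds, labels):
    """Map a number to the label of the interval it falls into.

    bounds must be sorted and have one entry fewer than labels; a value
    equal to a boundary goes to the interval on the right.
    """
    return labels[bisect_right(bounds, value)]


ages = [4, 13, 19, 42, 77]
labels = [quantise(a, AGE_BOUNDS, AGE_LABELS) for a in ages]
print(labels)
print([MARKERS[name] for name in labels])
```

Remember to remove the dimension used for labelling from the data before projecting it, just as with class information.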
-
Learn as much as you can about an assigned
data set(s) using visualization methods
developed in the module, namely PCA and/or SOM.
-
Use various data labelling schemes.
-
Compare more
complex visualization schemes with straightforward co-ordinate projection
methods.
Before starting to work on the assignment,
please carefully study the example
I prepared using
the Boston database.
Un-tar the file
boston.ex.tar.gz and go to the folder
"BOSTON.EX".
The subfolder "FIGURES" contains all the relevant figures
as eps or gif files.
Please consult the "boston.read.me" file in BOSTON.EX.
The report should describe experiments with the chosen data set(s)
along the lines of the 'Boston example'.
In the labelling scheme, concentrate on more than
one coordinate (dimension).
For example, in the 'Boston example',
consider not just the 'price' feature,
but run separate experiments
with 'per capita crime rate in the town' or
'pupil-teacher ratio in the town' instead of the
'price' coordinate.
In the report concentrate on the following questions:
- How did you preprocess the data?
- What features (coordinates) did you use for labeling the
projected points with different markers?
- How did you design the labeling schemes?
- What visualisation techniques did you use?
- What interesting aspects of the data did you detect
based on the data visualisations?
You should demonstrate that you
- understand the visualisation techniques used
- are able to extract useful information about otherwise
hard-to-grasp high-dimensional data using
dimensionality-reducing visualisation techniques.
Data Visualisation using PCA and SOM
Try a
Java applet for an interactive PCA/SOM created by Daoxiao Jin.
Source code (tar+gzip)
Preparing for the exam - Sample Questions
-
Principal Component Analysis (PCA)
- Why is it good to concentrate on the variance/covariance structure of
multidimensional data?
- How can the variance/covariance structure be quantified?
Define variance of a random variable and covariance
of two random variables.
- How can the variance of a random variable X be estimated from a finite
sample of realizations of X?
- Consider two random variables X,Y that form a two-dimensional
vector random variable Z=(X,Y). How can the covariance between
X and Y be estimated given a finite sample of realizations of Z?
- What is the covariance matrix of a vector random variable?
- Define eigenvectors and eigenvalues of a square matrix A.
- Why are the eigenvectors/eigenvalues of the covariance matrix important?
- Write down the PCA algorithm.
- How can you quantify the amount of lost information
when projecting points onto a low-dimensional plane via PCA?
- What are the advantages/drawbacks of PCA?
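For self-study, the quantities mentioned in the PCA questions can be made concrete with a small Python sketch. It is not a model answer. It assumes two-dimensional data, so that the eigen-decomposition of the 2x2 sample covariance matrix can be written in closed form, and it uses the unbiased (divide by N-1) estimates. The function names and the sample points are made up for illustration.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def covariance(xs, ys):
    """Unbiased sample covariance; covariance(xs, xs) is the sample variance."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def pca_2d(points):
    """PCA of 2-D points via the closed-form eigen-decomposition of a 2x2 matrix.

    Returns ((lam1, lam2), direction) with lam1 >= lam2 and direction the
    unit eigenvector belonging to lam1 (the first principal direction).
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    a, b, d = covariance(xs, xs), covariance(xs, ys), covariance(ys, ys)
    # Eigenvalues of the symmetric covariance matrix [[a, b], [b, d]].
    half_trace = (a + d) / 2.0
    root = math.sqrt(((a - d) / 2.0) ** 2 + b ** 2)
    lam1, lam2 = half_trace + root, half_trace - root
    # Eigenvector for lam1 is (b, lam1 - a), unless b == 0, in which case the
    # coordinate axes are already the principal directions.
    if b != 0.0:
        v = (b, lam1 - a)
    else:
        v = (1.0, 0.0) if a >= d else (0.0, 1.0)
    norm = math.hypot(v[0], v[1])
    return (lam1, lam2), (v[0] / norm, v[1] / norm)


# Made-up sample of a 2-D random vector Z = (X, Y).
points = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0)]
(lam1, lam2), direction = pca_2d(points)
print("eigenvalues:", lam1, lam2)
print("first principal direction:", direction)
# The discarded eigenvalue measures the variance lost by a 1-D projection.
print("fraction of variance kept:", lam1 / (lam1 + lam2))
```

In higher dimensions the same steps apply, but the eigenvectors are found with a numerical routine rather than a closed formula.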
-
Self-Organizing Maps (SOM)
- What is the principal advantage of SOM over PCA?
- What is vector quantization (VQ)?
- What is the relation between SOM and VQ?
- What types of neighborhoods in the neural field can you think of?
- Why do we need to reduce both the neighborhood size and the learning rate?
- Write down the SOM algorithm.
- How would you quantify the amount of lost information
when projecting points onto a low-dimensional surface via SOM?
- What are the advantages/drawbacks of SOM?
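To see the pieces of the SOM questions working together, here is a minimal Python sketch of on-line SOM training. It is an illustration only and is not the C implementation supplied with the assignment. It assumes a one-dimensional chain of units, a Gaussian neighbourhood, and a learning rate and neighbourhood width that both shrink linearly over training; the function name, parameter names and default values are made up for this example.

```python
import math
import random


def train_som(data, n_units=5, n_epochs=50, lr0=0.5, sigma0=2.0, seed=0):
    """Train a 1-D chain of SOM units on a list of equal-length tuples.

    lr0, sigma0 and the linear decay schedule are illustrative choices.
    """
    rng = random.Random(seed)
    # Initialise each unit's weight vector to a randomly chosen data point.
    weights = [list(rng.choice(data)) for _ in range(n_units)]
    n_steps = n_epochs * len(data)
    step = 0
    for _ in range(n_epochs):
        for x in rng.sample(data, len(data)):  # one shuffled pass over the data
            frac = 1.0 - step / n_steps        # decays from 1 towards 0
            lr = lr0 * frac
            sigma = max(sigma0 * frac, 0.01)
            # Winner: the unit whose weight vector is closest to x.
            winner = min(range(n_units),
                         key=lambda i: sum((w - xi) ** 2
                                           for w, xi in zip(weights[i], x)))
            for i in range(n_units):
                # Neighbourhood is measured along the chain, not in data space.
                h = math.exp(-((i - winner) ** 2) / (2.0 * sigma ** 2))
                for k in range(len(x)):
                    weights[i][k] += lr * h * (x[k] - weights[i][k])
            step += 1
    return weights


# Made-up 2-D data with two clusters.
data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
for unit, w in enumerate(train_som(data)):
    print(unit, w)
```

With the neighbourhood width fixed at a very small value the update touches only the winner, which is plain on-line vector quantization.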
-
Fractal images
- How would you describe self-similar objects?
- What is an Iterative Function System (IFS)?
- What is an affine mapping?
- Why can IFS produce realistic images of e.g. plants?
- How is an image produced from a given IFS?
- What are complex numbers and how do we add and multiply them?
- Define the Mandelbrot set.
- In what ways can the Mandelbrot set be made visually appealing?
- Define the Julia set associated with a complex number c.
- In what ways can Julia sets be made visually appealing?
- What is the relation between the Mandelbrot set and Julia sets?
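The Mandelbrot and Julia questions both revolve around iterating z -> z*z + c, which the following Python sketch makes concrete using the built-in complex type. It is a rough illustration, not a model answer: the iteration limit, the plotted region and the function names are made-up choices, and a point is only treated as "in the set" if its orbit has not escaped within the iteration limit.

```python
MAX_ITER = 50  # illustrative limit; larger values give a sharper boundary


def escape_time(z, c, max_iter=MAX_ITER):
    """Count iterations of z -> z*z + c before |z| exceeds 2.

    Returns max_iter if the orbit has not escaped by then.
    """
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter


def in_mandelbrot(c):
    """Mandelbrot test: start at z = 0 and vary the parameter c."""
    return escape_time(0j, c) == MAX_ITER


def in_filled_julia(z0, c):
    """Filled Julia set test: fix c and vary the starting point z0."""
    return escape_time(z0, c) == MAX_ITER


def ascii_mandelbrot(width=60, height=20):
    """Crude character plot of the region [-2.1, 0.9] x [-1.2, 1.2]."""
    rows = []
    for j in range(height):
        y = 1.2 - 2.4 * j / (height - 1)
        row = ""
        for i in range(width):
            x = -2.1 + 3.0 * i / (width - 1)
            row += "#" if in_mandelbrot(complex(x, y)) else "."
        rows.append(row)
    return "\n".join(rows)


print(ascii_mandelbrot())
```

Colouring each point by its escape time instead of using two symbols is one common way of making such pictures more appealing.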
Problems to solve for those really interested
-
Principal Component Analysis
- Show how the need for eigen-decomposition of the data covariance
matrix arises from minimizing the distance between the data
points and their corresponding projections in a low-dimensional
linear subspace.
- What happens if there are two (or more) equal eigenvalues
of the covariance matrix?
- In general, the eigenvalues can be complex numbers. Why are the eigenvalues
of the covariance matrix always real numbers?
Try the models out! - Benchmark data sets
As you get familiar with different types of models,
try them out on benchmark data sets people in the
machine learning community have been using to support
their claims about yet another excellent learning system :-)
Here are two widely used data repositories that contain
data descriptions, the data itself and other useful things,
like previously obtained results.
DELVE -
Data for Evaluating Learning in Valid Experiments
UCI Knowledge Discovery in Databases Archive
Aims, Objectives, and Assessment
For formal details about the aims and objectives and assessment you should look at the official
Module Description Page
and Syllabus Page.
There are two components to the assessment of this module: a two-hour
examination (80%) and
a continuous assessment by mini-project report (20%).
As the material is developed
I will give you
ideas of the standard and type of questions you can expect
in this year's examination.
I will address questions related to the material covered in previous
lectures in great detail during the timetabled
Exercise Sessions.
Recommended Books
The Recommended Books for this module are:
| Title | Author(s) | Publisher, Date | Comments |
| Introduction to Visualisation and Virtual Environments | Chaomei Chen | Springer, 2002 | - |
| Data Visualization: The state of the art | Frits H. Post, Gregory M. Nielson, Georges-Pierre Bonneau | Kluwer Academic, 2002 | - |
| Fractals Everywhere | Michael F. Barnsley | Morgan Kaufmann, 2000 | Fractal image generation/compression mostly via iterative function systems. Highly recommended for mathematically minded students. |
| The Computational Beauty of Nature: Computer Explorations of Fractals, Chaos, Complex Systems, and Adaptation | Gary William Flake | Bradford Book, 2000 | A beautiful book exploring links between science and art. |
| Neural Networks: A Comprehensive Foundation | Simon Haykin | Prentice Hall, 1999 | Very comprehensive, a bit heavy in maths. |
| Neural Networks for Pattern Recognition | Christopher Bishop | Clarendon Press, Oxford, 1995 | Highly recommended for mathematically minded students. |
This page is maintained by
Peter Tino.
Last updated on 1 March 2004.