Machine Learning – 2011, Autumn term
1. Basic Notions of Learning, Introduction to Learning Algorithms, Literature
2. Probabilistic Models of Sequences. Worksheet (model answers included in the RL1 model answers)
You can read more about the Web link prediction example here.
3. Reinforcement Learning I. Worksheet1 Solutions
4. Reinforcement Learning II. Worksheet2 Solutions
5. Probabilistic Latent Semantic Analysis [check related Practical Assignment 3 below - code provided!]
6. Instance-Based Learning (k-nearest neighbour, Case-based Reasoning) Worksheet5 Solutions
7. Decision Tree Learning Worksheet4 Solutions [check related Practical Assignment below - no programming]
8. Intro to Bayesian Learning: Bayesian classification. Worksheet3 Solutions
9. Independent Component Analysis. The cocktail party problem (demo). [check related Practical Assignment 4 below – code provided!]
10. Clustering and Visual Data Analysis K-means MatLab code: K-means, calling script, distance function, another function needed
11. Support vector machines Worksheet Solutions
Types of assignments:
1) There are exercises set on Worksheets (see
above). The deadline for handing in solutions to any of these is before the
next class. We usually solve these exercises in the tutorial classes, hence the
tight deadline.
Note that Worksheet-type questions may
appear in the exam, so it is highly recommended that you make an effort
to solve them yourself.
Easiest way to hand in: directly to me before the
class.
2) There are also practical exercises set for you (see
below), which may involve some programming work and using machine-learning
techniques. For these, the deadline is the end of term.
These assignments allow you to ‘get your hands dirty’ and gain experience with
how these methods work. Please give them a try before complaining about the
module being too ‘theoretical’.
Choices and restrictions on continuous assessment
ML: Most exercises are worth 5%, so to gather 20%
you will need to complete 4 pieces of work.
ML-EXTENDED: to get 40%, you will need to complete 8 pieces
of work. Of these, 20% (equivalent to 4 pieces) MUST be chosen from those
marked with “EXT”.
ALL: You can choose which pieces of work you want to
put forward for marking. To make the most of this module you should try each
exercise. I will mark all the work that you submit and will take the best 4 (8
for ‘extended’) marks from those.
Feedback
You get feedback straight away on your attempts at
solving the Worksheet exercises, as we solve them in class.
You will also get feedback by getting your marked work
returned to you within 2 weeks. If you fail to collect your marked work in
class, you should come to my office (during my office hour) to collect it.
In addition, ask me questions any time during
lectures, tutorials and office hours about anything you found unclear. I am happy
to help those of you who are interested in learning. But please prepare concrete
questions. Don’t expect me to repeat a whole lecture or to solve your
exercises!
Important note
For homework problems or programming assignments you
are allowed to discuss the problems or assignments verbally with other class
members, but under no circumstances can you look at or copy anyone else's
written solutions or code relating to homework problems or programming
assignments. All problem solutions submitted must be material you have
personally written during this term. Failure to adhere to this policy can
result in a student receiving a failing grade in the class.
Handing in procedure
Submit your work on paper. You can hand in any time
during the term, until the last day of the term. Don’t forget to put your
student ID on it.
Practical Assignment 1 [EXT]. [a) 5% b) 5% c) 5%] Prediction using probabilistic sequence models.
Download the data (in MatLab .mat format) that you can use for this exercise.
The same data is also provided in text format (file1, file2) in case you want
to use another programming language of your choice. Alternatively, you can use
any symbolic sequence data set you like. If you go for the last option, please
consult me first.
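If you work from the text files in a language other than MatLab, one simple baseline to start from is a first-order Markov chain: count how often each symbol follows each other symbol, normalise the counts into transition probabilities, and predict the most probable successor. The sketch below is only an illustration under stated assumptions, not the required solution: the function names `fit_markov_chain` and `predict_next`, the toy sequences and the additive smoothing constant are all hypothetical and do not come from the course data.

```python
def fit_markov_chain(sequences, alphabet, smoothing=1.0):
    """Estimate P(next | current) from symbol sequences.

    `smoothing` is an additive (Laplace) constant, an assumption made here so
    that symbols with no observed successor still get a valid distribution.
    """
    counts = {a: {b: smoothing for b in alphabet} for a in alphabet}
    for seq in sequences:
        # Count every adjacent pair (current symbol, next symbol).
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    probs = {}
    for a in alphabet:
        total = sum(counts[a].values())
        probs[a] = {b: counts[a][b] / total for b in alphabet}
    return probs


def predict_next(probs, current):
    """Return the most probable successor of `current` under the model."""
    return max(probs[current], key=probs[current].get)


# Hypothetical toy data standing in for the real sequence files.
toy_sequences = ["abcabcabc", "abcabd"]
alphabet = sorted(set("".join(toy_sequences)))
model = fit_markov_chain(toy_sequences, alphabet)
prediction = predict_next(model, "a")
print("after 'a' the model predicts:", prediction)
```

Higher-order models condition on more than one previous symbol; the same counting idea carries over, with the context as the dictionary key.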
Practical Assignment 2 [EXT]. [5%] Write a program
that implements Q-learning in non-deterministic environments, for finding the
optimal action plan for the situation described on Worksheet2 above.
You may use your favorite programming language. Write up approximately 2 pages about the
data structures you have used, the way you have chosen the actions during
learning, and the results obtained by running your program. In
particular, the following are of interest to show:
- Plot the evolution of (s,a) values (Q-values) against iterations in any form
you find suggestive so as to show the convergence of the algorithm.
- On a separate figure, plot the evolution of the cumulative reward against
iterations.
- Give the Q-table obtained after convergence.
- Comment on all these plots, i.e. explain in words what you see from these
figures.
Hand in your write-up, not your code.
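To make the expected ingredients concrete, here is a minimal sketch of tabular Q-learning with a visit-dependent learning rate, which is one common way to handle non-deterministic transitions. Everything specific in it is an assumption for illustration: the environment is a hypothetical noisy four-state chain (not the Worksheet2 situation), actions are chosen epsilon-greedily, and the learning rate is taken as 1/(1 + visits(s,a)). Your own program should use the Worksheet2 setting and whatever exploration scheme you can justify.

```python
import random


def toy_step(state, action, rng):
    """Hypothetical noisy chain 0-1-2-3: the chosen move succeeds with
    probability 0.8, otherwise the agent stays where it is."""
    if rng.random() < 0.8:
        state = min(state + 1, 3) if action == "right" else max(state - 1, 0)
    reward = 10.0 if state == 3 else 0.0
    return state, reward


def q_learning(step, states, actions, start, terminal,
               episodes=2000, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in states for a in actions}
    visits = {(s, a): 0 for s in states for a in actions}
    cumulative_reward, total = [], 0.0
    for _ in range(episodes):
        s = start
        for _ in range(100):  # cap on episode length
            if s in terminal:
                break
            if rng.random() < epsilon:
                a = rng.choice(actions)  # explore
            else:
                # Exploit, breaking ties between equal Q-values at random.
                best = max(Q[(s, b)] for b in actions)
                a = rng.choice([b for b in actions if Q[(s, b)] == best])
            s_next, r = step(s, a, rng)
            visits[(s, a)] += 1
            alpha = 1.0 / (1 + visits[(s, a)])  # decaying learning rate
            target = r + gamma * max(Q[(s_next, b)] for b in actions)
            Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * target
            total += r
            s = s_next
        cumulative_reward.append(total)  # one point per episode, for plotting
    return Q, cumulative_reward


Q, rewards = q_learning(toy_step, states=[0, 1, 2, 3],
                        actions=["left", "right"], start=0, terminal={3})
for s in [0, 1, 2]:
    print(s, {a: round(Q[(s, a)], 2) for a in ["left", "right"]})
```

Storing the Q-table after every episode (or every few updates) gives you the data for the convergence plot, and `rewards` gives the cumulative-reward curve.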
Practical Assignment 3 – Probabilistic Latent Semantic Analysis [5%] I prepared a term by
document matrix of a subset of the 20Newsgroups text collection for you
together with the associated dictionary of terms: data file (100
terms x 348 docs); dictionary
file (the 100 terms listed). Use my MatLab implementation
of PLSA (or alternatively implement your own in your favorite programming
language) to seek 4 topics in this data. Use the relevant parameters returned
by the algorithm to list the 10 most probable words that characterize these
topics. Try also to search for 5 or more topics. Write up your findings (1-2
pages). What topics can you identify in this document collection? How is the
presence of these topics distributed across the documents?
You may find the following useful if you choose to solve this exercise in MatLab:
- Have a look at the parameter arrays involved. An array (matrix) M can be
plotted e.g. like this:
>> imagesc(1-M); colormap gray
- To load the dictionary file '4news_dictionary.txt' into the MatLab
workspace, use the following:
>> [ignore, terms]=textread('4news_dictionary.txt','%d %s',-1);
- When writing your code to list the most probable words in each topic, be
aware that MatLab has a built-in function 'sort' for sorting, which you can use
without having to code it from scratch. Type 'help sort' to find out more about
this function.
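If you implement the exercise in Python instead, the same "sort, then take the first few" idea can be sketched as below. This is an illustration under assumptions, not part of the provided code: it assumes the word-given-topic probabilities are stored with one row per topic and one column per term, and the function name `top_terms`, the five-word dictionary and the probability values are all made up for the example.

```python
def top_terms(p_term_given_topic, terms, n=10):
    """For each topic (row), list the n terms with the highest probability."""
    result = []
    for row in p_term_given_topic:
        # Indices of the terms, ordered from most to least probable.
        order = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        result.append([terms[i] for i in order[:n]])
    return result


# Hypothetical toy dictionary and topic-term probabilities (2 topics x 5 terms).
toy_terms = ["space", "orbit", "god", "faith", "game"]
toy_p = [
    [0.40, 0.35, 0.05, 0.05, 0.15],
    [0.05, 0.05, 0.45, 0.40, 0.05],
]
top = top_terms(toy_p, toy_terms, n=2)
for k, words in enumerate(top):
    print("topic", k, ":", ", ".join(words))
```

With the real data you would pass the 100-term dictionary and the corresponding parameter array returned by PLSA, transposing it first if it is stored the other way round.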
For your convenience, I have added a small demo MatLab script which calls
PLSA and plots the variables involved. Here is a pretty figure
obtained by running this demo that illustrates the workings of the algorithm.
It shows how the initial terms x docs matrix is decomposed into the product of
a topics x terms matrix and a docs x topics matrix (up to transposition).
Practical Assignment 4 – ICA.
a) [5%] Download the
FastICA toolbox. Generate 4 signals using the ‘demosig’ function included in
the toolbox. Retain 2 of those signals only. Generate a random linear mixture.
In MatLab, this is accomplished with the commands below.
>>s=demosig; ind=[index1, index2]; s=s(ind,:);
>>A=rand(2); x=A*s;
Then use the software to try to recover the two
signals s. Repeat the experiment a few times, using a different pair of signals (out of
the overall 4) and each time testing different nonlinearities g (out of those
pre-set in the software). Record and report on at least one case [i.e. indexes of the
2 initial signals used and the g used] where the signal separation was
consistently successful and one other where it wasn’t.
b) [EXT] [5%] Re the
previous question, explain *why*. (To answer this, you would need to read
around the subject, starting from the tutorial and links given on the last page
of the handout and become familiar with basic statistical issues involved.)
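When judging whether a separation in part a) was "successful", keep in mind that ICA can only recover the sources up to their order, sign and scale, so comparing recovered and original signals sample by sample is misleading. One possible check, sketched below in plain Python, is the absolute correlation between each true source and its best-matching estimate. The two source signals, the helper names `correlation` and `separation_score`, and the "ideal" estimate are assumptions made for this illustration; they are not the 'demosig' signals, and the sketch performs the mixing step only, not ICA itself.

```python
import math
import random


def correlation(u, v):
    """Pearson correlation coefficient of two equally long signals."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


def separation_score(sources, estimates):
    """For each source, the best absolute correlation with any estimate.
    Values near 1 mean that source was recovered, whatever its order,
    sign or scale in the output."""
    return [max(abs(correlation(s, e)) for e in estimates) for s in sources]


rng = random.Random(0)
t = [i / 100 for i in range(500)]
s = [
    [math.sin(2 * math.pi * x) for x in t],   # sinusoid, 1 cycle per unit
    [2 * ((3 * x) % 1.0) - 1 for x in t],     # sawtooth, 3 cycles per unit
]
# Random 2x2 mixing matrix, playing the role of A=rand(2); x=A*s.
A = [[rng.random(), rng.random()], [rng.random(), rng.random()]]
x = [[A[i][0] * s[0][k] + A[i][1] * s[1][k] for k in range(len(t))]
     for i in range(2)]

# A perfect separation that is swapped, sign-flipped and rescaled.
ideal = [[-3.0 * v for v in s[1]], [0.5 * v for v in s[0]]]

score_mixed = separation_score(s, x)
score_ideal = separation_score(s, ideal)
print("raw mixtures:", [round(v, 3) for v in score_mixed])
print("ideal output:", [round(v, 3) for v in score_ideal])
```

Applying the same score to the output of the ICA software, over several runs, gives a simple way to document what "consistently successful" means in your report.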
Introduction to MatLab
Many Machine Learning methods and algorithms are already implemented and
available for download. As new solutions are continuously being designed, much of
this software is written in MatLab. MATLAB® is a high-performance
language for technical computing. It is easy to use. The name MATLAB stands for
matrix laboratory.
This is because in MatLab the basic type is the matrix. A scalar number is just
a 1 x 1 matrix! You can do operations with matrices in a single line of code.
To use MatLab, log in to your Unix account and start
MatLab by typing
matlab -nojvm
You will get a prompt like this:
>>
To get general help, type
>> help
To get help on a <command>, type
>> help <command>
To quit, type
>> quit
Here is a simple MatLab tutorial. It contains all you need for this module.
http://www.cyclismo.org/tutorial/matlab/
Here is another one -- in case you are still not convinced that MatLab is easy.
http://users.rowan.edu/~shreek/networks1/matlabintro.html
Here is a complete online MatLab help
http://www.mathworks.com/access/helpdesk/help/techdoc/learn_matlab/learn_matlab.html
Here is the MatLab manual. This is a 184-page book (see especially chapters 2 and
4).
http://www.mathworks.com/access/helpdesk/help/pdf_doc/matlab/getstart.pdf
By the end of the practical work in this module you
will be able to:
- use Machine Learning methods available as MatLab programs
- know how to apply them to real data analysis problems
- know how to look for help on a program or on a method