HW1 Expectation Maximization
Due Sunday, January 29 at 11:59pm
The purpose of this homework is to help you familiarize yourself with MATLAB and
explore the properties of the Expectation-Maximization algorithm using Gaussian
Probability Density Functions.
For info on EM and Gaussian mixture models, see Andrew Ng's lecture notes, Bishop pp. 430-439, or the slides posted on the syllabus. I find the first to be the clearest.
You will accomplish this project with the following steps:
- Build a model. Randomly generate a Gaussian mixture model that generates two-dimensional data points. First, randomly generate two mean vectors and two
covariance matrices to represent two Gaussian models: 1 and 2. Also randomly generate the prior probabilities P(1) and P(2) = 1 - P(1), which control the probability of picking each of the two Gaussians. Make sure each covariance matrix is symmetric and positive definite. You can find recipes for generating such matrices on the web.
- Generate data. Generate a set of random data to use as training samples.
You will likely find the MATLAB function mvnrnd helpful.
- Implement the Expectation Maximization algorithm.
You may want to use the following MATLAB functions: mvnpdf, mean, and cov.
You will also need to understand for loops and the
different ways MATLAB provides for indexing arrays/matrices of data. Avoid for loops where possible; they're slow in MATLAB. For more info on this, see the MATLAB tutorials linked on the syllabus, or this on improving the speed of MATLAB code via vectorization.
- Create plots. See help plot. You must show the original
data points with different colors depending on which Gaussian generated them, and then show the data points classified according to
which Gaussian your model says is most likely to have generated them.
You must also plot the accuracy of the means and other parameters
found by your algorithm, relative to the ground-truth parameters, over several trials (generate multiple sets of data using different models).
- Describe your work in a well-written report. This should not be very long (two pages or less, unless you have a very compelling
reason to wax verbose). You can use whatever method you want to generate the report (LaTeX, MS Word, etc.), but the document you submit must be in PDF format.
Mention implementation details (parameters used, etc.) and explain your plots. You don't need to
explain the whole EM algorithm. For this and all assignments, all plots must have titles, axis labels, and captions briefly describing the plot!
- Submit. Submit your assignment to dkit before the deadline using turnin, as described on the class homepage.
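
For the "Build a model" and "Generate data" steps above, one possible starting point is sketched below. It is only an illustration, not the required approach: the variable names (mu, Sigma, prior, labels, X), the sample count, the scale of the means, and the A*A' + ridge recipe for a positive definite matrix are all assumptions made here. mvnrnd requires the Statistics Toolbox.

```matlab
% Minimal sketch: a random 2-component Gaussian mixture in 2-D plus samples.
% Every name and constant below is an illustrative assumption.
rng(0);                      % fix the seed so runs are repeatable
d = 2;                       % data dimension
K = 2;                       % number of Gaussians
N = 500;                     % number of training samples (arbitrary)

mu    = 5 * randn(K, d);     % one random mean vector per row
Sigma = zeros(d, d, K);      % one covariance matrix per page
for k = 1:K
    A = randn(d);
    % A*A' is symmetric positive semidefinite; the small ridge makes it
    % strictly positive definite.
    Sigma(:, :, k) = A * A' + 0.1 * eye(d);
end

p1    = 0.2 + 0.6 * rand;    % P(1), kept away from 0 and 1
prior = [p1, 1 - p1];        % P(2) = 1 - P(1)

% Choose a Gaussian for each sample, then draw from the chosen Gaussian.
labels = 1 + (rand(N, 1) > prior(1));   % 1 with probability P(1), else 2
X = zeros(N, d);
for k = 1:K
    idx = (labels == k);                % samples generated by Gaussian k
    X(idx, :) = mvnrnd(mu(k, :), Sigma(:, :, k), nnz(idx));
end
```

Keeping labels around is what lets you color the original points by their generating Gaussian later. The loop over k runs only twice, so it costs nothing; the loops worth avoiding are the ones over all N samples.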
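
To illustrate the advice above about avoiding for loops, the sketch below does the same row-wise normalization twice, once with a loop and once vectorized, and then uses logical indexing to select rows. The matrices W and X are made-up toy data, and all names are assumptions for illustration; this is not a full EM implementation.

```matlab
% Toy data (made up): W holds nonnegative per-sample scores, one column per
% Gaussian; X holds 2-D points.
rng(1);
N = 1000;
W = rand(N, 2);
X = randn(N, 2);

% Loop version: normalize each row of W so that it sums to 1.
R_loop = zeros(N, 2);
for n = 1:N
    R_loop(n, :) = W(n, :) / sum(W(n, :));
end

% Vectorized version: divide every row by its row sum in one call.
% (Newer MATLAB releases also accept W ./ sum(W, 2).)
R = bsxfun(@rdivide, W, sum(W, 2));

% Weighted mean of the points under the column-1 weights, as one product.
m1 = (R(:, 1)' * X) / sum(R(:, 1));

% Hard assignment: index of the larger weight in each row.
[~, guess] = max(R, [], 2);

% Logical indexing: all points assigned to Gaussian 1, with no loop.
X1 = X(guess == 1, :);
```

The two versions give the same R up to rounding; the vectorized one simply hands the whole array to MATLAB's built-in routines instead of iterating in interpreted code.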
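
For the "Create plots" step above, a plot colored by label, with the required title and axis labels, could be sketched as follows. The data here are toy points made up for the example, and the colors, marker style, and label text are arbitrary choices.

```matlab
% Toy data (made up): two clumps of 2-D points and the clump each came from.
rng(2);
X      = [randn(100, 2); randn(100, 2) + 4];
labels = [ones(100, 1); 2 * ones(100, 1)];

figure;
plot(X(labels == 1, 1), X(labels == 1, 2), 'r.');   % points from Gaussian 1
hold on;
plot(X(labels == 2, 1), X(labels == 2, 2), 'b.');   % points from Gaussian 2
hold off;
axis equal;
title('Training data colored by generating Gaussian');
xlabel('x_1');
ylabel('x_2');
legend('Gaussian 1', 'Gaussian 2', 'Location', 'best');
```

Swapping labels for the assignments produced by your model gives the second required plot with the same code.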