CS 343 Artificial Intelligence
Homework 4: Machine Learning and Natural Language Processing


Due: Dec 1, 2010
  1. Consider examples described using the following three binary-valued features:
    A:  0,1
    B:  0,1
    C:  0,1
    
    Show the decision tree ID3 learns from the training set of [A,B,C] examples:
    [1,0,0]: positive
    [0,0,1]: positive
    [0,0,0]: negative
    [0,1,0]: negative
    
    Explicitly show the gain of each feature at each choice point. If there is a tie in gain, prefer splitting on the feature that appears earliest in the list [A,B,C].
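
    A minimal sketch for checking the gain computations, assuming the standard information-gain definition with base-2 entropy (an aid for verifying your hand calculations, not a substitute for showing them):

      from math import log2

      # Training set from Problem 1: ([A, B, C], label), 1 = positive.
      examples = [([1, 0, 0], 1), ([0, 0, 1], 1),
                  ([0, 0, 0], 0), ([0, 1, 0], 0)]

      def entropy(labels):
          """Binary entropy of a list of 0/1 labels."""
          if not labels:
              return 0.0
          p = sum(labels) / len(labels)
          if p in (0.0, 1.0):
              return 0.0
          return -p * log2(p) - (1 - p) * log2(1 - p)

      def gain(examples, feature):
          """Information gain of splitting on the given feature index."""
          labels = [y for _, y in examples]
          remainder = 0.0
          for value in (0, 1):
              subset = [y for x, y in examples if x[feature] == value]
              remainder += len(subset) / len(labels) * entropy(subset)
          return entropy(labels) - remainder

      for i, name in enumerate("ABC"):
          print(f"Gain({name}) = {gain(examples, i):.4f}")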

  2. Consider the same learning problem and training data from the previous problem, and show the trace produced by the perceptron learning algorithm. Assume that all of the weights and the threshold start at 0, that the learning rate is 1, and that during each epoch the examples are processed in the exact order given above. Show the weight vector (in the order [A,B,C]) and the threshold after every example presentation. The procedure should converge after only 3 epochs. NOTE: For the purposes of this problem, assume that the net input must be strictly greater than the threshold in order to output a 1 (so this is slightly different from the equation on page 8 of the neural-net lecture slides).

    If represented as a logical rule, what is the function learned? Is it the same or different from the function learned using decision trees?
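
    A minimal sketch of the trace, assuming the threshold is treated as a weight on a constant -1 input (one common convention), so that it is updated as threshold -= rate * error:

      # Training set in the order given in Problem 1; 1 = positive.
      examples = [([1, 0, 0], 1), ([0, 0, 1], 1),
                  ([0, 0, 0], 0), ([0, 1, 0], 0)]
      weights, threshold, rate = [0, 0, 0], 0, 1

      for epoch in range(1, 4):  # the statement says this converges in 3 epochs
          print(f"epoch {epoch}")
          for x, target in examples:
              net = sum(w * xi for w, xi in zip(weights, x))
              output = 1 if net > threshold else 0  # strictly greater, per the NOTE
              error = target - output
              weights = [w + rate * error * xi for w, xi in zip(weights, x)]
              threshold -= rate * error
              print(f"  x={x} target={target} w={weights} threshold={threshold}")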

  3. Given the following context-free grammar, draw trees for all parses of the sentence:

    The suicide bomber near the market in Israel killed the surrounding people on the street.

    S -> NP VP,
    NP -> Det N, NP -> Det Adj N, NP -> PN, NP -> NP PP,
    VP -> V, VP -> V NP, VP -> VP PP,
    PP -> Prep NP,
    Det -> the,
    Prep -> on, Prep -> in, Prep -> near,
    PN -> Israel,
    N -> bomber, N -> market, N -> suicide, N -> people, N -> street,
    V -> killed, V -> surrounding,
    Adj -> suicide, Adj -> surrounding
    
    Do the obvious semantic interpretations of these parse trees differ significantly from one another? If so, explain the interpretation of each parse tree and how it differs from the others. If not, explain why not.
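
    One way to enumerate the parses mechanically, assuming NLTK is installed (nltk.CFG.fromstring and nltk.ChartParser are standard NLTK calls; terminals are quoted in NLTK's rule notation):

      import nltk

      # Problem 3's grammar, transcribed into NLTK notation.
      grammar = nltk.CFG.fromstring("""
      S -> NP VP
      NP -> Det N | Det Adj N | PN | NP PP
      VP -> V | V NP | VP PP
      PP -> Prep NP
      Det -> 'the'
      Prep -> 'on' | 'in' | 'near'
      PN -> 'Israel'
      N -> 'bomber' | 'market' | 'suicide' | 'people' | 'street'
      V -> 'killed' | 'surrounding'
      Adj -> 'suicide' | 'surrounding'
      """)

      # The sentence, lowercased so the tokens match the grammar's terminals.
      tokens = ("the suicide bomber near the market in Israel "
                "killed the surrounding people on the street").split()

      # Print every parse tree licensed by the grammar.
      for tree in nltk.ChartParser(grammar).parse(tokens):
          print(tree)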

  4. Given the simple air-travel PCFG on slide 70 of the slide packet on NLP, show all of the parse trees for the sentence "Include this book on that flight from Houston" and compute the probability of each. Clearly explain the meaning of each parse tree (e.g., one explanation should be "There is a particular flight that originated in Houston. There is a book about this flight. You should include this book."). Clearly indicate which parse is selected as the preferred parse by this PCFG. Also compute the probability that the grammar generates this sentence, i.e., P("Include this book on that flight from Houston").
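
    A sketch of the probability arithmetic only, since the PCFG itself is on slide 70 and not reproduced here. The rule probabilities below are hypothetical placeholders; substitute the actual values from the slide:

      from math import prod

      # Each parse tree's probability is the product of the probabilities
      # of the rules used in its derivation. The numbers here are
      # PLACEHOLDERS, not the values from slide 70.
      parse_rule_probs = {
          "parse 1": [0.5, 0.2, 0.1],
          "parse 2": [0.5, 0.1, 0.05],
      }

      tree_probs = {name: prod(ps) for name, ps in parse_rule_probs.items()}
      for name, p in tree_probs.items():
          print(f"P({name}) = {p:.6f}")

      # The preferred parse is the most probable one, and the probability
      # that the grammar generates the sentence is the sum over all parses.
      print("preferred:", max(tree_probs, key=tree_probs.get))
      print("P(sentence) =", sum(tree_probs.values()))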