Problem 1)

P(man | <s>)     = 0.9(2/5) + 0.1(4/20) = 0.38
P(marries | man) = 0.9(1/4) + 0.1(3/20) = 0.24
P(man | marries) = 0.9(1/3) + 0.1(4/20) = 0.32
P(</s> | man)    = 0.9(2/4) + 0.1(5/20) = 0.475

P(<s> man marries man </s>) = P(man | <s>) * P(marries | man) * P(man | marries) * P(</s> | man)
                            = 0.38 * 0.24 * 0.32 * 0.475
                            = 0.0138624

----------------------

Problem 2)

a1(1) = a01 * b1(A) = 0.5 * 0.8 = 0.4
a1(2) = a02 * b2(A) = 0.5 * 0.2 = 0.1

a2(1) = (a1(1) * a11 + a1(2) * a21) * b1(B) = (0.4 * 0.2 + 0.1 * 0.4) * 0.2 = 0.024
a2(2) = (a1(1) * a12 + a1(2) * a22) * b2(B) = (0.4 * 0.6 + 0.1 * 0.2) * 0.8 = 0.208

P("A B") = a2(1) * a1F + a2(2) * a2F = 0.024 * 0.2 + 0.208 * 0.4 = 0.088

-----------------------

Problem 3)

(S (NP (Det the) (N man))
   (VP (V kissed)
       (NP (NP (Det the) (N woman))
           (PP (Prep with) (NP (Det a) (N dog))))))

1 * 0.6 * 0.6 * 0.4 * 0.5 * 1.0 * 0.4 * 0.6 * 0.6 * 0.4 * 1 * 1 * 0.6 * 0.4 * 0.2 = 1.99e-4

(S (NP (Det the) (N man))
   (VP (VP (V kissed) (NP (Det the) (N woman)))
       (PP (Prep with) (NP (Det a) (N dog)))))

1 * 0.6 * 0.6 * 0.4 * 0.5 * 0.5 * 1.0 * 0.6 * 0.6 * 0.4 * 1 * 1 * 0.6 * 0.4 * 0.2 = 2.49e-4

The correct parse is not the most probable one: in the most probable parse, the PP is attached to the VP rather than the NP (men don't usually kiss women by using a dog to do so).
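The two products above can be double-checked with a few lines of Python (the lists simply repeat the rule probabilities in the order listed):

```python
from math import prod

# Rule probabilities for the NP-attachment parse (PP modifies "the woman")
np_attach = [1, 0.6, 0.6, 0.4, 0.5, 1.0, 0.4, 0.6, 0.6, 0.4, 1, 1, 0.6, 0.4, 0.2]
# Rule probabilities for the VP-attachment parse (PP modifies "kissed")
vp_attach = [1, 0.6, 0.6, 0.4, 0.5, 0.5, 1.0, 0.6, 0.6, 0.4, 1, 1, 0.6, 0.4, 0.2]

print(prod(np_attach))  # ~1.99e-4
print(prod(vp_attach))  # ~2.49e-4
```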
This is because the rule for attaching a PP to a VP has higher probability (0.5) than the rule for attaching a PP to an NP (0.4).

----------------------------------

Problem 4)

CNF changes:

S -> VP ==>
  S -> Verb NP   0.08   (0.4*0.2 = 0.08)
  S -> VP PP     0.24   (0.4*0.6 = 0.24)
  S -> LIKE      0.048  (0.4*0.2*0.6 = 0.048)
  S -> FLIES     0.032  (0.4*0.2*0.4 = 0.032)

NP -> Pronoun ==>
  NP -> I        0.2    (0.4*0.5 = 0.2)
  NP -> YOU      0.2    (0.4*0.5 = 0.2)

NP -> PlurNoun ==>
  NP -> FLIES    0.3    (0.6*0.5 = 0.3)
  NP -> ARROWS   0.3    (0.6*0.5 = 0.3)

VP -> Verb ==>
  VP -> LIKE     0.12   (0.2*0.6 = 0.12)
  VP -> FLIES    0.08   (0.2*0.4 = 0.08)

CKY Table:

            FLIES         |          LIKE           |          ARROWS
  ------------------------+-------------------------+-----------------------------
  S: 0.032                | S(NP VP):               | S(VP PP):
  VP: 0.08                |   0.6*0.3*0.12 = 0.0216 |   0.24*0.08*0.12 = 0.002304
  NP: 0.3                 |                         | S(NP VP):
  PlurNoun: 0.5           |                         |   0.6*0.3*0.036 = 0.00648
  Verb: 0.4               |                         |
  ------------------------+-------------------------+-----------------------------
                          | S: 0.048                | VP(Verb NP):
                          | VP: 0.12                |   0.2*0.6*0.3 = 0.036
                          | Verb: 0.6               | PP(Prep NP):
                          | Prep: 0.4               |   1.0*0.4*0.3 = 0.12
                          |                         | S(Verb NP):
                          |                         |   0.08*0.6*0.3 = 0.0144
  ------------------------+-------------------------+-----------------------------
                          |                         | NP: 0.3
                          |                         | PlurNoun: 0.5

Parses:

(S (NP (PlurNoun FLIES))
   (VP (Verb LIKE) (NP (PlurNoun ARROWS))))        0.00648

(S (VP (VP (Verb FLIES))
       (PP (Prep LIKE) (NP (PlurNoun ARROWS)))))   0.002304

-------------------------------------------

Problem 5)

Consider the following joke:

  Loy: Let's eat up the street.
  Roy: No, thanks; I don't like concrete.

Explain what specific type of ambiguity in language understanding makes this humorous.

Syntactic ambiguity, particularly part-of-speech ambiguity. Roy incorrectly interprets "up" as a particle that is part of the complex verb "eat up," whose direct object is "the street," rather than interpreting "up" as a preposition heading the prepositional phrase "up the street" attached to the simple verb "eat," which is what Loy intended.
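As a sanity check on the CKY chart in Problem 4 above, here is a minimal probabilistic CKY sketch in Python. The grammar is encoded directly from the CNF rules derived there, and each chart cell keeps only the best probability per nonterminal:

```python
from collections import defaultdict

# Lexical entries per word, taken from the diagonal cells of the chart above.
lexical = {
    "FLIES":  {"S": 0.032, "VP": 0.08, "NP": 0.3, "PlurNoun": 0.5, "Verb": 0.4},
    "LIKE":   {"S": 0.048, "VP": 0.12, "Verb": 0.6, "Prep": 0.4},
    "ARROWS": {"NP": 0.3, "PlurNoun": 0.5},
}
# Binary CNF rules: (parent, left child, right child, probability).
binary = [
    ("S",  "NP",   "VP", 0.6),
    ("S",  "VP",   "PP", 0.24),
    ("S",  "Verb", "NP", 0.08),
    ("VP", "Verb", "NP", 0.2),
    ("PP", "Prep", "NP", 1.0),
]

def cky(words):
    n = len(words)
    chart = defaultdict(dict)            # chart[i, j][label] = best probability
    for i, w in enumerate(words):
        chart[i, i + 1] = dict(lexical[w])
    for span in range(2, n + 1):         # fill wider spans from narrower ones
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):    # try every split point
                for parent, left, right, p in binary:
                    if left in chart[i, k] and right in chart[k, j]:
                        cand = p * chart[i, k][left] * chart[k, j][right]
                        if cand > chart[i, j].get(parent, 0.0):
                            chart[i, j][parent] = cand
    return chart

chart = cky(["FLIES", "LIKE", "ARROWS"])
print(chart[0, 3]["S"])   # ~0.00648: the S(NP VP) analysis wins
```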
--------------------------

Do the same for the following joke:

  Etta: Did you hear about the kidnapping in Dallas?
  Gretta: No; what happened?
  Etta: They woke him up.

Word segmentation ambiguity in speech recognition. The joke relies on interpreting Etta as saying "kid napping" rather than "kidnapping".

-----------------------

What sequence of numbers (which has no closed form) characterizes the number of parses of a simple English sentence ending in n prepositional phrases, and what closed-form expression in n is an informative lower bound on this number?

The Catalan numbers, which (beyond the first few terms) exceed 2^n, making 2^n an informative lower bound.

-----------------------

What is the primary difference between natural languages and computer languages?

Natural languages are explosively and ubiquitously ambiguous, whereas computer languages are designed to be unambiguous and deterministically parsable.

------------------------

What is the basic idea behind a ``back-off'' approach to smoothing?

When (and only when) training data is unavailable for the (n-1)-word context of an n-gram model, the system "backs off" and uses the (n-2)-word context of a lower-order (n-1)-gram model to estimate the probability of a word.

-------------------------

Why is EM called an ``any time'' algorithm?

Because EM is an iterative-improvement algorithm: each iteration increases the likelihood that the learned model assigns to the training data, so it can be stopped at any time and still return an answer, and the longer it runs, the better the answer gets (until convergence).

---------------------------

Why does CKY parsing take O(n^3) time, where n is the length of the sentence?

Because it must fill each of the O(n^2) cells in the triangular CKY chart, and filling each cell requires examining O(n) ways of splitting the given phrase into two sub-constituents, giving O(n^2) * O(n) = O(n^3).

------------------------------

How does a PCFG parser help resolve syntactic ambiguity compared to using a normal CFG?
By assigning a probability to each parse tree, it can pick the most probable parse as the correct interpretation, thereby preferring some syntactic attachments over others based on which ones are more probable in the training data.

-----------------------------

(Extra credit, 2pts) What university named their school of engineering after the founder of Qualcomm and why?

The University of Southern California (USC), because he (Viterbi) donated a large sum of money to the school, and he also received his PhD there.

-----------------------------

(Extra credit, 2pts) What is probably the best name for the so-called ``Gaussian'' probability distribution and why?

The "normal" distribution, because Gauss was only one of several mathematicians (including De Moivre and Laplace) who contributed to the development of this important statistical construct; the common, neutral name "normal" is therefore probably best, since it avoids giving credit to any one person.

-----------------------------

(Extra credit, 2pts) Who is the world's most highly-cited living author?

Noam Chomsky