My current research focusses on modelling these characteristics as revealed by recent eye-tracking experiments in the visual world paradigm. In these experiments, participants' attention to a visual scene is monitored while they listen to a spoken utterance. The proportion of fixations on an object in the scene before it is mentioned in the utterance provides evidence of how the utterance is being interpreted as it unfolds. Both utterance and scene are controlled for factors of research interest, such as stereotypical associations between a verb and its arguments, or whether actions are depicted in the scene. German has a number of linguistic properties that make it especially well suited to investigating these factors. It marks case on articles, which makes it easy to examine the effect of word order on attention to objects in a visual scene. For example, when people are presented with a scene depicting a hare, a fox, and a cabbage, and hear the OVS sentence Den Hasen frißt gleich der Fuchs, "the hare (obj) eats shortly the fox (subj)," they evidently use the accusative marking on "hare," together with their world knowledge that foxes eat hares, to anticipate the fox as the likely agent: just after processing the verb frißt, they fixate the depicted fox proportionately more than the cabbage. The opposite pattern (more fixations on the cabbage) is observed when they hear the SVO sentence Der Hase frißt gleich den Kohl, "the hare (subj) eats shortly the cabbage (obj)." Moreover, these gaze patterns are reversed when the same scenes are paired with the same sentences, but with object-experiencer verbs such as interessiert instead of agent-patient verbs like frißt, demonstrating that people are sensitive to the argument structure of the verb.
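The fixation-proportion measure described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name fixation_proportions, the (onset, object) record format, and the sample data are hypothetical and are not taken from the experiments themselves, which may well define the analysis window and count fixations differently.

```python
from collections import Counter


def fixation_proportions(fixations, window_start, window_end):
    """Share of fixations per object within a time window.

    fixations: list of (onset_ms, object_label) pairs (hypothetical format).
    Only fixations whose onset falls in [window_start, window_end) are counted.
    Returns an empty dict when no fixation falls inside the window.
    """
    counts = Counter(
        label
        for onset, label in fixations
        if window_start <= onset < window_end
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


# Invented example data: fixation onsets are relative to sentence onset.
sample_fixations = [
    (100, "hare"),
    (450, "fox"),
    (520, "fox"),
    (610, "cabbage"),
    (900, "fox"),
]

# Assumed analysis window: from verb offset (400 ms) to onset of the
# second noun phrase (800 ms). Both values are made up for illustration.
proportions = fixation_proportions(sample_fixations, 400, 800)
```

In this sketch an anticipation effect would show up as a higher proportion for the object that fits the unfolding interpretation (here the fox) than for its competitor within the window that precedes the second noun phrase.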
The ambiguity of the feminine case-marking between nominative and accusative is also very useful for investigating how information from the scene can be used when the utterance is initially ambiguous. Thus, given a scene depicting two events, both involving a female character, one in which she is the agent of an action and another in which she is the patient of a different action, together with an utterance in which only one of the two actions is mentioned, people clearly anticipate the appropriate upcoming argument before it is actually mentioned in the utterance. As an example, a scene that depicts the two events (Pirate washes Princess) and (Princess paints Fencer) is shown to participants along with the utterance Die Prinzessin wäscht gleich der Pirat, "The princess (subj?/obj?) washes shortly the pirate (subj)." Because the scene and utterance involve characters in non-stereotypical roles, the greater proportion of looks to the pirate rather than the fencer upon mention of the verb wäscht shows that people are able to use information from the visual context even when the spoken utterance itself lacks disambiguating information.
Presently I am using a connectionist architecture to model these types of experiments. Such models are well known for their cognitively plausible properties, such as the ability to process input incrementally, make predictions based on context, integrate multiple sources of information, and adapt to available information. The model is based on a simple recurrent network (SRN) that is enhanced with a representation of a visual scene. For those experiments involving just depicted characters, the model simply takes representations of those characters as additional input through shared weights. The experiments involving depicted events are modelled by developing representations of events through compression of their arguments, and feeding those through shared weights to the SRN's hidden layer. The network is trained to develop a case-role interpretation of the sentence being processed, which makes it possible to demonstrate that it anticipates the appropriate upcoming argument as revealed in the experiments.
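To make the architecture more concrete, the following Python sketch shows one way the forward pass of an SRN with scene input might look. It is a minimal sketch and not the actual model: the class name SceneSRN, the layer sizes, the tanh and logistic activations, the random untrained weights, and the reading of "shared weights" as a single weight matrix applied to every depicted item and summed are all assumptions made for illustration. Training and the compression of events into single representations are omitted.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def init_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class SceneSRN:
    """Simple recurrent network with an extra scene input (illustrative)."""

    def __init__(self, n_word, n_scene, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        self.w_word = init_matrix(n_hidden, n_word, rng)
        # Assumption: one matrix shared by all depicted characters or events.
        self.w_scene = init_matrix(n_hidden, n_scene, rng)
        self.w_context = init_matrix(n_hidden, n_hidden, rng)
        self.w_out = init_matrix(n_out, n_hidden, rng)

    def run(self, words, scene_items):
        """Process a sentence word by word; return one output per word."""
        # Every scene item passes through the same weights; sum the results.
        scene_drive = [0.0] * self.n_hidden
        for item in scene_items:
            for i, value in enumerate(matvec(self.w_scene, item)):
                scene_drive[i] += value

        hidden = [0.0] * self.n_hidden  # context layer starts empty
        outputs = []
        for word in words:
            from_word = matvec(self.w_word, word)
            from_context = matvec(self.w_context, hidden)
            hidden = [
                math.tanh(a + b + c)
                for a, b, c in zip(from_word, from_context, scene_drive)
            ]
            # Output units stand in for a case-role interpretation
            # (hypothetical layout, e.g. agent / action / patient slots).
            outputs.append(
                [1.0 / (1.0 + math.exp(-v)) for v in matvec(self.w_out, hidden)]
            )
        return outputs


# Toy usage with one-hot vectors; vocabulary and scene sizes are invented.
net = SceneSRN(n_word=4, n_scene=3, n_hidden=5, n_out=6)
sentence = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
scene = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
interpretations = net.run(sentence, scene)
```

Because the network emits an interpretation after every word, anticipation could be read off in a trained version of such a model by inspecting the output at the verb, before the second noun phrase has been presented.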