Text data mining concerns the application of data mining (knowledge discovery
in databases, KDD) to unstructured textual data. Our work focuses on first using
information extraction to build a structured
database from a corpus of natural language texts and then discovering patterns
in the resulting database using traditional KDD tools. This work also involves
record linkage, a form of data cleaning that identifies
equivalent but textually distinct items in the extracted data prior to mining,
and it is related to our research on
natural language learning. Our recent work has focused on text mining for
bioinformatics.
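
The three stages described above (extraction, record linkage, mining) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy corpus, the regular-expression extractor, the string-similarity threshold, and the function names are hypothetical stand-ins, not the learned extractors or KDD tools used in this research.

```python
import re
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations

# Toy corpus, invented for illustration only.
CORPUS = [
    "Seeking a developer with skills in Java, SQL and Linux.",
    "Opening for an engineer with skills in java, S.Q.L. and HTML.",
    "Hiring an analyst with skills in SQL and Excel.",
    "Hiring a clerk with skills in sql and Excell.",
]


def extract(text):
    """Information extraction: pull the raw skill mentions out of free text.

    A hand-written pattern stands in for a real extraction system.
    """
    match = re.search(r"skills in (.+)\.$", text)
    if match is None:
        return []
    return re.split(r",\s*|\s+and\s+", match.group(1))


def link(name, canonical, threshold=0.85):
    """Record linkage: map a mention to an existing canonical item if close.

    Mentions are normalized (lowercased, punctuation removed) and compared
    with difflib's similarity ratio; the threshold is an arbitrary choice.
    """
    key = re.sub(r"[^a-z0-9]", "", name.lower())
    best = max(
        canonical,
        key=lambda c: SequenceMatcher(None, key, c).ratio(),
        default=None,
    )
    if best is not None and SequenceMatcher(None, key, best).ratio() >= threshold:
        return best
    canonical.append(key)
    return key


def frequent_pairs(records, min_support=2):
    """Mining: count item pairs that co-occur in at least min_support records."""
    counts = Counter()
    for items in records:
        counts.update(combinations(sorted(items), 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}


def mine(corpus):
    """Run extraction, linkage, and mining in sequence over a list of texts."""
    canonical = []
    records = [{link(m, canonical) for m in extract(text)} for text in corpus]
    return frequent_pairs(records)
```

Calling `mine(CORPUS)` links "S.Q.L." and "sql" to the same item, and "Excell" to "excel", before counting, so pairs that differ only in surface form are counted together. Without the linkage step those mentions would be treated as distinct items and the co-occurrence counts would be split.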
This research was formerly supported by the National Science Foundation through
grant IIS-0117308 from the "Information and Data Management" Program.
Subareas: