Title:Computational Linguistics in Museums: Applications for Cultural Datasets
Authors:Robert Stein, Judith Klavans, Susan Chun, Raul Guerra
Publication:MW2011: Museums and the Web 2011

As museums continue to develop more sophisticated techniques for managing and analyzing cultural data, many are beginning to encounter challenges in dealing with the nuances of language and with automated processing tools. How might user-generated comments be harvested and processed to determine the nature of each comment? Is it possible to use existing collection documentation to derive relations between similar objects? How can we train systems to automatically recognize (disambiguate) different meanings of the same word? Can automated language processing lead to more compelling browsing interfaces for online collections?

Fortunately, the field of computational linguistics offers a good deal of expertise and many tools that can be applied to these problems to achieve meaningful results. Informed by previous work in computational linguistics and relevant project experience, the authors will address a number of these questions, offering insight into how answers that impact museum practice might be found. The authors will share tools and resources that museum software developers can use to prototype and experiment with these techniques without being experts in language processing themselves. In addition, the authors will describe the work of the T3: Text, Tags, Trust research project and how they have applied these tools to a large shared dataset of object metadata and social tags collected by the project.

Specific challenges regarding batch-processing tools and large datasets will be addressed. Best practices and algorithms will be shared for dealing with a number of sticky issues. Directions for future research and promising application areas will also be discussed.