P8 Description

This project consists of two parts. The first part takes a natural language processing (NLP) perspective on questions, in particular, on developing NLP methods combined with visual analytics (VA). The second part consists of intensive interaction with the linguistic projects in the research unit (RU): we work with these projects on developing VA approaches that support the linguistic analysis process. The two parts are intertwined: the cooperation with the linguistic projects should result in a deeper understanding of how to identify and work with non-canonical questions within NLP. In turn, the computational work should lead to a better understanding of what types of VA systems can be used most perspicuously for the linguistic work.

Part 1: With respect to the NLP orientation, our overall goal is to find automatic ways of distinguishing factoid or information-seeking questions (ISQs) from non-ISQs in English and German and to use VA techniques as part of the analysis. Our main focus will be on rhetorical questions and their usage in social media. We will focus on Twitter and work towards a specific computational application scenario, namely, sentiment analysis. Previous studies have shown that rhetorical questions are generally used to express a particular sentiment about a certain state of affairs on Twitter and other social media. Our intention is to add an analysis of non-ISQs (primarily rhetorical questions) to the information available for calculating language features derived from questions (e.g., sentiments) and thus to advance the state of the art.

Part 2: Pursuing an automated analysis of questions involves understanding more about their linguistic structure. We will therefore work in close cooperation with the linguistic projects in the RU. The linguistic data are both textual and auditory and pose interesting challenges for VA in terms of integrating information about hierarchical data structures, temporal dimensions (diachronic data), multifactorial analysis (complex interactions among components of grammar) and a potential geospatial dimension (e.g., Twitter geolocation data). In previous collaborative work, we have already proposed some innovative VA approaches for the analysis of linguistic material and have identified concrete avenues of research to meet further challenges. Two especially promising lines of research will be pursued: (1) developing flexible and widely applicable VA methodologies that can be easily adapted to different related linguistic data sets and tasks; (2) investigating interaction methodologies that provide linguists with rich and intuitive options for interactively configuring and manipulating visualizations, as well as for working with automatic analysis and machine learning techniques. The second line also implies research on dynamic renderings of visualization structures (e.g., graph representations) calculated from large data sets as an immediate response to user interactions.