Can information theory unravel the subtext in a Chekhovian short story?

in publications :: #SlavicNLP #ACL

In this study, we investigate whether information-theoretic measures such as surprisal can quantify the elusive notion of subtext in a Chekhovian short story. Specifically, we conduct a series of experiments in which we enrich the original text once with (different types of) meaningful glosses and once with fake glosses. For each of the texts thus created, we calculate surprisal values in two ways: with a bag-of-words model and with a large language model. We observe enrichment effects that depend on the method, but no interpretable subtext effect.
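
As a rough illustration of the two methods, here is a minimal Python sketch of both surprisal computations. The unigram frequency table and the choice of GPT-2 are assumptions for illustration; the models actually used in the study are not specified here.

```python
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def bow_surprisal(tokens, counts, total):
    """Context-free surprisal under a bag-of-words (unigram) model:
    -log2 p(w), assuming every token occurs in the frequency table."""
    return [-math.log2(counts[t] / total) for t in tokens]

def llm_surprisal(text, model_name="gpt2"):
    """Per-token surprisal under a causal language model (GPT-2 here
    as a stand-in; the study's actual LLM may differ)."""
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Log-probability of each token given its left context.
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    token_lp = logprobs[torch.arange(ids.size(1) - 1), ids[0, 1:]]
    return (-token_lp / math.log(2)).tolist()  # nats -> bits
```

Glossed and un-glossed versions of the text can then be compared token by token or by mean surprisal.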

BibTeX | ACL Anthology


The role of information in modeling German intensifiers

in publications :: #LanguageSciencePress

In this study, context-free and context-dependent information measures are applied to a new corpus of tweets and blog posts. The aim is to account for the expressive meaning of intensifiers and to characterize the variability of the available intensifying items. We find that context-free and context-dependent information measures are highly correlated and account for the distribution of intensifiers in the data, giving credence to the notion that intensifiers form a common word class, even across syntactic and semantic differences.

Both information measures show that stacked intensifiers tend to be ordered from least to most expressive within a phrase, i.e., their information content tends to increase. We explain this ordering with the Uniform Information Density Hypothesis: the first, less expressive intensifier is used to introduce the phrase, ease the reader’s processing load, and smooth the information flow.
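
For concreteness, the following sketch contrasts the two measures on a stacked-intensifier phrase; the counts are invented for illustration and do not come from the study’s corpus.

```python
import math
from collections import Counter

# Invented counts standing in for the tweet/blog corpus.
unigrams = Counter({"sehr": 900, "extrem": 120, "gut": 2000})
bigrams = Counter({("sehr", "extrem"): 40, ("extrem", "gut"): 60})
N = sum(unigrams.values())

def context_free(word):
    """Surprisal from raw frequency: -log2 p(w)."""
    return -math.log2(unigrams[word] / N)

def context_dependent(prev, word):
    """Surprisal conditioned on the preceding word: -log2 p(w | prev)."""
    return -math.log2(bigrams[(prev, word)] / unigrams[prev])

# The UID-style prediction for "sehr extrem gut": the second
# intensifier is rarer, hence more informative than the first.
assert context_free("extrem") > context_free("sehr")
```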

BibTeX | DOI: 10.5281/zenodo.133837


Forschungsportal BACH

in publications :: #TUC

The aim of this project is to build a comprehensive online repository that provides access to all surviving documents of the Bach family of musicians – the most influential family dynasty in music history – from the late 16th to the early 19th century. For the first time in the history of Bach research, the written documents preserved in libraries, archives and private collections will be digitally recorded, indexed, processed, annotated and made available via an online portal. Interlinking with existing digital Bach projects is planned.

Poster


Forschungsportal BACH

in publications :: #DHDL

The aim of the “Forschungsportal Bach” project is to build a comprehensive online repository that provides access to all surviving documents of the Bach family of musicians – the most influential family dynasty in music history – from the late 16th to the early 19th century. For the first time in the history of Bach research, the material preserved in libraries, archives and private collections will be digitally recorded, indexed, processed, annotated and made accessible via an online portal. The digitised documents are automatically transcribed with “Transkribus”, annotated with the help of the TEI Publisher, and finally published as a digital edition. Among other things, the works mentioned in the documents as well as watermarks of selected archival items are linked to the “Bach digital” portal.

Poster | https://fdhl.info/dhdl-2023-projekte/#bach | https://nbn-resolving.org/urn:nbn:de:bsz:15-qucosa2-917725


Keyword extraction with semantic surprisal from LDA and LSA: a comparison of topic models

in publications :: #CLIN

In this study on keyword extraction in German, we compare the performance of Latent Dirichlet Allocation (LDA) and Latent Semantic Analysis (LSA) as part of the Topic Context Model (TCM). TCM calculates the information-theoretic measure surprisal as a context-based feature of words; surprisal is a contextualised information measure based on conditional probabilities. As contexts for the calculation of surprisal, TCM evaluates the topic and topic-word distributions in the preceding and following environment of a word. This environment can be sentences, paragraphs, documents or an entire corpus.

TCM thus answers the question: how surprising is a word for a language processor given the topics within the word's environment?
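
A minimal sketch of this kind of topic-conditioned surprisal, using gensim’s LdaModel on toy documents. The marginalisation P(w | environment) = Σₖ P(w | topicₖ) · P(topicₖ | environment) is our reading of the TCM description, not necessarily the authors’ exact implementation.

```python
import math

from gensim.corpora import Dictionary
from gensim.models import LdaModel

# Toy documents standing in for a word's environment (sentence,
# paragraph, document, or corpus).
docs = [
    ["topic", "model", "keyword", "extraction"],
    ["keyword", "surprisal", "topic", "context"],
    ["information", "measure", "surprisal", "word"],
]
dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]
lda = LdaModel(corpus, id2word=dictionary, num_topics=2, random_state=0)

def tcm_surprisal(word, environment):
    """s(w) = -log2 sum_k P(w | topic_k) * P(topic_k | environment)."""
    theta = lda.get_document_topics(dictionary.doc2bow(environment),
                                    minimum_probability=0.0)
    phi = lda.get_topics()                 # topic-word probability matrix
    wid = dictionary.token2id[word]
    p = sum(prob * phi[k, wid] for k, prob in theta)
    return -math.log2(p)

print(tcm_surprisal("keyword", ["topic", "context", "surprisal"]))
```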

The point of departure of the study is that surprisal as a lexical feature can be useful for NLP applications. In previous studies, this has been shown both for keyword extraction and for determining the expressivity of intensifiers. So far, only LDA has been used for the topic-modelling part of TCM. In this study, TCM draws on the topics identified by LDA and by LSA in order to compare the performance of the two topic models.

This comparison is motivated by the fact that LSA and LDA are similar algorithms in some ways but quite different in others. Both assume that texts are combinations of abstract, latent topics that can be determined by looking at word frequencies. Their respective approaches to determining these topics, however, differ significantly. While LSA is deterministic and models texts as linear combinations of topic vectors, the generative LDA model is probabilistic and assumes a Bayesian network linking topics to words and then to texts. In particular, LSA is computationally 'nicer' but assumes a linear structure that cannot be taken for granted. Whether this structure is actually present is interesting for two reasons: if it is, LSA provides a working linear model of topics as vectors (and hence computational means to compare the similarity of topics), and choosing the well-behaved, deterministic LSA over LDA is justified.
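
The contrast can be seen directly in gensim, where both models expose document-topic representations of a different nature; the toy corpus below is a placeholder for the study’s data.

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel, LsiModel

docs = [["topic", "model", "keyword"],
        ["keyword", "surprisal", "context"],
        ["information", "surprisal", "word"]]
dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]

# Probabilistic, generative: topic weights are probabilities.
lda = LdaModel(corpus, id2word=dictionary, num_topics=2, random_state=0)
print(lda.get_document_topics(corpus[0]))

# Deterministic SVD: documents as linear combinations of topic
# vectors; coordinates are real-valued and may be negative.
lsa = LsiModel(corpus, id2word=dictionary, num_topics=2)
print(lsa[corpus[0]])
```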

The data resource in this study is a subset of the Heise corpus.

The comparison of LDA and LSA proceeds in two steps: (i) the quality of the words’ surprisal values for keyword extraction is compared directly, and (ii) the surprisal values are used as input to a Recurrent Neural Network (RNN). This operationalisation follows previous studies on TCM.
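
The abstract does not spell out the network, so the following PyTorch sketch only illustrates one plausible wiring of step (ii): a GRU tagger that reads one surprisal value per token and predicts keyword vs. non-keyword. The architecture and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class SurprisalTagger(nn.Module):
    """Hypothetical RNN over per-token surprisal features."""
    def __init__(self, hidden=32):
        super().__init__()
        self.rnn = nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)
        self.out = nn.Linear(hidden, 2)   # keyword vs. non-keyword

    def forward(self, surprisal):         # (batch, seq_len, 1)
        h, _ = self.rnn(surprisal)
        return self.out(h)                # (batch, seq_len, 2) logits

model = SurprisalTagger()
toy = torch.tensor([[[4.2], [9.7], [6.1]]])  # toy surprisal values in bits
print(model(toy).shape)                      # torch.Size([1, 3, 2])
```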

We observe that in the direct comparison, LSA slightly outperforms LDA in precision, recall and F1. In contrast, when the words’ surprisal values are the input to the RNN, there is no clear winner. Given LSA’s greater computational economy and 'niceness' compared to LDA, we conclude that LSA is a suitable building block of TCM for keyword extraction in NLP.

Poster