A Corpus of Indefinite Uses
The Corpus of Indefinite Uses is an output of the project Indefinites and Beyond: Evolutionary Pragmatics and Typological Semantics. It makes available the data collected and annotated during a cross-linguistic synchronic and diachronic corpus study of indefinite expressions.
The corpus contains data for the following languages and forms:
Synchronic
Diachronic
The indefinites have been annotated with the functions of the extended version of Haspelmath’s (1997) semantic map proposed by Aguilar-Guevara et al. (2011). A description of the functions and of the annotation procedure can be found in the Annotation Guidelines. Aloni et al. (2012) report results on inter-annotator agreement.
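For readers working with the raw data, the idea behind an agreement score can be illustrated with a simple measure: the proportion of occurrences on which two coders assign the same function, averaged over all pairs of coders. The sketch below is a hypothetical illustration only. The function name `pairwise_agreement`, the data layout (one list of labels per coder, aligned by occurrence) and the toy labels are assumptions, and Aloni et al. (2012) may report a different statistic, for example a chance-corrected one such as kappa.

```python
from itertools import combinations


def pairwise_agreement(annotations: dict[str, list[str]]) -> float:
    """Mean observed agreement over all pairs of coders.

    Hypothetical helper: `annotations` maps a coder name to the list of
    function labels that coder assigned, one label per corpus occurrence,
    in the same order for every coder.
    """
    coders = list(annotations)
    if len(coders) < 2:
        raise ValueError("agreement needs at least two coders")
    n_items = len(annotations[coders[0]])
    if n_items == 0:
        raise ValueError("no annotated items")
    if any(len(labels) != n_items for labels in annotations.values()):
        raise ValueError("all coders must label the same items")

    scores = []
    for a, b in combinations(coders, 2):
        # Count occurrences where both coders chose the same function.
        same = sum(x == y for x, y in zip(annotations[a], annotations[b]))
        scores.append(same / n_items)
    return sum(scores) / len(scores)


# Toy, made-up labels for four occurrences of "any"/"some" from three coders.
toy = {
    "coder1": ["question", "free choice", "direct negation", "specific unknown"],
    "coder2": ["question", "free choice", "indirect negation", "specific unknown"],
    "coder3": ["conditional", "free choice", "direct negation", "specific unknown"],
}
print(f"{pairwise_agreement(toy):.2f}")  # prints 0.67
```

Here the three coder pairs agree on 3/4, 3/4 and 2/4 of the occurrences, giving a mean of about 0.67. Observed agreement of this kind does not correct for chance, which is one reason published studies often add a chance-corrected coefficient.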
The corpus is searchable through an online web interface and is also available as raw data.
Full documentation describing the organization of the database and the search functionality, as well as highlights of key results, is available here.
The following publications are based on data included in the database:
Natural languages possess a wealth of indefinite forms that typically differ in distribution and interpretation. Although formal semanticists have strived to develop precise meaning representations for different indefinite functions, to date there has hardly been any corpus work on the topic. In this paper, we present the results of a small corpus study where English indefinite forms any and some were labelled with fine-grained semantic functions well-motivated by typological studies. We developed annotation guidelines that could be used by non-expert annotators and calculated inter-annotator agreement amongst several coders. The results show that the annotation task is hard, with agreement scores ranging from 52% to 62% depending on the number of functions considered, but also that each of the independent annotations is in accordance with theoretical predictions regarding the possible distributions of indefinite functions. The resulting annotated corpus is available upon request and can be accessed through a searchable online database.
@inproceedings{AloniEtAl2012, author = {Aloni, Maria and van Cranenburgh, Andreas and Fernandez, Raquel and Sznajder, Marta}, title = {Building a Corpus of Indefinite Uses Annotated with Fine-grained Semantic Functions}, booktitle = {Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)}, year = {2012}, publisher = {European Language Resources Association (ELRA)} }
External References
This work was financially supported by the Netherlands Organisation for Scientific Research (NWO).