European Portuguese Verbal Paradigms in Phonemic Notation


This is a collection of European Portuguese verbal paradigms, in phonemic notation. They are suited for both computational and manual analysis. The paradigms table lists all available lexemes, and provides full paradigms for each. The segments table lists all phonemes used in the transcription, and describes them in terms of distinctive features.
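As an illustration of how a paradigms table could be consumed programmatically, here is a minimal sketch using only the Python standard library. It assumes a long layout with one row per inflected form; the column names `lexeme`, `cell`, and `phon_form`, and all sample values, are placeholders invented for this example and may not match the actual files.

```python
import csv
import io
from collections import defaultdict

# Invented sample in an ASSUMED long layout (one row per inflected form).
# Column names, cell labels and forms are placeholders, not taken from the dataset.
SAMPLE_CSV = """lexeme,cell,phon_form
lexeme_a,cell_1,form_a1
lexeme_a,cell_2,form_a2
lexeme_b,cell_1,form_b1
"""


def group_paradigms(handle):
    """Return {lexeme: {cell: form}} from an open CSV file or file-like object."""
    paradigms = defaultdict(dict)
    for row in csv.DictReader(handle):
        paradigms[row["lexeme"]][row["cell"]] = row["phon_form"]
    return dict(paradigms)


paradigms = group_paradigms(io.StringIO(SAMPLE_CSV))
print(sorted(paradigms))
print(paradigms["lexeme_a"]["cell_2"])
```

To try this on the real data, pass an open file handle instead of the in-memory sample, and adapt the column names to those found in the header of the paradigms file.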

Version 1.0.1 of this lexicon was prepared for the publication:

The data can be downloaded from Zenodo or from the GitLab repository.

How this lexicon was prepared

We selected the 5000 most frequent verb lexemes in the CETEMPúblico corpus (Santos and Rocha, 2001), relying on frequency lists provided by the AC/DC project. Full paradigms in phonemic transcription were generated for these verbs using pronunciation dictionaries and text-to-speech tools developed at the University of Coimbra (Candeias, Veiga, and Perdigão, 2015; Marquiafável et al., 2014). We then made further adjustments by hand. In the process, a handful of verbs had to be excluded.

References

Scripts

The only dependency is pandas (version 1.2.4 was used). The Python version used was 3.8. To regenerate the lexicon, navigate to the data repository and run:

python3 src/format_lexicon.py

To run tests:

python3 -m unittest tests/test_lexicon.py

Format

The data is stored in CSV files, and the metadata follows the Frictionless standards.
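Frictionless metadata is typically a JSON descriptor with a top-level `resources` list, where each entry gives a `name` and a `path` to a data file. The sketch below shows how such a descriptor could be read to locate the CSV files. The descriptor content, resource names, and paths are invented placeholders, not the ones shipped with this dataset.

```python
import json

# Invented descriptor following the general shape of a Frictionless data package:
# a top-level "resources" list whose entries have a "name" and a "path".
# The resource names and paths below are placeholders, not the dataset's real ones.
SAMPLE_DESCRIPTOR = """
{
  "name": "example-package",
  "resources": [
    {"name": "paradigms", "path": "paradigms.csv"},
    {"name": "segments", "path": "segments.csv"}
  ]
}
"""


def resource_paths(descriptor_text):
    """Map each resource name to its path, skipping entries lacking either key."""
    descriptor = json.loads(descriptor_text)
    return {
        res["name"]: res["path"]
        for res in descriptor.get("resources", [])
        if "name" in res and "path" in res
    }


paths = resource_paths(SAMPLE_DESCRIPTOR)
print(paths)
```

With the real metadata file, read its text from disk and pass it to `resource_paths`; the returned paths are relative to the location of the descriptor.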

Cite as: Fernando Perdigão, Sacha Beniamine, Ana R. Luís and Olivier Bonami (2021). European Portuguese Verbal Paradigms in Phonemic Notation. DOI:10.5281/zenodo.5121543 [dataset]

Contributors