Home

This is a collection of European Portuguese verbal paradigms in phonemic notation, suitable for both computational and manual analysis. The paradigms table lists all available lexemes and provides the full paradigm of each. The segments table lists all phonemes used in the transcription and describes them in terms of distinctive features.

The European Portuguese Verbs lexicon is licensed under the Creative Commons Attribution-ShareAlike 4.0 International license (CC BY-SA 4.0).

Please cite as:

  • Perdigão, Fernando, Beniamine, Sacha, Luís, Ana R., & Bonami, Olivier. (2021). European Portuguese Verbal Paradigms in Phonemic Notation [Data set]. Zenodo. https://doi.org/10.5281/zenodo.5121543

Version 1.0.1 of this lexicon was prepared for the publication:

The data can be downloaded from Zenodo or from the GitLab repository.

How this lexicon was prepared

We selected the 5000 most frequent verb lexemes in the CETEMPúblico corpus (Santos and Rocha, 2001), relying on frequency lists provided by the AC/DC project. Full paradigms in phonemic transcription for these verbs were generated using pronunciation dictionaries and text-to-speech tools developed at the University of Coimbra (Candeias, Veiga, and Perdigão, 2015; Marquiafável et al., 2014). We made further adjustments by hand. In the process, a handful of verbs had to be excluded.

References

Scripts

The only dependency is pandas (version 1.2.4 was used), with Python 3.8. To regenerate the lexicon, navigate to the data repository and run:

python3 src/format_lexicon.py

To run tests:

python3 -m unittest tests/test_lexicon.py

To validate the dataset against the Paralex standard, after installing paralex:

paralex validate *.package.json

Format

The data are stored in CSV files, and the metadata follows the Frictionless standards. The dataset conforms to the Paralex standard.
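Since the tables are plain CSV, they can be read directly with pandas. The following is a minimal sketch, not part of the repository's scripts. It assumes a long-format forms table with the columns form_id, lexeme, cell, and phon_form, as suggested by the Paralex conventions; check the package metadata for the actual file and column names. The rows below are placeholders, not real data from the lexicon, and the helper names load_forms and to_wide are hypothetical.

```python
import io

import pandas as pd

# Placeholder rows standing in for a forms table (NOT real lexicon data).
# Assumed columns: form_id, lexeme, cell, phon_form.
TOY_CSV = """form_id,lexeme,cell,phon_form
1,lexA,cell1,a b
2,lexA,cell2,a c
3,lexB,cell1,d b
4,lexB,cell2,d c
5,lexB,cell2,d e
"""


def load_forms(source):
    """Read a long-format forms table: one row per inflected form."""
    # Keep everything as strings so that no form is parsed as a number or NaN.
    return pd.read_csv(source, dtype=str, keep_default_na=False)


def to_wide(forms):
    """Reshape to one row per lexeme and one column per paradigm cell.

    When a lexeme has several forms in the same cell, they are joined
    with ';' in their order of appearance.
    """
    return (
        forms.groupby(["lexeme", "cell"])["phon_form"]
        .agg(";".join)
        .unstack("cell")
    )


forms = load_forms(io.StringIO(TOY_CSV))
wide = to_wide(forms)
print(wide)
```

To use this on the actual data, pass the path of the forms CSV file to load_forms instead of the in-memory placeholder.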