New iRead4Skills Deliverable: Annotated Corpora by Level of Complexity for FR, PT, and SP

We are pleased to announce the release of Dataset 2: Annotated Corpora by Level of Complexity for French (FR), Portuguese (PT), and Spanish (SP). This dataset is a collection of texts categorized by complexity level and annotated for complexity features, presented in Excel format (.xlsx). The corpora were compiled and annotated within the scope of the iRead4Skills project.

Dataset 2 is derived from the previously released Dataset 1: Corpora by Level of Complexity for FR, PT, and SP (DOI: 10.5281/zenodo.10055909), which consists of written texts of various genres and complexity levels. A sample of texts from Dataset 1 was selected for classification and annotation, providing additional data and test sets for the complexity analysis systems in the three project languages.

Data Collection and Annotation Process

The classification and annotation tasks were carried out through a structured methodology:

  • Texts were distributed to Adult Learning (AL) and Vocational Education and Training (VET) Centres, where trainers and students participated in classification tasks.
  • The classification was conducted via the Qualtrics platform, ensuring a standardized approach.
  • Participants assigned texts to one of four complexity levels:
    • Very Easy (140 texts) – Easily understood by all.
    • Easy (140 texts) – Understandable for those with less than 9 years of schooling.
    • Plain (140 texts) – Readable at a 9th-grade level.
    • More Complex (42 texts) – Challenging for individuals with a 9th-grade education.
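
The distribution across the four levels can be checked with a few lines of Python. This is only a minimal sketch: the counts are copied from the list above, while the names `LEVEL_COUNTS` and `level_shares` are illustrative and not part of the dataset or its documentation. It also assumes the four counts describe one combined set of texts, which the announcement does not state explicitly.

```python
# Text counts per complexity level, copied from the list in this announcement.
# The variable and function names are hypothetical, chosen for illustration.
LEVEL_COUNTS = {
    "Very Easy": 140,
    "Easy": 140,
    "Plain": 140,
    "More Complex": 42,
}


def level_shares(counts):
    """Return each level's share of the total as a fraction between 0 and 1."""
    total = sum(counts.values())
    return {level: n / total for level, n in counts.items()}


total_texts = sum(LEVEL_COUNTS.values())
shares = level_shares(LEVEL_COUNTS)

if __name__ == "__main__":
    print(f"Total texts: {total_texts}")
    for level, share in shares.items():
        print(f"{level}: {share:.1%}")
```

Under that assumption, the More Complex level accounts for roughly 9% of the texts, so anyone using the corpus as a test set may want to account for the smaller size of that class.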

For full details on the annotation process, data descriptions, and inter-annotator agreement, refer to the documentation available on Zenodo.

Disclaimer: Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Executive Agency. Neither the European Union nor the granting authority can be held responsible for them.
