Datasets

We take the legal and ethical implications of using AI in Human Resources very seriously. The data we use contains no personal information: it covers only job titles and skills, with no personal or company details and no geographic location.

The data is hosted on the Zenodo platform under the NLP in HR community, using the file structure outlined below. Each time new data is added, an updated version of the dataset is published on the platform.

Access the Zenodo download page

On Zenodo, the dataset is organized into two .zip files, TaskA.zip and TaskB.zip, each containing folders for different stages of model development. So far, only the training and development sets of Task A have been released. As the tasks progress, future releases will add data to the corresponding subfolders of each task.
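After downloading an archive by hand from the Zenodo page, a quick way to see which stages it already contains is to scan its member paths. The sketch below uses only Python's standard `zipfile` module. The local file name `TaskA.zip` and the helper names `stages_in_archive` and `extract_archive` are hypothetical, and the stage names are simply the ones mentioned on this page; the script makes no assumption about how deeply the stage folders are nested inside the archive.

```python
import zipfile
from pathlib import PurePosixPath

# Stage folder names mentioned on this page (hypothetical list, not an official spec):
# "training" appears in the text, "development" and "test" in the TaskA tree.
KNOWN_STAGES = ("training", "development", "test")


def stages_in_archive(zip_path: str) -> list[str]:
    """Return the known stage folders that appear anywhere inside the archive."""
    found = set()
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            # Zip member names always use "/" as the separator.
            for part in PurePosixPath(name).parts:
                if part in KNOWN_STAGES:
                    found.add(part)
    # Keep a stable, predictable order.
    return [stage for stage in KNOWN_STAGES if stage in found]


def extract_archive(zip_path: str, dest_dir: str) -> None:
    """Unpack the archive into dest_dir, preserving its internal folder layout."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
```

For example, `stages_in_archive("TaskA.zip")` lists the stage folders present in a local copy, which makes it easy to check whether a newer Zenodo version has added data.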

TaskA contains language-specific subfolders for English and Spanish data. In the development set, each language folder holds two folders, queries and corpus, plus a qrels.tsv file that records which corpus entries are relevant to each query and is used to evaluate model results.
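Relevance judgements of this kind are typically used to score a system's ranked output. This page does not name the official evaluation metric, so the sketch below uses mean average precision (MAP), a standard ranking metric, purely as an illustration. It assumes the qrels have already been loaded into a dictionary mapping each query id to the set of relevant corpus ids, and that a system's output (`run`, a hypothetical name) maps each query id to a list of corpus ids ordered from most to least similar.

```python
def average_precision(ranked_ids: list[str], relevant_ids: set[str]) -> float:
    """Average precision of one ranked list against its set of relevant ids."""
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            # Precision at the rank where this relevant item was retrieved.
            precision_sum += hits / rank
    # Relevant items that were never retrieved contribute zero.
    return precision_sum / len(relevant_ids)


def mean_average_precision(run: dict[str, list[str]], qrels: dict[str, set[str]]) -> float:
    """Mean of per-query AP over every query that has relevance judgements."""
    if not qrels:
        return 0.0
    # A query missing from the run is scored as an empty ranking (AP = 0).
    scores = [average_precision(run.get(qid, []), relevant) for qid, relevant in qrels.items()]
    return sum(scores) / len(scores)
```

With `qrels = {"1234": {"1", "7"}}` and `run = {"1234": ["7", "3", "1"]}`, the relevant items sit at ranks 1 and 3, so AP is (1/1 + 2/3) / 2, roughly 0.833.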

🗜️️ TaskA
  - 📁 development
    - 📁 english
      - 📁 queries
        📄 1234
        ...
      - 📁 corpus
        📄 1
        ...
      - 📄 qrels.tsv
    - 📁 spanish
      - 📁 queries
        📄 9865
        ...
      - 📁 corpus
        📄 2
        ...
      - 📄 qrels.tsv
  - 📁 test
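The tree above maps directly onto a small loader. The sketch below reads one language folder of one stage into memory. Several details are assumptions not stated on this page: that each file in `queries` and `corpus` holds plain UTF-8 text, that the file name (e.g. `1234`) serves as the item id, and that `qrels.tsv` follows the common TREC layout of query id, iteration, corpus id and relevance grade. The helper names are hypothetical. The layout of the `test` folder is not shown, so only the development structure is targeted here.

```python
import csv
from pathlib import Path


def read_text_files(folder: Path) -> dict[str, str]:
    """Map each file name (e.g. "1234") to its stripped text content."""
    return {
        path.name: path.read_text(encoding="utf-8").strip()
        for path in sorted(folder.iterdir())
        if path.is_file()
    }


def load_qrels(path: Path) -> dict[str, set[str]]:
    """Parse qrels.tsv, ASSUMING TREC columns: query_id, iteration, doc_id, relevance."""
    qrels: dict[str, set[str]] = {}
    with path.open(encoding="utf-8", newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 4:
                continue  # skip blank or malformed lines
            query_id, _, doc_id, relevance = row[:4]
            try:
                grade = int(relevance)
            except ValueError:
                continue  # e.g. a header row
            if grade > 0:  # keep only judged-relevant pairs
                qrels.setdefault(query_id, set()).add(doc_id)
    return qrels


def load_split(
    root: str, stage: str, language: str
) -> tuple[dict[str, str], dict[str, str], dict[str, set[str]]]:
    """Load queries, corpus and qrels, e.g. stage="development", language="english"."""
    base = Path(root) / stage / language
    queries = read_text_files(base / "queries")
    corpus = read_text_files(base / "corpus")
    qrels = load_qrels(base / "qrels.tsv")
    return queries, corpus, qrels
```

Pointing `root` at the extracted TaskA folder and calling `load_split(root, "development", "spanish")` would then return the Spanish queries, corpus and relevance judgements as plain dictionaries.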
Last modified February 20, 2026: UPdates for Task B (2a7a28d)