Datasets

The data will be hosted on the Zenodo platform under the NLP in HR community, following the file structure outlined below. Each time new data is added, an updated version of the dataset will be published on the platform.

Access the Zenodo download page

The dataset on Zenodo is organized into two .zip files, TaskA.zip and TaskB.zip, each containing folders that support different stages of model development. So far, only the development set of Task A and the training set of Task A have been released; as the tasks progress, future releases will add data to the remaining subfolders of each task.
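As a minimal sketch of working with the two archives, the Python snippet below extracts a downloaded .zip file and counts the files under each split folder, which makes it easy to see which splits a given release actually contains. The function names and the `data` target directory are hypothetical, and the snippet assumes each archive unpacks into a top-level folder named after the task (for example `TaskA/development/...`); adjust the path if the archive layout differs.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path: Path, target_dir: Path) -> Path:
    """Extract one task archive into target_dir and return target_dir."""
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
    return target_dir


def files_per_split(task_dir: Path) -> dict[str, int]:
    """Count the files below each immediate subfolder (split) of a task folder.

    A count of 0 means the split folder exists but has no data yet.
    """
    counts = {}
    for split in sorted(p for p in task_dir.iterdir() if p.is_dir()):
        counts[split.name] = sum(1 for f in split.rglob("*") if f.is_file())
    return counts


if __name__ == "__main__":
    # Assumption: TaskA.zip was downloaded manually into the working directory.
    archive_path = Path("TaskA.zip")
    if archive_path.exists():
        data_dir = extract_archive(archive_path, Path("data"))
        print(files_per_split(data_dir / "TaskA"))
```

Re-running the count after each new dataset version is a quick way to confirm which subfolders have been populated.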

TaskA includes language-specific subfolders within its directories, covering English and Spanish data. Each language folder under development contains two subfolders (queries and corpus) and a qrels.tsv file with the relevance judgments used to evaluate how well a model ranks corpus entries for each query.

  • πŸ—œοΈοΈ TaskA
      • πŸ“ development
        • πŸ“ english
          • πŸ“ queries
            • πŸ“„ 1234
            • ...
          • πŸ“ corpus
            • πŸ“„ 1
            • ...
          • πŸ“„ qrels.tsv
        • πŸ“ spanish
          • πŸ“ queries
            • πŸ“„ 9865
            • ...
          • πŸ“ corpus
            • πŸ“„ 2
            • ...
          • πŸ“„ qrels.tsv
      • πŸ“ test

TaskB follows a similar structure but without language-specific subfolders: the training, validation, and test folders hold the data files directly (.tsv files, plus .json term mappings in the training folder). This consistent file organization keeps data access simple and lets new dataset versions be added in a structured way.

  • 🗜️ TaskB
      • 📁 training
        • 📄 job2skill.tsv
        • 📄 jobid2terms.json
        • 📄 skillid2terms.json
      • 📁 validation
      • 📁 test
    Last modified February 1, 2026: Update datasets description (81ca478)