A Biblissima+ summer-school-transcribathon for automatic text recognition
11-13 Sep 2024 Lyon (France)

TranscriboQuest 2024

First edition of TranscriboQuest - September 2024

TranscriboQuest, organized by Biblissima+ with the support of the ATRIUM project, is a three-day transcribathon combining training sessions and practical transcription activities. This event provided participants with the opportunity to learn the basics of automatic handwriting recognition, become proficient in the eScriptorium tool, and contribute to the creation of a training dataset now integrated into the HTR-United platform.

  • Date: September 11-13, 2024
  • Location: Lyon, ENS Descartes site, Buisson building
  • Total number of participants: 33
  • Number of trainers: 8 (Enki Baptiste, Julie Bordier, Olivier Brisville, Thibault Clérice, Simon Gabay, Matthias Gille Levenson, Ariane Pinche, Marianne Reboul)
  • Organizing committee: Thibault Clérice (Inria) and Ariane Pinche (CNRS)
  • International audience: 17 countries represented (Germany, Austria, Belgium, Bulgaria, Canada, Denmark, Spain, France, Greece, Italy, Norway, Poland, Portugal, Switzerland, Turkey, UK, USA)

Training and Teams

For this first edition, six teams were formed:

  • Medieval Literary (6 people) – Trainer: Matthias Gille Levenson
  • Medieval Practice (6 people) – Trainer: Julie Bordier
  • Modern (7 people) – Trainer: Simon Gabay
  • Ancient Greek (6 people) – Trainer: Marianne Reboul
  • Old Norse (5 people) – Trainer: Ariane Pinche
  • Arabic (3 people + Enki Baptiste, assistant trainer and participant) – Trainer: Olivier Brisville

Training Overview

Before the event:

The teams were formed based on the fields of study and paleographic skills of the trainers. Throughout the summer, each team collaborated with its dedicated trainer to prepare the corpus to be transcribed. Preliminary methodological meetings were also held to refine the transcription approaches before the transcribathon.

During the event:

  • Day 1: Intensive training on the basics of ATR and mastering eScriptorium.
  • Days 2 and 3: The transcription sessions were the core of the training. Each team worked on creating a training dataset for ATR, combining document transcription with in-depth discussions on establishing transcription rules to ensure high-quality data production.

On September 12, 2024, from 5:00 PM to 6:00 PM, Elena Pierazzo gave a public lecture by videoconference: "Working at the Frontier of Knowledge: 25 + 5 Years of Digital Philology".

  • Day 3: Documentation of datasets and final presentations before a jury composed of Peter Stokes and Thibault Clérice, followed by an awards ceremony.

Results and Datasets Produced

At the end of the training, each team produced an open-source dataset.

Congratulations to the DReMAR team, the grand winner of this first edition!
