OCR Training Data Collection and Annotation Tutorial
This guide walks you through collecting and annotating OCR training data with the “Training Dojo” tool.
Collecting OCR Training Data
- To start collecting OCR training data, click the button in the top right corner of the main interface.
- The system will begin capturing data for OCR training.
Managing OCR Training Data
- Once you’ve collected some data, go to the File menu.
- Click on “OCR Training Data Settings” to access data management options.
- In the settings window, you’ll find:
  - The folder where data is stored
  - A button to open this folder directly
  - A button to save the collected data as a ZIP file
  - An option to set the maximum size of saved data in MB
- In the OCR Training Data Settings, click the button to open the Training Dojo tool.
- The tool will display all collected files for annotation.
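To get a feel for a sensible value for the maximum size setting described above, you can measure how much space the data folder currently uses. The following is a minimal Python sketch, not part of the application: the function names are hypothetical, the folder path is whatever the settings window shows, and it assumes 1 MB = 1024 × 1024 bytes, which may differ from how the tool itself counts.

```python
from pathlib import Path


def folder_size_mb(folder: Path) -> float:
    """Return the total size of all files under `folder`, in MB.

    Assumption: 1 MB is treated as 1024 * 1024 bytes.
    """
    total_bytes = sum(p.stat().st_size for p in folder.rglob("*") if p.is_file())
    return total_bytes / (1024 * 1024)


def is_over_limit(folder: Path, max_size_mb: float) -> bool:
    """Return True if the folder contents exceed the chosen limit (hypothetical helper)."""
    return folder_size_mb(folder) > max_size_mb
```

Call `folder_size_mb` with the folder path shown in the settings window, then compare the result with the limit you intend to set.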
Interface Overview
- Left side: List of all files
  - Files with approved annotations have a checkmark
- Right side: Current image and annotation interface
  - Line edit field for entering or approving annotations
  - Approved annotations are highlighted in green
Annotation Process
- For each image, enter the correct text in the line edit field.
- Press Enter to approve the annotation and move to the next image.
- Use Up/Down arrow keys to navigate through the list.
- Use Ctrl+Down to jump to the next unapproved image and Ctrl+Up to jump to the previous one.
- Click the filter button to show only unapproved images.
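The jump and filter behavior described above can be pictured with a small Python sketch. This is an illustration only, not the tool's actual implementation: the list of approval flags and both function names are hypothetical, and what the tool does when no further unapproved image exists is not stated here, so the sketch simply returns `None` in that case.

```python
from typing import List, Optional, Sequence


def next_unapproved(approved: Sequence[bool], current: int, step: int = 1) -> Optional[int]:
    """Find the nearest unapproved item after (step=1) or before (step=-1) `current`.

    Returns its index, or None if there is none in that direction (assumed behavior).
    """
    i = current + step
    while 0 <= i < len(approved):
        if not approved[i]:
            return i
        i += step
    return None


def unapproved_only(approved: Sequence[bool]) -> List[int]:
    """Indices that would remain visible with the 'unapproved only' filter on."""
    return [i for i, ok in enumerate(approved) if not ok]
```

With `step=1` the search mirrors Ctrl+Down, and with `step=-1` it mirrors Ctrl+Up.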
Keyboard Shortcuts
- Enter: Approve annotation and move to next image
- Up/Down Arrows: Navigate through the list
- Ctrl+Down: Jump to next unapproved image
- Ctrl+Up: Jump to previous unapproved image
Exporting Annotated Data
- Once you’ve finished annotating, exit the Training Dojo tool.
- In the OCR Training Data Settings:
  - Click the button to save data as a ZIP file
  - Choose a location to save the file
- Send the exported ZIP file to our team at support@scoresight.live for processing. We will follow up with a new OCR model.
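Before sending the archive, it can be worth confirming that it opens cleanly and contains the number of files you expect. Below is a minimal sketch using Python's standard `zipfile` module. The function name is hypothetical, and nothing here assumes a particular layout inside the exported ZIP.

```python
import zipfile
from pathlib import Path


def summarize_zip(zip_path: Path) -> dict:
    """Return a basic summary of an exported ZIP file (hypothetical helper)."""
    with zipfile.ZipFile(zip_path) as zf:
        # testzip() reads every member and returns the name of the first
        # corrupt one, or None if all members pass their CRC check.
        first_corrupt = zf.testzip()
        members = [info for info in zf.infolist() if not info.is_dir()]
        return {
            "file_count": len(members),
            "total_bytes": sum(info.file_size for info in members),
            "first_corrupt_member": first_corrupt,
        }
```

If `first_corrupt_member` is not `None`, or the file count looks wrong, export the data again before emailing it.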
Best Practices
- Annotate regularly to maintain a manageable workload.
- Use keyboard shortcuts for faster navigation and annotation.
- Double-check your annotations before approving.
- Use the filter option to focus on unapproved images when nearing completion.
- Set a reasonable maximum size for saved data to keep files from growing too large.
By following this process, you’ll contribute valuable training data to improve the OCR system’s accuracy and performance. Thank you for your efforts in enhancing our OCR capabilities!