In 2023, semANT brought its first tangible fruit: the TextBite software package. TextBite provides semantic layout analysis on top of plain OCR output. It enriches the PAGE XML description of an analyzed page by introducing title elements, clustering text lines into semantically related parts (chapters, articles, dictionary entries, …), adding reading order, and altering already present regions as needed. All of this new information is stored in the way defined by the PAGE standard, so it remains available for further processing.
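
Because the result is ordinary PAGE XML, downstream tools can read it without any TextBite-specific code. The sketch below illustrates this with Python's standard library only; it does not use or represent TextBite's own API. The sample document is hand-written for illustration, and it assumes that titles appear as `TextRegion` elements with `type="heading"` and that reading order is stored as `RegionRefIndexed` entries inside an `OrderedGroup`, which is one encoding the PAGE schema allows. The helper names `local`, `line_text` and `regions_in_reading_order` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written sample, not actual TextBite output (assumption for illustration).
SAMPLE = """
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="page.jpg" imageWidth="1000" imageHeight="1500">
    <ReadingOrder>
      <OrderedGroup id="ro1">
        <RegionRefIndexed index="0" regionRef="r2"/>
        <RegionRefIndexed index="1" regionRef="r1"/>
      </OrderedGroup>
    </ReadingOrder>
    <TextRegion id="r1" type="paragraph">
      <Coords points="0,200 900,200 900,400 0,400"/>
      <TextLine id="l1">
        <Coords points="0,200 900,200 900,250 0,250"/>
        <TextEquiv><Unicode>Body text.</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
    <TextRegion id="r2" type="heading">
      <Coords points="0,50 900,50 900,150 0,150"/>
      <TextLine id="l2">
        <Coords points="0,50 900,50 900,150 0,150"/>
        <TextEquiv><Unicode>A Title</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>
"""


def local(tag):
    """Drop the XML namespace so the code works across PAGE schema versions."""
    return tag.rsplit("}", 1)[-1]


def line_text(line):
    """Return the line-level transcription of a TextLine, or '' if absent."""
    for child in line:
        if local(child.tag) == "TextEquiv":
            for sub in child:
                if local(sub.tag) == "Unicode":
                    return sub.text or ""
    return ""


def regions_in_reading_order(xml_text):
    """Return (region id, region type, text) tuples sorted by reading order."""
    root = ET.fromstring(xml_text)
    regions = {}
    order = []
    for el in root.iter():
        name = local(el.tag)
        if name == "TextRegion":
            # Join the transcriptions of the region's direct TextLine children.
            lines = [line_text(c) for c in el if local(c.tag) == "TextLine"]
            regions[el.get("id")] = (el.get("type"), " ".join(lines))
        elif name == "RegionRefIndexed":
            order.append((int(el.get("index")), el.get("regionRef")))
    # Skip references to regions that are not text regions.
    return [(ref,) + regions[ref] for _, ref in sorted(order) if ref in regions]


if __name__ == "__main__":
    for region_id, region_type, text in regions_in_reading_order(SAMPLE):
        print(region_id, region_type, text)
```

Matching on local element names rather than a fixed namespace is a deliberate simplification, since PAGE files in the wild use several schema versions. Nested groups and region-level transcriptions are ignored here.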
