The evaluation compared several extraction tools and methods, including Reducto, Azure Document Intelligence, AWS Textract Tables, GPT-4o, Google Cloud Document AI, Unstructured, and Chunkr. Table comparison is treated as a hierarchical alignment problem, similar to DNA sequence alignment: the predicted table is aligned against the ground-truth annotation, and the resulting similarity score, normalized between 0 and 1, indicates how closely the two tables match. RD-TableBench aims to provide a more diverse set of real-world examples, with accuracy ensured by manual annotation.
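As a rough illustration of how such a normalized score can be computed, the sketch below aligns two flattened sequences of cell strings with Needleman-Wunsch-style dynamic programming and divides by the best achievable score. The `needleman_wunsch_score` helper and its scoring parameters (match, mismatch, gap) are illustrative assumptions, not the benchmark's actual implementation.

```python
# Minimal sketch: Needleman-Wunsch global alignment over two sequences of
# cell strings, normalized to a similarity score in [0, 1]. The scoring
# parameters (match=1, mismatch=0, gap=-0.5) are illustrative assumptions.
def needleman_wunsch_score(a, b, match=1.0, mismatch=0.0, gap=-0.5):
    n, m = len(a), len(b)
    # dp[i][j] holds the best alignment score of a[:i] against b[:j].
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap          # a[i-1] aligned to a gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap          # b[j-1] aligned to a gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + pair,   # align the two cells
                           dp[i - 1][j] + gap,        # gap in b
                           dp[i][j - 1] + gap)        # gap in a
    best = match * max(n, m)                   # score of a perfect match
    return max(dp[n][m] / best, 0.0) if best > 0 else 1.0

# Example: a predicted table vs. its ground truth, flattened row by row.
pred = ["Name", "Qty", "Widget", "3"]
gold = ["Name", "Qty", "Widget", "3", "Total", "3"]
print(f"similarity = {needleman_wunsch_score(pred, gold):.2f}")
```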
Key takeaways:
- RD-TableBench is an open benchmark developed to evaluate extraction performance for complex tables, including scenarios like scanned tables, handwriting, language detection, and merged cells.
- The RD-TableBench dataset comprises 1,000 complex table images drawn from a diverse set of publicly available documents, manually annotated by a team of PhD-level human labelers.
- The evaluation compared several extraction tools and methods, including Reducto, Azure Document Intelligence, AWS Textract Tables, GPT-4o, Google Cloud Document AI, Unstructured, and Chunkr.
- The benchmark treats table comparison as a hierarchical alignment problem, similar to DNA sequence alignment, and measures table similarity with the Needleman-Wunsch algorithm (see the sketch after this list).
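To illustrate the hierarchical aspect, the sketch below applies the same dynamic program at two levels: rows of the two tables are aligned, and the reward for matching a pair of rows is itself an alignment score over their cells. The `align_score`, `cell_similarity`, `row_similarity`, and `table_similarity` helpers, the character-level cell comparison, and the gap penalty are all assumptions for illustration and are not taken from the RD-TableBench code.

```python
from difflib import SequenceMatcher

def align_score(a, b, sim, gap=-0.5):
    """Needleman-Wunsch with a pluggable similarity for the diagonal move."""
    n, m = len(a), len(b)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + sim(a[i - 1], b[j - 1]),
                           dp[i - 1][j] + gap,
                           dp[i][j - 1] + gap)
    denom = max(n, m)
    # Normalize so a perfect alignment of equal-length inputs scores 1.0.
    return max(dp[n][m] / denom, 0.0) if denom else 1.0

def cell_similarity(x, y):
    # Character-level similarity in [0, 1]; exact equality could be used instead.
    return SequenceMatcher(None, x, y).ratio()

def row_similarity(row_a, row_b):
    # Inner level: align the cells of two rows.
    return align_score(row_a, row_b, cell_similarity)

def table_similarity(table_a, table_b):
    # Outer level: align rows, scoring each row pair by its cell alignment.
    return align_score(table_a, table_b, row_similarity)

pred = [["Name", "Qty"], ["Widget", "3"]]
gold = [["Name", "Qty"], ["Widget", "3"], ["Total", "3"]]
print(f"table similarity = {table_similarity(pred, gold):.2f}")
```

Composing the alignment this way keeps the comparison robust to missing or spurious rows (handled by gaps at the outer level) while still crediting partially correct cell contents at the inner level.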