
Reducto Document Ingestion API

Nov 05, 2024 - reducto.ai
RD-TableBench is an open benchmark for evaluating how well tools extract complex tables. It covers challenging scenarios such as scanned tables, handwriting, language detection, merged cells, and more. Reducto created the benchmark, employing a team of PhD-level human labelers to manually annotate 1000 complex table images drawn from a diverse set of publicly available documents. The examples vary in structure, text density, and language.

The evaluation tested several tools and methods: Reducto, Azure Document Intelligence, AWS Textract Tables, GPT4o, Google Cloud Document AI, Unstructured, and Chunkr. Table comparison is treated as a hierarchical alignment problem, similar to DNA sequence alignment. The final similarity score, normalized between 0 and 1, indicates how closely the compared tables match. RD-TableBench aims to provide a more diverse set of real-world examples, with manual annotation used to ensure accuracy.

Key takeaways:

  • RD-TableBench is an open benchmark developed to evaluate extraction performance for complex tables, including scenarios like scanned tables, handwriting, language detection, and merged cells.
  • The dataset comprises 1000 complex table images from a diverse set of publicly available documents, manually annotated by a team of PhD-level human labelers.
  • The evaluation methodology involved several tools/methods including Reducto, Azure Document Intelligence, AWS Textract Tables, GPT4o, Google Cloud Document AI, Unstructured, and Chunkr.
  • The benchmark uses a hierarchical alignment approach for table comparison, treating it as a problem similar to DNA sequence alignment, and uses the Needleman-Wunsch algorithm for measuring table similarity.
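
The hierarchical alignment idea can be sketched roughly as follows. This is a minimal illustration and not Reducto's actual scoring code: the function names, the gap penalty of -0.5, and the use of `difflib.SequenceMatcher` for cell similarity are all assumptions made for the example. Cells are aligned within rows using Needleman-Wunsch, and the resulting row similarities are then used to align rows within tables.

```python
from difflib import SequenceMatcher

GAP = -0.5  # hypothetical gap penalty; the benchmark's real value is not given here


def cell_similarity(a: str, b: str) -> float:
    """Similarity of two cell strings in [0, 1] (assumed measure)."""
    if a == b:
        return 1.0
    return SequenceMatcher(None, a, b).ratio()


def nw_similarity(xs, ys, sim, gap=GAP) -> float:
    """Needleman-Wunsch global alignment score, normalized to [0, 1]."""
    n, m = len(xs), len(ys)
    if n == 0 and m == 0:
        return 1.0
    # dp[i][j] = best score aligning the first i items of xs with the first j of ys
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + sim(xs[i - 1], ys[j - 1]),  # align the two items
                dp[i - 1][j] + gap,                            # skip an item of xs
                dp[i][j - 1] + gap,                            # skip an item of ys
            )
    # Divide by the longer length and clamp negative scores to 0.
    return max(0.0, dp[n][m] / max(n, m))


def row_similarity(row_a, row_b) -> float:
    """Lower level: align the cells of two rows."""
    return nw_similarity(row_a, row_b, cell_similarity)


def table_similarity(table_a, table_b) -> float:
    """Upper level: align rows, scoring each pair by its cell alignment."""
    return nw_similarity(table_a, table_b, row_similarity)


truth = [["Name", "Qty"], ["Apple", "3"]]
partial = [["Name", "Qty"]]  # an extraction that dropped the second row
score_same = table_similarity(truth, truth)
score_partial = table_similarity(partial, truth)
```

Under these assumptions an identical table scores 1.0, while an extraction that drops a row scores strictly between 0 and 1, since the missing row is charged as a gap and the score is divided by the longer table's row count.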