To use NuExtract, import the necessary modules and define a function that builds the model input and post-processes the model output. The model and tokenizer are loaded from the pretrained checkpoint "numind/NuExtract". The input text, the JSON template (schema), and an example are then passed to the predict_NuExtract function, which assembles the input, generates the output, and returns the extracted information. Running the model in bf16 is recommended, since it causes only a negligible loss in performance.
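The workflow above can be sketched as follows. This is a minimal illustration, not NuMind's reference code: the prompt markers (`<|input|>`, `<|output|>`, `<|end-output|>`), the `### Template:` / `### Example:` / `### Text:` section headers, the `max_length=4000` truncation, and the `max_new_tokens` default are assumptions modeled on the layout we believe the model card uses, so check them against the card. The helper names `build_prompt`, `parse_output`, and `load_nuextract` are hypothetical, and the `predict_NuExtract` signature is our own. The Hugging Face `transformers` and `torch` imports are deferred into the functions that need them, so the prompt helpers can be used and tested without downloading the model.

```python
import json
from typing import Optional, Sequence

# Assumed prompt markers; verify against the NuExtract model card.
INPUT_TAG = "<|input|>"
OUTPUT_TAG = "<|output|>"
END_OUTPUT_TAG = "<|end-output|>"


def build_prompt(text: str, schema: str, examples: Optional[Sequence[str]] = None) -> str:
    """Combine the JSON template, any non-empty examples, and the input text."""
    # Re-serialize the template so indentation is consistent regardless of input style.
    prompt = f"{INPUT_TAG}\n### Template:\n{json.dumps(json.loads(schema), indent=4)}\n"
    for example in examples or []:
        if example:  # skip empty example slots
            prompt += f"### Example:\n{json.dumps(json.loads(example), indent=4)}\n"
    prompt += f"### Text:\n{text}\n{OUTPUT_TAG}\n"
    return prompt


def parse_output(decoded: str) -> str:
    """Keep only the text between the output marker and the end-output marker."""
    return decoded.split(OUTPUT_TAG, 1)[1].split(END_OUTPUT_TAG, 1)[0].strip()


def load_nuextract(model_id: str = "numind/NuExtract"):
    """Load the model in bf16 (as recommended) plus its tokenizer."""
    import torch  # deferred: only needed when actually running the model
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, trust_remote_code=True
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model.to("cuda" if torch.cuda.is_available() else "cpu")
    model.eval()
    return model, tokenizer


def predict_NuExtract(model, tokenizer, text, schema, examples=None, max_new_tokens=1000):
    """Run one extraction and return the raw JSON string produced by the model."""
    import torch

    prompt = build_prompt(text, schema, examples)
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=4000).to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    decoded = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return parse_output(decoded)
```

Usage, assuming the weights can be downloaded: `model, tokenizer = load_nuextract()` followed by `predict_NuExtract(model, tokenizer, text, '{"name": "", "age": ""}')`. Keeping prompt assembly and output parsing in separate pure functions makes it easy to adjust the format if the model card differs from these assumptions.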
Key takeaways:
- NuExtract is a structure extraction model by NuMind, fine-tuned on a private high-quality synthetic dataset for information extraction.
- The model is purely extractive: all text it outputs appears verbatim in the original text.
- NuMind also provides a tiny (0.5B) and a large (7B) version of this model: NuExtract-tiny and NuExtract-large.
- The model is used by providing an input text (fewer than 2,000 tokens) and a JSON template describing the information you need to extract.
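Two of the properties listed above can be checked in code: that every extracted value occurs verbatim in the source text, and that the input stays under the 2,000-token limit. The sketch below is a hypothetical post-processing helper, not part of NuExtract itself. It assumes the model's output parses as JSON whose filled fields are strings, that empty strings mark template fields the model left unfilled, and that the tokenizer exposes an `encode` method returning a list of token IDs (as Hugging Face tokenizers do).

```python
import json
from typing import Any, Iterator, List


def leaf_strings(node: Any) -> Iterator[str]:
    """Yield every string value in a parsed JSON structure (dict keys excluded)."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from leaf_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from leaf_strings(item)


def non_extractive_values(output_json: str, text: str) -> List[str]:
    """Return extracted values that do NOT appear verbatim in the source text.

    Empty strings are skipped, assuming they mark unfilled template fields.
    """
    return [v for v in leaf_strings(json.loads(output_json)) if v and v not in text]


def fits_token_budget(text: str, tokenizer: Any, limit: int = 2000) -> bool:
    """True if the text encodes to fewer than `limit` tokens."""
    return len(tokenizer.encode(text)) < limit
```

An empty list from `non_extractive_values` is consistent with the extractive guarantee; a non-empty one flags values worth inspecting, for example a paraphrase or a normalized date that does not occur in the source.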