The study suggests that training models in "low precision" could make them more robust. However, models trained or quantized below 7- or 8-bit precision may show a noticeable drop in quality. The researchers expect more effort to go into meticulous data curation and filtering, so that only the highest-quality data is fed into smaller models. They also predict that new architectures designed to make low-precision training stable will become important in the future.
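To make the intuition behind that 7- or 8-bit threshold concrete, here is a minimal sketch (not taken from the study) of symmetric uniform quantization in NumPy. The `quantize` helper and the toy weight distribution are illustrative assumptions; the point is simply that the round-trip error between original and quantized weights grows sharply as the bit width shrinks.

```python
# Minimal sketch of symmetric, per-tensor uniform quantization (illustrative only).
import numpy as np

def quantize(weights: np.ndarray, bits: int) -> np.ndarray:
    """Quantize weights to `bits` and dequantize back."""
    levels = 2 ** (bits - 1) - 1             # e.g. 127 representable levels for 8-bit
    scale = np.abs(weights).max() / levels   # map the largest magnitude to the top level
    q = np.clip(np.round(weights / scale), -levels, levels)
    return q * scale                         # dequantized approximation of the originals

rng = np.random.default_rng(0)
w = rng.normal(0, 0.02, size=100_000)        # toy stand-in for model weights

for bits in (16, 8, 7, 4, 2):
    err = np.mean((w - quantize(w, bits)) ** 2)
    print(f"{bits:>2}-bit  mean squared round-trip error: {err:.2e}")
```

Running this shows the error staying small through 8 bits and then climbing quickly, which mirrors (in a very rough way) the quality cliff the researchers describe at lower precisions.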
Key takeaways:
- Quantization, a technique used to make AI models more efficient, may have more limitations than previously assumed, particularly when applied to models trained for a long time on large amounts of data.
- In aggregate, AI model inference is often more expensive than training, making inference cost a significant concern in the AI industry.
- Training models in "low precision" could make them more robust, but models below 7- or 8-bit precision may see a noticeable drop in quality.
- There's no free lunch when it comes to reducing inference costs, and the future may see more effort put into meticulous data curation and filtering, as well as new architectures that aim to make low-precision training stable.