Ask HN: Benchmarks for models other than LLMs

This Ask HN post observes that benchmarks are widely used to evaluate the abilities of LLMs and asks whether similar benchmarks exist for propensity modelling, churn prediction, or other types of models. The author also wants to know whether there are established best practices for comparing the performance of different models, especially when those models are built on different underlying datasets.

The author seeks to understand the standards for comparing model performance beyond just benchmark data. The discussion is centered around finding a more comprehensive and fair method of comparison that takes into account the unique characteristics of each model's underlying dataset.

Key takeaways

The author has observed impressive benchmarks used for ranking LLMs' abilities.
The author is curious if similar benchmarks exist for propensity modelling, churn prediction, or other types of models.
The author is interested in best practices for comparing model performance beyond just benchmark data.
The author acknowledges that different models may have different underlying datasets, which could affect comparisons.
