
Ask HN: Do LLMs get "better" with more processing power and/or time per request?

Feb 25, 2024 - news.ycombinator.com
The discussion emphasizes that more processing power does not necessarily improve a model's performance. Given the same model architecture and dataset, a model can be trained on CPUs with the same results, albeit more slowly. A model's effectiveness is determined by how well the dataset aligns with the architecture and by how many epochs it is trained for before reaching a reasonable prediction accuracy, such as 90%.
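To make the CPU-versus-GPU point concrete, here is a minimal device-agnostic training sketch in PyTorch (the framework choice, the tiny model, and the synthetic data are illustrative assumptions, not the author's code). The hardware only changes how fast the loop runs; the learning procedure itself is identical:

```python
import torch
import torch.nn as nn

# The device only changes speed, not the learning procedure:
# the same loop runs on CPU or GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

torch.manual_seed(0)  # fix the seed so runs are comparable

# A small illustrative classifier (the architecture is an assumption here)
model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# Stand-in data; in practice this would be a DataLoader over a real dataset
x = torch.randn(64, 784, device=device)
y = torch.randint(0, 10, (64,), device=device)

for epoch in range(5):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```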

The author mentions that for image classification models, approximately 100 epochs over 10,000 items seem to yield the best results for certain datasets. However, training has a sweet spot: too few epochs leave the model underfit, while continuing past the optimum leads to overfitting, and beyond that point no amount of additional training or processing power can enhance the model's performance.
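That point of diminishing returns is exactly what early stopping is meant to detect. Below is a minimal, self-contained sketch (PyTorch again; the tiny model and synthetic train/validation data are assumptions for illustration) that watches validation loss and halts once extra epochs stop helping:

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny synthetic setup purely for illustration (not from the thread):
# a small classifier, a training batch, and a held-out validation batch.
model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
x_train, y_train = torch.randn(256, 20), torch.randint(0, 2, (256,))
x_val, y_val = torch.randn(64, 20), torch.randint(0, 2, (64,))

best_val = float("inf")
best_state = copy.deepcopy(model.state_dict())
patience, stale = 5, 0

for epoch in range(100):
    optimizer.zero_grad()
    loss_fn(model(x_train), y_train).backward()
    optimizer.step()

    with torch.no_grad():
        val_loss = loss_fn(model(x_val), y_val).item()

    if val_loss < best_val:
        best_val, stale = val_loss, 0
        best_state = copy.deepcopy(model.state_dict())
    else:
        # Validation loss has stopped improving: training past this
        # point drifts toward overfitting, not a better model.
        stale += 1
        if stale >= patience:
            print(f"early stop at epoch {epoch}")
            break

model.load_state_dict(best_state)  # restore the best checkpoint
```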

Key takeaways:

  • More processing power does not necessarily improve a model; the same architecture and dataset trained on a CPU yields the same results, only at a slower pace.
  • The quality of a model is determined by how well the dataset fits the model architecture and how many epochs it is given to reach a reasonable prediction accuracy.
  • For image classification models, around 100 epochs over 10,000 items seems to be the optimal budget for certain datasets (the sketch after this list shows what that implies in optimizer steps).
  • Training has a sweet spot: too few epochs leave the model underfit, too many lead to overfitting, and past that point no additional training or processing power can improve it.
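As a rough sanity check on the "100 epochs for 10,000 items" figure, the arithmetic below converts that budget into optimizer steps (the batch size of 32 is an assumed value, not something the thread specifies):

```python
import math

dataset_size = 10_000   # items, per the thread's example
epochs = 100            # the author's rough sweet spot
batch_size = 32         # assumed; not stated in the thread

steps_per_epoch = math.ceil(dataset_size / batch_size)   # 313
total_steps = epochs * steps_per_epoch                   # 31,300 optimizer updates
print(f"{steps_per_epoch} steps/epoch, {total_steps} total optimizer steps")
```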