Feature Story
Ask HN: How do I train a custom LLM/ChatGPT on my own documents in Dec 2023?
Dec 25, 2023 · news.ycombinator.com
The author strongly advises against feeding a set of documents into fine-tuning, stating that doing so only teaches the model the patterns within those documents. The author reports having personally disproven the method multiple times, at the insistence of a persistent client who had been misled into believing it is effective.
Key takeaways
- "Training on documents" is a misleading term used by many startups; the actual process involves RAG (retrieval-augmented generation) and LlamaIndex.
- LlamaIndex is the best option for most startups with working products.
- Creating question-and-answer pairs with GPT-4 and using them for QLoRA fine-tuning might work, but it requires a lot of data in which the same concepts recur.
- Feeding a set of documents into fine-tuning does not work; it only helps the model learn the patterns in those documents.
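
To make the RAG idea in the takeaways concrete, here is a minimal sketch of the retrieval step using only the Python standard library. It is not LlamaIndex's API and not the author's code: the function names (`tokenize`, `cosine`, `retrieve`, `build_prompt`), the sample chunks, and the prompt wording are all hypothetical, and bag-of-words cosine similarity stands in for the learned embeddings a real system would use. The point is only that the documents are looked up at question time and pasted into the prompt, rather than baked into model weights.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    q_vec = Counter(tokenize(question))
    scored = [(cosine(q_vec, Counter(tokenize(c))), c) for c in chunks]
    # sort() is stable, so chunks with equal scores keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]


def build_prompt(question, chunks, k=2):
    """Assemble the text that would be sent to a language model."""
    context = "\n".join(f"- {c}" for c in retrieve(question, chunks, k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
    )


# Hypothetical document chunks, invented for illustration.
CHUNKS = [
    "Refunds are issued within 14 days of a return request.",
    "The office is closed on public holidays.",
    "Support is available by email from 9am to 5pm.",
]

if __name__ == "__main__":
    print(build_prompt("How long do refunds take?", CHUNKS, k=1))
```

In a production setup, a library such as LlamaIndex would typically handle the chunking, embedding, and retrieval, and the assembled prompt would go to the LLM; the model itself is left unchanged.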