The author emphasizes that organizational thinking and discipline are needed to get repeatable results with machine learning. They argue that obtaining training data is the biggest challenge in most projects: with a substantial number of labeled examples, even an outdated classification model can yield a useful classifier. They caution against reaching for the latest models from recent research papers, which often demand significantly more effort and carry more risk without substantially improving results. They also highlight the importance of UI skills for building data-labeling systems.
Key takeaways:
- The author has a diverse background in physics, web development, machine learning, and data science, and has worked on a wide range of projects in these fields.
- The author believes that data scientists should apply the same organizational thinking and discipline used in application development to get repeatable results with machine learning.
- According to the author, obtaining training data is the biggest challenge for most projects, and having a large number of labeled examples can significantly improve the performance of a classifier.
- The author has developed UI skills and worked on side projects building systems that let people label data.