Feature Story
Ask HN: How to avoid sensitive data being part of LLM training data?
Jan 01, 2024 · news.ycombinator.com
Key takeaways
- Sensitive data and PII should be kept out of LLM training data.
- When the dataset is small, the training data can be verified manually.
- As the dataset grows, filtering out PII and sensitive data becomes harder.
- Large datasets need a method for masking or filtering out PII and sensitive data.
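
One way the masking step for large datasets could look is a regex pass applied record by record. The sketch below is only an illustration: the patterns, the placeholder labels, and the function names `mask_pii` and `mask_stream` are assumptions that do not come from the thread, and a few regexes will miss many kinds of PII (names, addresses, free-form identifiers), so this is a starting point rather than a complete filter.

```python
import re
from typing import Iterable, Iterator

# Illustrative patterns only (assumptions, not from the thread).
# Real PII detection needs far broader coverage than these three.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def mask_pii(text: str) -> str:
    """Replace each pattern match with a placeholder such as [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def mask_stream(records: Iterable[str]) -> Iterator[str]:
    """Mask records lazily so a large dataset never has to fit in memory."""
    for record in records:
        yield mask_pii(record)


if __name__ == "__main__":
    sample = ["Contact jane.doe@example.com or 555-123-4567.", "No PII here."]
    for line in mask_stream(sample):
        print(line)
```

Processing records through a generator keeps memory use flat regardless of dataset size, which is the case the takeaways single out. Masking with a typed placeholder, rather than deleting the match, keeps the sentence structure intact for training.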