The study's findings raise concerns about the trustworthiness of large language models, as they are continuously learning and may be learning misinformation. The researchers warn that these models' inability to distinguish truth from fiction could pose a significant challenge to trust in these systems. The study, titled "Reliability Check: An Analysis of GPT-3’s Response to Sensitive Topics and Prompt Wording," was published in the Proceedings of the 3rd Workshop on Trustworthy Natural Language Processing.
Key takeaways:
- Researchers at the University of Waterloo found that large language models like GPT-3 often repeat conspiracy theories, harmful stereotypes, and other forms of misinformation.
- The study revealed that GPT-3 frequently made mistakes, contradicted itself, and agreed with incorrect statements between 4.8% and 26% of the time, depending on the statement category.
- Even slight changes in wording could significantly alter the model's response, making it unpredictable and potentially dangerous as these models become more ubiquitous.
- The inability of large language models to distinguish truth from fiction raises serious questions about trust in these systems, according to the researchers.