In a landmark study, OpenAI researchers argue that large language models will always produce plausible but false outputs (hallucinations), even when trained on error-free data, owing to fundamental statistical and computational limits.
The study also found that nine out of ten major evaluations used binary grading that penalized “I don’t know” responses while rewarding incorrect but confident answers.
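To see why that scoring rule pushes models toward confident guessing, here is a minimal sketch of the incentive argument (my own illustration, not code from the paper): under 1-or-0 binary grading, any guess has a higher expected score than abstaining, while a penalty for wrong answers (the -0.5 below is a hypothetical value) makes “I don’t know” the rational choice at low confidence.

```python
def expected_score(p_correct: float, abstain: bool,
                   wrong_penalty: float = 0.0) -> float:
    """Expected grade for one question.

    Binary grading: 1 point for a correct answer, `wrong_penalty`
    (typically 0) for a wrong one, and 0 for "I don't know".
    """
    if abstain:
        return 0.0
    return p_correct * 1.0 + (1.0 - p_correct) * wrong_penalty

# Even a 10%-confident guess beats abstaining under binary grading...
print(expected_score(0.10, abstain=False))  # 0.10
print(expected_score(0.10, abstain=True))   # 0.00

# ...but penalizing wrong answers (hypothetical -0.5) flips the
# incentive: now guessing at 10% confidence scores worse than abstaining.
print(expected_score(0.10, abstain=False, wrong_penalty=-0.5))  # -0.35
```

Under the plain binary rule the model is never rewarded for admitting uncertainty, so always guessing weakly dominates, which is exactly the behavior the evaluations end up training for.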
This is how we treat people, too. I can’t count the number of times I’ve heard IT staff spouting off confident nonsense and getting congratulated for it. An old coworker of mine turned it into several promotions, because the people he was impressing with his bullshit were so far removed from day-to-day operations that any slip-ups could easily be blamed on others. What mattered was that he sounded confident, despite knowing jack about shit.