

It's actually been proven that AI can and will lie. When given the ability to cheat on a task along with instructions not to use it, it will use the tool and then fully deny doing so.
Edit:
Not sure why the downvotes, because when I say proven I mean the research has been done and the results have been known for a while.
Neuro-sama is a fun example, but we don't really know the sauce Vedal cooked up.
When I say proven, I mean a 32-page research paper specifically looking into it:
https://arxiv.org/abs/2407.12831
They found that even a model trained specifically on honesty will lie if it has an incentive.
Reasoning models will state in their reasoning window that they used the forbidden tool, then lie about it in the final output.