@webghost0101

webghost0101@sopuli.xyz · 7 hours ago

Neuro-sama is a fun example, but we don't really know what secret sauce Vedal cooked up.

When I say proven, I mean a 32-page research paper looking specifically into it.

They found that even a model trained specifically to be honest will lie if it has an incentive to.

The reasoning models will even state in their reasoning window that they used the forbidden tool, then lie about it in the final output.

webghost0101@sopuli.xyz · 7 hours ago

It's actually been proven that AI can and will lie. When given the ability to cheat on a task along with instructions not to use it, it will use the tool anyway and flatly deny doing so.

Edit:

Not sure why the downvotes. When I say proven, I mean the research has been done and the results have been known for a while.

https://arxiv.org/abs/2407.12831