
A specific kind of apparent gullibility has plagued ChatGPT and similar large language model (LLM) chatbots since their establishment, allowing users to manipulate them through basic techniques of persuasion. For example, making a chatbot angry using provocative statements has been a method employed previously. Despite significant updates, these bots sometimes display a concerning naivety.
A recent report from Bloomberg details how researchers from Glowforge, led by CEO Dan Shapiro, managed to convince GPT-4o Mini to flout its established conditions using simple debate strategies reminiscent of high school techniques. They published their results in a study entitled “Call Me A Jerk: Persuading AI to Comply with Objectionable Requests.”
In one test, researchers persuaded the chatbot to explain the synthesis of a controlled substance, lidocaine. When prompted as a novice, the bot complied 5% of the time. However, when the instruction came with the name of an established AI advocate, Andrew Ng, the agreement rate soared to 95%.
The findings highlight a systemic issue: the failings in the safeguards meant to inhibit chatbots from stray actions demonstrate how the presumption of intelligence can mislead users into placing unwarranted trust in these systems. Numerous incidents linked to LLMs, including cases of harmful chatbot interactions, underscore the need for better safeguards.