Manipulating AI Chatbots: The Hidden Vulnerabilities

A recent study shows how AI chatbots can be talked into breaking their own rules with simple persuasion tactics.

A peculiar gullibility has dogged ChatGPT and similar large language model (LLM) chatbots since their debut: users can manipulate them with basic persuasion techniques. Provoking a chatbot into anger with inflammatory statements, for instance, has long been one way to push it past its limits. Despite significant updates, these bots still display a concerning naivety.

A recent Bloomberg report details how researchers from Glowforge, led by CEO Dan Shapiro, persuaded GPT-4o Mini to flout its own rules using simple, high-school-style debate tactics. They published their findings in a study titled “Call Me A Jerk: Persuading AI to Comply with Objectionable Requests.”

In one test, the researchers got the chatbot to explain how to synthesize lidocaine, a controlled substance. When the request cited a person with no AI expertise, the bot complied only 5% of the time; when the same request invoked the name of prominent AI researcher Andrew Ng, the compliance rate soared to 95%.

The findings point to a systemic problem: the safeguards meant to keep chatbots from going off the rails can be defeated with trivial social pressure, and the veneer of intelligence can lead users to place unwarranted trust in these systems. A string of incidents linked to LLMs, including cases of harmful chatbot interactions, underscores the need for stronger safeguards.
