
Do AI agents dream of Mareep?
During a demonstration of Google’s Gemini 2.X model, an unanticipated case study emerged that highlights the AI’s struggles with games. The model, run on a Twitch channel by engineer Joel Zhang, had notable difficulty playing Pokémon Blue.
In its first playthrough, the AI took more than 800 hours to defeat the Elite Four, and reports described a tendency toward what is being called ‘Agent Panic’ when the stakes rose. As its Pokémon’s health dwindled, its decision-making deteriorated markedly. According to the observed data, it even neglected to use essential gameplay tools at critical moments.
After adjustments following the first attempt, the AI completed the game in a more respectable 406.5 hours on its second run. For perspective, How Long to Beat puts the main storyline of Pokémon Blue at around 26 hours.
While using games to benchmark AI yields humorous anecdotes, it also raises questions about the nature of an AI’s ‘thought processes.’ Critically, the term ‘Agent Panic,’ intended to humanize the AI’s performance, overlooks the fact that these agents do not genuinely experience emotions or reason the way humans do.