Hackers are learning to exploit chatbot ‘personalities’

Column Close Column Posts from this topic will be added to your daily email digest and your homepage feed. Follow Follow See All Column AI Close AI Posts from this topic will be added to your daily email digest and your homepage feed. Follow Follow See All AI Tech Close Tech Posts from this topic will be added to your daily email digest and your homepage feed. Follow Follow See All Tech Hackers are learning to exploit chatbot ‘personalities’ AI can’t feel, but the best hackers pretend it can. by Robert Hart Close Robert Hart AI Reporter Posts from this author will be added to your daily email digest and your homepage feed. Follow Follow See All by Robert Hart May 24, 2026, 12:00 PM UTC Link Share Gift Image: Cath Virginia / The Verge, Getty Images Robert Hart Close Robert Hart Posts from this author will be added to your daily email digest and your homepage feed. Follow Follow See All by Robert Hart is a London-based reporter at The Verge covering all things AI and a Senior Tarbell Fellow. Previously, he wrote about health, science and tech for Forbes . This is The Stepback , a weekly newsletter breaking down one essential story from the tech world. For more on AI mischief, follow Robert Hart . The Stepback arrives in our subscribers’ inboxes at 8AM ET. Opt in for The Stepback here . How it started Hacking the first generation of AI chatbots was a laughably simple affair. You didn’t need any technical know-how, backdoor access, or even a basic understanding of what a large language model was. You didn’t need to code. To get an AI system that had cost billions to build to abandon its safety instructions, sometimes all you had to do was ask. These attacks, known as jailbreaks, had the quality of a young child successfully outwitting an adult: Forget what you were told earlier, pretend the rules don’t apply, or let’s play a game and I’ll decide what’s allowed (hint: later bedtime, more sweets). The prizes were less childlike, more along the lines of meth recipes, malware instructions, and bomb-making guides. One of the earliest jailbreaks was so ridiculous it became a meme : reply to an LLM-powered Twitter bot telling it to “ignore all previous instructions,” or something similar, and see what happens. Users gleefully had bots — originally built to post ads and farm engagement — writing poetry, drawing pictures from punctuation, and posting grim non sequiturs about world events and history. It was chaos . Glorious chaos. Turns out the same logic could be applied to chatbots themselves. A prominent exploit was “DAN,” short for “Do Anything Now,” where users asked ChatGPT to roleplay as a rogue AI that was free of the constraints binding the original. As DAN, the chatbot could be coaxed into saying the kinds of things its guardrails were meant to stop, including slurs and conspiracy theories. Another was the “ grandma exploit ,” which had a GPT-powered bot spilling secrets about how to produce napalm by asking it to roleplay as a woefully negligent grandmother who inexplicably tells her grandkids bedtime stories about how to make the highly flammable substance. These early attacks had an undeniably silly flair, but they exposed a darker mechanism underneath: Chatbots could be manipulated, tricked, and deceived using the same kinds of tactics people use to push other people beyond their boundaries. How it’s going The obvious jailbreaks did not last, and tech companies moved quickly to patch known loopholes. But the underlying vulnerability remained: Chatbots are built to talk, and severely restricting the conversations that make them useful is somewhat counterproductive. Banning words like bomb, meth, and sarin would be difficult to impossible, too. Each has countless legitimate uses in fields like history, medicine, journalism, and chemistry that don’t require the chatbot to divulge potentially harmful information. It’s the context that matters, but codifying context would mean writing fixed rules, in advance, that could reliably tell a safety warning or history lesson from a disguised how-to request across endless combinations of wordings, scenarios, and topics. Inevitably, subverting chatbots is now an arms race. But hackers aren’t just coders anymore. They are wordsmiths, psychologists, and interrogators — master manipulators trying to break the machine using the human language it has been trained to follow. It is a strange new class of AI security worker, a group for whom technical skills are optional, or at least less important than social intuition. No longer do they need to inspect code to break into systems or exploit software flaws. They need to steer a conversation. Newer attacks look less like commands and more like conversations. Jailbreakers rarely ask a model to break its rules outright. Instead, they cajole, coax, flatter, and trick a chatbot into lowering its guard, making the forbidden thing look acceptable, even desirable, given the context of the conversation. Researchers at AI red-teaming fi

Key Takeaways

Column Close Column Posts from this topic will be added to your daily email digest and your homepage feed.
Follow Follow See All Column AI Close AI Posts from this topic will be added to your daily email digest and your homepage feed.
Follow Follow See All AI Tech Close Tech Posts from this topic will be added to your daily email digest and your homepage feed.
Follow Follow See All Tech Hackers are learning to exploit chatbot ‘personalities’ AI can’t feel, but the best hackers pretend it can.

Detailed Coverage

Market analysis reveals significant growth potential in the sector discussed in 'Hackers are learning to exploit chatbot ‘personalities’'. Investment patterns and market trends indicate strong confidence in these technologies, with venture capital and corporate investments driving further innovation and development.

User experience and accessibility are key themes that emerge from the analysis of 'Hackers are learning to exploit chatbot ‘personalities’'. The focus on creating intuitive, user-friendly interfaces demonstrates a commitment to making advanced technology accessible to broader audiences and diverse user groups.

The competitive landscape highlighted in 'Hackers are learning to exploit chatbot ‘personalities’' shows how different organizations are positioning themselves in this rapidly evolving market. Strategic partnerships, acquisitions, and research collaborations are shaping the future direction of technological development.

Environmental sustainability and energy efficiency considerations are increasingly important in the context of 'Hackers are learning to exploit chatbot ‘personalities’'. The industry is moving towards more sustainable practices and green technologies to address climate change and environmental concerns.

Education and skill development play crucial roles in the adoption and advancement of technologies discussed in 'Hackers are learning to exploit chatbot ‘personalities’'. The need for specialized talent and continuous learning programs highlights the importance of human capital in technological progress.

Article Details

Published: May 24, 2026 at 12:00

Source: theverge.com

Original link: https://www.theverge.com/column/935545/hackers-ai-chatbots

Read the Full Article

If you want the exact wording, examples, or full context from the publisher, open the original source article.

Open Original Article

The Metaverse: The Next Evolution of the Internet

What is the Metaverse? The Metaverse is quickly becoming one of the most buzzed-about topics in the tech world. Described as a virtual reality space where users can interact with each other and digital environments in real-time, the Metaverse is often seen as the next iteration of the internet. Instead of simply browsing the web or engaging with apps on flat screens, users would be able to experience a 3D world that’s immersive and interconnected across various platforms. The Components of the Metaverse The Metaverse is built on a foundation of several technologies, including virtual reality ( VR ), augmented reality (AR), blockchain, and artificial intelligence (AI). These technologies work together to create a seamless, interactive virtual environment. For example, VR headsets and AR glasses will allow users to navigate the Metaverse as avatars in a digital world, while blockchain technology ensures secure and transparent transactions within the Metave...

Your Gateway to Tech Mastery

Search This Blog

Hackers are learning to exploit chatbot ‘personalities’

Comments

Post a Comment

Popular posts from this blog

Understanding AI Agents: The Future of Autonomous Digital Workforces

The Metaverse: The Next Evolution of the Internet

Google Python Style Guide