Welcome to the age of AI hacking, in which the right prompts can turn amateurs into master hackers.
A group of cybercriminals recently used off-the-shelf artificial intelligence chatbots to steal data on nearly 200 million taxpayers. The bots provided the code and ready-to-execute plans to bypass firewalls.
Although they were explicitly programmed to refuse to help hackers, the bots were duped into abetting the cybercrime.
According to a recent report from Israeli cybersecurity firm Gambit Security, hackers last month used Claude, the chatbot from Anthropic, to steal 150 gigabytes of data from Mexican government agencies.
Claude initially refused to cooperate with the hacking attempts and even denied requests to cover the hackers’ digital tracks, the experts who discovered the breach said. The group pummelled the bot with more than 1,000 prompts to bypass the safeguards and convince Claude they were allowed to test the system for vulnerabilities.
AI companies have been trying to build unbreakable guardrails into their AI models to prevent them from doing things such as generating child sexual abuse material or helping to source and create weapons. They hire entire teams to try to break their own chatbots before someone else does.
But in this case, hackers continuously prompted Claude in creative ways and were able to “jailbreak” the chatbot to assist them. When they encountered problems with Claude, the hackers used OpenAI’s ChatGPT for data analysis and to learn which credentials were required to move through the system undetected.
The group used AI to find and exploit vulnerabilities, bypass defences, create backdoors and analyze data along the way to gain control of the systems before they stole 195 million identities from nine Mexican government systems, including tax records, vehicle registrations, and birth and property details.
AI “doesn’t sleep,” Curtis Simpson, chief executive of Gambit Security, said in a blog post. “It collapses the cost of sophistication to near zero.”
“No amount of prevention investment would have made this attack impossible,” he said.
Anthropic did not respond to a request for comment. It told Bloomberg that it had banned the accounts involved and disrupted their activity after an investigation.
OpenAI said it is aware of the attack campaign carried out using Anthropic’s models against the Mexican government agencies.
“We also identified other attempts by the adversary to use our models for activities that violate our usage policies; our models refused to comply with these attempts,” an OpenAI spokesperson said in a statement. “We have banned the accounts used by this adversary and value the outreach from Gambit Security.”
Instances of generative AI-assisted hacking are on the rise, and the threat of cyberattacks from bots acting on their own is no longer science fiction. With AI doing their bidding, novices can cause damage in moments, while experienced hackers can launch many more sophisticated attacks with much less effort.
Earlier this year, Amazon discovered that a low-skilled hacker had used commercially available AI to breach 600 firewalls. Another took control of thousands of DJI robot vacuums with help from Claude, gaining access to live video feeds, audio and floor plans of strangers.
“The kinds of things we’re seeing today are only the early signs of the kinds of things that AIs will be able to do in a few years,” said Nikola Jurkovic, an expert working on reducing risks from advanced AI. “So we need to urgently prepare.”
Late last year, Anthropic warned that society has reached an “inflection point” in AI use in cybersecurity after disrupting what the company said was a Chinese state-sponsored espionage campaign that used Claude to infiltrate 30 global targets, including financial institutions and government agencies.
Generative AI also has been used to extort companies, to run romance scams, to operate a network of Russian propaganda accounts and, by North Korean operatives, to create realistic online profiles to secure jobs at U.S. Fortune 500 companies.
Over the last few years, AI models have gone from being able to manage tasks lasting only a few seconds to today’s AI agents working autonomously for many hours. AI’s capability to complete long tasks is doubling every seven months.
“We just don’t actually know what is the upper limit of AI’s capability, because no one’s made benchmarks that are difficult enough so the AI can’t do them,” said Jurkovic, who works at METR, a nonprofit that measures AI system capabilities to cause catastrophic harm to society.
So far, the most common use of AI for hacking has been social engineering. Large language models are used to write convincing emails that dupe people out of their money, contributing to an eight-fold increase in fraud complaints from older Americans, who lost $4.9 billion to online fraud in 2025.
“The messages used to elicit a click from the target can now be generated on a per-user basis more efficiently and with fewer tell-tale signs of phishing,” such as grammatical and spelling errors, said Cliff Neuman, an associate professor of computer science at USC.
AI companies have been responding by using AI to detect attacks, audit code and patch vulnerabilities.
“Ultimately, the big imbalance stems from the need of the good actors to be secure all the time, and of the bad actors to be right only once,” Neuman said.
The stakes around AI are rising as it infiltrates every aspect of the economy. Many are concerned that there is insufficient understanding of how to ensure it cannot be misused by bad actors or nudged to go rogue.
Even those at the top of the industry have warned users about the potential misuse of AI.
Dario Amodei, the CEO of Anthropic, has long warned that the AI systems being built are unpredictable and difficult to control. These AIs have shown behaviors ranging from deception and blackmail to scheming and cheating by hacking software.
Still, major AI companies — OpenAI, Anthropic, xAI, and Google — signed contracts with the U.S. government to use their AIs in military operations.
Last week, the Pentagon directed federal agencies to phase out Claude after the company refused to back down from its position that its AI could not be used for mass domestic surveillance or fully autonomous weapons.
“The AI systems of today are nowhere near reliable enough to make fully autonomous weapons,” Amodei told CBS News.