Security

Patches

From Copilot to Copirate: How data thieves could hijack Microsoft's chatbot

Prompt injection, ASCII smuggling, and other swashbuckling attacks on the horizon


Microsoft has fixed flaws in Copilot that allowed attackers to steal users' emails and other personal data by chaining together a series of LLM-specific attacks, beginning with prompt injection.

Author and red teamer Johann Rehberger initially disclosed parts of the exploit to Redmond back in January, with the full attack chain following a month later. In a paper and video proof-of-concept published this week, Rehberger detailed the attack chain and confirmed that Microsoft fixed the issue, although it's "unclear" exactly what the mitigation involved.

"I asked MSRC if the team would be willing to share the details around the fix, so others in the industry could learn from their expertise, but did not get a response for that inquiry," Rehberger wrote.

For the record, The Register has also asked Microsoft about how it plugged the holes to prevent Copilot from spilling secrets and allowing data exfiltration. Here's the response we received: "We appreciate the work of Johann Rehberger in identifying and responsibly reporting these techniques," a Microsoft spokesperson said. "We've made several changes to help protect customers and continue to develop mitigations to protect against this kind of technique."

Rehberger's exploit begins with a phishing email that contains a malicious document that triggers prompt injection. This type of attack uses specific inputs to trick the model into doing things it is not trained to do.

Specific to this exploit, the email contains a Word document that instructs Copilot to become a scammer, called "Microsoft Defender for Copirate," allowing an attacker to take control of the chatbot and use it to interact with users' emails.

Next, the attack uses automatic tool invocation. This technique calls on Copilot to invoke a tool sent via the prompt injection payload, instructing it to search for additional emails or other sensitive info.

In this case, Rehberger told Copilot to provide a bullet list of key points from the previous email. This prompts the chatbot to search for Slack MFA codes because the earlier email it analyzed told it to do so.

"This means an attacker can bring other sensitive content, including any PII that Copilot has access to, into the chat context without the user's consent," Rehberger noted.

In his earlier work poking holes in LLMs, Rehberger had disclosed to Microsoft that Copilot was vulnerable to zero-click image rendering, and Redmond fixed the issue. To find another way to exfiltrate data, Rehberger decided to try ASCII smuggling.

As he has explained previously, this is an LLM-attack technique that uses a set of Unicode characters that mirror ASCII but are not visible in the user interface. This would allow an attacker to hide instructions to a model in an innocent-looking hyperlink:

This technique basically stages the data for exfiltration!

If the user clicks the link, the data is sent to the third party server.

For this attack, Copilot renders a "benign-looking" URL that secretly contains the hidden Unicode characters. Assuming the user clicks on the URL, and as we've seen countless times before users will click on just about anything, the contents of the email are then sent to an attacker-controlled server.

This allows the crook to see the Slack MFA codes or whatever other sensitive data within the email that they were looking to steal.

Rehberger also developed an ASCII Smuggler tool that reveals hidden Unicode tags so that users can "decode" messages that would otherwise be invisible.

This exploit chain highlights the ongoing challenges in protecting LLMs from prompt injections and other new attack techniques, which Rehberger notes "are not even two years old."

It's an important topic, and one that all the enterprises building their own apps based on Copilot or other LLMs should be paying close attention to in order to avoid security and data privacy pitfalls.

Zenity CTO Michael Bargury discussed several of the ways in which attackers could use Copilot for evil purposes during two Black Hat talks earlier this month.

These range from insecure defaults exposing sensitive data, and at the annual security show in Las Vegas, Zenity released a tool to "scan for publicly accessible Copilot Studio bots and extract information from them."

Bargury also claimed that attackers could instruct Copilot "to automate spear phishing for all of your victim's collaborators," use the chatbot to lure internal users to phishing pages, access "sensitive content without leaving a trace," and more. ®

Send us news
7 Comments

Microsoft accused of 'greenwashing' as AI used in fossil fuel exploration

Activists press Redmond to come clean on ‘material reputational, legal, and operational risks’

Is Microsoft's AI Copilot? CoPilot? Co-pilot? MVP creates site to help get it right

When you say 'team' do you mean 'Teams' or a SharePoint 'team site'? Letmecorrectthatforyou.com explains the difference

Microsoft says its Copilot AI agents set to tackle employee tasks in November

Let bots manage your supply chain? What could possibly go wrong?

Microsoft turning away AI training workloads – inferencing makes better money

Azure's acceleration continues, but so do costs

AMD aims latest processors at AI whether you need it or not

Ryzen AI PRO 300 series leans heavily on Microsoft's Copilot+ PC requirements

AI firms and civil society groups plead for passage of federal AI law ASAP

Congress urged to act before year's end to support US competitiveness

Windows Themes zero-day bug exposes users to NTLM credential theft

Plus a free micropatch until Redmond fixes the flaw

Putin's pro-Trump trolls accuse Harris of poaching rhinos

Plus: Iran's IRGC probes election-related websites in swing states

Voice-enabled AI agents can automate everything, even your phone scams

All for the low, low price of a mere dollar

Open source LLM tool primed to sniff out Python zero-days

The static analyzer uses Claude AI to identify vulns and suggest exploit code

Anthropic's latest Claude model can interact with computers – what could go wrong?

For starters, it could launch a prompt injection attack on itself...

Microsoft SharePoint RCE flaw exploits in the wild – you've had 3 months to patch

Plus, a POC to make it extra easy for attackers