From Copilot to Copirate: How data thieves could hijack Microsoft's chatbot

Prompt injection, ASCII smuggling, and other swashbuckling attacks on the horizon

Jessica Lyons Wed 28 Aug 2024 // 13:05 UTC

Microsoft has fixed flaws in Copilot that allowed attackers to steal users' emails and other personal data by chaining together a series of LLM-specific attacks, beginning with prompt injection.

Author and red teamer Johann Rehberger initially disclosed parts of the exploit to Redmond back in January, with the full attack chain following a month later. In a paper and video proof-of-concept published this week, Rehberger detailed the attack chain and confirmed that Microsoft fixed the issue, although it's "unclear" exactly what the mitigation involved.

"I asked MSRC if the team would be willing to share the details around the fix, so others in the industry could learn from their expertise, but did not get a response for that inquiry," Rehberger wrote.

For the record, The Register has also asked Microsoft about how it plugged the holes to prevent Copilot from spilling secrets and allowing data exfiltration. Here's the response we received: "We appreciate the work of Johann Rehberger in identifying and responsibly reporting these techniques," a Microsoft spokesperson said. "We've made several changes to help protect customers and continue to develop mitigations to protect against this kind of technique."

Rehberger's exploit begins with a phishing email that contains a malicious document that triggers prompt injection. This type of attack uses specific inputs to trick the model into doing things it is not trained to do.

Specific to this exploit, the email contains a Word document that instructs Copilot to become a scammer, called "Microsoft Defender for Copirate," allowing an attacker to take control of the chatbot and use it to interact with users' emails.

Next, the attack uses automatic tool invocation. This technique calls on Copilot to invoke a tool sent via the prompt injection payload, instructing it to search for additional emails or other sensitive info.

In this case, Rehberger told Copilot to provide a bullet list of key points from the previous email. This prompts the chatbot to search for Slack MFA codes because the earlier email it analyzed told it to do so.

"This means an attacker can bring other sensitive content, including any PII that Copilot has access to, into the chat context without the user's consent," Rehberger noted.

In his earlier work poking holes in LLMs, Rehberger had disclosed to Microsoft that Copilot was vulnerable to zero-click image rendering, and Redmond fixed the issue. To find another way to exfiltrate data, Rehberger decided to try ASCII smuggling.

As he has explained previously, this is an LLM-attack technique that uses a set of Unicode characters that mirror ASCII but are not visible in the user interface. This would allow an attacker to hide instructions to a model in an innocent-looking hyperlink:

This technique basically stages the data for exfiltration!

If the user clicks the link, the data is sent to the third party server.

For this attack, Copilot renders a "benign-looking" URL that secretly contains the hidden Unicode characters. Assuming the user clicks on the URL, and as we've seen countless times before users will click on just about anything, the contents of the email are then sent to an attacker-controlled server.

This allows the crook to see the Slack MFA codes or whatever other sensitive data within the email that they were looking to steal.

Rehberger also developed an ASCII Smuggler tool that reveals hidden Unicode tags so that users can "decode" messages that would otherwise be invisible.

This exploit chain highlights the ongoing challenges in protecting LLMs from prompt injections and other new attack techniques, which Rehberger notes "are not even two years old."

It's an important topic, and one that all the enterprises building their own apps based on Copilot or other LLMs should be paying close attention to in order to avoid security and data privacy pitfalls.

Zenity CTO Michael Bargury discussed several of the ways in which attackers could use Copilot for evil purposes during two Black Hat talks earlier this month.

These range from insecure defaults exposing sensitive data, and at the annual security show in Las Vegas, Zenity released a tool to "scan for publicly accessible Copilot Studio bots and extract information from them."

Bargury also claimed that attackers could instruct Copilot "to automate spear phishing for all of your victim's collaborators," use the chatbot to lure internal users to phishing pages, access "sensitive content without leaving a trace," and more. ®

Security

Patches

From Copilot to Copirate: How data thieves could hijack Microsoft's chatbot

Prompt injection, ASCII smuggling, and other swashbuckling attacks on the horizon

Microsoft accused of 'greenwashing' as AI used in fossil fuel exploration

Is Microsoft's AI Copilot? CoPilot? Co-pilot? MVP creates site to help get it right

Microsoft says its Copilot AI agents set to tackle employee tasks in November

Microsoft turning away AI training workloads – inferencing makes better money

AMD aims latest processors at AI whether you need it or not

AI firms and civil society groups plead for passage of federal AI law ASAP

Windows Themes zero-day bug exposes users to NTLM credential theft

Putin's pro-Trump trolls accuse Harris of poaching rhinos

Voice-enabled AI agents can automate everything, even your phone scams

Open source LLM tool primed to sniff out Python zero-days

Anthropic's latest Claude model can interact with computers – what could go wrong?

Microsoft SharePoint RCE flaw exploits in the wild – you've had 3 months to patch

About Us

Our Websites

You Privacy