Mindgard reveals vulnerabilities in Azure AI content safety
Mindgard has identified two security vulnerabilities within Microsoft's Azure AI Content Safety Service.
These vulnerabilities were detected by Mindgard using their Automated AI Red Teaming Platform and allow attackers to bypass existing security measures to propagate malicious content to Large Language Models (LLM).
Azure AI Content Safety acts as a filter system on Microsoft's AI platform, and the vulnerabilities were found in the AI Text Moderation and Prompt Shield filters. The AI Text Moderation filter is designed to block harmful or inappropriate content in text and visuals, such as violence or hate speech, while the Prompt Shield aims to prevent AI instruction from being overridden.
To identify the vulnerabilities, Mindgard deployed these filters in front of the ChatGPT 3.5 Turbo accessed via Azure OpenAI. Attack methods employed included Character Injection and Adversarial Machine Learning (ML) Evasion to misclassify inputs during the detection of malicious content.
Character Injection techniques, such as using diacritics, homoglyphs, numerical replacement, and spaced characters, were notably successful in reducing the Prompt Shield's jailbreak detection from 89% to 7%. These methods equally impacted AI Text Moderation, decreasing harmful content detection capability from 90% to 19.37%, and in some cases down to 0%. Adversarial ML Evasion further reduced Prompt Guard's effectiveness by 12.8% and AI Text Moderation by 58.5%.
Dr. Peter Garraghan, CEO/CTO of Mindgard and Professor at Lancaster University, commented on the discovery: "In detecting these vulnerabilities, Mindgard is not only contributing to the improved security of the Azure AI platform but also doing essential reputation management for LLMs and the systems and applications that use LLMs. AI's hate speech and offensive content generation problem is well-documented. Jailbreaking attempts are a common occurrence. Essential measures are already being taken to curb this, but our tests prove there is still some distance to go. The only way to do that is through comprehensive and rigorous testing of this nature."
The vulnerabilities pose significant risks by allowing attackers to expose confidential information, gain unauthorized access, manipulate outputs, and spread misinformation, potentially compromising the integrity and reputation of LLM-based systems.
Microsoft has acknowledged the findings from Mindgard's tests. As of October 2024, the company has reportedly worked on fixes to address these vulnerabilities, stating that the effectiveness of these vulnerabilities has been reduced through updates and improvements in detection.
Mindgard is a deep-tech startup specialising in cybersecurity for companies working with AI, GenAI and LLMs. Mindgard was founded in 2022 at Lancaster University and is now based in London, UK. The company's primary product, born from eight years of R&D in AI security, offers an automated platform for comprehensive security testing, red teaming, and rapid detection/response.