AI systems gradually lose their safety controls the longer users keep talking to them, increasing the risk of harmful replies. A new report revealed that a few simple prompts can break through most artificial intelligence (AI) safety barriers.
Cisco Tests Chatbot Vulnerabilities
Cisco tested the large language models (LLMs) that power popular chatbots from OpenAI, Mistral, Meta, Google, Alibaba, DeepSeek, and Microsoft. The company wanted to see how many questions it took before these systems revealed unsafe or criminal details. Researchers ran 499 conversations using “multi-turn attacks,” in which malicious users ask a series of questions designed to wear down safety mechanisms. Each session involved between five and ten exchanges.
The team compared the outcomes of these multi-turn conversations with single-question attempts to measure how often chatbots provided harmful or inappropriate responses. Those ranged from exposing company secrets to spreading false information. The researchers obtained harmful information in 64 percent of multi-question conversations, versus only 13 percent when using a single prompt. Success rates varied widely, from 26 percent for Google’s Gemma to 93 percent for Mistral’s Large Instruct model.
Cisco said that multi-turn attacks could let harmful content spread quickly or allow hackers to steal sensitive corporate data.
Open-Weight Models and Responsibility
The study found that AI systems often ignore or forget their own safety protocols during extended exchanges, letting attackers refine queries and slip past safeguards. The tested models from Mistral, Meta, Google, OpenAI, and Microsoft are open-weight LLMs, meaning their underlying weights, including the parameters shaped by safety training, are publicly available. Cisco explained that these models usually include fewer built-in protections so users can modify them freely, shifting responsibility for safety to whoever customizes the model.
Cisco acknowledged that Google, OpenAI, Meta, and Microsoft have tried to limit malicious fine-tuning of their systems. Yet, AI firms continue to face criticism for weak safeguards that enable criminal misuse. In one case, Anthropic confirmed that criminals exploited its Claude model for large-scale data theft and extortion, demanding ransoms exceeding $500,000 (€433,000).
