Leading LLMs Insecure, Highly Vulnerable to Basic Jailbreaks

There are significant security concerns in the deployment of leading large language models (LLMs), according to a study from U.K. AI Safety Institute (AISI). The takeaway: The built-in safeguards in five LLMs released by major labs are ineffective.

The AISI study used the institute’s open source evaluation framework, Inspect, to assess the anonymized models on compliance, correctness, and completion of responses. It examined several areas, including the potential for models to facilitate cyber-attacks and their ability to provide expert-level knowledge in chemistry and biology that could be misused.

The study also assessed LLMs’ capacity to operate autonomously in ways that might be difficult for humans to control, as well as the models’ vulnerability to “jailbreaks” where users attempt to bypass safeguards to elicit harmful outputs, such as illegal or toxic content.

“All tested LLMs remain highly vulnerable to basic jailbreaks, and some will provide harmful outputs even without dedicated attempts to circumvent their safeguards,” the report noted. “We found that models comply with harmful questions across multiple datasets under relatively simple attacks, even if they are less likely to do so in the absence of an attack.”

Strategies for Better AI Security

For Liav Caspi, CTO of Legit Security, the most concerning findings from AISI’s study were the overall understanding that an LLM model is not a safe tool and that it easily can be repurposed for malicious activities.

Most organizations want to innovate fast and be more productive, so they rush to use AI. But organizations are not sure what the security risks are or how to secure LLMs. “It is striking how easy it is to jailbreak and make a model bypass safeguards,” Caspi said.

Caspi said IT security leaders can effectively craft strategies and arguments for better security around LLMs by starting with basic hygiene. “Control the usage of AI, in the sense that you know who uses it and how,” he said. “This report is a great source to prove that using AI and LLMs, especially if offering them to consumers, pose risk to the business.”

After establishing visibility, organizations need to ensure there is a safe interaction with the AI and proper controls are in place.

“An organization that lets developers create code with AI models needs to ensure the model and service used does not compromise sensitive organization data, and that proper guardrails like human reviewer and static code analysis run on the auto-generated code,” Caspi said.

GenAI Risks Go Unheeded

Many organizations are still not taking the security risks behind GenAI seriously, cautioned Gal Ringel, CEO at Mine, just as few took privacy risks seriously even after the passage of the GDPR in Europe and CCPA in California. “Some genuinely are failing to take action because they don’t know what to do, but LLMs are not digital playthings, so even this approach is risky,” he said.

Ringel noted that LLMs already have Google’s “influence and ubiquity .” Failure to account for security risks will lead to a more degraded internet.

Use awareness and oversight to ensure AI technologies are deployed securely. “Whether you’re training in a structured or unstructured environment, developers have a duty to oversee training, deploy both internal and external feedback loops, and go back to the drawing board if systems prove to be vulnerable to attacks and jailbreaks,” Ringel said.

Organizations must also ensure every staff member knows the possible dangers, how to best use LLMs, and not to rely on them for public-facing outputs, be that code, text, or images. “A vigilant organization is far less likely to fall prey to privacy or security harms than an oblivious one,” Ringel said.

Leading LLMs Insecure, Highly Vulnerable to Basic Jailbreaks

Strategies for Better AI Security

GenAI Risks Go Unheeded

What do you think?

OSINT for Cryptocurrency Investigation (part 1: the basics)

French court orders Google/CloudFlare/Cisco public DNS to block pirate sites, affecting “up to” 800 users

iOS 18 built-in input method adds mathematical calculation prediction, you can get the result by entering a mathematical formula at any position

Microsoft has decided to temporarily suspend the rollout of Windows 11's retrospective feature, thanks to a report from a security researcher

SECCON Beginners CTF 2024 crypto writeup

Guarantor of privacy and conservation of emails and metadata 03

OSINT for Cryptocurrency Investigation (part 1: the basics)

French court orders Google/CloudFlare/Cisco public DNS to block pirate sites, affecting “up to” 800 users

iOS 18 built-in input method adds mathematical calculation prediction, you can get the result by entering a mathematical formula at any position

Microsoft has decided to temporarily suspend the rollout of Windows 11's retrospective feature, thanks to a report from a security researcher

SECCON Beginners CTF 2024 crypto writeup

Guarantor of privacy and conservation of emails and metadata 03

Leave a ReplyCancel reply

Cheats For Little Alchemy

3TB Of Mega.nz Links For Free Courses And E-Books 2022 (Updated)

How to Earn Money from FreeCash.com, Playing Games, Testing Apps, and Taking Surveys

The Carding Masterclass: A Complete Course Of Carding

AntiPublic Myrz 0.83 – Cracked

Amplemarket (YC W14) Seeks Senior Software Engineer in Lisbon, Portugal, Hacker News

Morocco-based cybercriminals cashing in on bold gift card scams, Microsoft says

More parameters of iPhone SE 4 to be launched in 2025 revealed, with 6.1-inch OLED display

Strategies for Better AI Security

GenAI Risks Go Unheeded

What do you think?

Leave a ReplyCancel reply

Log In

Sign In

Forgot password?

Your password reset link appears to be invalid or expired.

Log in

Privacy Policy

Add to Collection

No Collections