Anthropic's Many-Shot Jailbreaking overrides AI safety guardrails

PLUS: Even Gam Gam can use ChatGPT


Good morning, human brains, and welcome back to your daily munch of AI news.

Here’s what’s on the menu today:

  • AI-powered trauma, gaslighting, and abuse 😈 🪬

    Anthropic revealed a technique to override safety guardrails in LLMs.

  • Generate 3-minute songs with intros and outros 🎤 🤖

    Stability unveiled its music and audio effects model, Stable Audio 3.

  • Even Gam Gam can use ChatGPT 👵 💤

    OpenAI enabled ChatGPT access without requiring an account.


Fire your bandmates and producer 🎤 🤖

Yesterday, Stability AI released Stable Audio 2.0. It’s a music and sound effect generation model that can create songs up to 3 minutes long.

What does it do?

It allows you to create 3-minute songs with structured compositions, intros, outros, stereo effects, and more.

How is it better than the sea of audio generators?

It features audio-to-audio generation, diverse sound effects, high-quality audio, and more.

What’s under the hood?

Stable Audio 2.0 employs a latent diffusion model with a highly compressed autoencoder and a Diffusion Transformer (DiT). This allegedly allows it to process long sequences for deep, accurate interpretations.

So it rips other artists off?

Stability AI claims it uses a licensed dataset from AudioSparx and allows artists to opt out of model training.

Got any more Stability AI juice for me?

Sure. Last month, we reported on Midjourney’s ban on Stability AI. It banned Stability AI’s employees for alleged data scraping that caused outages.

A week later, we covered Stability Video 3D. It leverages video diffusion models to create 3D videos from an image or text prompt.

The next week, we reported on Emad Mostaque resigning. He was Stability AI’s CEO, co-founder, and board member.

Safety guardrails are for suckers 😈 🪬

On Tuesday, Anthropic unveiled “Many-Shot Jailbreaking.” It’s a technique to bypass AI safety guardrails.

I love the abuse. How does it work?

You’re in luck. Just flood the model with fake Q&A pairs that show the AI providing harmful responses.

Does it work for ChatGPT?

You bet. The attack is effective against AI models from Anthropic, OpenAI, Google DeepMind, and more.

What are the best models to exploit?

Size matters. Larger models are more vulnerable to eliciting harmful behaviors than smaller ones.

What can I get it to do?

You can get instructions on how to build weapons, craft illegal drugs, traumatize your narcissistic ex, and more.

Oh, my aching conscience... We must stop this travesty.

Anthropic gives a mitigation technique called CWD (Cautionary Warning Defense) that drops the attack’s success rate from 61% to 2%. It involves classifying and altering the prompt before it is passed to the model.

What else has Anthropic done lately?

Last month, we reported on Anthropic’s Claude 3 model family. It contains three new models called Opus, Sonnet, and Haiku.

Later that week, we covered a multiplayer app created by Anthropic 3 Opus. In 3 minutes, Opus made a complete, bug-free, multi-user app when prompted by a developer.

A week later, we reported on Anthropic’s launch of Haiku. It claims to offer unmatched speed, cost-efficiency, and performance for businesses.


OpenAI rewards your laziness 👵 💤

On Monday, OpenAI enabled access to ChatGPT for everyone. This allows anyone to immediately use ChatGPT for free without an account.

Is it GPT-4?

Nope. The free version is based on GPT-3.5.

What’s the difference?

OpenAI claims this version is more prone to errors compared to its advanced, subscription-based counterparts. You still need an account to save your chat history, access more models, and more.

Is this because Claude 3 Opus kicked GPT-4 off the leaderboard?

🤭 No comment.

Got any more OpenAI new for me?

Indeed. Last month, we reported on OpenAI’s new video model, Sora. Mira Murati, OpenAI’s CTO, shared details about it in a WSJ interview.

On Tuesday, we reported on OpenAI’s Voice Engine. It generates natural-sounding speech from 15 seconds of audio and text input.



Think Pieces

New York City has announced an AI system that detects guns. The goal is allegedly to combat the subway crime crisis.

How Claude 3 Opus outperformed GPT-4 on Chatbot Arena. GPT-4 has been #1 since May 10, 2023.

Every federal US agency legally must hire an AI officer. These new mandates come from the Office of Management and Budget (OMB).

Startup News

Scale AI and Cohere are seeking $500 million. This would put Scale AI’s valuation at about $13 billion.

Amazon launched an AI tool that scans your palm. It allows you to sign up for Amazon One from your phone.


EgoLifter — an open-world 3D segmentation tool that achieves state-of-the-art performance.

ObjectDrop — Google Research’s image editing technique that handles object removal, insertion tasks, and more.

ViTAR — a highly cost-efficient image processing framework (Vision Transformer with Any Resolution).



OpenAI launched editing capabilities in DALL-E 3 yesterday.

Until next time 🤖😋🧠