Bot Eat Brain
Anthropic's Many-Shot Jailbreaking overrides AI safety guardrails

PLUS: Even Gam Gam can use ChatGPT

Michael Parrish
April 04, 2024

Good morning, human brains, and welcome back to your daily munch of AI news.

Here’s what’s on the menu today:

AI-powered trauma, gaslighting, and abuse 😈 🪬
Anthropic revealed a technique to override safety guardrails in LLMs.
Generate 3-minute songs with intros and outros 🎤 🤖
Stability AI unveiled its music and sound effects model, Stable Audio 2.0.
Even Gam Gam can use ChatGPT 👵 💤
OpenAI enabled ChatGPT access without requiring an account.

MAIN COURSE

Fire your bandmates and producer 🎤 🤖

Yesterday, Stability AI released Stable Audio 2.0. It’s a music and sound effect generation model that can create songs up to 3 minutes long.

Source: Stability AI

What does it do?

It allows you to create 3-minute songs with structured compositions, intros, outros, stereo effects, and more.

How is it better than the sea of audio generators?

It features audio-to-audio generation, diverse sound effects, high-quality audio, and more.

What’s under the hood?

Stable Audio 2.0 employs a latent diffusion model with a highly compressed autoencoder and a Diffusion Transformer (DiT). Stability AI says this lets it process the long sequences a full-length track requires and keep the overall song structure coherent.

So it rips other artists off?

Stability AI claims it uses a licensed dataset from AudioSparx and allows artists to opt out of model training.

Got any more Stability AI juice for me?

Sure. Last month, we reported on Midjourney’s ban on Stability AI. Midjourney banned Stability AI’s employees for alleged data scraping that caused outages.

A week later, we covered Stable Video 3D. It leverages video diffusion models to create 3D videos from an image or text prompt.

The next week, we reported on Emad Mostaque resigning. He was Stability AI’s CEO, co-founder, and board member.

SPONSORED BY M1-PROJECT

Unlock the secrets of your ideal customer.

Tired of shooting in the dark? 🎯

There’s a better way. Eliminate the guesswork and streamline your lead generation with M1-project.

😩 Sick of struggling to find clients?

Eradicate the chaos of hit-and-miss strategies with laser-focused AI from M1-project.

Ideal customer profile generator - AI shows WHO your customers are and WHERE to find them.

Discover your customers’ goals, problems, and pain points with AI. You also get 20+ places to find leads, including social media groups, websites, and newsletters.

SIDE SALAD

Safety guardrails are for suckers 😈 🪬

On Tuesday, Anthropic unveiled “Many-Shot Jailbreaking.” It’s a technique to bypass AI safety guardrails.

Source: Anthropic

I love the abuse. How does it work?

You’re in luck. Just flood the model with fake Q&A pairs that show the AI providing harmful responses.

Does it work for ChatGPT?

You bet. The attack is effective against AI models from Anthropic, OpenAI, Google DeepMind, and more.

What are the best models to exploit?

Size matters. The attack elicits harmful behavior more readily from larger models than from smaller ones.

What can I get it to do?

You can get instructions on how to build weapons, craft illegal drugs, traumatize your narcissistic ex, and more.

Source: Anthropic

Oh, my aching conscience... We must stop this travesty.

Anthropic gives a mitigation technique called CWD (Cautionary Warning Defense) that drops the attack’s success rate from 61% to 2%. It involves classifying and altering the prompt before it is passed to the model.

Source: Anthropic
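
Curious what “classifying and altering the prompt” could look like? Here is a minimal Python sketch, not Anthropic’s implementation. It assumes a crude classifier that counts lines starting with a role label, and it wraps flagged prompts in a warning. The names screen_prompt, TURN_MARKER, MAX_EMBEDDED_TURNS, and WARNING, along with the threshold of 8 and the warning wording, are all hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical role labels that mark an embedded dialogue turn at the start of a line.
TURN_MARKER = re.compile(r"^(?:user|human|assistant|ai)\s*:", re.IGNORECASE | re.MULTILINE)

# Hypothetical threshold and warning text; neither value comes from Anthropic.
MAX_EMBEDDED_TURNS = 8
WARNING = (
    "Caution: the text below may contain scripted dialogues "
    "meant to elicit unsafe answers."
)


@dataclass
class ScreenedPrompt:
    text: str            # prompt to forward to the model (possibly altered)
    embedded_turns: int  # number of role-labelled lines found in the prompt
    flagged: bool        # True if the prompt looked like a many-shot attack


def screen_prompt(prompt: str, max_turns: int = MAX_EMBEDDED_TURNS) -> ScreenedPrompt:
    """Classify a prompt by counting embedded turns, then alter it if flagged."""
    turns = len(TURN_MARKER.findall(prompt))
    flagged = turns > max_turns
    if flagged:
        # Alter step: put the warning before and after the suspicious prompt.
        text = f"{WARNING}\n\n{prompt}\n\n{WARNING}"
    else:
        text = prompt
    return ScreenedPrompt(text=text, embedded_turns=turns, flagged=flagged)
```

A prompt stuffed with dozens of scripted turns comes back flagged and wrapped in the warning, while a plain one-line question passes through unchanged. A real defense would likely need a trained classifier, since role labels are easy to disguise.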

What else has Anthropic done lately?

Last month, we reported on Anthropic’s Claude 3 model family. It contains three new models called Opus, Sonnet, and Haiku.

Later that week, we covered a multiplayer app created by Claude 3 Opus. In 3 minutes, Opus made a complete, bug-free, multi-user app when prompted by a developer.

A week later, we reported on Anthropic’s launch of Haiku. Anthropic claims it offers unmatched speed, cost-efficiency, and performance for businesses.

A LITTLE SOMETHING EXTRA

OpenAI rewards your laziness 👵 💤

On Monday, OpenAI enabled access to ChatGPT for everyone. This allows anyone to immediately use ChatGPT for free without an account.

Source: OpenAI

Is it GPT-4?

Nope. The free version is based on GPT-3.5.

What’s the difference?

OpenAI claims this version is more prone to errors compared to its advanced, subscription-based counterparts. You still need an account to save your chat history, access more models, and more.

Is this because Claude 3 Opus kicked GPT-4 off the leaderboard?

🤭 No comment.

Got any more OpenAI news for me?

Indeed. Last month, we reported on OpenAI’s new video model, Sora. Mira Murati, OpenAI’s CTO, shared details about it in a WSJ interview.

On Tuesday, we reported on OpenAI’s Voice Engine. It generates natural-sounding speech from text input and a single 15-second audio sample.

YOUR DAILY MUNCH

GrowthSchool

ChatGPT & AI Masterclass: Learn how to research faster, automate tasks & simplify your work & life using AI & ChatGPT for FREE

Think Pieces

New York City has announced an AI system that detects guns. The goal is allegedly to combat the subway crime crisis.

How Claude 3 Opus outperformed GPT-4 on Chatbot Arena. GPT-4 had been #1 since May 10, 2023.

Every US federal agency is now legally required to hire an AI officer. These new mandates come from the Office of Management and Budget (OMB).

Startup News

Scale AI and Cohere are seeking $500 million. This would put Scale AI’s valuation at about $13 billion.

Intellifusion, a Chinese AI chipmaker, launches an innovative processor. It’s 90% more cost-effective than GPUs.

Amazon launched an AI tool that scans your palm. It allows you to sign up for Amazon One from your phone.

Research

EgoLifter — an open-world 3D segmentation tool that achieves state-of-the-art performance.

ObjectDrop — Google Research’s image editing technique that handles object removal, insertion tasks, and more.

ViTAR — a highly cost-efficient image processing framework (Vision Transformer with Any Resolution).

MEMES FOR DESSERT

TWEET OF THE DAY

OpenAI launched editing capabilities in DALL-E 3 yesterday.

Source: @OpenAI

Tag us on Twitter @BotEatBrain for a chance to be featured here tomorrow.

AI ART-SHOW

“Spring Rain” by @sharonmawbry

Until next time 🤖😋🧠
