PLUS: Instantly generate audio for videos


Good morning, human brains, and welcome back to your daily munch of AI news.

Here’s what’s on the menu today:

  • AI Video is getting dope 🤖 🪄

    Luma unveiled new features for Dream Machine.

  • Instantly generate audio for videos 🔊 🎛️

    Google DeepMind unveiled V2A (video-to-audio).

Peep today's ‘What Would You Do?’ at the bottom. 👇


New AI video generation features 🤖 🪄

Yesterday, Luma AI unveiled new features for Dream Machine. It announced an Extend feature and new editing capabilities for its video generation model.

What’s up with Extend?

It lets you lengthen your video in a style that matches your original content.

“Extend is an advanced system that is aware of what's happening in your video and extends it in a consistent way to follow instruction.”

— Luma AI

Cool. Anything else?

Luma also announced upcoming video editing capabilities. You’ll be able to change the background, objects, foregrounds, characters, and more in generated videos. They are “coming soon.”


You can also remove video watermarks if you’re on the Standard, Pro, or Premium tier.

Automatic sound effects 🔊 🎛️

Yesterday, Google DeepMind unveiled V2A (video-to-audio). It analyzes videos and automatically generates music and sound effects for them.

What does it do?

V2A generates music, voices, sound effects, and more, producing audio that “matches the characters and tone of a video.”

It doesn’t use text prompts?

You can tell it what to generate via text, but V2A is context-aware, so this step is optional.

How does it work?

First, V2A encodes the video. Next, a diffusion model generates audio that matches the video and any text prompt. Finally, it combines the audio with the video data.
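Want to see the shape of that pipeline? Here's a toy Python sketch of the three steps. To be clear, this is not DeepMind's code: the function names (encode_video, denoise_audio, combine) are made up for illustration, the "frames" are just short lists of numbers, and the "diffusion model" is a simple loop that starts from random noise and nudges it toward the video signal.

```python
import random


def encode_video(frames):
    """Toy encoder (assumption): reduce each frame to its mean brightness."""
    return [sum(frame) / len(frame) for frame in frames]


def denoise_audio(video_code, steps=50, seed=0):
    """Toy stand-in for a diffusion model (assumption, not the real thing).

    Starts from random noise and, at each step, moves every audio value
    20% closer to the matching value in the video encoding.
    """
    rng = random.Random(seed)
    audio = [rng.gauss(0.0, 1.0) for _ in video_code]
    for _ in range(steps):
        audio = [a + 0.2 * (c - a) for a, c in zip(audio, video_code)]
    return audio


def combine(frames, audio):
    """Pair each frame with its generated audio value."""
    return list(zip(frames, audio))


# Three fake "frames", each a short list of pixel brightness values.
frames = [[0.1, 0.2, 0.3], [0.5, 0.5, 0.5], [0.9, 0.8, 1.0]]

video_code = encode_video(frames)   # Step 1: encode the video
audio = denoise_audio(video_code)   # Step 2: refine noise into audio
clip = combine(frames, audio)       # Step 3: combine audio and video
```

The real system works on actual video and audio with a learned model, but the flow is the same: encode, generate from noise while guided by the video, then merge.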


Think Pieces 🧠

Is AI just repeating words? Jonathan Marcus of Anthropic says AI models discover surprising semantic connections between concepts.

Startup News 💰

Meta paused AI training on EU and UK user data. This is due to GDPR concerns from the Irish Data Protection Commission (DPC).

NVIDIA unveiled Nemotron-4 340B. It’s a family of open models that generate synthetic data to train large language models (LLMs).

Research 👨‍🔬

OmniCorpus — a multimodal dataset of roughly 10 billion images interleaved with text, built to give LLMs diverse, high-quality image-text training data.

HelpSteer2 — an open-source dataset designed to help align LLMs with human preferences by providing high-quality preference data.


On Friday, I spoke with Michael Sim, Principal Founder & Head of Design of MageTCG.

He’s developing a philanthropic trading card game. The goal is to support mental health causes, military veterans, and more.

A team of independent AI artists collaborated to create the artwork. They aim to raise awareness for AI art, celebrate artists, build a community, and more.


