Salesforce's new AI

PLUS: Unlocking ancient history with AI

TOGETHER WITH

Good morning, human brains. Welcome back to your daily munch of AI news.

Here’s what’s on the menu today:

  • AI-developed drug hits Phase 2 clinical trials 💉 

    How AI is changing the way we create drugs.

  • Salesforce’s new LLM: XGen-7B 🤑

    The new model snags the top spot on the open-source leaderboard.

  • Ancient translations with NLP 📜 

    Researchers have trained AI to translate cuneiform clay tablets into modern-day English.

APPETIZER

AI-designed drug reaches Phase 2 human trials

While AI played a role in the COVID-19 vaccines, this is the first AI-designed drug to advance this far on a traditional clinical-trial timeline.

What else is unique here?

Insilico used AI to identify an entirely new target protein rather than focusing on well-known existing targets. AI didn’t just design the drug; it also pinpointed the best candidate target, code-named “Target X”.

Once the target protein was identified, Insilico prompted 500 different AI models with its structure. After a few iterations, the models converged on a molecule that disrupts Target X.
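For intuition, here’s a toy sketch of what an iterative generate-and-score loop looks like. To be clear, this is not Insilico’s actual pipeline (their generative chemistry and scoring models are proprietary), just the general shape of the approach:

```python
import random

def generate_candidates(n: int) -> list[str]:
    """Stand-in for a generative model proposing candidate molecules."""
    return [f"molecule_{random.randrange(10_000)}" for _ in range(n)]

def score(molecule: str) -> float:
    """Stand-in for scoring a candidate against the target protein
    (e.g., predicted binding affinity)."""
    return random.random()

best_molecule, best_score = None, float("-inf")
for _ in range(5):  # a few design iterations
    for molecule in generate_candidates(100):
        s = score(molecule)
        if s > best_score:
            best_molecule, best_score = molecule, s

print("best candidate:", best_molecule, round(best_score, 3))
```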

Our take: medicine and data have been a great match for a long time. But now it looks like the AI hammer can be swung at a whole host of diseases as the primary approach. Just look at the breadth of Insilico’s AI suite.

BUZZWORD OF THE DAY

Token

In the context of LLMs (Large Language Models), a token is a small unit of text, typically a word or piece of a word. For example, the sentence:

“The quick brown fox jumps over the lazy dog”

would be tokenized into the following array:

[“The”, “quick”, “brown”, “fox”, “jumps”, “over”, “the”, “lazy”, “dog”]
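Here’s a minimal sketch in Python. Real LLM tokenizers usually split text into subword pieces rather than whole words, so naive whitespace splitting is only an approximation of the idea:

```python
# Naive word-level tokenization. Production tokenizers (BPE and friends)
# split into subword pieces, so real token lists look a bit different.
sentence = "The quick brown fox jumps over the lazy dog"
tokens = sentence.split()
print(tokens)
# ['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
```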

Sequence

A sequence is composed of tokens, so when a paper refers to sequence length, it’s talking about how many tokens are in the sequence.

The prior sentence is an example of a sequence, but a novel composed of sentences and paragraphs could also be a sequence.
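Continuing the naive whitespace-token sketch from above, sequence length is just a token count:

```python
# Sequence length = how many tokens are in the sequence.
sequence = "The quick brown fox jumps over the lazy dog".split()
print(len(sequence))  # 9 -> a sequence length of 9 tokens
```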

FREE DTC AI TRAINING

  • For DTC Brands

  • Decrease your CAC by 30% in 72 hours by saving roughly 12 hours a week on reporting

  • Learn the simple 3-step process with 3 videos, a calculator, a marketing strategy blueprint, and a map of e-commerce metrics and their relationships.

  • This is 100% FREE

  • These strategy sessions and roadmaps regularly sell for $1,500 to $2,000.

MAIN COURSE

Salesforce: bigger is better 🤑

Their unique spin: 4x longer sequence length.

But I was told size doesn’t matter…

An MMLU benchmark comparison

The 7B in the name refers to the model’s 7 billion parameters. And unlike previous models, which were trained on sequences of at most 2k tokens, Salesforce bumped its sequence length up to 8k.

The model was trained on 1.5 trillion total tokens.

Lower perplexity means the model has an easier time predicting the next word. Note the sudden spikes once a model goes past its training sequence length.
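For the curious, perplexity is the exponential of the average negative log-likelihood the model assigns to each true next token. A toy computation (the probabilities here are made up, not XGen’s):

```python
import math

# Model's probability for each actual next token (made-up numbers).
probs = [0.25, 0.10, 0.60, 0.05]
avg_nll = sum(-math.log(p) for p in probs) / len(probs)
print(round(math.exp(avg_nll), 2))  # ~6.04 -- lower is better
```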

The Salesforce team took the long-sequence approach after other research indicated that using more training data, not a greater number of parameters, produces better results given the same compute budget. It cost Salesforce $150k to train the model.
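To put that compute budget in perspective, a common rule of thumb from the scaling-laws literature estimates training cost at roughly 6 FLOPs per parameter per training token (an approximation, not Salesforce’s published accounting):

```python
# Back-of-the-envelope training cost: FLOPs ~= 6 * parameters * tokens.
params = 7e9      # XGen-7B's parameter count
tokens = 1.5e12   # 1.5 trillion training tokens
print(f"{6 * params * tokens:.1e} training FLOPs")  # ~6.3e+22
```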

Another plus of smaller models: faster performance in mobile applications.

But something to keep in mind — open-source LLMs are still way behind their closed-source counterparts:

GPT-3 outperforms XGen-7B by over 10 percentage points. GPT-4? 50.

But we’re still rootin’ for ‘em!

Our take: after a bout of parameter-size mania, these results may lead other research teams to focus more on sequence length and dataset size. As Sam Altman predicted, there are indeed diminishing returns on maxing out parameter counts. Still, OpenAI maintains a commanding lead.

A LITTLE SOMETHING EXTRA

Ancient Mesopotamian to English, please

Here’s the big deal. Few people have the expertise needed to translate from Akkadian to English. And there are hundreds of thousands of untranslated tablets. Now, they’re fair game for AI.

The task is far from simple, even for human eyes. For example, the sign for the Sun god has more than 17 phonetic values and 6 logographic values, and it can be read accurately only with enough context.
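Under the hood, work like this typically frames the problem as neural machine translation from transliterated cuneiform to English. A minimal sketch using the Hugging Face transformers pipeline API (the model id below is hypothetical, not the researchers’ actual checkpoint):

```python
# Treating Akkadian-to-English as a sequence-to-sequence translation task.
# The model id is hypothetical; swap in a real checkpoint if one is released.
from transformers import pipeline

translate = pipeline("translation", model="some-org/akkadian-to-english")
line = "szum-ma a-wi-lum"  # transliterated Akkadian: "If a man..." (Code of Hammurabi)
print(translate(line)[0]["translation_text"])
```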

It’s all Greek— erm, Akkadian to me.

Our take: AI translations could unlock a treasure trove of lost history, even for languages less complicated than ancient Akkadian. Receipts, court documents, inventories galore!

MEMES FOR DESSERT

YOUR DAILY MUNCH

Think Pieces

What will AI look like in 2050? — a dive into the potential of AI in the future.

Embracing productivity with AI — what should we do with all the extra productivity AI will create?

Startup News

Midjourney introduces panning — a better way to do outpainting.

Research

AI: OP in medicine — on average, AI is doubling the pace of new medical advancements.

The AI trained to recognize waste for recycling — AI is rummaging through your trash!

Stable Diffusion XL technical report — the image generation model behind Dream Studio is getting powerful new upgrades.

Tools

Retention Science — AI for e-commerce.

Synthesia — create high-quality AI video and Text-to-Voice speech in minutes.

AskYourPDF — AI extracts the important information you need from PDFs, saving hours in research time.

TWEET OF THE DAY

As if extra fingers weren’t enough…

Tag us on Twitter @BotEatBrain for a chance to be featured here tomorrow.

RECOMMENDED READING

If you like Bot Eat Brain, you might like this newsletter too:

Plugged In — a weekly EV newsletter and website that provides tools, information, and news to get smarter about electric vehicles.

AI ART-SHOW

“Just make sure you contain the specimen while I eat lunch…” @thedigiguru

Until next time 🤖😋🧠

What'd you think of today's newsletter?
