Protein programmers get support from Cradle's Generative AI

Proteins are the molecules that do the work in nature, and an entire industry is springing up dedicated to modifying and manufacturing them for various purposes. However, the process is time-consuming and haphazard. Cradle aims to change that with an AI-powered tool that tells scientists which new structures and sequences will make a protein do what they want. The company emerged from stealth today with a sizeable seed round.

AI and proteins have been in the news lately, but mostly because of the efforts of research organizations like DeepMind and the Baker Lab. Their machine learning models take easy-to-gather amino acid sequence data and predict the structure a protein will fold into, a step that used to take weeks and expensive specialized equipment.

But as incredible as this ability is in some areas, for others it is just the starting point. Modifying a protein to be more stable or to bind to a specific other molecule requires much more than just understanding its general shape and size.

“If you’re a protein engineer and you want to build a specific property or function into a protein, just knowing what it looks like won’t help you. It’s like having a picture of a bridge that doesn’t tell you if it’s going to collapse or not,” said Stef van Grieken, CEO and co-founder of Cradle.

“AlphaFold takes a sequence and predicts what the protein will look like,” he continued. “We’re the generative brother of that: you pick the traits you want to evolve, and the model generates sequences that you can test in your lab.”

Predicting what proteins, especially those new to science, will do in situ is a difficult task for many reasons, but when it comes to machine learning, the biggest problem is that there is not enough data available. So Cradle generated much of its own dataset in a wet lab, testing protein after protein and seeing which changes in their sequences seemed to produce which effects.

Interestingly, the model itself is not exactly biotech-specific, but a derivative of the same “large language models” that have produced text-generation engines like GPT-3. Van Grieken pointed out that these models are not strictly limited to language in how they understand and predict data, an interesting “generalization” property that researchers are still exploring.

Samples of the Cradle UI in action.

The protein sequences that Cradle ingests and predicts are of course not written in any language we know, but they are relatively straightforward linear sequences of text with associated meanings. “It’s like a foreign programming language,” said van Grieken.

Protein engineers are not helpless, of course, but their work inevitably involves a lot of guesswork. One may know with certainty that somewhere among the 100 sequences they are modifying is the combination that will produce the desired effect, but finding it is largely a matter of trial and error.

The model works in three basic layers, he explained. First, it assesses whether a given sequence is “natural,” i.e., whether it is a meaningful sequence of amino acids or just a random one. This is similar to how a language model can say with 99% certainty that a sentence is in English (or Swedish, in van Grieken’s example) and that its words are in the right order. It knows this from “reading” millions of such sequences that have been determined by laboratory analysis.
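To make the “is this natural?” step concrete, here is a minimal sketch of scoring a sequence by how probable it looks under a model trained on example sequences. Cradle uses a large language model; the bigram counter below (`BigramNaturalness`, with a made-up `TRAINING` list of illustrative fragments) is a deliberately tiny, hypothetical stand-in that shows only the idea: sequences resembling what the model has “read” get a higher average log-probability per residue than strings that do not.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one letter each


class BigramNaturalness:
    """Hypothetical toy stand-in for a protein language model.

    Estimates P(residue | previous residue) from example sequences, with
    add-alpha smoothing so that unseen pairs still get a small probability.
    """

    def __init__(self, sequences, alpha=1.0):
        self.alpha = alpha
        self.pair_counts = Counter()  # (prev, cur) -> count
        self.prev_counts = Counter()  # prev -> number of transitions out of it
        for seq in sequences:
            for prev, cur in zip(seq, seq[1:]):
                self.pair_counts[(prev, cur)] += 1
                self.prev_counts[prev] += 1

    def log_prob(self, prev, cur):
        num = self.pair_counts[(prev, cur)] + self.alpha
        den = self.prev_counts[prev] + self.alpha * len(AMINO_ACIDS)
        return math.log(num / den)

    def score(self, seq):
        """Mean log-probability per transition; higher reads as 'more natural'."""
        if len(seq) < 2:
            raise ValueError("need at least two residues")
        total = sum(self.log_prob(p, c) for p, c in zip(seq, seq[1:]))
        return total / (len(seq) - 1)


# Illustrative fragments only; a real model "reads" millions of sequences.
TRAINING = [
    "MKTAYIAKQRQISFVKSHFSRQ",
    "MKVLAAGIVGLLLAQSAFA",
    "MSKGEELFTGVVPILVELDGDV",
]

if __name__ == "__main__":
    model = BigramNaturalness(TRAINING)
    for s in ["MKTAYIAKQRQISFVK", "WWWWWWWWWWWWWWWW"]:
        print(s, round(model.score(s), 3))
```

Turning such a score into a calibrated “99% natural” judgment would need a much richer model and a threshold chosen on held-out data; the sketch stops at the relative ranking.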

Next, it examines the actual or potential meaning of the sequence in the protein’s foreign language. “Imagine we give you a sequence and tell you the temperature at which that sequence falls apart,” he said. “If you do that for a lot of sequences, you can say not just, ‘That looks natural,’ but, ‘That looks like 26 degrees Celsius.’ This helps the model figure out which regions of the protein to focus on.”
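The second step, learning to read a property such as melting temperature off a sequence, can be illustrated with the simplest supervised model available: an additive model of single-substitution effects. This is a swapped-in technique, not Cradle’s fine-tuned language model, and every name and number below (`PARENT`, `PARENT_TM`, `MEASUREMENTS`) is hypothetical. It shows how measured sequence/temperature pairs let a model predict unseen variants and rank which positions matter most.

```python
from statistics import mean


def substitutions(parent, variant):
    """(position, new_residue) pairs where variant differs from parent (0-based)."""
    if len(parent) != len(variant):
        raise ValueError("this sketch only handles substitutions, not insertions/deletions")
    return [(i, b) for i, (a, b) in enumerate(zip(parent, variant)) if a != b]


def fit_additive(parent, parent_tm, measurements):
    """Estimate each substitution's effect as the mean Tm shift of the
    single mutants carrying it (replicate measurements are averaged)."""
    shifts = {}
    for seq, tm in measurements:
        subs = substitutions(parent, seq)
        if len(subs) == 1:
            shifts.setdefault(subs[0], []).append(tm - parent_tm)
    return {sub: mean(vals) for sub, vals in shifts.items()}


def predict_tm(parent, parent_tm, effects, variant):
    """Assume effects simply add up; substitutions never measured contribute 0."""
    return parent_tm + sum(effects.get(s, 0.0) for s in substitutions(parent, variant))


def position_importance(effects):
    """Largest |effect| observed at each position, most influential first."""
    importance = {}
    for (pos, _), eff in effects.items():
        importance[pos] = max(importance.get(pos, 0.0), abs(eff))
    return dict(sorted(importance.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical numbers for illustration only (not real measurements).
PARENT = "MKTAYIAK"
PARENT_TM = 26.0  # degrees Celsius
MEASUREMENTS = [
    ("MKTAYIVK", 29.5),  # A7V
    ("MKTPYIAK", 22.0),  # A4P
    ("MKTAFIAK", 27.0),  # Y5F
]

if __name__ == "__main__":
    effects = fit_additive(PARENT, PARENT_TM, MEASUREMENTS)
    print(predict_tm(PARENT, PARENT_TM, effects, "MKTAFIVK"))  # double mutant
    print(position_importance(effects))
```

The additivity assumption is the weak point: real substitutions often interact, which is one reason learned sequence models are used instead of lookup tables like this one.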

Finally, the model can suggest sequences to try: essentially educated guesses, but a stronger starting point than working from scratch. The engineer or lab can then test them and bring that data back to the Cradle platform, where it can be fed in again and used to fine-tune the model for the situation.
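The suggest, test, and feed-back cycle described above is a design-build-test-learn loop. A minimal sketch of that loop follows, under loudly stated assumptions: `fit_surrogate` is a hypothetical, crude stand-in for fine-tuning (it averages the measured gain from introducing each residue type), and `simulated_assay` with its `HIDDEN_TARGET` replaces the wet lab with a made-up similarity score. The point is the structure: refit on all data, rank untested candidates, measure a batch, repeat.

```python
import random
from statistics import mean

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_mutants(parent):
    """Every sequence exactly one substitution away from parent."""
    return [parent[:i] + aa + parent[i + 1:]
            for i, wt in enumerate(parent) for aa in AMINO_ACIDS if aa != wt]


def fit_surrogate(parent, data):
    """Hypothetical stand-in for fine-tuning: average measured gain from
    introducing each residue type, learned from tested single mutants of parent."""
    gains = {}
    for seq, value in data.items():
        diffs = [i for i, (a, b) in enumerate(zip(parent, seq)) if a != b]
        if len(diffs) == 1:
            gains.setdefault(seq[diffs[0]], []).append(value - data[parent])
    effect = {aa: mean(v) for aa, v in gains.items()}

    def score(seq):
        # Predicted gain over parent; unseen residue types count as 0.
        return sum(effect.get(b, 0.0) for a, b in zip(parent, seq) if a != b)
    return score


def campaign(start, assay, rounds=3, batch_size=4, seed=0):
    rng = random.Random(seed)
    data = {start: assay(start)}  # every measurement collected so far
    best = start
    for _ in range(rounds):
        score = fit_surrogate(best, data)  # refit on all data gathered so far
        untested = [s for s in single_mutants(best) if s not in data]
        rng.shuffle(untested)  # stable sort below then breaks ties at random
        batch = sorted(untested, key=score, reverse=True)[:batch_size]
        for seq in batch:  # "test them in the lab"
            data[seq] = assay(seq)
        best = max(data, key=data.get)  # carry the best variant forward
    return best, data


# Simulated lab assay: similarity to a hidden target. Purely hypothetical.
HIDDEN_TARGET = "MKVLYIVK"


def simulated_assay(seq):
    return float(sum(a == b for a, b in zip(seq, HIDDEN_TARGET)))


if __name__ == "__main__":
    best, data = campaign("MKTAYIAK", simulated_assay)
    print(best, data[best], len(data))
```

Random tie-breaking is a crude form of exploration; a real platform would more likely weigh model uncertainty when choosing which guesses are worth a lab slot.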

The Cradle team at their headquarters on a nice day (van Grieken in the middle).

Modifying proteins for various purposes is useful across biotechnology, from drug design to biomanufacturing, and the journey from the vanilla molecule to a tailored, effective, and efficient one can be long and expensive. Any opportunity to shorten it will likely be welcomed, not least by the lab technicians who have to run hundreds of experiments just to get one good result.

Cradle has been working in stealth and is now emerging after raising $5.5 million in a seed round co-led by Index Ventures and Kindred Capital, with participation from angels John Zimmer, Feike Sijbesma and Emily Leproust.

Van Grieken said the funding would allow the team to expand data collection – the more the merrier when it comes to machine learning – and work on the product to make it more “self-serve”.

“Our goal is to reduce the cost and time to bring a bio-based product to market by an order of magnitude,” van Grieken said in the press release, “so that anyone, even ‘two kids in their garage,’ can launch a bio-based product.”