MosaicML has unveiled its new open-source AI models – MPT-30B Base, Instruct, and Chat.
Part of the MPT (MosaicML Pretrained Transformer) series, these open-source models are considered the most sophisticated in their category, surpassing GPT-3 across most key metrics.
Mosaic trained its new models using NVIDIA’s latest H100 chips, released earlier this year.
MPT-30B is the first publicly known LLM trained on high-end NVIDIA H100 GPUs.
Since their introduction on May 5, 2023, Mosaic’s previous MPT-7B models (Base, Instruct, Chat, StoryWriter) have been downloaded over 3.3 million times. MPT-30B has 30 billion parameters – far fewer than GPT-3’s 175 billion parameters or GPT-4’s alleged 1 trillion parameters.
But parameter count isn’t everything – far from it – as MPT-30B was trained on longer sequences of up to 8,000 tokens, four times the context length of GPT-3, the LLaMA family of models, and the Falcon model.
This enables MPT-30B to better manage data-heavy enterprise workflows and to outperform other models on code-heavy tasks.
Several businesses, such as Replit, a leading web-based IDE, and Scatter Lab, an AI startup, have already leveraged MPT’s open-source models, which are more customizable than proprietary models like GPT-3.
Ilan Twig, Co-Founder and CTO at Navan, said: “At Navan, we use generative AI across our products and services, powering experiences such as our virtual travel agent and our conversational business intelligence agent. MosaicML’s foundation models offer state-of-the-art language capabilities while being extremely efficient to fine-tune and serve inference at scale.”
MPT-30B, now available through the Hugging Face Hub, is fully open-source, and developers can fine-tune it with their own data.
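For developers who want to try this, a load-and-generate sketch along the following lines should work, assuming the `transformers` package (plus `accelerate` for `device_map="auto"`) is installed; the model IDs come from MosaicML’s public Hub pages, while the generation settings are illustrative only:

```python
# Minimal sketch of loading MPT-30B from the Hugging Face Hub.
# The model IDs are from the official model cards; everything else
# here is an illustrative assumption, not MosaicML's own recipe.

MODEL_ID = "mosaicml/mpt-30b"  # Base variant; "-instruct" and "-chat" suffixes select the others

def load_mpt(model_id: str = MODEL_ID):
    """Download the tokenizer and weights from the Hub (tens of GB)."""
    # Imported lazily so defining this sketch has no hard dependency.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # MPT ships custom modelling code, so trust_remote_code=True is required.
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        trust_remote_code=True,
        device_map="auto",  # spread layers across whatever GPUs are available
    )
    return tokenizer, model

if __name__ == "__main__":
    tokenizer, model = load_mpt()
    inputs = tokenizer("MPT-30B is", return_tensors="pt").to(model.device)
    print(tokenizer.decode(model.generate(**inputs, max_new_tokens=40)[0]))
```

Fine-tuning on private data would then follow the usual `transformers` training workflow on top of the loaded model.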
Mosaic seeks to enable businesses to integrate powerful open-source models into their workflow while retaining data sovereignty.
The open-source edge
Open-source models are rapidly bridging the gap with competitors like OpenAI.
As the computing resources required to train and deploy models drop, open-source developers no longer need multi-million-dollar supercomputers with hundreds of high-end processors to train their models.
The same goes for deploying models – MPT-30B can run on a single GPU, and the open-source community has even managed to run a slimmed-down version of the LLaMA model on a Raspberry Pi.
I’ve sucefully runned LLaMA 7B model on my 4GB RAM Raspberry Pi 4. It’s super slow about 10sec/token. But it looks we can run powerful cognitive pipelines on a cheap hardware. pic.twitter.com/XDbvM2U5GY
— Artem Andreenko 🇺🇦 (@miolini) March 12, 2023
Additionally, open-source models confer strategic advantages to business users.
For example, businesses in industries such as healthcare and banking may prefer not to share their data with OpenAI or Google.
Naveen Rao, the co-founder and CEO of MosaicML, identifies open-source projects as allies, stating that they are “closing the gap to these closed-source models.” While he acknowledges the superiority of OpenAI’s GPT-4, he argues that open-source models have “crossed the threshold where these models are actually extremely useful.”
Open-source AI is evolving rapidly, leading some to accuse big tech of pushing for regulation to curb its growth. Enterprises are already building their own open-source AI stacks, saving money that might otherwise fund companies like OpenAI.
A leaked memo from a Google employee said that all big tech AI developers – Google, Microsoft, Anthropic, and OpenAI – are competing with open-source developers.
Open-source AI developers can build and iterate models faster than big tech, enabling them to outmaneuver mainstream AI models.
This isn’t without its dangers, as open-source models are difficult to monitor and regulate once they pass into public hands.