Is Reflection 70B the most powerful open-source LLM or a scam?

September 9, 2024

  • Matt Shumer claims his Reflection 70B LLM is the most powerful open-source model
  • “Reflection-Tuning” is said to stop Reflection 70B from hallucinating and to deliver exceptional benchmark results
  • Some early users say the model is a scam and is simply a wrapper for existing models

HyperWrite founder and CEO Matt Shumer announced that his new model, Reflection 70B, uses a simple trick to solve LLM hallucinations and delivers impressive benchmark results that beat larger and even closed models like GPT-4o.

Shumer collaborated with synthetic data provider Glaive to create the new model, which is based on Meta’s Llama 3.1-70B Instruct model.

In the launch announcement on Hugging Face, Shumer said, “Reflection Llama-3.1 70B is (currently) the world’s top open-source LLM, trained with a new technique called Reflection-Tuning that teaches a LLM to detect mistakes in its reasoning and correct course.”

If Shumer has found a way to solve AI hallucinations, that would be incredible. The benchmarks he shared seem to indicate that Reflection 70B is well ahead of other models.

Reflection 70B benchmark results provided by Matt Shumer. Source: Hugging Face

The model’s name is a reference to its ability to self-correct during inference. Shumer doesn’t give too much away but explains that the model reflects on its initial answer to a prompt and only outputs it once satisfied that it is correct.
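Shumer has not published the details, so the following is only a rough sketch of what a generic "draft, reflect, then answer" loop could look like. The function `answer_with_reflection` and the three toy callables are hypothetical stand-ins for model calls, not Reflection 70B's actual mechanism.

```python
from typing import Callable, Optional


def answer_with_reflection(
    prompt: str,
    draft: Callable[[str], str],
    critique: Callable[[str, str], Optional[str]],
    revise: Callable[[str, str, str], str],
    max_rounds: int = 3,
) -> str:
    """Draft an answer, then revise it until the critique step finds no problem."""
    answer = draft(prompt)
    for _ in range(max_rounds):
        problem = critique(prompt, answer)
        if problem is None:  # the check is satisfied, so release the answer
            break
        answer = revise(prompt, answer, problem)
    return answer


# Toy stand-ins for model calls (hypothetical, for illustration only).
def toy_draft(prompt: str) -> str:
    return "2"  # a deliberately wrong first guess


def toy_critique(prompt: str, answer: str) -> Optional[str]:
    expected = str("strawberry".count("r"))
    return None if answer == expected else f"recount gives {expected}"


def toy_revise(prompt: str, answer: str, problem: str) -> str:
    return problem.split()[-1]  # take the corrected count from the critique


result = answer_with_reflection(
    "How many r's are in strawberry?", toy_draft, toy_critique, toy_revise
)
print(result)
```

In this toy run the first draft is wrong, the critique step flags it, and the revised answer is the one that gets returned. Whether Reflection 70B does anything comparable internally is not something the announcement confirms.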

Shumer says that a 405B version of Reflection is in the works and will blow other models, including GPT-4o, away when it is unveiled next week.

Is Reflection 70B a scam?

Is this all too good to be true? Reflection 70B is available for download on Hugging Face, but early testers weren’t able to reproduce the impressive performance Shumer’s benchmarks showed.

The Reflection playground lets you try the model out, but it currently says the demo is temporarily down due to high demand. The “Count ‘r’s in strawberry” and “9.11 vs 9.9” prompt suggestions hint that the model gets these tricky prompts right. But some users claim Reflection has been tuned specifically to answer these prompts.

Reflection playground is down for now. Source: Reflection Playground
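For reference, the correct answers to the two suggested prompts are easy to verify outside of any model. A minimal Python check, with variable names chosen only for illustration:

```python
from decimal import Decimal

# Count the letter "r" in "strawberry".
r_count = "strawberry".count("r")

# Compare 9.11 and 9.9 as exact decimal numbers.
larger = max(Decimal("9.11"), Decimal("9.9"))

print(r_count)  # 3
print(larger)   # 9.9
```

A model that answers exactly these two prompts correctly has not necessarily learned the general skill, which is why testers tend to try variations on the words and numbers.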

Some users questioned the impressive benchmarks. The GSM8K of over 99% looked especially suspect.

Skeptics pointed out that some of the ground truth answers in the GSM8K dataset are wrong. In other words, the only way to score over 99% on GSM8K would be to give the same incorrect answers to those problems.
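The arithmetic behind that objection can be sketched as follows. The GSM8K test split has 1,319 problems; the article does not say how many reference answers are wrong, so the mislabel counts below are hypothetical, and `max_honest_score` is a helper invented for this illustration.

```python
GSM8K_TEST_SIZE = 1319  # number of problems in the GSM8K test split


def max_honest_score(num_wrong_labels: int, total: int = GSM8K_TEST_SIZE) -> float:
    """Best possible score for a model that solves every problem correctly
    but is marked wrong wherever the reference answer itself is wrong."""
    return (total - num_wrong_labels) / total


# Hypothetical mislabel counts, for illustration only.
for wrong in (5, 13, 14, 20):
    print(f"{wrong} wrong labels -> ceiling of {max_honest_score(wrong):.2%}")
```

Under these assumptions, once 14 or more reference answers are wrong, a model that solves every problem correctly can no longer reach 99%, so a higher score would imply matching at least some of the faulty labels.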

After some testing, users said that Reflection performed worse than Llama 3.1 and that it appeared to be just Llama 3 with LoRA tuning applied.

User testing shows Reflection 70B performs worse than the models Shumer claims it beats. Source: X

In response to the negative feedback, Shumer posted an explanation on X saying, “Quick update — we re-uploaded the weights but there’s still an issue. We just started training over again to eliminate any possible issue. Should be done soon.”

Shumer explained that there was a glitch with the API and that they were working on it. In the meantime, he provided access to a secret, private API so that doubters could try Reflection out while they worked on the fix.

And that’s where the wheels seem to come off, as some careful prompting suggests the API is really just a Claude 3.5 Sonnet wrapper.

Subsequent testing reportedly had the API returning outputs from Llama and GPT-4o. Shumer insists the original results are accurate and that they’re working on fixing the downloadable model.

Are the skeptics a little premature in calling Shumer a grifter? Maybe the release was just poorly handled and Reflection 70B really is a groundbreaking open-source model. Or maybe it’s another example of AI hype to raise venture capital from investors looking for the next big thing in AI.

We’ll have to wait a day or two to see how this plays out.


Eugene van der Watt

Eugene comes from an electronic engineering background and loves all things tech. When he takes a break from consuming AI news you'll find him at the snooker table.
