API tokens exposed on HuggingFace and GitHub are a huge risk

December 5, 2023

Lasso Security revealed security risks on HuggingFace and GitHub after finding 1,681 exposed API tokens hardcoded into code stored on the platforms.

HuggingFace and GitHub are two of the most popular platforms where developers host their AI models and code. Think of their repositories as folders in the cloud, managed by the organizations that own them.

HuggingFace and GitHub make it easy for users to interact with hundreds of thousands of AI models and datasets via APIs. They also allow the organizations that own the models and datasets to use API access to read, create, modify, and delete repositories or files.
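As a rough illustration, here's what that API access looks like through HuggingFace's official `huggingface_hub` Python client. The token, file, and repository names below are placeholders, not values from the research:

```python
# A minimal sketch of token-authenticated repository access using the
# official huggingface_hub client. All names and values are placeholders.
from huggingface_hub import HfApi

api = HfApi(token="hf_xxxxxxxxxxxxxxxxxxxx")  # placeholder token

# Read: list the files in a repository (hypothetical repo name)
files = api.list_repo_files("example-org/example-model")

# Write (needs a write-scoped token): add or overwrite a file
api.upload_file(
    path_or_fileobj="model.bin",          # hypothetical local file
    path_in_repo="model.bin",
    repo_id="example-org/example-model",  # hypothetical repository
)
```

The same token string works for both calls; only the permissions attached to it decide whether the write succeeds.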

The permissions associated with your API token determine the level of access it grants. Lasso found that, with a bit of digging, they could uncover plenty of tokens sitting in code stored in repositories on both platforms.
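Checking what a found token unlocks is trivial. A minimal sketch, again assuming the `huggingface_hub` client and a placeholder token:

```python
from huggingface_hub import HfApi

# whoami() validates a token and reports the account it belongs to.
# The token value here is a placeholder.
info = HfApi(token="hf_xxxxxxxxxxxxxxxxxxxx").whoami()
print(info["name"])  # the username or organization the token unlocks
```

A call like this is all it takes to turn a string found in public code into a confirmed, usable credential.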

Of the 1,681 valid tokens they found, 655 users’ tokens had write permissions, 77 of which granted full account access.

Why is this a big deal?

Think of an API token like a key to your front door. It may be convenient to leave the key under your doormat, but anyone who finds it can let themselves into your house.

When developers write code that needs to interact with their AI model or dataset, they sometimes get a little lazy. They may hardcode tokens directly into their code instead of using more secure ways to manage them, such as environment variables or a secrets manager.

An API token hardcoded into code. Source: Lasso Security
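A minimal sketch of the difference, assuming a HuggingFace token read by a Python script (the token value and variable name are placeholders):

```python
import os

# Insecure: a hardcoded token gets committed to version control,
# where anyone who can read the repository can read the key.
# (Placeholder value; real HuggingFace tokens start with "hf_".)
HF_TOKEN = "hf_xxxxxxxxxxxxxxxxxxxx"

# Safer: read the token from the environment (or a secrets manager)
# at runtime, so it never appears in the source code.
HF_TOKEN = os.environ["HF_TOKEN"]  # assumes HF_TOKEN is set outside the repo
```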

Some of the tokens Lasso found gave them full read and write permissions to Meta’s Llama 2, BigScience Workshop, and EleutherAI. These organizations all have AI models that have been downloaded millions of times.

If Lasso were bad actors, they could have modified the models or datasets in the exposed repositories. Imagine someone slipping malicious code into Meta’s repository and then having millions of people download it.

When Meta, Google, Microsoft, and others heard of the exposed API tokens, they quickly revoked them.

Model theft, training data poisoning, and supply chain attacks through third-party datasets and pre-trained models are all big risks to AI companies. Developers casually leaving API tokens exposed in their code only makes it easier for bad actors to exploit them.

You’ve got to wonder if Lasso’s engineers were the first to find these vulnerabilities.

If cybercriminals had found these tokens, they would certainly have kept very quiet while they opened the front door.


Eugene van der Watt

Eugene comes from an electronic engineering background and loves all things tech. When he takes a break from consuming AI news you'll find him at the snooker table.
