NVIDIA’s custom chatbot runs locally on RTX AI PCs

February 14, 2024

NVIDIA has released Chat with RTX as a tech demo of how AI chatbots can be run locally on Windows PCs using its RTX GPUs.

The standard way to use an AI chatbot is through a web platform like ChatGPT or by running queries via an API, with inference taking place on cloud computing servers. The drawbacks are cost, latency, and privacy concerns as personal or corporate data travels back and forth.

NVIDIA’s RTX range of GPUs is now making it possible to run an LLM locally on your Windows PC even if you’re not connected to the internet.

Chat with RTX lets users create a personalized chatbot using either Mistral or Llama 2. It uses retrieval-augmented generation (RAG) and NVIDIA’s inference-optimizing TensorRT-LLM software.

You can direct Chat with RTX to a folder on your PC and then ask it questions related to the files in the folder. It supports various file formats, including .txt, .pdf, .doc/.docx and .xml.
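To make the RAG idea concrete, here is a minimal Python sketch of the retrieval step: pick the stored document that best matches a question, then place it in the prompt sent to the model. This is not how Chat with RTX is implemented. The word-count similarity scoring, the function names, and the sample file contents are all assumptions made for illustration, and a real system would more likely use embeddings and a vector index.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins for files in the folder the chatbot is pointed at.
EXAMPLE_DOCS = {
    "requirements.txt": "The demo needs an RTX 30 Series GPU with 8GB of VRAM.",
    "holiday.txt": "We booked flights to Lisbon for the summer holiday.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, documents, top_k=1):
    """Return names of the top_k documents most similar to the question."""
    q_vec = Counter(tokenize(question))
    scored = [
        (cosine_similarity(q_vec, Counter(tokenize(text))), name)
        for name, text in documents.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for score, name in scored[:top_k] if score > 0]


def build_prompt(question, documents, top_k=1):
    """Prepend the retrieved text to the question, as a RAG prompt would."""
    names = retrieve(question, documents, top_k)
    context = "\n\n".join(documents[name] for name in names)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    print(build_prompt("Which GPU does the demo need?", EXAMPLE_DOCS))
```

The prompt that comes out contains only the relevant file's text, which is what lets a local model answer questions about your own documents without having been trained on them.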

Because the LLM analyzes locally stored files and inference happens on your machine, responses come back quickly and none of your data is shared over potentially unsecured networks.

You can also prompt it with a YouTube video URL and ask it questions about the video. That requires internet access, but it’s a great way to get answers without having to watch a long video.

You can download Chat with RTX for free, but your PC will need to be running Windows 10 or 11 and have a GeForce RTX 30 Series GPU or higher with at least 8GB of VRAM.
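Those requirements can be written down as a simple checklist. The Python sketch below only compares values you type in against the minimums stated above. It does not query your hardware, and the `SystemInfo` class and `unmet_requirements` function are hypothetical names invented for this example, not part of any NVIDIA tool.

```python
from dataclasses import dataclass

# Minimums taken from the requirements stated in the article.
SUPPORTED_WINDOWS = {10, 11}
MIN_RTX_SERIES = 30
MIN_VRAM_GB = 8


@dataclass
class SystemInfo:
    """Hand-entered description of a PC (hypothetical helper type)."""
    windows_version: int  # e.g. 10 or 11
    rtx_series: int       # e.g. 30 for a GeForce RTX 30 Series card
    vram_gb: float        # GPU memory in gigabytes


def unmet_requirements(info):
    """Return a list of human-readable reasons the PC falls short."""
    problems = []
    if info.windows_version not in SUPPORTED_WINDOWS:
        problems.append("Windows 10 or 11 is required")
    if info.rtx_series < MIN_RTX_SERIES:
        problems.append("a GeForce RTX 30 Series GPU or higher is required")
    if info.vram_gb < MIN_VRAM_GB:
        problems.append("at least 8GB of VRAM is required")
    return problems


if __name__ == "__main__":
    pc = SystemInfo(windows_version=11, rtx_series=40, vram_gb=12)
    print(unmet_requirements(pc) or "This PC meets the stated minimums")
```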

Chat with RTX is a demo rather than a finished product. It’s a little buggy and doesn’t remember context, so you can’t ask it follow-up questions. But it’s a nice example of the way we’ll use LLMs in the future.

Using an AI chatbot locally, with zero API call costs and very little latency, is likely the way most users will eventually interact with LLMs. With the open-source approach that companies like Meta have taken, on-device AI is likely to drive adoption of their free models rather than proprietary models like GPT.

That being said, mobile and laptop users will have to wait a while yet before the computing power of an RTX GPU can fit into smaller devices.


Eugene van der Watt

Eugene comes from an electronic engineering background and loves all things tech. When he takes a break from consuming AI news you'll find him at the snooker table.

