
Conventional wisdom holds that running state-of-the-art AI models requires vast numbers of Nvidia GPUs, costing around $50,000 each. A recent claim by EXO Labs suggests otherwise.
The team got the Llama 2 LLM running on a Pentium II processor from 1997, which they acquired for less than $120 on eBay. The only downside? It's roughly 20,000 times slower than a modern GPU.
EXO Labs faced significant challenges in preparing the aging machine for modern software, including compatibility issues with its outdated USB ports and the need to compile the code for the old processor's architecture.
Once everything was set up, the 260K-parameter version of Llama 2 achieved 39.31 tokens per second on the retro hardware, while the larger 15M-parameter version managed only 1.03 tokens per second. An attempt to run a one-billion-parameter model crawled along at 0.0093 tokens per second.
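To put those throughput numbers in perspective, here is a back-of-the-envelope sketch of how long the Pentium II would take to generate a short reply at each reported rate. The tokens-per-second figures come from the article; the 100-token response length is an assumption for illustration.

```python
# Estimate generation time for a short reply at each throughput
# reported for the Pentium II setup. The response length is assumed.

RESPONSE_TOKENS = 100  # assumed length of a short chat reply

throughputs = {
    "Llama 2 260K": 39.31,          # tokens per second
    "Llama 2 15M": 1.03,
    "1B-parameter model": 0.0093,
}

for model, tps in throughputs.items():
    seconds = RESPONSE_TOKENS / tps
    print(f"{model}: {seconds:,.1f} s (~{seconds / 3600:.2f} h)")
```

At these rates, the tiny 260K model answers in a few seconds, the 15M model takes over a minute and a half, and the one-billion-parameter model would need roughly three hours for the same reply.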
To summarize: getting a modern language model running on such a legacy system is a commendable feat, but the performance gap underscores how heavily modern AI depends on fast, specialized hardware.