Many AI companies like OpenAI, Google, and Meta have been scaling their models upwards over the years. To operate these large language models with billions of parameters and make them available to all users simultaneously, enormous computing power is required. But what if we didn’t have this power and were still using old technology?
This question was also posed by the programmers at EXO Labs. They decided to try it out by getting an old Windows 98 computer. According to a blog post, the hardware is over 25 years old. The team purchased it for around 119 British pounds on eBay. Inside the Windows 98 machine is an Intel Pentium II and 128 megabytes of RAM.
Compared to today’s home computers, that is very little computing power. Set against AI data centers, whose power consumption keeps rising, the project seems almost impossible. However, with some tweaks, a small version of Llama 2 with 260,000 parameters can indeed run on the computer.
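A rough back-of-the-envelope calculation suggests why such a small model fits comfortably. The sketch below assumes each parameter is stored as a 32-bit float (4 bytes), which the article does not state, and counts only the weights, not the working memory needed during inference. The function names are illustrative.

```python
# Rough weight-memory estimate for the 260K-parameter model.
# Assumption (not from the article): each parameter is a 32-bit float.
RAM_BYTES = 128 * 1024 * 1024   # 128 MB of RAM, as reported for the machine
BYTES_PER_PARAM = 4             # assumed float32 storage


def weight_bytes(num_params, bytes_per_param=BYTES_PER_PARAM):
    """Return the bytes needed to hold the model weights alone."""
    return num_params * bytes_per_param


def ram_fraction(num_params):
    """Return the share of the machine's RAM taken up by the weights."""
    return weight_bytes(num_params) / RAM_BYTES


size = weight_bytes(260_000)
print(f"{size / 1e6:.2f} MB of weights, {ram_fraction(260_000):.2%} of RAM")
```

Under that assumption the weights take up about one megabyte, under one percent of the available memory.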
The tweaks included connecting old PS/2 peripherals because the available USB ports did not work. The Llama 2 model was then transferred to the PC via FTP. This was a workaround: the PC did not recognize discs, and the four-terabyte hard drive on hand was too large for the FAT32 file system under Windows 98.
The result: the old Windows 98 machine generates around 40 tokens per second with the 260K model. The programmers also pushed the hardware with a language model with 15 million parameters. Here the machine struggled, generating only about one token per second. Based on a benchmark, the programmers also calculated how fast the Windows 98 PC would run a Llama 3.2 model with one billion parameters. The answer is 0.0093 tokens per second, far too slow for any serious use.
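To put these rates in perspective, it helps to convert them into waiting time. The following sketch uses the three throughput figures reported above; the 100-token reply length is a hypothetical value chosen only for illustration, and the function names are not from the EXO Labs project.

```python
# Convert the reported throughput figures into waiting times.
# The rates come from the article; REPLY_TOKENS is a hypothetical
# reply length used purely for illustration.
RATES = {
    "260K": 40.0,              # tokens per second
    "15M": 1.0,
    "1B (projected)": 0.0093,
}
REPLY_TOKENS = 100


def seconds_per_token(tokens_per_second):
    """Return how long a single token takes at the given rate."""
    return 1.0 / tokens_per_second


def generation_time(num_tokens, tokens_per_second):
    """Return the seconds needed to generate num_tokens tokens."""
    return num_tokens / tokens_per_second


for name, rate in RATES.items():
    per_token = seconds_per_token(rate)
    total = generation_time(REPLY_TOKENS, rate)
    print(f"{name}: {per_token:.2f} s per token, "
          f"{total / 60:.1f} min for {REPLY_TOKENS} tokens")
```

At the projected 0.0093 tokens per second, a single token takes roughly 108 seconds, so a 100-token reply would need about three hours.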
Despite the limitations, this experiment shows that it is possible to run AI models on older hardware, albeit with significant performance constraints. The project highlights the rapid advancements in technology and how far computing power has come over the years. It also brings attention to the growing energy demands of modern AI systems and the potential need for more sustainable solutions.
Running AI models on local machines, even older ones, can be a fascinating way to explore what the technology can and cannot do. It is also a reminder of how adaptable and inventive programmers can be, finding ways to make technology work even in less-than-ideal conditions.
As AI continues to evolve, it is essential to consider the balance between technological advancement and resource consumption. Projects like this can inspire new approaches to AI deployment, focusing on efficiency and sustainability. While the performance on a Windows 98 machine is far from practical for current AI applications, it opens the door to discussions about the accessibility and environmental impact of AI technologies.
In conclusion, while the experiment with Windows 98 and AI models may not lead to practical applications, it serves as an intriguing case study in the history and future of computing. It invites us to think critically about how we use technology and the resources required to power the innovations of tomorrow.