AI’s potential to reshape industries is no longer theoretical; it is unfolding in real time. Yet in South Africa, where energy is both finite and fragile, the rise of large language models (LLMs) presents a challenge.
Unlike jurisdictions such as the United States or the European Union, South Africa does not yet have legislation specifically targeting data centre energy consumption. However, the National Data and Cloud Policy (2024) signals a shift in direction, proposing that data centre operators adopt independent backup power and cooling systems to improve operational reliability and resource efficiency. While these proposals are not yet mandated, they reflect a growing policy focus on boosting the efficiency and resilience of IT infrastructure.
For programmers, software developers, and business leaders, the message is clear: innovation must be reimagined for a resource-constrained world. This means embracing alternative approaches that treat IT, energy, and broader resource efficiency as a strategic imperative.
Fortunately, innovation in AI is proving that models don’t need to endlessly scale to be effective. While ballooning LLMs have been the rockstars dazzling global audiences, AI teams are turning to Small Language Models (SLMs) as small yet mighty alternatives that can be better suited for specific enterprise use cases.
SLMs typically have fewer than 10 billion parameters, tiny compared with LLMs, which can have hundreds of billions or even trillions. This makes SLMs a more sensible choice when computing power is limited or when speed and low latency are critical. Their compact design allows them to run efficiently on less powerful hardware, using only a fraction of the energy larger models require. They are also easy to deploy locally, which reduces the need for heavy infrastructure. This is especially important in areas of the country with limited digital resources. By keeping resource requirements modest, SLMs make AI more accessible to a wider range of users, enabling more inclusive innovation in key sectors.
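As a rough illustration of how lightweight local deployment can be, the sketch below loads a small open model with the Hugging Face transformers library and runs it on ordinary hardware. The model name is illustrative, not a recommendation, and a quantised variant would shrink the footprint further.

```python
# Minimal sketch of running a small language model locally with Hugging Face
# transformers. The model name is illustrative; any open model of a few billion
# parameters (or a quantised variant) could be substituted.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "microsoft/phi-2"  # illustrative ~2.7B-parameter open model

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)  # runs on CPU by default

prompt = "Summarise the maintenance schedule for a solar inverter in two sentences."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=80)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```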
Newer generations of GenAI take this even further by automatically switching to smaller models for simple queries and reserving larger models for complex tasks. In this way, a mix of models can provide accurate yet efficient answers across a wider range of queries.
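One way to picture this kind of routing is to estimate each query’s complexity first, then dispatch it to a small or large model accordingly. The heuristic and model identifiers below are assumptions for illustration, not any vendor’s actual routing logic; production systems typically use a learned classifier rather than keyword rules.

```python
# Illustrative sketch of routing queries between a small and a large model.
# The complexity heuristic and model identifiers are placeholders.
SMALL_MODEL = "small-llm-3b"    # hypothetical identifiers
LARGE_MODEL = "large-llm-70b"

def estimate_complexity(query: str) -> float:
    """Crude proxy: longer, multi-step questions score higher."""
    score = min(len(query.split()) / 100, 1.0)
    if any(word in query.lower() for word in ("prove", "derive", "compare", "plan")):
        score += 0.5
    return min(score, 1.0)

def route(query: str) -> str:
    """Send simple queries to the small model, complex ones to the large model."""
    return LARGE_MODEL if estimate_complexity(query) > 0.5 else SMALL_MODEL

print(route("What is the capital of South Africa?"))                 # -> small-llm-3b
print(route("Compare three load-shedding mitigation plans and "
            "derive the cheapest option for a rural clinic."))       # -> large-llm-70b
```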
The argument for SLMs
Some organisations had dismissed SLMs, assuming they could not deliver enterprise-grade performance, would be hard to scale, and would have weaker general knowledge due to narrower training data. Many worried that these models would fall short on complex reasoning, multilingual capabilities, and handling nuance and ambiguity. However, many of these misconceptions about SLMs have been disproven over the last eighteen months.
Underpinning this pivot toward SLMs is a growing recognition that data quality – not quantity – is the key to model performance. LLMs are often burdened by vast pools of raw, unfiltered data, much of which can be duplicate or entirely irrelevant. SLMs instead rely on so-called ‘data efficiency’ principles, where datasets are meticulously curated with precision and relevance in mind.
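The principle is easy to sketch: before training, drop exact duplicates and documents that fall outside the target domain. The keyword-based relevance check below is a stand-in assumption; real curation pipelines use near-duplicate detection (for example MinHash) and trained quality classifiers.

```python
# Minimal sketch of 'data efficiency' curation: exact-duplicate removal plus a
# naive domain-relevance filter. This only shows the idea, not a production pipeline.
import hashlib

DOMAIN_TERMS = {"contract", "clause", "liability", "indemnity"}  # illustrative legal domain

def curate(documents: list[str]) -> list[str]:
    seen_hashes = set()
    curated = []
    for doc in documents:
        digest = hashlib.sha256(doc.strip().lower().encode()).hexdigest()
        if digest in seen_hashes:
            continue                      # exact duplicate: skip
        seen_hashes.add(digest)
        words = set(doc.lower().split())
        if words & DOMAIN_TERMS:          # keep only documents touching the domain
            curated.append(doc)
    return curated

docs = [
    "This clause limits liability to direct damages.",
    "This clause limits liability to direct damages.",   # duplicate
    "Recipe: add two cups of flour and bake.",           # off-domain
]
print(curate(docs))  # -> only the legal sentence survives
```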
In practice, purpose-built SLMs trained on curated, high-quality datasets have improved accuracy on domain-specific tasks while avoiding the inefficiencies that burden larger models. Smaller models also stand out for how quickly they can be fine-tuned and how often they can be updated, making them flexible and well suited to dynamic settings. Because SLMs are typically easier to deploy and manage, they are often appealing to enterprises. They are also preferred in edge environments, such as smart factory settings, where energy efficiency is essential.
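One reason smaller models can be updated so often is that parameter-efficient fine-tuning methods such as LoRA touch only a tiny fraction of the weights. The sketch below, using the Hugging Face peft library, is an outline under assumptions rather than a complete training script; the base model is illustrative and the target module names depend on the model architecture.

```python
# Sketch of parameter-efficient fine-tuning (LoRA) on a small model with the
# Hugging Face peft library. Module names vary by architecture; those shown
# here are common for decoder-style models and are illustrative only.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")  # illustrative base model

lora_config = LoraConfig(
    r=8,                                  # low-rank update dimension
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total parameters
```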
Pilot programmes to integrate SLMs have already launched in legal, medical, and financial services, where these models deliver faster inference times, lower latency, and smaller hardware footprints.
In South Africa, higher education is a key example of a sector in which SLMs can have a transformative impact. When trained on subject- or task-specific datasets and equipped with customised algorithms, these models can support teaching, learning, and research activities. And in agriculture, developers can start building applications using globally trained models and adapt them with local context, such as crop varieties or regional climate patterns. By tapping into foundational models, local teams can focus their resources on adding the contextual intelligence that delivers real, distinctive value.
Open-weight models offer another alternative
Along with the rise of SLMs, open-weight models are proving to be another viable, energy-efficient alternative to LLMs. Many of these models use Mixture of Experts (MoE) architectures, which are attractive to energy-conscious AI developers because they activate only a small subset of their parameters during inference, drastically reducing the compute and energy required for each task. A model with one hundred billion parameters might only need to use five billion at a time, for example. And because open-weight models can be deployed locally and fine-tuned for specific use cases, they also relieve the energy burden on cloud infrastructure, making them well suited to edge environments and enterprise applications where energy efficiency is important. The lessons of the last two years have demonstrated that SLMs can be just as effective, cost less, and perform well on tasks that don’t demand the extensive knowledge base of an LLM.
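To make the sparse-activation idea concrete, the toy sketch below implements a top-2 gating layer in PyTorch: of many expert networks, only the two chosen by the router run for each input, so most parameters stay idle on any single forward pass. The sizes are arbitrary and the sketch omits load balancing and the other machinery of production MoE layers.

```python
# Toy Mixture-of-Experts layer with top-2 gating in PyTorch. Only the two
# experts selected by the router run for each input; dimensions are arbitrary.
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    def __init__(self, dim=64, num_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.router = nn.Linear(dim, num_experts)
        self.top_k = top_k

    def forward(self, x):                       # x: (batch, dim)
        scores = self.router(x)                 # (batch, num_experts)
        weights, indices = scores.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for b in range(x.size(0)):          # looped for clarity, not speed
                expert = self.experts[indices[b, slot]]
                out[b] += weights[b, slot] * expert(x[b])
        return out

moe = TinyMoE()
print(moe(torch.randn(4, 64)).shape)  # -> torch.Size([4, 64])
```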
As the next phase of AI development unfolds, the focus must shift to efficiency. South Africa’s evolving regulatory landscape, including the National AI Policy and a potential AI Act, will play a critical role in fostering innovation while guiding the responsible development of SLMs. These frameworks can help ensure that models are not only impactful but also aligned with societal needs.
For organisations leveraging AI, it will be necessary to embed efficiency as a foundational operating principle and consider the implications at every stage of the AI lifecycle – from data curation to model architecture design to deployment. By applying holistic, lifecycle thinking to IT challenges, developers will be able to refine their techniques and spark renewed creativity in the next wave of AI’s evolution.
