A Swedish proverb states that ‘He who buys what he does not need, steals from himself’. Nobody wants to believe they could fall into this trap, yet this is the mistake many organisations make as they operationalise Gen AI proofs of concept (PoCs) into enterprise-wide services. This blog highlights six strategies that will help optimise your spend on Gen AI projects.
One of the key findings from a GlobalData report recently commissioned by Orange Business was that cloud costs start to ‘skyrocket’ as PoCs are scaled into production-grade services. It’s not difficult to understand why. Most Gen AI projects start on public clouds, where computational power, scalability, and access to advanced AI models are all readily available. However, as you scale the project, the processing power required to serve your AI model increases, and your costs can start to climb very steeply.
At Orange Business, we know there are steps you can take – at every stage of your Gen AI journey – that can have a considerable impact on a project’s overall costs. This increases the value large companies get from Gen AI and brings Gen AI services within reach of everyone else.
1. Apply Gen AI to the right use cases.
The GPUs required by Gen AI models are the most resource-intensive – and therefore the most expensive – resources in your data centre, and should be used sparingly. Not every use case requires this level of computational intensity or will deliver value through using it. Other, more established technologies – traditional AI or business intelligence (data visualisation) – may deliver the same results at much lower cost. That’s why starting with the use case is essential.
To take an example from our own experience, Orange Business trialled Microsoft Copilot across its entire business and eventually decided that there wasn’t sufficient value generation to justify implementing it company-wide. Instead, we deployed it for two use cases: first, for the functions that create and manipulate content, and second, for project managers, who produce a high volume of coordination notes and meeting summaries. These were the use cases where we could demonstrate real impact and for which there was a readily calculable ROI.
2. On-premises versus cloud.
Generally speaking, edge deployment is recommended if there is a strong requirement for real-time responsiveness, or if the basic infrastructure is either unavailable or unable to provide high-speed connectivity. A use case with a high requirement for resilience and/or security may also point to edge hosting – this ensures that the service remains available even with limited or no external connectivity, and that sensitive data isn’t transferred outside your organisation.
By contrast, if you have a co-pilot use case in a well-connected office environment where there is no need for instantaneous results, then hosting it in the cloud makes perfect sense. The Hyperscalers price storage and compute power competitively, and, if cloud deployment is appropriate for your use case, it may be more cost-effective than building and managing your own infrastructure.
3. Using the right model for your use case.
Some Large Language Models (LLMs) are more efficient than others. Orange Business evaluated two different generations of the same LLM for a particular use case. We found that the results were the same, but the older version of the LLM was 90% cheaper than the new one. However, there is no ‘rule of thumb’ you can apply here – despite being released after GPT-4, GPT-4o mini is the cheaper of the two models. (The same can be said of DeepSeek’s models.)
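To see how directly per-token pricing drives the total bill, here is a back-of-the-envelope sketch. The model names, prices, and volumes are hypothetical placeholders – always check your provider’s current price list – but note how a tenfold price gap translates straight into a 90% saving:

```python
# A back-of-the-envelope cost comparison between two model tiers.
# Model names and per-token prices are hypothetical placeholders.
PRICE_PER_1M_INPUT_TOKENS = {  # USD, illustrative only
    "large-model": 5.00,
    "small-model": 0.50,
}

def monthly_cost(model: str, requests: int, avg_tokens: int) -> float:
    """Estimated monthly spend for a given request volume and prompt size."""
    return requests * avg_tokens / 1_000_000 * PRICE_PER_1M_INPUT_TOKENS[model]

for model in PRICE_PER_1M_INPUT_TOKENS:
    cost = monthly_cost(model, requests=50_000, avg_tokens=1_500)
    print(f"{model}: ${cost:,.2f}/month")
# large-model: $375.00/month, small-model: $37.50/month - a 90% saving
```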
Also, can you use a Small Language Model (SLM)? Some of these have been carefully optimised through distillation – a process in which knowledge is transferred from a large, complex model to a smaller, more efficient one – which reduces compute requirements and brings down overall costs.
So, ask yourself if you need the most powerful LLM for your use case or if a different LLM or even an SLM will do the same job at a far lower cost.
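For readers curious about what distillation involves under the hood, here is a minimal PyTorch sketch of the standard recipe, in which a small ‘student’ model learns to match the temperature-softened outputs of a large ‘teacher’. The toy models and random data are illustrative stand-ins, not a production setup:

```python
# A minimal sketch of knowledge distillation in PyTorch. The teacher and
# student here are toy classifiers standing in for a large and a small
# language model; real distillation uses the same loss at far larger scale.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Linear(128, 10)   # stand-in for a large, already-trained model
student = nn.Linear(128, 10)   # stand-in for the smaller model being trained
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: the student matches the teacher's temperature-softened outputs.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: the student still learns from the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

x = torch.randn(32, 128)                 # dummy batch of inputs
labels = torch.randint(0, 10, (32,))     # dummy ground-truth labels
with torch.no_grad():
    teacher_logits = teacher(x)          # teacher stays frozen during distillation
loss = distillation_loss(student(x), teacher_logits, labels)
loss.backward()
optimizer.step()
```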
4. Document governance.
The compute power required by Gen AI varies directly with the volume of data being searched, so this is a significant cost driver. There are two ways of reducing the amount of data that is queried.
- Technology-based approach: use a retrieval step to create a subset of the data by preselecting only the most relevant documents, then set the LLM to work on this reduced data set.
- Governance-based approach: create rules that limit the documents included in a search. For example, if you have recently introduced a new pricing model, any contract created under the old model should be excluded from pricing-related prompts. This not only produces more accurate answers but also reduces the associated costs (see the sketch after this list).
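To make the governance-based approach concrete, here is a minimal sketch that excludes old-pricing-model contracts before any prompt is built. The Document class, cut-over date, and sample contracts are hypothetical illustrations, not a specific product API:

```python
# A minimal sketch of governance-based document filtering. All names here
# (Document, PRICING_CUTOVER, the sample contracts) are hypothetical.
from dataclasses import dataclass
from datetime import date

PRICING_CUTOVER = date(2024, 1, 1)  # assumed date the new pricing model took effect

@dataclass
class Document:
    doc_id: str
    created: date
    text: str

def pricing_context(docs: list[Document]) -> list[Document]:
    # Governance rule: contracts created under the old pricing model are
    # excluded from pricing-related prompts, shrinking the data the LLM sees.
    return [d for d in docs if d.created >= PRICING_CUTOVER]

contracts = [
    Document("c-001", date(2023, 6, 1), "old-model contract ..."),
    Document("c-002", date(2024, 3, 15), "new-model contract ..."),
]
print([d.doc_id for d in pricing_context(contracts)])  # -> ['c-002']
```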
5. Prompt libraries.
If a user submits five or six prompts before alighting on one that provides the right answer, compute costs can increase substantially. Consider pre-building efficient prompts that allow users to get good results in the minimum number of attempts.
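A prompt library can be as simple as a set of vetted templates that users fill in rather than writing prompts from scratch. The template names and wording below are illustrative assumptions:

```python
# A minimal sketch of a prompt library: pre-built, vetted templates that get
# users to a good answer on the first attempt instead of the fifth or sixth.
PROMPT_LIBRARY = {
    "meeting_summary": (
        "Summarise the following meeting notes in five bullet points, "
        "listing decisions first, then open actions with their owners:\n\n{notes}"
    ),
    "status_report": (
        "Draft a one-paragraph project status update for senior management "
        "from the notes below, flagging any risks explicitly:\n\n{notes}"
    ),
}

def build_prompt(name: str, **fields: str) -> str:
    """Fill a vetted template so users don't craft prompts by trial and error."""
    return PROMPT_LIBRARY[name].format(**fields)

print(build_prompt("meeting_summary", notes="Kick-off call, 14:00 ..."))
```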
6. Optimise cloud spend with FinOps.
The indirect costs of your Gen AI service – those relating to the enhanced infrastructure needed to run that service effectively – are often hard to determine and much higher than expected. FinOps can help organisations optimise their cloud spending by ensuring transparency and collaboration between technical teams, finance, and business units. An obvious example would be a new Gen AI service, hosted by a Hyperscaler, that becomes wildly popular. In just such a scenario, when we first enabled our employees to use ChatGPT, we put a value limit on usage volumes to control costs.
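In practice, a value limit like the one we applied to ChatGPT usage can be enforced with a simple spending guard. The rate, budget, and accounting below are hypothetical placeholders for whatever your provider actually reports:

```python
# A minimal sketch of a FinOps-style value limit on Gen AI usage. The price
# and budget figures are hypothetical; use your provider's actual rates.
PRICE_PER_1K_TOKENS = 0.002    # assumed blended USD rate per 1,000 tokens
MONTHLY_BUDGET_USD = 500.00    # the value limit placed on usage

class BudgetGuard:
    def __init__(self, budget: float):
        self.budget = budget
        self.spent = 0.0

    def record_usage(self, tokens: int) -> None:
        # Track spend as the provider reports token consumption per request.
        self.spent += tokens / 1_000 * PRICE_PER_1K_TOKENS

    def allow_request(self) -> bool:
        # Block further requests once the value limit is reached.
        return self.spent < self.budget

guard = BudgetGuard(MONTHLY_BUDGET_USD)
guard.record_usage(tokens=120_000)
print(guard.allow_request(), f"${guard.spent:.2f} spent")
```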
Conclusion
There is no silver bullet that will suddenly ensure that your Gen AI project doesn’t break the bank. Instead, there are a series of small decisions that can add up to a significant reduction in your expenditure – and a concomitant increase in the value delivered by your Gen AI services.
To find out more about the six steps described above – and to learn the actionable insights that will help you reduce your cloud costs – download our White Paper, “A Hungry Mouth to Feed: Addressing the skyrocketing costs of Gen AI services”.

Frédéric Loras is the Head of Sales Enablement at Orange Business, where he leads an international team in defining customer value propositions and driving sales through value-based strategies. With over 12 years of management experience and a strong background in innovation and transformation, Frédéric has successfully collaborated with major industrial clients to adopt cutting-edge technologies such as 5G. He is passionate about fostering agile practices within his organization and has a proven track record in strategic project management.
Outside of work, Frédéric is an avid tennis player, enjoying matches on clay courts. He is also proud to have co-founded a startup focused on personal digital asset management, showcasing his entrepreneurial spirit and commitment to innovation.