Inference: The future of AI in the cloud


Now that it’s 2024, we can’t overlook the profound impact that Artificial Intelligence (AI) is having on our operations across businesses and market sectors. Government research has found that one in six UK organizations has embraced at least one AI technology within its workflows, and that number is expected to grow through to 2040.
With increasing AI and Generative AI (GenAI) adoption, the future of how we interact with the web hinges on our ability to harness the power of inference. Inference happens when a trained AI model uses real-time data to predict or complete a task, testing its ability to apply the knowledge gained during training. It’s the AI model’s moment of truth to show how well it can apply information from what it has learned. Whether you work in healthcare, ecommerce or technology, the ability to tap into AI insights and achieve true personalization will be crucial to customer engagement and future business success.
Inference: the Key to true personalisation
The key to personalisation lies in the strategic deployment of inference by scaling out inference clusters closer to the geographical location of the end user. This approach ensures that AI-driven predictions for inbound user requests are accurate and delivered with minimal delays and low latency. Businesses must embrace GenAI’s potential to unlock the ability to provide tailored and personalised user experiences.
Businesses that haven’t anticipated the importance of the inference cloud will get left behind in 2024. It is fair to say that 2023 was the year of AI experimentation, but the inference cloud will enable the realisation of actual outcomes with GenAI in 2024. Enterprises can unlock innovation in open-source Large Language Models (LLMs) and make true personalisation a reality with cloud inference.
Chief Marketing Officer at Vultr.
A new web app
Before the entrance of GenAI, the focus was on providing pre-existing content without personalization close to the end user. Now, as more companies undergo the GenAI transformation, we’ll see the emergence of inference at the edge – where compact LLMs can create personalized content according to users’ prompts.
Some businesses still lack a strong edge strategy – much less a GenAI edge strategy. They need to understand the importance of training centrally, inferring locally, and deploying globally. In this case, serving inference at the edge requires organizations to have a distributed Graphics Processing Unit (GPU) stack to train and fine-tune models against localized datasets.
Once these datasets are fine-tuned, the models are then deployed globally across data centers to comply with local data sovereignty and privacy regulations. Companies can provide a better, more personalized customer experience by integrating inference into their web applications by using this process.
GenAI requires GPU processing power, but GPUs are often out of reach for most companies due to high costs. When deploying GenAI, businesses should look to smaller, open-source LLMs rather than large hyperscale data centers to ensure flexibility, accuracy and cost efficiency. Companies can avoid complex and unnecessary services, a take-it-or-leave-it approach that limits customization, and vendor lock-in that makes it difficult to migrate workloads to other environments.
GenAI in 2024: Where we are and where we’re heading
The industry can expect a shift in the web application landscape by the end of 2024 with the emergence of the first applications powered by GenAI models.
Training AI models centrally allows for comprehensive learning from vast datasets. Centralized training ensures that models are well-equipped to understand complex patterns and nuances, providing a solid foundation for accurate predictions. Its true potential will be seen when these models are deployed globally, allowing businesses to tap into a diverse range of markets and user behaviors.
The crux lies in the local inference component. Inferring locally involves bringing the processing power closer to the end-user, a critical step in minimizing latency and optimising the user experience. As we witness the rise of edge computing, local inference aligns seamlessly with distributing computational tasks closer to where they are needed, ensuring real-time responses and improving efficiency.
This approach has significant implications for various industries, from e-commerce to healthcare. Consider if an e-commerce platform leveraged GenAI for personalized product recommendations. By inferring locally, the platform analyses user preferences in real-time, delivering tailored suggestions that resonate with their immediate needs. The same concept applies to healthcare applications, where local inference enhances diagnostic accuracy by providing rapid and precise insights into patient data.
This move towards local inference also addresses data privacy and compliance concerns. By processing data closer to the source, businesses can adhere to regulatory requirements while ensuring sensitive information remains within the geographical boundaries set out by data protection laws.
The Age of Inference has arrived
The journey towards the future of AI-driven web applications is marked by three strategies – central training, global deployment, and local inference. This approach not only enhances AI model capabilities but is vendor-agonistic, regardless of cloud computing platform or AI service provider. As we enter a new era of the digital age, businesses must recognize the pivotal role of inference in shaping the future of AI-driven web applications. While there’s a tendency to focus on training and deployment, bringing inference closer to the end-user is just as important. Their collective impact will offer unprecedented opportunities for innovation and personalization across diverse industries.
We’ve listed the best productivity tool.
This article was produced as part of TechRadarPro’s Expert Insights channel where we feature the best and brightest minds in the technology industry today. The views expressed here are those of the author and are not necessarily those of TechRadarPro or Future plc. If you are interested in contributing find out more here: https://www.techradar.com/news/submit-your-story-to-techradar-pro
Now that it’s 2024, we can’t overlook the profound impact that Artificial Intelligence (AI) is having on our operations across businesses and market sectors. Government research has found that one in six UK organizations has embraced at least one AI technology within its workflows, and that number is expected to…
Recent Posts
- OpenSSH vulnerabilities could pose huge threat to businesses everywhere
- Magic: The Gathering’s Final Fantasy sets will tell the stories of the games
- All of Chipolo’s Bluetooth trackers are discounted in sitewide sale
- Fortnite: Lawless gets first trailer highlighting the new season’s battle pass roster and the chaos of Crime City
- Chase will start blocking Zelle payments over social media
Archives
- February 2025
- January 2025
- December 2024
- November 2024
- October 2024
- September 2024
- August 2024
- July 2024
- June 2024
- May 2024
- April 2024
- March 2024
- February 2024
- January 2024
- December 2023
- November 2023
- October 2023
- September 2023
- August 2023
- July 2023
- June 2023
- May 2023
- April 2023
- March 2023
- February 2023
- January 2023
- December 2022
- November 2022
- October 2022
- September 2022
- August 2022
- July 2022
- June 2022
- May 2022
- April 2022
- March 2022
- February 2022
- January 2022
- December 2021
- November 2021
- October 2021
- September 2021
- August 2021
- July 2021
- June 2021
- May 2021
- April 2021
- March 2021
- February 2021
- January 2021
- December 2020
- November 2020
- October 2020
- September 2020
- August 2020
- July 2020
- June 2020
- May 2020
- April 2020
- March 2020
- February 2020
- January 2020
- December 2019
- November 2019
- September 2018
- October 2017
- December 2011
- August 2010