5 reasons why 2024 will be the year of the self-hosted LLM
If 2023 was the year of OpenAI and the API-based LLM, then I'm making the prediction that 2024 will be the year of the self-hosted LLM.
*Caveat: We work with a number of large and mid-sized enterprises on their LLM strategy, so we have a good understanding of where these businesses are in their adoption and what they are thinking about. This prediction is therefore focused on the adoption of AI within enterprises and might not apply perfectly to individuals and smaller companies.*
So here are the top 5 reasons why 2024 will be the year of the self-hosted LLM…
Number 1: An open-source model will be released that is as good as GPT-4
One of the biggest things holding businesses back from diving headfirst into the self-hosted ecosystem is that the API models are simply better than the open-source ones, which makes it easier to get to a working prototype. However, once we get an open-source model that is as good as GPT-4, we will see a massive move to self-hosting.
Some think this won't happen because by then we will have GPT-5, which will be even better. And yet, there is a point of diminishing returns when it comes to model quality: businesses just need models that are 'good enough' to solve their use cases, not AGI. GPT-4-level quality with GPT-4-level context length is that point.
When will we see this model? The open-source community is close, with excellent releases from Meta (Llama) and TII (Falcon). I predict the next iterations of these models will reach that threshold, especially if they are instruction-tuned or RLHF'd. My money is on H1 2024.
Number 2: Enterprises are moving from POC to scaling
What API-based models are really good at is quickly creating demos. They're a great way to understand what LLMs can do and to figure out whether they can solve your business problem.
However, they are not as good at scale: high costs, rate limits, and poor latency are just some of the reasons. At scale, the self-hosted LLM wins in a lot of use cases.
Enterprises that we work with have been furiously creating POCs this year and have demonstrated huge business value across a lot of use cases. They are now at the stage where they are starting to think about scaling, and self-hosted LLMs have a huge advantage here. The more LLMs we see in production, the more of them will be self-hosted.
Number 3: Legal and infosec teams don’t like relying on API-based models
This one almost goes without saying. OpenAI is making moves to address these concerns, but some remain outstanding. If legal and infosec teams had their way, they would self-host everything to be sure that data and IP are fully protected.
Number 4: The cost of these models is starting to dawn on leadership teams
If businesses want transformational adoption of LLMs, they will need to adopt them at scale. And at any decent scale, self-hosting LLMs is significantly cheaper than using API services. We know businesses that are already spending millions a year on API calls, and that is still at a small scale.
Also, for most use cases, large models like GPT-4 are not required to solve the task, so by using much smaller language models (under 7B parameters) the cost savings can be enormous. The back-of-envelope sketch below illustrates the gap.
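To make that concrete, here is a rough cost comparison in Python. Every price and traffic figure below is an illustrative assumption, not a real quote, but the shape of the economics at scale should hold:

```python
# Back-of-envelope: API vs self-hosted inference cost at production scale.
# All numbers are illustrative assumptions, not real pricing.

api_price_per_1k_tokens = 0.03   # assumed blended $/1K tokens for a large API model
tokens_per_request = 1_000       # assumed average prompt + completion length
requests_per_day = 100_000       # assumed production traffic

api_cost_per_year = (
    api_price_per_1k_tokens * (tokens_per_request / 1_000) * requests_per_day * 365
)

gpu_price_per_hour = 1.50        # assumed hourly cost of one inference GPU
gpus_needed = 4                  # assumed fleet size for this traffic with a <7B model

self_hosted_cost_per_year = gpu_price_per_hour * gpus_needed * 24 * 365

print(f"API:         ${api_cost_per_year:,.0f}/year")          # ~$1,095,000
print(f"Self-hosted: ${self_hosted_cost_per_year:,.0f}/year")  # ~$52,560
```

Even if you double or triple the GPU estimate to cover redundancy and engineering time, the self-hosted option is still an order of magnitude cheaper under these assumptions.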
OpenAI is also currently understood to be making a loss on its APIs. Some business leaders we know are happy with the current API cost but are wary that it will likely rise once OpenAI feels their enterprise adoption is locked in. Just something to keep in mind.
Number 5: The difficulty of building and deploying these models is diminishing drastically
Probably the biggest reason businesses are held back from self-hosting LLMs is that it is just too difficult. There are a myriad of problems when self-hosting, from picking among hundreds of model options, to getting enough GPU access, to wrangling your model to be fast enough. These problems exist in large part because the field is still immature, and they are being rapidly solved by both the open-source community and companies like ours, TitanML. The sketch below shows how simple serving an open model has already become.
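As an illustration, here is a minimal sketch of running an open model locally with the Hugging Face transformers library. The model name, prompt, and generation settings are assumptions chosen for the example; any small open model would work the same way:

```python
# A minimal sketch of self-hosting an open model with Hugging Face transformers.
# The model choice and settings below are illustrative assumptions.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-2-7b-chat-hf",  # any open <7B chat model works here
    device_map="auto",                      # place the weights on available GPU(s)
)

result = generator(
    "Summarise the key risks of relying on a third-party LLM API:",
    max_new_tokens=200,
    do_sample=False,  # deterministic output for a predictable demo
)
print(result[0]["generated_text"])
```

This is only the single-GPU happy path: production serving still needs batching, quantisation, and scaling, which is exactly the gap that inference servers are racing to close.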
In fact, early adopters of the Takeoff Server Pro have already reported significant productivity improvements and cost reductions when developing and deploying self-hosted LLMs.
By this time next year, there will be established best practices for self-hosting which will radically reduce the difficulty and skill required to self-host a language model. This will be a huge accelerator for the adoption of self-hosted LLMs.
So my prediction is that 2024 will be the year of the self-hosted LLM. With cost and security pressures from enterprises on the rise, and technical barriers quickly falling, there will soon be little reason for enterprises to use API-based models when building LLM applications. I'll be here next year to see if I was right!
About TitanML
TitanML enables machine learning teams to effortlessly and efficiently deploy large language models (LLMs). Their flagship product, the Titan Takeoff Inference Server, is already supercharging the deployments of a number of ML teams.
Founded by Dr. James Dborin, Dr. Fergus Finn and Meryem Arik, and backed by key industry partners including AWS and Intel, TitanML is a team of dedicated deep learning engineers on a mission to supercharge the adoption of enterprise AI.
Written by Meryem Arik, Co-founder/CEO at TitanML.