
How to choose the right LLM (Large Language Model) for your application

With the popularisation of LLMs (large language models) and the growing demand for AI-powered applications, we have seen a surge in the number of new LLMs and services on offer. If you are not a tech expert familiar with the terminology and the latest updates in the field, it can be very challenging to understand the differences and limitations of all the LLM resources available.

My personal take when evaluating LLMs for a specific application is to take three main points into account:

  • The resources you have available
  • The use cases of the LLM
  • The maturity of your application

Resources Available

Large Language Models require hardware and cloud resources to make them available to your users. Today there are solutions that take care of everything and let us connect to an LLM through a simple API or an out-of-the-box interface. Since they absorb most of the complexity of setting up and managing those resources, they may come with limitations and higher costs compared to solutions where we take on the responsibility of hosting and exposing the LLMs ourselves.
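For illustration, a call to a managed provider can be as small as the sketch below (here using the OpenAI Python client; the model name is just an example, and you would need your own API key):

```python
# Minimal sketch of calling a hosted LLM through a managed API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model; pick one that fits your budget
    messages=[{"role": "user", "content": "Summarise this ticket: ..."}],
)
print(response.choices[0].message.content)
```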

Depending on the LLM of your choice, you might need extra hardware resources such as GPUs and memory. This can raise your costs to the point where users will not want to pay for your application. In some cases, the LLM is only used to generate and pre-process data that is then stored in a database consumed by the application; there you only need the LLM during the data-processing step. But if your LLM has to respond to a constant stream of user-generated data, you need it up and running for as long as users are interacting with your application.
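As a sketch of that first scenario, the LLM can run once as a batch job that enriches your records and writes them to a database, so the application never calls a model at request time (the `llm_enrich` stub below stands in for a real LLM call):

```python
# "Preprocess once, serve many": the LLM runs only during the
# data-processing step; the app later reads from the database.
import sqlite3

def llm_enrich(text: str) -> str:
    """Placeholder for a single LLM call (see the API sketch above)."""
    return f"summary of: {text[:40]}"

conn = sqlite3.connect("app.db")
conn.execute("CREATE TABLE IF NOT EXISTS items (raw TEXT, enriched TEXT)")

for raw in ["first document...", "second document..."]:
    conn.execute("INSERT INTO items VALUES (?, ?)", (raw, llm_enrich(raw)))
conn.commit()

# At serving time the application only queries the table:
print(conn.execute("SELECT enriched FROM items").fetchall())
```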

Some companies and entrepreneurs have high-performance computers with plenty of memory and GPUs, capable of running some LLMs locally. They might use these machines to perform data-processing tasks and update their databases when needed instead of relying on third-party cloud services. In some cases you might need to switch to an LLM that consumes fewer resources in order to keep your costs down. Even though LLMs that consume more resources tend to perform better than those that consume fewer, the right choice can drastically reduce your costs without penalising your user experience.
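If you want to experiment with that local route, one assumption-heavy sketch is to talk to a locally hosted model server. The example below assumes an Ollama server running on its default port with the `llama3` model already pulled; both are assumptions about your setup:

```python
# Sketch of calling a locally hosted model instead of a cloud API.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Classify: ...", "stream": False},
    timeout=120,
)
print(resp.json()["response"])
```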


So here are some questions worth asking before determining the resources you have available:

  • What level of skill do you have available on your team? Do you have an engineer capable of managing the complexity of running, updating and maintaining your own LLM environment, or do you have to rely on third-party providers to take care of that complexity?
  • How much money can you spend per month, and how much on the initial setup of your LLM infrastructure?
  • Do you have any partnership with a provider that can give you a discount or free resources for a limited time?
  • Do you or your team members have previous experience with any of the solution providers?

The idea here is to understand what you can afford, in terms of time and money, to get an LLM up and running and serving your application.

The use cases

Here you can think in terms of data in and data out. What kind of information will be provided to your LLM? Are we talking about an LLM that will be used to translate labels from a CSV file? Is it a conversational chatbot? Or do you plan to feed it the HTML of a website you just scraped? And what kind of information do you need to get back from the LLM? Is it supposed to be structured data, like a JSON object with pre-specified properties? Or just a textual answer to a question?
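When the answer is "structured data", it helps to enforce and validate the shape rather than trust the model. A minimal sketch with the OpenAI API (the model name and the keys are just examples):

```python
# Ask the model for JSON with pre-specified properties and validate
# the shape before using it downstream.
import json
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model
    response_format={"type": "json_object"},  # constrains output to valid JSON
    messages=[{
        "role": "user",
        "content": 'Return a JSON object with keys "name" and "price" '
                   'extracted from: "Ceramic mug, $4.99"',
    }],
)
data = json.loads(response.choices[0].message.content)
assert {"name", "price"} <= data.keys()  # fail fast on missing properties
print(data)
```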

Another important point to take into consideration is multi-language support. Even for non-conversational LLMs, it is sometimes important to understand whether the information (especially if it is user-generated) might come in different languages, and whether the LLM can work with them. One option is to add a translation layer before the information is fed to your LLM, removing the multi-language complexity from the model itself, as the sketch below shows. This has its pros and cons and should be evaluated with your tech team.
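A hedged sketch of that translation layer: both functions below are stand-ins for whichever translation service and main model you choose; the point is the shape of the pipeline, not the implementations.

```python
# Translation layer in front of a single-language LLM: translate first,
# then run the main prompt in English only.

def translate_to_english(text: str) -> str:
    """Stand-in for a translation API or a dedicated translation model."""
    return text  # pass-through for this sketch

def main_llm(prompt: str) -> str:
    """Stand-in for your main LLM call."""
    return f"answer to: {prompt}"

user_input = "¿Cuánto cuesta el plan premium?"
english = translate_to_english(user_input)
print(main_llm(english))  # the main model never sees non-English text
```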

If you have a very complex task that seems too overwhelming to implement with the LLM of your choice, one option is to break it into smaller tasks and evaluate using more than one LLM, each responsible for one specific step of the task. Sometimes you might not need an LLM at all for a given step, and can restructure your data flow using other kinds of methods so that, in the end, you get a refined input that your LLM of choice is able to interpret.
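Here is a rough sketch of that decomposition, with two stand-in functions representing two different models: a cheap extraction step followed by a stronger reasoning step.

```python
# Split one overwhelming task into two smaller LLM steps.

def extract_step(raw_html: str) -> str:
    """Step 1 (small/cheap model): strip boilerplate, keep the facts."""
    return "product: mug; price: 4.99"

def answer_step(facts: str, question: str) -> str:
    """Step 2 (stronger model): reason over the refined input only."""
    return f"Based on [{facts}]: ..."

facts = extract_step("<html>...scraped page...</html>")
print(answer_step(facts, "Is this cheaper than our current supplier?"))
```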

The maturity of your application

Are you a pre-seed startup still searching for product-market fit and relying on money from early investors? Are you developing a feature for an established enterprise SaaS company that has already gone through its IPO and must ship by the end of the month? Or are you an academic researching new applications of LLMs in one specific industry?

My take on this is that the best way to implement LLM-based features in apps is to do it incrementally. Start with the simplest use case, with the simplest information needed to advance one step further, and use the easiest out-of-the-box solution for your case (yes, 99% of the time that means the OpenAI API, and I swear to God, I'm not being paid by any of these people). The cost might be very high if you have an established app making lots of API calls. One way to minimise this is to deploy the new app or feature to a small subset of your users. This way you won't make as many API calls, you can define key metrics to evaluate whether you are on the right course based on user behaviour, and you can easily change trajectory if you find that your users are not satisfied with what you came up with.
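A simple way to implement that small-subset rollout is deterministic bucketing on the user id, so each user consistently sees (or doesn't see) the feature; the 5% threshold below is just an example:

```python
# Roll an LLM feature out to a small user subset: hash the user id
# into [0, 100) and enable the feature below a threshold.
import hashlib

ROLLOUT_PERCENT = 5  # example threshold

def llm_feature_enabled(user_id: str) -> bool:
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < ROLLOUT_PERCENT

print(llm_feature_enabled("user-42"))
```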

If you have a very specific need that involves your LLM relying on data from external sources (documents, databases, scraped data), I highly recommend using a technique called RAG (Retrieval-Augmented Generation), which lets you provide this data as context to your LLM, allowing it to reason based on knowledge specific to your use case.
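To make the idea concrete, here is a toy RAG sketch with keyword-overlap retrieval; real systems typically use embeddings and a vector store, but the flow, retrieve then prepend as context, is the same.

```python
# Toy RAG: retrieve the most relevant snippets from your own documents
# and prepend them as context to the prompt.

DOCS = [
    "Refunds are processed within 14 days of the request.",
    "The premium plan includes priority support and a custom domain.",
]

def retrieve(question: str, k: int = 1) -> list[str]:
    words = set(question.lower().split())
    # Rank documents by naive word overlap with the question.
    return sorted(DOCS, key=lambda d: -len(words & set(d.lower().split())))[:k]

question = "How long do refunds take?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)  # pass this prompt to the LLM of your choice
```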

Final considerations based on my professional experience

In my experience, the hardest part of building LLM-based features or applications is actually finding and preparing the right data to use as input to your LLM. Sometimes people deploy LLMs just to help with this phase of finding and preparing data that will be supplied to another LLM, which then produces the output for the end user. We have a saying in computer science: garbage in, garbage out. It means that no matter how much money and effort you put into a piece of software, if you don't provide the right kind of data, you are not going to get a satisfying outcome.
