Problem

When adopting Large Language Models (LLMs), businesses face a critical question: Should you self-host the model or simply access it through an API?

The right choice depends on several key factors, including:

  • Do you operate in a regulated industry where owning and protecting proprietary data is critical?
  • How will the LLM impact or augment your existing business processes?
  • Do you have the necessary engineering resources and expertise to build, deploy, and maintain an in-house LLM solution from scratch?

Self-hosting LLMs requires advanced DevSecOps and Kubernetes expertise, placing a high operational burden on organisations. The shortage of platform engineering skills raises the barrier to entry, often diverting data teams from integrating LLMs into business systems and delaying real-world impact.

Solution

With Kubox, your teams can deploy a self-hosted LLM in minutes, without needing Kubernetes or cloud expertise. This end-to-end example includes full source code for deploying an open-source chatbot, bringing the power of LLMs into your own secure environment while giving you full control and significantly reducing operational costs.
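Once the stack is up, the chatbot can be called like any hosted API. Here is a minimal client sketch, assuming the deployment exposes vLLM's OpenAI-compatible endpoint; the endpoint URL and model name below are placeholders, not values taken from the Kubox repository:

    from openai import OpenAI

    # A minimal client sketch, assuming the deployment exposes vLLM's
    # OpenAI-compatible server. The endpoint URL and model name are
    # placeholders, not values taken from the Kubox repository.
    client = OpenAI(
        base_url="http://<your-endpoint>:8000/v1",  # placeholder endpoint
        api_key="EMPTY",  # vLLM's server accepts a dummy key by default
    )

    response = client.chat.completions.create(
        model="meta-llama/Meta-Llama-3.1-8B-Instruct",  # assumed model variant
        messages=[{"role": "user", "content": "What can you help me with?"}],
    )
    print(response.choices[0].message.content)

Because the interface is OpenAI-compatible, existing client code can often be pointed at the self-hosted endpoint by changing only the base URL.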

Data Infrastructure

The system uses Ray.io and vLLM to efficiently serve a Meta-Llama-3.1 model on NVIDIA L4 GPU instances on AWS.
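The repository contains the full deployment code; the sketch below is only illustrative of how these pieces fit together, assuming the 8B instruct variant of Llama 3.1 and one L4 GPU per Ray Serve replica (it is not the repo's actual implementation):

    from ray import serve
    from vllm import LLM, SamplingParams

    @serve.deployment(ray_actor_options={"num_gpus": 1})  # one L4 GPU per replica
    class LlamaChat:
        def __init__(self):
            # Each replica loads the model once; vLLM manages batching
            # and the KV cache on the GPU.
            self.llm = LLM(model="meta-llama/Meta-Llama-3.1-8B-Instruct")
            self.params = SamplingParams(temperature=0.7, max_tokens=256)

        async def __call__(self, request):
            # Ray Serve passes in a Starlette request object.
            prompt = (await request.json())["prompt"]
            outputs = self.llm.generate([prompt], self.params)
            return {"text": outputs[0].outputs[0].text}

    app = LlamaChat.bind()
    # serve.run(app) starts an HTTP endpoint on port 8000 by default.

Ray Serve can then scale the number of replicas across GPU nodes, while vLLM's continuous batching keeps each GPU well utilised.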

Source Code

Visit our GitHub repository at https://github.com/kubox-ai/chatbot.

What’s next

In future examples, we’ll dive into practical challenges and solutions for optimising LLM inference performance to build cost-effective, scalable, high-performance AI systems.