TitanML Introduces Full Support for Llama 3.1 Family on the Takeoff Inference Stack
We are thrilled to announce the integration of the Llama 3.1 family into our Takeoff Inference Stack. This powerful lineup includes models with 8 billion, 70 billion, and an unprecedented 405 billion parameters, offering state-of-the-art performance at every resource scale. TitanML clients can now use the Takeoff Inference Stack to deploy and serve these groundbreaking models privately in their own environments.
Unlocking New Use Cases with Llama 3.1
In just over three months since the release of Llama 3, hundreds of enterprises have deployed it at scale in production environments. With the advent of Llama 3.1, we anticipate even broader adoption and the creation of smarter, faster, and more accurate applications leveraging Generative AI. Here are some of the standout features when deployed with the Takeoff Inference Stack:
Context Window Expansion
The context window size has increased from 8k to up to 128k tokens, enabling models to process significantly more context information. This enhancement opens up new possibilities for applications such as Retrieval Augmented Generation (RAG), document summarization, business intelligence reporting, and other enterprise solutions.
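To make the difference concrete, here is a minimal Python sketch of how the larger window changes a RAG prompt budget. The chunk sizes and the 4-characters-per-token heuristic are illustrative assumptions; a real deployment would count tokens with the model's own tokenizer.

```python
# Rough sketch: how many retrieved chunks fit into the prompt at 8k vs 128k
# tokens. Token counts are approximated with a crude 4-chars-per-token
# heuristic (an assumption for illustration, not the real tokenizer).

CHARS_PER_TOKEN = 4

def fit_chunks(chunks, context_tokens, reserved_for_answer=1024):
    """Greedily pack document chunks into the context window,
    keeping some tokens in reserve for the model's answer."""
    budget = (context_tokens - reserved_for_answer) * CHARS_PER_TOKEN
    selected, used = [], 0
    for chunk in chunks:
        if used + len(chunk) > budget:
            break
        selected.append(chunk)
        used += len(chunk)
    return selected

chunks = ["x" * 4000] * 200  # two hundred ~1k-token document chunks
print(len(fit_chunks(chunks, 8_000)))    # 6  -- an 8k window holds a handful
print(len(fit_chunks(chunks, 128_000)))  # 126 -- a 128k window holds far more
```

The same retrieval pipeline that had to aggressively rank and truncate at 8k can simply include far more supporting context at 128k, which is what unlocks whole-document summarization and richer RAG answers.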
Multi-Language Support
Formal support for more languages broadens the reach of LLMs for global business applications. Llama 3.1 now supports eight languages, doubling the previous count and positioning it as a state-of-the-art open-source LLM capable of serving diverse markets and multinational organisations.
Enhanced Tool Calling Support
The rise of AI agents relies heavily on tool-calling capabilities. Llama 3.1, when used with the Takeoff Inference Stack, enables this feature out of the box, allowing models to perform actions such as web searches, database retrievals, SMS sending, flight booking, and code review submissions on platforms like GitHub. This supports robust agentic workflows that meet various enterprise needs. Notable tool-calling features include:
• JSON Enforcing
• REGEX Enforcing
• SQL Enforcing
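To illustrate what JSON enforcing guarantees, here is a minimal, self-contained Python sketch. The `book_flight` tool schema and the `validate_tool_call` helper are hypothetical names invented for this example, and the Takeoff-specific request syntax is deliberately not shown; the point is the invariant that constrained decoding provides.

```python
import json

# Hypothetical tool definition: the shape a tool call must satisfy.
# (Illustrative only -- not the Takeoff Inference Stack's actual API.)
flight_tool_schema = {
    "name": "book_flight",
    "required": ["origin", "destination", "date"],
}

def validate_tool_call(raw_output: str, schema: dict) -> dict:
    """Check that a model's raw text is valid JSON naming every required field.

    With JSON enforcing enabled server-side, this check can never fail:
    the decoder only emits tokens that keep the output schema-valid, so
    downstream agent code can parse tool calls without defensive retries.
    """
    call = json.loads(raw_output)  # json.JSONDecodeError on malformed output
    args = call.get("arguments", {})
    missing = [f for f in schema["required"] if f not in args]
    if call.get("tool") != schema["name"] or missing:
        raise ValueError(f"schema violation: missing fields {missing}")
    return call

# A schema-conforming completion, as constrained decoding would produce:
raw = ('{"tool": "book_flight", "arguments": '
       '{"origin": "LHR", "destination": "JFK", "date": "2024-08-01"}}')
call = validate_tool_call(raw, flight_tool_schema)
print(call["arguments"]["destination"])  # JFK
```

REGEX and SQL enforcing work the same way at a different granularity: the decoder is constrained so the completion matches a regular expression or parses as valid SQL, rather than conforming to a JSON schema.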
Distil to Domain-Specific Models
The Llama 3.1 licence permits using the models to train smaller, more domain-specific models: you can use the outputs of the large, high-quality models as training data for smaller models specialised to your task. For industries with domain-specific requirements, such as healthcare and finance, we anticipate that this will be a major use case for serving the largest 405B model. Using TitanML LoRA serving, clients can serve dozens or hundreds of these domain- or task-adapted models on the same compute resource!
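The distillation workflow above can be sketched in a few lines. Here `query_teacher` is a stand-in for a call to the deployed 405B teacher model (stubbed so the sketch is self-contained); the resulting pairs would feed a supervised fine-tune of a smaller student, for example an 8B model with a LoRA adapter.

```python
# Sketch: building a distillation dataset from a large "teacher" model's
# outputs to fine-tune a smaller, domain-specific "student" model.

def query_teacher(prompt: str) -> str:
    # Placeholder for a call to the served Llama 3.1 405B teacher model.
    # Stubbed here so the example runs without any deployment.
    return f"<teacher answer for: {prompt}>"

def build_distillation_set(prompts):
    """Pair each domain prompt with the teacher's completion.

    The (prompt, completion) pairs become supervised fine-tuning data
    for a smaller student model adapted to the domain.
    """
    return [{"prompt": p, "completion": query_teacher(p)} for p in prompts]

dataset = build_distillation_set([
    "Summarise this radiology report: ...",
    "Extract the covenants from this loan agreement: ...",
])
print(len(dataset))  # 2
```

Because each distilled student can be expressed as a lightweight LoRA adapter over a shared base model, many of them can be hot-swapped on the same GPUs rather than each requiring dedicated hardware.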
The Most Capable Open-Source LLM: Meta Llama 3.1 405B
The Llama 3.1 lineup includes the most advanced open-source LLM: Llama 3.1 405B. Quality benchmarks reveal that this model competes closely with leading closed-source models like GPT-4o and Claude 3.5 Sonnet, excelling in tasks such as code generation, math, and function calling.
With the rapid growth in the number of open-source LLMs and advancements in quality, more companies are transitioning from proprietary model APIs to open-source alternatives for greater privacy, cost savings, and customisability. Llama 3.1 405B promises to close the lingering quality gap between open and closed AI, particularly in sectors like healthcare and finance, where none of quality, privacy, or compliance can be compromised.
Deploying Llama 3.1 405B on the Takeoff Inference Stack
The 405 billion parameter model demands a highly optimized AI infrastructure for efficient performance. The Takeoff Inference Stack, designed for deployment within a customer’s VPC or on-prem data centre, allows enterprises to self-host powerful LLMs, achieving performance that meets or exceeds GPT-4 quality while ensuring data remains secure within their environment.
Get Started with Llama 3.1 on TitanML Today!
You can start using the Llama 3.1 models immediately with the Takeoff Inference Stack within your own private environment. Contact us to request a no-cost Proof of Concept (POC) for deploying the Llama 3.1 family on the Takeoff Inference Stack or to learn more about how your business can utilise Llama 3.1!
Deploying Enterprise-Grade AI in Your Environment?
Unlock unparalleled performance, security, and customization with the TitanML Enterprise Stack