We're excited to announce the release of TitanML's Takeoff Inference v0.11, which includes several new capabilities to improve performance and usability.
Reranking and Classification Endpoints
We've added a new "/classify" endpoint that supports text classification tasks such as sentiment analysis, natural language inference, and reranking. The endpoint lets you use the full sequence representations from encoder models such as BERT and T5 to score document relevance for retrieval.
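As a sketch of how reranking via a classification endpoint fits together: a cross-encoder scores each (query, document) pair as one classification input. The payload field names below are illustrative assumptions, not the documented Takeoff request schema, so check the API reference for the exact format.

```python
import json

def build_rerank_payload(query, documents):
    """Build an illustrative request body for a classification-style reranker.

    NOTE: the "text" field and the pair layout are assumptions for this
    sketch, not the official /classify schema.
    """
    # Cross-encoder rerankers score (query, document) pairs, so each
    # pair becomes one classification input.
    return {"text": [[query, doc] for doc in documents]}

payload = build_rerank_payload(
    "what is the capital of France?",
    ["Paris is the capital of France.", "The Nile is a river in Africa."],
)
body = json.dumps(payload)
# The request itself would then be an HTTP POST to the /classify
# endpoint, e.g. requests.post(f"{server_url}/classify", json=payload),
# where server_url is your Takeoff server's address.
```

The scores returned for each pair can then be used to sort documents by relevance before they are passed to a generation model.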
CUDA Graph Caching
CUDA graphs can accelerate inference but consume additional memory. To optimize this tradeoff, we've implemented an LRU cache that stores a capped number of CUDA graphs. This improves average throughput while reducing the chance of out-of-memory errors on longer sequences.
Smaller Container Image
By refactoring some dependencies, we've significantly reduced the container image size compared to the previous version. This allows installation on more resource-constrained systems without compromising model support.
Contact us if you have any questions or suggestions! We look forward to hearing your feedback and feature requests.
Deploying Enterprise-Grade AI in Your Environment?
Unlock unparalleled performance, security, and customization with the TitanML Enterprise Stack.