LLM solution

Caila offers a wide range of services for accessing LLMs, both cloud-based and self-hosted. It provides direct access to all GPT services through a unified API. Additionally, you can run open models on your own servers and pay for server rental rather than per API call.

LLM services available in Caila

Services for chatting with an LLM are collected in the catalogue in the GPT category.

They are divided into categories:

  • Proxy to cloud-based LLMs. All services are available from within the Russian Federation and through a single access key.
  • Open LLMs available through an API. These models are hosted on our servers and are always available. The list of available models changes periodically; see the current list in the catalogue.
  • Open LLMs to run on dedicated servers. On dedicated resources, you can run any model. Caila supports several inference engines.
| Proxy     | Hosted by us | For deployment on your own servers |
|-----------|--------------|------------------------------------|
| OpenAI    | Llama3       | vLLM                               |
| Claude    | Qwen2        | Ollama                             |
| Gemini    | Mistral      | TGI                                |
| GigaChat  |              |                                    |
| YandexGPT |              |                                    |

API for direct access to LLMs

The Caila API for accessing LLMs is described in the Chat generation section.

All services can be accessed through the OpenAI Adapter.
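As a minimal sketch of going through the OpenAI Adapter: the request body follows the OpenAI chat-completion format, so any OpenAI-compatible client works. The base URL and model identifier below are placeholders, not confirmed values; check the Caila catalogue and your account for the actual endpoint, model names, and access key.

```python
# Sketch: calling a Caila-hosted LLM through the OpenAI-compatible adapter.
# CAILA_BASE_URL and the model name are assumptions for illustration only.
import json
import urllib.request

CAILA_BASE_URL = "https://caila.example/api/adapters/openai"  # placeholder endpoint
API_KEY = "<your Caila access key>"


def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def send_chat_request(payload: dict) -> dict:
    """POST the payload to the /chat/completions route (requires a real key)."""
    req = urllib.request.Request(
        f"{CAILA_BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


payload = build_chat_request("some-provider/gpt-model", "Hello!")
# send_chat_request(payload) would return the usual OpenAI-style response,
# with the generated text under choices[0]["message"]["content"].
```

Because the wire format matches OpenAI's, official OpenAI SDKs can also be pointed at the adapter by overriding their base URL and API key.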

Tools for working with LLMs and GPT services

  • Multi Chat lets you chat with several models simultaneously, or with the same model under different settings. It is designed for side-by-side comparison of models and for selecting suitable models and parameters. Read more in the special Multi Chat section.
  • GPT Viewer lets you enable history saving and browse the history of requests to LLMs hosted on your dedicated servers.
  • LLM Eval is a set of applications (Jupyter notebooks) for running benchmarks on LLM services through the Caila API. Access to the benchmark launch tools is provided upon request via technical support.