Llama 3.1
Two services based on Llama 3.1 have been added to the Caila catalogue. They are provided for evaluation and testing purposes and are available on a permanent basis.
llama3.1-8b
Checkpoint: meta-llama/Meta-Llama-3.1-8B-Instruct.
Inference engine: vLLM.
GPU: 1×3090 (at the time of publication)
llama3.1-70b-4q
Checkpoint: hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4.
Inference engine: vLLM.
GPU: 4×3090 (at the time of publication)
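Since both services run on vLLM, which typically exposes an OpenAI-compatible chat completions API, a request to either model can be sketched as below. The base URL is a placeholder, not the actual Caila endpoint; the model names follow the service names above. This is a minimal illustration of the request shape, not a documented Caila API call.

```python
import json

# Placeholder endpoint: substitute the actual Caila service URL.
BASE_URL = "https://example-caila-host/v1/chat/completions"

# Request body in the OpenAI chat completions format that vLLM serves.
# "llama3.1-8b" matches the service name above; use "llama3.1-70b-4q"
# for the quantized 70B service.
payload = {
    "model": "llama3.1-8b",
    "messages": [
        {"role": "user", "content": "Hello! What model are you?"},
    ],
    "max_tokens": 128,
    "temperature": 0.7,
}

# Serialize the body; send it with any HTTP client, e.g.:
#   requests.post(BASE_URL, json=payload, headers={"Authorization": "Bearer <token>"})
body = json.dumps(payload)
print(body)
```

For the 70B service, only the `model` field changes; the request format is the same for both.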