Model Configuration & Lifecycle:
PrivateLLM is designed to be flexible and efficient. Here is how it manages the AI models:
Default Model: By default, the component uses Qwen 2.5 0.5B, a highly efficient model optimized for speed and multilingual tasks.
Custom Models: You can use any instruction-tuned GGUF model by providing its direct Hugging Face CDN download URL, as long as the file is under 512MB so that it fits within ODC's ephemeral storage limit. Execution is also bounded by the 1GB RAM limit: larger models or long inputs may still cause timeouts or memory exhaustion during inference.
Caching Logic: On the first call, the model is downloaded to the environment's temporary directory. Subsequent calls will reuse this local file, resulting in much faster response times.
Persistence: The cached model remains available until the platform disposes of the External Logic container (typically after a period of inactivity or a new deployment). When a later request starts a new container, the model is automatically downloaded and cached again.
How to get a CDN URL: In Hugging Face (https://huggingface.co/), go to the Files and versions tab of a GGUF repository, click on the specific .gguf file, and click "Copy download link".
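The requirements for a custom model (a direct download link, a .gguf file, and a size under 512MB) can be checked before the component ever attempts a download. The Python sketch below is for illustration only and is not PrivateLLM's own code; `is_hf_gguf_url` and `fits_storage_limit` are hypothetical names. It assumes the direct link follows Hugging Face's `https://huggingface.co/<owner>/<repo>/resolve/<revision>/<file>` pattern and reads "512MB" as 512 MiB.

```python
from urllib.parse import urlparse

# Assumption: "512MB" is interpreted as 512 MiB; adjust if the platform uses decimal megabytes.
MAX_MODEL_BYTES = 512 * 1024 * 1024


def is_hf_gguf_url(url: str) -> bool:
    """Return True if url looks like a direct Hugging Face download link to a .gguf file.

    Expected shape (assumed): https://huggingface.co/<owner>/<repo>/resolve/<revision>/<file>.gguf
    Any query string (for example ?download=true) is ignored because urlparse keeps it out of .path.
    """
    parsed = urlparse(url)
    if parsed.scheme != "https" or parsed.netloc != "huggingface.co":
        return False
    parts = [p for p in parsed.path.split("/") if p]
    # owner, repo, "resolve", revision, then at least one segment for the file itself
    return (
        len(parts) >= 5
        and parts[2] == "resolve"
        and parts[-1].lower().endswith(".gguf")
    )


def fits_storage_limit(size_bytes: int, limit: int = MAX_MODEL_BYTES) -> bool:
    """Return True if a model file of size_bytes is strictly under the storage limit."""
    return 0 < size_bytes < limit


# Hypothetical repository and file name, used only to show the expected URL shape.
example = "https://huggingface.co/some-org/some-model-GGUF/resolve/main/model-q4_k_m.gguf"
print(is_hf_gguf_url(example), fits_storage_limit(400 * 1024 * 1024))
```

A common mistake is pasting the file page URL, which uses `/blob/` instead of `/resolve/`; that address serves an HTML page rather than the model file, so the first check rejects it. The file size can be read from the Files and versions tab before copying the link.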
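The caching and persistence behavior described above follows a download-once, reuse-afterwards pattern. The following Python sketch illustrates that pattern under stated assumptions and is not the component's implementation: `get_cached_model` and its injectable `download` callback are hypothetical names, `tempfile.gettempdir()` stands in for the environment's temporary directory, and the cache file name is derived from a hash of the URL.

```python
import hashlib
import os
import tempfile
import urllib.request
from typing import Callable, Optional


def _default_download(url: str, dest: str) -> None:
    # Plain HTTP download; urllib follows redirects (for example to a CDN host) by default.
    urllib.request.urlretrieve(url, dest)


def get_cached_model(
    url: str,
    download: Callable[[str, str], None] = _default_download,
    cache_dir: Optional[str] = None,
) -> str:
    """Return a local path to the model at url, downloading it only on the first call."""
    # Assumption: the temp directory plays the role of the container's ephemeral storage.
    cache_dir = cache_dir or tempfile.gettempdir()
    # Key the cache file on the URL so that different models never overwrite each other.
    name = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16] + ".gguf"
    path = os.path.join(cache_dir, name)
    if os.path.exists(path):
        return path  # Warm path: reuse the file downloaded by an earlier call.
    # Cold path: download to a temporary name, then rename, so an interrupted
    # download never leaves a partial file that later calls would treat as cached.
    partial = path + ".part"
    try:
        download(url, partial)
        os.replace(partial, path)
    finally:
        if os.path.exists(partial):
            os.remove(partial)
    return path
```

Downloading to a `.part` file and renaming it with `os.replace` keeps a failed download from being mistaken for a cached model on the next call. Because the cache lives in the temporary directory, it disappears together with the container, which matches the re-download on a fresh container. The sketch does not guard against two concurrent first calls both downloading; a production version may want a lock around the cold path.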