Downloading AI models

The Models tab allows you to download and load AI models for text embedding, image embedding, image captioning, and text generation. Multiple models of each type may be downloaded, but only one of each type may be loaded into memory for use at any given time.

Important  You must enable the model server before the Models tab becomes available. Some models can be downloaded only after you enter a Hugging Face token in the Model Server tab. See Starting, stopping, and updating the AI model server.

To add a model:

  1. Click the AI Services > Models tab.

  2. Click Add Model.

  3. In the Add a Model dialog, enter the following information:

    • Model Name: The exact name of the model as it appears on the Hugging Face website.

    • Model Type: Select Embedding, Image Captioning, or Text Generation.

    • Select Yes to give permission to download the model from Hugging Face.

  4. Click Add Model.

To download a model:

  1. Click the AI Services > Models tab.

  2. Click Download next to the model you want to download.

  3. If the model has not been confirmed for downloading, select Yes in the Add a Model dialog, then click Add Model.

Note  Model downloads will fail if they require a Hugging Face token and you have not yet entered one in the Model Server tab.

To cancel a model download:

  1. Click the AI Services > Models tab.

  2. Click Cancel next to the model whose download is in progress.

To load a model into memory:

  1. Click the AI Services > Models tab.

  2. Click Load next to the model you want to load.

Note  Models cannot be loaded until they have been downloaded.
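
The two rules stated in this topic (a model must be downloaded before it can be loaded, and only one model of each type is in memory at a time) can be pictured with a short Python sketch. The ModelRegistry class, its method names, and the model names are hypothetical and are not part of the product. The sketch also assumes that loading a model replaces any model of the same type that is already loaded, which this topic does not state.

```python
from dataclasses import dataclass, field


@dataclass
class ModelRegistry:
    """Hypothetical model of the Models tab rules (not a product API)."""

    types: dict = field(default_factory=dict)      # model name -> model type
    downloaded: set = field(default_factory=set)   # names of downloaded models
    loaded: dict = field(default_factory=dict)     # model type -> loaded model name

    def add(self, name: str, model_type: str) -> None:
        """Register a model by name and type."""
        self.types[name] = model_type

    def download(self, name: str) -> None:
        """Mark an added model as downloaded."""
        if name not in self.types:
            raise KeyError(f"{name} has not been added")
        self.downloaded.add(name)

    def load(self, name: str) -> None:
        """Load a downloaded model, keeping one loaded model per type."""
        if name not in self.downloaded:
            raise RuntimeError(f"{name} must be downloaded before it can be loaded")
        # Assumption: a newly loaded model replaces the loaded model of the same type.
        self.loaded[self.types[name]] = name


# Placeholder model names, for illustration only.
registry = ModelRegistry()
registry.add("example-org/embed-a", "Embedding")
registry.add("example-org/embed-b", "Embedding")
registry.download("example-org/embed-a")
registry.download("example-org/embed-b")
registry.load("example-org/embed-a")
registry.load("example-org/embed-b")  # now the only loaded Embedding model
```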

To manage model settings:

  1. Click the AI Services > Models tab.

  2. Click Manage at the top right of the window.

  3. Change any of the following settings:

    • Local Model Cache Directory

    • Use vLLM Inference Engine

    • Load Models on Demand

    • Max Loaded Embedding Models

    • Max Loaded Image Captioning Models

    • Max Loaded Text Generation Models

    • Max Response Tokens - Maximum number of tokens in text generation responses. When set to -1, each model generates output tokens up to its own limit. Limiting the response allows you to optimize response latency and performance.
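
As an illustration of the Max Response Tokens setting, the following Python sketch shows one way the -1 value could be interpreted. The function name and its parameters are hypothetical. The sketch assumes that a configured value larger than the model's own limit is clamped to that limit and that values other than -1 must be positive; neither behavior is confirmed by this topic.

```python
def effective_max_tokens(max_response_tokens: int, model_limit: int) -> int:
    """Return the output-token cap applied to a single response.

    -1 means no configured cap, so the model's own limit applies.
    Assumption: a cap above the model's limit is clamped to that limit.
    """
    if max_response_tokens == -1:
        return model_limit
    if max_response_tokens < 1:
        # Assumption: only -1 or a positive integer is a valid setting.
        raise ValueError("expected -1 or a positive integer")
    return min(max_response_tokens, model_limit)


print(effective_max_tokens(-1, 4096))    # 4096: the model's own limit applies
print(effective_max_tokens(512, 4096))   # 512: the configured cap applies
print(effective_max_tokens(8192, 4096))  # 4096: clamped to the model's limit
```

A lower cap shortens the longest possible response, which is why it can reduce response latency.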

To remove a model:

  1. Click the AI Services > Models tab.

  2. Click the check box next to the model or models you want to remove.

  3. Click Remove at the top of the window.

  4. (Optional) Select Yes in the Remove Models dialog to also delete the corresponding model files from the disk.