Fallback Models

Fallback models provide automatic failover to alternative models when API errors or content errors occur with your primary model. This feature improves pipeline reliability and reduces failures due to temporary API issues or model unavailability.

Overview

When configured, DocETL will automatically try fallback models in sequence if:

The primary model encounters an API error (rate limits, service unavailability, content warning errors, etc.)
The primary model returns invalid content that cannot be parsed
The primary model fails to respond in time (the request times out)

This lets your pipelines keep running when an individual model has problems; an operation fails only when the primary model and every configured fallback fail.

Configuration

Fallback models are configured at the top level of your pipeline YAML file, alongside default_model. You can configure separate fallback models for:

Completion/Chat operations: Used by map, reduce, resolve, filter, and other LLM-powered operations
Embedding operations: Used by operations that generate embeddings (e.g., cluster, rank)

Basic Configuration

The simplest way to configure fallback models is to provide a list of model names:

# Default language model for all operations
default_model: gpt-4o-mini

# Fallback models for completion/chat operations
fallback_models:
  - gpt-3.5-turbo
  - claude-3-haiku-20240307

# Fallback models for embedding operations
fallback_embedding_models:
  - text-embedding-3-small
  - text-embedding-ada-002

Models will be tried in the order specified. If the primary model fails, DocETL will automatically try the first fallback model, then the second, and so on.

Advanced Configuration

For more control, you can specify additional LiteLLM parameters for each fallback model:

default_model: gpt-4o-mini

# Fallback models with custom parameters
fallback_models:
  - model_name: gpt-3.5-turbo
    litellm_params:
      temperature: 0.0
      max_tokens: 2000
  - model_name: claude-3-haiku-20240307
    litellm_params:
      temperature: 0.0

# Fallback embedding models
fallback_embedding_models:
  - model_name: text-embedding-3-small
    litellm_params: {}
  - model_name: text-embedding-ada-002
    litellm_params: {}
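
The two configuration styles above (bare model names and model_name/litellm_params mappings) describe the same kind of entry. The Python sketch below shows one way both forms could be normalized into a single shape. It is illustrative only and is not DocETL's actual parser: normalize_fallbacks is a hypothetical helper, and treating a bare name as equivalent to an empty litellm_params mapping is an assumption.

```python
from typing import Any


def normalize_fallbacks(entries: list[Any]) -> list[dict[str, Any]]:
    """Convert mixed fallback entries into {'model_name', 'litellm_params'} dicts.

    Hypothetical helper for illustration; not part of DocETL's API.
    """
    normalized = []
    for entry in entries:
        if isinstance(entry, str):
            # Basic form: a bare model name (assumed to mean "no extra parameters").
            normalized.append({"model_name": entry, "litellm_params": {}})
        elif isinstance(entry, dict) and "model_name" in entry:
            # Advanced form: copy the params so the caller's config is not mutated.
            normalized.append({
                "model_name": entry["model_name"],
                "litellm_params": dict(entry.get("litellm_params") or {}),
            })
        else:
            raise ValueError(f"Unrecognized fallback entry: {entry!r}")
    return normalized


basic = ["gpt-3.5-turbo", "claude-3-haiku-20240307"]
advanced = [
    {"model_name": "gpt-3.5-turbo",
     "litellm_params": {"temperature": 0.0, "max_tokens": 2000}},
]

basic_normalized = normalize_fallbacks(basic)
advanced_normalized = normalize_fallbacks(advanced)
```

Normalizing up front means the code that performs the retries only has to handle one entry shape, whichever style the YAML file uses.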

How It Works

When an operation uses a model (either the default_model or an operation-specific model), DocETL will:

Try the primary model first: The operation's specified model (or default_model) is attempted first
Fall back on error: If an API error or content parsing error occurs, DocETL automatically tries the first fallback model
Continue through fallbacks: If the first fallback also fails, it tries the next fallback model in sequence
Fail only if all models fail: The operation only fails if all models (primary + all fallbacks) fail
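
The sequence above can be sketched in a few lines of Python. This is a simplified model of the behavior, not DocETL's implementation: call_with_fallbacks and fake_call are hypothetical names, and catching every Exception is a stand-in for the API and content-parsing errors that actually trigger a fallback.

```python
from typing import Callable, Sequence


def call_with_fallbacks(
    primary: str,
    fallbacks: Sequence[str],
    call_model: Callable[[str], str],
) -> tuple[str, str]:
    """Return (model_used, response) from the first model that succeeds."""
    errors: list[tuple[str, Exception]] = []
    # Primary model first, then each fallback in the configured order.
    for model in [primary, *fallbacks]:
        try:
            return model, call_model(model)
        except Exception as exc:  # simplification: any error triggers a fallback
            errors.append((model, exc))
    # Reached only when the primary and every fallback have failed.
    summary = "; ".join(f"{name}: {err}" for name, err in errors)
    raise RuntimeError(f"All models failed: {summary}")


def fake_call(model: str) -> str:
    """Hypothetical model call: the primary fails, every other model succeeds."""
    if model == "gpt-4o-mini":
        raise ConnectionError("simulated service outage")
    return f"response from {model}"


model_used, response = call_with_fallbacks(
    "gpt-4o-mini",
    ["gpt-3.5-turbo", "claude-3-haiku-20240307"],
    fake_call,
)
print(model_used)  # gpt-3.5-turbo
```

In this run the second fallback is never called, because the first fallback already returned a response.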

Example: Complete Pipeline with Fallback Models

Here's a complete example showing how to use fallback models in a pipeline:

datasets:
  example_dataset:
    type: file
    path: example_data/example.json

# Default language model for all operations unless overridden
default_model: gpt-4o-mini

# Fallback models for completion/chat operations
# Models will be tried in order when API errors or content errors occur
fallback_models:
  # First fallback model
  - model_name: gpt-3.5-turbo
    litellm_params:
      temperature: 0.0
  # Second fallback model
  - model_name: claude-3-haiku-20240307
    litellm_params:
      temperature: 0.0

# Fallback models for embedding operations
# Separate configuration for embedding model fallbacks
fallback_embedding_models:
  - model_name: text-embedding-3-small
    litellm_params: {}
  - model_name: text-embedding-ada-002
    litellm_params: {}

operations:
  - name: example_map
    type: map
    prompt: "Extract key information from: {{ input.contents }}"
    output:
      schema:
        extracted_info: "str"

pipeline:
  steps:
    - name: process_data
      input: example_dataset
      operations:
        - example_map

  output:
    type: file
    path: example_output.json
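
To make the example concrete, the Python sketch below works out the order in which models would be attempted for example_map. It uses a hand-written dict that mirrors the relevant parts of the YAML above instead of parsing the file. attempt_order is a hypothetical helper, and the per-operation model key it checks is an assumption about how an operation-specific model is declared.

```python
# Hand-written mirror of the relevant parts of the pipeline YAML above.
config = {
    "default_model": "gpt-4o-mini",
    "fallback_models": [
        {"model_name": "gpt-3.5-turbo", "litellm_params": {"temperature": 0.0}},
        {"model_name": "claude-3-haiku-20240307", "litellm_params": {"temperature": 0.0}},
    ],
    "operations": [{"name": "example_map", "type": "map"}],
}


def attempt_order(config: dict, operation: dict) -> list[str]:
    """List the models tried for one operation, primary first (hypothetical helper)."""
    # Assumption: an operation may name its own model under a "model" key;
    # otherwise the pipeline-wide default_model is the primary.
    primary = operation.get("model", config["default_model"])
    fallbacks = [entry["model_name"] for entry in config.get("fallback_models", [])]
    return [primary, *fallbacks]


order = attempt_order(config, config["operations"][0])
print(" -> ".join(order))
# gpt-4o-mini -> gpt-3.5-turbo -> claude-3-haiku-20240307
```

Because example_map does not name its own model, default_model is the primary and the two fallbacks follow in the order they are listed.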