# MOAR Examples

Complete working examples for MOAR optimization.
## Medication Extraction Example

This example extracts medications from medical transcripts and evaluates extraction accuracy.
### Metric Key

The `metric_key` in the `optimizer_config` section specifies which key from your evaluation function's return dictionary is used as the accuracy metric. In this example, `metric_key: medication_extraction_score` means MOAR optimizes using the `medication_extraction_score` value returned by the evaluation function.
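Conceptually, the optimizer scores each candidate pipeline by reading that key out of whatever dictionary the evaluation function returns. A minimal sketch of the idea (illustrative names only, not MOAR's actual internals):

```python
# Illustrative only: how metric_key selects the score from the eval output.
metrics = evaluate_results(dataset_path, results_path)
# e.g. {"medication_extraction_score": 42, "precision": 0.93, ...}
score = metrics[optimizer_config["metric_key"]]  # -> metrics["medication_extraction_score"]
```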
### pipeline.yaml

```yaml
datasets:
  transcripts:
    path: workloads/medical/raw.json
    type: file

default_model: gpt-4o-mini
bypass_cache: true

optimizer_config:
  type: moar
  dataset_path: workloads/medical/raw_sample.json # Use a sample for faster optimization
  save_dir: workloads/medical/moar_results
  available_models: # LiteLLM model names - ensure API keys are set in your environment
    - gpt-4.1-nano
    - gpt-4.1-mini
    - gpt-4.1
    - gpt-4o
    - gpt-4o-mini
  evaluation_file: workloads/medical/evaluate_medications.py
  metric_key: medication_extraction_score
  max_iterations: 40
  model: gpt-4.1

system_prompt:
  dataset_description: a collection of transcripts of doctor visits
  persona: a medical practitioner analyzing patient symptoms and reactions to medications

operations:
  - name: extract_medications
    type: map
    output:
      schema:
        medication: list[str]
    prompt: |
      Analyze the following transcript of a conversation between a doctor and a patient:

      {{ input.src }}

      Extract and list all medications mentioned in the transcript.
      If no medications are mentioned, return an empty list.

pipeline:
  steps:
    - name: medication_extraction
      input: transcripts
      operations:
        - extract_medications
  output:
    type: file
    path: workloads/medical/extracted_medications_results.json
```
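Both the `{{ input.src }}` template variable and the evaluation function below assume each dataset record stores the transcript under a `src` key. A hypothetical record from `raw_sample.json`, shown as a Python dict (the transcript text is invented for illustration):

```python
# Hypothetical shape of one record in workloads/medical/raw_sample.json.
# The pipeline output carries "src" through and adds the extracted "medication" list.
sample_record = {
    "src": "Doctor: How is the lisinopril working? Patient: Fine, but the ibuprofen upsets my stomach.",
}
```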
### evaluate_medications.py

```python
import json
from typing import Any, Dict

from docetl.utils_evaluation import register_eval


@register_eval
def evaluate_results(dataset_file_path: str, results_file_path: str) -> Dict[str, Any]:
    """
    Evaluate medication extraction results.

    Checks if each extracted medication appears verbatim in the original transcript.
    In this example, the dataset has a 'src' attribute with the original input text.
    """
    # Load pipeline output
    with open(results_file_path, "r") as f:
        output = json.load(f)

    total_correct_medications = 0
    total_extracted_medications = 0

    # Evaluate each result
    for result in output:
        # In this example, the dataset has a 'src' attribute with the original transcript
        original_transcript = result.get("src", "").lower()
        extracted_medications = result.get("medication", [])

        # Check each extracted medication
        for medication in extracted_medications:
            total_extracted_medications += 1
            medication_lower = str(medication).lower().strip()

            # Check if the medication appears in the transcript
            if medication_lower in original_transcript:
                total_correct_medications += 1

    # Calculate metrics
    precision = (
        total_correct_medications / total_extracted_medications
        if total_extracted_medications > 0
        else 0.0
    )

    return {
        "medication_extraction_score": total_correct_medications,  # Used as the accuracy metric
        "total_correct_medications": total_correct_medications,
        "total_extracted_medications": total_extracted_medications,
        "precision": precision,
    }
```
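Before kicking off a 40-iteration optimization run, it can help to sanity-check the evaluation function against an existing pipeline output. A minimal sketch you could append to `evaluate_medications.py` (the paths are the ones used above):

```python
# Quick local check: score a previous pipeline run without invoking MOAR.
if __name__ == "__main__":
    metrics = evaluate_results(
        "workloads/medical/raw_sample.json",
        "workloads/medical/extracted_medications_results.json",
    )
    print(json.dumps(metrics, indent=2))
```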
### Running the Optimization

```bash
docetl build workloads/medical/pipeline_medication_extraction.yaml --optimizer moar
```
### Using Sample Datasets

Notice that `dataset_path` points to `raw_sample.json` for optimization, while the main pipeline uses `raw.json`. This prevents optimizing on your test set.
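If you don't have a sample file yet, one way to create it (a minimal sketch, assuming `raw.json` is a JSON array; the sample size of 50 is arbitrary):

```python
import json
import random

# Draw a fixed-seed sample from the full dataset so optimization runs faster
# and is reproducible; the remaining records stay untouched for testing.
random.seed(0)
with open("workloads/medical/raw.json") as f:
    records = json.load(f)
with open("workloads/medical/raw_sample.json", "w") as f:
    json.dump(random.sample(records, k=min(50, len(records))), f, indent=2)
```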
### Key Points

**Evaluation Function**

- In this example, uses the `src` attribute from output items (no need to load the dataset separately)
- Checks if extracted medications appear verbatim in the transcript
- Returns multiple metrics, with `medication_extraction_score` as the primary one

**Configuration**

- Uses a sample dataset for optimization (`dataset_path`)
- Includes multiple models in `available_models` to explore cost/accuracy trade-offs
- Sets `max_iterations` to 40 for a good balance of exploration and runtime