MOAR Configuration Reference
Complete reference for all MOAR configuration options.
Required Fields
All fields in optimizer_config are required (no defaults):
| Field | Type | Description |
|---|---|---|
type |
str |
Must be "moar" |
save_dir |
str |
Directory where MOAR results will be saved |
available_models |
list[str] |
List of LiteLLM model names to explore (e.g., ["gpt-4o-mini", "gpt-4o"]). Make sure your API keys are set in your environment for these models. |
evaluation_file |
str |
Path to Python file containing @register_eval decorated function |
metric_key |
str |
Key in evaluation results dictionary to use as accuracy metric |
max_iterations |
int |
Maximum number of MOARSearch iterations to run |
model |
str |
LLM model to use for directive instantiation during search |
All Fields Required
MOAR will error if any required field is missing. There are no defaults.
Optional Fields
| Field | Type | Default | Description |
|---|---|---|---|
dataset_path |
str |
Inferred from datasets |
Path to dataset file to use for optimization. Use a sample/hold-out dataset to avoid optimizing on your test set. |
exploration_weight |
float |
1.414 |
UCB exploration constant (higher = more exploration) |
build_first_layer |
bool |
False |
Whether to build initial model-specific nodes |
ground_truth_path |
str |
None |
Path to ground truth file (for evaluation) |
Dataset Path
Automatic Inference
If dataset_path is not specified, MOAR will automatically infer it from the datasets section of your YAML:
datasets:
transcripts:
path: data/full_dataset.json # This will be used if dataset_path not specified
type: file
optimizer_config:
# dataset_path not specified - will use data/full_dataset.json
# ... other config ...
Using Sample/Hold-Out Datasets
Best Practice
Use a sample or hold-out dataset for optimization to avoid optimizing on your test set.
optimizer_config:
dataset_path: data/sample_dataset.json # Use sample/hold-out for optimization
# ... other config ...
datasets:
transcripts:
path: data/full_dataset.json # Full dataset for final pipeline
The optimizer will use the sample dataset, but your final pipeline uses the full dataset. This ensures you don't overfit to your test set during optimization.
Model Configuration
Available Models
LiteLLM Model Names
Use LiteLLM model names (e.g., gpt-4o-mini, gpt-4o, gpt-4.1). Make sure your API keys are set in your environment.
available_models: # LiteLLM model names - ensure API keys are set
- gpt-4.1-nano # Cheapest, lower accuracy
- gpt-4.1-mini # Low cost, decent accuracy
- gpt-4.1 # Balanced
- gpt-4o # Higher cost, better accuracy
Model for Directive Instantiation
The model field specifies which LLM to use for generating optimization directives during the search process. This doesn't affect the models tested in available_models.
Cost Consideration
Use a cheaper model (like gpt-4o-mini) for directive instantiation to reduce search costs.
Iteration Count
The max_iterations parameter controls how many pipeline configurations MOAR explores:
- 10-20 iterations: Quick exploration, good for testing
- 40 iterations: Recommended for most use cases
- 100+ iterations: For complex pipelines or when you need the absolute best results
Time vs Quality
More iterations give better results but take longer and cost more.
Complete Example
optimizer_config:
type: moar
save_dir: results/moar_optimization
available_models:
- gpt-4o-mini
- gpt-4o
- gpt-4.1-mini
- gpt-4.1
evaluation_file: evaluate_medications.py
metric_key: medication_extraction_score
max_iterations: 40
model: gpt-4.1
dataset_path: data/sample.json # Optional
exploration_weight: 1.414 # Optional