Quick Start Guide

Get SF-Bench running in 5 minutes.

Prerequisites

Before you begin, ensure you have:

Required Software

Python 3.10+ installed (Download)
Salesforce CLI (sf) installed (Install Guide)
DevHub org with scratch org allocation (Create DevHub)

API Key Requirements

You need an API key from one of these providers:

Provider	Environment Variable	Example Models	Where to Get
RouteLLM	`ROUTELLM_API_KEY`	Grok 4.1, GPT-5, Claude Opus 4	RouteLLM Dashboard
OpenRouter	`OPENROUTER_API_KEY`	Claude Sonnet, GPT-4, Llama	OpenRouter Keys
Google Gemini	`GOOGLE_API_KEY`	Gemini 2.5 Flash, Gemini Pro	Google AI Studio
Anthropic	`ANTHROPIC_API_KEY`	Claude 3.5 Sonnet, Claude Opus	Anthropic Console
OpenAI	`OPENAI_API_KEY`	GPT-4, GPT-3.5	OpenAI Platform

Resource Requirements

For Full Evaluation (12 tasks with --functional):

Scratch Orgs:
- Minimum: 1 org (with --max-workers 1, sequential execution)
- Recommended: 2-3 orgs (with --max-workers 2-3, balanced speed)
- Maximum: 5 orgs (with --max-workers 5, fastest but needs more capacity)
- Note: Each worker needs its own scratch org. Total tasks = 12, so you’ll create 12 orgs sequentially or in parallel based on workers.
Token Usage:
- Per task: ~8,000 tokens (input prompt + generated code + context)
- Full evaluation: ~96,000 tokens (~0.1M tokens)
Time: 1-2 hours (depends on scratch org creation speed and model response time)
Cost: $0.10-$2 per evaluation (varies by model and provider)

For Lite Evaluation (5 tasks):

Scratch Orgs: 1-3 orgs
Token Usage: ~40,000 tokens
Time: ~10-15 minutes

System Requirements:

Max Workers: Supports up to 5 workers (based on typical DevHub limits)
Network: Stable internet connection for API calls and scratch org creation
Disk Space: ~500MB for workspace and cloned repositories

Step 1: Install SF-Bench (2 min)

# Clone the repository
git clone https://github.com/yasarshaikh/SF-bench.git
cd SF-bench

# Install dependencies
pip install -e .

Step 2: Authenticate with DevHub (1 min)

# Login to your DevHub
sf org login web --alias DevHub --set-default-dev-hub

# Verify connection
sf org list --all

You should see your DevHub marked with (D).

Step 3: Configure Your AI Model (1 min)

Choose your provider and set the API key:

OpenRouter (Recommended - Access to 100+ models)

export OPENROUTER_API_KEY="your-openrouter-key-here"

Google Gemini

export GOOGLE_API_KEY="your-google-api-key-here"

Anthropic Claude

export ANTHROPIC_API_KEY="your-anthropic-key-here"

OpenAI

export OPENAI_API_KEY="your-openai-key-here"

Local Ollama (No API key needed)

ollama serve  # Start Ollama in another terminal

Step 4: Run Your First Evaluation (1 min)

# Quick test with a single task
python scripts/evaluate.py \
  --model "gemini-2.5-flash" \
  --tasks data/tasks/verified.json \
  --max-workers 1

Note: First run may take 5-10 minutes as it creates a scratch org.

Step 5: View Results

# View summary
cat evaluation_results/*/summary.md

# View detailed report
cat evaluation_results/*/report.json

🎉 Success!

You’ve run your first SF-Bench evaluation! Now you can:

Try Different Models

# Claude Sonnet 4.5 (via OpenRouter)
python scripts/evaluate.py --model "anthropic/claude-3.5-sonnet"

# GPT-4 (via OpenRouter)
python scripts/evaluate.py --model "openai/gpt-4-turbo"

# Local model (via Ollama)
python scripts/evaluate.py --model "codellama" --provider ollama

Run Full Evaluation

# Run all 12 verified tasks with functional validation
python scripts/evaluate.py \
  --model "your-model" \
  --tasks data/tasks/verified.json \
  --functional \
  --max-workers 2

Time: ~1 hour for full evaluation

Use Lite Dataset (Coming Soon)

# Quick 5-task validation (~10 minutes)
python scripts/evaluate.py \
  --model "your-model" \
  --tasks data/tasks/lite.json \
  --max-workers 1

Common Issues

“DevHub not found”

# Re-authenticate
sf org login web --alias DevHub --set-default-dev-hub

“Scratch org creation failed”

# Check org limits
sf org list limits --target-org DevHub

# Clean up old orgs
sf org list scratch
sf org delete scratch --target-org <username> --no-prompt

“API key not found”

# Verify environment variable is set
echo $OPENROUTER_API_KEY

# Or export it again
export OPENROUTER_API_KEY="your-key"

Next Steps

Need Help?

Got through this guide in under 5 minutes? ⭐ Star us on GitHub!