Troubleshooting Guide

This guide helps you diagnose and resolve common issues with SF-Bench evaluations.

Patch Application Errors

“unexpected end of file in patch”

Symptom: Patch application fails with “unexpected end of file in patch” or “corrupt patch at line X”

Root Cause: The AI model generated a truncated or incomplete patch. The patch is cut off mid-hunk, missing required context lines.

Tool Behavior:

SF-Bench automatically detects and attempts to repair incomplete patches
All 4 patch application strategies are attempted (strict, reject, 3-way, fuzzy)
If all strategies fail, this indicates the patch structure is fundamentally invalid

This is NOT a tool issue - the benchmark correctly identified that the AI model generated an invalid patch.

Resolution:

This is expected behavior when AI models generate incomplete patches
The evaluation will mark the task as ERROR with a clear error message
No action needed - this is a model limitation, not a tool bug

“malformed patch at line X”

Symptom: Patch application fails with “malformed patch at line X”

Root Cause: The AI model generated a patch with invalid diff syntax (e.g., standalone + or - lines, missing context, invalid hunk headers).

Tool Behavior:

SF-Bench automatically cleans common malformations (markdown fences, empty lines, etc.)
Multiple fallback strategies are attempted
If all strategies fail, the patch is fundamentally invalid

This is NOT a tool issue - the benchmark correctly identified invalid patch syntax.

Resolution:

This is expected when AI models generate syntactically invalid patches
The evaluation will mark the task as ERROR
No action needed - this indicates the AI model needs improvement

Scratch Org Creation Errors

“Package ID AC - Collections” or “ancestorVersion HIGHEST can’t be found”

Symptom: Scratch org creation fails with package dependency errors for Flow tasks

Root Cause: This is a Salesforce platform limitation, not a tool issue. Flow features may require managed packages (e.g., “AC - Collections”) that are not available in Developer Edition scratch orgs.

Tool Behavior:

SF-Bench correctly identifies this as a platform limitation
Error messages clearly distinguish platform constraints from tool issues
The evaluation will mark the task as ERROR with documentation that this is a platform limitation

This is NOT a tool issue - it’s a known Salesforce platform constraint.

Resolution:

This is a Salesforce platform limitation, not a benchmark tool failure
Flow tasks requiring specific managed packages may fail in Developer Edition scratch orgs
This is documented in evaluation results as a platform constraint

Pre-flight Check Failures

“DevHub not authenticated”

Symptom: Pre-flight checks fail with “DevHub not authenticated or not found”

Resolution:

# Authenticate with your DevHub
sf org login web --alias DevHub --instance-url <your-instance-url> --set-default-dev-hub

# Verify authentication
sf org list

“API key not set”

Symptom: Pre-flight checks fail with “API key not configured”

Resolution:

Copy .env.sample to .env

Add your API keys to .env:

ROUTELLM_API_KEY="your_key_here"
OPENROUTER_API_KEY="your_key_here"
# etc.

Ensure .env is in the project root directory

Evaluation Errors

“Evaluation failed: name ‘X’ is not defined”

Symptom: Evaluation crashes with NameError or AttributeError

Root Cause: This is a tool bug that needs to be fixed.

Resolution:

Report this as a bug in the SF-Bench repository
Include the full error traceback
This is a legitimate tool issue that will be fixed

Tasks showing ERROR status

Understanding ERROR vs FAIL:

ERROR: Tool or platform issue (scratch org creation failed, patch application failed due to corrupt patch, etc.)
FAIL: Task completed but didn’t meet requirements (code deployed but tests failed, functional validation failed, etc.)

If all tasks show ERROR:

Check logs for common patterns
Verify API keys and DevHub authentication
Check scratch org limits
Review error messages to distinguish tool vs model issues

Performance Issues

Evaluation taking too long

Possible Causes:

Large number of tasks
Slow API responses from AI provider
Scratch org creation delays
Network issues

Optimization:

Reduce --max-workers if system is overloaded
Use faster AI models for testing
Check network connectivity
Monitor API rate limits

Getting Help

If you encounter an issue that’s not covered here:

Check the logs: Evaluation logs are in logs/ directory with detailed error messages
Review error messages: SF-Bench clearly distinguishes tool issues from model/platform issues
Report bugs: If you believe it’s a tool issue, report it with:
- Full error traceback
- Log file excerpts
- Steps to reproduce
- Environment details

Error Message Guide

SF-Bench error messages are designed to clearly distinguish:

Tool Issues: “Evaluation failed: …” (internal errors, bugs)
Model Issues: “AI model generated invalid patch” (corrupt/malformed patches)
Platform Limitations: “Salesforce platform constraint” (package dependencies, org limits)

This transparency ensures vendors cannot blame the tool for model or platform limitations.