Troubleshooting Guide
This guide helps you diagnose and resolve common issues with SF-Bench evaluations.
Patch Application Errors
“unexpected end of file in patch”
Symptom: Patch application fails with “unexpected end of file in patch” or “corrupt patch at line X”
Root Cause: The AI model generated a truncated or incomplete patch. The patch is cut off mid-hunk, missing required context lines.
Tool Behavior:
- SF-Bench automatically detects and attempts to repair incomplete patches
- All four patch application strategies are attempted (strict, reject, 3-way, fuzzy)
- If all strategies fail, this indicates the patch structure is fundamentally invalid
This is NOT a tool issue - the benchmark correctly identified that the AI model generated an invalid patch.
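The "cut off mid-hunk" failure mode can be detected mechanically. The sketch below is an illustrative heuristic, not SF-Bench's actual detector: a unified-diff hunk header `@@ -a,b +c,d @@` promises `b` old-side and `d` new-side lines, so a patch that ends before delivering them was truncated.

```python
import re

# Matches a unified-diff hunk header; the counts default to 1 when omitted.
HUNK_RE = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")

def is_truncated(patch_text: str) -> bool:
    """Heuristic: True if the patch ends while a hunk still owes lines."""
    old_left = new_left = 0
    in_hunk = False
    for line in patch_text.splitlines():
        m = HUNK_RE.match(line)
        if m:
            old_left = int(m.group(1) or "1")
            new_left = int(m.group(2) or "1")
            in_hunk = True
        elif in_hunk:
            if line.startswith(" "):
                old_left -= 1
                new_left -= 1
            elif line.startswith("-"):
                old_left -= 1
            elif line.startswith("+"):
                new_left -= 1
            if old_left <= 0 and new_left <= 0:
                in_hunk = False  # hunk fully delivered
    # Still inside a hunk at end of input: the patch was cut off mid-hunk.
    return in_hunk and (old_left > 0 or new_left > 0)
```

A complete hunk drains both counters to zero before the patch ends; a truncated one leaves a positive balance.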
Resolution:
- This is expected behavior when AI models generate incomplete patches
- The evaluation will mark the task as ERROR with a clear error message
- No action needed - this is a model limitation, not a tool bug
“malformed patch at line X”
Symptom: Patch application fails with “malformed patch at line X”
Root Cause: The AI model generated a patch with invalid diff syntax (e.g., standalone + or - lines, missing context, invalid hunk headers).
Tool Behavior:
- SF-Bench automatically cleans common malformations (markdown fences, empty lines, etc.)
- Multiple fallback strategies are attempted
- If all strategies fail, the patch is fundamentally invalid
This is NOT a tool issue - the benchmark correctly identified invalid patch syntax.
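The kind of cleanup mentioned above can be sketched as follows. This is a hedged, minimal example of stripping markdown fences and surrounding blank lines from a model-generated diff; SF-Bench's real cleaner may handle more malformations.

```python
def clean_patch(raw: str) -> str:
    """Strip common wrappers AI models add around diffs (illustrative sketch)."""
    lines = raw.strip().splitlines()
    # Drop a surrounding markdown code fence such as ``` or ```diff
    if lines and lines[0].startswith("```"):
        lines = lines[1:]
    if lines and lines[-1].startswith("```"):
        lines = lines[:-1]
    # A unified diff must end with a trailing newline to be well-formed
    return "\n".join(lines) + "\n"
```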
Resolution:
- This is expected when AI models generate syntactically invalid patches
- The evaluation will mark the task as ERROR
- No action needed - this indicates the AI model needs improvement
Scratch Org Creation Errors
“Package ID AC - Collections” or “ancestorVersion HIGHEST can’t be found”
Symptom: Scratch org creation fails with package dependency errors for Flow tasks
Root Cause: This is a Salesforce platform limitation, not a tool issue. Flow features may require managed packages (e.g., “AC - Collections”) that are not available in Developer Edition scratch orgs.
Tool Behavior:
- SF-Bench correctly identifies this as a platform limitation
- Error messages clearly distinguish platform constraints from tool issues
- The evaluation will mark the task as ERROR with documentation that this is a platform limitation
This is NOT a tool issue - it’s a known Salesforce platform constraint.
Resolution:
- This is a Salesforce platform limitation, not a benchmark tool failure
- Flow tasks requiring specific managed packages may fail in Developer Edition scratch orgs
- This is documented in evaluation results as a platform constraint
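Recognizing this constraint programmatically amounts to matching the error text. The patterns below are inferred from the symptoms quoted above and are an assumption about the full message wording, not SF-Bench's actual matching rules.

```python
import re

# Hypothetical signatures of the package-dependency platform limitation;
# the wording is inferred from the error headings in this guide.
PLATFORM_PATTERNS = [
    re.compile(r"Package ID .* can.t be found", re.IGNORECASE),
    re.compile(r"ancestorVersion HIGHEST can.t be found", re.IGNORECASE),
]

def is_platform_limitation(error_message: str) -> bool:
    """True if the scratch org creation error matches a known platform constraint."""
    return any(p.search(error_message) for p in PLATFORM_PATTERNS)
```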
Pre-flight Check Failures
“DevHub not authenticated”
Symptom: Pre-flight checks fail with “DevHub not authenticated or not found”
Resolution:
```shell
# Authenticate with your DevHub
sf org login web --alias DevHub --instance-url <your-instance-url> --set-default-dev-hub

# Verify authentication
sf org list
```
“API key not set”
Symptom: Pre-flight checks fail with “API key not configured”
Resolution:
- Copy `.env.sample` to `.env`
- Add your API keys to `.env`:
  ROUTELLM_API_KEY="your_key_here"
  OPENROUTER_API_KEY="your_key_here"
  # etc.
- Ensure `.env` is in the project root directory
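A pre-flight check for these keys can be sketched as below. The key names are taken from the example above; treat the exact required set as an assumption, since `.env.sample` is the authoritative list.

```python
import os

# Key names copied from the .env example above; .env.sample is authoritative.
REQUIRED_KEYS = ["ROUTELLM_API_KEY", "OPENROUTER_API_KEY"]

def missing_api_keys(env=None):
    """Return the required keys that are unset or empty in the environment."""
    env = os.environ if env is None else env
    return [k for k in REQUIRED_KEYS if not env.get(k)]
```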
Evaluation Errors
“Evaluation failed: name ‘X’ is not defined”
Symptom: Evaluation crashes with NameError or AttributeError
Root Cause: This is a tool bug that needs to be fixed.
Resolution:
- Report this as a bug in the SF-Bench repository
- Include the full error traceback
- This is a legitimate tool issue that will be fixed
Tasks showing ERROR status
Understanding ERROR vs FAIL:
- ERROR: Tool or platform issue (scratch org creation failed, patch application failed due to corrupt patch, etc.)
- FAIL: Task completed but didn’t meet requirements (code deployed but tests failed, functional validation failed, etc.)
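The ERROR/FAIL distinction above can be expressed as a small decision function. This is a minimal sketch of the rule, not SF-Bench's actual status logic, and the three boolean inputs are simplifications of the real pipeline stages.

```python
from enum import Enum

class Status(Enum):
    ERROR = "tool or platform issue"          # infrastructure never reached the task
    FAIL = "ran but did not meet requirements"
    PASS = "met all requirements"

def classify(org_created: bool, patch_applied: bool, tests_passed: bool) -> Status:
    """Sketch of the distinction: ERROR precedes the task, FAIL happens inside it."""
    if not org_created or not patch_applied:
        return Status.ERROR
    return Status.PASS if tests_passed else Status.FAIL
```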
If all tasks show ERROR:
- Check logs for common patterns
- Verify API keys and DevHub authentication
- Check scratch org limits
- Review error messages to distinguish tool vs model issues
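The log-pattern check can be automated along these lines. This sketch assumes evaluation logs are plain-text `.log` files under `logs/`, and the signature strings are illustrative examples taken from this guide.

```python
from collections import Counter
from pathlib import Path

# Illustrative failure signatures drawn from the error messages in this guide.
SIGNATURES = {
    "auth": "DevHub not authenticated",
    "patch": "corrupt patch",
    "platform": "platform constraint",
}

def count_patterns(log_dir: str = "logs") -> Counter:
    """Count how many log files mention each known failure signature."""
    counts = Counter()
    for path in Path(log_dir).glob("*.log"):
        text = path.read_text(errors="replace")
        for label, needle in SIGNATURES.items():
            if needle in text:
                counts[label] += 1
    return counts
```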
Performance Issues
Evaluation taking too long
Possible Causes:
- Large number of tasks
- Slow API responses from AI provider
- Scratch org creation delays
- Network issues
Optimization:
- Reduce `--max-workers` if the system is overloaded
- Use faster AI models for testing
- Check network connectivity
- Monitor API rate limits
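The effect of `--max-workers` can be illustrated with a bounded thread pool. This is a hedged sketch of the presumed concurrency model, not SF-Bench's actual runner: lowering the worker count trades throughput for less load on the DevHub and AI providers.

```python
from concurrent.futures import ThreadPoolExecutor

def run_tasks(tasks, evaluate, max_workers: int = 4):
    """Evaluate tasks with at most `max_workers` running concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map preserves input order, so results line up with tasks
        return list(pool.map(evaluate, tasks))
```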
Getting Help
If you encounter an issue that’s not covered here:
- Check the logs: Evaluation logs are in the `logs/` directory with detailed error messages
- Review error messages: SF-Bench clearly distinguishes tool issues from model/platform issues
- Report bugs: If you believe it’s a tool issue, report it with:
- Full error traceback
- Log file excerpts
- Steps to reproduce
- Environment details
Error Message Guide
SF-Bench error messages are designed to clearly distinguish:
- Tool Issues: “Evaluation failed: …” (internal errors, bugs)
- Model Issues: “AI model generated invalid patch” (corrupt/malformed patches)
- Platform Limitations: “Salesforce platform constraint” (package dependencies, org limits)
This transparency ensures vendors cannot blame the tool for model or platform limitations.
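The message conventions above suggest a simple triage function. This is a sketch based only on the phrases quoted in this guide; SF-Bench's real matching rules may be richer.

```python
def error_category(message: str) -> str:
    """Map an error message to tool/model/platform using the conventions above."""
    if message.startswith("Evaluation failed:"):
        return "tool"        # internal errors, bugs
    if "invalid patch" in message:
        return "model"       # corrupt or malformed AI-generated patches
    if "platform constraint" in message:
        return "platform"    # package dependencies, org limits
    return "unknown"
```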