Paper accepted at IEEE International Conference on Blockchain and Cryptocurrency (ICBC 2025)

March 2025

Automated Judging of LLM-based Smart Contract Security Auditors

Large Language Model (LLM)-based smart contract auditors are increasingly used to supplement manual audits, yet they lack standardized evaluation methods, creating considerable uncertainty about their reliability and effectiveness in identifying vulnerabilities. This gap is particularly critical as smart contract exploits continue to cause billion-dollar losses and blockchain systems grow ever more complex. Our research introduces smartJudge, a comprehensive evaluation framework that systematically assesses the capabilities of LLM-based smart contract auditors through multi-dimensional analysis. To develop smartJudge, we first created an agentic RAG architecture that uses agents to acquire in-depth knowledge of smart contract auditing, vulnerabilities, and best practices. We then developed a distilled LLM judge model that efficiently processes auditor outputs at strategic evaluation points, assessing how well auditors detect vulnerability patterns, understand security research, and identify complex vulnerability chains. Finally, we constructed a benchmark suite with evaluation metrics that provide a standardized way to measure auditor performance across diverse vulnerability types. Testing smartJudge on leading LLM-based auditors revealed critical gaps in vulnerability detection, particularly for complex attack vectors and novel exploit patterns. Our framework offers insights for improving automated auditing tools and establishes the first systematic methodology for evaluating LLM-based smart contract auditors, addressing a crucial bottleneck in blockchain security tooling.
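To make the benchmark-scoring idea concrete, here is a minimal sketch of how an auditor's reported findings might be compared against ground-truth vulnerability labels. The Finding fields, the score_auditor helper, and the sample labels are illustrative assumptions for this post, not the actual smartJudge interface described in the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Finding:
    contract: str    # identifier of the audited contract
    vuln_type: str   # vulnerability label, e.g. "reentrancy"

def score_auditor(reported: set, ground_truth: set) -> dict:
    """Compute precision/recall/F1 of an auditor's findings against labels."""
    true_positives = len(reported & ground_truth)
    precision = true_positives / len(reported) if reported else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}

# Hypothetical example: one benchmark contract with two labeled flaws.
labels = {Finding("Vault.sol", "reentrancy"),
          Finding("Vault.sol", "unchecked-call")}
report = {Finding("Vault.sol", "reentrancy"),
          Finding("Vault.sol", "integer-overflow")}  # one hit, one false alarm

print(score_auditor(report, labels))
# {'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```

A full framework would aggregate such scores across many contracts and vulnerability classes; a per-type breakdown of this kind is what can expose the gaps in complex attack vectors and novel exploit patterns noted above.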

© 2025 Decentralized Science Lab. All Rights Reserved.