Trust But Verify: Evaluating the Accuracy of LLMs in Normalizing Threat Data Feeds
This paper examines whether Large Language Models (LLMs) can be reliably applied to the normalization of Indicators of Compromise (IOCs) into Structured Threat Information Expression (STIX) format. Using benchmark datasets of 200 IOCs across three types (MD5 hashes, URLs, and IPv4 addresses), it evaluates the performance of Google’s Gemini 2.0 Flash and OpenAI’s ChatGPT-4o.
While both models achieved 100% validity in generating syntactically correct STIX outputs, their fidelity in accurately preserving IOC values varied significantly. Gemini outperformed ChatGPT overall, though both models struggled with hash values, exhibiting frequent omissions and erroneous pattern translations. The inconsistencies in these errors pose a major obstacle to the reliable use of LLMs in operational security and data engineering pipelines.
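The gap between validity and fidelity described above can be illustrated with a short check. The following is a minimal sketch, not the paper's evaluation harness: the helper names (`to_pattern`, `extract_value`, `preserves_ioc`) are hypothetical, and the code assumes each model output is a single-comparison STIX 2.1 pattern using the standard object paths `file:hashes.MD5`, `url:value`, and `ipv4-addr:value`. It compares the value embedded in a model-generated pattern against the original IOC, so a syntactically valid pattern with a truncated or altered hash is still flagged.

```python
import re

# STIX 2.1 object paths for the three IOC types in the study.
STIX_PATHS = {
    "md5": "file:hashes.MD5",
    "url": "url:value",
    "ipv4": "ipv4-addr:value",
}

# Matches a single-comparison pattern such as [url:value = 'http://x'].
# Assumption: model output uses this simple form (no AND/OR, no qualifiers).
_PATTERN_RE = re.compile(r"^\[\s*([\w:.\-]+)\s*=\s*'((?:[^'\\]|\\.)*)'\s*\]$")


def to_pattern(ioc_type, value):
    """Build the reference STIX pattern for an IOC (hypothetical helper)."""
    # STIX string literals escape backslashes and single quotes.
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"[{STIX_PATHS[ioc_type]} = '{escaped}']"


def extract_value(pattern):
    """Return (object_path, unescaped_value), or None if the shape is unexpected."""
    match = _PATTERN_RE.match(pattern.strip())
    if match is None:
        return None
    path, raw = match.groups()
    return path, re.sub(r"\\(.)", r"\1", raw)


def preserves_ioc(ioc_type, original, model_pattern):
    """True only if the pattern uses the expected path and the exact IOC value."""
    parsed = extract_value(model_pattern)
    if parsed is None:
        return False
    path, value = parsed
    if path != STIX_PATHS[ioc_type]:
        return False
    if ioc_type == "md5":
        # Hex digests are case-insensitive; URLs and IPs are compared exactly.
        return value.lower() == original.lower()
    return value == original


if __name__ == "__main__":
    md5 = "d41d8cd98f00b204e9800998ecf8427e"
    good = to_pattern("md5", md5)
    truncated = "[file:hashes.MD5 = 'd41d8cd98f00b204e9800998ecf8427']"
    print(preserves_ioc("md5", md5, good))
    print(preserves_ioc("md5", md5, truncated))
```

A check of this kind runs after the model call and needs no knowledge of the model itself, which is why it suits a pipeline where the generated STIX is treated as untrusted until verified.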
SANS-Trust-But-Verify-Peterson (PDF, 0.43MB)
16 Jul 2025
