AI Rule Artifact File Disclosure Scanner
This scanner detects publicly exposed AI development tool rule files on digital assets. AI rule and instruction files such as CLAUDE.md, .cursorrules, and AGENTS.md may be publicly accessible on web servers, exposing system architecture details, prompt instructions, and internal development conventions. Identifying and securing these files is critical to preventing unintended information disclosure.
Short Info
Level: Single Scan
Can be used by: Asset Owner
Estimated Time: 10 seconds
Time Interval: 13 days 10 hours
Scan only one: Domain, IPv4, Subdomain
Toolbox
AI development tools such as Claude, Cursor, GitHub Copilot, Gemini, Windsurf, and Continue are widely adopted by software development teams to accelerate coding workflows and automate repetitive engineering tasks. These tools rely on instruction and rule files stored within project directories to define assistant behavior, coding conventions, and system-level context. Files such as CLAUDE.md, .cursorrules, AGENTS.md, and .github/copilot-instructions.md are examples of these configuration artifacts. They are typically authored by developers or engineering leads and are intended for local or version-controlled use within trusted environments. As projects grow and are deployed to web servers, these files can inadvertently be placed in publicly accessible directories. The widespread adoption of AI-assisted development has significantly increased the prevalence of such files across codebases worldwide.
AI rule artifact files are plain-text configuration documents designed to guide the behavior of AI coding assistants during development. When these files are unintentionally exposed on publicly accessible web servers, they constitute a file disclosure vulnerability that leaks sensitive internal information. The contents of these files often describe the technical architecture of the underlying system, coding standards, security requirements, and even infrastructure details. An attacker who retrieves such files gains a detailed understanding of how the target application is structured and maintained. This information can be leveraged to craft more targeted attacks, identify weak points in the application, or bypass security controls that are documented within the instructions. The vulnerability requires no authentication and can be exploited with a simple HTTP GET request to a known file path.
The scanner issues HTTP GET requests to a set of well-known AI rule artifact file paths relative to the target asset root, including /CLAUDE.md, /.cursorrules, /AGENTS.md, /.github/copilot-instructions.md, /.cursor/rules/index.mdc, and others. It first sends a baseline probe to a randomly generated non-existent path to detect servers that return HTTP 200 for every request regardless of content. Each candidate file response is then evaluated against a series of false-positive filters. The HTTP status must be 200. The Content-Type must not be a blocked type such as text/html or application/json. The response body must contain at least 50 characters and must not begin with HTML document tags. Finally, the response body must differ sufficiently from the baseline probe response, judged against a similarity ratio threshold of 85 percent. Only responses that pass all filters are flagged as confirmed exposures. The scan covers 18 distinct AI rule artifact paths associated with tools including Claude, Cursor, Gemini, GitHub Copilot, Windsurf, Continue, Cline, and Junie.
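The filter sequence described above can be sketched in Python roughly as follows. This is an illustrative sketch under stated assumptions, not the scanner's actual implementation: the function names, the HTML prefixes checked, the use of `difflib.SequenceMatcher` as the similarity measure, and the choice to reject responses whose similarity is at or above 0.85 are all assumptions. Only the five paths named in the text are listed, since the full 18-path list is not given.

```python
import difflib
import secrets

# Paths named in the text; the complete 18-path list is not reproduced here.
CANDIDATE_PATHS = [
    "/CLAUDE.md",
    "/.cursorrules",
    "/AGENTS.md",
    "/.github/copilot-instructions.md",
    "/.cursor/rules/index.mdc",
]

BLOCKED_CONTENT_TYPES = ("text/html", "application/json")
MIN_BODY_CHARS = 50
SIMILARITY_THRESHOLD = 0.85
# Assumption: what counts as "HTML document tags" is not specified in the text.
HTML_PREFIXES = ("<!doctype", "<html")


def random_probe_path() -> str:
    """Build a path that almost certainly does not exist, for the baseline probe."""
    return "/" + secrets.token_hex(16)


def is_exposed(status: int, content_type: str, body: str, baseline_body: str) -> bool:
    """Return True only if a candidate response passes every false-positive filter."""
    if status != 200:
        return False
    # Drop parameters such as "; charset=utf-8" before comparing media types.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in BLOCKED_CONTENT_TYPES:
        return False
    if len(body) < MIN_BODY_CHARS:
        return False
    if body.lstrip().lower().startswith(HTML_PREFIXES):
        return False
    # Assumption: difflib's ratio stands in for the unspecified similarity measure.
    similarity = difflib.SequenceMatcher(None, body, baseline_body).ratio()
    return similarity < SIMILARITY_THRESHOLD
```

In use, a caller would fetch `random_probe_path()` once with any HTTP client to obtain the baseline body, then fetch each entry in `CANDIDATE_PATHS` and pass the status, Content-Type header, and body to `is_exposed`. Keeping the filters free of network code makes them easy to test against recorded responses.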
Successful exploitation of this vulnerability allows an attacker to read AI rule artifact files that were intended to remain private within the development environment. The exposed content may include detailed descriptions of the application architecture, database schemas, authentication mechanisms, and security policies documented as instructions to the AI assistant. Attackers can use this intelligence to identify exploitable components, understand the technology stack in depth, and prepare more precise injection or bypass attacks against the application. In cases where the instruction files contain API keys, tokens, or credential patterns used as examples, direct credential compromise may also be possible. The exposure of internal conventions and naming patterns may also aid attackers in enumerating additional sensitive endpoints or files. Organizations using AI-assisted development workflows are particularly at risk as the practice of documenting system context for AI tools becomes more common.