False Positive Handling Mechanisms in GitLab

Philosophy: False Positives as a Feature, Not a Failure

Comprehensive security scanning inherently generates false positives. No scanner ships pre-tuned to an organization's unique codebases, frameworks, and architectural patterns, and flagging a potential issue for human review is vastly preferable to missing a real vulnerability. Intelligent prioritization is therefore critical to an effective security program: triage should filter out the noise and surface high-severity vulnerabilities with real exploitability, ensuring developers focus their attention on issues that pose actual risk to the organization.

Core Principles

  • False positives are inevitable and expected. Any security scanner sophisticated enough to catch complex vulnerabilities will occasionally misinterpret safe code patterns. The alternative—scanners that never flag false positives—would be dangerously unreliable at detecting actual threats.
  • The goal is efficient triage, not zero false positives. Success isn't measured by eliminating all false positives, but by establishing workflows that allow your team to quickly identify and dismiss them without impeding development velocity. A well-tuned security program accepts some noise in exchange for comprehensive coverage.
  • Dismissing false positives is security-positive. Proper false positive management improves your security posture by reducing alert fatigue, helping teams focus on genuine risks, building institutional knowledge about your codebase's security characteristics, and creating a more sustainable security culture where developers trust the tools.
  • Balance coverage with velocity. The security-development balance is dynamic. Heavy-handed scanning without false positive management frustrates developers and leads to workarounds. No scanning provides velocity but unacceptable risk. The sweet spot is comprehensive scanning paired with effective false positive handling.

Expectations: The Tuning Journey

When implementing GitLab security scanning, whether you are enabling your first scanner or adding new analyzer types, expect an initial tuning period. This is not a sign of misconfiguration; it is the essential phase in which you configure the scanners and establish a baseline for your codebases.

Timeline and Effort

  • Week 1: Initial Scan Shock. Your first comprehensive scan will likely generate hundreds or thousands of findings. This is normal. Many will be legitimate issues that have existed undetected; some will be false positives related to your specific frameworks, architectural patterns, or testing approaches. Expect this phase to be time-intensive.
  • Weeks 2-4: Active Tuning. Your team identifies patterns in false positives, implements path exclusions for test directories and vendor code, creates custom rulesets for framework-specific patterns, and bulk-dismisses categories of false positives. Security and development teams collaborate to understand findings.
  • Months 2-3: Stabilization. New findings decrease to a manageable daily volume. Your ruleset covers most recurring false positive patterns. The team develops intuition for what's real vs. false. Scan results become actionable rather than overwhelming.
  • Steady State: Continuous Refinement. New false positive patterns emerge occasionally (new frameworks, coding patterns, dependencies). Tuning becomes lightweight, incremental maintenance. Most scans produce few or zero false positives requiring attention.

Team Investment

This is a collaborative effort requiring security engineering to own scanner configuration and tuning, developers to provide context on code patterns and architectural decisions, and security champions to bridge communication and build institutional knowledge.

Measuring Success: Metrics & KPIs

Track these metrics to validate your tuning efforts and identify areas needing attention:

  • Signal-to-Noise Ratio: Calculate the percentage of findings that are actionable vulnerabilities vs. false positives. A healthy mature program typically sees 70-85% signal (real issues) after initial tuning. Track this monthly to see improvement trends.
  • Time-to-Triage: Measure average time from finding detection to dismissal or remediation decision. Target under 10 minutes per finding for most cases. If this creeps up, it signals either complex new vulnerability types or inadequate tuning (same false positives recurring).
  • Dismissal Rate by Category: Monitor which dismissal reasons you use most frequently. High "False Positive" rates in specific categories indicate opportunities for custom rulesets or path exclusions. "Used in Tests" clustering suggests test path exclusions needed. "Acceptable Risk" trends may warrant policy discussions.
  • Developer Satisfaction: Conduct quarterly pulse checks with development teams. Ask: "Do security scans help you write better code?" and "Do you trust the findings you see?" Declining trust indicates alert fatigue from poor false positive management.
  • Recurring False Positives: Track findings dismissed multiple times (same vulnerability signature, different branches or time periods). High recurrence is your strongest signal to promote a dismissal to a custom ruleset or path exclusion—you're manually doing what automation should handle.
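As a sketch, the first two metrics can be computed directly from triage records. The `Finding` structure and its field names below are assumptions for illustration, not a GitLab API:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Finding:
    detected_at: datetime
    triaged_at: datetime   # when dismissed or a remediation decision was made
    actionable: bool       # True = real issue, False = false positive

def signal_to_noise(findings: list[Finding]) -> float:
    """Percentage of findings that are actionable vulnerabilities."""
    if not findings:
        return 0.0
    return 100.0 * sum(f.actionable for f in findings) / len(findings)

def mean_time_to_triage(findings: list[Finding]) -> timedelta:
    """Average time from detection to dismissal/remediation decision."""
    total = sum((f.triaged_at - f.detected_at for f in findings), timedelta())
    return total / len(findings)

# Example batch: 3 real issues, 1 false positive -> 75% signal
t0 = datetime(2024, 1, 1)
batch = [
    Finding(t0, t0 + timedelta(minutes=5), True),
    Finding(t0, t0 + timedelta(minutes=10), True),
    Finding(t0, t0 + timedelta(minutes=3), True),
    Finding(t0, t0 + timedelta(minutes=2), False),
]
print(signal_to_noise(batch))       # 75.0
print(mean_time_to_triage(batch))   # 0:05:00
```

Tracking these two numbers monthly is usually enough to see whether tuning is converging.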

Baseline Metrics

Capture these at the start:

  • Total findings on initial scan
  • Breakdown by scanner type and severity
  • Percentage in test vs. production code
  • Time spent on first week of triage

1. GitLab Duo AI-Assisted Triage

Mechanism: AI agent analyzes findings and recommends actions

Capabilities

  • Batch analysis of multiple findings
  • Code flow analysis for exploitability
  • Suppression rule generation
  • Pattern recognition from historical data

AI-Assisted Triage Characteristics

  • Scope: Analysis tool, not direct suppression
  • Persistence: Recommendations must be implemented
  • Recurrence Prevention: Indirect - helps create other suppressions
  • Maintenance: Low - AI handles analysis
  • Risk: Low - human still makes final decision

Best Use Cases for AI-Assisted Triage

  • High-volume triage scenarios
  • Complex exploitability analysis
  • Learning/training on FP patterns
  • Accelerating initial assessment

2. Path-Based Exclusions

Mechanism: Exclude entire directories or file patterns from scanning

Path Exclusion Characteristics

  • Scope: Broad - removes entire paths from analysis
  • Persistence: Permanent (config-as-code in repo)
  • Recurrence Prevention: Excellent - findings never generated
  • Maintenance: Low - set once, applies to all future scans
  • Risk: Medium - could miss real issues if paths change purpose

Best Use Cases for Path Exclusions

  • Test directories, fixtures, mock data
  • Vendor/third-party code you don't control
  • Documentation or example code
  • Build artifacts or generated code
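In GitLab CI, path exclusions are set per scanner family through CI/CD variables such as `SAST_EXCLUDED_PATHS` and `SECRET_DETECTION_EXCLUDED_PATHS`. A minimal `.gitlab-ci.yml` sketch; the directory values shown are illustrative:

```yaml
# .gitlab-ci.yml — sketch of path exclusions per scanner family
include:
  - template: Jobs/SAST.gitlab-ci.yml
  - template: Jobs/Secret-Detection.gitlab-ci.yml

variables:
  # Comma-separated paths/globs removed from SAST analysis
  # (test directories, fixtures, vendored code)
  SAST_EXCLUDED_PATHS: "spec, test, tests, tmp, vendor/**, fixtures/**"
  # Secret detection uses its own exclusion variable
  SECRET_DETECTION_EXCLUDED_PATHS: "docs/examples/**"
```

Because this lives in the repository, the exclusion is version-controlled and reviewable like any other code change.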

3. Analyzer-Level Tuning

Mechanism: Enable/disable entire scanning engines or analyzers

Analyzer Tuning Characteristics

  • Scope: Very broad - entire analyzer disabled
  • Persistence: Pipeline-level configuration
  • Recurrence Prevention: Complete for that analyzer
  • Maintenance: Low
  • Risk: Medium - disabling an analyzer drops all of its findings; overlapping analyzers may only partially compensate for the lost coverage

Best Use Cases for Analyzer Tuning

  • Redundant analyzers (if multiple tools scan the same thing)
  • Performance optimization in large monorepos
  • Deprecated analyzers after upgrading
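Analyzer tuning is also pipeline configuration, via the `SAST_EXCLUDED_ANALYZERS` variable. A sketch; the analyzer names listed are examples of legacy analyzers whose coverage Semgrep-based scanning superseded:

```yaml
# .gitlab-ci.yml — skip analyzers whose coverage is redundant
include:
  - template: Jobs/SAST.gitlab-ci.yml

variables:
  # Comma-separated list of analyzers to exclude from the pipeline
  SAST_EXCLUDED_ANALYZERS: "eslint, nodejs-scan"
```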

4. Custom Rulesets (Rule-Level Suppression)

Mechanism: Override, disable, or modify specific detection rules

Granularity Levels

  • Disable: Turn off rule completely
  • Override: Change severity, description, or metadata
  • Replace: Rewrite rule logic entirely (for specific analyzers)

Custom Ruleset Characteristics

  • Scope: Targeted - specific vulnerability patterns
  • Persistence: Version-controlled in repository
  • Recurrence Prevention: Excellent - rule-based matching
  • Maintenance: Medium - requires understanding rule IDs
  • Risk: Low to Medium - depends on rule specificity

Best Use Cases for Custom Rulesets

  • Organization-wide coding standards that conflict with default rules
  • Architectural patterns that scanners misinterpret
  • Framework-specific protections scanners don't recognize
  • Adjusting severity based on compensating controls
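Custom rulesets are checked into the repository as `.gitlab/sast-ruleset.toml`. A sketch showing one disabled rule and one severity override; the rule IDs are placeholders you would replace with the identifiers from your own findings:

```toml
# .gitlab/sast-ruleset.toml — sketch of rule-level suppression
[semgrep]
  # Disable a rule that misfires on an architectural pattern
  [[semgrep.ruleset]]
    disable = true
    [semgrep.ruleset.identifier]
      type = "semgrep_id"
      value = "gosec.G107-1"   # placeholder rule ID

  # Override metadata instead of disabling: lower the severity
  # where compensating controls exist
  [[semgrep.ruleset]]
    [semgrep.ruleset.identifier]
      type = "semgrep_id"
      value = "eslint.detect-object-injection"   # placeholder rule ID
    [semgrep.ruleset.override]
      severity = "Info"
```

Prefer overrides over disabling where possible: a downgraded finding still appears in reports, preserving visibility while removing pipeline noise.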

5. Vulnerability Dismissal (UI-Based)

Mechanism: Mark individual or bulk findings as dismissed in GitLab Security Dashboard

Dismissal Types

  • False Positive
  • Used in Tests
  • Won't Fix
  • Acceptable Risk

Bulk Triaging Capabilities

  • Filter findings by: severity, scanner type, file path, CVE/CWE, status
  • Select multiple findings matching filter criteria
  • Apply dismissal reason to all selected findings simultaneously
  • Add bulk comment/justification across selections

Vulnerability Dismissal Characteristics

  • Scope: Individual finding OR multiple findings via filtering
  • Persistence: Database-stored, project-scoped
  • Recurrence Prevention: Good if finding signature matches exactly
  • Maintenance: Medium with bulk operations (High for individual)
  • Risk: Low - most granular control

Propagation Behavior

  • Dismissals apply to matching findings across branches
  • May not persist through major code refactors
  • Requires re-dismissal if file path changes significantly

Best Use Cases for Vulnerability Dismissal

  • One-off false positives
  • Patterns affecting multiple files (bulk dismiss by vulnerability type)
  • Initial backlog cleanup (filter to test files, bulk dismiss)
  • Findings requiring case-by-case judgment
  • Temporary risk acceptance with planned remediation
  • Learning phase before creating broader rules
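Dismissals are normally applied through the Security Dashboard UI, but GitLab's GraphQL API exposes the same operation via the `vulnerabilityDismiss` mutation, which is useful for scripted backlog cleanup. A sketch; the vulnerability global ID and comment are placeholders:

```graphql
mutation {
  vulnerabilityDismiss(
    input: {
      id: "gid://gitlab/Vulnerability/123"   # placeholder global ID
      dismissalReason: FALSE_POSITIVE
      comment: "Parameterized via ActiveRecord; not exploitable."
    }
  ) {
    vulnerability {
      state
    }
    errors
  }
}
```

A justification comment on every dismissal is what turns triage into institutional knowledge rather than a silent state change.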

Strategic Layering: The Pyramid Approach

Think of these mechanisms in layers:

         ┌─────────────────────────┐
         │   UI Dismissals         │  ← Most granular, highest maintenance
         ├─────────────────────────┤
         │   Custom Rulesets       │  ← Specific patterns
         ├─────────────────────────┤
         │   Path Exclusions       │  ← Broader scope
         ├─────────────────────────┤
         │   Analyzer Tuning       │  ← Broadest scope, lowest maintenance
         └─────────────────────────┘

Principle: Start broad (bottom), add specificity as needed (move up)


Combination Strategies

Most Effective Combinations:

  1. Path Exclusions + Custom Rulesets: Exclude test folders, then fine-tune rules for production code
  2. Custom Rulesets + UI Dismissals: Broad rule changes for patterns, individual dismissals for edge cases
  3. Policy Exceptions + Vulnerability Dismissals: Enterprise policies for standards, dismissals for unique cases
  4. AI Triage + Any Mechanism: AI identifies patterns, human implements appropriate suppression layer

Decision Matrix: Which Mechanism to Use

Ask these questions:

  1. How many findings match this pattern?
    • One: UI dismissal
    • Same vulnerability across files: Bulk UI dismissal via filtering
    • Many with code pattern (20+): Custom ruleset or path exclusion
    • Entire directories: Path exclusion or analyzer tuning
  2. How confident are you it's a false positive?
    • Very confident: Permanent config (paths, rules)
    • Somewhat confident: UI dismissal with review date
    • Need analysis: AI-assisted triage first
  3. Will this pattern recur in future code?
    • Yes: Custom ruleset or path exclusion
    • No: UI dismissal
    • Unknown: Start with dismissal, promote to a rule if it recurs
  4. Who needs to approve this suppression?
    • Just you: UI dismissal
    • Security team: Custom ruleset in repo
    • Organization-wide: Policy exception
  5. Is this specific to one project or company-wide?
    • One project: Project-level config
    • Multiple projects: Group-level policy or shared ruleset
    • Company standard: Organization policy

A Note on Preventing False Positives in Container Images

Traditional base images include hundreds of unused packages, binaries, shells, and package managers, each a significant source of false positive noise. So-called "distroless" images (e.g., Chainguard) contain only runtime dependencies, eliminating scanner noise from unused system packages. See this Component Reduction Paper for details on reducing attack surface through minimal base images.
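A minimal sketch of the idea: build in a full-featured stage, then copy only the compiled binary into a distroless runtime image. The image tags and paths here are illustrative:

```dockerfile
# Multi-stage build: the final image carries no shell, package
# manager, or unused system packages for scanners to flag.
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /app ./cmd/server

# Distroless, non-root runtime: only the binary and minimal runtime deps
FROM gcr.io/distroless/static-debian12:nonroot
COPY --from=build /app /app
ENTRYPOINT ["/app"]
```

Fewer packages in the final image means fewer CVE matches for container scanning to triage in the first place.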

Continuous Improvement: Collaborating with GitLab

Your false positive patterns and tuning experiences help improve GitLab's security scanners for everyone. When you encounter persistent false positive patterns, consider sharing feedback with GitLab through your account team or support channels.

High-Value Feedback Includes

  • The Scanner & Checker Details: Which specific analyzer and rule generated the finding? The more specific, the better.
  • What Defect Was It Looking For: What vulnerability class or weakness was the scanner trying to detect? (e.g., SQL injection, XSS, insecure deserialization, known CVE). This helps contextualize whether the rule is fundamentally misaligned or just overly broad.
  • Code Pattern & Context: What does the actual code look like that triggered the false positive? Ideally sanitized/anonymized snippets showing the pattern. Include context like: is this using a specific framework (Rails, Django, Spring)? Are there framework-level protections the scanner didn't recognize? Is there an architectural pattern (ORM, prepared statements, input sanitization) that makes this safe?
  • Cross-Scanner Correlation: Do other scanners flag the same code? If SAST says it's vulnerable but DAST doesn't find it exploitable, that's valuable context.
  • Why It's Safe: Your security team's analysis of why this isn't exploitable. Example: "This SQL query uses parameterized statements via ActiveRecord, which prevents injection regardless of tainted input."
  • Impact & Frequency: How often does this pattern appear in your codebase? Is this a one-off or affecting dozens/hundreds of files? Does it block critical pipelines or just add noise?

Sharing This Feedback Helps GitLab

  • Improve scanner accuracy and reduce false positive rates for all customers
  • Better recognize framework-specific safety mechanisms
  • Tune default rulesets to balance sensitivity and specificity
  • Prioritize integration improvements with popular frameworks
  • Enhance documentation for common false positive patterns

Real-world experience tuning scanners in production environments is invaluable data that helps evolve the product. The patterns customers see, especially recurring false positives that require custom rules, are exactly what GitLab needs to continuously refine the scanning engines.