Contributor

@elskow elskow commented Jul 1, 2025

Pull Request Description for NGINX Ingress SSL Crisis Detection

Details

solves #96
/claim #96

Reproducible test setup (Maintainers invited): nginx-ingress-ssl-crisis
A link to a working CRE in the CRE playground: CRE Playground Link

Video Demonstration

Full test execution showing SSL certificate crisis simulation and CRE rule validation:

Screen.Recording.2025-07-01.at.16.37.33.mov

What This Detects

This rule identifies critical NGINX Ingress Controller SSL certificate failure patterns that cause complete service unavailability. The detection focuses on:

  • SSL Certificate Verification Failures: Expired, invalid, or untrusted certificates
  • TLS Handshake Failures: Protocol incompatibility and cipher suite mismatches
  • Certificate Chain Issues: Missing intermediate certificates and broken trust chains
  • Upstream SSL Connection Errors: Backend SSL verification failures
  • HTTP Error Cascade: Immediate 502/503 responses following SSL failures

Crisis Simulation

The test reproduces an authentic SSL certificate crisis scenario:

  • NGINX Ingress Controller with SSL-enabled backends
  • Certificate expiration and validation failure simulation
  • SSL handshake failure cascade detection
  • CRE rule validation confirming crisis detection

Commands for Sample Data

The test.log file associated with CRE-2025-0120 was generated using the nginx-ingress-ssl-crisis/run-test.sh script (Maintainers invited until bounty closed). This script automates the setup of the NGINX Ingress environment, SSL certificate crisis simulation, and collection of crisis log patterns.

The core process executed by the script to produce the crisis patterns in test.log:

  1. Start NGINX Ingress with SSL configuration:

    docker-compose up -d

  2. Simulate SSL certificate crisis:

    # Expire SSL certificates
    ./scripts/expire-certificates.sh
    # Generate invalid certificate chain
    ./scripts/break-certificate-chain.sh

  3. Generate traffic to trigger SSL failures:

    for i in {1..20}; do curl -k https://localhost:8443/api/health || true; sleep 1; done

  4. Extract crisis patterns from NGINX logs:

    docker logs nginx-ingress > nginx-ssl-crisis.log 2>&1
    python3 extract_ssl_patterns.py
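
The contents of extract_ssl_patterns.py are not shown in this PR. As a rough illustration of what the extraction step might involve, the sketch below parses NGINX error-log lines into fields and keeps only SSL-related entries at error level or above. The function names and the filtering criteria are assumptions, not the script's actual logic.

```python
import re
from datetime import datetime

# Standard NGINX error-log prefix: "YYYY/MM/DD HH:MM:SS [level] pid#tid: *cid message"
ERROR_LINE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<level>\w+)\] (?P<pid>\d+)#(?P<tid>\d+): "
    r"(?:\*(?P<conn>\d+) )?(?P<message>.*)$"
)

SEVERE_LEVELS = ("error", "crit", "alert", "emerg")


def parse_error_line(line):
    """Parse one error-log line into a dict, or return None if it does not match."""
    match = ERROR_LINE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["ts"] = datetime.strptime(entry["ts"], "%Y/%m/%d %H:%M:%S")
    return entry


def extract_ssl_errors(log_text):
    """Keep only severe error-log entries whose message mentions SSL (assumed filter)."""
    entries = []
    for line in log_text.splitlines():
        entry = parse_error_line(line)
        if entry and entry["level"] in SEVERE_LEVELS and "SSL" in entry["message"]:
            entries.append(entry)
    return entries
```

Under these assumptions, it could be run against the collected file with `extract_ssl_errors(open("nginx-ssl-crisis.log").read())`; access-log lines are skipped because they do not match the error-log prefix.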

Sample Crisis Evidence

The test generates authentic SSL crisis patterns showing certificate validation failure cascade:

2025/01/20 14:25:30 [error] 456#456: *1 SSL certificate verify failed (certificate has expired) while SSL handshaking to upstream, client: 192.168.1.100, server: api.example.com
192.168.1.100 - - [20/Jan/2025:14:25:30 +0000] "GET /api/health HTTP/1.1" 502 497 "-" "curl/7.68.0"
2025/01/20 14:25:31 [error] 457#457: *2 SSL handshake failed (SSL: error:14090086:SSL routines:ssl3_get_server_certificate:certificate verify failed) while connecting to upstream
192.168.1.100 - - [20/Jan/2025:14:25:31 +0000] "GET /api/users HTTP/1.1" 503 497 "-" "Mozilla/5.0"

Rule Characteristics

  • Severity: 0 (Critical)
  • Detection Window: Real-time SSL failure detection
  • Impact Score: 10/10 (complete HTTPS service unavailability)
  • False Positive Risk: Low (requires specific SSL failure sequence)
  • Pattern Coverage: Comprehensive SSL certificate crisis detection with consolidated regex
  • Tags Added: ssl-certificate, tls-handshake, certificate-verification, service-unavailability
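
The "specific SSL failure sequence" mentioned above can be pictured as an SSL error followed shortly by a 502/503 response. The sketch below is one possible way to check for that ordering in a combined log stream; it is not the CRE rule's matching logic. The 5-second window, the `find_cascades` name, and the assumption that error-log and access-log timestamps share a time zone are all illustrative choices.

```python
import re
from datetime import datetime, timedelta

# Error-log line at [error] level that mentions SSL somewhere in the message.
SSL_ERROR = re.compile(
    r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[error\].*SSL", re.IGNORECASE
)
# Combined access-log format: [timestamp zone] "request" status bytes
ACCESS = re.compile(
    r'\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}) [+-]\d{4}\] "[^"]*" (\d{3}) '
)


def find_cascades(lines, window_seconds=5):
    """Pair each 502/503 response with the most recent SSL error inside the window."""
    window = timedelta(seconds=window_seconds)
    last_ssl_error = None  # (timestamp, raw line) of the latest SSL error seen
    cascades = []
    for line in lines:
        match = SSL_ERROR.match(line)
        if match:
            ts = datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S")
            last_ssl_error = (ts, line)
            continue
        match = ACCESS.search(line)
        if match and match.group(2) in ("502", "503") and last_ssl_error:
            # Month abbreviations such as "Jan" are parsed with the current locale.
            ts = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S")
            if timedelta(0) <= ts - last_ssl_error[0] <= window:
                cascades.append((last_ssl_error[1], line))
    return cascades
```

Requiring the error to precede the 5xx response is what keeps unrelated 502s (for example, from a crashed backend) from being counted, which is one way a sequence-based rule can hold its false positive rate down.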

This implementation addresses the critical need for early detection of SSL certificate crises in NGINX Ingress Controller environments, preventing prolonged service outages and security exposure.

LB: resolves #98

Add new rule CRE-2025-0120 to detect critical SSL certificate failures in NGINX Ingress Controllers
@Lyndon-prequel
Contributor

@elskow To associate this PR with the issue, you must use one of GitHub's recognized linking keywords. You used "solves", which is close but not recognized, so I went ahead and added the appropriate keyword, "resolves".
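
For reference, GitHub's closing keywords are close, closes, closed, fix, fixes, fixed, resolve, resolves, and resolved. A simplified sketch of checking a PR description before submitting is shown below; `linked_issues` is a hypothetical helper, and the regex only approximates GitHub's matching (same-repository `#123` references only).

```python
import re

# GitHub's documented closing keywords.
CLOSING_KEYWORDS = (
    "close", "closes", "closed",
    "fix", "fixes", "fixed",
    "resolve", "resolves", "resolved",
)

# Keyword, optional colon, whitespace, then "#<number>".
LINK = re.compile(
    r"\b(" + "|".join(CLOSING_KEYWORDS) + r")\b:?\s+#(\d+)", re.IGNORECASE
)


def linked_issues(description):
    """Return issue numbers that a closing keyword in the description refers to."""
    return [int(number) for _keyword, number in LINK.findall(description)]
```

With this check, a description containing only "solves #96" yields no linked issues, while "resolves #98" yields issue 98.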

As a result, we didn't catch this submission in the review cycle. Since it was submitted on time, we will consider it for all award categories. Please give us until next Wednesday to get back to you.

Contributor

@tonymeehan tonymeehan left a comment

Please update the PR to change any use of the word "crisis" to "failure" or "problem"

@tonymeehan
Contributor

The rule yaml looks identical to https://github.com/prequel-dev/cre/pull/102/files. Mistake in the PR?

@elskow
Contributor Author

elskow commented Jul 15, 2025

The rule yaml looks identical to https://github.com/prequel-dev/cre/pull/102/files. Mistake in the PR?

Yup. I forgot to change branches while working on it. Do I have to close this and open a new PR?

@tonymeehan
Contributor

Yup. I forgot to change branches while working on it. Do I have to close this and open a new PR?

All good. You can just rebase and update this PR.

@elskow elskow requested a review from tonymeehan July 22, 2025 07:19
@tonymeehan tonymeehan merged commit 34ee3d0 into prequel-dev:main Jul 23, 2025
2 checks passed
Successfully merging this pull request may close these issues.

[Multiple Winners] aws-vpc-cni: Reproduce A High-Severity Failure & Write a Detection Rule [Submit by July 6 11:59 pm ET]
3 participants