AI Infrastructure / Validation Project

Inference Runtime Regression Gate

A lightweight CI validation system for inference-style workloads. It benchmarks baseline and candidate runs, tracks p50/p95/p99 latency and throughput, and blocks meaningful p95 latency regressions before release.

Key Features

Compares baseline and candidate inference-style benchmark runs
Tracks p50, p95, and p99 end-to-end latency
Measures throughput in requests per second
Runs in GitHub Actions for push, PR, and manual validation
Includes controlled failure detection for reproducible regression testing
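
The latency and throughput measurements listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: `percentile`, `run_benchmark`, and the `throughput_rps` key are hypothetical names, the nearest-rank percentile method is an assumption, and `workload` is any callable standing in for one inference-style request.

```python
import math
import time
from typing import Callable, Dict, List


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]


def run_benchmark(workload: Callable[[], None], requests: int = 200, warmup: int = 20) -> Dict[str, float]:
    """Time sequential calls to `workload` and summarize latency and throughput."""
    # Warm-up calls are not timed, so one-time setup cost does not skew the tail.
    for _ in range(warmup):
        workload()

    latencies_ms = []
    start = time.perf_counter()
    for _ in range(requests):
        t0 = time.perf_counter()
        workload()
        latencies_ms.append((time.perf_counter() - t0) * 1000.0)
    elapsed_s = time.perf_counter() - start

    return {
        "p50_ms": percentile(latencies_ms, 50),
        "p95_ms": percentile(latencies_ms, 95),
        "p99_ms": percentile(latencies_ms, 99),
        # Requests completed per second of wall-clock time (hypothetical key name).
        "throughput_rps": requests / elapsed_s,
    }
```

Running the same function once for the baseline and once for the candidate gives two result dictionaries that a comparison step can read.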

Regression Policy

Compares candidate p95 latency against baseline p95 latency
Fails when candidate p95 is more than 10 percent slower than baseline p95
Also requires the regression to be at least 1.0 ms in absolute terms, so runner noise on small baselines does not trip the gate
Separates normal pass validation from controlled failure detection
Uses JSON configuration so thresholds can be tuned without changing code
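
The policy above can be expressed as a small check that fails only when both the percent threshold and the absolute floor are exceeded. This is a sketch, not the project's implementation: the function name, the `GateResult` type, and the JSON keys `max_regression_pct` and `min_abs_regression_ms` are assumptions chosen for illustration.

```python
import json
from dataclasses import dataclass

# Hypothetical config shape; the real key names may differ.
CONFIG_JSON = '{"metric": "p95_ms", "max_regression_pct": 10.0, "min_abs_regression_ms": 1.0}'


@dataclass
class GateResult:
    pct_change: float
    abs_change_ms: float
    status: str  # "PASS" or "FAIL"


def check_regression(baseline_ms: float, candidate_ms: float, config: dict) -> GateResult:
    """Compare candidate latency against baseline using a percent threshold and an absolute floor."""
    if baseline_ms <= 0:
        raise ValueError("baseline latency must be positive")
    abs_change = candidate_ms - baseline_ms
    pct_change = abs_change / baseline_ms * 100.0
    # Both conditions must hold: a large percent change on a tiny
    # absolute difference is treated as noise and does not fail the gate.
    failed = (
        pct_change > config["max_regression_pct"]
        and abs_change >= config["min_abs_regression_ms"]
    )
    return GateResult(pct_change, abs_change, "FAIL" if failed else "PASS")


def load_config(text: str = CONFIG_JSON) -> dict:
    """Parse thresholds from JSON so they can be tuned without code changes."""
    return json.loads(text)
```

With these thresholds, a p95 move from 4.357 ms to 4.648 ms passes because it stays under 10 percent, while a move from 5.257 ms to 6.516 ms fails because it exceeds both the percent threshold and the 1.0 ms floor.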

Latest GitHub Actions Results

Default gate passed with 0.290 ms absolute p95 change
Manual pass gate passed with 0.021 ms absolute p95 change
Controlled regression demo detected a 1.259 ms p95 regression
All three workflows completed successfully; the controlled regression workflow passes because the checker reported the expected FAIL

Regression Gate Output

=== Default Regression Gate ===

Workflow: push / pull request validation
Stage: end_to_end_ms
Metric: p95_ms

Baseline throughput:   233.24 req/s
Candidate throughput:  219.38 req/s

Baseline p50:          4.260 ms
Candidate p50:         4.537 ms

Baseline p95:          4.357 ms
Candidate p95:         4.648 ms
Change:                6.66%
Absolute change:       0.290 ms
Percent threshold:     10.00%
Minimum floor:         1.000 ms
Status:                PASS

=== Manual Pass Gate ===
Workflow: manual passing-path validation

Baseline p95:          4.362 ms
Candidate p95:         4.384 ms
Change:                0.49%
Absolute change:       0.021 ms
Percent threshold:     10.00%
Minimum floor:         1.000 ms
Status:                PASS

=== Controlled Regression Detection ===
Workflow: manual failure-path validation

Baseline throughput:   231.46 req/s
Candidate throughput:  157.55 req/s

Baseline p95:          5.257 ms
Candidate p95:         6.516 ms
Change:                23.95%
Absolute change:       1.259 ms
Percent threshold:     10.00%
Minimum floor:         1.000 ms
Checker status:        FAIL
Workflow result:       PASS

Controlled regression was detected as expected.
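
The split between "Checker status" and "Workflow result" in the controlled run can be sketched as an expectation check: the workflow succeeds when the checker outcome matches what the run was designed to produce. The function name and the `expect_failure` flag below are hypothetical, and how the real workflow wires this up is an assumption.

```python
def workflow_exit_code(checker_status: str, expect_failure: bool = False) -> int:
    """Return a process exit code: 0 when the checker outcome matches the expectation, 1 otherwise."""
    if checker_status not in ("PASS", "FAIL"):
        raise ValueError(f"unknown checker status: {checker_status!r}")
    checker_failed = checker_status == "FAIL"
    # Normal gates expect PASS; the controlled regression demo expects FAIL.
    return 0 if checker_failed == expect_failure else 1
```

Under this scheme, a controlled run whose checker unexpectedly reports PASS would fail the workflow, which is what makes the demo useful as a test of the gate itself.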

Highlights

Built baseline vs candidate benchmark flow for inference-style runtime validation
Measured p50, p95, and p99 latency and throughput for each baseline and candidate run
Implemented configurable regression thresholds using percent and absolute latency floors
Added controlled regression detection to prove the gate catches meaningful p95 regressions
Integrated validation into GitHub Actions for push, PR, and manual workflow checks
Added smoke tests to validate benchmark result structure before trusting outputs
Documented passing and failure-path behavior with reproducible benchmark results
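
A structural smoke test of the kind mentioned above might look like the following. It is a sketch only: the required keys (including `throughput_rps`) and the specific checks are assumptions, not the project's actual test suite.

```python
import math
from typing import Dict, List

# Hypothetical set of fields a benchmark result is expected to carry.
REQUIRED_KEYS = ("p50_ms", "p95_ms", "p99_ms", "throughput_rps")


def validate_result(result: Dict[str, object]) -> List[str]:
    """Return a list of structural problems; an empty list means the result looks usable."""
    errors = []
    for key in REQUIRED_KEYS:
        value = result.get(key)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append(f"{key}: missing or not a number")
        elif not math.isfinite(value) or value <= 0:
            errors.append(f"{key}: must be a finite positive number")
    # Only check ordering once every field is known to be a valid number.
    if not errors and not (result["p50_ms"] <= result["p95_ms"] <= result["p99_ms"]):
        errors.append("percentiles out of order: expected p50 <= p95 <= p99")
    return errors
```

Running this before the comparison step keeps a malformed or partial benchmark file from producing a misleading PASS.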

Tech

Python, PyTorch, GitHub Actions, CI/CD, JSON Config, Latency Benchmarking

Performance Validation

Compare candidate runs against baseline runs
Track p95 latency as the primary regression signal
Use absolute floors to reduce CI runner noise
Block meaningful latency regressions before release

What I Learned

Performance gates need noise tolerance
Percent thresholds alone can be too sensitive
Controlled failures make CI behavior easier to verify
Functional correctness and performance validation solve different problems

Tech Stack

Python benchmark and comparison scripts
PyTorch synthetic inference-style workload
GitHub Actions workflow validation
JSON-driven regression threshold configuration