Building faster inference through latency benchmarks, runtime validation, and GPU-aware systems
CUDA • C++ • PyTorch • Model Serving • Latency Benchmarking • Runtime Systems
CI validation gate that benchmarks baseline and candidate inference runs side by side and blocks releases that introduce meaningful p95 latency regressions.
Lock-free SPSC ring buffer with tuned wait strategies, benchmarked for throughput and latency.
Real-time drone simulation with deterministic physics and stable control systems.
Real-time profiling tool for identifying runtime bottlenecks and performance regressions.
GPU shader-based reveal system optimized for real-time rendering.
A first-person psychological horror game built with dynamic rule-based gameplay systems.
A stylized stealth-horror prototype featuring AI-driven enemy behavior and player evasion mechanics.
A multiplayer capture-the-flag prototype with networked gameplay systems and team-based objectives.