Torch-Velocity
Jan 2025
An implementation of speculative decoding with adaptive lookahead mechanisms for LLM inference optimization, achieving 1.5-2.5x speedups on transformer-based models.
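The core idea can be sketched as follows. This is a minimal illustration, not the Torch-Velocity implementation: it uses toy deterministic stand-ins for the draft and target models, and the adaptive-lookahead rule (grow the window on full acceptance, shrink it on rejection) is a hypothetical simplification of whatever adaptation the project actually uses. A draft model proposes a block of tokens, the target model verifies them, and the longest agreeing prefix is accepted, so the output always matches the target model's greedy decode while amortizing target calls over accepted blocks.

```python
def target_next(ctx):
    # Toy stand-in for the large target model's greedy next token.
    return sum(ctx) % 7

def draft_next(ctx):
    # Toy stand-in for the cheap draft model; agrees with the target
    # only sometimes, so rejections actually occur.
    return target_next(ctx) if ctx[-1] % 2 else (target_next(ctx) + 1) % 7

def speculative_decode(prompt, n_tokens, k=4):
    """Speculative decoding with a simple adaptive lookahead window k."""
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        # 1. Draft model proposes k tokens greedily.
        proposal, ctx = [], list(out)
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2. Target model verifies; accept the longest agreeing prefix.
        ctx, accepted = list(out), 0
        for t in proposal:
            if target_next(ctx) != t:
                break
            ctx.append(t)
            accepted += 1
        out = ctx
        # 3. Emit one guaranteed-correct token from the target
        #    (the correction on mismatch, or a bonus token on full accept).
        out.append(target_next(out))
        # 4. Hypothetical adaptive lookahead: widen the window when the
        #    draft is being accepted, narrow it when it is rejected early.
        k = min(k + 1, 8) if accepted == k else max(k - 1, 1)
    return out[len(prompt):][:n_tokens]
```

Because every accepted token is checked against `target_next`, the output is token-for-token identical to decoding with the target model alone; the speedup comes from verifying a whole proposed block per target pass instead of one token at a time.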
Quantitative Researcher