Mitchell Stern et al. 2018 introduced the prototype concept of speculative decoding. This method has since been further developed and refined by various approaches, including Lookahead Decoding, REST, Medusa and EAGLE, significantly accelerating the inference process of large language models (LLMs).
One might wonder: will speculative decoding in LLMs harm