Here's how much energy your next ChatGPT query will use.
JetSpec implements causal parallel tree drafting for fast LLM speculative decoding inference, with up to 10x acceptance length and 1000+ tokens per second (TPS) on coding and math tasks using B200 GPUs. A ...