- To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis (arXiv:2305.13230, published May 22, 2023)
- Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline (arXiv:2305.13144, published May 22, 2023)