EchoLM is an in-context caching system that improves the efficiency of large language model (LLM) serving. It retrieves semantically similar past requests and uses them as in-context examples to guide response generation, raising throughput and reducing latency without compromising response quality.
Jan 22, 2025
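As a rough illustration of the idea in the EchoLM summary above, the following minimal sketch stores past request/response pairs, retrieves the ones most similar to a new request, and prepends them to the prompt as examples. Everything here is an assumption made for illustration: the names `ExampleCache`, `embed`, and `build_prompt` are hypothetical, the bag-of-words embedding stands in for a real sentence encoder, and the similarity threshold is arbitrary. It is not EchoLM's actual implementation.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding (assumption: a real system would use a learned encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ExampleCache:
    """Hypothetical cache of past (request, response) pairs."""

    def __init__(self):
        self.entries = []  # list of (embedding, request, response)

    def add(self, request, response):
        self.entries.append((embed(request), request, response))

    def top_k(self, request, k=2, min_sim=0.3):
        """Return up to k past pairs whose similarity to the request is at least min_sim."""
        query = embed(request)
        scored = [(cosine(query, emb), req, resp) for emb, req, resp in self.entries]
        scored = [s for s in scored if s[0] >= min_sim]
        scored.sort(key=lambda s: s[0], reverse=True)
        return [(req, resp) for _, req, resp in scored[:k]]


def build_prompt(cache, request, k=2):
    """Prepend retrieved pairs as in-context examples ahead of the new request."""
    parts = [f"Q: {req}\nA: {resp}" for req, resp in cache.top_k(request, k)]
    parts.append(f"Q: {request}\nA:")
    return "\n\n".join(parts)


cache = ExampleCache()
cache.add("how do I reverse a list in python", "Use list.reverse() or slicing.")
cache.add("what is the capital of france", "Paris")
prompt = build_prompt(cache, "how do I reverse a string in python", k=1)
```

In this toy run only the list-reversal pair is similar enough to be included, so the unrelated entry never reaches the prompt.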
CHASE-SQL is a framework that improves Text-to-SQL performance in two stages. First, multiple LLM agents generate diverse SQL candidates using divide-and-conquer decomposition, chain-of-thought reasoning, and instance-aware synthetic examples. Then a fine-tuned selection agent ranks the candidates. The framework achieves state-of-the-art accuracy on the BIRD benchmark.
Oct 2, 2024
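The generate-then-select structure in the CHASE-SQL summary above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the framework's actual code: the generator functions are stubs returning fixed SQL rather than LLM calls, `toy_prefer` is a hypothetical stand-in for the fine-tuned selection agent, and ranking by pairwise wins is one plausible way to rank candidates that the summary itself does not specify.

```python
from itertools import combinations


def generate_candidates(question, generators):
    """Collect one SQL candidate per generator, dropping exact duplicates."""
    seen, candidates = set(), []
    for gen in generators:
        sql = gen(question)
        if sql not in seen:
            seen.add(sql)
            candidates.append(sql)
    return candidates


def select_best(question, candidates, prefer):
    """Rank candidates by pairwise wins; prefer(question, a, b) returns the better of a and b."""
    wins = {c: 0 for c in candidates}
    for a, b in combinations(candidates, 2):
        wins[prefer(question, a, b)] += 1
    return max(candidates, key=lambda c: wins[c])


# Stub generators (assumption: each would be an LLM prompted with a different strategy).
def divide_and_conquer(question):
    return "SELECT name FROM users WHERE age > 30"


def chain_of_thought(question):
    return "SELECT name FROM users WHERE age >= 30"


def synthetic_examples(question):
    return "SELECT name FROM users WHERE age > 30"


def toy_prefer(question, a, b):
    """Stand-in for the selection agent: 'older than' implies a strict inequality."""
    return a if ">=" not in a else b


question = "Which users are older than 30?"
candidates = generate_candidates(
    question, [divide_and_conquer, chain_of_thought, synthetic_examples]
)
best = select_best(question, candidates, toy_prefer)
```

Here two of the three generators agree, so only two distinct candidates are compared, and the stub selector picks the strict-inequality query.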