LLM

EchoLM: Accelerating LLM Serving with Real-time Knowledge Distillation

EchoLM is an in-context caching system that improves the serving efficiency of large language models (LLMs). It reuses semantically similar past requests as in-context examples to guide response generation, which yields significant throughput gains and lower latency without compromising response quality.

Jan 22, 2025
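The caching idea in the EchoLM summary can be illustrated with a minimal sketch: store past (request, response) pairs, retrieve the ones most similar to a new request, and prepend them to the prompt as examples. Everything concrete here is an assumption, not EchoLM's actual design: the toy bag-of-words `embed` stands in for a real embedding model, and the `InContextCache` class, its `threshold`, and the `Q:`/`A:` prompt format are hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (hypothetical stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class InContextCache:
    """Hypothetical cache of past (request, response) pairs used as in-context examples."""
    threshold: float = 0.3  # assumed minimum similarity for an entry to be reused
    entries: list = field(default_factory=list)

    def add(self, request: str, response: str) -> None:
        self.entries.append((embed(request), request, response))

    def retrieve(self, request: str, k: int = 2) -> list:
        """Return up to k cached pairs whose similarity meets the threshold, best first."""
        query = embed(request)
        scored = [(cosine(query, vec), req, resp) for vec, req, resp in self.entries]
        scored = [s for s in scored if s[0] >= self.threshold]
        scored.sort(key=lambda s: s[0], reverse=True)
        return [(req, resp) for _, req, resp in scored[:k]]

    def build_prompt(self, request: str, k: int = 2) -> str:
        """Prepend retrieved examples to the new request (assumed Q/A prompt format)."""
        parts = [f"Q: {req}\nA: {resp}" for req, resp in self.retrieve(request, k)]
        parts.append(f"Q: {request}\nA:")
        return "\n\n".join(parts)


cache = InContextCache()
cache.add("What is the capital of France?", "Paris.")
cache.add("How do I sort a list in Python?", "Use sorted(xs) or xs.sort().")
prompt = cache.build_prompt("What is the capital of Italy?")
print(prompt)
```

Only the geography question clears the similarity threshold, so the prompt carries one example followed by the new question. A production system would use dense embeddings and an approximate nearest-neighbor index instead of a linear scan.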

CHASE-SQL: Multi-Path Reasoning and Preference Optimized Candidate Selection in Text-to-SQL

CHASE-SQL is a novel framework that improves Text-to-SQL performance in two stages. First, multiple LLM agents generate diverse SQL candidates using divide-and-conquer decomposition, chain-of-thought reasoning, and instance-aware synthetic examples. Second, a fine-tuned selection agent ranks these candidates to choose the final query. The framework achieves state-of-the-art accuracy on the BIRD benchmark.

Oct 2, 2024
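The selection stage in the CHASE-SQL summary can be sketched under one assumed strategy: compare every pair of candidates with a judge and return the candidate that wins the most comparisons. This is an illustration, not the paper's implementation. `heuristic_compare` is a hypothetical stand-in for the fine-tuned selection agent (it simply prefers queries that execute and return rows), and the toy `users` table and candidate queries are invented for the example.

```python
import sqlite3
from itertools import combinations


def execute(conn, sql):
    """Run a query; return its rows, or None if it fails to execute."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None


def heuristic_compare(conn, a, b):
    """Hypothetical stand-in for a fine-tuned selection model: returns the preferred query."""
    def key(sql):
        rows = execute(conn, sql)
        # Prefer queries that execute, then ones that return rows, then shorter SQL.
        return (rows is not None, bool(rows), -len(sql))
    return a if key(a) >= key(b) else b


def select_candidate(candidates, compare):
    """Pairwise tournament over distinct candidates; ties go to the earliest one."""
    wins = {c: 0 for c in candidates}
    for a, b in combinations(candidates, 2):
        wins[compare(a, b)] += 1
    return max(candidates, key=lambda c: wins[c])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, age INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", [(1, "Ana", 34), (2, "Bo", 19)])

candidates = [
    "SELECT name FROM users WHERE age > 30",
    "SELECT name FROM user WHERE age > 30",   # wrong table name, fails to execute
    "SELECT name FROM users WHERE age > 40",  # executes but returns no rows
]
best = select_candidate(candidates, lambda a, b: heuristic_compare(conn, a, b))
print(best)
```

A pairwise judge needs n(n-1)/2 comparisons for n candidates, which stays cheap for the small candidate pools a multi-agent generator produces; the learned model would replace `heuristic_compare` without changing `select_candidate`.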