Andes: Defining and Enhancing Quality-of-Experience in LLM-Based Text Streaming Services

Liu, Jiachen; Wu, Zhiyu; Chung, Jae-Won; Lai, Fan; Lee, Myungjin; Chowdhury, Mosharaf

arXiv:2404.16283 (cs)

[Submitted on 25 Apr 2024]

Abstract: The advent of large language models (LLMs) has transformed text-based services, enabling capabilities ranging from real-time translation to AI-driven chatbots. However, existing serving systems primarily focus on optimizing server-side aggregate metrics such as token generation throughput, ignoring the individual user's experience with streamed text. As a result, under high and/or bursty load, a significant number of users can receive poor Quality-of-Experience (QoE). In this paper, we first formally define QoE for text streaming services, where text is delivered incrementally and interactively to users, by considering the end-to-end token delivery process throughout the entire interaction with the user. We then propose Andes, a QoE-aware serving system that enhances user experience for LLM-enabled text streaming services. At its core, Andes strategically allocates contended GPU resources among multiple requests over time to optimize their QoE. Our evaluations demonstrate that, compared to state-of-the-art LLM serving systems such as vLLM, Andes improves the average QoE by up to 3.2$\times$ under a high request rate or, alternatively, sustains up to a 1.6$\times$ higher request rate while preserving high QoE.
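To make the idea of a per-request, delivery-timeline view of quality concrete, the following is a minimal illustrative sketch. It is not the paper's formal QoE definition, which the abstract does not state. The function name `on_time_fraction` and the parameters `expected_ttft` (expected time to first token, in seconds) and `expected_tps` (expected token delivery speed, in tokens per second) are assumptions introduced here purely for illustration.

```python
from typing import Sequence


def on_time_fraction(
    delivery_times: Sequence[float],
    expected_ttft: float,
    expected_tps: float,
) -> float:
    """Fraction of tokens delivered no later than a hypothetical deadline.

    Assumption (not from the paper): token i (0-indexed) is "on time" if it
    arrives by expected_ttft + i / expected_tps seconds after the request.
    """
    if expected_tps <= 0:
        raise ValueError("expected_tps must be positive")
    if not delivery_times:
        return 1.0  # nothing was due, so nothing was late

    on_time = 0
    for i, arrival in enumerate(delivery_times):
        deadline = expected_ttft + i / expected_tps
        if arrival <= deadline:
            on_time += 1
    return on_time / len(delivery_times)


# Four tokens; the last one stalls well past its deadline.
score = on_time_fraction([0.5, 1.0, 1.3, 3.0], expected_ttft=1.0, expected_tps=5.0)
print(score)
```

A server-side throughput number would treat this request the same as one whose tokens all arrived on schedule, whereas a per-request score like this one separates the two cases.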

Comments:	16 pages, 22 figures
Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Cite as:	arXiv:2404.16283 [cs.DC]
	(or arXiv:2404.16283v1 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2404.16283

Submission history

From: Jiachen Liu
[v1] Thu, 25 Apr 2024 01:56:00 UTC (1,498 KB)
