NFO
_.--.__.-'""'-.__.--.__.-'""'-.__.--.__.-'""'-.__.--.__.-'""'-._
-.-. .... .- .--. - . .-. - .-- . -. - -.-- --- -. .
________ ________ ________ ________ ________ ________ ________
/ / / / / / / /
/ / / / / / /_ _/ / / / /
/ __/ / / __// // __/ __/
\_______/_\___/____/\___/____/\______/___\______/_\_______/ \___/___/
/ / / / / / / / / /
/_ _/ / / / /_ _/ /
/ // / __/ // / \__ /
\______/_\________/\_______/_\__/_____/ \______/ \_____/
/ / / /
/ / / / / /
/ / / __/ -- C H A P T E R T W E N T Y O N E --
\________/\__/_____/\_______/
"'--'""'-.__.-'""'--'""'-.__.-'""'--'""'-.__.-'""'--'""'-.__.-'"
_.--.__.-'""'-.__.--.__.-'""'-.__.--.__.-'""'-.__.--.__.-'""'-._
Publisher > O'Reilly Media, Inc
Author > Chi Wang
Title > Hands-On LLM Serving And Optimization
Issue > N/A
Year > 2026
_.-"'-'""'-.__.-'""'--'""'-.__.-'""'--'""'-.__.-'"-._
ISBN > 9798341621497
Pages > 589
Genre > Prompt-Engineering
Language > English
_.-"'-'""'-.__.-'""'--'""'-.__.-'""'--'""'-.__.-'"-._
Type > RETAIL
Media > Book
Format > .EPUB
Section > eBook
_.-"'-'""'-.__.-'""'--'""'-.__.-'""'--'""'-.__.-'"-._
Published > 2026-05
Release > 2026-05-30
Disks > 03*5.0MB
"'--'""'-.__.-'""'--'""'-.__.-'""'--'""'-.__.-'""'--'""'-.__.-'"
https://bookshop.org/p/books/hands-on-llm-serving-and-optimization-hosting-llms-at-scale-chi-wang/a4028296825459a7
Large language models (LLMs) are the reasoning engines of modern AI. Today, a
major inflection point has arrived: as the world races to deploy AI at scale,
model inference has moved to the center of the stack. Welcome to the inference
era. Without proper optimization, however, LLMs can be expensive and slow to
serve. Hands-On LLM Serving and Optimization is a comprehensive guide to the
complexities of deploying and optimizing LLMs at scale. In this hands-on,
engineering-focused book, authors Chi Wang and Peiheng Hu combine practical
examples, code, and strategies for building robust, performant, and
cost-efficient AI token factories. Whether you're building the LLM inference
infrastructure or the applications that consume it, a deep understanding of LLM
serving will make you a more effective, future-ready engineer as AI transforms
how we work and build.

- Learn the foundations of model serving with core concepts, design paradigms,
  and industry best practices
- Understand the common challenges of hosting LLMs at scale
- Balance latency and throughput to meet the demands of AI applications and
  business requirements
- Host LLMs cost-effectively with practical, code-backed techniques
_.--.__.-'""'-.__.--.__.-'""'-.__.--.__.-'""'-.__.--.__.-'""'-._
- --- - - -- [ C H A P T E R T W E N T Y O N E ] -- - - --- -
"'--'""'-.__.-'""'--'""'-.__.-'""'--'""'-.__.-'""'--'""'-.__.-'"
-.-. .... .- .--. - . .-. - .-- . -. - -.-- --- -. .
CRC32: 87de3e82851e46192b3003e94322425b72d6c4f7