
In this blog post, we introduce the concepts behind next-generation inference capabilities, including disaggregated serving, intelligent request scheduling, and expert parallelism. We discuss their benefits and walk through how you can implement them on Amazo…
We thank Greg Pereira and Robert Shaw from the llm-d team for their support in bringing llm-d to AWS.
In the agentic and reasoning era, large language models (LLMs) generate 10x more tokens and comp…



