Main

vLLM
    PagedAttention: efficient management of attention key and value memory
    continuous batching
    tensor parallelism

DeepSpeed
    model parallelism, inference-customized kernels, MoQ quantization

TensorRT

PowerInfer

Accelerate
    https://huggingface.co/docs/accelerate/index
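
The PagedAttention entry above describes attention key/value memory being managed efficiently. As a rough illustration of the general idea (fixed-size blocks handed out on demand and tracked per sequence in a block table), here is a toy sketch. It is not vLLM's implementation: the class name PagedKVCache, its fields, and its methods are hypothetical, and it tracks only block ids, not actual tensors.

```python
from dataclasses import dataclass, field


@dataclass
class PagedKVCache:
    """Toy block allocator (hypothetical, for illustration only).

    Each sequence owns a list of fixed-size physical blocks, so memory
    is reserved one block at a time instead of for the maximum length.
    """
    num_blocks: int   # total physical blocks in the pool (assumed size)
    block_size: int   # number of tokens that fit in one block
    free_blocks: list = field(default_factory=list)
    block_tables: dict = field(default_factory=dict)  # seq_id -> block ids
    seq_lens: dict = field(default_factory=dict)      # seq_id -> token count

    def __post_init__(self):
        # Initially every physical block is free.
        self.free_blocks = list(range(self.num_blocks))

    def append_token(self, seq_id):
        """Reserve room for one more token of seq_id.

        A new block is taken from the pool only when the sequence has
        no block yet or its last block is completely full.
        """
        length = self.seq_lens.get(seq_id, 0)
        if length % self.block_size == 0:
            if not self.free_blocks:
                raise MemoryError("no free KV blocks")
            block_id = self.free_blocks.pop()
            self.block_tables.setdefault(seq_id, []).append(block_id)
        self.seq_lens[seq_id] = length + 1

    def free(self, seq_id):
        """Return all blocks of a finished sequence to the pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)


if __name__ == "__main__":
    cache = PagedKVCache(num_blocks=4, block_size=2)
    for _ in range(3):
        cache.append_token("a")   # 3 tokens need 2 blocks of size 2
    cache.append_token("b")       # 1 token needs 1 block
    print(len(cache.block_tables["a"]), len(cache.free_blocks))
    cache.free("a")               # blocks of "a" become reusable
    print(len(cache.free_blocks))
```

Because blocks are returned to the pool as soon as a sequence finishes, a scheduler can admit new requests between decoding steps, which is the property that continuous batching relies on.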