| | A | B | C | D |
|---|---|---|---|---|
| 1 | Main | |||
| 2 | vLLM | PagedAttention | Efficient management of attention key and value memory | |
| 3 | continuous batching | |||
| 4 | tensor parallelism | |||
| 5 | DeepSpeed | |||
| 6 | model parallelism, inference-customized kernels, MoQ quantization | |||
| 7 | ||||
| 8 | TensorRT | |||
| 9 | ||||
| 10 | PowerInfer | |||
| 11 | ||||
| 12 | ||||
| 13 | Accelerate | https://huggingface.co/docs/accelerate/index | | |
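
The PagedAttention entry in row 2 ("Efficient management of attention key and value memory") can be illustrated with a toy block table. This is a minimal sketch of the general idea only: the KV cache is split into fixed-size blocks that are allocated on demand, so a sequence's cache does not need to be contiguous. It is not vLLM's actual API; `BlockAllocator`, `Sequence`, and `BLOCK_SIZE` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # tokens per physical block (hypothetical value for the demo)


class BlockAllocator:
    """Hands out fixed-size physical blocks from a shared pool."""

    def __init__(self, num_blocks: int) -> None:
        # Reversed so that pop() yields block ids 0, 1, 2, ... in order.
        self.free = list(range(num_blocks - 1, -1, -1))

    def allocate(self) -> int:
        if not self.free:
            raise MemoryError("no free KV-cache blocks")
        return self.free.pop()

    def release(self, block_id: int) -> None:
        self.free.append(block_id)


@dataclass
class Sequence:
    """Tracks one request's logical-to-physical block mapping."""

    block_table: list[int] = field(default_factory=list)
    num_tokens: int = 0

    def append_token(self, allocator: BlockAllocator) -> tuple[int, int]:
        # A new physical block is requested only when the last one is full.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(allocator.allocate())
        slot = (self.block_table[-1], self.num_tokens % BLOCK_SIZE)
        self.num_tokens += 1
        return slot  # (physical block id, offset inside that block)

    def release_all(self, allocator: BlockAllocator) -> None:
        # Return every block to the pool once the request finishes.
        for block_id in self.block_table:
            allocator.release(block_id)
        self.block_table.clear()
        self.num_tokens = 0


# Two requests growing in an interleaved order share one pool of 8 blocks.
allocator = BlockAllocator(num_blocks=8)
a, b = Sequence(), Sequence()
for _ in range(5):
    a.append_token(allocator)
for _ in range(3):
    b.append_token(allocator)
for _ in range(4):
    a.append_token(allocator)
```

In this sketch, sequence `a` ends up with non-adjacent physical blocks because `b` claimed a block in between. Memory is reserved one block at a time rather than for a maximum sequence length up front.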