0. Preliminary: Where Did All the Memory Go?

Model States: Optimizer States, Gradients and Parameters

Residual Memory Consumption

1. What Is DeepSpeed?

2. ZeRO