Tensor Parallelism
Tensor parallelism is a technique for distributing a single large model across multiple GPUs. For example, when multiplying the input tensor by the first weight tensor, the weight tensor is split column-wise, each GPU multiplies the full input by its column shard independently, and the partial outputs are then gathered from the GPUs and concatenated to produce the final result, as illustrated below.
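The snippet below is a minimal single-process sketch of this column-wise scheme: it simulates the GPU shards on one device so the numerics are easy to verify. The dimensions, the `num_shards` count, and the variable names are illustrative assumptions, not part of any particular framework's tensor-parallel API.

```python
import torch

# Illustrative sizes; num_shards stands in for the number of GPUs.
batch, d_in, d_out = 4, 8, 6
num_shards = 2

x = torch.randn(batch, d_in)   # input activations
W = torch.randn(d_in, d_out)   # first weight tensor

# Column-wise split: each shard holds d_out // num_shards columns of W.
shards = torch.chunk(W, num_shards, dim=1)

# Each "GPU" multiplies the full input by its own column shard.
partial_outputs = [x @ W_shard for W_shard in shards]

# Gather the partial outputs and concatenate along the feature dimension.
y_parallel = torch.cat(partial_outputs, dim=1)

# Sanity check: the sharded result matches the unsharded matmul.
assert torch.allclose(y_parallel, x @ W, atol=1e-6)
```

In a real multi-GPU setup, each shard would live on a different device and the concatenation step would be a collective communication (an all-gather) across GPUs rather than a local `torch.cat`.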