ShipSquad

What is Tensor Parallelism?

AI Infrastructure

Splitting individual model layers across multiple GPUs to serve models that exceed single-GPU memory.

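Tensor parallelism shards each layer's weight matrices across GPUs, so every GPU computes its own portion of the layer at the same time and the partial results are combined through collective communication such as an all-reduce. Combined with pipeline parallelism, which assigns whole groups of layers to different devices, it enables serving models with hundreds of billions of parameters across multi-GPU nodes.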
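
The idea of splitting one layer's weight matrix can be illustrated with a minimal sketch. It uses NumPy on a single machine, with each array shard standing in for one GPU; the function names `column_parallel_linear` and `row_parallel_linear` are hypothetical and chosen for illustration, not taken from any particular framework. Real systems run the shards on separate devices and replace the final concatenate or sum with collective operations.

```python
import numpy as np

def column_parallel_linear(x, weight, num_shards):
    """Split the weight by output columns; each shard yields a slice of the output."""
    shards = np.split(weight, num_shards, axis=1)    # one shard per simulated GPU
    partial_outputs = [x @ w for w in shards]         # would run concurrently on real hardware
    return np.concatenate(partial_outputs, axis=-1)   # stands in for an all-gather

def row_parallel_linear(x, weight, num_shards):
    """Split the weight by input rows; each shard yields a partial sum of the full output."""
    w_shards = np.split(weight, num_shards, axis=0)
    x_shards = np.split(x, num_shards, axis=-1)       # each shard sees only its input slice
    partial_sums = [xs @ ws for xs, ws in zip(x_shards, w_shards)]
    return sum(partial_sums)                          # stands in for an all-reduce

# Illustrative sizes only (assumed, not from the source).
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))    # batch of 4, hidden size 8
w = rng.standard_normal((8, 16))   # full weight matrix of one linear layer

col_out = column_parallel_linear(x, w, num_shards=4)
row_out = row_parallel_linear(x, w, num_shards=4)
```

Both variants reproduce the unsharded product `x @ w` up to floating-point rounding, while each simulated device holds only a quarter of the weights. That per-device reduction in weight memory is what lets a layer fit when the full matrix would not.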
