gaudi_spawn
A simple launcher script for distributed training on HPUs.
Single node:
>>> python gaudi_spawn.py --world_size=NUM_CARDS_YOU_HAVE --use_mpi YOUR_TRAINING_SCRIPT.py (--arg1 --arg2 --arg3 and all other arguments of your training script)
Multi node:
>>> python gaudi_spawn.py --hostfile=PATH_TO_HOSTFILE --use_deepspeed YOUR_TRAINING_SCRIPT.py (--arg1 --arg2 --arg3 and all other arguments of your training script)
Functions
Helper function parsing the command line options.
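To illustrate how a launcher of this kind might separate its own options from those of the training script, here is a minimal sketch using argparse. It is not the actual implementation of gaudi_spawn: the function name `parse_launcher_args`, the defaults, and the help strings are assumptions, and only the flags shown in the usage examples above are modelled.

```python
from argparse import REMAINDER, ArgumentParser


def parse_launcher_args(argv=None):
    """Hypothetical parser covering only the flags shown in the usage examples."""
    parser = ArgumentParser(description="Sketch of a distributed-training launcher CLI")
    # Launcher options (defaults here are assumptions, not the real ones).
    parser.add_argument("--world_size", type=int, default=1, help="Number of HPUs to use")
    parser.add_argument("--hostfile", type=str, default=None, help="Path to a hostfile for multi-node runs")
    parser.add_argument("--use_mpi", action="store_true", help="Launch with MPI")
    parser.add_argument("--use_deepspeed", action="store_true", help="Launch with DeepSpeed")
    # First positional argument: the training script to run.
    parser.add_argument("training_script", type=str, help="Path to the training script")
    # Everything after the script path is collected untouched for the script itself.
    parser.add_argument("training_script_args", nargs=REMAINDER, help="Arguments forwarded to the training script")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_launcher_args(["--world_size=8", "--use_mpi", "train.py", "--arg1", "--arg2"])
    print(args.training_script, args.training_script_args)
```

Using `nargs=REMAINDER` for the trailing arguments means that options such as `--arg1` are passed to the training script rather than being interpreted by the launcher.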