ExpectationMax / simple_gpu_scheduler

simple_gpu_scheduler

A simple scheduler to run your commands on individual GPUs. Following the KISS principle, this script simply accepts commands via stdin and executes them on a specific GPU by setting the CUDA_VISIBLE_DEVICES variable.

The commands read are executed using the login shell, so redirections (>), pipes (|) and all other kinds of shell magic can be used.
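The mechanism described above can be sketched in a few lines of Python. This is a minimal illustration, not the package's actual code: the function name run_on_gpu is hypothetical, the output is captured only so that the result can be inspected, and Python's shell=True uses /bin/sh on POSIX systems rather than the user's login shell.

```python
import os
import subprocess


def run_on_gpu(command: str, gpu: str) -> subprocess.CompletedProcess:
    """Run a shell command that can only see the given GPU (hypothetical helper)."""
    env = os.environ.copy()            # start from the caller's environment
    env["CUDA_VISIBLE_DEVICES"] = gpu  # CUDA programs in the child only see this device
    # shell=True hands the whole string to a shell, so pipes and redirections work.
    return subprocess.run(command, shell=True, env=env,
                          capture_output=True, text=True)
```

For example, run_on_gpu("python train_model.py > log.txt", "1") would start the training script with only GPU 1 visible and redirect its output to log.txt.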

Installation

The package can simply be installed from PyPI:

$ pip install simple_gpu_scheduler

Simple Example

Suppose you have a file gpu_commands.txt with commands that you would like to execute on the GPUs 0, 1 and 2 in parallel:

$ cat gpu_commands.txt
python train_model.py --lr 0.001 --output run_1
python train_model.py --lr 0.0005 --output run_2
python train_model.py --lr 0.0001 --output run_3

Then you can do so by simply passing the commands to the simple_gpu_scheduler script on stdin:

$ simple_gpu_scheduler --gpus 0 1 2 < gpu_commands.txt
Processing command `python train_model.py --lr 0.001 --output run_1` on gpu 2
Processing command `python train_model.py --lr 0.0005 --output run_2` on gpu 1
Processing command `python train_model.py --lr 0.0001 --output run_3` on gpu 0

For further details see simple_gpu_scheduler -h.
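To give an idea of how commands can be spread over several GPUs in parallel, here is a minimal sketch that keeps a queue of idle GPUs and lets each worker thread borrow one for the duration of a command. The function name schedule and the thread-pool design are assumptions made for illustration; the real script may be organised differently.

```python
import os
import queue
import subprocess
from concurrent.futures import ThreadPoolExecutor


def schedule(commands, gpus):
    """Run every command on an idle GPU; return (command, gpu, returncode) tuples."""
    free_gpus = queue.Queue()
    for gpu in gpus:
        free_gpus.put(str(gpu))

    def worker(command):
        gpu = free_gpus.get()  # blocks until some GPU is idle
        try:
            env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
            print(f"Processing command `{command}` on gpu {gpu}")
            returncode = subprocess.run(command, shell=True, env=env).returncode
            return command, gpu, returncode
        finally:
            free_gpus.put(gpu)  # hand the GPU back for the next command

    # One worker thread per GPU, so at most len(gpus) commands run at once.
    with ThreadPoolExecutor(max_workers=len(gpus)) as pool:
        return list(pool.map(worker, commands))
```

Because a GPU is only returned to the queue once its command has finished, no two commands share a GPU at the same time. Results come back in submission order, even though the commands themselves may finish in any order.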

Hyperparameter search

To make the scheduler easy to use in the common scenario of hyperparameter search, a convenience script simple_hypersearch is included in the package. Its output can be piped directly into simple_gpu_scheduler or appended to the “queue file” (see Simple scheduler for jobs).

Grid of all possible parameter configurations in random order:

simple_hypersearch "python3 train_dnn.py --lr {lr} --batch_size {bs}" -p lr 0.001 0.0005 0.0001 -p bs 32 64 128 | simple_gpu_scheduler --gpus 0,1,2

5 uniformly sampled parameter configurations:

simple_hypersearch "python3 train_dnn.py --lr {lr} --batch_size {bs}" --n-samples 5 -p lr 0.001 0.0005 0.0001 -p bs 32 64 128 | simple_gpu_scheduler --gpus 0,1,2

For further information see simple_hypersearch -h.
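The two modes above (full grid in random order, or a fixed number of sampled configurations) can be approximated with a short Python sketch. The function expand and its arguments are hypothetical, the parameter values are only illustrative, and the sketch assumes that sampling means drawing distinct configurations from the full grid without replacement, which may differ from what simple_hypersearch actually does.

```python
import itertools
import random


def expand(template, params, n_samples=None, seed=None):
    """Fill a command template with parameter combinations in random order."""
    rng = random.Random(seed)
    names = list(params)
    # The Cartesian product of all value lists is the full grid.
    grid = [dict(zip(names, values))
            for values in itertools.product(*(params[name] for name in names))]
    if n_samples is None:
        rng.shuffle(grid)                    # whole grid, random order
    else:
        grid = rng.sample(grid, n_samples)   # distinct configurations, no repeats
    return [template.format(**config) for config in grid]


if __name__ == "__main__":
    commands = expand(
        "python3 train_dnn.py --lr {lr} --batch_size {bs}",
        {"lr": ["0.001", "0.0005", "0.0001"], "bs": ["32", "64", "128"]},
        n_samples=5,
    )
    print("\n".join(commands))  # one command per line, ready to be piped onwards
```

Note that rng.sample raises a ValueError if n_samples is larger than the number of grid points.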

Simple scheduler for jobs

Combined with some basic command line tools, one can set up a very basic scheduler which waits for new jobs to be “submitted” and executes them in order of submission.

Set up and start the scheduler in the background or in a separate permanent session (using for example tmux):

touch gpu.queue
tail -f -n 0 gpu.queue | simple_gpu_scheduler --gpus 0,1,2

The command tail -f -n 0 follows the end of the gpu.queue file. Thus anything written into gpu.queue before the command was started will not be passed to simple_gpu_scheduler.
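The behaviour of following only the end of the file can be illustrated with a small Python sketch. The class QueueFollower is hypothetical and only mimics the relevant part of tail -f -n 0: it jumps to the end of the file when opened and afterwards returns only complete lines that were appended later.

```python
import os


class QueueFollower:
    """Return only lines appended after the follower was opened (hypothetical helper)."""

    def __init__(self, path):
        self._fh = open(path, "r")
        self._fh.seek(0, os.SEEK_END)  # skip everything already in the file, like -n 0
        self._partial = ""             # holds a line whose newline has not arrived yet

    def poll(self):
        """Return the complete lines appended since the previous call."""
        lines = []
        while True:
            chunk = self._fh.readline()
            if not chunk:              # reached the current end of the file
                return lines
            self._partial += chunk
            if self._partial.endswith("\n"):
                lines.append(self._partial.rstrip("\n"))
                self._partial = ""

    def close(self):
        self._fh.close()
```

A real follower would call poll() in a loop with a short sleep in between; tail does this for you.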

Then submitting commands boils down to appending text to the gpu.queue file:

echo "my_command_with | and stuff > logfile" >> gpu.queue
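If jobs are generated from a Python script, the same submission step can be done without a shell. The helper submit below is a hypothetical sketch, not part of the package; it appends one command per line and rejects multi-line jobs, since the queue is read line by line.

```python
def submit(command, queue_path="gpu.queue"):
    """Append a single-line command to the queue file (hypothetical helper)."""
    if "\n" in command:
        raise ValueError("multi-line jobs are not supported by this sketch")
    # Mode "a" appends to the end, which is exactly what the following tail picks up.
    with open(queue_path, "a") as fh:
        fh.write(command + "\n")
```

For example, submit("my_command_with | and stuff > logfile") has the same effect as the echo line above.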

TODO

Multi-line jobs (we might then need a submission script after all)
Stop, but let running commands finish, when receiving a defined signal
Tests would be nice. So far the project is still very small, but if it grows, tests should be added