====== Array jobs for clusters running SGE ======

Suppose we want to run the same script many times, each time with different parameters. The naive solution is to call qsub N times, but this quickly becomes impractical. Array jobs solve this with a single submission.

Array jobs are created with the -t option, given either on the qsub command line or in the script header. The option takes a range of task IDs such as 1-N. Within each task, the environment variable SGE_TASK_ID holds the index of that task, as in the example below:

  #!/bin/sh
  #$ -t 1-10000
  SEEDFILE=~/data/seeds
  # Pick the line of the seed file whose number matches this task's ID
  SEED=$(head -n "$SGE_TASK_ID" "$SEEDFILE" | tail -n 1)
  ~/programs/simulation -s "$SEED" -o ~/results/output.$SGE_TASK_ID

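Before submitting thousands of tasks, it can help to check the seed lookup locally by setting SGE_TASK_ID by hand. The following is only a minimal sketch: the helper name seed_for_task and the three-line temporary seed file are made up for illustration, and sed -n is used here as an equivalent way to pick a single line.

```shell
#!/bin/sh
# Hypothetical helper: print line number $1 of file $2 (the seed for that task).
seed_for_task() {
    sed -n "${1}p" "$2"
}

# Local dry run: build a throwaway three-line seed file and look up task 2.
# Under SGE the scheduler sets SGE_TASK_ID; here it is set by hand.
tmp_seeds=$(mktemp)
printf '%s\n' 11 22 33 > "$tmp_seeds"
SGE_TASK_ID=2
seed_for_task "$SGE_TASK_ID" "$tmp_seeds"   # prints 22
rm -f "$tmp_seeds"
```

If the lookup prints the expected seed for a few hand-picked task IDs, the same logic should behave identically once the scheduler supplies SGE_TASK_ID.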

=== What if you number files from 0 instead of 1? ===

The '-t' option does not accept 0 as part of the range: #$ -t 0-99 is invalid and generates an error. However, the input files can still be labelled from 0 to n-1. This is easy to handle by subtracting 1 from SGE_TASK_ID inside the script:

  #!/bin/sh
  # Tell SGE that this is an array job, with "tasks" numbered 1 to 10000
  #$ -t 1-10000
  # Input files are numbered from 0, so shift the task ID down by one
  i=$((SGE_TASK_ID - 1))
  # Skip tasks whose output file already exists
  if [ ! -e ~/results/output.$i ]
  then
      ~/programs/program -i ~/data/input.$i -o ~/results/output.$i
  fi

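The index shift can be checked locally without a cluster. The sketch below assumes a small array (as if submitted with -t 1-3) and uses a hypothetical helper, file_index, to show which zero-based input file each task would read.

```shell
#!/bin/sh
# Hypothetical helper: map a 1-based SGE task ID to a 0-based file index.
file_index() {
    echo $(( $1 - 1 ))
}

# Dry run for a three-task array: list the input file each task would use.
for task_id in 1 2 3; do
    echo "task $task_id -> input.$(file_index "$task_id")"
done
```

The first task maps to input.0 and the last task of an N-task array maps to input.(N-1), so the whole 0 to n-1 file range is covered.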

=== Limiting the number of concurrent array jobs ===

If we don't want all tasks of an array job to run simultaneously, we can add the option -tc MAX_JOBS. At most MAX_JOBS tasks of the array will then be running on the cluster at the same time.
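
As an illustration, the script header below combines -t with -tc. The limit of 50 is an arbitrary example value, and task_label is a hypothetical helper added only so the script has something to run; the fallback values let it execute outside the cluster, where SGE_TASK_ID and SGE_TASK_LAST are not set.

```shell
#!/bin/sh
# Array of 10000 tasks, with at most 50 of them running at once
# (50 is an example value, not a recommendation).
#$ -t 1-10000
#$ -tc 50

# Hypothetical helper: report which task this is.
# SGE sets SGE_TASK_ID and SGE_TASK_LAST; the defaults apply only in local runs.
task_label() {
    echo "task ${SGE_TASK_ID:-1} of ${SGE_TASK_LAST:-1}"
}

task_label
```

The same limits can presumably be given on the command line instead, for example qsub -t 1-10000 -tc 50 job.sh, where job.sh is a placeholder script name.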