array_jobs

Differences

This shows you the differences between two versions of the page.

array_jobs [2014/02/16 20:17]
rdiazgar
array_jobs [2014/02/18 12:04] (current)
rdiazgar old revision restored (2014/02/17 16:31)
Line 1: Line 1:
-====== Array jobs for clusters running SGE ======
-
-Say that we want to run the exact same script several times, but with different parameters each time. The naive solution would be calling qsub N times, but this is impractical. Instead, array jobs are the solution.
-
-Array jobs are created with the -t parameter (either in the qsub call or in the script header). You must specify a range from 1 to N. The variable SGE_TASK_ID will indicate the i-th call of the array job, as in the example below:
-
-<code>
-#!/bin/sh
-#$ -t 1-10000
-SEEDFILE=~/data/seeds
-# Pick the line of the seed file that matches this task
-SEED=$(sed -n "${SGE_TASK_ID}p" "$SEEDFILE")
-~/programs/simulation -s "$SEED" -o ~/results/output.$SGE_TASK_ID
-</code>
-
-=== What if you number files from 0 instead of 1? ===
-
-The '-t' option will not accept 0 as part of the range, i.e. #$ -t 0-99 is invalid and will generate an error. However, you can label the input files from 0 to n−1. That's easy to deal with:
-
-<code>
-#!/bin/sh
-# Tell the SGE that this is an array job, with "tasks" to be numbered 1 to 10000
-#$ -t 1-10000
-i=$(expr $SGE_TASK_ID - 1)
-if [ ! -e ~/results/output.$i ]
-then
-  ~/programs/program -i ~/data/input.$i -o ~/results/output.$i
-fi
-</code>
-
-=== Limiting the number of concurrent array jobs ===
-
-If we don't want to run all tasks of an array job simultaneously, we can add the parameter -tc MAX_JOBS. This will allow at most MAX_JOBS tasks to be running in the cluster at the same time.
+====== Array Jobs ======
+
+Array jobs are essentially a mechanism for executing the very same script several times. Say, for instance, that you need to run a certain job script N times (for example, to apply a certain action to each of N images). You would typically call the same script N times and change just one parameter (the image index i). With array jobs, you specify the index range you want to execute and SGE takes care of the rest. There are several advantages:
+
+    - Simplification of the job script
+    - Better queue management in the head node of the cluster
+    - Better job organization: a unique JOBID is created for the array job and a separate TASKID is added for each task
+
+In order to execute an array job, simply add the following to a qsub call or script header:
+
+<code>qsub -t 1-N ...
+</code>
+
+Where 1-N is the range you want to cover (note that a range 0-N is invalid).
+
+On the script side, we control the i-th call of our script using the variable SGE_TASK_ID. Take this as an example:
+
+<code>
+#!/bin/bash
+#
+# MatchDist.sh
+# Match a list of key files in a distributed fashion
+
+export PATH=~/software/bundler_sfm/bin:$PATH
+export LD_LIBRARY_PATH=~/software/bundler_sfm/bin:$LD_LIBRARY_PATH
+
+# Positional arguments: $1 list, $2 output suffix, $3 ratio,
+# $4 working directory, $5 temporary directory, $6 output directory
+WORKDIR=$4
+LIST=$1
+TMPDIR=$5
+OUTDIR=$6
+mkdir -p "$TMPDIR"
+mkdir -p "$OUTDIR"
+RATIO=$3
+# SGE_TASK_ID starts at 1; the matcher expects a 0-based index
+I=$((SGE_TASK_ID - 1))
+# Output name: <index>_<suffix> (plain %d assumed; any padding width is not known)
+OUT="$(printf "%d" "$I")_${2}"
+
+cd "$WORKDIR"
+
+echo "KeyMatchSingle $LIST $OUTDIR/$OUT $RATIO $I"
+# Write to the temporary directory first, then copy the result to its final place
+KeyMatchSingle "$LIST" "$TMPDIR/$OUT" "$RATIO" "$I"
+cp "$TMPDIR/$OUT" "$OUTDIR/$OUT"
+</code>
+
+In the example above, SGE_TASK_ID identifies the i-th call of our array job. Because it is 1-based, the script subtracts 1 to obtain the 0-based index I.
+
+==== Preventing too many tasks from running simultaneously ====
+
+If we know that our jobs will demand too many resources and might stall a node, we can prevent this by limiting the number of concurrent tasks for that specific job. Just add the following parameter to the qsub call or to the script header:
+
+<code>
+qsub -t 1-N -tc NMAX ...
+</code>
+
+This will allow at most NMAX tasks to be executed simultaneously.
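
Both revisions rely on turning the 1-based SGE_TASK_ID into either a 0-based index or a line of a list file. The following is a minimal sketch of that mapping, not taken from the page itself: the helper names task_index and task_item are hypothetical, and the fallback to task 1 assumes that SGE_TASK_ID is unset or non-numeric outside an array job (SGE reports it as "undefined" for non-array jobs).

```shell
#!/bin/sh
# Hypothetical helpers for illustration; only SGE_TASK_ID comes from SGE.

# Convert a 1-based task ID into a 0-based index.
task_index() {
    echo $(( ${1:?task id required} - 1 ))
}

# Print the line of a list file that belongs to a 1-based task ID.
# Usage: task_item LIST_FILE TASK_ID
task_item() {
    sed -n "${2}p" "$1"
}

# Assumption: outside an array job SGE_TASK_ID is unset or not a number,
# so fall back to task 1 to allow local test runs.
case "${SGE_TASK_ID:-}" in
    ''|*[!0-9]*) TASK_ID=1 ;;
    *)           TASK_ID="$SGE_TASK_ID" ;;
esac

echo "task ${TASK_ID} -> index $(task_index "$TASK_ID")"
```

The fallback means the same script can be run by hand on a workstation before it is submitted with qsub -t 1-N.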
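
The -t and -tc options can also be placed in the script header as #$ directives, and the check for an existing output file from the older revision makes a resubmitted array job skip finished tasks. The sketch below combines the two under stated assumptions: the range 1-100, the limit of 10, the run_task function, the RESULTS_DIR variable and the output.N naming are all illustrative, and the line that writes the result stands in for the real program.

```shell
#!/bin/sh
# Array job with 100 tasks, at most 10 of them running at the same time.
#$ -t 1-100
#$ -tc 10

# run_task INDEX OUT_DIR  (hypothetical helper)
# Skips the task when its output file already exists, so the whole array
# can be resubmitted after a partial failure without redoing finished work.
run_task() {
    out_file="$2/output.$1"
    if [ -e "$out_file" ]; then
        echo "skip $1"
        return 0
    fi
    # Placeholder for the real work; replace with the actual program call.
    echo "result for input $1" > "$out_file"
    echo "done $1"
}

# Assumed location for results; override with RESULTS_DIR=... if needed.
RESULTS_DIR="${RESULTS_DIR:-${TMPDIR:-/tmp}/array_job_results}"
mkdir -p "$RESULTS_DIR"

# Fall back to task 1 when not running as an array task.
case "${SGE_TASK_ID:-}" in
    ''|*[!0-9]*) TASK_ID=1 ;;
    *)           TASK_ID="$SGE_TASK_ID" ;;
esac

# Input and output files are numbered from 0, task IDs from 1.
run_task "$(( TASK_ID - 1 ))" "$RESULTS_DIR"
```

Options given on the qsub command line take precedence over the embedded directives, so the limit can still be changed at submission time with -tc.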
  
array_jobs.1392610654.txt.gz · Last modified: 2014/02/16 20:17 (external edit)