array_jobs

Differences

This shows you the differences between two versions of the page.

array_jobs [2014/02/16 20:20]
rdiazgar
array_jobs [2014/02/18 12:04] (current)
rdiazgar old revision restored (2014/02/17 16:31)
Line 1: Line 1:
- +====== Array Jobs ======
-====== Array jobs for clusters running SGE ====== +
- +Array jobs are essentially a mechanism for executing the same script several times. Say, for instance, that you need to run a job script N times (for example, to apply the same action to each of N images). You would typically call the script N times and change just one parameter (the image index i). With an array job, you specify the index range you want to execute and SGE takes care of the rest. There are several advantages:
-== The problem == +
- +    - Simplification of the job script
- +    - Better queue management on the head node of the cluster
- +    - Better job organization: a single JOBID is created for the array job, and each task gets its own TASKID
- +
-A common problem is that you have a large number of jobs to run, and they are largely identical in terms of the command to run. For example, you may have 1000 data sets, and you want to run a single program on them, using the cluster. The naive solution is to somehow generate 1000 shell scripts and submit them to the queue. This is not efficient, neither for you nor for the head node. +To execute an array job, simply add the following to the qsub call or to the script header:
- +
- +<code>qsub -t 1-N ...
- +</code>
- +
-== Array jobs are the solution == +Where 1-N is the range you want to cover (note that a range starting at 0, such as 0-N, is invalid).
- +
- +On the script side, we control the i-th call of our script using the variable SGE_TASK_ID. Take this as an example:
- +
- +<code>
-There is an alternative on SGE systems: array jobs. The advantages are: +#!/bin/bash
- +#
- +# MatchDist.sh
-    - You only have to write one shell script. +# Match a list of key files in a distributed fashion
-    - You don’t have to worry about deleting thousands of shell scripts, etc. +
-    - If you submit an array job and realize you’ve made a mistake, you only have one job id to qdel, instead of figuring out how to remove hundreds of them. +export PATH=~/software/bundler_sfm/bin:$PATH
-    - You put less of a burden on the head node. +export LD_LIBRARY_PATH=~/software/bundler_sfm/bin:$LD_LIBRARY_PATH
- +
- +WORKDIR=$
-In fact, there are no disadvantages that I’m aware of. Submitting an array job to do 1000 computations is entirely equivalent to submitting 1000 separate scripts, but much less work for you. +LIST=$
- +TMPDIR=$
- +OUTDIR=$
- +mkdir -p $TMPDIR
- +mkdir -p $OUTDIR
- +RATIO=$3
- +I=$((${SGE_TASK_ID}-1))  # zero-based index of this task
- +OUT=$(echo $(printf "%d" $I)_${2})
- +
-=== Pulling data from the i-th line of a file === +cd $WORKDIR
- +
- +echo "KeyMatchSingle $LIST $OUTDIR/$OUT $RATIO $I"
- +KeyMatchSingle $LIST $TMPDIR/$OUT $RATIO $I
- +cp -f $TMPDIR/$OUT $OUTDIR/$OUT  # copy the result from the temporary to the output directory
-Let’s say you have a list of numbers in a file, one number per line. For example, the numbers could be random number seeds for a simulation. For each task in an array job, you want to get the ''i''<sup>th</sup> line from the file, where ''i'' equals SGE_TASK_ID, and use that value as the seed. This is accomplished by using the Unix head and tail commands. (Read the man pages for those commands; don’t ask me.) +</code>
- +
- +In the example above, the SGE_TASK_ID variable identifies the i-th call of our array job; subtracting 1 from it gives the zero-based index I that is passed to KeyMatchSingle.
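The index arithmetic in the script above can be tried outside the cluster. The following is only a minimal sketch: the function name `task_output_name`, the four-digit zero padding, the suffix `matches.txt`, and the fallback task ID of 1 are all assumptions made for illustration and are not taken from MatchDist.sh.

```shell
#!/bin/bash
# Hypothetical helper: map a 1-based SGE task ID to a zero-based,
# zero-padded output name. Padding width and suffix are assumptions.
task_output_name() {
    local task_id=$1
    local suffix=$2
    # Task IDs start at 1, so subtract 1 to get a zero-based index.
    local index=$((task_id - 1))
    printf "%04d_%s\n" "$index" "$suffix"
}

# Inside an array job SGE sets SGE_TASK_ID; fall back to 1 (assumption)
# so the sketch also runs on a workstation.
task_output_name "${SGE_TASK_ID:-1}" "matches.txt"
```

Keeping the arithmetic in one small function makes it easy to check the mapping for a few task IDs before submitting the whole array.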
- +
-<code> +==== Preventing too many tasks from running simultaneously ====
- +
-#!/bin/sh +If we know that our tasks will demand a lot of resources and might stall a node, we can prevent this by limiting the number of concurrent tasks for that specific job. Just add the following parameter to the qsub call or to the script header:
-#$ -t 1-10000 +
-SEEDFILE=~/data/seeds +<code>
-SEED=$(head -n "$SGE_TASK_ID" "$SEEDFILE" | tail -n 1) +qsub -t 1-N -tc NMAX ...
-~/programs/simulation -s $SEED -o ~/results/output.$SGE_TASK_ID +</code>
- +
-</code> +This will allow at most NMAX tasks to be executed simultaneously.
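The concurrency limit can presumably be written in the script header as well, using the same `#$` directive prefix as the `-t` option. The sketch below assumes an array of 100 tasks with at most 10 running at once; both numbers are placeholders, and the fallback task ID of 1 is an assumption that lets the script run outside SGE, where the `#$` lines are ordinary comments.

```shell
#!/bin/bash
#$ -t 1-100
#$ -tc 10
# The two directives above are assumed example values:
# tasks numbered 1 to 100, no more than 10 running concurrently.

# SGE sets SGE_TASK_ID per task; default to 1 (assumption) elsewhere.
TASK_ID=${SGE_TASK_ID:-1}
MESSAGE="processing task ${TASK_ID}"
echo "$MESSAGE"
```

Putting the limit in the header keeps it with the script, so every submission of that script uses the same cap without having to remember the command-line flag.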
- +
- +
-=== What if you number files from 0 instead of 1? === +
- +
- +
- +
- +
-The '-t' option will not accept 0 as part of the range, i.e. #$ -t 0-99 is invalid and will generate an error. However, I often label my input files from 0 to n−1. That’s easy to deal with: +
- +
- +
-<code> +
- +
-#!/bin/sh +
-# Tell the SGE that this is an array job, with "tasks" to be numbered 1 to 10000 +
-#$ -t 1-10000 +
-# Shift the 1-based task ID to the 0-based file index +
-i=$(expr $SGE_TASK_ID - 1) +
-# Only run the program if this task's output does not exist yet +
-if [ ! -e ~/results/output.$i ] +
-then +
-~/programs/program -i ~/data/input.$i -o ~/results/output.$i +
-fi +
- +
- +
-</code> +
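The existence check in the old script also means that resubmitting the same array does not redo tasks that already finished. The following sketch imitates that behaviour locally; the function `run_task`, the temporary directory standing in for `~/results`, and the `echo` standing in for the real program are all hypothetical.

```shell
#!/bin/sh
# Hypothetical scratch directory standing in for ~/results.
RESULTS=$(mktemp -d)

# Run one task unless its output file already exists.
run_task() {
    i=$(($1 - 1))                 # task IDs start at 1, files at 0
    out="$RESULTS/output.$i"
    if [ ! -e "$out" ]; then
        echo "result for input.$i" > "$out"   # stand-in for the real program
        echo "ran"
    else
        echo "skipped"
    fi
}

FIRST=$(run_task 1)    # output.0 is missing, so the task runs
SECOND=$(run_task 1)   # output.0 now exists, so the task is skipped
echo "$FIRST $SECOND"
```

With this pattern, tasks that failed or were killed can be redone by resubmitting the same array job, provided a failed task does not leave a partial output file behind.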
  
array_jobs.1392610815.txt.gz · Last modified: 2014/02/16 20:20 by rdiazgar