Condor helper scripts

Thanks to the Condor team for producing a great piece of software and licensing it freely. All scripts on this page are distributed under the GNU General Public License.


qsub

Submit jobs with qsub (enter it without options to see a very brief usage summary). It is always a good idea to try starting your program outside of the queue system first -- often I'll forget a quote or get a path mixed up or something & it will crash immediately. That said, once you're satisfied that your program runs, it's very easy to hand it off to the queue (as well as the 500 others like it but with different parameter values):

qsub './my_program -i some_input -o some_output > redirect_your_stdout 2> and_stderr'

The quotes are only essential if your command contains something that you don't want the shell to apply to the submission program itself (e.g., above I wanted my_program's standard output to be redirected, not qsub's standard output).
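
To see why the quotes matter, here is a small sketch that runs without Condor. show_args is a hypothetical stand-in for qsub (it is not part of these scripts) that just prints whatever arguments it receives, so you can see where the shell applies the redirect in each case.

```shell
#!/bin/sh
# show_args is a hypothetical stand-in for qsub: it only prints its arguments.
show_args() { printf 'received: %s\n' "$*"; }

# Work in a scratch directory so the demo doesn't touch your working directory.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Quoted: the redirect is part of the argument string, so no file is created here.
show_args 'echo hello > job_out.txt'

# Unquoted: the shell redirects show_args's own output before it ever runs.
show_args echo hello > submit_out.txt

ls    # lists only submit_out.txt
```

In the unquoted case the file ends up holding the stand-in's output, which is the same thing that would happen to qsub's output if you left the quotes off.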

If you do not redirect your stdout / stderr AND your program generates output, then the script will create a file in the directory from which you submitted called condor_out.XX.YY or condor_err.XX.YY where XX is the condor "cluster" (the job id) and YY is the condor "process" (a sub-job-id; more info below). Cluster & process ids can be found from within your running program by examining the environment variables $CONDOR_CLUSTER and $CONDOR_PROCESS.
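
Within a job, those two variables can be used to build unique file names. A minimal sketch, assuming a hypothetical wrapper script of your own; the fallback values local and 0 are placeholders chosen for this example so that the script still runs outside the queue, where the variables are unset.

```shell
#!/bin/sh
# Hypothetical job wrapper, not part of the helper scripts.
# Outside the queue CONDOR_CLUSTER / CONDOR_PROCESS are unset, so use placeholders.
cluster="${CONDOR_CLUSTER:-local}"
process="${CONDOR_PROCESS:-0}"

# One output file per cluster/process pair, so replicates never overwrite each other.
outfile="my_output.${cluster}.${process}"
echo "results would go to $outfile"
```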

If you are submitting from a desktop computer, you might want to be notified when your job completes. Modern GNOME-based Linux distributions use the Desktop Notifications Specification and the libnotify library to present notifications that are sent and displayed by the same computer. In our situation, we want to display notifications on the submission computer that are sent from the execution computer. I wrote a small daemon to accomplish this task but haven't gotten around to putting it on the web yet (email me if you want it; note this daemon is entirely optional; qsub runs perfectly well without notifications).

Options to qsub (put these before your command):

-dry-run
Run directly without submitting to condor. This is useful to make sure you have your command straight.
-priority <>
This only affects priority within *YOUR* jobs. But if you submit 100 jobs and then suddenly realize you have a few higher priority ones, it's nice not to have to delete the first 100 to reshuffle the order in the queue. Higher numbers are better (range -20 to +20; default is 0).
-n <>
Number of replicates to run of your *exact* command. Generally useful in conjunction with the $CONDOR_PROCESS environment variable (e.g. redirect output to my_output.$CONDOR_PROCESS and then potentially cat it all back together when all the jobs finish).
-requirements <>
ClassAd job requirements to specify a certain machine, certain amount of memory, etc. Read the condor documentation if you need this.
-cpus <>
Request multiple CPUs from a single SMP machine (defaults to 1). Note that Condor needs to be configured such that slots exist with at least this many CPUs (potentially by using the relatively new configuration option of "partitionable" slots).
-env X=Y
Set environmental variable X to Y when running job.
-force
If you're submitting a job from a path that doesn't look to be network-mounted, then qsub will complain and not submit your job. But if you know what you're doing, then go ahead and force it to submit.
-transfer <>
Enable file transfers (so the job can run on desktop machines that do not share the same filesystem as the computer from which you are submitting) and transfer the specified file. All files (including executables) explicitly mentioned in the command line will automatically be added to the file transfer list, assuming they are 1) in the current directory, 2) have the full path listed in the command line, OR 3) found in your $PATH, outside of system folders.
-exclude <>
Exclude a file mentioned in the command line that would otherwise have been captured by the automatic routine mentioned above.
-et
Enable transfers if you don't need to add any more files to be transferred (-transfer implies -et).
-pre
(Note this option likely will not be useful for you; I used it together with a customized condor configuration for a peculiar circumstance in which preemption was disabled on most machines.) Enable preemption & saving of partial output when transferring. Unfortunately, this forces your job to be transferred even if the execution node is on the same filesystem -- which means that you won't see the partial output appear unless you were to cancel the job. Preemption means that your job may be killed & restarted if suspended for an excessive amount of time (note any partial output files produced will be transferred to the new execution host).
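
As an illustration of the -n option together with $CONDOR_PROCESS, the sketch below shows the "cat it all back together" step. The submission line appears only as a comment, and it assumes that the quoted command is interpreted by a shell on the execution side (as the redirect example near the top suggests), so that $CONDOR_PROCESS is expanded by the job and not by your login shell. The loop merely fakes the files that three replicates would leave behind; the file names and contents are made up for the demo.

```shell
#!/bin/sh
# Submission would look something like this (single quotes keep your login
# shell from expanding $CONDOR_PROCESS too early):
#   qsub -n 3 './my_program > my_output.$CONDOR_PROCESS'

# Fake the per-process output files in a scratch directory.
work=$(mktemp -d)
for p in 0 1 2; do
    echo "result from process $p" > "$work/my_output.$p"
done

# Once all jobs have finished, glue the pieces together.
# The glob sorts as text, so with 10 or more replicates
# my_output.10 comes before my_output.2.
cat "$work"/my_output.* > "$work/combined.txt"
cat "$work/combined.txt"
```

If the order of the pieces matters and you have many replicates, zero-padding the suffix or looping over the process numbers explicitly avoids the text-sort surprise.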

qstat

Check the status of jobs in the queue or currently running. If there are lots of jobs, you might want to filter by your username (e.g. qstat philip) or even by an explicit job id (e.g. qstat 5027).

See man condor_q for more information.


qrm

Remove jobs from the queue (stopping the jobs first if currently running). Type qrm without parameters for a brief usage summary.


qhist

List completed jobs in a format very similar to qstat. You probably want to either filter by username / job id or pipe into head (e.g. qhist philip | head would show my last few jobs that completed or were removed by qrm).


qresub

Resubmit job(s) already in the history (either completed or canceled). You can specify any constraint accepted by condor_history -- job id, username, etc.

Documented by Philip Johnson, updated Nov 2012