condor_submit is the program for submitting jobs for execution under Condor. condor_submit requires a submit description file which contains commands to direct the queuing of jobs. One submit description file may contain specifications for the queuing of many Condor jobs at once. All jobs queued by a single invocation of condor_submit must share the same executable, and are referred to as a job cluster. It is advantageous to submit multiple jobs as a single cluster.
Note that submission of jobs from a Windows machine requires a stashed password to allow Condor to impersonate the user submitting the job. To stash a password, use the condor_store_cred command. See the condor_store_cred manual page for details.
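For example, a user would typically stash the password once before the first submission (a sketch; see the condor_store_cred manual page for the exact options):

condor_store_cred add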
SUBMIT DESCRIPTION FILE COMMANDS
Each submit description file describes one cluster of jobs to be placed in the Condor execution pool. All jobs in a cluster must share the same executable, but they may have different input and output files, and different program arguments. The submit description file is the only command-line argument to condor_submit.
The submit description file must contain one executable command and at least one queue command. All of the other commands have default actions.
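A minimal submit description file can therefore be as short as two lines; for example, to run the executable foo once with all defaults:

Executable = foo
Queue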
The commands which can appear in the submit description file are:
If you define should_transfer_files you must also define when_to_transfer_output (described below). For more information about this and other settings related to transferring files, see section 2.5.4 of the Condor manual.
Setting when_to_transfer_output equal to ON_EXIT will cause Condor to transfer the job's output files back to the submitting machine only when the job completes (exits on its own).
The ON_EXIT_OR_EVICT option is intended for fault tolerant jobs which periodically save their own state and can restart where they left off. In this case, files are transferred to the submit machine any time the job leaves a remote site, either because it exited on its own, or was evicted by the Condor system for any reason prior to job completion. Any output files transferred back to the submit machine are automatically sent back out again as input files if the job restarts.
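For example, a self-checkpointing job might combine these commands as follows (a sketch; should_transfer_files is the command described above):

should_transfer_files = YES
when_to_transfer_output = ON_EXIT_OR_EVICT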
For more information about this and other settings related to transferring files, see section 2.5.4 of the Condor manual.
Only the transfer of files is available; the transfer of subdirectories is not supported.
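For example, assuming this list is the one given with the transfer_input_files command, two input files in the submit directory might be named explicitly:

transfer_input_files = data.1, data.2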
For more information about this and other settings related to transferring files, see section 2.5.4 of the Condor manual.
For more information about this and other settings related to transferring files, see section 2.5.4 of the Condor manual.
requirements = Memory >= 64 && Mips > 45
Only one requirements command may be present in a
submit description file.
By default, condor_submit appends additional clauses to the requirements expression. Also, see the Condor Users Manual for complete information on the syntax and available attributes that can be used in the ClassAd expression.
requirements = Memory > 60
rank = Memory
asks Condor to find all available machines with more than 60 megabytes of memory and give the job the machine with the most memory. See the Condor Users Manual for complete information on the syntax and available attributes that can be used in the ClassAd expression.
For example: Suppose you have a job that occasionally segfaults, but you know that if you run it again on the same data, chances are it will finish successfully. This is how you would represent that with on_exit_remove (assuming the signal identifier for segmentation fault is 11):

on_exit_remove = (ExitBySignal == True) && (ExitSignal != 11)

The above expression will not let the job leave the queue if it exited via a signal and that signal number was 11 (representing segmentation fault). In any other case of the job exiting, it will leave the queue as it normally would have done.
If left unspecified, this will default to True.
periodic_* expressions take precedence over on_exit_* expressions, and *_hold expressions take precedence over *_remove expressions.
This expression is available for the vanilla and java universes. It is additionally available, when submitted from a Unix machine, for the standard universe. Note that the schedd, by default, only checks these periodic expressions once every 300 seconds. The period of these evaluations can be adjusted by setting the PERIODIC_EXPR_INTERVAL configuration macro.
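For instance, to have the schedd evaluate these periodic expressions once per minute instead, the site configuration might contain:

PERIODIC_EXPR_INTERVAL = 60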
For example: Suppose a job is known to run for a minimum of an hour. If the job exits after less than an hour, the job should be placed on hold and an e-mail notification sent, instead of being allowed to leave the queue.
on_exit_hold = (CurrentTime - JobStartDate) < (60 * $(MINUTE))
This expression places the job on hold if it exits for any reason before running for an hour. An e-mail will be sent to the user explaining that the job was placed on hold because this expression became True.
periodic_* expressions take precedence over on_exit_* expressions, and *_hold expressions take precedence over *_remove expressions.
If left unspecified, this will default to False.
This expression is available for the vanilla and java universes. It is additionally available, when submitted from a Unix machine, for the standard universe.
See the Examples section for an example of a periodic_* expression.
periodic_* expressions take precedence over on_exit_* expressions, and *_hold expressions take precedence over *_remove expressions. So the periodic_remove expression takes precedence over the on_exit_remove expression if the two describe conflicting actions.
This expression is available for the vanilla and java universes. It is additionally available, when submitted from a Unix machine, for the standard universe. Note that the schedd, by default, only checks periodic expressions once every 300 seconds. The period of these evaluations can be adjusted by setting the PERIODIC_EXPR_INTERVAL configuration macro.
See the Examples section for an example of a periodic_* expression.
periodic_* expressions take precedence over on_exit_* expressions, and *_hold expressions take precedence over *_remove expressions.
This expression is available for the vanilla and java universes. It is additionally available, when submitted from a Unix machine, for the standard universe. Note that the schedd, by default, only checks periodic expressions once every 300 seconds. The period of these evaluations can be adjusted by setting the PERIODIC_EXPR_INTERVAL configuration macro.
This expression is available for the vanilla and java universes. It is additionally available, when submitted from a Unix machine, for the standard universe. Note that the condor_schedd daemon, by default, only checks periodic expressions once every 300 seconds. The period of these evaluations can be adjusted by setting the PERIODIC_EXPR_INTERVAL configuration macro.
job-owner@UID_DOMAIN
where UID_DOMAIN is specified by the Condor site administrator. If
UID_DOMAIN has not been specified, Condor will send the email
to:
job-owner@submit-machine-name
<parameter>=<value>
Multiple environment variables can be specified by separating them with a semicolon (";") when submitting from a Unix platform, or with a vertical bar ("|") when submitting from an NT platform. These environment variables will be placed (as given) into the job's environment before execution. The environment is currently limited to a total of 10240 characters. Note that spaces are accepted, but rarely desired, characters within parameter names and values; place spaces within the parameter list only if required.
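For example, when submitting from a Unix platform (the variable names here are purely illustrative):

environment = one=1;two=2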
For the MPI universe, a single value (max) is required. It is neither a maximum nor a minimum, but the exact number of machines to be dedicated to running the job.
SIGTSTP, which tells the Condor libraries to initiate a checkpoint of the process. For jobs submitted to the vanilla universe, the default is SIGTERM, which is the standard way to terminate a program in UNIX.
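A sketch, assuming this paragraph describes the kill_sig command and that the job installs its own handler for SIGUSR1 to shut down cleanly:

kill_sig = SIGUSR1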
If your job attempts to access any of the files mentioned in this list, Condor will automatically compress them (if writing) or decompress them (if reading). The compression format is the same as that used by GNU gzip.
The files given in this list may be simple file names or complete paths and may include * as a wildcard. For example, this list causes the file /tmp/data.gz, any file named event.gz, and any file ending in .gzip to be automatically compressed or decompressed as needed:
compress_files = /tmp/data.gz, event.gz, *.gzip
Due to the nature of the compression format, compressed files must only be accessed sequentially. Random access reading is allowed but is very slow, while random access writing is simply not possible. This restriction may be avoided by using both compress_files and fetch_files at the same time. When this is done, a file is kept in the decompressed state at the execution machine, but is compressed for transfer to its original location.
This option only applies to standard-universe jobs.
If your job attempts to access a file mentioned in this list, Condor will automatically copy the whole file to the executing machine, where it can be accessed quickly. When your job closes the file, it will be copied back to its original location. This list uses the same syntax as compress_files, shown above.
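For example, to keep a set of hypothetical data files local to the execution machine while the job runs:

fetch_files = *.data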
This option only applies to standard-universe jobs.
If your job attempts to access a file mentioned in this list, Condor will force all writes to that file to be appended to the end. Furthermore, condor_submit will not truncate it. This list uses the same syntax as compress_files, shown above.
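For example, to force all writes to a hypothetical file run.log to be appends:

append_files = run.log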
This option may yield some surprising results. If several jobs attempt to write to the same file, their output may be intermixed. If a job is evicted from one or more machines during the course of its lifetime, such an output file might contain several copies of the results. This option should only be used when you wish a certain file to be treated as a running log instead of a precise result.
This option only applies to standard-universe jobs.
If your job attempts to access a file mentioned in this list, Condor will cause it to be read or written at the execution machine. This is most useful for temporary files not used for input or output. This list uses the same syntax as compress_files, shown above.
local_files = /tmp/*
This option only applies to standard-universe jobs.
Directs Condor to use a new file name in place of an old one. name
describes a file name that your job may attempt to open, and newname
describes the file name it should be replaced with.
newname may include an optional leading
access specifier, local: or remote:. If left unspecified,
the default access specifier is remote:. Multiple remaps can be
specified by separating each with a semicolon.
This option only applies to standard universe jobs.
If you wish to remap file names that contain equals signs or semicolons, these special characters may be escaped with a backslash.
file_remaps = "dataset.1=other.dataset"
file_remaps = "very.big = local:/bigdisk/bigfile"
file_remaps = "very.big = local:/bigdisk/bigfile ; dataset.1 = other.dataset"
These options only apply to standard-universe jobs.
If needed, you may set the buffer controls individually for each file using the buffer_files option. For example, to set the buffer size to 1 Mbyte and the block size to 256 KBytes for the file input.data, use this command:
buffer_files = "input.data=(1000000,256000)"
Alternatively, you may use these two options to set the default sizes for all files used by your job:
buffer_size = 1000000
buffer_block_size = 256000
If you do not set these, Condor will use the values given by these two configuration file macros:
DEFAULT_IO_BUFFER_SIZE = 1000000
DEFAULT_IO_BUFFER_BLOCK_SIZE = 256000
Finally, if no other settings are present, Condor will use a buffer of 512 Kbytes and a block size of 32 Kbytes.
GlobusScheduler = lego.bu.edu/jobmanager-lsf
queue
LastMatchName0 = "most-recent-Name"
LastMatchName1 = "next-most-recent-Name"
The value for each introduced ClassAd is given by the
value of the Name attribute
from the machine ClassAd of a previous execution (match).
As a job is matched, the definitions for these attributes will roll, with LastMatchName1 becoming LastMatchName2, LastMatchName0 becoming LastMatchName1, and LastMatchName0 being set to the most recent value of the Name attribute.
An intended use of these job attributes is in the requirements expression. The requirements can allow a job to prefer a match with either the same or a different resource than a previous match.
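For example, a job might refuse to match the machine it most recently ran on (a sketch using the LastMatchName0 attribute described above):

requirements = (Target.Name != LastMatchName0)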
#! /bin/sh
# get the host name of the machine
host=`uname -n`
# grab a standard universe executable designed specifically
# for this host
scp elsewhere@cs.wisc.edu:${host} executable
# The PID MUST stay the same, so exec the new standard universe process.
exec executable ${1+"$@"}
If this command is not present, the value defaults to false.
In addition to commands, the submit description file can contain macros and comments:
<macro_name> = <string>
Three pre-defined macros are supplied by the submit description file parser.
The third of the pre-defined macros is only relevant to MPI universe
jobs.
The
$(Cluster) macro supplies the number of the job cluster, and the
$(Process) macro supplies the number of the job. These macros are
intended to aid in the specification of input/output files, arguments,
etc., for clusters with lots of jobs, and/or could be used to supply a
Condor process with its own cluster and process numbers on the command
line. The $(Process) macro should not be used for PVM jobs.
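For example, each job in a cluster can be given its own output and error files:

Output = out.$(Cluster).$(Process)
Error = err.$(Cluster).$(Process)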
The
$(Node) macro is defined only for MPI universe jobs.
It is a unique value assigned for the duration of the job
that essentially identifies the machine on which a program is
executing.
If the dollar sign ("$") is desired as a literal character, then use
$(DOLLAR)
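For example, to pass the literal string $HOME to the job (an illustrative sketch):

Arguments = $(DOLLAR)HOME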
In addition to the normal macro, there is also a special kind of macro called a substitution macro that allows you to substitute expressions defined on the resource machine itself (obtained after a match to the machine has been performed) into specific expressions in your submit description file. The special substitution macro is of the form:
$$(attribute)
The substitution macro may only be used in three expressions in the submit description file: executable, environment, and arguments. The most common use of this macro is for heterogeneous submission of an executable:
executable = povray.$$(opsys).$$(arch)

The opsys and arch attributes will be substituted at match time for any given resource. This allows Condor to automatically choose the correct executable for the matched machine.
An extension to the syntax of the substitution macro provides an alternative string to use if the machine attribute within the substitution macro is undefined. The syntax appears as:
$$(attribute:string_if_attribute_undefined)
An example using this extended syntax provides a path name to a required input file. Since the file can be placed in different locations on different machines, the file's path name is given as an argument to the program.
arguments = $$(input_file_path:/usr/foo)

On the machine, if the attribute input_file_path is not defined, then the path /usr/foo is used instead.
The environment macro, $ENV, allows the evaluation of an environment variable to be used in setting a submit description file command. The syntax used is
$ENV(variable)

An example submit description file command that uses this functionality evaluates the submitter's home directory in order to set the path and file name of a log file:
log = $ENV(HOME)/jobs/logfile

The environment variable is evaluated when the submit description file is processed.
The $RANDOM_CHOICE macro allows a random choice to be made from a given list of parameters at submission time. For an expression, if some randomness needs to be generated, the macro may appear as
$RANDOM_CHOICE(0,1,2,3,4,5,6)
When evaluated, one of the parameter values will be chosen.
condor_submit will exit with a status value of 0 (zero) upon success, and a non-zero value upon failure.
####################
#
# submit description file
# Example 1: queuing multiple jobs with differing
# command line arguments and output files.
#
####################
Executable = foo
Universe = standard
Arguments = 15 2000
Output = foo.out1
Error = foo.err1
Queue
Arguments = 30 2000
Output = foo.out2
Error = foo.err2
Queue
Arguments = 45 6000
Output = foo.out3
Error = foo.err3
Queue
####################
#
# Example 2: Show off some fancy features including
# use of pre-defined macros and logging.
#
####################
Executable = foo
Universe = standard
Requirements = Memory >= 32 && OpSys == "IRIX6" && Arch == "SGI"
Rank = Memory >= 64
Image_Size = 28 Meg
Error = err.$(Process)
Input = in.$(Process)
Output = out.$(Process)
Log = foo.log
Queue 150
condor_submit -a "log = out.log" -a "error = error.log" mysubmitfile

Note that each of the added commands is contained within quote marks because there are space characters within the command.
Including the command
periodic_remove = CumulativeSuspensionTime >
((RemoteWallClockTime - CumulativeSuspensionTime) / 2.0)
in the submit description file causes the job to be removed from the queue once its total suspension time exceeds half of its run time.
+WantCheckpoint = False
in the submit description file before the queue command(s).
U.S. Government Rights Restrictions: Use, duplication, or disclosure by the U.S. Government is subject to restrictions as set forth in subparagraph (c)(1)(ii) of The Rights in Technical Data and Computer Software clause at DFARS 252.227-7013 or subparagraphs (c)(1) and (2) of Commercial Computer Software-Restricted Rights at 48 CFR 52.227-19, as applicable, Condor Team, Attention: Professor Miron Livny, 7367 Computer Sciences, 1210 W. Dayton St., Madison, WI 53706-1685, (608) 262-0856 or miron@cs.wisc.edu.
See the Condor Version 6.6.0 Manual for additional notices.