3.3 Configuration
This section describes how to configure all parts of the Condor
system. General information about the configuration
files and their syntax is followed by a description of
settings that affect all
Condor daemons and tools. At the end is a section describing the
settings for each part of Condor. The
settings that control the policy under which Condor will start,
suspend, resume, vacate or kill jobs
are described in
section 3.6 on Configuring Condor's Job
Execution Policy.
3.3.1 Introduction to Configuration Files
The Condor configuration files are used to customize how Condor
operates at a given site. The basic configuration as shipped with
Condor works well for most sites, with few exceptions.
See section 3.2 for details on where
Condor's configuration files are found.
Each Condor program will, as part of its initialization process,
configure itself by calling a library routine which parses the
various configuration files that might be used including pool-wide,
platform-specific, machine-specific, and root-owned configuration files.
The result is a list of constants and expressions which are
evaluated as needed at run time.
The order in which attributes are defined is important, since later
definitions will override existing definitions.
This is particularly important if configuration files are broken up
using the LOCAL_CONFIG_FILE setting described in
sections 3.3.2
and 3.10.2 below.
3.3.1.1 Config File Macros
Macro definitions are of the form:
<macro_name> = <macro_definition>
NOTE: You must have white space between the macro name, the
"=" sign, and the macro definition.
Macro invocations are of the form:
$(macro_name)
Macro definitions may contain references to other macros, even ones
that aren't yet defined (so long as they are eventually defined in
your config files somewhere).
All macro expansion is done after all config files have been parsed
(with the exception of macros that reference themselves, described
below).
A = xxx
C = $(A)
is a legal set of macro definitions, and the resulting value of
C is
xxx.
Note that
C is actually bound to
$(A), not its value.
As a further example,
A = xxx
C = $(A)
A = yyy
is also a legal set of macro definitions, and the resulting value of
C is yyy.
A macro may be incrementally defined by invoking itself in its
definition. For example,
A = xxx
B = $(A)
A = $(A)yyy
A = $(A)zzz
is a legal set of macro definitions, and the resulting value of
A
is xxxyyyzzz.
Note that invocations of a macro in
its own definition are immediately
expanded.
$(A) is immediately expanded in line 3 of the example.
If it were not, then the definition would be impossible to
evaluate.
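The two expansion rules above can be sketched in a few lines of Python. This is a toy model, not Condor's actual parser: ordinary macro references are expanded lazily after all files are parsed, while a macro's reference to itself is expanded immediately at definition time.

```python
# A toy model (not Condor's actual parser) of the two expansion rules:
# ordinary references are expanded lazily, after all config files have
# been parsed; a macro's reference to *itself* is expanded immediately.
import re

def define(table, name, value):
    # Immediate self-expansion makes A = $(A)yyy well defined.
    value = value.replace("$(" + name + ")", table.get(name, ""))
    table[name] = value

def expand(table, value):
    # Lazy expansion of everything else, repeated until stable.
    while "$(" in value:
        new = re.sub(r"\$\((\w+)\)",
                     lambda m: table.get(m.group(1), ""), value)
        if new == value:
            break
        value = new
    return value

table = {}
define(table, "A", "xxx")
define(table, "C", "$(A)")     # C is bound to the text "$(A)"
define(table, "A", "$(A)yyy")  # becomes xxxyyy immediately
define(table, "A", "$(A)zzz")  # becomes xxxyyyzzz immediately
print(expand(table, table["C"]))  # prints xxxyyyzzz
```

Because C stays bound to the text $(A) until expansion time, it picks up the final value of A, matching the examples above.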
NOTE: Macros should not be incrementally defined in the
LOCAL_ROOT_CONFIG_FILE for security reasons.
NOTE: Condor used to distinguish between "macros" and "expressions"
in its config files.
Beginning with Condor version 6.1.13, this distinction has been
removed.
For backward compatibility, you can still use ":" instead of "="
in your config files, and these attributes will just be treated as
macros.
3.3.1.2 Special Configuration File Macros
References to the Condor process's environment are allowed in the
configuration file.
Environment references are of the form:
$ENV(environment_variable_name)
For example,
A = $ENV(HOME)
binds A to the value of the HOME environment variable.
Environment references are not currently used in standard Condor
configurations.
However, they can sometimes be useful in custom configurations.
This same syntax is used to allow a random choice of a parameter
within a configuration file.
These references are of the form:
$RANDOM_CHOICE(list of parameters)
This allows a random choice within the parameter list to be made
at configuration time. Of the list of parameters, one is
chosen when encountered during configuration. For example,
if one of the integers 0-8 (inclusive) should be randomly
chosen, the macro usage is
$RANDOM_CHOICE(0,1,2,3,4,5,6,7,8)
See section 7.2 for an actual use of this specialized macro.
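As an illustration of the semantics (this is a sketch, not Condor's implementation), $RANDOM_CHOICE behaves like picking one entry from the list once, at the time the configuration is read:

```python
# An illustrative sketch (not Condor's implementation) of what
# $RANDOM_CHOICE(0,1,2,...) does: one list entry is picked when the
# configuration is read, and that value is then used from that point on.
import random

def random_choice(params):
    # params is the comma-separated list inside $RANDOM_CHOICE(...)
    return random.choice([p.strip() for p in params.split(",")])

value = random_choice("0,1,2,3,4,5,6,7,8")  # one of "0" through "8"
```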
3.3.1.3 Comments and Line Continuations
A Condor configuration file can also contain comments or
line continuations.
A comment is any line beginning with a "#" character.
A continuation is any entry that continues across multiple lines.
Line continuation is accomplished by placing the backslash ("\")
character at the end of any line to be continued onto another.
Valid examples of line continuation are
START = (KeyboardIdle > 15 * $(MINUTE)) && \
((LoadAvg - CondorLoadAvg) <= 0.3)
and
ADMIN_MACHINES = condor.cs.wisc.edu, raven.cs.wisc.edu, \
stork.cs.wisc.edu, ostrich.cs.wisc.edu, \
bigbird.cs.wisc.edu
HOSTALLOW_ADMIN = $(ADMIN_MACHINES)
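The comment and continuation rules can be modeled with a short Python sketch (illustrative only; Condor's real parser handles more cases):

```python
# A short sketch (illustrative only) of the comment and continuation
# rules: "#" lines are dropped, and a trailing backslash splices the
# next physical line onto the current logical line.
def logical_lines(raw_lines):
    buf = ""
    for line in raw_lines:
        line = line.rstrip("\n")
        if line.endswith("\\"):
            buf += line[:-1]       # drop the backslash, keep reading
            continue
        buf += line
        if buf.strip() and not buf.lstrip().startswith("#"):
            yield buf
        buf = ""

raw = [
    "# this comment is ignored",
    "START = (KeyboardIdle > 15 * $(MINUTE)) && \\",
    "  ((LoadAvg - CondorLoadAvg) <= 0.3)",
]
result = list(logical_lines(raw))  # one logical START line
```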
3.3.1.4 Pre-Defined Macros
Condor provides pre-defined macros that help configure Condor.
Pre-defined macros are listed as $(macro_name).
This first set are entries whose values are determined at
run time and cannot be overwritten. These are inserted automatically by
the library routine which parses the configuration files.
- $(FULL_HOSTNAME)
- The
fully qualified hostname of the local machine (hostname plus domain
name).
- $(HOSTNAME)
- The hostname of the local machine (no domain name).
- $(IP_ADDRESS)
- The ASCII string version of the local machine's IP address.
- $(TILDE)
- The full path to the
home directory of the Unix user condor, if such a user exists on the
local machine.
- $(SUBSYSTEM)
- The subsystem
name of the daemon or tool that is evaluating the macro.
This is a unique string which identifies a given daemon within the
Condor system. The possible subsystem names are:
- STARTD
- SCHEDD
- MASTER
- COLLECTOR
- NEGOTIATOR
- KBDD
- SHADOW
- STARTER
- CKPT_SERVER
- SUBMIT
- GRIDMANAGER
This second set of macros are entries whose default values are
determined automatically at runtime but which can be overwritten.
- $(ARCH)
- Defines the string
used to identify the architecture of the local machine to Condor.
The condor_startd will advertise itself with this attribute so
that users can submit binaries compiled for a given platform and
force them to run on the correct machines. condor_submit will
append a requirement to the job ClassAd that it must
run on the same ARCH and OPSYS as the machine where
it was submitted, unless the user specifies ARCH and/or
OPSYS explicitly in the submit file. See the
condor_submit manual page for details.
- $(OPSYS)
- Defines the
string used to identify the operating system of the local machine to
Condor.
If it is not defined in the configuration file, Condor will
automatically insert the operating system of this machine as
determined by uname.
- $(FILESYSTEM_DOMAIN)
- Defaults to the fully
qualified hostname of the machine it is evaluated on. See
section 3.3.5, Shared
File System Configuration File Entries for the full description of
its use and under what conditions you would want to change it.
- $(UID_DOMAIN)
- Defaults to the fully
qualified hostname of the machine it is evaluated on. See
section 3.3.5 on ``Shared
File System Configuration File Entries'' for the full description of
its use and under what conditions you would want to change it.
Since $(ARCH) and $(OPSYS) will automatically be set to the
correct values, we recommend that you do not overwrite them.
Only do so if you know what you are doing.
3.3.2 Condor-wide Configuration File Entries
This section describes settings which affect all parts of the Condor
system.
- CONDOR_HOST
- This macro is
used to define the $(NEGOTIATOR_HOST) and
$(COLLECTOR_HOST) macros. Normally the condor_collector
and condor_negotiator would run on the same machine. If for some
reason they were not run on the same machine,
$(CONDOR_HOST) would not be needed. Some
of the host-based security macros use $(CONDOR_HOST) by
default. See section 3.7.5, Setting up
IP/host-based security in Condor for details.
- COLLECTOR_HOST
- The
hostname of the machine where the condor_collector is running for
your pool. Normally it is defined with the
$(CONDOR_HOST) macro described above.
- NEGOTIATOR_HOST
- The
hostname of the machine where the condor_negotiator is running for
your pool. Normally it is defined with the
$(CONDOR_HOST) macro described above.
- CONDOR_VIEW_HOST
- The
hostname of the machine where the CondorView server is running.
This service is optional, and requires additional configuration if
you want to enable it.
See section 3.10.5 for more details.
- SCHEDD_HOST
- The
hostname of the machine where the condor_schedd is running for
your pool.
- RELEASE_DIR
- The full path to
the Condor release directory, which holds the bin, etc,
lib, and sbin directories.
Other macros are defined relative to this one.
- BIN
- This directory points to the
Condor directory where user-level programs are installed. It
is usually defined relative to the $(RELEASE_DIR) macro.
- LIB
- This directory points to the
Condor directory where libraries used to link jobs for Condor's
standard universe are stored. The condor_compile program uses
this macro to find these libraries, so it must be defined.
$(LIB) is usually defined relative to the
$(RELEASE_DIR) macro.
- SBIN
- This directory points to the
Condor directory where Condor's system binaries (such as the
binaries for the Condor daemons) and administrative tools are
installed. Whatever directory $(SBIN) points to ought
to be in the PATH of users acting as Condor
administrators.
- LOCAL_DIR
- The location of the
local Condor directory on each machine in your pool. One common
option is to use the condor user's home directory which may be
specified with $(TILDE). For example:
LOCAL_DIR = $(tilde)
On machines with a shared file system, where either the
$(TILDE) directory or another directory you want to use is
shared among all machines in your pool, you might use the
$(HOSTNAME) macro and have a directory with many
subdirectories, one for each machine in your pool, each named by
host names. For example:
LOCAL_DIR = $(tilde)/hosts/$(hostname)
or:
LOCAL_DIR = $(release_dir)/hosts/$(hostname)
- LOG
- Used to specify the
directory where each Condor daemon writes its log files. The names
of the log files themselves are defined with other macros, which use
the $(LOG) macro by default. The log directory also acts as
the current working directory of the Condor daemons as they run, so
if one of them should produce a core file for any reason, it would
be placed
in the directory defined by this macro. Normally, $(LOG) is
defined in terms of $(LOCAL_DIR).
- SPOOL
- The spool directory is where
certain files used by the condor_schedd are stored, such as the
job queue file and the initial executables of any jobs that have
been submitted. In addition, for systems not using a checkpoint
server, all the checkpoint files from jobs that have been submitted
from a given machine will be stored in that machine's spool
directory. Therefore, you will want to ensure that the spool
directory is located on a partition with enough disk space. If a
given machine is only set up to execute Condor jobs and not submit
them, it would not need a spool directory (or this macro defined).
Normally, $(SPOOL) is defined in terms of
$(LOCAL_DIR).
- EXECUTE
- This directory acts as
the current working directory of any Condor job that is executing on
the local machine. If a given machine is set up only to submit
jobs and not execute them, it would not need an execute directory
(or this macro defined). Normally, $(EXECUTE) is defined
in terms of $(LOCAL_DIR).
- LOCAL_CONFIG_FILE
- The
location of the local, machine-specific configuration
file for each machine
in your pool. The two most common options would be putting this
file in the $(LOCAL_DIR), or putting all
local configuration files for your pool in a shared directory, each one
named by hostname. For example,
LOCAL_CONFIG_FILE = $(LOCAL_DIR)/condor_config.local
or,
LOCAL_CONFIG_FILE = $(release_dir)/etc/$(hostname).local
or, not using your release directory
LOCAL_CONFIG_FILE = /full/path/to/configs/$(hostname).local
Beginning with Condor version 6.0.1, the $(LOCAL_CONFIG_FILE)
is treated as a list of files, not a single file. You can use
either a comma or space separated list of files as its value. This
allows you to specify multiple files as the local configuration file
and each one will be processed in the order given (with parameters set in
later files overriding values from previous files). This allows
you to use one global configuration file for multiple platforms
in your pool,
define a platform-specific configuration file for each platform, and
use a local configuration file for each machine. For more
information on this, see section 3.10.2 on
Configuring Condor for Multiple Platforms.
- REQUIRE_LOCAL_CONFIG_FILE
- Beginning in Condor 6.5.5, it is permissible for the files listed as the
local config file to be missing. This is most useful for sites that have
large numbers of machines in the pool and a local config file that uses
the hostname built-in macro; instead of keeping an empty file for every host
in the pool, some files can simply be omitted. The default setting is True,
meaning Condor will exit with an error if a file listed as the local config
file cannot be read, unless $(REQUIRE_LOCAL_CONFIG_FILE) is set
to False.
- CONDOR_IDS
- The User ID (UID) and Group ID (GID) pair that the Condor daemons
should run as, if the daemons are spawned as root.
This value can also be specified in the CONDOR_IDS
environment variable.
If the Condor daemons are not started as root, then neither this
CONDOR_IDS configuration macro nor the CONDOR_IDS
environment variable are used.
The value is given by two integers separated by a period. For
example,
CONDOR_IDS = 1234.1234
If this pair is not specified in either the configuration file or in the
environment, and the Condor daemons are spawned as root,
then Condor will
search for a condor
user on the system, and run as that user's
UID and GID.
See section 3.7.1 on UIDs in Condor for more details.
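The "uid.gid" pair format can be parsed as in this sketch (parse_condor_ids is an illustrative helper name, not part of Condor):

```python
# A sketch of parsing the CONDOR_IDS "uid.gid" pair; parse_condor_ids
# is an illustrative helper name, not part of Condor.
def parse_condor_ids(value):
    uid, gid = value.split(".")
    return int(uid), int(gid)

uid, gid = parse_condor_ids("1234.1234")  # (1234, 1234)
```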
- CONDOR_ADMIN
- The email
address that Condor will send mail to if something goes wrong in
your pool. For example, if a daemon crashes, the condor_master
can send an obituary to this address with the last few lines
of that daemon's log file and a brief message that describes what
signal or exit status that daemon exited with.
- CONDOR_SUPPORT_EMAIL
- The email address to be included at the bottom of all email Condor
sends out under the label "Email address of the local Condor
administrator:".
This is the address where Condor users at your site should send
their questions about Condor and get technical support.
If this setting is not defined, Condor will use the address
specified in CONDOR_ADMIN (described above).
- MAIL
- The full path to a mail
sending program that uses -s to
specify a subject for the message. On all platforms,
the default shipped with Condor should work. Only if you
installed things in a non-standard location on your system would you
need to change this setting.
- RESERVED_SWAP
- Determines
how much swap space you want to reserve for your own
machine. Condor will not start up more condor_shadow processes if
the amount of free swap space on your machine falls below this
level.
- RESERVED_DISK
- Determines
how much disk space you want to reserve for your own
machine. When Condor is reporting the amount of free disk space in
a given partition on your machine, it will always subtract this
amount. An example is the condor_startd, which
advertises the amount of
free space in the $(EXECUTE) directory.
- LOCK
- Condor needs to create
lock files to synchronize access to various log files. Because of
problems with network file systems and file locking over
the years, we highly recommend that you put these lock
files on a local partition on each machine. If you do not have your
$(LOCAL_DIR) on a local partition, be sure to change this
entry. Whatever user or group Condor is running as needs to have
write access to this directory. If you are not running as root, this
is whatever user you started up the condor_master as. If you are
running as root, and there is a condor account, it is most
likely condor.
Otherwise, it is whatever you set in the CONDOR_IDS
environment variable, or whatever you define in the
CONDOR_IDS setting in the Condor config files.
See section 3.7.1 on UIDs in Condor for details.
- HISTORY
- Defines the
location of the Condor history file, which stores information about
all Condor jobs that have completed on a given machine. This macro
is used by both the condor_schedd, which appends the information,
and condor_history, the user-level program used to view
the history file.
- DEFAULT_DOMAIN_NAME
- If you do not use a fully qualified name in file /etc/hosts
(or NIS, etc.) for either your official hostname or as an
alias, Condor would not normally be able to use fully qualified names
in places that it wants to. You can set this macro to the
domain to be appended to your hostname, if changing your host
information is not a good option. This macro must be set in the
global configuration file (not the $(LOCAL_CONFIG_FILE).
The reason for this is that the special $(FULL_HOSTNAME)
macro is used by the configuration file code in Condor needs
to know the full hostname. So, for $(DEFAULT_DOMAIN_NAME) to
take effect, Condor must already have read in its value. However,
Condor must set the $(FULL_HOSTNAME) special macro since you
might use that to define where your local configuration file is. After
reading the global configuration file, Condor figures out the right values
for $(HOSTNAME) and $(FULL_HOSTNAME) and inserts them
into its configuration table.
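The qualification step described above can be sketched as follows (full_hostname is an illustrative helper, not a real Condor function):

```python
# A sketch of the qualification step described above; full_hostname is
# an illustrative helper, not a real Condor function.
def full_hostname(hostname, default_domain=None):
    if "." in hostname:
        return hostname                 # already fully qualified
    if default_domain:
        return hostname + "." + default_domain.lstrip(".")
    return hostname                     # no domain information available
```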
- NETWORK_INTERFACE
- For systems with multiple network interfaces, Condor chooses the
first one defined. To choose a network interface other than the
first one, this macro is defined by giving the IP address
to use.
- CM_IP_ADDR
- If neither COLLECTOR_HOST nor
COLLECTOR_IP_ADDR macros are defined, then this
macro will be used to determine the IP address of the central
manager (collector daemon).
This macro is defined by an IP address.
- HIGHPORT
- Specifies an upper limit of given port numbers for Condor to use,
such that Condor is restricted to a range of port numbers.
If this macro is not explicitly specified, then Condor will
not restrict the port numbers that it uses. Condor will use
system-assigned port numbers.
For this macro to work, both HIGHPORT and
LOWPORT (given below) must be defined.
- LOWPORT
- Specifies a lower limit of given port numbers for Condor to use,
such that Condor is restricted to a range of port numbers.
If this macro is not explicitly specified, then Condor will
not restrict the port numbers that it uses. Condor will use
system-assigned port numbers.
For this macro to work, both HIGHPORT (given above) and
LOWPORT must be defined.
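The rule above (the restriction applies only when both ends are defined) can be sketched as:

```python
# A sketch of the rule described above: the restriction applies only
# when both LOWPORT and HIGHPORT are defined; otherwise Condor falls
# back to system-assigned port numbers. Illustrative helper only.
def port_range(lowport=None, highport=None):
    if lowport is None or highport is None:
        return None                     # system-assigned ports
    if lowport > highport:
        raise ValueError("LOWPORT must not exceed HIGHPORT")
    return (lowport, highport)
```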
- EMAIL_DOMAIN
- By default, if a user does not specify notify_user in the
submit description file, any email Condor sends about that job will
go to "username@UID_DOMAIN".
If your machines all share a common UID domain (so that you would
set UID_DOMAIN to be the same across all machines in your
pool), but email to user@UID_DOMAIN is not the right place for
Condor to send email for your site, you can define the default
domain to use for email.
A common example would be to set EMAIL_DOMAIN to the fully
qualified hostname of each machine in your pool, so users submitting
jobs from a specific machine would get email sent to
user@machine.your.domain, instead of user@your.domain.
You would do this by setting EMAIL_DOMAIN to
$(FULL_HOSTNAME).
In general, you should leave this setting commented out unless two
things are true: 1) UID_DOMAIN is set to your domain, not
$(FULL_HOSTNAME), and 2) email to user@UID_DOMAIN will not
work.
- CREATE_CORE_FILES
- Defines whether or not Condor daemons are to
create a core file if something really bad happens. It is
used to set
the resource limit for the size of a core file. If not defined,
it leaves in place whatever limit was in effect
when you started the Condor daemons (normally the condor_ master).
If this parameter is set to TRUE, the limit is increased to
the maximum. If it is set to FALSE, the limit is set to 0
(which means that no core files are created). Core files
greatly help the Condor developers debug any problems you might be
having. By using the parameter, you do not have to worry about
tracking down where in your boot scripts you need to set the core
limit before starting Condor. You set the parameter
to whatever behavior you want Condor to enforce. This parameter has
no default value, and is commented out in the default configuration file.
- Q_QUERY_TIMEOUT
- Defines the timeout (in seconds) that condor_q uses when trying to
connect to the condor_schedd. Defaults to 20.
- UPDATE_COLLECTOR_WITH_TCP
- If your site needs to use TCP connections to send ClassAd updates to
your collector (which it almost certainly does NOT), you can enable
this feature by setting this to TRUE.
Please read section 3.10.11 on "Using TCP to
Send Collector Updates" for more details and a discussion of when you
would need this functionality.
At this time, this setting only affects the main condor_collector
for your site, not any sites that a condor_schedd might flock to.
If you enable this feature, you must also define
COLLECTOR_SOCKET_CACHE_SIZE at your central manager, so
that the collector will accept TCP connections for updates and will
keep them open for reuse.
Defaults to FALSE.
- TRUST_UID_DOMAIN
- As an added security precaution, when Condor is about to spawn a job
it ensures that the UID_DOMAIN (defined above) of a given
submit machine is a substring of that machine's fully-qualified
hostname.
However, at some sites, there may be multiple UID spaces that do
not clearly correspond to Internet domain names.
In these cases, administrators may wish to use names to describe the
UID domains which are not substrings of the hostnames of the
machines.
For this to work, Condor must not do this regular security check.
If the TRUST_UID_DOMAIN setting is defined to TRUE,
Condor will not perform this test, and will trust whatever
UID_DOMAIN is presented by the submit machine when trying
to spawn a job, instead of making sure the submit machine's hostname
matches the UID_DOMAIN.
The default is FALSE, since it is more secure to perform this test.
- PASSWD_CACHE_REFRESH
- Condor can cause NIS servers to become overwhelmed by queries for uid
and group information in large pools. In order to avoid this problem,
Condor caches uid and group information internally. This setting allows
pool administrators to specify (in seconds) how long Condor should wait
before refreshing a cache entry. The default is 300 seconds, or
5 minutes. This means that if a pool administrator updates the user
or group database (e.g. /etc/passwd or /etc/group), it can take up
to 5 minutes before Condor will have the updated information. This
caching feature can be disabled by setting the refresh interval to
0. In addition, the cache can also be flushed explicitly by running
the "condor_reconfig -full" command. This setting has no effect
on Windows.
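A refresh-interval cache of this kind can be sketched in a few lines (the class and the fetch callback are illustrative, not Condor internals):

```python
# A sketch of a refresh-interval cache like the uid/group cache
# described above; the class and the fetch callback are illustrative,
# not Condor internals.
import time

class TTLCache:
    def __init__(self, refresh_seconds=300):
        self.refresh = refresh_seconds
        self.entries = {}  # key -> (value, time fetched)

    def get(self, key, fetch):
        hit = self.entries.get(key)
        if hit is not None and time.time() - hit[1] < self.refresh:
            return hit[0]              # still fresh
        value = fetch(key)             # e.g. a pwd/grp database lookup
        self.entries[key] = (value, time.time())
        return value
```

Note that with a refresh interval of 0 every lookup refetches, which matches the behavior of disabling the cache.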
3.3.3 Daemon Logging Config File Entries
These entries control how and where the Condor daemons write their log
files. Each of the entries in this section represents multiple
macros. There is one for each subsystem (listed
in section 3.3.1).
The macro name for each substitutes SUBSYS with the name
of the subsystem corresponding to the daemon.
- SUBSYS_LOG
- The name of
the log file for a given subsystem. For example,
$(STARTD_LOG) gives the location of the log file for
the condor_startd.
The actual names of the files
are also used in the $(VALID_LOG_FILES) entry used by
condor_preen. A change to one of the
file names with this setting requires a change to the
$(VALID_LOG_FILES) entry as well, or condor_preen will
delete your newly named log files.
- MAX_SUBSYS_LOG
- Controls
the maximum length in bytes to which a
log will be allowed to grow. Each log file will grow to the
specified length, then be saved to a file with the suffix
.old. The .old
files are overwritten each time the log is saved, thus the maximum
space devoted to logging for any one program will be twice the
maximum length of its log file. A value of 0 specifies that the
file may grow without bounds. The default is 64 Kbytes.
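The rotation policy described above can be sketched as follows (an illustration of the policy, not Condor's code):

```python
# A sketch of the rotation policy: once the log reaches the limit it is
# renamed to <log>.old, overwriting any previous .old file, so at most
# twice the limit is ever kept. A limit of 0 disables rotation.
import os

def maybe_rotate(path, max_bytes):
    if max_bytes == 0:
        return False                       # grow without bound
    if os.path.exists(path) and os.path.getsize(path) >= max_bytes:
        os.replace(path, path + ".old")    # clobbers the previous .old
        return True
    return False
```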
- TRUNC_SUBSYS_LOG_ON_OPEN
- If this macro is defined and set
to TRUE, the affected log will be truncated and started from an
empty file with each invocation of the program. Otherwise, new
invocations of the program will append to the previous log
file. By default this setting is FALSE for all daemons.
- SUBSYS_LOCK
- This macro
specifies the lock file used to synchronize append operations to the
log file for this subsystem. It must be a separate file from the
$(SUBSYS_LOG) file, since the $(SUBSYS_LOG) file may be
rotated and you want to be able to synchronize access across log
file rotations. A lock file is only required for log files which
are accessed by more than one process. Currently, this includes
only the SHADOW subsystem. This macro is defined relative
to the $(LOCK) macro. If, for some strange
reason, you decide to change this setting, be sure to change the
$(VALID_LOG_FILES) entry that condor_preen uses as well.
- SUBSYS_DEBUG
- All of the
Condor daemons can produce different levels of output depending on
how much information you want to see. The various levels of
verbosity for a given daemon are determined by this macro. All
daemons have the default level D_ALWAYS, and log messages for
that level will be printed to the daemon's log, regardless of this
macro's setting. The other possible debug levels are:
- D_ALL
-
This flag turns on all debugging output by enabling all of the debug
levels at once. There is no need to list any other debug levels in addition
to D_ALL; doing so would be redundant. Be warned: we are talking
about a HUGE amount of output. If you are interested in just a higher
level of output than the default, consider using D_FULLDEBUG before
using this option.
- D_FULLDEBUG
-
This level
provides verbose output of a general nature into the log files.
Frequent log messages for very specific debugging
purposes would be excluded. In those cases, the messages would
be viewed by having that other flag and D_FULLDEBUG both
listed in the configuration file.
- D_DAEMONCORE
-
Provides log
file entries specific to DaemonCore, such as
timers the daemons have set and the commands that are registered.
If both D_FULLDEBUG and D_DAEMONCORE are set,
expect very verbose output.
- D_PRIV
-
This flag provides log
messages about the privilege state switching that the daemons
do. See section 3.7.1 on UIDs in Condor for details.
- D_COMMAND
-
With this flag set, any
daemon that uses DaemonCore will print out a log message
whenever a command comes in. The name and integer of the command,
whether the command was sent via UDP or TCP, and where
the command was sent from are all logged.
Because the messages about the command used by condor_kbdd to
communicate with the condor_startd whenever there is activity on
the X server, and the command used for keep-alives, are both only
printed with D_FULLDEBUG enabled, it is best to enable this setting
for all daemons.
- D_LOAD
-
The condor_startd keeps track
of the load average on the machine where it is running. Both the
general system load average, and the load average being generated by
Condor's activity there are determined.
With this flag set, the condor_startd
will log a message with the current state of both of these
load averages whenever it computes them. This flag only affects the
condor_startd.
- D_KEYBOARD
-
With this flag set, the condor_startd will print out a log message
with the current values for remote and local keyboard idle time.
This flag affects only the condor_startd.
- D_JOB
-
When this flag is set, the
condor_startd will send to its log file the contents of any
job ClassAd that the condor_schedd sends to claim the
condor_startd for its use. This flag affects only the
condor_startd.
- D_MACHINE
-
When this flag is set,
the condor_startd will send to its log file the contents of
its resource ClassAd when the condor_schedd tries to claim the
condor_startd for its use. This flag affects only the
condor_startd.
- D_SYSCALLS
-
This flag is used to
make the condor_shadow log remote syscall requests and return
values. This can help track down problems a user is having with a
particular job by providing the system calls the job is
performing. If any are failing, the reason for the
failure is given. The condor_schedd also uses this flag for the server
portion of the queue management code. With D_SYSCALLS
defined in SCHEDD_DEBUG there will be verbose logging of all
queue management operations the condor_schedd performs.
- D_MATCH
-
When this flag is
set, the condor_negotiator logs a message for every match.
- D_NETWORK
-
When this flag is set,
all Condor daemons will log a message on every TCP accept, connect,
and close, and on every UDP send and receive. This flag is not
yet fully supported in the condor_shadow.
- D_HOSTNAME
-
When this flag is set, the Condor daemons and/or tools will print
verbose messages explaining how they resolve host names, domain
names, and IP addresses.
This is useful for sites that are having trouble getting Condor to
work because of problems with DNS, NIS or other host name resolving
systems your machines are using.
- D_CKPT
-
When this flag is set,
the Condor process checkpoint support code, which is linked into a STANDARD
universe user job, will output some low-level details about the checkpoint
procedure into the $(SHADOW_LOG).
- D_SECURITY
-
This flag will enable debug messages pertaining to the setup of
secure network communication,
including messages for the negotiation of a socket
authentication mechanism, the management of a session key cache,
and messages about the authentication process itself. See
section 3.7.3 for more information about
secure communication configuration.
- D_PROCFAMILY
-
Condor often needs to manage an entire family of processes, that is, a
process and all descendants of that process. This debug flag will
turn on debugging output for the management of families of processes.
- D_ACCOUNTANT
-
When this flag is set,
the condor_negotiator will output debug messages relating to the computation
of user priorities (see section 3.5).
- ALL_DEBUG
- To make all subsystems
share a debug flag, simply set the parameter ALL_DEBUG
instead of changing all of the individual parameters. For example,
to turn on all debugging in all subsystems, set
ALL_DEBUG = D_ALL
Log files may optionally be specified per debug level as follows:
- SUBSYS_LEVEL_LOG
- This is
the name of a log file for messages at a specific debug level for a
specific subsystem. If the debug level is included in
$(SUBSYS_DEBUG), then all messages of this debug level will be
written both to the $(SUBSYS_LOG) file and the
$(SUBSYS_LEVEL_LOG) file. For example,
$(SHADOW_SYSCALLS_LOG) specifies a log file for all remote
system call debug messages.
- MAX_SUBSYS_LEVEL_LOG
- Similar to MAX_SUBSYS_LOG.
- TRUNC_SUBSYS_LEVEL_LOG_ON_OPEN
- Similar to TRUNC_SUBSYS_LOG_ON_OPEN.
3.3.4 DaemonCore Config File Entries
Please read section 3.8 for details
on DaemonCore. There are certain configuration file settings that
DaemonCore uses which affect all Condor daemons (except the checkpoint
server, shadow, and starter, none of which use DaemonCore yet).
- HOSTALLOW...
- All
macros that begin with either HOSTALLOW or
HOSTDENY are settings for Condor's host-based security.
See section 3.7.5 on Setting up
IP/host-based security in Condor for details on these
macros and how to configure them.
- SETTABLE_ATTRS...
- All
macros that begin with SETTABLE_ATTRS or
SUBSYS_SETTABLE_ATTRS are settings used to restrict the
configuration values that can be changed using the condor_config_val
command.
Section 3.7.5 on Setting up
IP/Host-Based Security in Condor for details on these
macros and how to configure them.
In particular, section 3.7.5
on page
contains details specific to
these macros.
- SHUTDOWN_GRACEFUL_TIMEOUT
- Determines how long
Condor will allow daemons to try their graceful shutdown methods
before they do a hard shutdown. It is defined in terms of seconds.
The default is 1800 (30 minutes).
- AUTHENTICATION_METHODS
- There are many instances when the Condor system needs to authenticate
the identity of a user. For instance, when a job is submitted with
condor_submit, Condor needs to authenticate the user so that the job
goes into the queue and runs with the proper credentials. The
AUTHENTICATION_METHODS parameter is a list of
permitted authentication methods, ordered by
preference. The actual authentication method used is the first method
in the list that both the server and client are able to perform.
Possible values are:
- NTSSPI Use NT's standard LAN-MANAGER challenge-response protocol.
NOTE: This is the default method used on Windows NT.
- FS Use the filesystem to authenticate the user.
The server requests the client to create a specified temporary
file, then the server verifies the ownership of that file. NOTE:
This is the default method used on Unix systems.
- FS_REMOTE Use a shared filesystem to authenticate the user.
This is useful for submitting jobs to a remote schedd.
Similar to FS authentication, except the temporary file to be
created by the user must be on a shared filesystem (AFS, NFS, etc.)
If the client's submit description file does not define the
command rendezvousdir, the initialdir value is used
as the default directory in which to create the temporary file.
NOTE: Normal AFS issues apply here: Condor must be able to write
to the directory used.
- CLAIMTOBE The server should simply trust the client.
NOTE: You had better trust all users who have access to your Condor
pool if you enable CLAIMTOBE authentication.
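For example, a Unix pool that prefers filesystem authentication but is
willing to fall back to trusting the client might use (a sketch, not a
recommended policy):
AUTHENTICATION_METHODS = FS, CLAIMTOBE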
- SUBSYS_ADDRESS_FILE
- Every Condor daemon that uses
DaemonCore has a command port where commands are sent. The
IP/port of the daemon is put in that daemon's ClassAd so that other
machines in the pool can query the condor_collector (which listens
on a well-known port) to find the address of a given daemon on a
given machine. However, tools and daemons executing on the same
machine they wish to communicate with are not required to query the
collector; they look in a file on the local disk to find
the IP/port. Setting this macro will cause daemons to write the
IP/port of their command socket to the specified file. In this way,
local tools will continue to operate, even if the machine running
the condor_collector crashes. Using this file will also generate
slightly less network traffic in your pool, since condor_q,
condor_rm, and others do not have to send any messages over the network to
locate the condor_schedd. This macro is not needed
for the collector or negotiator, since their command sockets are at
well-known ports.
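A minimal sketch for the condor_schedd (the file location here is an
assumption; any path on the local disk will do):
SCHEDD_ADDRESS_FILE = $(LOG)/.schedd_address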
- SUBSYS_EXPRS
- Allows any DaemonCore daemon to advertise arbitrary
expressions from the configuration file in its ClassAd. Give the
comma-separated list of entries from the configuration file you want in the
given daemon's ClassAd.
NOTE: The condor_negotiator and condor_kbdd do not send
ClassAds now, so this entry does not affect them. The
condor_startd, condor_schedd, condor_master, and
condor_collector do send ClassAds, so those would be valid
subsystems to set this entry for.
Setting $(SUBMIT_EXPRS) has the slightly
different effect of having the named expressions inserted into all
the job ClassAds that condor_submit creates. This is equivalent
to the ``+'' syntax in submit files. See the
condor_submit manual page
for details on using the ``+''
syntax to add attributes to the job ClassAd.
Attributes defined in the submit description file with ``+'' will
override attributes defined in the config file with
$(SUBMIT_EXPRS).
The condor_startd $(STARTD_EXPRS) defaults to
``JobUniverse''.
Because of the different syntax of the configuration
file and ClassAds, a little extra work is required to get a
given entry into a ClassAd. In particular, ClassAds require quote
marks (") around strings. Numeric values and boolean expressions
can go in directly.
For example, if the startd is to advertise a string macro, a numeric
macro, and a boolean expression, do something similar to:
STRING = This is a string
NUMBER = 666
BOOL1 = True
BOOL2 = CurrentTime >= $(NUMBER) || $(BOOL1)
MY_STRING = "$(STRING)"
STARTD_EXPRS = MY_STRING, NUMBER, BOOL1, BOOL2
3.3.5 Shared File System Configuration File Macros
These macros control how Condor interacts with various shared and
network file systems. If you are using AFS as your shared filesystem,
be sure to read section 3.10.1 on Using Condor with
AFS.
- UID_DOMAIN
- The UID_DOMAIN macro
is used to decide under which user to run your jobs.
If the $(UID_DOMAIN)
on the submitting machine is different than
the $(UID_DOMAIN)
on the machine that runs your job, then Condor will run
the job as the user called ``nobody''.
For example, if the submit machine has
the $(UID_DOMAIN)
``flippy.cs.wisc.edu'' and the machine where the job will execute
has the $(UID_DOMAIN)
``cs.wisc.edu'', the job will run as user nobody, because
the two $(UID_DOMAIN)s are not the same.
If the $(UID_DOMAIN)
is the same on both the submit and execute machines,
then Condor will run the job as the user that submitted the job.
A further check attempts to ensure that the submitting
machine cannot lie about its $(UID_DOMAIN).
Condor compares the
submit machine's claimed $(UID_DOMAIN)
to its fully qualified host name.
If the claimed domain is not a suffix of the fully qualified name,
the submit machine
is presumed to be lying about its $(UID_DOMAIN).
In this case, Condor will run the job as user nobody.
For example, a job submission to the Condor pool at the UW Madison
from ``flippy.example.com'', claiming a $(UID_DOMAIN)
of ``cs.wisc.edu'',
will run the job as the user nobody.
Because of this verification, you need to set your
$(UID_DOMAIN)
to a real domain name. At the Computer Sciences department
at the UW Madison, we set the $(UID_DOMAIN)
to be ``cs.wisc.edu'' to
indicate that whenever someone submits from a department machine, we
will run the job as the user who submits it.
Also see SOFT_UID_DOMAIN
below for information about one more check
that Condor performs before running a job as a given user.
A few details:
You could set $(UID_DOMAIN)
to ``*''. This will match all domains,
but it is a gaping security hole. It is not recommended.
You can set $(UID_DOMAIN)
to ``none'' or leave it undefined. This will
force Condor to always run jobs as user nobody.
Running standard universe jobs as user nobody enhances
your security and should cause no problems, because the jobs use remote
I/O to access all of their files.
However, if vanilla jobs are run as
user nobody, then files that need to be accessed by the job will need
to be marked as world readable/writable so the user nobody can access
them.
When Condor sends e-mail about a job, Condor sends the e-mail to
user@UID_DOMAIN.
If $(UID_DOMAIN)
is set to ``none'' or it is undefined,
the e-mail is
sent to user@submitmachinename.
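For example, a hypothetical site where every submit and execute machine is
in the example.edu domain (and should run jobs as the submitting user)
would set:
UID_DOMAIN = example.edu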
- SOFT_UID_DOMAIN
- When Condor is about to run a job as a particular user (instead of the
user nobody), it verifies that the UID given for the user is in the
password file and actually matches the given user name.
However, some
installations may not have every user in every machine's password
file, so this check can fail. If you prefer that Condor not do
this check because users are not in every password file, set
this attribute to True.
- VMx_USER
- The name of a user for Condor to use instead of
user nobody,
as part of a solution that plugs a security hole whereby
a lurker process can prey on a subsequent job run as user
nobody.
x is an integer associated with virtual machines.
On Windows, VMx_USER
will only work if the credential of the specified
user is stored on the execute machine using condor_ store_cred.
See Section 3.7.1 for more information.
- EXECUTE_LOGIN_IS_DEDICATED
- When set to True,
forces Condor to use the users given by the VMx_USER
configuration variable.
Defaults to False.
- FILESYSTEM_DOMAIN
- The FILESYSTEM_DOMAIN
macro is an arbitrary string that is used to decide if
two machines (a submitting machine and an execute machine) share a
file system.
Although the macro name contains the word ``DOMAIN'',
the macro is not required to be a domain name, though it often is.
Vanilla Unix jobs currently require a shared file system in order to
share any data files or see the output of the program.
Condor decides if there is a shared filesystem by comparing the values
of
$(FILESYSTEM_DOMAIN)
of both the submitting and execute machines.
If the values are the same,
Condor assumes there is a shared file system.
Condor implements the check
by extending the Requirements expression for your job.
You can see these requirements by using the -v argument
to condor_submit.
Note that this implementation is not ideal: machines may share some
file systems but not others. Condor currently has no way to express
this automatically. You can express the need to use a
particular file system by adding additional attributes to your machines
and submit files, similar to the example given in
Frequently Asked Questions,
section 7 on
how to run jobs only on machines that have
certain software installations.
Note that if you do not set
$(FILESYSTEM_DOMAIN), Condor will
automatically set the macro's value to be the fully qualified hostname
of the local machine.
Since each machine will have a different
$(FILESYSTEM_DOMAIN),
they will not be considered to have shared file systems.
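A sketch, assuming all machines at a hypothetical site mount the same file
servers and should therefore be treated as sharing a file system:
FILESYSTEM_DOMAIN = example.edu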
- RESERVE_AFS_CACHE
- If
your machine is running AFS and the AFS cache lives on the same
partition as the other Condor directories, and you want Condor to
reserve the space that your AFS cache is configured to use, set this
macro to TRUE. It defaults to FALSE.
- USE_NFS
- This macro influences
how Condor jobs running in the standard universe access their
files. Condor will redirect the file I/O requests
of standard universe jobs to be executed on the machine which
submitted the job. Because of this, as a Condor job migrates around
the network, the file system always appears to be identical to the
file system where the job was submitted. However, consider the case
where a user's data files are sitting on an NFS server. The machine
running the user's program will send all I/O over the network to the
machine which submitted the job, which in turn sends all the I/O
over the network a second time back to the NFS file server. Thus,
all of the program's I/O is being sent over the network twice.
If this macro is set to TRUE, then Condor will attempt to
read/write files without redirecting I/O back to the submitting
machine if both the submitting machine and the machine running the job
are accessing the same NFS servers (if they are both in the
same $(FILESYSTEM_DOMAIN) and in the same $(UID_DOMAIN),
as described above). The result is that I/O performed by Condor standard
universe jobs is only sent over the network once.
While sending all file operations over the network twice might sound
really bad, unless you are operating over networks where bandwidth
is at a very high premium, practical experience reveals that this
scheme offers very little real performance gain. There are also
some (fairly rare) situations where this scheme can break down.
Setting $(USE_NFS) to FALSE is always safe. It may result
in slightly more network traffic, but Condor jobs are most often heavy
on CPU and light on I/O. It also ensures that a remote
standard universe Condor job will always use Condor's remote system
calls mechanism to reroute I/O and therefore see the exact same
file system that the user sees on the machine where she/he submitted
the job.
Some gritty details for folks who want to know: if you set
$(USE_NFS) to TRUE, and the $(FILESYSTEM_DOMAIN) of
both the submitting machine and the remote machine about to execute
the job match, and the $(FILESYSTEM_DOMAIN) claimed by the
submit machine is indeed found to be a suffix of what an inverse
lookup to a DNS (domain name server) reports as the fully qualified
domain name for the submit machine's IP address (this security
measure safeguards against the submit machine lying),
then the job will access files using a local system call,
without redirecting them to the submitting machine (with
NFS). Otherwise, the system call will get routed back to the
submitting machine using Condor's remote system call mechanism.
NOTE: When submitting a vanilla job, condor_submit will, by default,
append requirements to the job ClassAd specifying that the machine to run
the job must be in the same $(FILESYSTEM_DOMAIN) and the same
$(UID_DOMAIN).
- IGNORE_NFS_LOCK_ERRORS
- When set to TRUE, all errors related to file locking on
NFS are ignored.
Defaults to FALSE, not ignoring errors.
- USE_AFS
- If your machines have AFS,
this
macro determines whether Condor will use remote system calls for
standard universe jobs to send I/O requests to the submit machine,
or if it should use local file access on the execute machine (which
will then use AFS to get to the submitter's files). Read the
setting above on $(USE_NFS) for a discussion of why you might
want to use AFS access instead of remote system calls.
One important difference between $(USE_NFS) and
$(USE_AFS) is the AFS cache. With $(USE_AFS) set to
TRUE, the remote Condor job executing on some machine will start
modifying the AFS cache, possibly evicting the machine owner's
files from the cache to make room for its own. Generally speaking,
since we try to minimize the impact of having a Condor job run on a
given machine, we do not recommend using this setting.
While sending all file operations over the network twice might sound
really bad, unless you are operating over networks where bandwidth
is at a very high premium, practical experience reveals that this
scheme offers very little real performance gain. There are also
some (fairly rare) situations where this scheme can break down.
Setting $(USE_AFS) to FALSE is always safe. It may result
in slightly more network traffic, but Condor jobs are usually heavy
on CPU and light on I/O. FALSE ensures that a remote
standard universe Condor job will always see the exact same
file system that the user sees on the machine where he/she
submitted the job. It also ensures that the machine where the
job executes does not have its AFS cache modified as a result of
the Condor job being there.
However, things may be different at your site, which is why the
setting is there.
3.3.6 Checkpoint Server Configuration File Macros
These macros control whether or not Condor uses a checkpoint server.
If you are using a checkpoint server, this section
describes the settings that the checkpoint server itself needs
defined. A checkpoint server is installed
separately. It is not included in the main Condor binary
distribution or installation procedure. See
section 3.4.2 on Installing a Checkpoint Server
for details on installing and running a checkpoint server for your
pool.
NOTE: If you are setting up a machine to join the UW-Madison CS
Department Condor pool, you should configure the machine to
use a checkpoint server, and use ``condor-ckpt.cs.wisc.edu'' as the
checkpoint server host (see below).
- CKPT_SERVER_HOST
- The
hostname of a checkpoint server.
- STARTER_CHOOSES_CKPT_SERVER
- If this parameter is TRUE
or undefined on
the submit machine, the checkpoint server specified by
$(CKPT_SERVER_HOST) on the execute machine is used. If it is
FALSE on the submit machine, the checkpoint server
specified by $(CKPT_SERVER_HOST) on the submit machine is
used.
- CKPT_SERVER_DIR
- The
checkpoint server needs this macro defined to the full path of the
directory the server should use to store checkpoint files.
Depending on the size of your pool and the size of the jobs your
users are submitting, this directory (and its subdirectories) might
need to store many Mbytes of data.
- USE_CKPT_SERVER
- A boolean
which determines if you want a given submit machine to use a
checkpoint server if one is available. If a
checkpoint server isn't available or USE_CKPT_SERVER is set to
False, checkpoints will be written to the local $(SPOOL) directory on
the submission machine.
- MAX_DISCARDED_RUN_TIME
- If the shadow is unable to read a
checkpoint file from the checkpoint server, it keeps trying only if
the job has accumulated more than this many seconds of CPU usage.
Otherwise, the job is started from scratch. Defaults to 3600 (1
hour). This setting is only used if $(USE_CKPT_SERVER) is
TRUE.
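Putting these together, a submit machine at a hypothetical site using a
dedicated checkpoint server might set (the host name is a placeholder):
USE_CKPT_SERVER = True
CKPT_SERVER_HOST = checkpoint.example.edu
STARTER_CHOOSES_CKPT_SERVER = True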
3.3.7 condor_ master Configuration File Macros
These macros control the condor_ master.
- DAEMON_LIST
- This macro
determines what daemons the condor_ master will start and keep its
watchful eyes on. The list is a comma or space separated list of
subsystem names (listed in
section 3.3.1). For example,
DAEMON_LIST = MASTER, STARTD, SCHEDD
NOTE: On your central manager, your $(DAEMON_LIST)
will be different from your regular pool, since it will include
entries for the condor_ collector and condor_ negotiator.
NOTE: On machines running Digital Unix or IRIX, your
$(DAEMON_LIST) will also include KBDD, for the
condor_ kbdd, which is a special daemon that runs to monitor
keyboard and mouse activity on the console. It is only with this
special daemon that we can acquire this information on those
platforms.
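For example, a central manager that also submits and runs jobs might use
something like:
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, STARTD, SCHEDD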
- DC_DAEMON_LIST
- This macro
lists the daemons in DAEMON_LIST which use the Condor
DaemonCore library. The condor_ master must differentiate between
daemons that use DaemonCore and those that don't so it uses the
appropriate inter-process communication mechanisms. This list
currently includes all Condor daemons except the checkpoint server
by default.
- SUBSYS
- Once you have defined which
subsystems you want the condor_ master to start, you must provide
it with the full path to each of these binaries. For example:
MASTER = $(SBIN)/condor_master
STARTD = $(SBIN)/condor_startd
SCHEDD = $(SBIN)/condor_schedd
These are most often defined relative to the $(SBIN) macro.
- DAEMONNAME_ENVIRONMENT
- For each subsystem defined in DAEMON_LIST, you may specify
changes to the environment that daemon is started with by setting
DAEMONNAME_ENVIRONMENT, where DAEMONNAME is the name of
a daemon listed in DAEMON_LIST. It should be set to a semicolon
delimited list of name=value pairs. For example, if you wish to redefine the
TMP and CONDOR_CONFIG environment variables seen by the
condor_ schedd, you could place the following in the config file:
SCHEDD_ENVIRONMENT = TMP=/new/value;CONDOR_CONFIG=/special/config
When the condor_ schedd was started by the condor_ master, it would
see the specified values of TMP and CONDOR_CONFIG.
- SUBSYS_ARGS
- This macro
allows the specification of additional command line arguments for any
process spawned by the condor_ master.
List the desired arguments as if typing
the command line into the configuration file.
Set the arguments for a specific daemon with this macro,
and the macro will affect only that daemon. Define
one of these for each daemon the condor_ master is controlling.
For example, set $(STARTD_ARGS) to specify any extra
command line arguments to the condor_ startd.
- PREEN
- In addition to the daemons
defined in $(DAEMON_LIST), the condor_ master also starts up
a special process, condor_preen, to clean out junk files that have
been left lying around by Condor. This macro determines where the
condor_master finds the condor_preen binary.
Comment out this macro, and condor_preen will not run.
- PREEN_ARGS
- Controls how condor_ preen behaves by allowing the specification
of command-line arguments.
This macro works as $(SUBSYS_ARGS) does.
The difference is that you must specify this macro for
condor_preen if you want it to do anything, since
condor_preen takes action only
because of command line arguments:
-m means you want e-mail about files condor_preen finds that it
thinks it should remove;
-r means you want condor_preen to actually remove these files.
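A sketch that both mails the administrator and removes the junk files
(the daily interval shown is the documented default):
PREEN = $(SBIN)/condor_preen
PREEN_ARGS = -m -r
PREEN_INTERVAL = 86400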
- PREEN_INTERVAL
- This macro
determines how often condor_ preen should be started. It is
defined in terms of seconds and defaults to 86400 (once a day).
- PUBLISH_OBITUARIES
- When a daemon crashes, the condor_ master can send e-mail to the
address specified by $(CONDOR_ADMIN) with an obituary letting
the administrator know that the daemon died, the cause of
death (which signal or exit status it exited with), and
(optionally) the last few entries from that daemon's log file. If
you want obituaries, set this macro to TRUE.
- OBITUARY_LOG_LENGTH
- This macro controls how many lines
of the log file are part of obituaries.
- START_MASTER
- If this setting
is defined and set to FALSE when the condor_ master starts up, the first
thing it will do is exit. This appears strange, but perhaps you
do not want Condor to run on certain machines in your pool, yet
the boot scripts for your entire pool are handled by a centralized
system that starts up the condor_ master automatically. This is
an entry you would most likely find in a local configuration file,
not a global configuration file.
- START_DAEMONS
- This macro
is similar to the $(START_MASTER) macro described above.
However, the condor_master itself does not exit; it simply does not start
any of the daemons listed in the $(DAEMON_LIST).
The daemons may be started at a later time with a condor_on
command.
- MASTER_UPDATE_INTERVAL
- This macro determines how often
the condor_ master sends a ClassAd update to the
condor_ collector. It is defined in seconds and defaults to 300
(every 5 minutes).
- MASTER_CHECK_NEW_EXEC_INTERVAL
- This
macro controls how often the condor_ master checks the timestamps
of the running daemons. If any daemons have been modified, the
master restarts them. It is defined in seconds and defaults to 300
(every 5 minutes).
- MASTER_NEW_BINARY_DELAY
- Once the condor_ master has
discovered a new binary, this macro controls how long it waits
before attempting to execute the new binary. This delay exists
because the condor_ master might notice a new binary while it
is in the process of being copied,
in which case trying to execute it yields
unpredictable results. The entry is defined in seconds and
defaults to 120 (2 minutes).
- SHUTDOWN_FAST_TIMEOUT
- This macro determines the maximum
amount of time daemons are given to perform their
fast shutdown procedure before the condor_ master kills them
outright. It is defined in seconds and defaults to 300 (5 minutes).
- MASTER_BACKOFF_FACTOR
- If a daemon keeps crashing, an
exponential back off waits longer and longer before
restarting it. At the end of this section, there is an example that
shows how all these settings work. This setting is the base of the
exponent used to determine how long to wait before starting the
daemon again. It defaults to 2 seconds.
- MASTER_BACKOFF_CEILING
- This entry determines the maximum
amount of time you want the master to wait between attempts to start
a given daemon. (With 2.0 as the $(MASTER_BACKOFF_FACTOR),
the 1 hour ceiling is reached after 12 restarts.) It is defined in terms of
seconds and defaults to 3600 (1 hour).
- MASTER_RECOVER_FACTOR
- A macro to set how long a daemon
needs to run without crashing before it is considered recovered.
Once a
daemon has recovered, the number of restarts is reset, so the
exponential back off returns to its initial state.
The macro is defined in
terms of seconds and defaults to 300 (5 minutes).
For clarity, the following is an example of the workings of
the exponential back off settings. The example is worked out assuming
the default settings.
When a daemon crashes, it is restarted in 10 seconds. If it keeps
crashing, a longer amount of time is waited before restarting.
The length of time is based on how
many times it has been restarted.
Take the $(MASTER_BACKOFF_FACTOR) (defaults to 2) to
the power of the number of times the daemon has restarted, and add 9.
An example:
1st crash: restarts == 0, so, 9 + 2^0 = 9 + 1 = 10 seconds
2nd crash: restarts == 1, so, 9 + 2^1 = 9 + 2 = 11 seconds
3rd crash: restarts == 2, so, 9 + 2^2 = 9 + 4 = 13 seconds
...
6th crash: restarts == 5, so, 9 + 2^5 = 9 + 32 = 41 seconds
...
9th crash: restarts == 8, so, 9 + 2^8 = 9 + 256 = 265 seconds
After the 13th crash, it would be:
13th crash: restarts == 12, so, 9 + 2^12 = 9 + 4096 = 4105 seconds
This is bigger than the $(MASTER_BACKOFF_CEILING), which
defaults to 3600, so the daemon would really be restarted after only
3600 seconds, not 4105.
The condor_ master tries again every hour (since the numbers would
get larger and would always be capped by the ceiling).
Eventually, imagine that daemon finally started and did not crash.
This might happen if, for example, an administrator reinstalled
an accidentally deleted binary after receiving e-mail about
the daemon crashing.
If it stayed alive for
$(MASTER_RECOVER_FACTOR) seconds (defaults to 5 minutes),
the count of how many restarts this daemon has performed is reset,
so the restart delay returns to 10 seconds.
The moral of the example is that
the defaults work quite well, and you probably
will not want to change them for any reason.
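The back off arithmetic above can be sketched in a few lines of Python (a
paraphrase of the description in this section, not Condor's actual
implementation):

```python
def restart_delay(restarts, factor=2.0, ceiling=3600):
    """Seconds the master waits before the next restart attempt.

    Mirrors the worked example: 9 + factor**restarts, capped at the ceiling.
    """
    return min(9 + factor ** restarts, ceiling)

# First crash restarts in 10 seconds; by the 13th crash the ceiling applies.
print(restart_delay(0))   # 10.0
print(restart_delay(5))   # 41.0
print(restart_delay(12))  # 3600
```

Note how the cap from $(MASTER_BACKOFF_CEILING) is what keeps the master
retrying every hour rather than backing off forever.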
- MASTER_EXPRS
- This macro is
described in section 3.3.4 as
SUBSYS_EXPRS.
- MASTER_DEBUG
- This macro
is described in section 3.3.3 as
SUBSYS_DEBUG.
- MASTER_ADDRESS_FILE
- This macro is described in
section 3.3.4 as
SUBSYS_ADDRESS_FILE.
- SECONDARY_COLLECTOR_LIST
- This macro lists the host names
of secondary collectors. A secondary collector is a machine
running a condor_ collector daemon that is not the central manager.
A secondary collector makes it possible to execute administrative
commands in the pool when the central manager is down by using the
-pool argument to specify the name of a secondary collector to
use to locate the condor_ master daemon.
- ALLOW_ADMIN_COMMANDS
- If set to NO for a given host, this
macro disables administrative commands, such as
condor_ restart, condor_ on, and condor_ off, to that host.
- MASTER_INSTANCE_LOCK
- Defines the name of a file for the condor_ master daemon
to lock in order to prevent multiple condor_ masters
from starting.
This is useful when using shared file systems like NFS which do
not technically support locking in the case where the lock files
reside on a local disk.
If this macro is not defined, the default file name will be
$(LOCK)/InstanceLock.
$(LOCK) can instead be defined to
specify the location of all lock files, not just the
condor_ master's InstanceLock.
If $(LOCK) is undefined, then the master log itself is locked.
3.3.8 condor_ startd Configuration File Macros
NOTE: If you are running Condor on a multi-CPU machine, be sure
to also read section 3.10.6,
which describes how to set up and
configure Condor on SMP machines.
These settings control general operation of the condor_ startd.
Information on how to configure the condor_ startd to start, suspend,
resume, vacate and kill remote Condor jobs is found in
section 3.6 on
Configuring The Startd Policy. In that section is
information on the startd's states and activities.
Macros in the configuration file not described here are ones that
control state or activity transitions within the
condor_ startd.
- STARTER
- This macro holds the
full path to the condor_ starter binary that the startd should
spawn.
It is normally defined relative to $(SBIN).
- ALTERNATE_STARTER_1
- This macro holds the full path to the condor_ starter.pvm
binary that the startd spawns to service PVM jobs. It is normally
defined relative to $(SBIN), since by default,
condor_ starter.pvm is installed in the regular Condor release
directory.
- POLLING_INTERVAL
- When a
startd enters the claimed state, this macro determines how often
the state of the machine is polled to check the need to suspend, resume,
vacate or kill the job. It is defined in terms of seconds and defaults to
5.
- UPDATE_INTERVAL
- Determines how often the startd should send a ClassAd update
to the condor_collector. The startd also sends an update on any
state or activity change, or if the value of its START expression
changes. See section 3.6.5 on condor_ startd
States, section 3.6.6 on condor_ startd
Activities, and section 3.6.3 on condor_ startd
START expression for details on states, activities, and the
START expression. This macro is defined in
terms of seconds and defaults to 300 (5 minutes).
- STARTD_HAS_BAD_UTMP
- When the startd is computing the idle time of all the
users of the machine (both local and remote), it checks the
utmp file to find all the currently active ttys, and only
checks access time of the devices associated with active logins.
Unfortunately, on some systems, utmp is unreliable, and the
startd might miss keyboard activity by doing this. So, if your
utmp is unreliable, set this macro to TRUE and the
startd will check the access time on all tty and pty devices.
- CONSOLE_DEVICES
- This
macro allows the startd to monitor console (keyboard and mouse)
activity by checking the access times on special files in
/dev. Activity on these files shows up as
ConsoleIdle
time in the startd's ClassAd. Give a comma-separated list of
the names of devices considered the console, without the
/dev/ portion of the pathname. The defaults vary from
platform to platform, and are usually correct.
One possible exception to this is on Linux, where
we use ``mouse'' as
one of the entries. Most Linux installations put in a
soft link from /dev/mouse that points to the appropriate
device (for example, /dev/psaux for a PS/2 bus mouse, or
/dev/tty00 for a serial mouse connected to com1). However,
if your installation does not have this soft link, you will either
need to put it in (you will be glad you did), or change this
macro to point to the right device.
Unfortunately, there are no such devices on Digital Unix or IRIX
(don't be fooled by /dev/keyboard0; the kernel does not
update the access times on these devices), so this macro is not
useful in these cases, and we must use the condor_ kbdd to get this
information by connecting to the X server.
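For example, a Linux machine might list both the mouse link and the
console (a sketch; the platform defaults are usually already correct):
CONSOLE_DEVICES = mouse, console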
- STARTD_JOB_EXPRS
- When
the machine is claimed by a remote user, the startd can also advertise
arbitrary attributes from the job ClassAd in the machine
ClassAd.
List the attribute names to be advertised. NOTE: Since
these are already ClassAd expressions, do not do anything
unusual with strings.
- STARTD_EXPRS
- This macro is
described in section 3.3.4 as
SUBSYS_EXPRS.
- STARTD_DEBUG
- This macro
(and other settings related to debug logging in the startd) is
described in section 3.3.3 as
SUBSYS_DEBUG.
- STARTD_ADDRESS_FILE
- This macro is described in
section 3.3.4 as
SUBSYS_ADDRESS_FILE.
- NUM_CPUS
- This macro can be used to ``lie'' to the startd about how many CPUs
your machine has.
If you set this, it will override Condor's automatic computation of
the number of CPUs in your machine, and Condor will use whatever
integer you specify here.
In this way, you can allow multiple Condor jobs to run on a
single-CPU machine by having that machine treated like an SMP
machine with multiple CPUs, which could have different Condor jobs
running on each one.
Or, you can have an SMP machine advertise more virtual machines than
it has CPUs.
However, using this parameter will hurt the performance of the jobs,
since you would now have multiple jobs running on the same CPU,
competing with each other.
The option is only meant for people who specifically want this
behavior and know what they are doing.
It is disabled by default.
NOTE: This setting cannot be changed with a simple reconfig (either
by sending a SIGHUP or using condor_reconfig).
If you change this, you must restart the condor_startd for the
change to take effect (by using ``condor_restart -startd'').
NOTE: If you use this setting on a given machine, you should
probably advertise that fact in the machine's ClassAd by using the
STARTD_EXPRS setting (described above).
This way, jobs submitted in your pool could specify that they did or
did not want to be matched with machines that were only really
offering ``fractional CPUs''.
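For example, the following sketch advertises two virtual machines on a
single-CPU machine and records that fact in the machine ClassAd (the
FractionalCpus attribute name here is purely illustrative, not a
standard Condor attribute):
NUM_CPUS = 2
FractionalCpus = True
STARTD_EXPRS = $(STARTD_EXPRS), FractionalCpus
Jobs that do not want to share a CPU could then add
(FractionalCpus =!= True) to their requirements expression.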
- MEMORY
- Normally, Condor will automatically detect the amount of physical
memory available on your machine. Define MEMORY to tell
Condor how much physical memory (in MB) your machine has, overriding
the value Condor computes automatically.
- RESERVED_MEMORY
- How much memory would you like reserved from Condor? By default,
Condor considers all the physical memory of your machine as
available to be used by Condor jobs. If RESERVED_MEMORY is
defined, Condor subtracts it from the amount of memory it advertises
as available.
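For example, on a machine with 512 megabytes of physical memory where 64
megabytes should be kept unavailable to Condor jobs, a sketch would be:
MEMORY = 512
RESERVED_MEMORY = 64
With these settings, Condor would advertise 448 megabytes of available
memory.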
- STARTD_NAME
- Used to give an alternative name in the condor_ startd's
ClassAd.
This esoteric configuration macro might be used in the situation
where there are two condor_ startd daemons running on one machine,
and each reports to the same condor_ collector.
Different names will distinguish the two daemons.
The following macros apply to the condor_ startd only when it is
running on an SMP machine.
See section 3.10.6
on Configuring The Startd for
SMP Machines for details.
- VIRTUAL_MACHINES_CONNECTED_TO_CONSOLE
- An integer which indicates how many of the virtual
machines the startd is representing should be "connected" to the
console (in other words, notice when there's console activity).
This defaults to all virtual machines (N in a machine with N CPUs).
- VIRTUAL_MACHINES_CONNECTED_TO_KEYBOARD
- An integer which indicates how many of the virtual
machines the startd is representing should be "connected" to the
keyboard (for remote tty activity, as well as console activity).
Defaults to 1.
- DISCONNECTED_KEYBOARD_IDLE_BOOST
- If there are virtual machines not connected to either the keyboard
or the console, the corresponding idle time reported will be the
time since the startd was spawned, plus the value of this macro.
It defaults to 1200 seconds (20 minutes).
We do this because if the virtual machine is configured not to care
about keyboard activity, we want it to be available to Condor jobs
as soon as the startd starts up, instead of having to wait for 15
minutes or more (which is the default time a machine must be idle
before Condor will start a job).
If you do not want this boost, set the value to 0.
If you change your START expression to require more than 15 minutes
before a job starts, but you still want jobs to start right away on
some of your SMP nodes, increase this macro's value.
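As a sketch, on a 4-CPU machine the following would connect a single
virtual machine to both the console and the keyboard, leaving the other
three free to start jobs as soon as the startd starts up:
VIRTUAL_MACHINES_CONNECTED_TO_CONSOLE = 1
VIRTUAL_MACHINES_CONNECTED_TO_KEYBOARD = 1
DISCONNECTED_KEYBOARD_IDLE_BOOST = 1200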
The following settings control the number of virtual machines reported
for a given SMP host, and what attributes each one has.
They are only needed if you do not want to have an SMP machine report
to Condor with a separate virtual machine for each CPU, with all
shared system resources evenly divided among them.
Please read section 3.10.6
for details on how to properly configure
these settings to suit your needs.
NOTE: You can only change the number of each type of virtual machine
the condor_ startd is reporting with a simple reconfig (such as
sending a SIGHUP signal, or using the condor_ reconfig command).
You cannot change the definition of the different virtual machine
types with a reconfig.
If you change them, you must restart the condor_ startd for the
change to take effect (for example, using ``condor_ restart
-startd'').
- MAX_VIRTUAL_MACHINE_TYPES
- The maximum number of different virtual machine types.
Note: this is the maximum number of different types, not of
actual virtual machines.
Defaults to 10.
(You should only need to change this setting if you define more than
10 separate virtual machine types, which would be pretty rare.)
- VIRTUAL_MACHINE_TYPE_<N>
- This setting defines a given virtual machine type, by specifying
what part of each shared system resource (like RAM, swap space, etc)
this kind of virtual machine gets.
N can be any integer from 1 to the value of
$(MAX_VIRTUAL_MACHINE_TYPES), such as
VIRTUAL_MACHINE_TYPE_1.
The format of this entry can be somewhat complex, so please refer to
section 3.10.6 for
details on the different possibilities.
- NUM_VIRTUAL_MACHINES_TYPE_<N>
- This macro controls how many of a given virtual machine type
are actually reported to Condor.
There is no default.
- NUM_VIRTUAL_MACHINES
- If your SMP machine is being evenly divided, and the virtual
machine type settings described above are not being used, this
macro controls how many virtual machines will be reported.
The default is one virtual machine for each CPU.
This setting can be used to reserve some CPUs on an SMP which would
not be reported to the Condor pool.
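As a sketch, the following defines a virtual machine type that receives
half of every shared system resource, and reports two such virtual
machines (see section 3.10.6 for the full range of type syntax):
VIRTUAL_MACHINE_TYPE_1 = 1/2
NUM_VIRTUAL_MACHINES_TYPE_1 = 2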
The following macros control the cron capabilities of Condor.
The cron mechanism is used to run executables (called
modules) directly from the condor_ startd daemon.
The output of these modules
is incorporated into the machine ClassAd generated by the
condor_ startd. These capabilities are used in Hawkeye, but can be
used in other situations, as well.
- STARTD_CRON_NAME
- Defines a logical name to be used in the formation of related
configuration macro names. While
not required, this macro makes other macros
more readable and maintainable. A common example:
STARTD_CRON_NAME = HAWKEYE
This example allows Condor to refer to other related macros
with the string ``HAWKEYE'' in their name.
- STARTD_CRON_JOBS
- The list of the modules to execute. In Hawkeye, this is usually
named HAWKEYE_JOBS.
This configuration variable is defined by
a whitespace or newline separated list of jobs (called modules) to run, of the form:
modulename:prefix:executable:period[:options]
Each of these fields can be surrounded by matching quote characters
(single quote or double quote, but they must match). This allows
colon and whitespace characters to be specified. For example, the
following specifies an executable name with a colon and a space in it:
foo:foo_:"c:/some dir/foo.exe":10m
These individual fields are described below:
- modulename The logical name of the module. This name
must be unique (no two modules may have the same name).
- prefix Specifies a string which is prepended by
Condor to all attribute names that the module generates. For
example, if a prefix is ``xyz_'', and an individual attribute is
named ``abc'', the resulting attribute would be ``xyz_abc''.
Although it can be quoted, the prefix can contain only
alpha-numeric characters.
- executable Used to specify the full path to the
executable to run for this module. Note that multiple modules may
specify the same executable (although they need to have different
names). As noted above, the executable name can be quoted,
allowing for names with spaces and/or colons(:).
- period The period specifies time intervals at
which the module should be run.
For non-continuous modules, this
is the time interval that passes between starting the execution
of the module.
The value may be specified in seconds (append value with the character 's'),
in minutes (append value with the character 'm'),
or in hours (append value with the character 'h').
As an example, 5m starts the execution of the module every five minutes.
If no character is appended to the value, seconds are used as a default.
For continuous mode, the value has a different meaning.
It specifies the length of time after the module ceases execution
before it is restarted.
A value of 0 is valid only in continuous mode.
- Several options are available. It does not make sense to use more
than one of these options for a single module; if more than one is
given, the last one in the list takes effect.
continuous is used to specify a module which runs in continuous
mode (as described above).
In continuous mode, the condor_ startd daemon will
attempt to keep the module running continuously;
this is used for modules that do not normally exit.
If the period is non-zero, Condor uses the period field as
an amount of time to wait after
the module exits before restarting it.
kill
For a noncontinuous mode of module execution,
the module may still be running when the period is up
and it is time to start the module again.
This option causes the module to be killed and restarted.
no kill
For a noncontinuous mode of module execution,
the module may still be running when the period is up
and it is time to start the module again.
This option allows the module to continue execution.
This is the default for noncontinuous mode.
NOTE: The configuration file parsing logic will strip whitespace from
the beginning and end of continuation lines. Thus, a job list like
below will be misinterpreted:
# Hawkeye Job Definitions
HAWKEYE_JOBS =\
job1:prefix_:$(MODULES)/job1:5m:nokill\
job2:prefix_:$(MODULES)/job2:1h
HAWKEYE_JOB1_ARGS =-foo -bar
HAWKEYE_JOB1_ENV = xyzzy=somevalue
HAWKEYE_JOB2_ENV = lwpi=somevalue
Instead, you should write this as below:
# Hawkeye Job Definitions
HAWKEYE_JOBS =
# Job 1
HAWKEYE_JOBS = $(HAWKEYE_JOBS) job1:prefix_:$(MODULES)/job1:5m:nokill
HAWKEYE_JOB1_ARGS =-foo -bar
HAWKEYE_JOB1_ENV = xyzzy=somevalue
# Job 2
HAWKEYE_JOBS = $(HAWKEYE_JOBS) job2:prefix_:$(MODULES)/job2:1h
HAWKEYE_JOB2_ENV = lwpi=somevalue
- STARTD_CRON_modulename_ARGS
- The command line arguments to pass to the module to be executed.
If STARTD_CRON_NAME
is defined, then this configuration macro name is changed from
STARTD_CRON_modulename_ARGS to
$(STARTD_CRON_NAME)_modulename_ARGS.
- STARTD_CRON_modulename_ENV
- The environment string to pass to the module.
The syntax is the same as that of
DAEMONNAME_ENVIRONMENT in 3.3.7.
If STARTD_CRON_NAME
is defined, then this configuration macro name is changed from
STARTD_CRON_modulename_ENV to
$(STARTD_CRON_NAME)_modulename_ENV.
- STARTD_CRON_modulename_CWD
- The working directory in which to start the module.
If STARTD_CRON_NAME
is defined, then this configuration macro name is changed from
STARTD_CRON_modulename_CWD to
$(STARTD_CRON_NAME)_modulename_CWD.
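Putting these macros together, a hypothetical module named testjob that
runs every ten minutes might be configured as follows (all names, paths,
and values here are illustrative):
STARTD_CRON_NAME = HAWKEYE
HAWKEYE_JOBS = testjob:test_:$(MODULES)/testjob:10m
HAWKEYE_TESTJOB_ARGS = -verbose
HAWKEYE_TESTJOB_ENV = TMPDIR=/tmp
HAWKEYE_TESTJOB_CWD = /tmp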
The following macros control the optional computation of resource
availability statistics in the startd.
- STARTD_COMPUTE_AVAIL_STATS
- A boolean that determines if the startd computes resource
availability statistics. The default is False.
- STARTD_AVAIL_CONFIDENCE
- A floating point number that sets the confidence level of the
startd's AvailTime estimate. By default, the estimate is based on
the 80th percentile of past values (i.e., the macro is set to 0.8).
- STARTD_MAX_AVAIL_PERIOD_SAMPLES
- An integer that limits the number of samples of past available
intervals stored by the startd to limit memory and disk consumption.
Each sample requires 4 bytes of memory and approximately 10 bytes of
disk space.
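For example, a sketch that enables availability statistics, bases the
AvailTime estimate on the 95th percentile of past values, and stores at
most 100 samples:
STARTD_COMPUTE_AVAIL_STATS = True
STARTD_AVAIL_CONFIDENCE = 0.95
STARTD_MAX_AVAIL_PERIOD_SAMPLES = 100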
If STARTD_COMPUTE_AVAIL_STATS = True, the startd will
define the following ClassAd attributes for resources:
- AvailTime
- What proportion of the time (between 0.0 and 1.0)
has this resource been in a state other than ``Owner''?
- LastAvailInterval
- What was the duration (in seconds) of the
last period between ``Owner'' states?
The following attributes will also be included if the resource is
not in the ``Owner'' state:
- AvailSince
- At what time did the resource last leave the
``Owner'' state? Measured in the number of seconds since the
epoch (00:00:00 UTC, Jan 1, 1970).
- AvailTimeEstimate
- Based on past history, this is an estimate
of how long the current period between ``Owner'' states will
last.
3.3.9 condor_ schedd Configuration File Entries
These macros control the condor_ schedd.
- SHADOW
- This macro determines the
full path of the condor_ shadow binary that the condor_ schedd
spawns. It is normally defined in terms of $(SBIN).
- SHADOW_PVM
- This macro
determines the full path of the special condor_ shadow.pvm binary
used for supporting PVM jobs that the condor_ schedd spawns. It is
normally defined in terms of $(SBIN).
- MAX_JOBS_RUNNING
- This
macro controls the maximum number of condor_ shadow processes
a given condor_ schedd is allowed to spawn. The actual
number of condor_ shadows may be less if you have reached
your $(RESERVED_SWAP) limit.
- MAX_SHADOW_EXCEPTIONS
- This macro controls the maximum
number of times that condor_ shadow processes can have a fatal
error (exception) before the condor_ schedd will relinquish
the match associated with the dying shadow. Defaults to 5.
- SCHEDD_INTERVAL
- This
macro determines how often the condor_ schedd sends a ClassAd
update to the condor_ collector. It is defined in terms of seconds
and defaults to 300 (every 5 minutes).
- JOB_START_DELAY
- When the
condor_ schedd has finished negotiating and has many new
machines that it has claimed, the condor_ schedd can wait
for a delay period before starting up a condor_ shadow for each job
it is going to run. The delay prevents a sudden, large load on the submit
machine as it spawns many shadows simultaneously. It prevents
having to deal
with their startup activity all at once. This macro determines
how long the condor_ schedd should wait in between spawning each
condor_ shadow.
Similarly, this macro is also used during the graceful shutdown of the
condor_ schedd.
During graceful shutdown, this macro determines how long to wait in
between asking each condor_ shadow to gracefully shutdown.
Defined in terms of seconds and defaults to 2.
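For example, a sketch for a submit machine that should never run more
than 200 shadows and should pause five seconds between spawning each
one:
MAX_JOBS_RUNNING = 200
JOB_START_DELAY = 5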
- ALIVE_INTERVAL
- This
macro determines how often the condor_ schedd should send a keep alive
message to any startd it has claimed. When the schedd claims a
startd, it tells the startd how often it is going to send these
messages. If the startd does not get one of these messages after 3
of these intervals has passed, the startd releases the claim, and
the schedd is no longer paying for the resource (in terms of
priority in the system). The macro is defined in terms of seconds
and defaults to 300 (every 5 minutes).
- SHADOW_SIZE_ESTIMATE
- This macro sets the estimated virtual memory size of each
condor_ shadow process. Specified in kilobytes. The default
varies from platform to platform.
- SHADOW_RENICE_INCREMENT
- When the schedd spawns a new
condor_ shadow, it can do so with a nice-level. A
nice-level is a
Unix mechanism that allows users to assign their own processes a lower
priority so that the processes do not interfere with interactive use of the
machine. This is very handy for keeping a submit machine with lots
of shadows running still useful to the owner of the machine. The
value can be any integer between 0 and 19, with a value of 19 being
the lowest priority. It defaults to 10.
- QUEUE_CLEAN_INTERVAL
- The schedd maintains the job queue on a given machine. It does so
in a persistent way such that if the schedd crashes, it can recover
a valid state of the job queue. The mechanism it uses is a
transaction-based log file (the job_queue.log file,
not the SchedLog file). This file contains an initial
state of the job queue, and a series of transactions that were
performed on the queue (such as new jobs submitted, jobs completing,
and checkpointing). Periodically, the schedd will go through
this log, truncate all the transactions, and create a new file
containing only the new initial state of the log.
This is a somewhat expensive operation,
but it speeds up the schedd's restart, since there are
fewer transactions to replay to determine the actual state of the
job queue. This macro determines how often the schedd
should clean up the queue in this way. It is defined in terms of
seconds and defaults to 86400 (once a day).
- WALL_CLOCK_CKPT_INTERVAL
- The job queue contains a counter for each job's ``wall clock'' run
time, i.e., how long each job has executed so far. This counter is
displayed by condor_ q. The counter is updated when the job is
evicted or when the job completes. When the schedd crashes, the run
time for jobs that are currently running will not be added to the
counter (and so, the run time counter may become smaller than the
cpu time counter). The schedd saves run time ``checkpoints''
periodically for running jobs so if the schedd crashes, only run
time since the last checkpoint is lost. This macro controls how
often the schedd saves run time checkpoints. It is defined in terms
of seconds and defaults to 3600 (one hour). A value of 0 will
disable wall clock checkpoints.
- ALLOW_REMOTE_SUBMIT
- Starting with Condor Version 6.0, users can run condor_ submit on
one machine and actually submit jobs to another machine in the
pool. This is called a remote submit. Jobs submitted in
this way are entered into the job queue owned by the Unix user
nobody.
This macro determines whether this is allowed.
It defaults to FALSE.
- QUEUE_SUPER_USERS
- This
macro determines what user names on a given machine have
super-user access to the job queue, meaning that they can
modify or delete the job ClassAds of other users. (Normally, you
can only modify or delete ClassAds from the job queue that you own).
Whatever user name corresponds with the UID that Condor is running as
(usually the Unix user condor) will automatically be included in this list
because that is needed for Condor's proper functioning. See
section 3.7.1 on UIDs in Condor for more details on
this. By default, we give root the ability to remove other
users' jobs, in addition to user condor.
- SCHEDD_LOCK
- This macro
specifies what lock file should be used for access to the
SchedLog file. It must be a separate file from the
SchedLog, since the SchedLog may be rotated and
synchronization across log file rotations
is desired.
This macro is defined relative to the $(LOCK) macro.
If you decide to change this setting (not recommended),
be sure to change the $(VALID_LOG_FILES) entry that
condor_ preen uses as well.
- SCHEDD_EXPRS
- This macro is
described in section 3.3.4 as
SUBSYS_EXPRS.
- SCHEDD_DEBUG
- This macro
(and other settings related to debug logging in the schedd) is
described in section 3.3.3 as
SUBSYS_DEBUG.
- SCHEDD_ADDRESS_FILE
- This macro is described in
section 3.3.4 as
SUBSYS_ADDRESS_FILE.
- FLOCK_NEGOTIATOR_HOSTS
-
This macro defines a list of negotiator host names (not including the
local $(NEGOTIATOR_HOST) machine) for pools in which the
schedd should attempt to run jobs. Hosts in the list should be in
order of preference. The schedd will only send a request to a
central manager in the list if the local pool and pools earlier in
the list are not satisfying all the job requests.
$(HOSTALLOW_NEGOTIATOR_SCHEDD) (see
section 3.3.4) must also be configured to allow
negotiators from all of the $(FLOCK_NEGOTIATOR_HOSTS) to
contact the schedd. Please make sure the
$(NEGOTIATOR_HOST) is first in the
$(HOSTALLOW_NEGOTIATOR_SCHEDD) list. Similarly, the
central managers of the remote pools must be configured to listen to
requests from this schedd.
- FLOCK_COLLECTOR_HOSTS
- This macro defines a list of collector host names for pools in which
the schedd should attempt to run jobs. The
collectors must be specified in order, corresponding to the
$(FLOCK_NEGOTIATOR_HOSTS) list. In the typical case, where each pool
has the collector and negotiator running on the same machine,
$(FLOCK_COLLECTOR_HOSTS) should have the same definition as
$(FLOCK_NEGOTIATOR_HOSTS).
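A sketch of a schedd configured to flock to two remote pools (the host
names are illustrative), keeping the local negotiator first in the
host-based security list:
FLOCK_NEGOTIATOR_HOSTS = negotiator.pool-b.example.com, negotiator.pool-c.example.com
FLOCK_COLLECTOR_HOSTS = $(FLOCK_NEGOTIATOR_HOSTS)
HOSTALLOW_NEGOTIATOR_SCHEDD = $(NEGOTIATOR_HOST), $(FLOCK_NEGOTIATOR_HOSTS)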
- NEGOTIATE_ALL_JOBS_IN_CLUSTER
- If this macro is set to False (the default), when the schedd fails
to start an idle job, it will not try to start any other
idle jobs in the same cluster during that negotiation cycle. This
makes negotiation much more efficient for large job clusters.
However, in some cases other jobs in the cluster can be started even
though an earlier job can't. For example, the jobs' requirements
may differ, because of different disk space, memory, or
operating system requirements. Or, machines may be willing to run
only some jobs in the cluster, because their requirements reference
the jobs' virtual memory size or other attribute. Setting this
macro to True will force the schedd to try to start all idle jobs in
each negotiation cycle. This will make negotiation cycles last
longer, but it will ensure that all jobs that can be started will be
started.
- PERIODIC_EXPR_INTERVAL
- This macro determines the period,
in seconds, between evaluation of periodic job control expressions,
such as periodic_hold, periodic_release, and periodic_remove,
given by the user in a Condor submit file. By default, this value is
300 seconds (5 minutes). A value of 0 prevents the schedd from
performing the periodic evaluations.
3.3.10 condor_ shadow Configuration File Entries
These settings affect the condor_ shadow.
- SHADOW_LOCK
- This macro
specifies the lock file to be used for access to the
ShadowLog file. It must be a separate file from the
ShadowLog, since the ShadowLog may be rotated
and you want to synchronize access across log file rotations.
This macro is defined relative to the $(LOCK) macro.
If you decide to change this setting (not recommended),
be sure to change the $(VALID_LOG_FILES) entry that
condor_ preen uses as well.
- SHADOW_DEBUG
- This macro
(and other settings related to debug logging in the shadow) is
described in section 3.3.3 as
SUBSYS_DEBUG.
- COMPRESS_PERIODIC_CKPT
- This boolean macro specifies
whether the shadow should instruct applications to compress periodic
checkpoints (when possible). The default is FALSE.
- COMPRESS_VACATE_CKPT
- This boolean macro specifies
whether the shadow should instruct applications to compress vacate
checkpoints (when possible). The default is FALSE.
- PERIODIC_MEMORY_SYNC
- This boolean macro specifies whether the shadow should instruct
applications to commit dirty memory pages to swap space during a
periodic checkpoint. The default is FALSE. This potentially
reduces the number of dirty memory pages at vacate time, thereby
reducing swapping activity on the remote machine.
- SLOW_CKPT_SPEED
- This
macro specifies the speed at which vacate checkpoints should be
written, in kilobytes per second. If zero (the default), vacate
checkpoints are written as fast as possible. Writing vacate
checkpoints slowly can avoid overwhelming the remote machine with
swapping activity.
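For example, a sketch that compresses both periodic and vacate
checkpoints and throttles vacate checkpoint writes to 256 kilobytes per
second:
COMPRESS_PERIODIC_CKPT = True
COMPRESS_VACATE_CKPT = True
SLOW_CKPT_SPEED = 256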
3.3.11 condor_ shadow.pvm Configuration File Entries
These macros control the condor_ shadow.pvm, the special shadow
that supports PVM jobs inside Condor. See
section 3.4.3 on Installing PVM Support in
Condor for details. condor_ shadow macros also apply to this
special shadow. See section 3.3.10.
- PVMD
- This macro holds the full path
to the special condor_ pvmd, the Condor PVM daemon. This daemon is
installed in the regular Condor release directory by default, so the
macro is usually defined in terms of $(SBIN).
- PVMGS
- This macro holds the full
path to the special condor_ pvmgs, the Condor PVM Group Server
daemon, which is needed to support PVM groups. This daemon is
installed in the regular Condor release directory by default, so the
macro is usually defined in terms of $(SBIN).
3.3.12 condor_ starter Configuration File Entries
These settings affect the condor_ starter.
- EXEC_TRANSFER_ATTEMPTS
- Sometimes due to a router misconfiguration, kernel bug, or other Act
of God network problem, the transfer of the initial checkpoint from
the submit machine to the execute machine will fail midway through.
This parameter specifies the number of times to attempt the transfer;
the value must be equal to or greater than 1. If this parameter is not
specified, or is specified incorrectly, it defaults to three.
If the transfer of the initial executable fails every attempt, then
the job goes back into the idle state until the next renegotiation
cycle.
NOTE: This parameter does not exist in the NT starter.
- JOB_RENICE_INCREMENT
- When the condor_ starter spawns a Condor job, it can do so with a
nice-level.
A nice-level is a
Unix mechanism that allows users to assign their own processes a lower
priority, such that these processes do not interfere with interactive
use of the machine.
If you have machines with lots
of real memory and swap space, so that the only scarce resource is CPU time,
you may use this macro in conjunction with a policy that
allows Condor to always start jobs on the machines.
Condor jobs would always run,
but interactive response on your machines would never suffer.
You most likely will not notice Condor is
running jobs. See section 3.6 on
Configuring The Startd Policy for more details on setting up a
policy for starting and stopping jobs on a given machine.
The value is an arbitrary ClassAd expression,
evaluated by the condor_ starter daemon for each job just before the
job runs, and the expression can refer to any attribute in the job ClassAd.
The allowable values are integers in the range 0 to 19
(inclusive),
with a value of 19 being the lowest priority.
If the expression evaluates to a value outside this range,
then on a Unix machine, a value greater than 19 is auto-decreased to 19;
a value less than 0 is treated as 0.
For values outside this range, a Windows machine ignores the value
and uses the default instead.
The default value is 10, which maps to the idle priority class on
a Windows machine.
- STARTER_LOCAL_LOGGING
- This macro determines whether the
starter should do local logging to its own log file, or send debug
information back to the condor_ shadow where it will end up in the
ShadowLog. It defaults to TRUE.
- STARTER_DEBUG
- This setting
(and other settings related to debug logging in the starter) is
described above in section 3.3.3 as
$(SUBSYS_DEBUG).
- USER_JOB_WRAPPER
- This macro
allows the administrator to specify a ``wrapper'' script to handle the
execution of all user jobs.
If specified, Condor will never directly execute a job but instead will
invoke the program specified by this macro.
The command-line arguments passed to this program will include the
full-path to the actual user job which should be executed, followed by all
the command-line parameters to pass to the user job.
This wrapper program must ultimately replace its image with the user job;
in other words, it must exec() the user job, not fork() it.
For instance, if the wrapper program is a Bourne/C/Korn shell script, the
last line of execution should be:
exec $*
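As a minimal sketch, a Bourne shell wrapper (installed at an
illustrative path) that logs which job is about to run and then replaces
itself with the job could look like:
#!/bin/sh
echo "`date`: starting $1" >> /tmp/condor_job_wrapper.log
exec "$@"
with the corresponding configuration entry:
USER_JOB_WRAPPER = /usr/local/condor/libexec/job_wrapper.sh
(Using exec "$@" rather than exec $* preserves arguments that contain
whitespace.)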
3.3.13 condor_ submit Configuration File Entries
- DEFAULT_UNIVERSE
- The universe under which a job is executed may be specified in the submit
description file.
If it is not specified in the submit description file, then
this variable specifies the universe (when defined).
If the universe is not specified in the submit description
file, and if this variable is not defined, then
the default universe for a job will be the standard universe.
If you want condor_ submit to automatically append an expression to
the Requirements expression or Rank expression of
jobs at your site use the following macros:
- APPEND_REQ_VANILLA
- Expression to be appended to vanilla job requirements.
- APPEND_REQ_STANDARD
- Expression to be appended to standard job requirements.
- APPEND_REQUIREMENTS
- Expression to be appended to any type of universe jobs.
However, if APPEND_REQ_VANILLA or APPEND_REQ_STANDARD
is defined, then ignore the APPEND_REQUIREMENTS for those
universes.
- APPEND_RANK
- Expression to be appended to job rank. APPEND_RANK_STANDARD or
APPEND_RANK_VANILLA will override this setting if defined.
- APPEND_RANK_STANDARD
- Expression to be appended to standard job rank.
- APPEND_RANK_VANILLA
- Expression to append to vanilla job rank.
NOTE: The APPEND_RANK_STANDARD and
APPEND_RANK_VANILLA macros were called
APPEND_PREF_STANDARD and
APPEND_PREF_VANILLA in previous versions of Condor.
In addition, you may provide default Rank expressions if your users
do not specify their own with:
- DEFAULT_RANK_VANILLA
- Default Rank for vanilla jobs.
- DEFAULT_RANK_STANDARD
- Default Rank for standard jobs.
Both of these macros default to the jobs preferring machines where
there is more main memory than the image size of the job, expressed
as:
((Memory*1024) > Imagesize)
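For example, a sketch that keeps vanilla jobs off one particular machine
and makes all jobs prefer faster machines (the machine name is
illustrative):
APPEND_REQ_VANILLA = (Machine != "slow.example.com")
APPEND_RANK = KFlops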
- DEFAULT_IO_BUFFER_SIZE
- Condor keeps a buffer of recently-used data for each file an
application opens. This macro specifies the default maximum number
of bytes to be buffered for each open file at the executing machine.
The buffer_size command in the submit description file will override
this default. If this macro is undefined, a default size of 512 KB will
be used.
- DEFAULT_IO_BUFFER_BLOCK_SIZE
- When buffering is enabled,
Condor will attempt to consolidate small read and write operations
into large blocks. This macro specifies the default block size
Condor will use. The buffer_block_size command in the submit
description file will override this default. If this macro is undefined, a
default size of 32 KB will be used.
- SUBMIT_SKIP_FILECHECK
- If True, condor_ submit behaves as if the -d
command-line option is used.
This tells condor_ submit to disable file permission checks when
submitting a job.
This can significantly decrease the amount of time required to submit
a large group of jobs.
The default value is False.
3.3.14 condor_ preen Configuration File Entries
These macros affect condor_ preen.
- PREEN_ADMIN
- This macro
sets the e-mail address where condor_ preen will send e-mail (if
it is configured to send email at all... see the entry for
PREEN). Defaults to $(CONDOR_ADMIN).
- VALID_SPOOL_FILES
- This
macro contains a (comma or space separated) list of files that
condor_ preen considers valid files to find in the $(SPOOL)
directory. Defaults to all the files that are valid. A change
to the $(HISTORY) macro requires a change to this
macro as well.
- VALID_LOG_FILES
- This
macro contains a (comma or space separated) list of files that
condor_ preen considers valid files to find in the $(LOG)
directory. Defaults to all the files that are valid. A change
to the names of any of the log files above requires a change to this
macro as well. In addition, the defaults for the
$(SUBSYS_ADDRESS_FILE) are listed here, so a change to
those requires a change to this entry as well.
3.3.15 condor_ collector Configuration File Entries
These macros affect the condor_ collector.
- CLASSAD_LIFETIME
- This
macro determines how long a ClassAd can remain in the collector
before it is discarded as stale information. The ClassAds sent to
the collector might also have an attribute that says how long the
lifetime should be for that specific ad. If that attribute is
present, the collector will either use it or the
$(CLASSAD_LIFETIME), whichever is greater. The macro is
defined in terms of seconds, and defaults to 900 (15 minutes).
- MASTER_CHECK_INTERVAL
- This macro defines how often the
collector should check for machines that have ClassAds from some
daemons, but not from the condor_ master (orphaned daemons)
and send e-mail about it. It is defined in seconds and
defaults to 10800 (3 hours).
- CLIENT_TIMEOUT
- Network
timeout that the condor_ collector uses when talking to any daemons
or tools that are sending it a ClassAd update.
It is defined in seconds and defaults to 30.
- QUERY_TIMEOUT
- Network
timeout when talking to anyone doing a query. It is defined in seconds
and defaults to 60.
- CONDOR_DEVELOPERS
- Condor will send e-mail once per week to this address with the output
of the condor_ status command, which lists how many machines
are in the pool and how many are running jobs. Use the default
value of condor-admin@cs.wisc.edu and
the weekly status message will be sent to the Condor Team at University of
Wisconsin-Madison, the developers of Condor. The Condor Team uses
these weekly status messages in order to have some idea as to how
many Condor pools exist in the world. We appreciate
getting the reports, as this is one way we can convince funding
agencies that Condor is being used in the real world. If you do
not wish this information to be sent to the Condor Team,
set the value to NONE which disables this feature, or put in some other
address that you want the weekly status report sent to.
- COLLECTOR_NAME
- This macro is used to specify a short description of your pool.
It should be about 20 characters long. For example, the name of the
UW-Madison Computer Science Condor Pool is ``UW-Madison CS''.
- CONDOR_DEVELOPERS_COLLECTOR
- By default, every pool sends
periodic updates to a central condor_ collector at UW-Madison with
basic information about the status of your pool. This includes only
the number of total machines, the number of jobs submitted, the
number of machines running jobs, the hostname of your central
manager, and the $(COLLECTOR_NAME) specified above. These
updates help the Condor Team see how Condor is being used around the world.
By default, they will be sent to condor.cs.wisc.edu. If you don't want
these updates to be sent from your pool, set this macro to
NONE.
- COLLECTOR_SOCKET_BUFSIZE
- This specifies the buffer size, in
bytes, reserved for condor_ collector network sockets. The default is
1024000, or a one megabyte buffer. This is a healthy size, even for a large
pool. The larger this value, the less likely the condor_ collector will
have stale information about the pool due to dropping update packets. If
your pool is small or your central manager has very little RAM, consider
setting this parameter to a lower value (perhaps 256000 or 128000).
- COLLECTOR_SOCKET_CACHE_SIZE
-
If your site wants to use TCP connections to send ClassAd updates to
the collector, you must use this setting to enable a cache of TCP
sockets (in addition to enabling
UPDATE_COLLECTOR_WITH_TCP).
Please read section 3.10.11 on ``Using TCP to
Send Collector Updates''
for more details and a discussion of when you would need this
functionality.
If you do not enable a socket cache, TCP updates will be refused by
the collector.
The default value for this setting is 0, with no cache enabled.
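As a sketch of how these two settings work together, a pool that wants TCP updates might configure the collector as follows; the cache size shown is only an illustration, not a recommended value, and should exceed the number of daemons updating the collector:

```
# Illustrative only: enable TCP updates and reserve a socket cache
# large enough for the daemons that will be updating this collector.
UPDATE_COLLECTOR_WITH_TCP = True
COLLECTOR_SOCKET_CACHE_SIZE = 50
```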
- KEEP_POOL_HISTORY
- This boolean macro is used to decide if the collector will write
out statistical information about the pool to history files. The default
is FALSE. The location, size and frequency of history logging is controlled
by the other macros.
- POOL_HISTORY_DIR
- This macro sets the name of the directory where the history
files reside (if history logging is enabled).
The default is the SPOOL directory.
- POOL_HISTORY_MAX_STORAGE
-
This macro sets the maximum combined size of the history files.
When the size of the history files is close to this limit, the oldest
information will be discarded.
Thus, the larger this parameter's value is, the larger the time
range for which history will be available. The default value is
10000000 (10 Mbytes).
- POOL_HISTORY_SAMPLING_INTERVAL
- This macro sets the interval, in seconds, between samples for
history logging purposes.
When a sample is taken, the collector goes through the information
it holds, and summarizes it.
The information is written to the history file once for every 4
samples.
The default (and recommended) value is 60 seconds. Setting this
macro's value too low will increase the load on the collector,
while setting it too high will produce less precise statistical
information.
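Taken together, the pool history macros above might be configured as in the following sketch; the directory path is hypothetical, and the last two lines merely restate the defaults:

```
# Illustrative settings for pool history logging
KEEP_POOL_HISTORY = True
POOL_HISTORY_DIR = /var/condor/history    # hypothetical path; defaults to SPOOL
POOL_HISTORY_MAX_STORAGE = 10000000       # 10 Mbytes (the default)
POOL_HISTORY_SAMPLING_INTERVAL = 60       # the recommended default
```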
- COLLECTOR_DAEMON_STATS
- This macro controls whether or not the Collector keeps update
statistics on incoming updates. The default value is FALSE. If
enabled, the collector will insert several attributes into ClassAds
that it stores and sends. ClassAds without the
``UpdateSequenceNumber'' and ``DaemonStartTime'' attributes will not
be counted, and will not have attributes inserted.
The attributes inserted are ``UpdatesTotal'', ``UpdatesSequenced'',
and ``UpdatesLost''. ``UpdatesTotal'' is the total number of
updates (of this ad type) the Collector has received from this host.
``UpdatesSequenced'' is the number of updates for which the Collector
could have detected a loss. In particular, for the first update from a
daemon it is impossible to tell whether any previous updates have been
lost. ``UpdatesLost'' is the number of updates that the Collector
has detected as being lost.
- COLLECTOR_DAEMON_HISTORY_SIZE
- This macro controls the
size of the published update history that the Collector inserts into
the ClassAds it stores and sends. The default value is zero. This
macro is ignored if $(COLLECTOR_DAEMON_STATS) is not
enabled.
If this has a non-zero value, the Collector will insert
``UpdatesHistory'' into the ClassAd (similar to ``UpdatesTotal''
above). ``UpdatesHistory'' is a hexadecimal string which represents
a bitmap of the last COLLECTOR_DAEMON_HISTORY_SIZE
updates. The most significant bit (MSB) of the bitmap represents the
most recent update, and the least significant bit (LSB) represents
the least recent. A value of zero means that the update was not
lost, and a value of 1 indicates that the update was detected as
lost.
For example, if the last update was not lost, the previous one was
lost, and the two before that were not, the bitmap would be 0100, and
the matching hex digit would be ``4''. Note that the MSB can never be
marked as lost, because its loss can only be detected by a later
non-lost update (a ``gap'' is found in the sequence numbers).
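As a sketch, enabling the statistics together with a 16-update history bitmap would look like the following (the history size shown is illustrative, not a recommendation):

```
# Illustrative: keep per-host update statistics and publish a
# 16-update loss-history bitmap in each stored ClassAd.
COLLECTOR_DAEMON_STATS = True
COLLECTOR_DAEMON_HISTORY_SIZE = 16
```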
- COLLECTOR_CLASS_HISTORY_SIZE
- This macro controls the
size of the published update history that the Collector inserts into
the Collector ClassAds it produces. The default value is zero.
If this has a non-zero value, the Collector will insert
``UpdatesClassHistory'' into the Collector ClassAd (similar to
``UpdatesHistory'' above). These are added ``per class'' of
ClassAd, however. The classes refer to the ``type'' of ClassAds
(e.g. ``Start''). Additionally, there is a ``Total'' class created
which represents the history of all ClassAds that this Collector
receives.
Note that the collector always publishes Lost, Total and Sequenced
counts for all ClassAd ``classes''. This is similar to the
statistics gathered if $(COLLECTOR_DAEMON_STATS) is enabled.
- COLLECTOR_QUERY_WORKERS
- This macro sets the maximum
number of ``worker'' processes that the Collector can have. When
receiving a query request, the UNIX Collector will ``fork'' a new
process to handle the query, freeing the main process to handle
other requests. When the number of outstanding ``worker'' processes
reaches this maximum, the request is handled by the main process.
This macro is ignored on Windows, and its default value is zero.
- COLLECTOR_DEBUG
- This
macro (and other macros related to debug logging in the collector)
is described in section 3.3.3 as
SUBSYS_DEBUG.
3.3.16 condor_ negotiator Configuration File Entries
These macros affect the condor_ negotiator.
- NEGOTIATOR_INTERVAL
- Sets how often the negotiator starts a negotiation cycle. It is defined
in seconds and defaults to 300 (5 minutes).
- NEGOTIATOR_TIMEOUT
- Sets the timeout that the negotiator uses on its network connections
to the schedds and startds. It is defined in seconds and defaults to 30.
- PRIORITY_HALFLIFE
- This
macro defines the half-life of the user priorities. See
section 2.7.2
on User Priorities for details. It is defined in seconds and defaults
to 86400 (1 day).
- DEFAULT_PRIO_FACTOR
-
This macro sets the priority factor for local users. See
section 2.7.2
on User Priorities for details. Defaults to 1.
- NICE_USER_PRIO_FACTOR
-
This macro sets the priority factor for nice users. See
section 2.7.2
on User Priorities for details. Defaults to 10000000.
- REMOTE_PRIO_FACTOR
-
This macro defines the priority factor for remote users (users who
do not belong to the accountant's local domain - see
below). See section 2.7.2
on User Priorities for details. Defaults to 10000.
- ACCOUNTANT_LOCAL_DOMAIN
-
This macro is used to decide if a user is local or remote. A user
is considered to be in the local domain if the UID_DOMAIN matches
the value of this macro. Usually, this macro is set
to the local UID_DOMAIN. If it is not defined, all users are considered
local.
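The local-domain and priority-factor macros above are typically set together. A sketch, restating the default factors for clarity:

```
# Illustrative: treat users in this UID domain as local, and apply
# the usual priority factors to local, nice, and remote users.
ACCOUNTANT_LOCAL_DOMAIN = $(UID_DOMAIN)
DEFAULT_PRIO_FACTOR = 1
NICE_USER_PRIO_FACTOR = 10000000
REMOTE_PRIO_FACTOR = 10000
```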
- MAX_ACCOUNTANT_DATABASE_SIZE
- This macro defines the maximum size (in bytes) that the accountant
database log file can reach before it is truncated (which re-writes
the file in a more compact format).
If, after truncating, the file is larger than one half the maximum
size specified with this macro, the maximum size will be
automatically expanded.
The default is 1 megabyte (1000000).
- NEGOTIATOR_SOCKET_CACHE_SIZE
- This macro defines the
maximum number of sockets that the negotiator keeps in its
open socket cache. Caching open sockets makes the negotiation
protocol more efficient by eliminating the need for socket
connection establishment for each negotiation cycle. The default is
currently 16. To be effective, this parameter should be set to a
value greater than the number of schedds submitting jobs to the
negotiator at any time.
- PREEMPTION_REQUIREMENTS
- The negotiator will not preempt
a job running on a given machine unless the
PREEMPTION_REQUIREMENTS expression evaluates to TRUE and the
owner of the idle job has a better priority than the owner of the
running job. By default, this expression allows jobs to be
preempted only after they have run for one hour, and only if the
priority of the competing user is 20% higher, in order to prevent
``churning'' of jobs in the pool.
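An expression implementing the policy described above (preempt only after an hour of runtime, and only for a user with at least 20% better priority) might look like the following sketch; the exact default expression shipped with your version of Condor may be written differently:

```
# Illustrative policy expression; the actual default in your
# distribution may differ.
PREEMPTION_REQUIREMENTS = ( (CurrentTime - EnteredCurrentState) > 3600 ) \
    && ( RemoteUserPrio > SubmittorPrio * 1.2 )
```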
- PREEMPTION_RANK
- This
expression is used to rank machines that the job ranks the same.
For example, if the job has no preference, it is usually preferable
to preempt a job with a small ImageSize instead of a job with a
large ImageSize. The default is to rank all preemptable matches the
same. However, the negotiator will always prefer to match the job
with an idle machine over a preemptable machine, if the job has no
preference between them.
- NEGOTIATOR_DEBUG
- This macro
(and other settings related to debug logging in the negotiator) is
described in section 3.3.3 as SUBSYS_DEBUG.
3.3.17 condor_ eventd Configuration File Entries
These macros affect the Condor Event daemon. See
section 3.4.4 for an
introduction. The eventd is not included in the main Condor binary
distribution or installation procedure. It can be installed as a
contrib module.
- EVENT_LIST
- List of macros
which define events to be managed by the event daemon.
- EVENTD_CAPACITY_INFO
- Configures the bandwidth limits used when scheduling job checkpoint
transfers before SHUTDOWN events.
The EVENTD_CAPACITY_INFO file has the same
format as the NETWORK_CAPACITY_INFO file, described in
section 3.10.9.
- EVENTD_ROUTING_INFO
- Configures the network routing information used when scheduling job
checkpoint transfers before SHUTDOWN events.
The EVENTD_ROUTING_INFO file has the same
format as the NETWORK_ROUTING_INFO file, described in
section 3.10.9.
- EVENTD_INTERVAL
- The number
of seconds between collector queries to determine pool
state. The default is 15 minutes (900 seconds).
- EVENTD_MAX_PREPARATION
- The number of minutes before a
scheduled event when the eventd should start periodically querying the
collector. If 0 (default), the eventd always polls.
- EVENTD_SHUTDOWN_SLOW_START_INTERVAL
- The number of seconds
between each machine startup after a shutdown event. The default is 0.
- EVENTD_SHUTDOWN_CLEANUP_INTERVAL
- The number of seconds
between each check for old shutdown configurations in the pool. The default
is one hour (3600 seconds).
3.3.18 condor_ gridmanager Configuration File Entries
These macros affect the condor_ gridmanager.
- GRIDMANAGER_CHECKPROXY_INTERVAL
- The number of seconds
between checks for an updated X509 proxy credential. The default
is 10 minutes (600 seconds).
- GRIDMANAGER_MINIMUM_PROXY_TIME
- The minimum number of
seconds before expiration of the X509 proxy credential for the
gridmanager to continue operation. If seconds until expiration is
less than this number, the gridmanager will shutdown and wait for
a refreshed proxy credential. The default is 3 minutes (180 seconds).
- HOLD_JOB_IF_CREDENTIAL_EXPIRES
- True or False.
Defaults to True.
If True, and for globus universe jobs only,
Condor-G will place a job on hold
GRIDMANAGER_MINIMUM_PROXY_TIME seconds
before the proxy expires.
If False,
the job will stay in the last known state,
and Condor-G will periodically check to see if the job's proxy has been
refreshed, at which point management of the job will resume.
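The proxy-handling macros above work together; a sketch restating the defaults with the hold behavior enabled:

```
# Illustrative proxy-handling settings for Condor-G
GRIDMANAGER_CHECKPROXY_INTERVAL = 600   # check for a refreshed proxy every 10 min
GRIDMANAGER_MINIMUM_PROXY_TIME = 180    # stop work when under 3 min remain
HOLD_JOB_IF_CREDENTIAL_EXPIRES = True   # hold globus jobs before expiration
```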
- GRIDMANAGER_CONTACT_SCHEDD_DELAY
- The minimum number of
seconds between connections to the schedd. The default is 5 seconds.
- GRIDMANAGER_JOB_PROBE_INTERVAL
- After a job is submitted, how often (in seconds) the condor_ gridmanager
should probe the remote jobmanager to ensure it is still alive and well.
- GRIDMANAGER_JOB_PROBE_DELAY
- The number of seconds between
active probes of the status of a submitted job. The default is 5
minutes (300 seconds).
- GRIDMANAGER_RESOURCE_PROBE_INTERVAL
- When a resource appears to be down, how often (in seconds) the
condor_ gridmanager
should ping it to test if it is up again.
- GRIDMANAGER_RESOURCE_PROBE_DELAY
- The number of seconds
between pings of a remote resource that is currently down. The default
is 5 minutes (300 seconds).
- GRIDMANAGER_MAX_SUBMITTED_JOBS_PER_RESOURCE
- Limits the number of jobs
that a condor_ gridmanager daemon will submit to a resource.
It is useful for controlling the number of jobmanager
processes running on the front-end node of a cluster.
This number may be exceeded if it is reduced through the use
of condor_ reconfig while the condor_ gridmanager is running
or if the condor_ gridmanager receives new
jobs from the condor_ schedd that were already submitted
(that is, their GlobusContactString is not "X").
In these cases, submitted jobs will not be killed,
but no new jobs can be submitted until the number of submitted
jobs falls below the current limit.
- GRIDMANAGER_MAX_PENDING_SUBMITS_PER_RESOURCE
- The maximum
number of jobs
that can be in the process of being submitted at any time (that is,
how many globus_gram_client_job_request calls are pending).
It is useful for controlling the number of new
connections/processes created at a given time.
The default value is 5.
This variable allows
you to set different limits for each resource.
After the first integer in the value
comes a list of resourcename/number pairs,
where each number is the limit for that resource.
If a resource is not in the list,
Condor uses the first integer.
An example usage:
GRIDMANAGER_MAX_PENDING_SUBMITS_PER_RESOURCE=20,nostos,5,beak,50
- GRIDMANAGER_MAX_PENDING_SUBMITS
- This configuration variable is still recognized, but it has been
renamed
GRIDMANAGER_MAX_PENDING_SUBMITS_PER_RESOURCE.
- GAHP
- The full path to the binary of the GAHP server.
- GAHP_ARGS
- Arguments to be passed to the GAHP server.
- GRIDMANAGER_GAHP_CALL_TIMEOUT
- The number of seconds after
which a pending GAHP command should time out. The default is 5 minutes
(300 seconds).
- GRIDMANAGER_MAX_PENDING_REQUESTS
- The maximum number of GAHP
commands that can be pending at any time. The default is 50.
- GRIDMANAGER_CONNECT_FAILURE_RETRY_COUNT
- The number of times
to retry a command that failed due to a timeout or a failed connection.
The default is 3.
- GRIDMANAGER_SYNC_JOB_IO_INTERVAL
- The number of seconds between
periodic syncs of streamed output to disk. The default is 5 minutes
(300 seconds).
- GLOBUS_GATEKEEPER_TIMEOUT
- The number of seconds after
which if a globus
universe job fails to ping the gatekeeper,
the job will be put on hold.
Defaults to 5 days (in seconds).
3.3.19 Configuration File Entries Relating to Security
These macros affect the secure operation of Condor.
- GSI_DAEMON_NAME
- A comma separated list of the subject
name(s) of the certificate(s) that the daemons use.
- GSI_DAEMON_DIRECTORY
- A directory name used in the
construction of complete paths for the configuration variables
GSI_DAEMON_CERT,
GSI_DAEMON_KEY, and
GSI_DAEMON_TRUSTED_CA_DIR,
when any of these configuration variables are not explicitly set.
- GSI_DAEMON_CERT
- A complete path and file name to the
X.509 certificate to be used in GSI authentication.
If this configuration variable is not defined, and
GSI_DAEMON_DIRECTORY is defined, then Condor uses
GSI_DAEMON_DIRECTORY to construct the path and file name as
GSI_DAEMON_CERT = $(GSI_DAEMON_DIRECTORY)/hostcert.pem
- GSI_DAEMON_KEY
- A complete path and file name to the
X.509 private key to be used in GSI authentication.
If this configuration variable is not defined, and
GSI_DAEMON_DIRECTORY is defined, then Condor uses
GSI_DAEMON_DIRECTORY to construct the path and file name as
GSI_DAEMON_KEY = $(GSI_DAEMON_DIRECTORY)/hostkey.pem
- GSI_DAEMON_TRUSTED_CA_DIR
- The directory that contains the
list of trusted certification authorities to be used in GSI authentication.
The files in this directory are the public keys and signing policies
of the trusted certification authorities.
If this configuration variable is not defined, and
GSI_DAEMON_DIRECTORY is defined, then Condor uses
GSI_DAEMON_DIRECTORY to construct the directory path as
GSI_DAEMON_TRUSTED_CA_DIR = $(GSI_DAEMON_DIRECTORY)/certificates
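The three path macros above are most easily set together through GSI_DAEMON_DIRECTORY. A sketch, using a hypothetical directory, with the paths Condor would derive shown in comments:

```
# Illustrative: with only this macro set, Condor derives
#   GSI_DAEMON_CERT           = /etc/grid-security/hostcert.pem
#   GSI_DAEMON_KEY            = /etc/grid-security/hostkey.pem
#   GSI_DAEMON_TRUSTED_CA_DIR = /etc/grid-security/certificates
GSI_DAEMON_DIRECTORY = /etc/grid-security
```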
- GSI_DAEMON_PROXY
- A complete path and file name to the
X.509 proxy to be used in GSI authentication.
When this configuration variable is defined, use of this proxy
takes precedence over use of a certificate and key.
- SEC_DEFAULT_SESSION_DURATION
- The amount of time in seconds before
a communication session expires.
Defaults to 8640000 seconds (100 days) to avoid a bug in session
renegotiation for Condor Version 6.6.0.
A session is a record of necessary information to do communication
between a client and daemon, and is protected by a shared secret key.
The session expires to reduce the window of opportunity where
the key may be compromised by attack.
3.3.20 Root Config Files