Next: 8.7 Development Release Series
Up: 8. Condor Version History
Previous: 8.5 Development Release Series
Contents
Index
Subsections
8.6 Stable Release Series 6.2
This is the second stable release series of Condor.
All of the new features developed in the 6.1 series are now considered
stable, supported features of Condor.
New releases of 6.2.0 should happen infrequently and will only include
bug fixes and support for new platforms.
New features will be added and tested in the 6.3 development series.
The details of each version are described below.
8.6.1 Version 6.2.2
New Features:
- Man pages are now included in the Condor distribution for UNIX.
- Condor administrators can now specify
CONDOR_SUPPORT_EMAIL in their config file.
This address is included at the bottom of all email Condor sends out.
Previously, the CONDOR_ADMIN setting was used for this, but
at many sites, the address where the Condor daemons should send email
about administrative problems is not the same address that users
should use for technical support.
If your site has different addresses for these two things, you can now
specify them properly.
- There is a new macro automatically defined by Condor for use in
your config files: IP_ADDRESS .
If you refer to $(IP_ADDRESS), it will be replaced with the
ASCII string version of the local host's IP address.
Bugs Fixed:
- Fixed a bug with the scheduler universe where if one or more of
the stderr, stdout, or stdin files of a scheduler universe job couldn't
be opened, the condor_ schedd would leak file descriptors.
- Fixed the default startd policy expressions that are shipped
with Condor.
When you upgrade Condor, if you keep your old condor_config
file and do not use the new one we ship with the Condor binaries, we
highly recommend you open the <release_dir>/etc/condor/config.generic
file and see what's changed.
Carefully read part 3, see what's different from your existing policy
expressions, and make the relevent changes to your own expressions.
Here is a summary of the improvements made:
- The policy expressions now use CpuBusyTime to avoid
problems with faulty values for CondorLoadAvg.
For certain kinds of jobs (mostly vanilla jobs), the old policy
could cause the jobs to cycle between the running and suspended
states.
- The policy macros that refered to JobUniverse and
ImageSize now use ``TARGET.'' to ensure we're refering to the
right ClassAd to find those attributes.
Our old policy used to make important decisions based on the
universe and image size of the job it was planning to run, but
instead, it was using the values from the job that was currently
running.
- Added verbose comments to explain the policy in English.
- If condor_ compile can not find the condor_ config_val
program, condor_ compile can not work.
In previous versions, condor_ compile would try to perform the
compilation and fail in an unclear way with cryptic error messages.
Now, condor_ compile detects this case, prints out a verbose error
message, and exits.
- The FILESYSTEM_DOMAIN and UID_DOMAIN settings were
not being automatically append to the requirements of a Vanilla-universe job,
possibly causing it to run on machines where it will not successfully run. This
has been fixed.
- The getdirentries call was not being trapped, causing a few applications
to fail running inside of Condor.
- When the NT shadow and the Unix shadow were used in conjunction with each
other during the submission of heterogeneous jobs, they conflicted in where
they should store their Condor internal files. This would cause extremely
long hangs where the job was listed as running but no job actually started.
This has been fixed and now you can mix the NT shadow and Unix shadow
together just fine during heterogeneous submits.
- Numerous additions/clarifications to sections of the manual to bring
the manual up to date with the source base.
- Fixed a bug where if you set MAX_JOBS_RUNNING to zero in the
config files, the schedd would fail with a floating point error.
- PVM support for Solaris 2.8 was mistakenly turned off in 6.2.1,
it has been turned back on again.
- Added exit status of Condor tools and daemons to the manual.
- Fixed a bug in the schedd where it would segfault if it manipulated
class ads in a certain way.
- Fixed stat() and lstat() to ask the shadow where the
file was that it was trying to locate instead of assuming it was always going
to be remote I/O.
- Fixed a bug where Condor would incorrectly inform you that it
didn't have read/write permission on a file located in NFS when you
actually did have permission to read/write it.
- Removed mention of condor_ master_off since it was deprecated a
long time ago and is not available anymore.
- Removed mention of condor_ reconfig_schedd since it was deprecated a
long time ago and is not available anymore.
- Fixed a bug where the schedd would occasionally EXCEPT with an
``ATTR_JOB_STATUS not found'' error soon after a condor_ rm
command had been invoked.
- Fixed a bug in the Negotiator and the Schedd where it would segfault
during reading of a corrupted log file.
- Fixed a bug where we used to EXCEPT if we couldn't email the
administrator of the pool. We now do not EXCEPT anymore.
- In the condor_ startd, messages were being printed into the log
file about load average and idle time computations whenever
D_ FULLDEBUG was included in the STARTD_DEBUG config
file setting.
Now, to see the periodic messages printing out the current load
averages, you must include D_ LOAD in STARTD_DEBUG , and
to see the messages about idle time, you must include D_ IDLE.
- In most of the Condor tools (condor_ on, condor_ off,
condor_ restart, condor_ vacate, condor_ checkpoint,
condor_ reconfig, condor_ reschedule), if you used -pool to
specify which pool to query to find specific daemons, but did not
specify any daemons or machines, the tool would just act on the local
host.
Now, if you specify -pool and do not list any machines, these
Condor tools will print an error message and exit.
- If you specified a file in the CONDOR_CONFIG environment
variable, all Condor daemons and tools used to silently exit.
Now, they print an error message about it and exit.
Also, if there were any problems with the file you specified in
the CONDOR_CONFIG environment variable, Condor used to try
other options, namely the ``condor'' user's home directory and
/etc/condor_config.
Now, if you specify something in CONDOR_CONFIG and Condor
cannot read it, Condor will print an error and exit, instead of
searching the other locations.
Known Bugs:
- condor_ compile on DIGITAL UNIX 4.0 with the Vendor cc compiler now
passes a different option to specify the path to find the Condor libraries. We
believe that this new option is correct for all versions of the compiler, but
do not have enough testing options to confirm this.
- Condor does not work on international version of Windows 2000. It has
only been tested with the US version.
8.6.2 Version 6.2.1
New Features:
- The condor_ userlog command is now available on Windows NT.
- Jobs run in stand-alone checkpointing mode can now take a -_condor_nowarn
argument, which silences the warnings from the system call library when you
perform a checkpoint-unsafe action, such as opening a file for reading and
writing.
Bugs Fixed:
- When using heterogeneous specifications of an executable between
NT and UNIX, Condor could get confused if the vanilla job had run on
an NT machine, vacated without being completed, and then restarted as
a standard universe job on UNIX. The job would be labeled as running in
the queue, but not perform any work. This has been fixed.
- The entries in the environment
option in a submit file now correctly override the variables brought in
from the getenv option on Windows NT.
In previous version of CondorNT, the job would get an environment with the
variable defined multiple times. This bug did not affect UNIX versions of
Condor.
- Some service packs of Windows NT had bugs that prevented Condor
from determining the file permissions on input and output files. 6.2.1
uses a different set of API's to determine the permissions and works
properly across all service packs
- In versions of Condor previous to 6.2.0, the registry would slowly
grow on Windows NT and sometimes become corrupted. This was fixed in 6.2.0,
but if a previously-corrupted registry was detected Condor aborted. In 6.2.1,
this has been turned into a warning, as it doesn't need to be a fatal error.
- Fixed a memory-corruption bug in the condor_ collector
- PVM resources in Condor were unable to have more than one
@ symbol in a name.
- The TRANSFER_FILES is now set to ON_EXIT
on UNIX by default for the vanilla universe. Previously, users submitting
from UNIX to NT needed to explicitly enable it or include the executable in
the list of input files for the job to run.
- If TRANSFER_FILES was set to TRUE
files created during the job's run would be transfered whenever the job was
vacated and transfered to the next machine the job ran on, but would not be
transfered back to the submit machine when the job finally exited for the last time.
- Determining the current working directory was broken in stand-alone
checkpointing.
- A job's standard output and standard error can now go to the same file.
- When the START_HAS_BAD_UTMP is set to TRUE, the
condor_ startd now detects activity on the
/dev/pts
devices.
- The condor_ negotiator in 6.2.0 could incorrectly reject a job
that should have been successfully matched if it previously rejected a
job. If the same jobs were sent to the condor_ negotiator in a different
order, the match that should succeed would. In 6.2.1, the order is no longer
important, and previous rejections will not prevent future matches.
- The getdents, getdirents, and statfs system calls now work correctly in
cross-platform submissions.
- condor_ compile is better able to detect which version of Linux
it is running on and which flags it should pass to the linker. This should
help Condor users on non-Red Hat distributions.
- Fixed a bug in the condor_ startd that would cause the daemon
to crash if you set the POLLING_INTERVAL macro to a value
greater than 60.
- In condor_ q, dash-arguments (e.g., -pool, -run, etc.) were being
parsed incorrectly such that the same arguments specified without a
dash would be interpreted as if the dash were present, making it
impossible to specify ``pool'' or ``globus'' or ``run'' as an owner
argument.
- Fixed bug in condor_ submit that would cause certain submit
file directives to be silently ignored if you used the wrong attribute
name.
Now, all submit file attributes can use the same names you see in the
job ClassAd (what you'd see with condor_ q -long.
For example, you can now use ``CoreSize = 0'' or ``core_size = 0''
in your submit file, and either one would be recognized.
- A static limit on the number of clusters the condor_ schedd
would accept from the condor_ negotiator was removed.
- On Windows NT, if a job's log file was in a non-existent location,
both the condor_ submit and the condor_ schedd would crash.
- Encounting unsupported system calls could cause Condor to corrupt the
signal state of the job.
- Fixed some of the error messages in condor_ submit so that they
are all consistently formatted.
- Fixed a bug in the Linux standard universe where calloc(2)
would not return zero filled memory.
- condor_ rm, condor_ hold and condor_ release will now return
a non-zero exit status on failure, and only return 0 on success.
Previously, they always returned status 0.
- If a user accidentally put
notify_user = false
in their submit file, Condor used to treat that
as a valid entry.
Now, condor_ submit prints out a warning in this case, telling the
user that they probably want to use
notification = never
instead.
Known Bugs:
- It may be possible to checkpoint with an open socket on IRIX 6.2.
On restart, the job will abort and go back into the queue.
8.6.3 Version 6.2.0
New Features Over the 6.0 Release Series
- Support for running multiple jobs on SMP (Symmetric Multi-Processor)
machines.
New Features Over the Last Development Series: 6.1.17
- If CkptArch isn't specified in the job submission file's
Requirements attribute, then automatically add this expression:
CkptRequirements = ((CkptArch == Arch) || (CkptArch =?= UNDEFINED)) &&
((CkptOpSys == OpSys) || (CkptOpSys =?= UNDEFINED))
to the Requirements expression. This allows for users who specify
a heterogeneous submission to not have to worry about having their checkpoints
incorrectly starting up on architectures for which they were not designed
to run.
- The APPEND_REQ_<universe> config file entries now get
appended to the beginning of the expressions before Condor adds internal
default expressions. This allows the sysadmin to override any default
policy that Condor enforces.
- There is now a single APPEND_REQUIREMENTS attribute
that will get appended to all universe's Requirements
expressions unless a specific APPEND_REQ_STANDARD or
APPEND_REQ_VANILLA expression is defined.
- Increased certain networking parameters to help alleviate the
condor_ shadow's inability to contact the condor_ schedd during heavy load
of the system.
- Added a condor_ glidein man page to the manual.
- Some of the log messages in the condor_ startd were modified to
be more clear and to provide more information.
- Added a new attribute to the condor_ startd ClassAd when the
machine is claimed, RemoteOwner.
Bugs fixed since 6.1.17
- On NT, the Registry would increase in size while Condor was
servicing jobs. This has been fixed.
- Added utmpx support for Solaris 2.8 to fix a problem where
KeyBoardIdle wasn't being set correctly.
- When doing a condor_ hold under NT, the job was removed instead of
held. This has been fixed.
- When using the -master argument tocondor_ restart, the
condor_ master used to exit instead of restarting.
Now, the condor_ master correctly restarts itself in this case.
Known Bugs:
- STARTD_HAS_BAD_UTMP does not work if set to True on Solaris
2.8. However, since utmpx support is enabled, you shouldn't
need to do this normally.
- condor_ kbdd doesn't work properly under Compaq Tru64 5.1, and
as a result, resources may not leave the ``Unclaimed'' state
regardless of keyboard or pty activity. Compaq Tru64 5.0a and earlier
do work properly.
Next: 8.7 Development Release Series
Up: 8. Condor Version History
Previous: 8.5 Development Release Series
Contents
Index
condor-admin@cs.wisc.edu