Condor can be downloaded from http://www.cs.wisc.edu/condor/downloads (Madison, Wisconsin, USA) or http://www.bo.infn.it/condor-mirror/downloads (a mirror site at the Istituto Nazionale di Fisica Nucleare in Bologna, Italy).
If you are trying to download Condor through a web proxy, try disabling it. Our web site uses the "referring page" as you navigate through our download menus in order to give you the right version of Condor, but sometimes proxies block this information from reaching our web site.
See Section 1.5.
Also, you might want to read the platform-specific information in Chapter 6.
See Section 6.1.
At this time we do not distribute source code publicly, but instead consider requests on a case-by-case basis. If you need the source code, please e-mail us at condor-admin@cs.wisc.edu explaining why, and we'll get back to you.
This series of steps explains how to upgrade a pool of machines from running Condor version 6.2.x to version 6.4.x. Read through the entire set of directions before following them.
Briefly, the steps are to download the new version and replace your current binaries with the new ones. Condor checks for new binaries every few minutes; the next time it checks, it will notice the new binaries and begin using them.
Manufacture test jobs that utilize each universe you use in your Condor pool. Submit each job, and put the job in the hold state, using condor_hold.
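As a minimal sketch of this step, the helper below writes a small submit description file for a given universe. The helper name write_test_submit_file, the file names, and the use of /bin/sleep as the test executable are assumptions made for illustration; a standard universe test job would need an executable relinked with condor_compile instead.

```shell
# Hypothetical helper: write a minimal submit description file for one universe.
# /bin/sleep is only a placeholder executable (not suitable for the standard universe).
write_test_submit_file() {
    universe=$1
    out_file=$2
    cat > "$out_file" <<EOF
universe   = ${universe}
executable = /bin/sleep
arguments  = 300
log        = test_${universe}.log
queue
EOF
}

# Example use (requires a running Condor installation):
#   write_test_submit_file vanilla test_vanilla.submit
#   condor_submit test_vanilla.submit
#   condor_hold <cluster-id>      # the cluster number reported by condor_submit
```

Repeat this for each universe your pool uses, so that every universe has one held job available for testing after the upgrade.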
    cd <release-dir>
    mkdir new
    cd new
Locate the correct Version 6.6.0 binary, and download into this new directory.
Do not install the downloaded version. Do uncompress and then untar the downloaded version. Further untar the release directory (called release.tar). This will create the directories
    bin
    etc
    include
    sbin
    lib
    man
From this list of created directories, bin, include, sbin, and lib will be used to replace current directories.
Make a backup copy of the current configuration, to safeguard backing out of the upgrade, if something goes wrong.
Work through the new Version 6.6.0 example configuration file to see if there is anything useful and merge with your site-specific (current) configuration file.
    cd <release-dir>
    mv bin bin.v62
    mv new/bin bin
    mv include include.v62
    mv new/include include
    mv sbin sbin.v62
    mv new/sbin sbin
    mv lib lib.v62
    mv new/lib lib
Do this series of directory moves at one sitting, especially avoiding a long time lag between the moves relating to the sbin directory. Condor imposes a delay by design, but it does not idly wait for the new binaries to be in place.
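One way to keep the time between moves short is to script them. The following is a sketch only: the function name swap_condor_dirs and its optional suffix argument are hypothetical, and it assumes the new version was unpacked under <release-dir>/new as described above. It verifies that every directory is in place before moving anything, so that the moves themselves run back to back.

```shell
# Hypothetical helper: swap bin, include, sbin, and lib under a release directory.
# Usage: swap_condor_dirs <release-dir> [backup-suffix]   (suffix defaults to v62)
swap_condor_dirs() {
    release_dir=$1
    suffix=${2:-v62}

    # Check everything first so no move fails halfway through.
    for d in bin include sbin lib; do
        if [ ! -d "$release_dir/$d" ] || [ ! -d "$release_dir/new/$d" ]; then
            echo "missing $release_dir/$d or $release_dir/new/$d" >&2
            return 1
        fi
        if [ -e "$release_dir/$d.$suffix" ]; then
            echo "$release_dir/$d.$suffix already exists" >&2
            return 1
        fi
    done

    # Perform the moves without pausing between them.
    for d in bin include sbin lib; do
        mv "$release_dir/$d" "$release_dir/$d.$suffix" || return 1
        mv "$release_dir/new/$d" "$release_dir/$d" || return 1
    done
}
```

The old directories remain available under the backup suffix, which makes it possible to back out of the upgrade by reversing the moves.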
Use condor_status to observe the propagation of the upgrade through the pool. As the machines notice and use the new binaries, their version number will change. Complete propagation should occur in five to ten minutes.
The command
    condor_status -format "%s" Machine -format " %s\n" CondorVersion
gives a single line of information about each machine in the pool, containing only the machine name and version of Condor it is running.
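To see at a glance how far the upgrade has propagated, the output of the command above can be tallied by version. This is a sketch; summarize_versions is a hypothetical helper name, and it assumes each input line consists of a machine name, a space, and the version string.

```shell
# Hypothetical helper: read "machine version..." lines on standard input
# and print how many lines report each distinct version string.
summarize_versions() {
    cut -d' ' -f2- | sort | uniq -c
}

# Example use (requires a running Condor pool):
#   condor_status -format "%s" Machine -format " %s\n" CondorVersion | summarize_versions
```

When only one version string remains in the tally, every machine that reports to the collector is running the new binaries.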
The man directory is new with Condor version 6.4. It contains manual pages. Note that installation of the manual pages is optional; the manual pages themselves are in section 9.
To install the manual pages, move the man directory from <release-dir>/new to the desired location. Add the path name to this directory to the MANPATH.
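For a Bourne-style shell, adding the directory to MANPATH might look like the following sketch. The path /usr/local/condor/man is a placeholder for wherever you moved the man directory. Be aware that on some systems, setting MANPATH when it was previously unset replaces the default search path, so check the documentation for man on your platform.

```shell
# Placeholder location; substitute the directory you moved man to.
CONDOR_MAN=/usr/local/condor/man

# Append the directory only if it is not already listed in MANPATH.
case ":${MANPATH:-}:" in
    *":$CONDOR_MAN:"*) ;;
    *) MANPATH="${MANPATH:+$MANPATH:}$CONDOR_MAN" ;;
esac
export MANPATH
```

Placing these lines in a shell start-up file makes the setting persist across logins.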
Personal Condor is a term used to describe a specific style of Condor installation suited for individual users who do not have their own pool of machines, but want to submit Condor jobs to run elsewhere.
A Personal Condor is essentially a one-machine, self-contained Condor
pool which can use flocking to access resources in other Condor
pools.
See Section 5.2 for more information on flocking.
What to do to get Condor running properly depends on what sort of error occurs. One common category of error is communication errors. Condor daemon log files report a failure to bind. For example:
(date and time) Failed to bind to command ReliSock
Or, the errors in the various log files may be of the form:
    (date and time) Error sending update to collector(s)
    (date and time) Can't send end_of_message
    (date and time) Error sending UDP update to the collector
    (date and time) failed to update central manager
    (date and time) Can't send EOM to the collector
This problem can also be observed by running condor_status. It will give a message of the form:
Error: Could not fetch ads --- error communication error
To solve this problem, understand that Condor uses the first network interface it sees on the machine. Since machines often have more than one interface, this problem usually implies that the wrong network interface is being used. It also may be the case that the system simply has the wrong IP address configured.
It is incorrect to use the localhost network interface. This has IP address 127.0.0.1 on all machines. To check if this incorrect IP address is being used, look at the contents of the CollectorLog file on your pool's central manager right after it is started. The contents will be of the form:
    5/25 15:39:33 ******************************************************
    5/25 15:39:33 ** condor_collector (CONDOR_COLLECTOR) STARTING UP
    5/25 15:39:33 ** $CondorVersion: 6.2.0 Mar 16 2001 $
    5/25 15:39:33 ** $CondorPlatform: INTEL-LINUX-GLIBC21 $
    5/25 15:39:33 ** PID = 18658
    5/25 15:39:33 ******************************************************
    5/25 15:39:33 DaemonCore: Command Socket at <128.105.101.15:9618>
The last line tells the IP address and port the collector has bound to and is listening on. If the IP address is 127.0.0.1, then Condor is definitely using the wrong network interface.
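This check can be scripted. The sketch below assumes log lines of the form shown above; check_collector_address is a hypothetical helper name, and the path to the CollectorLog file must be supplied by you, since it depends on your installation.

```shell
# Hypothetical helper: print the IP address from the most recent
# "Command Socket at <ip:port>" line in the given log file.
# Returns 1 if that address is 127.0.0.1, and 2 if no such line is found.
check_collector_address() {
    log_file=$1
    addr=$(sed -n 's/.*DaemonCore: Command Socket at <\([0-9.]*\):[0-9]*>.*/\1/p' "$log_file" | tail -n 1)
    if [ -z "$addr" ]; then
        echo "no Command Socket line found in $log_file" >&2
        return 2
    fi
    echo "$addr"
    [ "$addr" != "127.0.0.1" ]
}

# Example use:
#   check_collector_address /path/to/CollectorLog || echo "wrong interface?"
```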
There are two solutions to this problem. One solution changes the order of the network interfaces. The preferred solution sets which network interface Condor should use by adding the following parameter to the local Condor configuration file:
NETWORK_INTERFACE = machine-ip-address
where machine-ip-address is the IP address of the interface you wish Condor to use.