

6.2 Microsoft Windows

Welcome to Condor for Windows NT! Windows NT is a strategic platform for Condor, and therefore we have been working toward a complete port to Windows NT. Our goal is to make Condor every bit as capable on Windows NT as it is on Unix - or even more capable.

Porting Condor from Unix to Windows NT is a formidable task, because many components of Condor must interact closely with the underlying operating system. Instead of waiting until all components of Condor are running and stabilized on Windows NT, we have decided to make a clipped version of Condor for Windows NT. A clipped version is one in which there is no checkpointing and there are no remote system calls.

This section contains additional information specific to running Condor on Windows NT. Eventually this information will be integrated into the Condor Manual as a whole, and this section will disappear. In order to effectively use Condor NT, first read the overview chapter (section 1.1) and the user's manual (section 2.1). If you will also be administering or customizing the policy and setup of Condor NT, also read the administrator's manual chapter (section 3.1). After reading these chapters, review this chapter for important information and differences when using and administering Condor on Windows NT. For information on installing Condor for Windows, see section 6.2.7.

6.2.1 What is missing from Condor NT Version 6.6.0?

In general, this release on NT works the same as the release of Condor for Unix. However, the following items are not supported in this version:

6.2.2 What is included in Condor NT Version 6.6.0?

Except for those items listed above, most everything works the same way in Condor NT as it does in the Unix release. This release is based on the Condor Version 6.6.0 source tree, and thus the feature set is the same as Condor Version 6.6.0 for Unix. For instance, all of the following work in Condor NT:

6.2.3 Details on how Condor NT starts/stops a job

This section provides some details on how Condor NT starts and stops jobs. This discussion is geared for the Condor administrator or advanced user who is already familiar with the material in the Administrators' Manual and wishes to know detailed information on what Condor NT does when starting and stopping jobs.

When Condor NT is about to start a job, the condor_startd on the execute machine spawns a condor_starter process. The condor_starter then creates:

  1. a new temporary run account on the machine with a login name of ``condor-run-dir_XXX'', where XXX is the process ID of the condor_starter. This account is added to group Users and group Everyone.

  2. a new temporary working directory for the job on the execute machine. This directory is named ``dir_XXX'', where XXX is the process ID of the condor_starter. The directory is created in the $(EXECUTE) directory as specified in Condor's configuration file. Condor then grants write permission to this directory for the user account newly created for the job.

  3. a new, non-visible Window Station and Desktop for the job. Permissions are set so that only the user account newly created has access rights to this Desktop. Any windows created by this job are not seen by anyone; the job is run in the background.

Next, the condor_starter (called the starter) contacts the condor_shadow (called the shadow) process, which is running on the submitting machine, and pulls over the job's executable and input files. These files are placed into the temporary working directory for the job. After all files have been received, the starter spawns the user's executable as user ``condor-reuse-vmX'', where X is the number of the virtual machine (or 1 on a uniprocessor machine). Its current working directory is set to the temporary working directory (that is, $(EXECUTE)/dir_XXX, where XXX is the process ID of the condor_starter daemon).

While the job is running, the starter closely monitors the CPU usage and image size of all processes started by the job. Every 20 minutes the starter sends this information, along with the total size of all files contained in the job's temporary working directory, to the shadow. The shadow then inserts this information into the job's ClassAd so that policy and scheduling expressions can make use of this dynamic information.
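As an illustration of the disk usage figure mentioned above, here is a minimal Python sketch that totals the size of all files in a directory tree. It is not Condor's code; the function name directory_size_bytes is made up for this example, and the starter's real accounting may differ (for instance in how it treats links).

```python
import os


def directory_size_bytes(path):
    """Return the total size, in bytes, of all regular files under path.

    Hypothetical helper for illustration only; it mimics the idea of
    measuring a job's temporary working directory.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Count regular files only; symbolic links are skipped.
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total
```

A monitoring loop could call such a function on the job's working directory at each reporting interval and include the result in its update.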

If the job exits of its own accord (that is, the job completes), the starter first terminates any processes started by the job which could still be around if the job did not clean up after itself. The starter examines the job's temporary working directory for any files which have been created or modified and sends these files back to the shadow running on the submit machine. The shadow places these files into the initialdir specified in the submit description file; if no initialdir was specified, the files go into the directory where the user invoked condor_submit. Once all the output files are safely transferred back, the job is removed from the queue. If, however, the condor_startd forcibly kills the job before all output files could be transferred, the job is not removed from the queue but instead switches back to the Idle state.
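One way to picture how created or modified files could be found is to compare a snapshot of the working directory taken before the job starts with one taken after it exits. The Python sketch below does this using modification time and size. This is an assumption made for illustration; the text above does not say how the starter actually detects changes, and the names snapshot and created_or_modified are hypothetical.

```python
import os


def snapshot(directory):
    """Map each file's path (relative to directory) to (mtime_ns, size)."""
    state = {}
    for root, _dirs, files in os.walk(directory):
        for name in files:
            full = os.path.join(root, name)
            st = os.stat(full)
            rel = os.path.relpath(full, directory)
            state[rel] = (st.st_mtime_ns, st.st_size)
    return state


def created_or_modified(before, after):
    """Return the sorted relative paths that are new or changed in after."""
    return sorted(path for path, sig in after.items()
                  if before.get(path) != sig)
```

Files whose signature is unchanged between the two snapshots are left out of the result, so input files the job never touched would not be sent back.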

If the condor_startd decides to vacate a job prematurely, the starter sends a WM_CLOSE message to the job. If the job spawned multiple child processes, the WM_CLOSE message is only sent to the parent process (that is, the one started by the starter). The WM_CLOSE message is the preferred way to terminate a process on Windows NT, since this method allows the job to clean up and free any resources it may have allocated. When the job exits, the starter cleans up any processes left behind. At this point, if transfer_files is set to ONEXIT (the default) in the job's submit description file, the job switches states, from Running to Idle, and no files are transferred back. If transfer_files is set to ALWAYS, then any files in the job's temporary working directory which were created or modified are first sent back to the submitting machine. But this time, the shadow places these so-called intermediate files into a subdirectory created in the $(SPOOL) directory on the submitting machine ($(SPOOL) is specified in Condor's configuration file). The job is then switched back to the Idle state until Condor finds a different machine on which to run. When the job is started again, Condor places into the job's temporary working directory the executable and input files as before, plus any files stored in the submit machine's $(SPOOL) directory for that job.
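The effect of the transfer_files setting at vacate time can be summarized in a few lines of Python. This is only a restatement of the paragraph above as a lookup, not Condor code; the function name vacate_outcome and the dictionary keys are invented for this sketch.

```python
def vacate_outcome(transfer_files):
    """Describe what happens to a vacated job's files.

    transfer_files is the value from the submit description file:
    "ONEXIT" (the default) or "ALWAYS".
    """
    mode = transfer_files.upper()
    if mode == "ONEXIT":
        # No intermediate files are sent back; the job restarts from scratch.
        return {"files_sent_back": False,
                "destination": None,
                "next_state": "Idle"}
    if mode == "ALWAYS":
        # Created or modified files go to a subdirectory of $(SPOOL)
        # on the submitting machine and are restored on the next run.
        return {"files_sent_back": True,
                "destination": "$(SPOOL) subdirectory",
                "next_state": "Idle"}
    raise ValueError("this sketch only covers ONEXIT and ALWAYS: %r"
                     % transfer_files)
```

In both cases the job returns to the Idle state; the only difference is whether its intermediate files survive the vacate.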

NOTE: A Windows console process can intercept a WM_CLOSE message via the Win32 SetConsoleCtrlHandler() function if it needs to do special cleanup work at vacate time; a WM_CLOSE message generates a CTRL_CLOSE_EVENT. See SetConsoleCtrlHandler() in the Win32 documentation for more info.

NOTE: The default handler in Windows NT for a WM_CLOSE message is for the process to exit. Of course, the job could be coded to ignore it and not exit, but eventually the condor_startd will get impatient and hard-kill the job (if that is the policy desired by the administrator).
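For a job written in Python, the SetConsoleCtrlHandler() approach described in the first NOTE above might be sketched with ctypes as follows. Treat this as an untested outline rather than a recipe: the helper names make_close_handler and install_close_handler are hypothetical, the installation step only does anything on Windows, and the value 2 for CTRL_CLOSE_EVENT is taken from the Win32 headers.

```python
import sys

CTRL_CLOSE_EVENT = 2  # value defined by the Win32 API


def make_close_handler(cleanup):
    """Build a handler that runs cleanup() on CTRL_CLOSE_EVENT.

    The handler returns 1 (TRUE) when it handled the event and
    0 (FALSE) to let the next handler deal with anything else.
    """
    def handler(ctrl_type):
        if ctrl_type == CTRL_CLOSE_EVENT:
            cleanup()
            return 1
        return 0
    return handler


def install_close_handler(cleanup):
    """Register the handler via SetConsoleCtrlHandler (Windows only).

    Returns the ctypes callback object, which the caller must keep
    referenced for as long as the handler is installed, or None on
    other platforms.
    """
    if sys.platform != "win32":
        return None
    import ctypes
    from ctypes import wintypes
    routine_type = ctypes.WINFUNCTYPE(wintypes.BOOL, wintypes.DWORD)
    callback = routine_type(make_close_handler(cleanup))
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    if not kernel32.SetConsoleCtrlHandler(callback, True):
        raise ctypes.WinError(ctypes.get_last_error())
    return callback
```

The cleanup function should finish quickly, since the second NOTE above points out that a job which does not exit may eventually be hard-killed.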

Finally, after the job has left and any files have been transferred back, the starter deletes the temporary working directory, the temporary account, the Window Station, and the Desktop before exiting itself. If the starter should terminate abnormally, the condor_startd attempts the cleanup. If for some reason the condor_startd should disappear as well (that is, if the entire machine was power-cycled hard), the condor_startd will clean up when Condor is restarted.

6.2.4 Security considerations in Condor NT

On the execute machine, the user job is run using the access token of an account dynamically created by Condor which has bare-bones access rights and privileges. For instance, if your machines are configured so that only Administrators have write access to C:\WINNT, then certainly no Condor job run on that machine would be able to write anything there. The only files the job should be able to access on the execute machine are files accessible by group Everyone and files in the job's temporary working directory.

On the submit machine, Condor permits the File Transfer mechanism to read only files which the submitting user has access to read, and to write only files to which the submitting user has access to write. For example, say only Administrators can write to C:\WINNT on the submit machine, and a user gives the following to condor_submit:

         executable = mytrojan.exe
         initialdir = c:\winnt
         output = explorer.exe
         queue
Unless that user is in group Administrators, Condor will not permit explorer.exe to be overwritten.

If for some reason the submitting user's account disappears between the time condor_submit was run and when the job runs, Condor is not able to check and see if the now-defunct submitting user has read/write access to a given file. In this case, Condor will ensure that group ``Everyone'' has read or write access to any file the job subsequently tries to read or write. This is in consideration for some network setups, where the user account only exists for as long as the user is logged in.

Condor also provides protection to the job queue. It would be bad if the integrity of the job queue were compromised, because a malicious user could remove other users' jobs or even change what executable a user's job will run. To guard against this, in Condor's default configuration all connections to the condor_schedd (the process which manages the job queue on a given machine) are authenticated using Windows NT's SSPI security layer. The user is then authenticated using the same challenge-response protocol that NT uses to authenticate users to Windows NT file servers. Once authenticated, the only users allowed to edit a job entry in the queue are:

  1. the user who originally submitted that job (i.e. Condor allows users to remove or edit their own jobs)
  2. users listed in the condor_config file parameter QUEUE_SUPER_USERS. In the default configuration, only the ``SYSTEM'' (LocalSystem) account is listed here.
WARNING: Do not remove ``SYSTEM'' from QUEUE_SUPER_USERS, or Condor itself will not be able to access the job queue when needed. If the LocalSystem account on your machine is compromised, you have all sorts of problems!
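The queue-editing rule above amounts to a simple check once the user has been authenticated. The Python sketch below restates it; the function name may_edit_job is hypothetical, the case-insensitive comparison is an assumption based on Windows account names not being case sensitive, and the authentication step itself (SSPI) is not modeled.

```python
# Default from the text above: only the SYSTEM account is listed.
QUEUE_SUPER_USERS = {"SYSTEM"}


def may_edit_job(authenticated_user, job_owner,
                 super_users=QUEUE_SUPER_USERS):
    """Return True if authenticated_user may edit or remove the job.

    Allowed users are the job's original submitter and anyone listed
    in QUEUE_SUPER_USERS. Names are compared case-insensitively
    (an assumption of this sketch).
    """
    user = authenticated_user.lower()
    if user == job_owner.lower():
        return True
    return user in {name.lower() for name in super_users}
```

For example, may_edit_job("alice", "bob") is False, while may_edit_job("SYSTEM", "bob") is True under the default configuration.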

To protect the actual job queue files themselves, the Condor NT installation program will automatically set permissions on the entire Condor release directory so that only Administrators have write access.

Finally, Condor NT has all the IP/Host-based security mechanisms present in the full-blown version of Condor. See section 3.7.5 for complete information on how to allow/deny access to Condor based upon machine host name or IP address.

6.2.5 Interoperability between Condor for Unix and Condor NT

Unix machines and Windows NT machines running Condor can happily co-exist in the same Condor pool without any problems. Jobs submitted on Windows NT can run on Windows NT or Unix, and jobs submitted on Unix can run on Unix or Windows NT. Without any specification (using the requirements expression in the submit description file), the default behavior will be to require the execute machine to be of the same architecture and operating system as the submit machine.
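The default matching behavior described above can be sketched in Python as a comparison of the Arch and OpSys attributes of a machine against those of the submit machine. The helper names are hypothetical, and the attribute values used in the example ("INTEL", "WINNT40", "LINUX") are only illustrative; the exact expression Condor generates may differ.

```python
def default_requirements(submit_arch, submit_opsys):
    """Build an illustrative requirements string for the default case."""
    return '(Arch == "{}") && (OpSys == "{}")'.format(submit_arch,
                                                      submit_opsys)


def matches_default(machine_ad, submit_arch, submit_opsys):
    """Return True if the machine has the submit machine's Arch and OpSys.

    machine_ad is a plain dict standing in for a machine ClassAd.
    """
    return (machine_ad.get("Arch") == submit_arch
            and machine_ad.get("OpSys") == submit_opsys)
```

With these definitions, a job submitted from an ("INTEL", "WINNT40") machine would not match a machine advertising OpSys "LINUX" unless the user supplies a requirements expression of their own in the submit description file.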

There is absolutely no need to run more than one Condor central manager, even if you have both Unix and NT machines. The Condor central manager itself can run on either Unix or NT; there is no advantage to choosing one over the other. Here at University of Wisconsin-Madison, for instance, we have hundreds of Unix (Solaris, Linux, Irix, etc) and Windows NT machines in our Computer Science Department Condor pool. Our central manager is running on Windows NT. All is happy.

6.2.6 Some differences between Condor for Unix -vs- Condor NT


6.2.7 Installation on Windows

This section contains the instructions for installing the Microsoft Windows NT version of Condor (Condor NT) at your site. The install program will set you up with a slightly customized configuration file that you can further customize after the installation has completed.

Please read the copyright and disclaimer information in the manual, or in the file LICENSE.TXT, before proceeding. Installation and use of Condor is acknowledgement that you have read and agreed to these terms.

Be sure that the Condor tools that get run are of the same version as the installed daemons. If the versions differ (for example, 6.5.3 daemons with a 6.4 condor_submit), Condor will not work correctly. The condor_schedd daemon may report errors in its log. It is likely that a job would be correctly placed in the queue but would never run.

The Condor NT executable for distribution is packaged in a single file such as:

  condor-6.1.8_preview-WINNT40-x86.exe

This file is approximately 5 Mbytes in size, and may be removed once Condor is fully installed.

Before installing Condor, please consider joining the condor-world mailing list. Traffic on this list is kept to an absolute minimum. It is only used to announce new releases of Condor. To subscribe, send an email to majordomo@cs.wisc.edu with the body:

   subscribe condor-world

6.2.7.1 Installation Requirements


6.2.7.2 Preparing to Install Condor under Windows NT

Before you install the Windows NT version of Condor at your site, there are two major decisions to make about the basic layout of your pool.

  1. What machine will be the central manager?
  2. Do I have enough disk space for Condor?

If you feel that you already know the answers to these questions, skip to the Installation Procedure section below, section 6.2.7.3. If you are unsure, read on.


6.2.7.3 Installation Procedure using the included Setup Program

Installation of Condor must be done by a user with administrator privileges. After installation, the Condor services will be run under the local system account. When Condor is running a user job, however, it will run that User job with normal user permissions. Condor will dynamically create an account, and then delete that account when the job is finished or is removed from the machine.

Download Condor, and start the installation process by running the file (or by double clicking on the file). The Condor installation is completed by answering questions and choosing options within the following steps.

If Condor is already installed.

For upgrade purposes, you may be running the installation of Condor after it has been previously installed. In this case, a dialog box will appear before the installation of Condor proceeds. The question asks if you wish to preserve your current Condor configuration files. Answer yes or no, as appropriate.

If you answer yes, your configuration files will not be changed, and you will proceed to the point where the new binaries will be installed.

If you answer no, then there will be a second question that asks if you want to use answers given during the previous installation as default answers.

STEP 1: License Agreement.

The first step in installing Condor is a welcome screen and license agreement. You are reminded that it is best to run the installation when no other Windows programs are running. If you need to close other Windows NT programs, it is safe to cancel the installation and close them. You are asked to agree to the license. Answer yes or no. If you should disagree with the License, the installation will not continue.

After you agree to the license terms, the next window is where you fill in your name and company information, or use the defaults as given.

STEP 2: Condor Pool Configuration.

The Condor NT installation will require different information depending on whether the installer will be creating a new pool, or joining an existing one.

If you are creating a new pool, the installation program requires that this machine is the central manager. For the creation of a new Condor pool, you will be asked some basic information about your new pool:

Name of the pool
Hostname of this machine
Size of pool
Condor needs to know whether this is a Personal Condor installation or whether there will be more than one machine in the pool. A Personal Condor pool implies that there is only one machine in the pool. For Personal Condor, several of the following steps are omitted, as noted.

If you are joining an existing pool, all the installation program requires is the hostname of the central manager for your pool.

STEP 3: This Machine's Roles.

This step is omitted for the installation of Personal Condor.

Each machine within a Condor pool may either submit jobs or execute submitted jobs, or both submit and execute jobs. This step allows the installation on this machine to choose if the machine will only submit jobs, only execute submitted jobs, or both. The common case is both, so the default is both.

STEP 4: Where will Condor be installed?

The next step is where the destination of the Condor files will be decided. It is recommended that Condor be installed in the location shown as the default in the dialog box: C:\Condor.

Installation on the local disk is chosen for several reasons.

The Condor services run as local system, and within Microsoft Windows NT, local system has no network privileges. Therefore, for Condor to operate, Condor should be installed on a local hard drive as opposed to a network drive (file server).

The second reason for installation on the local disk is that the Windows NT usage of drive letters has implications for where Condor is placed. The drive letter used must not change, even when different users are logged in. Local drive letters do not change under normal operation of Windows NT.

While it is strongly discouraged, it may be possible to place Condor on a hard drive that is not local, if a dependency is added to the service control manager such that Condor starts after the required file services are available.

STEP 5: Where should Condor send e-mail if things go wrong?

Various parts of Condor will send e-mail to a Condor administrator if something goes wrong and requires human attention. You specify the e-mail address and the SMTP relay host of this administrator. Please pay close attention to this email since it will indicate problems in your Condor pool.

STEP 6: The domain.

This step is omitted for the installation of Personal Condor.

Enter the machine's accounting (or UID) domain. In this version of Condor for Windows NT, this setting is only used for user priorities (see section 3.5) and to form a default email address for the user.

STEP 7: Access permissions.
This step is omitted for the installation of Personal Condor.

Machines within the Condor pool will need various types of access permission. The three categories of permission are read, write, and administrator. Enter the machines to be given access permissions.

Read
Read access allows a machine to obtain information about Condor such as the status of machines in the pool and the job queues. All machines in the pool should be given read access. In addition, giving read access to *.cs.wisc.edu will allow the Condor team to obtain information about your Condor pool in the event that debugging is needed.
Write
All machines in the pool should be given write access. It allows the machines you specify to send information to your local Condor daemons, for example, to start a Condor Job. Note that for a machine to join the Condor pool, it must have both read and write access to all of the machines in the pool.
Administrator
A machine with administrator access will be allowed more extended permission to do things such as change other users' priorities, modify the job queue, turn Condor services on and off, and restart Condor. The central manager should be given administrator access and is the default listed. This setting is granted to the entire machine, so care should be taken not to make this too open.

For more details on these access permissions, and others that can be manually changed in your condor_config file, please see the section titled Setting Up IP/Host-Based Security in Condor, section 3.7.5.
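To make the idea of granting access by host name pattern (such as *.cs.wisc.edu) concrete, here is a small Python sketch that checks a host name against a list of wildcard patterns. It uses Python's fnmatch rules, which are an assumption of this example and may not match Condor's own matching rules exactly; the function name host_allowed is hypothetical.

```python
from fnmatch import fnmatchcase


def host_allowed(hostname, patterns):
    """Return True if hostname matches any of the wildcard patterns.

    Host names are compared case-insensitively. A pattern such as
    "*.cs.wisc.edu" matches every host in that domain.
    """
    host = hostname.lower()
    return any(fnmatchcase(host, pattern.lower()) for pattern in patterns)
```

For instance, with the read list ["*.cs.wisc.edu", "submit.example.org"], the host node1.cs.wisc.edu would be granted read access and evil.example.com would not. The name submit.example.org is a placeholder.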

STEP 8: Job Start Policy.
Condor will execute submitted jobs on machines based on a preference given at installation. Three options are given, and the first is most commonly used by Condor pools. This specification may be changed or refined in the machine ClassAd requirements attribute.

The three choices:

After 15 minutes of no console activity and low CPU activity.
Always run Condor jobs.
After 15 minutes of no console activity.

Console activity is the use of the mouse or keyboard. For instance, if you are reading this document online, and are using either the mouse or the keyboard to change your position, you are generating Console activity.

Low CPU activity is defined as a load of less than 30% (and is configurable in your condor_config file). If you have a multiple processor machine, this is the average percentage of CPU activity for both processors.

For testing purposes, it is often helpful to use the Always run Condor jobs option. For production mode, however, most people choose the After 15 minutes of no console activity and low CPU activity option.
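The three job start choices can be read as a small decision function. The Python sketch below encodes them using the thresholds given above (15 minutes of console idle time, a CPU load below 30%). The policy labels and the function name should_start_job are made up for this example, and the real policy is expressed in Condor's configuration, not in Python.

```python
def should_start_job(policy, console_idle_minutes, cpu_load_percent,
                     idle_threshold_minutes=15,
                     low_cpu_threshold_percent=30):
    """Decide whether a machine may start a Condor job.

    policy is one of the hypothetical labels:
      "idle_and_low_cpu"  no console activity and low CPU activity
      "always"            always run Condor jobs
      "idle"              no console activity
    """
    if policy == "always":
        return True
    console_idle = console_idle_minutes >= idle_threshold_minutes
    if policy == "idle":
        return console_idle
    if policy == "idle_and_low_cpu":
        return console_idle and cpu_load_percent < low_cpu_threshold_percent
    raise ValueError("unknown policy: %r" % policy)
```

Under the first choice, a machine whose console has been idle for 20 minutes but whose CPU load is 50% would not start a job; under the third choice it would.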

STEP 9: Job Vacate Policy.
This step is omitted if Condor jobs are always run as the option chosen in STEP 8.

If Condor is executing a job and the user returns, Condor will immediately suspend the job, and after five minutes Condor will decide what to do with the partially completed job. There are currently two options for the job.

The job is killed 5 minutes after your return.
The job is suspended immediately once there is console activity. If the console activity continues, then the job is vacated (killed) after 5 minutes. Since this version does not include check-pointing, the job will be restarted from the beginning at a later time. The job will be placed back into the queue.
Suspend job, leaving it in memory.
The job is suspended immediately. At a later time, when the console activity has stopped for ten minutes, the execution of the Condor job will be resumed (the job will be unsuspended). The drawback to this option is that since the job will remain in memory, it will occupy swap space. In many instances, however, the amount of swap space that the job will occupy is small.

So which one do you choose? Killing a job is less intrusive on the workstation owner than leaving it in memory for a later time. A suspended job left in memory will require swap space, which could possibly be a scarce resource. Leaving a job in memory, however, has the benefit that accumulated run time is not lost for a partially completed job.
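The two vacate options can be compared with a short Python sketch of what happens to a job that has already been suspended because the owner returned. The option labels and the function name next_action are hypothetical. The text above does not say what the first option does if console activity stops before five minutes have passed, so the sketch does not model that case.

```python
def next_action(option, minutes_suspended, console_idle_minutes):
    """Return the next action for a job suspended on the owner's return.

    option is "kill" (the job is killed 5 minutes after your return)
    or "suspend" (suspend job, leaving it in memory).
    """
    if option == "kill":
        # Assumes console activity has continued since the suspension.
        if minutes_suspended >= 5:
            return "vacate"  # job returns to the queue and restarts later
        return "stay_suspended"
    if option == "suspend":
        if console_idle_minutes >= 10:
            return "resume"  # accumulated run time is kept
        return "stay_suspended"
    raise ValueError("unknown option: %r" % option)
```

The trade-off discussed above shows up directly: "kill" eventually frees the machine's memory and swap space, while "suspend" keeps the job's progress at the cost of leaving it in memory.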

STEP 10: Review entered information.
Check that the entered information is correct. You have the option to return to previous dialog boxes to fix entries.


6.2.7.4 Manual Installation of Condor on Windows NT

If you are to install Condor on many different machines, you may wish to use some other mechanism to install Condor NT on additional machines rather than running the Setup program described above on each machine.

WARNING: This is for advanced users only! All others should use the Setup program described above.

Here is a brief overview of how to install Condor NT manually without using the provided GUI-based setup program:

The Service
The service that Condor NT will install is called "Condor". The Startup Type is Automatic. The service should log on as System Account, but do not enable "Allow Service to Interact with Desktop". The program that is run is condor_master.exe.

For your convenience, we have included a file called install.exe in the bin directory that will install a service. It is typically called in the following way:

install Condor Condor c:\condor\bin\condor_master.exe

If you wish to remove the service, we have provided a file called remove.exe. To use it, call it in the following way:

remove Condor

The Registry
Condor NT uses a few registry entries in its operation. The key that Condor uses is HKEY_LOCAL_MACHINE\Software\Condor. The values that Condor puts in this registry key serve two purposes.
  1. The values of CONDOR_CONFIG and RELEASE_DIR are used for Condor to start its service.

    CONDOR_CONFIG should point to the condor_config file. In this version of Condor NT, it must reside on the local disk.

    RELEASE_DIR should point to the directory where Condor is installed. This is typically C:\Condor, and again, this must reside on the local disk.

  2. The other purpose is storing the entries from the last installation so that they can be used for the next one.

The Filesystem
The files that are needed for Condor to operate are identical to the Unix version of Condor, except that executable files end in .exe. For example, on Unix one of the files is condor_master, and on Condor NT the corresponding file is condor_master.exe.

These files currently must reside on the local disk for a variety of reasons. Advanced Windows NT users might be able to put the files on remote resources. The main concern is twofold. First, the files must be there when the service is started. Second, the files must always be in the same spot (including drive letter), no matter who is logged into the machine. Specifying a UNC path is not supported at this time.
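When scripting a manual installation, it may help to reject UNC paths before writing them into CONDOR_CONFIG or RELEASE_DIR. The Python sketch below checks that a path starts with a drive letter rather than a \\server\share prefix. The function name is hypothetical, and the check is deliberately limited: a drive letter can still be mapped to a network share, which this sketch cannot detect.

```python
import ntpath


def is_drive_letter_path(path):
    """Return True for absolute paths such as C:\\Condor.

    UNC paths (\\\\server\\share\\...) and relative paths return False.
    ntpath applies Windows path rules on any platform.
    """
    drive, rest = ntpath.splitdrive(path)
    has_letter = (len(drive) == 2 and drive[0].isalpha()
                  and drive[1] == ":")
    return has_letter and rest.startswith(("\\", "/"))
```

For example, is_drive_letter_path(r"C:\Condor\condor_config") is True, while is_drive_letter_path(r"\\fileserver\share\Condor") is False. The server name fileserver is a placeholder.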


6.2.7.5 Condor is installed... now what?

After the installation of Condor is completed, the Condor service must be started. If you used the GUI-based setup program to install Condor, the Condor service should already be started. If you installed manually, Condor must be started by hand, or you can simply reboot. NOTE: The Condor service will start automatically whenever you reboot your machine.

To start Condor by hand:

  1. From the Start menu, choose Settings.
  2. From the Settings menu, choose Control Panel.
  3. From the Control Panel, choose Services.
  4. From Services, choose Condor, and Start.

Or, alternatively you can enter the following command from a command prompt:

         net start condor

Run the Task Manager (Control-Shift-Escape) to check that Condor services are running. The following tasks should be running:

Also, you should now be able to open up a new cmd (DOS prompt) window, and the Condor bin directory should be in your path, so you can issue the normal Condor commands, such as condor_q and condor_status.


6.2.7.6 Condor is running... now what?

Once Condor services are running, try building and submitting some test jobs. See the README.TXT file in the examples directory for details.


condor-admin@cs.wisc.edu