1.4.1.2. Parallel universe
To request a job in the parallel universe, create a file example.ini with the following contents:
universe = parallel
machine_count = 2
request_cpus = 2
request_memory = 1024M
request_disk = 10240K
executable = /bin/echo
arguments = "hello world from process $(Node)"
output = hello_world-$(Node).out
error = hello_world-$(Node).err
log = hello_world.log
stream_output = True
stream_error = True
queue
And then submit your job using
condor_submit example.ini
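If the submission succeeds, condor_submit prints a short confirmation. It looks roughly like the following; the cluster number shown here is just an example and will differ on your pool:
Submitting job(s).
1 job(s) submitted to cluster 42.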
Once the job has finished, you can see what happened by reading the contents of the log, output, and error files specified in the job's ClassAd.
See Monitor your jobs to learn how to monitor the status of your job. For advanced use, run this command instead:
condor_submit example.ini; tail -F hello_world.log hello_world-0.out hello_world-0.err hello_world-1.out hello_world-1.err
and see Streaming stdout & stderr with tail for an explanation of what it does.
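Once both processes have run, each per-node output file should contain the single line printed by echo. Assuming the two-node submit file above, inspecting them would look roughly like this:
cat hello_world-0.out hello_world-1.out
hello world from process 0
hello world from process 1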
1.4.1.2.1. Explanation
- universe = parallel
This specifies that the job you're submitting is a parallel job. In HTCondor, the universe attribute defines the type of environment or execution context for the job. In the case of the parallel universe, it allows for the coordination of multiple job processes that will run simultaneously.
- machine_count = 2
This indicates that the job requires two machines (or slots) from the HTCondor pool. Essentially, the job is requesting two instances of itself to run concurrently.
- request_cpus = 2
This asks for two CPUs for each instance (or slot) of the job. So, for the two machines specified by machine_count, each machine should have at least 2 CPUs.
- request_memory = 1024M
This is a request for each machine (or slot) to have at least 1024 Megabytes (1 Gigabyte) of memory.
- request_disk = 10240K
This requests that each machine (or slot) has at least 10240 Kilobytes (10 Megabytes) of available disk space.
- executable = /bin/echo
This specifies the executable that will be run. In this case, it's the echo command commonly found on UNIX-like systems.
- arguments = "hello world from process $(Node)"
Here, the arguments attribute specifies what arguments will be passed to the echo command. The $(Node) is a placeholder that gets replaced with the node (or process) number when the job runs. So, for a parallel job running two instances, you'd see one instance printing "hello world from process 0" and the other "hello world from process 1" (see the sketch after this list).
- output = hello_world-$(Node).out
This specifies where the standard output of each job process should be written. Using the $(Node) placeholder, each process will write its output to a unique file. For instance, "hello_world-0.out" for the first process, "hello_world-1.out" for the second, and so on.
- error = hello_world-$(Node).err
Similarly, this defines where the standard error of each job process should be written. For instance, any errors from the first process would go to “hello_world-0.err”, from the second to “hello_world-1.err”, and so on.
- log = hello_world.log
This is a consolidated log file for the job. It will contain logging information from all instances of the job, such as when each instance starts, stops, etc.
- stream_output = True
This means that the standard output of the job will be streamed (written in real-time) to the specified output file, rather than being buffered and written at the end of the job.
- stream_error = True
Similarly, this streams the standard error of the job to the specified error file in real-time.
- queue
This final command actually submits the job (or jobs, if more than one) to the HTCondor scheduler. It tells HTCondor that the job is ready to be matched with available resources in the pool.
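To make the $(Node) substitution above concrete, here is a rough sketch of what the two nodes effectively run once HTCondor replaces the placeholder. This is an illustration only; the parallel universe machinery handles the actual invocation and file redirection:
# Node 0: $(Node) expands to 0; stdout goes to hello_world-0.out, stderr to hello_world-0.err
/bin/echo hello world from process 0
# Node 1: $(Node) expands to 1; stdout goes to hello_world-1.out, stderr to hello_world-1.err
/bin/echo hello world from process 1
Both nodes record their job events in the shared hello_world.log file. If machine_count were raised to, say, 4, the same pattern would produce hello_world-0.out through hello_world-3.out, with $(Node) running from 0 to 3.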