1.4.1.2. Parallel universe

To request a job in the parallel universe, create a submit description file named example.ini with the following contents:

universe       = parallel
machine_count  = 2
request_cpus   = 2
request_memory = 1024M
request_disk   = 10240K
executable     = /bin/echo
arguments      = "hello world from process $(Node)"
output         = hello_world-$(Node).out
error          = hello_world-$(Node).err
log            = hello_world.log
stream_output  = True
stream_error   = True
queue
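The request_* values are per slot, and HTCondor applies them to each of the machine_count slots. As a rough sketch of what this file asks for in total (the variable names below are illustrative and are not read from example.ini), the totals are simple products:

```bash
#!/usr/bin/env bash
# Illustrative only: values copied by hand from example.ini.
# request_* apply to each slot, so multiply by machine_count for job totals.
machine_count=2
request_cpus=2
request_memory_mb=1024   # request_memory = 1024M
request_disk_kb=10240    # request_disk   = 10240K

total_cpus=$((machine_count * request_cpus))
total_memory_mb=$((machine_count * request_memory_mb))
total_disk_kb=$((machine_count * request_disk_kb))

echo "CPUs:   $total_cpus"
echo "Memory: ${total_memory_mb}M"
echo "Disk:   ${total_disk_kb}K"
```

This prints 4 CPUs, 2048M of memory and 20480K of disk across the whole job.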

Then submit the job with

condor_submit example.ini

Once the job has finished, you can see what happened by reading the log, output, and error files specified in the submit description file.
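If you prefer to review everything at once, a small shell function such as the hypothetical show_results below prints the event log followed by each node's output and error files. It is not part of HTCondor; it assumes the file names used in example.ini and takes machine_count as its only argument.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of HTCondor): print the event log and the
# stdout/stderr files of every node, assuming the names from example.ini.
# Usage: show_results <machine_count>
show_results() {
  local n=$1 node
  echo "== hello_world.log =="
  cat hello_world.log
  for ((node = 0; node < n; node++)); do
    echo "== node $node stdout =="
    cat "hello_world-$node.out"
    echo "== node $node stderr =="
    cat "hello_world-$node.err"
  done
}
```

Run it from the directory you submitted from, for example show_results 2.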

See Monitor your jobs for how to monitor the status of your job. For more advanced use, you can instead submit the job and immediately follow its log, output, and error files with this command:

condor_submit example.ini; tail -F hello_world.log hello_world-0.out hello_world-0.err hello_world-1.out hello_world-1.err

See Streaming stdout & stderr with tail for an explanation of what this command does.
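The tail command above hard-codes the file names for two nodes. If you change machine_count, a helper along the lines of the hypothetical tail_files below can generate the list instead. It assumes the naming scheme from example.ini and emits the files in the same order as the command above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the files written by example.ini for a job with
# n nodes (the event log plus one .out and one .err per $(Node) value).
# Usage: tail_files <machine_count>
tail_files() {
  local n=$1 node
  echo hello_world.log
  for ((node = 0; node < n; node++)); do
    echo "hello_world-$node.out"
    echo "hello_world-$node.err"
  done
}

# Example, assuming machine_count = 2 in example.ini:
#   mapfile -t files < <(tail_files 2)
#   condor_submit example.ini; tail -F "${files[@]}"
```

For a fixed node count, bash brace expansion gives the same per-node list more compactly: hello_world-{0,1}.{out,err} expands to hello_world-0.out hello_world-0.err hello_world-1.out hello_world-1.err.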

1.4.1.2.1. Explanation

universe = parallel: This specifies that the job you’re submitting is a parallel job. In HTCondor, the universe attribute defines the type of environment or execution context for the job. In the case of the parallel universe, it allows for the coordination of multiple job processes that will run simultaneously.
machine_count = 2: This indicates that the job requires two slots from the HTCondor pool. The slots may be on different machines or on the same machine; HTCondor starts one copy of the executable in each slot, and all copies start at the same time.
request_cpus = 2: This asks for two CPUs for each slot of the job. With machine_count = 2, the job therefore uses four CPUs in total, two in each slot.
request_memory = 1024M: This requests at least 1024 megabytes (1 gigabyte) of memory for each slot.
request_disk = 10240K: This requests at least 10240 kilobytes (10 megabytes) of available disk space for each slot.
executable = /bin/echo: This specifies the executable that will be run. In this case, it’s the echo command commonly found on UNIX-like systems.
arguments = "hello world from process $(Node)": This specifies the arguments passed to the echo command. $(Node) is a macro that HTCondor replaces with the node number of each process, counting from 0 up to machine_count - 1. For this two-node job, one process prints “hello world from process 0” and the other prints “hello world from process 1”.
output = hello_world-$(Node).out: This specifies where the standard output of each job process should be written. Using the $(Node) placeholder, each process will write its output to a unique file. For instance, “hello_world-0.out” for the first process, “hello_world-1.out” for the second, and so on.
error = hello_world-$(Node).err: Similarly, this defines where the standard error of each job process should be written. For instance, any errors from the first process would go to “hello_world-0.err”, from the second to “hello_world-1.err”, and so on.
log = hello_world.log: This is a consolidated log file for the job. It will contain logging information from all instances of the job, such as when each instance starts, stops, etc.
stream_output = True: This streams the job's standard output back to the specified output file while the job runs, instead of transferring it only when the job finishes.
stream_error = True: Similarly, this streams the job's standard error to the specified error file in real time.
queue: This final command tells condor_submit to place the job described by the preceding attributes into the HTCondor queue, where the scheduler matches it with available resources in the pool. For a parallel universe job, this single job comprises all machine_count processes.
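To see the effect of the $(Node) substitution without a pool, the hypothetical emulate_nodes function below runs /bin/echo once per node number on your own machine and redirects each run to its own .out and .err file, mirroring the arguments, output, and error lines of example.ini. It is only an illustration: the runs happen one after another locally, whereas HTCondor starts them simultaneously in separate slots. It also overwrites any existing hello_world-*.out and .err files in the current directory, so try it in a scratch directory.

```bash
#!/usr/bin/env bash
# Hypothetical local emulation (no HTCondor involved): run the example's
# executable once per node number, substituting the number where example.ini
# uses $(Node), and write each run to its own .out/.err pair.
# Usage: emulate_nodes <machine_count>
emulate_nodes() {
  local machine_count=$1 node
  for ((node = 0; node < machine_count; node++)); do
    /bin/echo "hello world from process $node" \
      > "hello_world-$node.out" 2> "hello_world-$node.err"
  done
}
```

After emulate_nodes 2, hello_world-1.out contains "hello world from process 1" and both .err files are empty, which is what the real job should produce as well.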