Parallel Help


Should I Use the Automatic or the Manual Thread Allocation Scheme?

Assuming you have enough memory and disk space, the way to maximize throughput is to run multiple molecules in parallel, with each molecule using only 1 thread (core). The quickest way to complete a single molecule is to run it alone using multiple cores. The Automatic Core/Thread Allocation algorithm chooses between these two extremes by detecting whether you are running multiple molecules (maximize throughput) or just a single job (finish it as quickly as possible). In most cases, it makes sense to keep this default Automatic setting.
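The choice between the two extremes can be illustrated with a small decision rule. This is a minimal sketch of the idea only, not Spartan's actual Automatic algorithm; the function name `allocate_threads` and its return convention are hypothetical.

```python
def allocate_threads(num_molecules: int, total_cores: int) -> tuple:
    """Return (concurrent_molecules, cores_per_molecule).

    Hypothetical heuristic, NOT Spartan's real algorithm:
    - a single molecule gets every core (fastest single result)
    - several molecules get one core each, up to total_cores at once
      (maximum throughput)
    """
    if num_molecules < 1 or total_cores < 1:
        raise ValueError("need at least one molecule and one core")
    if num_molecules == 1:
        return 1, total_cores
    return min(num_molecules, total_cores), 1


print(allocate_threads(1, 8))   # one job: all cores on one molecule
print(allocate_threads(20, 8))  # a list: 8 molecules at a time, 1 core each
```

Note that this sketch deliberately sticks to the two extremes, so a list of 3 molecules on 8 cores would leave 5 cores idle; a real scheduler may well distribute leftover cores differently.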

You may want to override the Automatic algorithm with the Manual one. To do so, it is useful to benchmark performance on your machine, because parallel performance depends heavily on the CPU brand, cache size, and so on, as well as on the type of task. To benchmark Spartan '14 Parallel Suite on your machine, use the Parallel Test File. The procedure is:

  1. Open Spartan '14 Parallel Suite
  2. Go to Options Menu -> Preferences -> Jobs tab and change the default settings as follows: set Maximum Concurrent Jobs/Threads to 1, enter Manual mode, and set both Concurrent Molecules Per Core and Cores Per Molecule to 1.

  3. [Figure: the Jobs tab with these settings, for Windows and Mac OS X]

  4. Open the Parallel Test File, save it as ParallelTest.spartan, then submit the job (Setup Menu -> Submit).
  5. When the job finishes (after about 30 minutes), you will have 4 plots showing computation time as a function of the number of cores (up to 8). You may need to rescale or move some of the plots to see the entire curves.
  6. Reset the Jobs tab to the defaults, or to whatever values you are comfortable with.

These plots need some explanation.

The top two plots examine a single-point energy calculation; the bottom two consider an NMR calculation.

The wide charts show the "Wall" time and the "Theoretical cpu" time for the job as a function of the number of cores/threads. Focus on the "Wall Time", which will always be greater than the theoretical CPU time.

The square plots show the same wall-time data, rescaled in units of "Speed Up per core". The diagonal line represents optimal (perfectly linear) performance.
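The rescaling can be reproduced from raw wall times. This sketch assumes the square plots show the conventional speed-up S(n) = T(1)/T(n) against the number of cores n, so that the diagonal is S(n) = n; it also reports parallel efficiency E(n) = S(n)/n, where 1.0 is ideal. The wall times below are made-up numbers for illustration, not measured Spartan results.

```python
def speedup_table(wall_times: dict) -> dict:
    """Map core count -> (speed-up, parallel efficiency).

    speed-up   S(n) = T(1) / T(n)   (ideal: S(n) = n, the diagonal)
    efficiency E(n) = S(n) / n      (ideal: 1.0)
    Requires a 1-core measurement as the baseline.
    """
    t1 = wall_times[1]
    return {n: (t1 / t, t1 / t / n) for n, t in sorted(wall_times.items())}


# Hypothetical wall times in seconds, for illustration only.
times = {1: 400.0, 2: 210.0, 3: 150.0, 4: 140.0}
for n, (s, e) in speedup_table(times).items():
    print(f"{n} cores: speed-up {s:.2f} (ideal {n}), efficiency {e:.0%}")
```

Points that fall further below the diagonal (lower efficiency) mean each extra core is buying less.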

On many modern CPUs, when only one core is in use the chip can raise its clock frequency, making that single core faster than normal. Because the 1-core baseline is then unusually fast, the speed-up at higher core counts may look worse than it really is.
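A rough correction for this effect is to rescale the 1-core time to the clock speed the chip runs at when all cores are busy. This is a sketch under a strong assumption: that run time scales inversely with clock frequency, which holds only approximately for compute-bound work. The function name and the example frequencies (3.6 GHz boosted, 3.0 GHz all-core) are hypothetical.

```python
def frequency_adjusted_speedup(t1: float, tn: float,
                               f_single: float, f_all: float) -> float:
    """Speed-up with the 1-core baseline rescaled to the all-core clock.

    ASSUMPTION: run time is inversely proportional to clock frequency.
    t1, tn:   measured wall times on 1 core and on n cores
    f_single: clock (GHz) while only one core is busy (boosted)
    f_all:    clock (GHz) while all cores are busy
    """
    # At the slower all-core clock, the 1-core run would take longer.
    t1_at_all_core_clock = t1 * f_single / f_all
    return t1_at_all_core_clock / tn


raw = 400.0 / 140.0
adjusted = frequency_adjusted_speedup(400.0, 140.0, f_single=3.6, f_all=3.0)
print(f"raw speed-up {raw:.2f}, frequency-adjusted {adjusted:.2f}")
```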

It is not unusual for the time to actually increase once the resources of your machine are "maxed out"; it makes no sense to run any job with more cores than this. Usually there is a point of "diminishing returns" before that, but where exactly this point lies is a personal preference that depends on your work style.
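One way to turn "diminishing returns" into a concrete number is to stop adding threads once the next step improves wall time by less than some threshold. This is a minimal sketch; the function name and the 10% default threshold are hypothetical, and the threshold is exactly the personal preference mentioned above. A slowdown (negative gain) also stops the search.

```python
def pick_thread_count(wall_times: dict, min_gain: float = 0.10) -> int:
    """Largest thread count reached before a step between consecutive
    measured counts improves wall time by less than min_gain (a fraction).
    A slowdown gives a negative gain, which also stops the search.
    """
    counts = sorted(wall_times)
    best = counts[0]
    for prev, nxt in zip(counts, counts[1:]):
        gain = (wall_times[prev] - wall_times[nxt]) / wall_times[prev]
        if gain < min_gain:
            break
        best = nxt
    return best


# Hypothetical wall times (seconds): 4 threads barely beats 3, 5 is slower.
times = {1: 400.0, 2: 210.0, 3: 150.0, 4: 140.0, 5: 160.0}
print(pick_thread_count(times))
```

With these made-up numbers the step from 3 to 4 threads gains only about 7%, so the sketch settles on 3 threads; a stricter or looser threshold would move that point.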

As an example, consider a speed-up plot from a machine with a 2-core, multi-threaded Intel CPU that runs some jobs but also runs other programs. Clearly, using more than 4 threads is not a good idea, and 4 threads is not much better than 3. It might make sense to run only 1 job at a time on this machine and let it use 3 threads, leaving the system (or other programs) a thread for other lightweight work.

Depending on your workflow, other choices will make sense. On my desktop machine I typically run only one job at a time, but let it use only half the threads (2 in this example). In my environment there is almost always a single job running (often a list containing multiple molecules). If there is another molecule I want to finish quickly, I submit it as a single-molecule list, then go to the monitor and choose "Run It Now". Since I have spare threads, it runs reasonably well.
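The two policies described above (leave one thread free for other work, or use only half the threads so a second job has room) reduce to simple arithmetic on the number of hardware threads. This sketch uses Python's `os.cpu_count()`, which reports logical CPUs (hardware threads) and may return `None`; the helper names are hypothetical, and the resulting numbers would still have to be entered by hand in the Jobs tab.

```python
import os


def threads_reserve_one(hw_threads: int) -> int:
    """Leave one hardware thread free for the OS and other programs."""
    return max(1, hw_threads - 1)


def threads_half(hw_threads: int) -> int:
    """Use half the hardware threads, leaving room for a second job."""
    return max(1, hw_threads // 2)


# Logical CPUs on this machine; os.cpu_count() can return None.
hw = os.cpu_count() or 1
print("reserve one thread:", threads_reserve_one(hw))
print("half the threads:  ", threads_half(hw))
```

On the 2-core, 4-thread example machine these give 3 and 2 threads respectively, matching the two choices discussed above.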

Last modified: Fri Jun 20 07:18:15 PDT 2014