Projects

JDM Lab Suite

JDM Lab Suite is a program that Joseph Johnson (Psychology) and his graduate students use in their research on judgment and decision making (JDM). The program displays a grid (as shown below) of similar objects and their attributes. For example, each row of the grid could represent a car, and each column could represent an attribute of a car, such as its cost, color, model year, etc. The grid cell at the intersection of a row and a column holds the relevant piece of information, e.g., the color of a Toyota. This information is hidden from the subject by default.

JDM grid. 151% shown at the intersection of the RDA Salt column and the Burrito row.

To perform an experiment, the subject moves the cursor from cell to cell. Only the information in the cell under the cursor is visible. The subject looks at whatever information she needs to decide which car she wants. The program keeps track of the path the cursor takes through the grid, the time spent in each cell, etc. The researchers then use this information in their mathematical models of how people make judgments and decisions.
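
A minimal sketch of the kind of record the program keeps for each cell visit is shown below; the class and field names are illustrative, not the actual JDM Lab Suite data structures.

    #include <chrono>
    #include <iostream>
    #include <vector>

    // One visit of the cursor to one grid cell (illustrative field names).
    struct CellVisit {
        int row;   // e.g. which car
        int col;   // e.g. which attribute
        std::chrono::steady_clock::time_point entered;
        std::chrono::steady_clock::time_point left;
    };

    // Records the cursor path and the dwell time in each cell.
    class VisitLog {
    public:
        void cursorEntered(int row, int col) {
            visits_.push_back({row, col, std::chrono::steady_clock::now(), {}});
        }
        void cursorLeft() {
            if (!visits_.empty()) visits_.back().left = std::chrono::steady_clock::now();
        }
        const std::vector<CellVisit>& path() const { return visits_; }  // in visit order
    private:
        std::vector<CellVisit> visits_;
    };

    int main() {
        VisitLog log;
        log.cursorEntered(2, 3);  // subject inspects row 2, column 3
        log.cursorLeft();
        log.cursorEntered(2, 4);
        log.cursorLeft();
        for (const auto& v : log.path()) {
            auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(v.left - v.entered).count();
            std::cout << "cell (" << v.row << "," << v.col << "): " << ms << " ms\n";
        }
        return 0;
    }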

Dates

05/19/2006 - present

Technologies

C++, GUI, XML, interactive, Embarcadero (Borland)

Speeding Up Data Processing Using Parallel Processing

Whenever a huge data set arrives in the form of 100,000 or even millions of files, serial processing (on a PC or Mac, for instance) quickly reaches a limit. Assuming that one file takes about 10 seconds to process, the total compute time reaches about 278 hours for 100,000 files, the equivalent of 11 and a half days, and about 116 days for a million files, and so on. Throwing PC resources at this task is not very practical, especially considering that other daily compute tasks, such as reading email, need to run at the same time the CPU(s) are crunching the data at full speed. A better approach is to use high-performance computing resources in the form of a compute cluster.
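
To make the serial estimate concrete, here is a small C++ calculation of the numbers above; the 10 seconds per file is just the assumed figure.

    #include <cstdio>

    int main() {
        const double seconds_per_file = 10.0;             // assumed processing time per file
        const long long counts[] = {100000LL, 1000000LL};
        for (long long n : counts) {
            double total_s = n * seconds_per_file;        // total serial compute time
            std::printf("%lld files: %.0f hours (%.1f days)\n",
                        n, total_s / 3600.0, total_s / 86400.0);
        }
        return 0;
    }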

A first step when attempting a speed-up is to assess the available cluster hardware and whether the processing tool is compatible with the high-performance cluster's operating system. In most cases this comes down to whether the tool is available for Linux, because most clusters run on that system. Not all of a facility's resources are available at once; they are shared with other users, and compute tasks are lined up in a queue and executed according to their priority, which typically means waiting. A cluster can have tens to hundreds of compute nodes, and most nodes carry multiple processor cores (8-12 is a safe guess these days). To use these resources well for this kind of task, it is advisable to aggregate the work into several jobs that each fit on a single compute node: enough resources to bundle a set of processes, but not so many that the job sits in the wait queue for too long. Experience shows that jobs requesting many nodes (and therefore many processor cores) at once wait longer in the queue before they are executed.
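
As a rough sketch of the bundling idea, the C++ fragment below splits the file list into per-node chunks of 8*10=80 files and keeps 8 files in flight at a time within a chunk. The file names and the process_file routine are placeholders; in practice each 80-file chunk would be submitted to the cluster scheduler as its own job rather than looped over on one machine.

    #include <algorithm>
    #include <cstddef>
    #include <future>
    #include <iostream>
    #include <string>
    #include <vector>

    // Placeholder for the real per-file processing tool (assumed ~10 s per file).
    void process_file(const std::string& path) {
        // ... run the analysis on 'path' ...
    }

    int main() {
        const std::size_t cores_per_node = 8;   // cores available to one job
        const std::size_t files_per_core = 10;  // files handled per core within one job
        const std::size_t files_per_job  = cores_per_node * files_per_core;  // 80

        std::vector<std::string> files;         // stand-in for the 100,000 input files
        for (int i = 0; i < 100000; ++i)
            files.push_back("data_" + std::to_string(i) + ".dat");

        // Split the file list into per-node jobs of 80 files each (1250 jobs in total).
        for (std::size_t job = 0; job < files.size(); job += files_per_job) {
            const std::size_t job_end = std::min(job + files_per_job, files.size());

            // Within one job, keep 8 files in flight at a time.
            for (std::size_t wave = job; wave < job_end; wave += cores_per_node) {
                const std::size_t wave_end = std::min(wave + cores_per_node, job_end);
                std::vector<std::future<void>> in_flight;
                for (std::size_t i = wave; i < wave_end; ++i)
                    in_flight.push_back(std::async(std::launch::async, process_file, files[i]));
                for (auto& f : in_flight) f.get();  // wait for this wave to finish
            }
        }
        std::cout << "jobs of " << files_per_job << " files each: "
                  << (files.size() + files_per_job - 1) / files_per_job << "\n";
        return 0;
    }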

An example of one possible (not the only) compute scenario for the 100,000 files on a cluster with 8 cores per node is the following. 1250 compute jobs are created, each carrying 8*10=80 files. Within a job, 8 files are processed simultaneously, so each job completes in about 10*10=100 seconds, given that each file needs 10 seconds of processing. How many of the 1250 jobs run simultaneously depends largely on how heavily the cluster is loaded by other users and on whether there is a limit on the number of jobs a single user may run at once (the limit on how many jobs a user may submit can be much higher). A practical estimate assumes a limit of 20 running jobs per user and that about the same number of nodes are actually available during the entire processing time; this would be the situation on the Redhawk cluster at Miami when it is about half loaded. The number of these 20-jobs-at-a-time batches is 1250/20=62.5, so the total processing time is 62.5*100 seconds, or about 1.7 hours. The overall speed-up is consequently a factor of 8*20=160.
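
The same scenario as a short C++ estimate; the per-user limit of 20 running jobs and the 10 seconds per file are the assumptions stated above.

    #include <cstdio>

    int main() {
        const double sec_per_file    = 10.0;    // assumed time to process one file
        const double total_files     = 100000;
        const double cores_per_node  = 8;       // one job occupies one 8-core node
        const double files_per_core  = 10;      // so 8*10 = 80 files per job
        const double concurrent_jobs = 20;      // assumed per-user running-job limit

        const double num_jobs     = total_files / (cores_per_node * files_per_core); // 1250
        const double job_seconds  = files_per_core * sec_per_file;                   // 100 s per job
        const double batches      = num_jobs / concurrent_jobs;                      // 62.5
        const double wall_hours   = batches * job_seconds / 3600.0;                  // ~1.7 h
        const double serial_hours = total_files * sec_per_file / 3600.0;             // ~278 h

        std::printf("jobs: %.0f, wall time: %.1f h, speed-up: %.0fx\n",
                    num_jobs, wall_hours, serial_hours / wall_hours);
        return 0;
    }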

Visual Experiment

“Visual Experiment” is a program that Lynn Olzak and her graduate students use in their research on the perception of brightness, color, and patterns, and on the mechanisms underlying figure-ground segregation. The software displays images specified by the researcher to the subject in random order; the subject makes a decision about each image and signals it by pressing a key on the keyboard. The program keeps track of the subject’s responses.
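
The actual program is a MATLAB GUI; the C++ console sketch below only illustrates the trial logic: shuffle the researcher-specified images, present each one, and record which key the subject pressed and how long the decision took. The image file names are placeholders, and a text prompt stands in for the real display.

    #include <algorithm>
    #include <chrono>
    #include <iostream>
    #include <random>
    #include <string>
    #include <vector>

    // One recorded trial: which image was shown, which key was pressed, how long it took.
    struct Trial {
        std::string image;
        char response;
        long long rt_ms;
    };

    int main() {
        // Images specified by the researcher (placeholder file names).
        std::vector<std::string> images = {"grating_a.png", "grating_b.png", "plaid.png"};

        // Present the images in random order.
        std::mt19937 rng(std::random_device{}());
        std::shuffle(images.begin(), images.end(), rng);

        std::vector<Trial> results;
        for (const auto& img : images) {
            std::cout << "[displaying " << img << "] press a key and Enter: ";
            auto start = std::chrono::steady_clock::now();
            char key = 0;
            std::cin >> key;  // stand-in for the real keyboard handler
            auto stop = std::chrono::steady_clock::now();
            long long ms = std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
            results.push_back({img, key, ms});
        }

        // Keep track of the subject's responses.
        for (const auto& t : results)
            std::cout << t.image << ": '" << t.response << "' after " << t.rt_ms << " ms\n";
        return 0;
    }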

Dates

05/19/2006 - present

Technologies

MATLAB, GUI, XML, interactive, image