Parallel Processing


Some jobs take a long time to run, either because each step involves heavy computation or because the same code must be repeated many times. Parallel processing can be a good way to speed up such jobs.

By default, R runs code on a single core of the computer's processor. Parallel processing spreads independent pieces of work across several cores at once. The total amount of work stays the same, but it is shared among the cores, so big projects finish more quickly.
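The number of available cores differs from machine to machine. A minimal sketch for choosing a worker count uses detectCores() from the parallel package, which ships with R. It assumes you want to leave one core free for the rest of the system; the name n_workers is only illustrative.

```r
# parallel ships with base R; doParallel builds on it
library(parallel)

# detectCores() can return NA on platforms where the count cannot be determined
n_available <- detectCores()
if (is.na(n_available)) n_available <- 1L

# Assumption: keep one core free for the operating system and other programs
n_workers <- max(1L, n_available - 1L)
n_workers
```

The result can then be passed to makeCluster() in place of a hard-coded number.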

Package

The doParallel package will be used for parallel processing (other packages, such as snow, use very similar syntax). The main functions are described below:
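Before running a long loop, it can help to confirm that a parallel backend is registered. The sketch below uses getDoParWorkers() from the foreach package, which library(doParallel) attaches. It assumes two workers, and calls registerDoSEQ() at the end to switch foreach back to sequential execution.

```r
library(doParallel)

# Start two worker processes and register them as the %dopar% backend
cl <- makeCluster(2)
registerDoParallel(cl)

# foreach reports how many workers the registered backend provides
n_registered <- getDoParWorkers()
print(n_registered)

# Shut the workers down and fall back to sequential execution
stopCluster(cl)
registerDoSEQ()
```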

foreach():

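  • ...: the arguments that control the loop. In the example below, icount(trials) creates an iterator that counts up to trials, so the loop body runs that many times
  • .combine: the function used to combine the result of each iteration (for example, cbind binds the results together as columns)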
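The two arguments can be tried on a small scale first. This sketch uses %do%, so it runs sequentially and needs no cluster; the loop variables and values are purely illustrative.

```r
# foreach and iterators are both attached by library(doParallel)
library(foreach)
library(iterators)

# A named argument defines the loop variable; .combine = c joins the results into a vector
squares <- foreach(i = 1:5, .combine = c) %do% i^2

# An unnamed iterator such as icount(3) only sets how many times the body runs
ones <- foreach(icount(3), .combine = c) %do% 1

# .combine = cbind puts each iteration's result in its own column
cols <- foreach(i = 1:3, .combine = cbind) %do% c(i, i^2)
dim(cols)
```

Without .combine, foreach returns a list with one element per iteration.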

Example:

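# install.packages('doParallel')
library(doParallel) # also attaches foreach, iterators (for icount) and parallel
# Number of worker processes (cores) to use
ncores <- 2
# makeCluster starts ncores copies of R that run at the same time
cl <- makeCluster(ncores)
# Register the cluster as the backend used by %dopar%
registerDoParallel(cl)
# Iris dataset restricted to two species so the logistic model has a binary response
x <- droplevels(iris[iris$Species != "setosa", ])
trials <- 10000 # Do 10,000 trials
# Check the time of the process; [3] keeps the elapsed (wall-clock) time
ptime <- system.time({
  r <- foreach(icount(trials), .combine = cbind) %dopar% {
    # Bootstrap: resample the rows of x with replacement
    ind <- sample(nrow(x), nrow(x), replace = TRUE)
    result1 <- glm(x[ind, 5] ~ x[ind, 4], family = binomial(logit))
    coefficients(result1)
  }
})[3]
# Shut down the worker processes
stopCluster(cl)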

ptime

## elapsed
## 22.18

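In the code above, if %dopar% is changed to %do%, the loop runs sequentially on a single core; that is, no parallel processing takes place. Comparing the elapsed time of the two versions shows how much the parallel backend speeds up the job.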
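A minimal sketch of that comparison times the same bootstrap loop with %do% and with %dopar%. It assumes two workers and the versicolor/virginica subset of iris, and uses a reduced trial count of 200 (an illustrative choice) so it finishes quickly. With tasks this small, the overhead of sending work to the workers can make %dopar% no faster, or even slower, than %do%.

```r
library(doParallel)

# Two species only, so Species is a binary response for the logistic model
x <- droplevels(iris[iris$Species != "setosa", ])
trials <- 200 # illustrative, kept small so the comparison runs quickly

cl <- makeCluster(2)
registerDoParallel(cl)

# Sequential run: %do% ignores the registered backend
seq_time <- system.time({
  r_seq <- foreach(icount(trials), .combine = cbind) %do% {
    ind <- sample(nrow(x), nrow(x), replace = TRUE)
    coefficients(glm(x[ind, 5] ~ x[ind, 4], family = binomial(logit)))
  }
})[3]

# Parallel run: %dopar% sends iterations to the registered workers
par_time <- system.time({
  r_par <- foreach(icount(trials), .combine = cbind) %dopar% {
    ind <- sample(nrow(x), nrow(x), replace = TRUE)
    coefficients(glm(x[ind, 5] ~ x[ind, 4], family = binomial(logit)))
  }
})[3]

stopCluster(cl)

# Elapsed (wall-clock) seconds for each approach
c(sequential = seq_time, parallel = par_time)
```

Both versions return a 2 × trials matrix of coefficients (intercept and slope); only the way the iterations are scheduled differs.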
