R Performance Optimization

Profiling Program Performance

Measuring Execution Time

Using system.time:

system.time(for (i in 1:50) mad(stats::runif(500)))

Using proc.time:

ptm <- proc.time()
for (i in 1:50) mad(stats::runif(500))
proc.time() - ptm
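
Both approaches report user, system, and elapsed time. As a minimal sketch of how these differ (the sleep duration and loop size are arbitrary choices for illustration), waiting uses almost no CPU while a computation does:

```r
# Sys.sleep() waits without using the CPU, so elapsed time grows
# while user time stays close to zero.
t_sleep <- system.time(Sys.sleep(0.3))

# A CPU-bound loop, by contrast, accumulates user time.
t_cpu <- system.time(for (i in 1:50) mad(stats::runif(5000)))

print(t_sleep[["elapsed"]])
print(t_cpu[c("user.self", "sys.self", "elapsed")])
```

Elapsed time is usually the number to compare when judging the parallel examples later in this article, because user time is summed over CPU work and can exceed wall-clock time.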

Rprof: the profiling function¹
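
A minimal sketch of a profiling session with Rprof and summaryRprof from the utils package. The work function and the sampling interval are hypothetical choices made only so that the profiler has something to sample:

```r
# Hypothetical workload used only to give the profiler something to sample.
work <- function() {
  for (i in 1:300) mad(stats::runif(50000))
  invisible(NULL)
}

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)   # start sampling the call stack
work()
Rprof(NULL)                         # stop profiling

prof <- summaryRprof(prof_file)
head(prof$by.self)                  # time spent inside each function itself
```

The by.self table ranks functions by the time spent in their own code, which is usually the quickest way to find a hot spot.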

lineprof: visual profiling²

OpenBLAS: accelerating matrix operations³

$ brew install openblas --build-from-source
$ brew install r --with-openblas

# Problems you may hit during installation:
# curl: (7) Failed to connect to rcompletion.googlecode.com port 443: Operation timed out
# Error: Failed to download resource "r--completion"
# Download failed: https://rcompletion.googlecode.com/svn-history/r31/trunk/bash_completion/R

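If your program is already multithreaded, it may conflict with OpenBLAS's own threading. In that case, restrict OpenBLAS to a single thread through an environment variable: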

export OPENBLAS_NUM_THREADS=1

Speedup from OpenBLAS:

## With OpenBLAS:
x <- matrix(1:(6000 * 6000), 6000, 6000)
system.time(tmp <- x %*% x)
#   user  system elapsed 
# 13.321   0.323   7.315 

## Without OpenBLAS:
x <- matrix(1:(6000 * 6000), 6000, 6000)
system.time(tmp <- x %*% x)
#    user  system elapsed 
# 206.588   2.216 214.333
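
Before comparing timings it helps to confirm which BLAS the session is actually linked against. The sketch below assumes R 3.4 or later, where extSoftVersion() reports a BLAS entry and La_library() is available. The matrix size is reduced from the benchmark above so that it finishes quickly even with the reference BLAS:

```r
# Report which BLAS/LAPACK libraries this R session is linked against.
# Assumption: R >= 3.4, where both calls below are available.
blas_info <- c(BLAS = unname(extSoftVersion()["BLAS"]),
               LAPACK = La_library())
print(blas_info)

# A smaller benchmark than the 6000 x 6000 case, using a double matrix.
set.seed(1)
n <- 500
x <- matrix(runif(n * n), n, n)
timing <- system.time(tmp <- x %*% x)
print(timing[["elapsed"]])
```

If the reported paths do not mention OpenBLAS, R is still using another BLAS and the speedup will not appear.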

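parallel: a parallel computing package⁴⁵

The parallel package was created by merging and building on the snow and multicore packages, and it inherits many convenient functions from both. multicore only works on operating systems that support fork, and only on a single machine. snow works on clusters of Unix-like systems, Windows, or a mix of the two. On a single-processor, single-core machine neither approach gives any speedup.

Following snow, cluster workers can communicate over PVM (rpvm package), MPI (Rmpi package), NetWorkSpaces (nws package), or raw sockets (when none of the other three is available), so parallel supports both clusters and multi-core personal computers or servers. In principle, parallelism could be implemented with threads or lightweight processes, but the current implementation relies on full processes. There are three ways to obtain them: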

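Start new processes with system("Rscript") or a similar mechanism. Security settings may block socket communication between processes. Following snow, a pool of worker processes that listen on sockets for commands from the master is called a cluster of nodes.
Use the fork system call. A forked child shares the parent's memory pages until their contents are modified, so forking is fast. This approach was first used by multicore. Because the child shares the parent's state, it also shares GUI elements, which can cause havoc⁶. Processes can communicate through pipes or sockets.
Use a system-level mechanism to distribute tasks to other members of a group of machines. snow can use MPI (message passing interface) through the Rmpi package. Communication overhead adds to the computation time here, so this mode is typically used on networks with fast interconnects. CRAN also provides the GridR and Rsge packages for this style of work.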
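
The first two approaches can be sketched side by side with the parallel package. The function name square_plus and the worker count of 2 are illustrative assumptions, and the fork branch falls back to a serial lapply on Windows, where fork is unavailable:

```r
library(parallel)

square_plus <- function(x) x^2 + 2 * x

# 1. Socket cluster: workers are fresh R processes (works on every OS).
cl <- makeCluster(2, type = "PSOCK")
res_sock <- parLapply(cl, 1:10, square_plus)
stopCluster(cl)

# 2. Fork: children share the parent's memory pages (Unix-like systems only).
if (.Platform$OS.type == "unix") {
  res_fork <- mclapply(1:10, square_plus, mc.cores = 2)
} else {
  res_fork <- lapply(1:10, square_plus)  # serial fallback on Windows
}

identical(res_sock, res_fork)
```

Both calls return a list in input order, so they can usually be swapped without changing downstream code.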

doit <- function(x)(x)^2 + 2*x
system.time(res <- lapply(1:5000000,  doit))

 #   user  system elapsed 
 # 24.624   0.224  25.049 

library(parallel)
cl <- makeCluster(getOption("cl.cores", 4)) # use 4 cores
system.time(res <- parLapply(cl, 1:5000000,  doit))
stopCluster(cl) 

  #  user  system elapsed 
  # 2.405   0.258  10.444 

mc <- getOption("mc.cores", 4)
system.time(res <- mclapply(1:5000000,  doit, mc.cores = mc))

  #  user  system elapsed 
  # 6.023   1.632   5.300

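Notes:

Determine the number of processor cores first, usually with detectCores(logical = FALSE);
Check whether the workers are started through Rscript (a socket cluster). In that mode objects are copied to every worker, memory usage is high, and large data sets need care.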
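
A minimal sketch of both notes, assuming a socket cluster of at most two workers. The object name big is hypothetical. Each worker receives its own copy of the exported object, which is why memory use grows with the number of workers:

```r
library(parallel)

# Physical core count; detectCores() may return NA on some platforms,
# so fall back to 1 and cap at 2 for this small demonstration.
n_cores <- detectCores(logical = FALSE)
if (is.na(n_cores)) n_cores <- 1L
n_workers <- min(2L, n_cores)

big <- runif(1e5)                              # object that lives in the master
cl <- makeCluster(n_workers)                   # socket cluster by default
clusterExport(cl, "big", envir = environment())  # explicit copy to every worker
sizes <- clusterEvalQ(cl, as.numeric(object.size(big)))
stopCluster(cl)

# Each worker holds its own copy, so the total grows with the worker count.
total_bytes <- sum(unlist(sizes))
print(total_bytes)
```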

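foreach: a parallel looping package[1]

foreach is a package contributed to the R open-source community by Revolution Analytics. It makes parallel computation in R more convenient.

doParallel is a parallel backend for foreach: it provides the mechanism that executes foreach loops in parallel. foreach needs a backend package such as doParallel to run in parallel. You must register the backend before use; otherwise the loop runs sequentially even with %dopar%. doParallel acts as the interface between foreach and parallel. By default, doParallel uses multicore functionality on Unix-like systems and snow functionality on Windows.[2]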

# snow-like
library(doParallel)
cl <- makeCluster(2)
registerDoParallel(cl)
foreach(i=1:3) %dopar% sqrt(i)

# multicore-like
library(doParallel)
registerDoParallel(cores=2)
foreach(i=1:3) %dopar% sqrt(i)

## Subsequent code in this session runs in multicore mode.
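
Since an unregistered backend silently falls back to sequential execution, it can be worth checking the registration before a long run. This sketch assumes the doParallel package is installed and uses a two-worker cluster as an arbitrary example:

```r
library(doParallel)   # also attaches foreach and parallel

cl <- makeCluster(2)
registerDoParallel(cl)

# Confirm that a parallel backend is active before relying on %dopar%.
backend <- getDoParName()
workers <- getDoParWorkers()
print(c(backend = backend, workers = workers))

roots <- foreach(i = 1:3, .combine = c) %dopar% sqrt(i)

stopCluster(cl)
registerDoSEQ()       # return to the sequential backend after cleanup
```

Stopping the cluster and re-registering the sequential backend avoids leaving a dead cluster registered for later %dopar% calls.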

Parallel bootstrap:

## A parallel backend is already registered, so there is no need to register again...

x <- iris[which(iris[,5] != "setosa"), c(1,5)]
trials <- 10000

# Parallel version
ptime <- system.time({
    r <- foreach(icount(trials), .combine=cbind) %dopar% {
      ind <- sample(100, 100, replace=TRUE)
      result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
      coefficients(result1)
  }
})[3] 

# Sequential version:
stime <- system.time({
    r <- foreach(icount(trials), .combine=cbind) %do% {
      ind <- sample(100, 100, replace=TRUE)
      result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
      coefficients(result1)
  }
})[3] 

c(ptime, stime)

# elapsed elapsed 
#  26.372  37.126

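memoise: a local caching package⁷

memoise is a simple caching package whose main purpose is to avoid repeated computation and so save CPU time. When you call a memoised function again with the same arguments, you get the previously computed result instead of recomputing it. For applications that serve many concurrent requests, caching is often among the most cost-effective performance improvements. The package's two core functions are memoise (also spelled memoize), which defines a cached function, and forget, which resets its cache.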

library(memoise)

# Define a cached function
fun <- memoise(function(x) { Sys.sleep(1); runif(1) })

# First call to fun
system.time(print(fun()))
# [1] 0.4342335
#    user  system elapsed 
#   0.002   0.002   1.004 

# Second call to fun
system.time(print(fun()))
# [1] 0.4342335
#    user  system elapsed 
#   0.001   0.000   0.000 

# Reset the cache
forget(fun)

# Third call to fun
system.time(print(fun()))
# [1] 0.786522
#    user  system elapsed 
#   0.003   0.003   1.002
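
To make the caching idea concrete, here is a hand-rolled sketch in base R. It is not how the memoise package is implemented. The helper memo, the counter environment, and slow_square are hypothetical names, and the sketch only handles a single scalar argument:

```r
# Hypothetical helper: wrap f so that results are cached by argument value.
memo <- function(f) {
  cache <- new.env(parent = emptyenv())
  function(x) {
    key <- as.character(x)
    if (exists(key, envir = cache, inherits = FALSE)) {
      return(get(key, envir = cache))    # cache hit: skip the computation
    }
    value <- f(x)
    assign(key, value, envir = cache)    # cache miss: compute and store
    value
  }
}

# Count how often the slow function really runs.
counter <- new.env()
counter$n <- 0
slow_square <- function(x) {
  counter$n <- counter$n + 1
  Sys.sleep(0.1)
  x^2
}

fast_square <- memo(slow_square)
first <- fast_square(4)    # computed
second <- fast_square(4)   # served from the cache
print(counter$n)
```

The trade-off is the usual one for caches: memory is spent to save time, and the cached function must be deterministic for the stored result to stay valid.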

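compiler: the byte-code compiler package⁸

Compile a function to byte code before running it. Since R 3.4.0 the JIT compiler is enabled by default and functions are byte-compiled automatically, so the gap measured here is much smaller on current versions.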

library(compiler)

myFunction <- function() { for (i in 1:1e7) { 1 * (1 + 1) } }
myCompiledFunction <- cmpfun(myFunction) # compile the function

system.time(myFunction())
  #  user  system elapsed 
  # 3.448   0.024   3.486 

system.time(myCompiledFunction())
  #  user  system elapsed 
  # 0.611   0.017   0.637
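
On a recent R version the comparison above can be reproduced by switching the JIT off temporarily. This is a sketch under that assumption. The function sum_sq and the loop length are arbitrary, and the actual timings depend on the machine:

```r
library(compiler)

sum_sq <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + i^2
  total
}

old_level <- enableJIT(0)          # turn the JIT off; returns the previous level
t_interp <- system.time(r1 <- sum_sq(1e6))[["elapsed"]]

sum_sq_cmp <- cmpfun(sum_sq)       # explicit byte-code compilation
t_cmp <- system.time(r2 <- sum_sq_cmp(1e6))[["elapsed"]]

enableJIT(old_level)               # restore the session's original setting
print(c(interpreted = t_interp, compiled = t_cmp))
```

Compilation changes only the speed, not the result, so r1 and r2 should agree.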

Rcpp: integrating C++ into R[3]

library(Rcpp)

cppFunction(
    'int fib_cpp_0(int n){
       if(n == 1 || n == 2) return 1;
       return(fib_cpp_0(n - 1) + fib_cpp_0( n - 2));
   }'
   )

fib_r <- function(n){
    if(n == 1 || n == 2) return(1)
    return(fib_r(n - 1) + fib_r(n - 2))
}

system.time(fib_cpp_0(30))

  #  user  system elapsed 
  # 0.002   0.000   0.002 

system.time(fib_r(30))

  #  user  system elapsed 
  # 1.697   0.021   1.739

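Rcpp simplifies integrating C++ code into R. It maps R objects onto dedicated C++ classes, which makes object management between C++ and R simple and flexible, and it offers broad support for the STL and similar libraries. C++ code can be compiled, linked, and loaded on the fly, or loaded through a package.

Rcpp provides an API for seamlessly accessing, extending, and modifying R objects at the C++ level. R's own API is built on functions and macros that operate on SEXP, the internal representation of R objects. Key features of the Rcpp API include lightweight C++ class wrappers around R objects, automatic garbage collection, inline code, data exchange between R and C++, and error handling.

Two typical use cases for the Rcpp API:

Replacing R code with C++ code to improve performance;
Making it easy to call functions provided by other libraries.

The following code computes a convolution with Rcpp: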

#include <Rcpp.h>

RcppExport SEXP convolve3cpp(SEXP a, SEXP b) {
    Rcpp::NumericVector xa(a);
    Rcpp::NumericVector xb(b);
    int n_xa = xa.size(), n_xb = xb.size();
    int nab = n_xa + n_xb - 1;
    Rcpp::NumericVector xab(nab);
    for (int i = 0; i < n_xa; i++)
        for (int j = 0; j < n_xb; j++)
            xab[i + j] += xa[i] * xb[j];
    return xab; 
}
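
For comparison, the same double loop can be written in plain R. This is only a reference sketch for checking results. The name convolve_r and the sample vectors are hypothetical, and the cross-check relies on the base R convention that the usual convolution is convolve(x, rev(y), type = "open"):

```r
# Pure-R version of the same double loop (hypothetical name convolve_r).
convolve_r <- function(a, b) {
  n_a <- length(a)
  n_b <- length(b)
  ab <- numeric(n_a + n_b - 1)
  for (i in seq_len(n_a)) {
    for (j in seq_len(n_b)) {
      # R is 1-indexed, so the C++ index i + j becomes i + j - 1 here.
      ab[i + j - 1] <- ab[i + j - 1] + a[i] * b[j]
    }
  }
  ab
}

a <- c(1, 2, 3)
b <- c(0, 1, 0.5)
res <- convolve_r(a, b)
print(res)

# Cross-check against base R's FFT-based convolve().
print(all.equal(res, convolve(a, rev(b), type = "open")))
```

A slow but obviously correct version like this is a convenient test oracle when porting a loop to C++.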

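The program above illustrates several key points about using Rcpp:

Only one header file, Rcpp.h, is needed to use the Rcpp API;
RcppExport is a convenience macro that gives the function C linkage so that it can be called from C;
The two inputs and the return value all have type SEXP, as required by the .Call() interface of R's API;
Rcpp converts the two inputs into C++ vector types;
The member function size() returns an object's length, and [] indexes vector elements;
Memory management is still handled by R;
The return value is converted from NumericVector to SEXP automatically.

Rcpp sugar lets you write R-like syntax in C++. Besides the nicer syntax, it can also make programs run more efficiently.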

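Other Techniques

Reading Data More Efficiently

read.table() and read.csv() are well suited to small data frames. read.delim() is only a wrapper around read.table() with different defaults, so to read a large numeric matrix efficiently use the lower-level scan() function.⁹

When reading large data sets, set comment.char = "" to turn off comment scanning, declare each column's colClasses in advance using atomic vector types (logical, integer, numeric, complex, character, and so on), and supply the number of rows to read with nrows (a mild over-estimate is still better than leaving it unset). These measures improve efficiency to some extent. To explore the data first, set nrows to 10 or fewer so that only the first few rows are read and inspected.⁹¹⁰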
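
These hints can be combined as in the sketch below. The demo data, file names, and the row over-estimate of 1100 are made up for illustration. Note that read.csv() already defaults to comment.char = "", while the read.table() default is "#":

```r
# Write a small demo file (hypothetical data) to a temporary location.
path <- tempfile(fileext = ".csv")
demo <- data.frame(id = 1:1000,
                   value = runif(1000),
                   label = sample(letters, 1000, replace = TRUE))
write.csv(demo, path, row.names = FALSE)

# Peek at the first rows to learn the column types.
head_rows <- read.csv(path, nrows = 10)
classes <- vapply(head_rows, function(col) class(col)[1], character(1))

# Full read with the hints described above.
full <- read.csv(path,
                 colClasses = classes,
                 nrows = 1100,          # mild over-estimate of the row count
                 comment.char = "")

# For a purely numeric matrix, scan() avoids the data-frame machinery.
mpath <- tempfile(fileext = ".txt")
m <- matrix(runif(20), nrow = 4)
write.table(m, mpath, row.names = FALSE, col.names = FALSE)
m2 <- matrix(scan(mpath, quiet = TRUE), nrow = 4, byrow = TRUE)

unlink(c(path, mpath))
```

scan() returns the values in file order, row by row, hence byrow = TRUE when rebuilding the matrix.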

References

[1] Revolution Analytics and S. Weston, doParallel: Foreach parallel adaptor for the parallel package. 2014.
[2] S. Weston and R. Calaway, “Getting Started with doParallel and foreach,” 2014.
[3] D. Eddelbuettel and R. François, “Rcpp: Seamless R and C++ Integration,” Journal of Statistical Software, vol. 40, no. 8, pp. 1–18, 2011.

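Rprof, the R performance profiling tool ↩
lineprof, performance visualization for R ↩
Accelerating R matrix operations with OpenBLAS ↩
Parallelize your computations: a first look at the parallel package ↩
Parallel computing in R with the parallel package ↩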
Some precautions are taken on Mac OS X: for example the event loops for R.app and the quartz device are inhibited in the child. This information is available at C level in the Rboolean variable R_isForkedChild. ↩
memoise, local caching for R ↩
A simple way to speed up R ↩
A few tips on reading data in R ↩ ↩²
Reading data in R: performance optimization

Source: http://qianjiye.de/2015/04/speed-up-r#fn:speed-up-R

http://www.r-statistics.com/2012/04/speed-up-your-r-code-using-a-just-in-time-jit-compiler/
