This post introduces advanced looping in R.
replication
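The rep function covered earlier repeats its input a given number of times. Because the argument is evaluated only once, rep(runif(1), 5) returns the same random number five times. The replicate function covered here instead re-evaluates an expression a given number of times, so each run draws a fresh value.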
> rep(runif(1),5)
[1] 0.2756356 0.2756356 0.2756356 0.2756356 0.2756356
> replicate(5,runif(1))
[1] 0.3577430 0.3919983 0.6912650 0.9946233 0.3270321
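The difference between the two calls can be checked directly. The following is a minimal sketch; the seed is an arbitrary choice made only so the run is reproducible, and the variable names are illustrative.

```r
set.seed(42)  # arbitrary seed, assumed only for reproducibility

# rep evaluates runif(1) once, then copies that single value five times
repeated <- rep(runif(1), 5)

# replicate re-evaluates the expression on every iteration
replicated <- replicate(5, runif(1))

length(unique(repeated))    # 1: all five values are identical
length(unique(replicated))  # number of distinct draws

# when the expression returns more than one value, replicate
# simplifies the results into a matrix with one column per run
draws <- replicate(3, runif(2))
dim(draws)
```

The last call suggests why replicate is handy for simulations: each column of draws holds the result of one independent run.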
# Simulate one commute: pick a mode of transport at random, then draw a travel time for it
time_for_commute <- function()
{
  mode_of_transport <- sample(
    c("car", "bus", "train", "bike"),
    size = 1,
    prob = c(0.1, 0.3, 0.2, 0.4)  # probability of each mode
  )
  time <- switch(
    mode_of_transport,
    car   = rlnorm(1, log(30), 0.5),
    bus   = rlnorm(1, log(40), 0.5),
    train = rnorm(1, 30, 10),
    bike  = rnorm(1, 60, 5)
  )
  names(time) <- mode_of_transport
  return(time)
}
replicate(5, time_for_commute())
lapply
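lapply is short for "list apply". It takes a list and a function, applies the function to each element of the list in turn, and returns the results in a new list.
# List of prime factorizations
prime_factors <- list(
  two   = 2,
  three = 3,
  four  = c(2, 2),
  five  = 5,
  six   = c(2, 3),
  seven = 7,
  eight = c(2, 2, 2),
  nine  = c(3, 3),
  ten   = c(2, 5)
)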
# A for loop can process the elements one at a time
unique_primes <- vector("list", length(prime_factors))
for (i in seq_along(prime_factors))
{
  unique_primes[[i]] <- unique(prime_factors[[i]])
}
names(unique_primes) <- names(prime_factors)
unique_primes
The for loop works, but the code is clearly verbose. This is where lapply shows its advantage: the same result takes a single line.
lapply(prime_factors, unique)
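To confirm that the loop and the lapply call really produce the same thing, the two results can be compared with identical. This is a small sketch on a shortened version of the list; the names unique_loop and unique_lapply are illustrative.

```r
# a shortened version of the prime factor list
prime_factors <- list(four = c(2, 2), six = c(2, 3), eight = c(2, 2, 2))

# explicit loop: preallocate, fill, then copy the names over
unique_loop <- vector("list", length(prime_factors))
for (i in seq_along(prime_factors)) {
  unique_loop[[i]] <- unique(prime_factors[[i]])
}
names(unique_loop) <- names(prime_factors)

# lapply: one call, and the names are carried over automatically
unique_lapply <- lapply(prime_factors, unique)

identical(unique_loop, unique_lapply)
```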
vapply
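vapply applies a function to a list and returns a vector or array. As before, it takes a list and a function, plus a third argument: a template for the value each call must return.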
> vapply(prime_factors, length, numeric(1))
  two three  four  five   six seven eight  nine   ten
    1     1     2     1     2     1     3     2     2
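The template is also a check: if any call returns something that does not match it, vapply stops with an error instead of silently changing the shape of the result. A minimal sketch, using a shortened list and catching the error with tryCatch so the script keeps running:

```r
prime_factors <- list(two = 2, four = c(2, 2), six = c(2, 3))

# length always returns a single integer, so this template matches
n_factors <- vapply(prime_factors, length, integer(1))

# unique(c(2, 3)) has two values, so a length-1 template does not match
result <- tryCatch(
  vapply(prime_factors, unique, numeric(1)),
  error = function(e) "template mismatch"
)

n_factors
result
```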
sapply
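sapply stands for "simplify apply". It also takes a list and a function. It needs no template, but simplifies the result to a vector or array whenever it can.
> sapply(prime_factors, unique) # returns a list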
$two
[1] 2
$three
[1] 3
$four
[1] 2
$five
[1] 5
$six
[1] 2 3
$seven
[1] 7
$eight
[1] 2
$nine
[1] 3
$ten
[1] 2 5
> sapply(prime_factors, length) # returns a vector
  two three  four  five   six seven eight  nine   ten
    1     1     2     1     2     1     3     2     2
> sapply(prime_factors, summary) # returns an array
        two three four five  six seven eight nine  ten
Min.      2     3    2    5 2.00     7     2    3 2.00
1st Qu.   2     3    2    5 2.25     7     2    3 2.75
Median    2     3    2    5 2.50     7     2    3 3.50
Mean      2     3    2    5 2.50     7     2    3 3.50
3rd Qu.   2     3    2    5 2.75     7     2    3 4.25
Max.      2     3    2    5 3.00     7     2    3 5.00
sapply is not recommended outside interactive use, because its result is sometimes a list and sometimes a vector or array, depending on the input. vapply makes the return type explicit.
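An empty input is one case where the return type changes. The sketch below compares the two functions on an empty list; the variable names are illustrative.

```r
empty <- list()

# with nothing to simplify, sapply falls back to an empty list
sapply_result <- sapply(empty, length)

# vapply still honours its template and returns an empty integer vector
vapply_result <- vapply(empty, length, integer(1))

is.list(sapply_result)
is.integer(vapply_result)
```

Code that later does arithmetic on the result would behave differently in the two cases, which is the practical reason to prefer vapply in scripts and functions.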
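Apply functions and extra arguments
In the examples above, the functions passed to lapply, vapply, and sapply each took a single argument. When the function takes several, the extra ones can be passed by name after the function, and they are forwarded on every call.
# times is an argument of rep.int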
complemented <- c(2,3,4,5)
lapply(complemented, rep.int, times=4)
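Another common approach is to wrap the call in a function of your own and pass that function to lapply. The result is the same as above.
# each element of complemented is passed to rep4x as x
complemented <- c(2,3,4,5)
rep4x <- function(x) {
  rep.int(x, times = 4)
}
lapply(complemented, rep4x)
We can also get the same result by passing an anonymous function to lapply.
# This skips the step of naming the function: the body is written directly after function(x).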
lapply(complemented, function(x) rep.int(x,times=4))
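The three styles can be put side by side to confirm they are interchangeable. A short sketch, with illustrative variable names for the three results:

```r
complemented <- c(2, 3, 4, 5)

# 1. extra argument passed by name after the function
by_named_arg <- lapply(complemented, rep.int, times = 4)

# 2. a named wrapper function that fixes times = 4
rep4x <- function(x) rep.int(x, times = 4)
by_wrapper <- lapply(complemented, rep4x)

# 3. an anonymous function defined inside the call
by_anonymous <- lapply(complemented, function(x) rep.int(x, times = 4))

identical(by_named_arg, by_wrapper)
identical(by_wrapper, by_anonymous)
```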
Looping over arrays
First, the magic function from the matlab package: it returns an n-by-n matrix built from the integers 1 through n^2 in which every row and every column has the same sum.
# The matlab package must be loaded before magic can be used
library(matlab)
> (magic4 <- magic(4))
     [,1] [,2] [,3] [,4]
[1,]   16    2    3   13
[2,]    5   11   10    8
[3,]    9    7    6   12
[4,]    4   14   15    1
# Compute the row sums
> rowSums(magic4)
[1] 34 34 34 34
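To compute some other statistic for each row, use the apply function. Its arguments are the matrix, the dimension number (1 applies the function to each row, 2 to each column), and the function.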
> apply(magic4, 1, sum)
[1] 34 34 34 34
> apply(magic4, 2, toString)
[1] "16, 5, 9, 4" "2, 11, 7, 14" "3, 10, 6, 15" "13, 8, 12, 1"
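apply can also be used on data frames.
When a function is applied column-wise to a data frame, apply and sapply behave the same, provided all columns share one type (apply first converts the data frame to a matrix).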
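The post gives no data frame example, so here is a small sketch with two made-up data frames (measurements and mixed are hypothetical names and values). It shows the column-wise agreement on numeric data, and what happens once a non-numeric column is present.

```r
# hypothetical all-numeric data frame
measurements <- data.frame(
  height = c(150, 160, 170),
  weight = c(50, 60, 70)
)

col_means_apply  <- apply(measurements, 2, mean)
col_means_sapply <- sapply(measurements, mean)
all.equal(col_means_apply, col_means_sapply)

# hypothetical mixed-type data frame: apply converts it to a
# character matrix first, so every column looks like character
mixed <- data.frame(id = 1:2, label = c("a", "b"))
apply_classes  <- apply(mixed, 2, class)
sapply_classes <- sapply(mixed, class)
apply_classes
sapply_classes
```

For column-wise work on data frames with mixed column types, sapply or vapply is therefore usually the safer choice.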
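Apply functions with multiple inputs
One drawback of lapply is that its function can only loop over a single vector argument. Another is that, inside the function, you cannot access the name of the element being processed.
mapply (short for "multiple argument list apply") accepts as many vectors as you like, so the element names can be passed as one more vector. Note that, unlike lapply, vapply, and sapply, mapply takes the function as its first argument.
# ifelse(condition, value if TRUE, value if FALSE)
msg <- function(name, factors)
{
  ifelse(
    length(factors) == 1,
    paste(name, "is prime"),
    paste(name, "has factors", toString(factors))
  )
}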
mapply(msg, names(prime_factors), prime_factors)
Like sapply, mapply simplifies its output where it can. Passing SIMPLIFY = FALSE turns this behavior off.
> mapply(msg, names(prime_factors), prime_factors, SIMPLIFY = FALSE)
$two
[1] "two is prime"
$three
[1] "three is prime"
$four
[1] "four has factors 2, 2"
$five
[1] "five is prime"
$six
[1] "six has factors 2, 3"
$seven
[1] "seven is prime"
$eight
[1] "eight has factors 2, 2, 2"
$nine
[1] "nine has factors 3, 3"
$ten
[1] "ten has factors 2, 5"
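The same pattern works on any named vector. The sketch below uses a made-up prices vector (a hypothetical example, not from the book) to show both the simplified and the unsimplified result.

```r
# hypothetical named vector
prices <- c(apple = 3, pear = 5)

describe <- function(name, value) paste(name, "costs", value)

# names and values are walked in parallel, one pair per call
labels <- mapply(describe, names(prices), prices)

# SIMPLIFY = FALSE keeps one list element per call
labels_list <- mapply(describe, names(prices), prices, SIMPLIFY = FALSE)

labels
is.list(labels_list)
```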
Vectorize is a wrapper around mapply. It takes a function that accepts a scalar argument and returns a new function that accepts vectors.
gender_re <- function(gender)
{
  switch(
    gender,
    male   = "It is a boy!",
    female = "It is a girl",
    "Ummmm"
  )
}
new_gender_re <- Vectorize(gender_re)
new_gender_re(c("male","famle","others"))
Out:
          male          famle         others
"It is a boy!"        "Ummmm"        "Ummmm"
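Vectorize is needed here because switch only accepts a single value. The sketch below shows the failure of the plain function on a vector, and that the vectorized version gives the same values as an explicit vapply call. The inputs vector is an illustrative choice.

```r
gender_re <- function(gender) {
  switch(
    gender,
    male   = "It is a boy!",
    female = "It is a girl",
    "Ummmm"
  )
}

inputs <- c("male", "female", "others")

# switch requires a length-1 value, so the plain function fails on a vector
direct <- tryCatch(gender_re(inputs), error = function(e) "error")

# the vectorized wrapper calls gender_re once per element
via_vectorize <- Vectorize(gender_re)(inputs)

# an explicit vapply call gives the same values
via_vapply <- vapply(inputs, gender_re, character(1))

direct
via_vectorize
```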
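Split-apply-combine
A common task is to split the data into groups, apply a function to each group, and combine the results. Here tapply splits score by player and computes the mean of each group.
frogger_scores <- data.frame(
  player = rep(c("Tom", "Dick", "Sally"), times = c(2, 5, 3)),
  score  = round(rlnorm(10, 8), -1)
)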
with(frogger_scores, tapply(score, player, mean))
Out:
     Dick     Sally       Tom
19486.000  4723.333  1840.000
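The three steps that tapply performs can also be written out by hand, which makes the idea easier to see. This is a sketch with fixed, made-up scores in place of the random ones, so the numbers are reproducible.

```r
# same players as above, but with fixed hypothetical scores
frogger_scores <- data.frame(
  player = rep(c("Tom", "Dick", "Sally"), times = c(2, 5, 3)),
  score  = c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
)

# split: one vector of scores per player
scores_by_player <- split(frogger_scores$score, frogger_scores$player)

# apply and combine: one mean per group, collected into a named vector
mean_by_player <- vapply(scores_by_player, mean, numeric(1))

# tapply does all three steps in one call
via_tapply <- with(frogger_scores, tapply(score, player, mean))

mean_by_player
```

Both results list the players in alphabetical order, because the grouping variable is treated as a factor with sorted levels.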
Source:
Learning R (《學習R》), by Richard Cotton
Chinese translation by Liu Jun (劉軍)