Functions in R

Translated from:

        http://adv-r.had.co.nz/Functions.html

“To understand computations in R, two slogans are helpful:

  • Everything that exists is an object.
  • Everything that happens is a function call.”

— John Chambers

Everything is an object, including functions.


Functions


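Function components
A function has three basic components:
  1. formals, the argument list, returned by formals(fun);
  2. body, the function body, returned by body(fun);
  3. environment, returned by environment(fun).
Every ordinary R function (a closure) has these three components.
Primitive functions are the exception: they call C code directly via .Primitive() and contain no R code,
    for example sum: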
> sum
function (..., na.rm = FALSE)  .Primitive("sum")
    For such functions, all three components are NULL:
> formals(sum)
NULL
> body(sum)
NULL
> environment(sum)
NULL
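To compare with an ordinary function, the following minimal sketch inspects the three components of a closure and uses base R's is.primitive() to tell the two kinds apart. The function name square is only an illustration.

```r
# An ordinary (closure) function, defined here only for illustration
square <- function(x) x^2

formals(square)      # the argument list: a single argument named x
body(square)         # the expression x^2
environment(square)  # the environment in which square was created

# sum is a primitive, so formals(), body() and environment() all return NULL
is.primitive(sum)     # TRUE
is.primitive(square)  # FALSE
```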

Exercise answers (my own; for reference only)
objs <- mget(ls("package:base"), inherits = TRUE)
funs <- Filter(is.function, objs)
formal <- sapply(funs, FUN = formals)

len <- sapply(formal, FUN = length)

# Q1: which base function has the most arguments?
> max(len);
[1] 22
> names(funs)[len == max(len)]
[1] "scan"

# Q2: how many base functions have no arguments? What's special about these functions?
> sum(len==0)
[1] 221
# Most of them are primitive functions, for which formals() returns NULL (length 0).

# Q3: how could you adapt the code to find all primitive functions?
funs <- Filter(is.function, objs)
# A primitive has NULL formals, NULL body and NULL environment
is_primitive <- function(x) {
  is.null(formals(x)) && is.null(body(x)) && is.null(environment(x))
}
primitive_funs <- Filter(is_primitive, funs)
primitive_funs
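Base R also has is.primitive(), which performs this test directly. A minimal sketch, reusing the same objs/funs setup as above, that checks the hand-written is_primitive() and is.primitive() select the same functions (prims_builtin and prims_custom are illustrative names):

```r
# Collect every function exported by the base package
objs <- mget(ls("package:base"), inherits = TRUE)
funs <- Filter(is.function, objs)

# Hand-written test from the exercise answer
is_primitive <- function(x) {
  is.null(formals(x)) && is.null(body(x)) && is.null(environment(x))
}

prims_builtin <- Filter(is.primitive, funs)  # built-in test
prims_custom  <- Filter(is_primitive, funs)  # hand-written test

length(prims_builtin)
identical(names(prims_builtin), names(prims_custom))
```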

Lexical scoping

Four basic principles:

  • name masking
  • functions vs. variables
  • a fresh start
  • dynamic lookup
name masking
The same lookup rules apply to variables and to functions:
The same rules apply if a function is defined inside another function: look inside the current function, then where that function was defined, and so on, all the way up to the global environment, and then on to other loaded packages. Run the following code in your head, then confirm the output by running the R code.
The same rules apply to closures, functions created by other functions. Closures will be described in more detail in functional programming; here we’ll just look at how they interact with scoping.
<pre name="code" class="plain"># Ordinary local <strong>variables</strong>

x <- 2
g <- function() {
  y <- 1
  c(x, y)
}
g()
rm(x, g)
</pre><pre name="code" class="plain"># <strong>Variables in a nested function definition</strong>: a function is defined inside another function
x <- 1
h <- function() {
  y <- 2
  i <- function() {
    z <- 3
    c(x, y, z)
  }
  i()
}
h()
rm(x, h)
</pre><pre name="code" class="plain"># A <strong>locally defined function</strong> masks the outer one
l <- function(x) x + 1
m <- function() {
  l <- function(x) x * 2
  l(10)
}
m()
#> [1] 20
rm(l, m)
</pre><pre name="code" class="plain"># <strong>closure</strong>
j <- function(x) {
  y <- 2
  function() {
    c(x, y)
  }
}
k <- j(1) # 返回函數
k() # 調用該函數
rm(j, k)
# Note: it may seem odd that k() still knows the value of the local variable y after j() has returned.
# It works because k preserves the environment in which it was defined, and that environment includes the value of y.
# Environments (http://adv-r.had.co.nz/Environments.html#environments) gives some pointers on how to inspect the values stored in the environment associated with each function.
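One way to look inside that environment, sketched below with the same j() and k() as above, is to call environment() on the closure and then ls() or get() on the result:

```r
j <- function(x) {
  y <- 2
  function() c(x, y)
}
k <- j(1)

# k's enclosing environment is the execution environment of the call j(1)
ls(environment(k))                # "x" "y"
get("y", envir = environment(k))  # 2
k()                               # 1 2
```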

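Interestingly, the name-masking rules even let us override basic operations such as the parenthesis function `(`. The version below adds 1 about 10% of the time, so it is best to try it in a fresh R session: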
`(` <- function(e1) {
  if (is.numeric(e1) && runif(1) < 0.1) {
    e1 + 1
  } else {
    e1
  }
}
replicate(50, (1 + 2))
#>  [1] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 3 4 3 3 3 3 3 4 3 3 3 3 3 3 3 4 3 3 3
#> [36] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
rm("(")
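Masking is also local: an operator redefined inside a function affects only that function. A minimal sketch (times_instead is a made-up name):

```r
# Hypothetical function that masks `+` only inside its own body
times_instead <- function() {
  `+` <- function(e1, e2) e1 * e2  # local binding masks base `+`
  2 + 3
}

times_instead()  # 6: the local `+` is found first
2 + 3            # 5: at the top level the base `+` is still used
```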

a fresh start
Every time a function is called, a new environment is created to host its execution. Consider the following example:
j <- function() {
  if (!exists("a")) {
    a <- 1
  } else {
    a <- a + 1
  }
  print(a)
}
j()
rm(j)
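This function prints 1 every time it is called (assuming no variable a exists in an enclosing environment such as the global environment, because exists() searches those too). Each call gets a fresh environment that is discarded when the call returns, so one call cannot see the a created by a previous call.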
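A quick way to see the fresh environment directly is a function that returns its own execution environment. In this sketch current_env is an illustrative name; two calls yield two different environments:

```r
# Return the execution environment created for this particular call
current_env <- function() environment()

e1 <- current_env()
e2 <- current_env()
identical(e1, e2)  # FALSE: every call gets a brand-new environment
```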

dynamic lookup
f <- function() x
x <- 15
f()
#> [1] 15

x <- 20
f()
#> [1] 20
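The catch: R looks for values when the function is run, not when it is created.
The behaviour of f therefore depends on whatever x currently is in the global environment, so f is not self-contained.
One way to detect such dependencies is codetools::findGlobals(), which lists all the external names a function uses:
f <- function() x + 1
codetools::findGlobals(f)
#> [1] "+" "x"
One extreme fix is to manually set the environment of f to the empty environment: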
environment(f) <- emptyenv()
f()
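#> Error: could not find function "+"
This fails because R relies on lexical scoping to find everything, even the + operator, and the empty environment contains no bindings at all.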
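A less drastic experiment, sketched below as an illustration rather than a recommended practice, is to use baseenv() instead: it contains + and the rest of base R, but its parent is the empty environment, so the global x is no longer visible. The usual fix is simply to pass x in as an argument, after which codetools::findGlobals() reports only +:

```r
f <- function() x + 1
x <- 15

# `+` is found in baseenv(), but x in the global environment is not
environment(f) <- baseenv()
res <- tryCatch(f(), error = function(e) conditionMessage(e))
res  # "object 'x' not found"

# Self-contained version: x is an argument, so only `+` is external
g <- function(x) x + 1
codetools::findGlobals(g)  # "+"
g(15)                      # 16
```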

Every operation is a function call

Note that ` , the backtick (the key below Esc on most keyboards; in the shell it performs command substitution, but in R it only quotes a name), lets you refer to functions or variables that have otherwise reserved or illegal names:
x <- 10; y <- 5
x + y  # equivalent to `+`(x, y)
</pre><pre name="code" class="plain">for (i in 1:2) print(i)  # equivalent to `for`(i, 1:2, print(i))
</pre><pre name="code" class="plain">if (i == 1) print("yes") else print("no")  # equivalent to `if`(i == 1, print("yes"), print("no"))
</pre><pre name="code" class="plain">x[3]  # equivalent to `[`(x, 3)
</pre>This is most useful with sapply(), lapply() and similar functions:</div><div><pre name="code" class="plain">add <- function(x, y) x + y
sapply(1:10, add, 3)
# equivalent to
sapply(1:10, `+`, 3)
# equivalent to
sapply(1:10, "+", 3)
# The last form works because, if you look at the lapply() source code, you'll see the first line uses match.fun() to find functions given their names.
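A short sketch of both ideas: match.fun() turning a string into a function, and backticks quoting a non-syntactic name (plus and `my var` are illustrative names):

```r
# match.fun() accepts either a function or a string naming one
plus <- match.fun("+")
plus(1, 2)  # 3

# so these calls give the same result
sapply(1:3, `+`, 3)
sapply(1:3, "+", 3)  # 4 5 6

# Backticks also let you bind and use a non-syntactic variable name
`my var` <- 5
`my var` * 2  # 10
```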
</pre><h3 style="box-sizing: border-box; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; font-weight: 500; line-height: 1.1; margin-top: 20px; margin-bottom: 10px; font-size: 24px; color: rgb(51, 51, 51);">Calling a function given a list of arguments</h3>When the arguments are already collected in a list (handy for long argument lists), call the function with do.call():</div><div><pre name="code" class="plain">args <- list(1:10, na.rm = TRUE)
do.call(mean, args)
#> [1] 5.5
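do.call() also accepts the function as a string, and named elements of the argument list are matched to named arguments. A minimal sketch (pieces is an illustrative name):

```r
# Same as mean(1:10, na.rm = TRUE)
do.call(mean, list(1:10, na.rm = TRUE))  # 5.5

# Function given by name; the element named sep becomes paste()'s sep argument
pieces <- list("a", "b", "c")
do.call("paste", c(pieces, sep = "-"))   # "a-b-c"
```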


Default and missing arguments

A) missing() tells you whether an argument was supplied:
i <- function(a, b) {
  c(missing(a), missing(b))
}
i()
#> [1] TRUE TRUE
i(a = 1)
#> [1] FALSE  TRUE
i(b = 2)
#> [1]  TRUE FALSE
i(1, 2)
#> [1] FALSE FALSE

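B) A non-trivial default value can be
     1) computed in the function body (for example, with missing());
     2) set to NULL in the argument list and checked with is.null() in the body (recommended, because it makes clear which arguments are optional).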
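A minimal sketch of both styles, using a hypothetical helper rescale() whose scale argument defaults to max(x):

```r
# Hypothetical helper: divide x by scale, which defaults to max(x)
rescale <- function(x, scale = NULL) {
  if (is.null(scale)) scale <- max(x)  # compute the default only when needed
  x / scale
}

rescale(c(2, 4))     # 0.5 1.0
rescale(c(2, 4), 2)  # 1 2

# The same default written with missing() instead
rescale2 <- function(x, scale) {
  if (missing(scale)) scale <- max(x)
  x / scale
}
```

In the NULL version, the signature alone tells the reader that scale is optional; in the missing() version you have to read the body to find out.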

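C) An arbitrary number of additional arguments can be captured with
     ...
     Use it with care: a misspelled argument may be silently swallowed by ... instead of raising an error.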
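A minimal sketch of ... in use. join() and n_args() are hypothetical helpers, while the last line shows the pitfall with base R's mean(), whose signature includes ...:

```r
# Forward any extra arguments to paste()
join <- function(...) paste(..., sep = "-")
join("a", "b", "c")  # "a-b-c"

# Capture ... as a list to inspect it
n_args <- function(...) length(list(...))
n_args(1, "two", 3)  # 3

# Pitfall: na.mr (a typo for na.rm) is absorbed by ... and ignored
mean(c(1, NA), na.mr = TRUE)  # NA, because na.rm was never set
```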

Lazy evaluation

"Lazy" means that a function's arguments are evaluated only when they are actually used, for example:
f <- function(x) {
  10
}
f(stop("This is an error!"))
#> [1] 10
To make sure x is evaluated, call force():
f <- function(x)
{
  force(x);
  10;
}
This matters especially when creating several closures with lapply() or in a loop:
add <- function(x) {
  function(y) x + y
}
adders <- lapply(1:10, add)
adders[[1]](10)
#> [1] 20
adders[[10]](10)
#> [1] 20
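In older versions of R this happened because lapply() did not evaluate x when it created each closure; x was evaluated only when each adder was first called, after lapply()'s internal loop had finished, so x evaluated to the last element, 10. Since R 3.2.0, lapply() and the other higher-order functions force their arguments, so on current R this example returns 11 for adders[[1]](10); the same problem still appears when closures are created in a for loop.
The fix is to force x: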
add <- function(x) {
  force(x) # evaluate x as soon as each closure is created
  function(y) x + y
}
adders <- lapply(1:10, add)
adders[[1]](10)
#> [1] 11
adders[[10]](10)
#> [1] 20
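Independently of the R version, the same trap appears when closures are created in a for loop, because each promise for x refers to the loop variable i. A minimal sketch (add_forced and adders2 are illustrative names):

```r
add <- function(x) {
  function(y) x + y
}

# x is still an unevaluated promise for i when each closure is created
adders <- list()
for (i in 1:10) adders[[i]] <- add(i)
adders[[1]](10)  # 20: the promise is forced only now, when i is already 10

# Forcing x at creation time captures the current value of i
add_forced <- function(x) {
  force(x)
  function(y) x + y
}
adders2 <- list()
for (i in 1:10) adders2[[i]] <- add_forced(i)
adders2[[1]](10)   # 11
adders2[[10]](10)  # 20
```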

Lazy evaluation also has practical benefits, typically short-circuit tricks:
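# 1: a user-defined `&&` still short-circuits, because y is evaluated only if needed
`&&` <- function(x, y) {
  if (!x) return(FALSE)
  if (!y) return(FALSE)
  TRUE
}
x <- NULL
if (!is.null(x) && x > 0) { # x > 0 is evaluated only if x is not NULL
  print("yes")
}
rm("&&") # restore the built-in &&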

# 2
if (is.null(x)) stop("x is null")
# equivalent to
!is.null(x) || stop("x is null")
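Laziness is what lets a user-written function short-circuit as well. A minimal sketch with hypothetical my_and() and check_not_null() helpers:

```r
# Hypothetical AND: y is evaluated only if it is needed
my_and <- function(x, y) {
  if (!x) return(FALSE)  # y is never evaluated on this path
  isTRUE(y)
}

my_and(FALSE, stop("never evaluated"))  # FALSE, no error
my_and(TRUE, 2 > 1)                     # TRUE

# The || idiom: stop() runs only when the left-hand side is FALSE
check_not_null <- function(x) {
  !is.null(x) || stop("x is null")
  invisible(x)
}
```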

Special calls

infix operator
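Most functions are prefix functions: the function name comes before the arguments. Functions whose name sits between their two arguments, like binary operators, are called infix functions.
Any infix function you define yourself must start and end with %; base R uses the same pattern for functions such as %*% (matrix multiplication).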

Evaluation order: R's default precedence rules mean that infix operators are composed from left to right.
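A minimal sketch of a user-defined infix function that makes the left-to-right composition visible (the name %-% is only an example):

```r
# A user-defined infix function; its name must start and end with %
`%-%` <- function(a, b) paste0("(", a, " %-% ", b, ")")

"a" %-% "b" %-% "c"  # "((a %-% b) %-% c)": composed left to right
```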
# Built-in infix functions whose names use % include:
%%, %*%, %/%, %in%, %o%, %x%. 
# The complete list of built-in infix operators that don’t need % is:
 ::, :::, $, @, ^, *, /, +, -, >, >=, <, <=, ==, !=, !, &, &&, |, ||, ~, <-, <<-

replacement functions

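Replacement functions act as if they modify their arguments in place, and have special names of the form xxx<-. They do not work through pointers as in C; instead, R modifies a copy of the argument and assigns the result back to the original name. For example: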
`second<-` <- function(x,value)
{
  x[2] <- value;
  return(x);
}
Usage:
x <- 1:10
second(x) <- 3
# equivalent to
x <- `second<-`(x, 3)
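pryr::address() shows an object's memory address; comparing it before and after the call shows that the replacement function works on a copy rather than modifying the original object in place.
> pryr::address(x)
[1] "0x15b70688"
> second(x) <- 3
> pryr::address(x)
[1] "0x144ac9c8"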
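R also has built-in replacement functions, such as `[<-` (called when you write x[i] <- value), which is why expressions like x[2] can appear on the left-hand side of an assignment.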
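A copy-based alternative to the pryr::address() check is to keep a second name bound to the original value and confirm it is unaffected. The sketch below also shows a replacement function with an extra argument; modify<- is a hypothetical name:

```r
`second<-` <- function(x, value) {
  x[2] <- value
  x
}

x <- 1:10
y <- x             # y is bound to the same values as x
second(x) <- 100L  # roughly x <- `second<-`(x, value = 100L)
x[2]  # 100
y[2]  # 2: y is unaffected, because x was rebound to a modified copy

# Extra arguments go between x and value
`modify<-` <- function(x, position, value) {
  x[position] <- value
  x
}
modify(x, 1) <- 10L
x[1]  # 10
```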


Return values

A) A function can return its value invisibly, so nothing is printed when it is called at the top level:
f1 <- function() return(invisible(1));
> f1()
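The value is still returned, only not auto-printed. A minimal sketch using base R's withVisible():

```r
f1 <- function() invisible(1)

f1()                       # prints nothing at the top level
res <- f1()                # but the value is still returned
(f1())                     # wrapping the call in () forces printing: [1] 1
withVisible(f1())$visible  # FALSE
```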

B) on.exit() registers code that runs when the function exits, whether it returns normally or exits with an error:
in_dir <- function(dir, code) {
  old <- setwd(dir)
  on.exit(setwd(old)) # runs only when in_dir() exits, restoring the old working directory

  force(code)
}
getwd()
#> [1] "/home/travis/build/hadley/adv-r"
in_dir("~", getwd())
#> [1] "/home/travis"
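on.exit() also runs when the function fails. A minimal sketch, where state and risky() are illustrative names, that records whether the cleanup code ran:

```r
# An environment is used so the flag can be updated from inside risky()
state <- new.env()
state$cleaned_up <- FALSE

risky <- function() {
  on.exit(state$cleaned_up <- TRUE)  # registered before the error happens
  stop("something went wrong")
}

msg <- tryCatch(risky(), error = function(e) conditionMessage(e))
msg               # "something went wrong"
state$cleaned_up  # TRUE: the on.exit() code ran even though risky() failed
```

If you call on.exit() more than once in the same function, pass add = TRUE; otherwise each call replaces the previously registered code.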












