First, a quick look at what R can do
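df <- mtcars                  #a built-in data set describing 32 cars
?mtcars                       #open its help page
names(df)                     #column names
head(df)                      #first 6 rows
nrow(df)                      #number of rows
summary(df)                   #summary statistics for every column
hist(df$hp)                   #histogram of horsepower
plot(df$hp,df$qsec)           #scatter plot: horsepower vs. quarter-mile time
cor(df$hp,df$qsec)            #correlation between the two
cor(df$cyl,df$qsec)
df$hpPerCyl <- df$hp/df$cyl   #add a derived column: horsepower per cylinder
df[order(df$hpPerCyl),]       #all rows, sorted by the new column
head(df[order(df$hpPerCyl),]) #only the first 6 sorted rows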
Basics
How to assign a value to a variable
#either
x <- 1
#or
x = 1
Note: don't forget the spaces. x<-1 is parsed as x <- 1, not as x < -1.
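A minimal sketch of the difference (the variable names are arbitrary):

```r
x <- 5
x<-1           # no spaces: parsed as the assignment x <- 1
print(x)       # 1
y <- x < -1    # with spaces: the comparison "x is less than -1"
print(y)       # FALSE, because x is now 1
```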
R's basic atomic types
x <- TRUE #class(x) is "logical"
x <- 1L #class(x) is "integer"
x <- 1.5 #class(x) is "numeric"
x <- 1.5 + 0.4i #class(x) is "complex"
x <- "Text" #class(x) is "character"
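R has no scalars: a scalar is simply a vector of length 1.
TRUE and FALSE can be abbreviated to T and F (but T and F are ordinary variables that can be overwritten, whereas TRUE and FALSE are reserved words).
Strings can be enclosed in either double quotes or single quotes.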
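The points above can be checked directly. A small sketch; `shadowed` is a hypothetical name, and `local()` is used only so that overwriting T does not leak out:

```r
# class() of one literal of each atomic type
classes <- c(class(TRUE), class(1L), class(1.5), class(1.5 + 0.4i), class("Text"))
print(classes)

# a "scalar" is just a vector of length 1
print(length(1.5))                 # 1

# single and double quotes produce the same string
print(identical("Text", 'Text'))   # TRUE

# T is an ordinary variable, so it can be overwritten (TRUE cannot)
shadowed <- local({
  T <- 0
  isTRUE(T)
})
print(shadowed)                    # FALSE
```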
How to call a function
ls #is a function object
ls() #is the object returned by calling the function
How to get help
#open help:
?ls
#search help for the specific topic:
??correlation
#other functions for inspecting an object
str(x)
summary(x)
class(x)
attributes(x)
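Control flow in R
#if:
if(CONDITION)
  STATEMENT
else
  STATEMENT
#(at the top level, else must be on the same line as the end of the if branch, e.g. "} else {")
#for loop:
for(name in vector)
  STATEMENT
#repeat:
repeat
  STATEMENT #until 'break' is called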
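Filled-in versions of the templates above, as a minimal sketch (the variable names and values are only for illustration):

```r
# for: iterate over the elements of a vector
total <- 0
for (i in 1:5) {
  total <- total + i
}
print(total)  # 15

# repeat: loops until break is called
n <- 0
repeat {
  n <- n + 1
  if (n >= 3) break
}
print(n)      # 3

# if/else: "} else {" on one line also works at the top level
if (total > 10) {
  msg <- "big"
} else {
  msg <- "small"
}
print(msg)    # "big"
```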
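Interestingly, if is an expression that returns a value, so it can be used on the right-hand side of an assignment (for and repeat loops also return a value, but it is always an invisible NULL, so in practice this is only useful with if):
y <- if(x > 0) "yes" else "no"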
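Any STATEMENT can also be replaced by a group of statements enclosed in {}; the group evaluates to the value of its last statement. The statements can all go on one line (separated by ;) or be spread over several lines:
if({x <- mean(1:5); y <- mean(1:4); x<y}){
  cat("evaluating the 'true' block \n")
  "y"
} else {
  cat("evaluating the 'false' block \n")
  "n"
}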
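Because a {} block evaluates to its last statement, it can also be assigned directly. A minimal sketch with arbitrary names:

```r
v <- {
  a <- 2
  b <- 3
  a * b   # the last expression is the value of the whole block
}
print(v)  # 6

w <- { a <- 2; b <- 3; a * b }  # the same block written on one line
print(w)  # 6
```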
Vectors
How to create vectors
x <- c(1,2,3)          #c for "combine"
x <- seq(1,3)          #create a sequence, with the option to
x <- seq(1,3,by=0.5)   #-set the step size
x <- seq(1,3,len=10)   #-set the number of elements (len is short for length.out)
x <- 1:3               #quick notation for integer sequences
x <- rep(0,10)         #repeat the first argument 10 times
x <- numeric(10)       #"numeric" vector of length 10, filled with 0
x[1] = 10
x[2] = 20
x[3] = 30
The result of c() is always flattened into a single vector (nesting is not preserved), so try this:
x <- c(1,c(c(2,3),c(4,5,6)))
Atomic vectors may also contain missing values, written NA:
c(T,F,T,NA)
c(1:5,NA)
c("A",NA,"B","C",NA)
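What happens if you mix values of different types in one vector?
Because all values in a vector must have the same type, R coerces them to the most general type present. From least to most general, the order is: logical, integer, numeric, complex, character.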
#notice how the type changes when you remove elements from the end
x <- c(T,1L,1.0,1i,"text")
#notice that modifying elements can change the whole type
x <- rep(F,10)
x[6] = "text"
Some important functions for vectors
x <- runif(10)
length(x)
sum(x)
#statistics
mean(x)
median(x)
var(x)
sd(x)
quantile(x)
#range related
min(x)
max(x)
range(x)
#sorting related
sort(x)
rank(x)
order(x)
#vectorized functions
x + 1
x * 10
x < 0.5
(x < 0.2)|(x > 0.8)
sin(x)
exp(x)
round(x)
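sort(), order() and rank() are easy to confuse. A small deterministic example makes the difference visible:

```r
x <- c(30, 10, 20)
print(sort(x))   # 10 20 30: the values in increasing order
print(order(x))  # 2 3 1: the indices that sort x, so x[order(x)] equals sort(x)
print(rank(x))   # 3 1 2: the position each element takes after sorting
```

order() is the one used for sorting the rows of a data frame, as in df[order(df$hpPerCyl),] at the top of these notes.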
What happens when you combine two vectors of different lengths?
c(1,2,3,4) + c(1,2)
c(1,2,3,4) + c(1,2,3)
c(1,2,3,4) > 1
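R uses recycling: the shorter vector is repeated until it is as long as the longer one. The longer length should be a whole multiple of the shorter one; if it is not (as in c(1,2,3,4) + c(1,2,3)), R still returns a result but issues a warning.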
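A minimal sketch of both cases; `tryCatch()` is used here only to capture the warning as a value so it can be inspected:

```r
a <- c(1, 2, 3, 4) + c(1, 2)   # c(1, 2) is recycled to c(1, 2, 1, 2)
print(a)                        # 2 4 4 6

# 4 is not a multiple of 3: R computes a result but signals a warning
res <- tryCatch(c(1, 2, 3, 4) + c(1, 2, 3),
                warning = function(w) conditionMessage(w))
print(res)                      # the warning message
```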
Indexing
x <- 1:10 * 10
#a positive index selects an element
#a negative index omits an element
x[3]
x[-3]
#indices can again be vectors
x[c(1,3,5)]
x[3:5]
x[-c(1,3,5)]
x[-(3:5)]
#note: mixing positive and negative indices is not allowed (R raises an error)
#indices can also be logical vectors
x[c(T,F,T,F,T,F,F,F,F,F)]
x[x == 10 | x == 30]
#elements can even be named, and then selected by name
names(x) <- c("a","b","c","d","e","f","g","h","i","j")
x
x["c"]
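Matrices and arrays
An R vector is what other languages call an array: it has no notion of dimensions, only a length.
In R, an array specifically means a multi-dimensional array with fixed dimensions. Underneath, however, it is also implemented as a vector (plus a dim attribute).
An R matrix is simply a two-dimensional array.
How to build a matrix
#create an empty matrix (filled with NA), then assign the elements one by one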
m <- matrix(nrow=2, ncol=3)
m[1,1] = 1
m[2,1] = 2
m[1,2] = 3
m[2,2] = 4
m[1,3] = 5
m[2,3] = 6
#generate from ONE vector
matrix(1:6, nrow=2, ncol=3)
matrix(1:6, nrow=2)
matrix(1:6, ncol=3)
matrix(1:6, byrow=TRUE, ncol=3)
#generate from multiple column/row vectors
rbind(c(1,3,5),c(2,4,6))
cbind(1:2,3:4,5:6)
Because a matrix is essentially a vector with dimensions, you can also turn a vector into a matrix (or an array) by setting its dim attribute:
m <- 1:12
dim(m) <- c(2,6)
dim(m) <- c(3,2,2)
c(is.vector(m),is.matrix(m),is.array(m))
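The checks above, repeated for both shapes, show that the data never changes, only the dim attribute does. A minimal sketch:

```r
m <- 1:12
dim(m) <- c(2, 6)
checks_2d <- c(is.vector(m), is.matrix(m), is.array(m))
print(checks_2d)   # FALSE TRUE TRUE

dim(m) <- c(3, 2, 2)
checks_3d <- c(is.vector(m), is.matrix(m), is.array(m))
print(checks_3d)   # FALSE FALSE TRUE

# storage is still the same vector, filled column by column
print(length(m))   # 12
print(m[2, 1, 2])  # 8: position 2 + (1-1)*3 + (2-1)*6 in 1:12
```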
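And how do you index a matrix?
Just like a vector, but with one index per dimension (leaving an index empty selects everything along that dimension):
m <- matrix(1:6, nrow=2)  #start again from a 2x3 matrix (m above is now a 3x2x2 array)
m[2,2]
m[1,1:3]
m[1,]
m[,2]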
The only difference is that names are now assigned per dimension (row names and column names):
colnames(m) <- c("A","B","C")
rownames(m) <- c("o1","o2")
dimnames(m)
attributes(m)
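Once dimnames are set, rows and columns can be selected by name as well as by position. A self-contained sketch:

```r
m <- matrix(1:6, nrow = 2)   # 2 x 3, filled column by column
colnames(m) <- c("A", "B", "C")
rownames(m) <- c("o1", "o2")
print(m["o1", "B"])          # 3
print(m[, "C"])              # o1 = 5, o2 = 6
print(m["o2", ])             # A = 2, B = 4, C = 6
```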
Simple linear algebra
m <- cbind(1:2,3:4,5:6)
#multiply by vectors
m %*% c(1,2,4)
c(5,6) %*% m
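In the first product the vector is treated as a 3x1 column, in the second as a 1x2 row. At first I had no idea what R was doing. I wondered whether it converts shapes automatically, so I tried c(1,2,3,4) %*% m, expecting the vector to become a 2x2 matrix, but got an error. Then I tried 5 %*% m, which also failed. The explanation: %*% promotes a plain vector to either a row or a column matrix, whichever makes the dimensions conform, but it never reshapes it into anything else. A length-4 vector fits neither as a 3-row column nor as a 2-column row, and 5 (a 1x1) fits neither shape either, hence the "non-conformable arguments" errors. To multiply by a 2x2 matrix, build it explicitly with matrix(c(1,2,3,4), nrow=2); to multiply every element by 5, use 5 * m.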
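A sketch that prints the result shapes and shows the two working alternatives (`r1`, `r2` and `v` are arbitrary names):

```r
m <- cbind(1:2, 3:4, 5:6)   # 2 x 3
r1 <- m %*% c(1, 2, 4)      # vector promoted to a 3 x 1 column
r2 <- c(5, 6) %*% m         # vector promoted to a 1 x 2 row
print(dim(r1))              # 2 1
print(dim(r2))              # 1 3

# %*% never reshapes a longer vector; build the 2 x 2 matrix explicitly
v <- matrix(c(1, 2, 3, 4), nrow = 2)
print(dim(v %*% m))         # 2 3

# multiplying by a single number is elementwise, so use * rather than %*%
print(5 * m)
```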
#multiply by matrices
m %*% matrix(runif(6),nrow=3)
#typical unary operators
m <- cbind(1:2,3:4)
t(m) #transpose
diag(m) #diagonal
solve(m) #inverse
eigen(m) #eigenvector/eigenvalues
#solving linear equations
solve(m,c(1,1)) #solve mx=c(1,1) for x
#misc matrix functions
dim(m)
nrow(m)
ncol(m)
rowSums(m)
colSums(m)
rowMeans(m)
colMeans(m)
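A quick way to check the results of solve() is to multiply back. A minimal sketch; `all.equal()` is used because floating-point results may differ by tiny rounding errors:

```r
m <- cbind(1:2, 3:4)     # the 2 x 2 matrix [1 3; 2 4]
inv <- solve(m)          # inverse
print(m %*% inv)         # the identity matrix, up to rounding
x <- solve(m, c(1, 1))   # solve m %*% x == c(1, 1)
print(x)                 # -0.5 0.5
print(m %*% x)           # recovers c(1, 1) as a 2 x 1 matrix
```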
Other data types
Lists in R
#to construct a list
l <- list(name="Joe", unemployed=FALSE, salary=50000)
#naming the fields is optional
l <- list("Joe",FALSE,50000)
#in fact, the mechanism to set the "names" is the same as for vectors
names(l) <- c("name", "unemployed", "salary")
#access single elements
l[[1]]
l$name
l["name]
l$sal #it is even allowed to abbreviate the names if it is unique
#generate a "sub" list
l[c(1,3)]
l[c("name","salary")]
l[1]
#"fields" can be added dynamically
l$department <- "IT"
l[["position"]] <- c("consultant","developer")
#note that this can create gaps and unnamed fields
l[[8]] <- TRUE
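Only $ accepts abbreviated names: by default [[ ]] requires the exact name (l[["sal"]] gives NULL), and [ ] never matches partial names.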
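A sketch contrasting the three access operators. The `exact` argument of `[[` is part of base R; everything else reuses the list from above:

```r
l <- list(name = "Joe", unemployed = FALSE, salary = 50000)
print(l$sal)                      # 50000: $ matches partial names
print(l[["sal"]])                 # NULL: [[ ]] needs the exact name by default
print(l[["sal", exact = FALSE]])  # 50000: partial matching switched on
print(class(l["name"]))           # "list": [ ] returns a sub-list
print(class(l[["name"]]))         # "character": [[ ]] returns the element itself
```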
Factors in R
f <- factor(c("A","B","C","A","B","A","A"), ordered = T)
attributes(f)
levels(f)
summary(f)
#to rename a category
levels(f)[1] <- "a"
#to get the underlying integer codes (one per level) as a numeric vector
as.numeric(f)
#or strip all attributes, leaving only the integer codes:
attributes(f) <- NULL
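Because as.numeric() returns level codes, a factor whose labels look like numbers needs an extra step. A minimal sketch with made-up labels:

```r
f <- factor(c("10", "20", "10", "5"))
print(levels(f))                    # "10" "20" "5": levels are sorted as strings
print(as.numeric(f))                # 1 2 1 3: the level codes, not the values
print(as.numeric(as.character(f)))  # 10 20 10 5: the actual numbers
```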
Data frames in R (data.frame)
#create a data frame manually
df <- data.frame(x=runif(20), y=runif(20), type=as.factor(runif(20) > 0.5))
str(df)
#create a new "feature"
df$z <- df$x * df$y
df
#sort by a feature
permutation <- order(df$x)
df <- df[permutation,]
#remove features (into a copy, so df keeps x and y for the subset example below)
toRemove <- c("x","y")
toRemoveLogical <- names(df) %in% toRemove
df2 <- df[,!toRemoveLogical]
df2 <- df[,!toRemoveLogical, drop=FALSE]
#drop=FALSE is better if only one feature is kept: it stops the result from being simplified to a vector
#there is also the powerful "subset" function
#parameter "subset" works on rows
#parameter "select" works on columns
subset(df, subset = x > 0.5, select = c(y,z))
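With fixed (made-up) data instead of runif(), it is easy to confirm that subset() selects the same rows and columns as plain [ ] indexing, and to see what drop = FALSE changes:

```r
df <- data.frame(x = c(0.2, 0.7, 0.9), y = c(1, 2, 3), z = c(4, 5, 6))
a <- subset(df, subset = x > 0.5, select = c(y, z))
b <- df[df$x > 0.5, c("y", "z")]
print(all.equal(a, b))                 # TRUE: same rows and columns

print(class(df[, "z"]))                # "numeric": one column is simplified to a vector
print(class(df[, "z", drop = FALSE]))  # "data.frame": the table shape is kept
```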
What are S3/S4 classes?
Not sure yet.
Functions
How to define a function
nameOfMyFunction <- function(a,b){
return (a+b)
}
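Note: return is called like a function, so its argument must be in parentheses. Without a return, the function returns the value of its last evaluated expression.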
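How to define default values for arguments
myfunc <- function(range = 1 : mylen) range^2
myfunc(1:5)
mylen=10
myfunc()   #the default is evaluated only when it is first needed, so mylen is looked up at call time
mylen=5
myfunc()
rm(mylen)
myfunc()   #error: object 'mylen' not found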
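Because defaults are evaluated lazily inside the function, a default can even refer to a variable created in the body, as long as it is created before the argument is first used. A minimal sketch; `f` is a hypothetical name:

```r
f <- function(range = 1:n) {
  n <- 3       # defined inside the body, before range is first used
  range^2
}
print(f())     # 1 4 9
print(f(1:2))  # 1 4: an explicit argument means the default is never evaluated
```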
How to return more than one value
myfunc <- function(){
list(result=42,additional="no errors",numberOfIterations=4)
}
ret <- myfunc()
ret$result
ret$numberOfIterations
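Miscellaneous
Are there predefined variables in R?
Yes, there are built-in data sets, for example:
mtcars
Nile
iris
library(ggplot2) #diamonds is not part of base R; it comes with the ggplot2 package
diamonds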
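How to concatenate strings
Note that strings cannot be joined with "+".
paste("concatenate", "these", "strings")
paste("concatenate", "these", "strings", sep="")
You can also use paste0(…), which is the same as paste(…, sep="") and slightly more efficient.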
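A sketch of the variants side by side. `collapse` is a standard argument of `paste()` that joins the elements of a single vector:

```r
s1 <- paste("concatenate", "these", "strings")            # sep defaults to " "
s2 <- paste("concatenate", "these", "strings", sep = "")
s3 <- paste0("concatenate", "these", "strings")           # same as sep = ""
s4 <- paste(c("a", "b", "c"), collapse = "-")             # join one vector's elements
print(c(s1, s2, s3, s4))
```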
How to print to standard output
x <- 1:10
cat(sprintf("the sum of %d elements is %f\n", length(x), sum(x)))
How to generate random numbers
hist(rnorm(1000))
hist(runif(1000))
hist(rbeta(1000,2,4))
hist(rbinom(1000,3,0.5))
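Random draws differ on every run. Setting a seed with set.seed() makes them reproducible. A minimal sketch; the seed value 42 is arbitrary:

```r
set.seed(42)
a <- runif(3)
set.seed(42)
b <- runif(3)
print(identical(a, b))       # TRUE: same seed, same draws
print(all(a >= 0 & a <= 1))  # TRUE: runif() draws from [0, 1] by default
```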
Plotting
#histograms:
hist(rnorm(1000))
#scatter plots:
plot(rnorm(100), rnorm(100))
#line plots:
x <- seq(0, 2*pi, len=100)
plot(x, sin(x)^2+tan(x), type='l')
Numerical Measures
duration = faithful$eruptions
mean(duration)
median(duration)
quantile(duration)
quantile(duration,c(.37,.45,.99))
var(duration) #variance
waiting = faithful$waiting
cov(duration,waiting) #covariance
cor(duration, waiting) #correlation coefficient
cov(duration, waiting)/(sd(duration)*sd(waiting)) #same as cor(): covariance divided by the product of the standard deviations
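The relationships in the comments can be verified numerically. A sketch using the built-in faithful data set; `all.equal()` tolerates floating-point rounding:

```r
duration <- faithful$eruptions
waiting <- faithful$waiting
r1 <- cor(duration, waiting)
r2 <- cov(duration, waiting) / (sd(duration) * sd(waiting))
print(all.equal(r1, r2))                             # TRUE
print(all.equal(sd(duration), sqrt(var(duration))))  # TRUE: sd is the square root of var
```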
Probability distributions
//to be added
Statistical tests
//to be added