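In R, a string is simply one element of a character vector.
Creating and printing strings
1. Character vectors can be created with the c function. Prefer double quotes around each string.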
c(
"you may not believe",
"I often imagined how divine and adorable you can be"
)
Out:
[1] "you may not believe"
[2] "I often imagined how divine and adorable you can be"
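Because each string is one element of a character vector, length counts the strings while nchar counts the characters inside each one. A minimal sketch (the variable name verses and its contents are made up for illustration):

```r
# A character vector with two elements; each element is one string.
verses <- c("you may not believe", "rose")

length(verses)  # number of strings in the vector: 2
nchar(verses)   # number of characters in each string: 19 4
```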
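2. The paste function joins strings together. The sep argument changes the separator placed between the pieces, and the collapse argument merges the whole result into a single string containing every element.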
paste(c("author","作者"),"朱生豪")
paste(c("author","作者"),"朱生豪",sep=":")
paste(c("author","作者"),"朱生豪",collapse="and")
Out:
[1] "author 朱生豪" "作者 朱生豪"
[1] "author:朱生豪" "作者:朱生豪"
[1] "author 朱生豪and作者 朱生豪"
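As a small illustrative sketch, sep and collapse can be combined in one call, and paste0 is the base R shorthand for paste with sep = "". The strings used here are invented for the example:

```r
# paste0 joins without any separator.
chapters <- paste0("chapter", 1:3)
chapters  # "chapter1" "chapter2" "chapter3"

# sep joins the pieces of each element, then collapse joins the elements.
joined <- paste("a", 1:3, sep = "-", collapse = "+")
joined    # "a-1+a-2+a-3"
```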
3. The toString function is handy when printing vectors. The width argument limits the number of characters in the output.
x <- (1:5)^2
toString(x)
toString(x,width=12)
Out:
[1] "1, 4, 9, 16, 25"
[1] "1, 4, 9,...."
4. When strings are printed to the console they are wrapped in double quotes. The noquote function removes those quotes from the printed output.
x <- c("But","when","i","have","seen","you","eventually")
y <- noquote(x)
x
y
Out:
> x
[1] "But" "when" "i" "have" "seen" "you" "eventually"
> y
[1] But when i have seen you eventually
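Formatting numbers
1. The formatC function lets you choose fixed or scientific notation, the number of digits, and the width of the output. It returns a character vector or array. By default, real numbers are shown with four significant digits.
The format argument selects the notation (for example "e" for scientific). The digits argument gives the number of significant digits for the default "g" format, and the number of digits after the decimal point for the "e" and "f" formats. The width argument is, in the words of the official documentation, "the total field width".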
pow <- 1:3
powers_of_e <- exp(pow)
formatC(powers_of_e)
formatC(powers_of_e,format="e")
formatC(powers_of_e,digits=3,width=10)
Out:
[1] "2.718" "7.389" "20.09"
[1] "2.7183e+00" "7.3891e+00" "2.0086e+01"
[1] "      2.72" "      7.39" "      20.1"
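Two other common uses of formatC, sketched here as an illustration rather than an exhaustive list: format = "f" makes digits count decimal places, and flag = "0" pads with leading zeros up to width. The value 7 is an arbitrary example.

```r
powers_of_e <- exp(1:3)

# Fixed notation: digits is the number of digits after the decimal point.
fixed <- formatC(powers_of_e, format = "f", digits = 2)
fixed   # "2.72" "7.39" "20.09"

# Integer format, zero-padded to a total field width of 3.
padded <- formatC(7, width = 3, format = "d", flag = "0")
padded  # "007"
```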
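2. The sprintf function works much like printf in C.
%s stands for a string, %f for a floating-point number in fixed notation, %e for a floating-point number in scientific notation, and %d for an integer.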
sprintf("To three decimal places, e ^ %d = %.3f", pow, powers_of_e)
Out:
[1] "To three decimal places, e ^ 1 = 2.718"
[2] "To three decimal places, e ^ 2 = 7.389"
[3] "To three decimal places, e ^ 3 = 20.086"
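The remaining specifiers can be sketched the same way. The inputs below are arbitrary illustrative values; the width and zero-padding flag in the last call follow the usual C printf conventions.

```r
# %s inserts a string, %d an integer.
s1 <- sprintf("%s has %d characters", "rose", 4L)
s1  # "rose has 4 characters"

# %e prints scientific notation with six decimal places by default.
s2 <- sprintf("%e", exp(1))
s2  # "2.718282e+00"

# Width 5, zero-padded, one decimal place.
s3 <- sprintf("%05.1f", pi)
s3  # "003.1"
```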
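3. The format function offers formatting options broadly similar to those of formatC.
The digits argument sets the number of significant digits to keep, and scientific decides whether scientific notation is used. With trim = TRUE, the leading spaces that format normally adds to pad every result to a common width are suppressed.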
format(powers_of_e, digits=3, scientific=TRUE,trim=TRUE)
Out:
[1] "2.72e+00" "7.39e+00" "2.01e+01"
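The effect of trim is easier to see when the numbers have different widths. A minimal sketch with made-up values:

```r
sizes <- c(1, 10, 100)

# By default, results are padded with leading spaces to a common width.
padded <- format(sizes)
padded   # "  1" " 10" "100"

# trim = TRUE drops that padding.
trimmed <- format(sizes, trim = TRUE)
trimmed  # "1" "10" "100"
```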
Changing case
toupper("you were even more divine and adorable than i fancied")
tolower("YOU CANNOT SAY I AM LING ,FOR IF IT IS NOT TRUE")
Out:
[1] "YOU WERE EVEN MORE DIVINE AND ADORABLE THAN I FANCIED"
[1] "you cannot say i am ling ,for if it is not true"
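Extracting substrings
The substring and substr functions both extract substrings. The difference is that the output of substring is as long as the longest input, whereas the output of substr is only as long as its first input. (The elements of the second argument are paired with the third argument to pick out each substring.) In the example, the start positions 1:6 have six elements, so substring recycles poem_sen and returns six results, while substr returns five.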
poem_sen <- c(
"I will be content with merely mising you",
"rather than die to see you so",
"Don't worry about aging",
"For you must be dazzling",
"even when you are greying"
)
substring(poem_sen, 1:6, 10)
substr(poem_sen, 1:6, 10)
Out:
[1] "I will be " "ather tha" "n't worr" " you mu" " when " "l be "
[1] "I will be " "ather tha" "n't worr" " you mu" " when "
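The length rule is clearest with a single input string and several start and stop positions. An illustrative sketch (the word is taken from the poem, the positions are arbitrary):

```r
# substring: output is as long as the longest argument (three positions).
many <- substring("dazzling", 1:3, 3:5)
many  # "daz" "azz" "zzl"

# substr: output is as long as the first argument (one string),
# so only the first start and stop are used.
one <- substr("dazzling", 1:3, 3:5)
one   # "daz"
```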
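Splitting strings
The strsplit function splits strings at specified points: each string is split wherever the second argument matches, and the result is returned as a list.
strsplit(poem_sen, " ", fixed=TRUE) # split on spaces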
Out:
[[1]]
[1] "I" "will" "be" "content" "with" "merely" "mising" "you"
[[2]]
[1] "rather" "than" "die" "to" "see" "you" "so"
[[3]]
[1] "Don't" "worry" "about" "aging"
[[4]]
[1] "For" "you" "must" "be" "dazzling"
[[5]]
[1] "even" "when" "you" "are" "greying"
We can also split strings with a regular expression.
strsplit(poem_sen, "[A-Z]")
Out:
[[1]]
[1] "" " will be content with merely mising you"
[[2]]
[1] "rather than die to see you so"
[[3]]
[1] "" "on't worry about aging"
[[4]]
[1] "" "or you must be dazzling"
[[5]]
[1] "even when you are greying"
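Since strsplit returns a list with one element per input string, the pieces usually need [[ ]], lengths, or unlist before further use. The sketch below uses shortened, made-up inputs, and also shows why fixed = TRUE matters when the separator is a regex metacharacter such as ".".

```r
words <- strsplit(c("rather than die", "For you"), " ", fixed = TRUE)

words[[1]]      # pieces of the first string: "rather" "than" "die"
lengths(words)  # number of pieces per string: 3 2
unlist(words)   # all pieces flattened into one character vector

# With fixed = TRUE the dot is a literal character, not "any character".
parts <- strsplit("a.b.c", ".", fixed = TRUE)[[1]]
parts           # "a" "b" "c"
```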
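File paths
Paths are either absolute or relative. In a relative path, . stands for the current directory and .. for the parent directory; ~ stands for the current user's home directory.
path.expand only expands a leading ~ into the home directory; it leaves . and .. unchanged, as the output shows. To convert a relative path into an absolute one, use normalizePath.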
path.expand(".")
path.expand("..")
path.expand("~")
Out:
[1] "."
[1] ".."
[1] "C:/Users/Beryl/Documents"
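For turning . and .. into absolute paths, base R provides normalizePath. A minimal sketch follows; the results depend on the machine and the current working directory, so no output is shown. The winslash argument only affects Windows.

```r
# path.expand leaves "." untouched.
path.expand(".")

# normalizePath resolves relative paths against the working directory.
abs_here   <- normalizePath(".",  winslash = "/")
abs_parent <- normalizePath("..", winslash = "/")
abs_here
abs_parent
```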
The basename function returns only the file name, and dirname returns only the directory.
file <- "E:/Ksoftware/Rstudio/R/modules/ModuleTools.R"
basename(file)
dirname(file)
Out:
[1] "ModuleTools.R"
[1] "E:/Ksoftware/Rstudio/R/modules"
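If the extension or the name without its extension is needed, the tools package that ships with R has helpers for both. A short sketch reusing the same path (the variable name script is chosen for illustration):

```r
script <- "E:/Ksoftware/Rstudio/R/modules/ModuleTools.R"

ext  <- tools::file_ext(script)                       # extension only
stem <- tools::file_path_sans_ext(basename(script))   # name without extension
ext   # "R"
stem  # "ModuleTools"
```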
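getwd() # show the working directory, where R reads and writes files by default
setwd("E:/Data/Rstudio") # change the working directory
file.path("E:","Data","R") # inserts forward slashes between the directory names automatically
R.home() # the directory where R is installed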
Out:
[1] "C:/Users/Beryl/Documents"
[1] "E:/Data/R"
[1] "E:/KSOFTW~1/R/R-36~1.1"
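file.path is vectorised, so one call can build several paths at once. A minimal sketch with hypothetical directory and file names:

```r
# One directory combined with two file names gives two paths.
csv_paths <- file.path("data", c("a.csv", "b.csv"))
csv_paths  # "data/a.csv" "data/b.csv"
```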