R 學習筆記(4)－天天向上

Writing your own functions

計算兩點距離

   1: > calDist<-function(x1, x2){   2: + d = x1 - x2;   3: + dist = d * d;   4: + sdist = sqrt( sum(dist) );   5: + sdist   6: + }

定義兩個資料點x1, x2

   1: > x1<-c(1,2)   2: > x2<-c(0,0)

計算兩點Euclidean distance

   1: > calDist(x1,x2)   2: [1] 2.236068

回傳結果至另一個變數y

   1: > y = calDist(x1,x2)   2: > y   3: [1] 2.236068

範例2: y1 = 2*x1

   1: > timestwo<-function(x){   2: + y<- 2*x   3: + print(x)   4: + print(y)   5: + }

呼叫timestwo副程式, 其中第2行為印出x1, 第3行為印出y1的值

   1: > y1 = timestwo(x1)   2: [1] 1 2   3: [1] 2 4

Writing a for-loop in R

 > for(i in 1:9)+ print(i)[1] 1[1] 2[1] 3[1] 4[1] 5[1] 6[1] 7[1] 8[1] 9

寫一個99乘法表

for(i in 1:9)for(j in 1:9)print( c(i, j, i*j) );

跑出來如下:

產生一個1~50的奇數序列

> foo = seq(1, 100, by=2)

> foo [1]  1  3  5  7  9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47[25] 49 51 53 55 57 59 61 63 65 67 69 71 73 75 77 79 81 83 85 87 89 91 93 95[49] 97 99

將foo的每個元素取平方, 並存在foo.squared

> foo.squared = NULL

接著跑一個 for-loop , 儲存在foo.squared[i]

 > for (i in 1:50 ) {+ foo.squared[i] = foo[i]^2+ }> foo.squared [1]    1    9   25   49   81  121  169  225  289  361  441  529  625  729[15]  841  961 1089 1225 1369 1521 1681 1849 2025 2209 2401 2601 2809 3025[29] 3249 3481 3721 3969 4225 4489 4761 5041 5329 5625 5929 6241 6561 6889[43] 7225 7569 7921 8281 8649 9025 9409 9801

當然, 如果不跑for-loop 也可以直接將每個元素, 即向量化(vectorization) 取平方

foo.squared2 = foo^2> foo.squared2 [1]    1    9   25   49   81  121  169  225  289  361  441  529  625  729[15]  841  961 1089 1225 1369 1521 1681 1849 2025 2209 2401 2601 2809 3025[29] 3249 3481 3721 3969 4225 4489 4761 5041 5329 5625 5929 6241 6561 6889[43] 7225 7569 7921 8281 8649 9025 9409 9801

將不必要的元素刪除提升計算速度

(1)速度快版本: 利用 !is.na(bar.squared) 找出非nan的元素, 並回存bar.squared

bar = seq(1,200000, by=2)bar.squared = rep(NA, 200000) for (i in 1:length(bar) ) {bar.squared[i] = bar[i]^2} #get rid of excess NAsbar.squared = bar.squared[!is.na(bar.squared)]summary(bar.squared)

(2)速度慢版本: bar.squared含有nan, 造成進行統計時速度降低很多

bar = seq(1, 200000, by=2)bar.squared = NULL for (i in 1:length(bar) ) {bar.squared[i] = bar[i]^2}summary(bar.squared)

Least squares

x <- c(0.02, 0.02, 0.06, 0.06, 0.11, 0.11, 0.22, 0.22, 0.56, 0.56,1.10, 1.10)y <- c(76, 47, 97, 107, 123, 139, 159, 152, 191, 201, 207, 200)

利用plot(x,y)繪圖

假設模型 y = p1*x/p2*x

則SSE(sum of squared error) 函式為

fn <- function(p) sum((y - (p[1] * x)/(p[2] + x))^2)

如果使用spline

> spline(x, y)$x [1] 0.02000000 0.08352941 0.14705882 0.21058824 0.27411765 0.33764706 [7] 0.40117647 0.46470588 0.52823529 0.59176471 0.65529412 0.71882353[13] 0.78235294 0.84588235 0.90941176 0.97294118 1.03647059 1.10000000 $y [1]  61.5000 118.0314 143.0677 154.2625 162.6790 171.1198 179.2518 186.7350 [9] 193.2297 198.4064 202.1812 204.7155 206.1813 206.7504 206.5948 205.8866[17] 204.7977 203.5000

並將spline的數據和原始x,y畫在同一張, 不過這一組spline看起來的確很”卡”, 因此有必要內插一些數據讓曲線更加平滑

假設我們猜一組參數p1=200, p2 =0.1當作初始值, 並在x區間每隔0.05內插一筆, 存成xfit

xfit <- seq(.02, 1.1, .05)

帶

入函式 yfit = 200*xfit/(0.1+xfit)得到xfit相同資料長度的yfit

plot(x, y)xfit <- seq(.02, 1.1, .05)yfit <- 200 * xfit/(0.1 + xfit)lines(spline(xfit, yfit))

曲線有比較平滑不過誤差頗大, 接著利用nlm(Non-Linear Minimization)函式進行參數估測(parameter estimation)

out <- nlm(fn, p = c(200, 0.1), hessian = TRUE)

其中fn為error function

fn <- function(p) sum((y - (p[1] * x)/(p[2] + x))^2)

p為估測參數猜測的初始值200和0.1

   1: > out <- nlm(fn, p = c(200, 0.1), hessian = TRUE)   2: > out   3: $minimum   4: [1] 1195.449   5:     6: $estimate   7: [1] 212.68384222   0.06412146   8:     9: $gradient  10: [1] -0.0001534978  0.0934207640  11:    12: $hessian  13:             [,1]        [,2]  14: [1,]    11.94725   -7661.319  15: [2,] -7661.31875 8039421.152  16:    17: $code  18: [1] 3  19:    20: $iterations  21: [1] 26

$esitmate 即為參數估測結果

   1: > out$estimate[1]   2: [1] 212.6838   3: > out$estimate[2]   4: [1] 0.06412146

因此, yfit更新為

yfit <- out$estimate[1] * xfit/(out$estimate[2] + xfit)

相當於

yfit <- 212.68384222 * xfit/(0.06412146 + xfit)

(1)繪製原始數據

plot(x, y)

(2)非線性估測的yfit

lines(spline(xfit, yfit))

(1)+(2)結果如下

假設估測初始值猜測從(0,0)開始猜, 發現迭代次數反而比較少…

out <- nlm(fn, p = c(0, 0), hessian = TRUE)

$minimum[1] 1195.449 $estimate[1] 212.68385154   0.06412146 $gradient[1] -4.433729e-05  2.426467e-02 $hessian            [,1]        [,2][1,]    11.94725   -7661.319[2,] -7661.31938 8039422.512 $code[1] 3 $iterations[1] 16

Dynamic graphics

R透過rggobi套件, 才有辦法進行互動

http://www.ggobi.org/

3D繪圖

rgl 套件: 3D visualization device system (OpenGL)

參考資料:

Writing a for-loop in R

me1237guy

天天向上

me1237guy 發表在痞客邦留言(0) 人氣()

天天向上

R 學習筆記(4)

Writing a for-loop in R

Writing a for-loop in R

歷史上的今天

留言列表

站方公告

活動快報

痞客邦...

我的好友

熱門文章

文章分類

最新文章

最新留言

動態訂閱

文章精選

文章搜尋

新聞交換(RSS)

誰來我家

參觀人氣

QR Code

POWERED BY