close

Writing your own functions

計算兩點距離

   1: > calDist<-function(x1, x2){
   2: + d = x1 - x2;
   3: + dist = d * d;
   4: + sdist = sqrt( sum(dist) );
   5: + sdist
   6: + }

定義兩個資料點x1, x2

   1: > x1<-c(1,2)
   2: > x2<-c(0,0)

計算兩點Euclidean distance

   1: > calDist(x1,x2)
   2: [1] 2.236068

回傳結果至另一個變數y

   1: > y = calDist(x1,x2)
   2: > y
   3: [1] 2.236068

範例2: y1 = 2*x1

   1: > timestwo<-function(x){
   2: + y<- 2*x
   3: + print(x)
   4: + print(y)
   5: + }

呼叫timestwo副程式, 其中第2行為印出x1, 第3行為印出y1的值

   1: > y1 = timestwo(x1)
   2: [1] 1 2
   3: [1] 2 4

Writing a for-loop in R

 
> for(i in 1:9)
+ print(i)
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9

寫一個99乘法表

for(i in 1:9)
for(j in 1:9)
print( c(i, j, i*j) );

跑出來如下:

image

產生一個1~50的奇數序列

> foo = seq(1, 100, by=2)
> foo
 [1]  1  3  5  7  9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47
[25] 49 51 53 55 57 59 61 63 65 67 69 71 73 75 77 79 81 83 85 87 89 91 93 95
[49] 97 99

將foo的每個元素取平方, 並存在foo.squared

> foo.squared = NULL

接著跑一個 for-loop , 儲存在foo.squared[i]

 
> for (i in 1:50 ) {
+ foo.squared[i] = foo[i]^2
+ }
> foo.squared
 [1]    1    9   25   49   81  121  169  225  289  361  441  529  625  729
[15]  841  961 1089 1225 1369 1521 1681 1849 2025 2209 2401 2601 2809 3025
[29] 3249 3481 3721 3969 4225 4489 4761 5041 5329 5625 5929 6241 6561 6889
[43] 7225 7569 7921 8281 8649 9025 9409 9801

當然, 如果不跑for-loop 也可以直接將每個元素, 即向量化(vectorization) 取平方

foo.squared2 = foo^2
> foo.squared2
 [1]    1    9   25   49   81  121  169  225  289  361  441  529  625  729
[15]  841  961 1089 1225 1369 1521 1681 1849 2025 2209 2401 2601 2809 3025
[29] 3249 3481 3721 3969 4225 4489 4761 5041 5329 5625 5929 6241 6561 6889
[43] 7225 7569 7921 8281 8649 9025 9409 9801

將不必要的元素刪除提升計算速度

(1)速度快版本: 利用 !is.na(bar.squared) 找出非nan的元素, 並回存bar.squared

bar = seq(1,200000, by=2)
bar.squared = rep(NA, 200000)
 
for (i in 1:length(bar) ) {
bar.squared[i] = bar[i]^2
}
 
#get rid of excess NAs
bar.squared = bar.squared[!is.na(bar.squared)]
summary(bar.squared)

(2)速度慢版本: bar.squared含有nan, 造成進行統計時速度降低很多

bar = seq(1, 200000, by=2)
bar.squared = NULL
 
for (i in 1:length(bar) ) {
bar.squared[i] = bar[i]^2
}
summary(bar.squared)

Least squares

x <- c(0.02, 0.02, 0.06, 0.06, 0.11, 0.11, 0.22, 0.22, 0.56, 0.56,
1.10, 1.10)
y <- c(76, 47, 97, 107, 123, 139, 159, 152, 191, 201, 207, 200)

利用plot(x,y)繪圖

image

假設模型 y = p1*x/p2*x

則SSE(sum of squared error) 函式為

fn <- function(p) sum((y - (p[1] * x)/(p[2] + x))^2)

如果使用spline

> spline(x, y)
$x
 [1] 0.02000000 0.08352941 0.14705882 0.21058824 0.27411765 0.33764706
 [7] 0.40117647 0.46470588 0.52823529 0.59176471 0.65529412 0.71882353
[13] 0.78235294 0.84588235 0.90941176 0.97294118 1.03647059 1.10000000
 
$y
 [1]  61.5000 118.0314 143.0677 154.2625 162.6790 171.1198 179.2518 186.7350
 [9] 193.2297 198.4064 202.1812 204.7155 206.1813 206.7504 206.5948 205.8866
[17] 204.7977 203.5000

並將spline的數據和原始x,y畫在同一張, 不過這一組spline看起來的確很”卡”, 因此有必要內插一些數據讓曲線更加平滑

image

假設我們猜一組參數p1=200, p2 =0.1當作初始值, 並在x區間每隔0.05內插一筆, 存成xfit

xfit <- seq(.02, 1.1, .05)
入函式 yfit = 200*xfit/(0.1+xfit)得到xfit相同資料長度的yfit
plot(x, y)
xfit <- seq(.02, 1.1, .05)
yfit <- 200 * xfit/(0.1 + xfit)
lines(spline(xfit, yfit))

曲線有比較平滑不過誤差頗大, 接著利用nlm(Non-Linear Minimization)函式進行參數估測(parameter estimation)

image

out <- nlm(fn, p = c(200, 0.1), hessian = TRUE)

其中fn為error function

fn <- function(p) sum((y - (p[1] * x)/(p[2] + x))^2)

p為估測參數猜測的初始值200和0.1

   1: > out <- nlm(fn, p = c(200, 0.1), hessian = TRUE)
   2: > out
   3: $minimum
   4: [1] 1195.449
   5:  
   6: $estimate
   7: [1] 212.68384222   0.06412146
   8:  
   9: $gradient
  10: [1] -0.0001534978  0.0934207640
  11:  
  12: $hessian
  13:             [,1]        [,2]
  14: [1,]    11.94725   -7661.319
  15: [2,] -7661.31875 8039421.152
  16:  
  17: $code
  18: [1] 3
  19:  
  20: $iterations
  21: [1] 26

$esitmate 即為參數估測結果

   1: > out$estimate[1]
   2: [1] 212.6838
   3: > out$estimate[2]
   4: [1] 0.06412146

因此, yfit更新為

yfit <- out$estimate[1] * xfit/(out$estimate[2] + xfit)

相當於

yfit <- 212.68384222 * xfit/(0.06412146 + xfit)

(1)繪製原始數據

plot(x, y)

(2)非線性估測的yfit

lines(spline(xfit, yfit))

(1)+(2)結果如下

image

假設估測初始值猜測從(0,0)開始猜, 發現迭代次數反而比較少…微笑

out <- nlm(fn, p = c(0, 0), hessian = TRUE)
$minimum
[1] 1195.449
 
$estimate
[1] 212.68385154   0.06412146
 
$gradient
[1] -4.433729e-05  2.426467e-02
 
$hessian
            [,1]        [,2]
[1,]    11.94725   -7661.319
[2,] -7661.31938 8039422.512
 
$code
[1] 3
 
$iterations
[1] 16

Dynamic graphics

R透過rggobi套件, 才有辦法進行互動

http://www.ggobi.org/

3D繪圖

rgl 套件: 3D visualization device system (OpenGL)


參考資料:

Writing a for-loop in R

arrow
arrow
    全站熱搜

    me1237guy 發表在 痞客邦 留言(0) 人氣()