R Learning Notes (6) - 天天向上

(1) numeric data type

    > x=c(1,2,3,4)
    > x
    [1] 1 2 3 4
    > class(x)
    [1] "numeric"
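A short sketch (not part of the original notes) of why `class()` says "numeric" here: number literals typed without a suffix are stored as doubles, and the `L` suffix creates integer literals directly. The variable `w` is illustrative only.

```r
# Whole-number literals without a suffix are stored as doubles
x <- c(1, 2, 3, 4)
print(class(x))   # "numeric"
print(typeof(x))  # "double"

# Appending L creates integer literals directly
w <- c(1L, 2L, 3L, 4L)
print(class(w))   # "integer"
```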

(2) integer data type

    > y = as.integer(x)
    > class(y)
    [1] "integer"
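`as.integer()` is lossless for the whole numbers above, but a minimal sketch with fractional input (values chosen for illustration) shows that it truncates toward zero rather than rounding:

```r
# as.integer() drops the fractional part, truncating toward zero
v <- c(3.9, -3.9, 2.5)
iv <- as.integer(v)
print(iv)  # 3 -3 2
```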

(3) logical data type

    > z = x==y
    > class(z)
    [1] "logical"
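A small sketch of what `z` actually contains: `==` compares values element by element, so the double vector `x` and integer vector `y` are equal everywhere, while `identical()` also checks the storage type and reports a difference.

```r
x <- c(1, 2, 3, 4)
y <- as.integer(x)
z <- x == y
print(z)               # TRUE TRUE TRUE TRUE: values match element by element
print(all(z))          # TRUE
# identical() also compares the storage type, so double vs integer differs
print(identical(x, y)) # FALSE
```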

Check whether a value is logical

    > is.logical(x)
    [1] FALSE
    > is.logical(y)
    [1] FALSE
    > is.logical(z)
    [1] TRUE

Check whether a value is numeric (both double and integer count as numeric)

    > is.numeric(x)
    [1] TRUE
    > is.numeric(y)
    [1] TRUE
    > is.numeric(z)
    [1] FALSE
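Although logical vectors are not numeric, they can still be used in arithmetic. A brief sketch (with illustrative values) also shows that factors are not numeric, even though they store integer codes internally:

```r
z <- c(TRUE, TRUE, FALSE)
print(is.numeric(z))   # FALSE
# In arithmetic, TRUE counts as 1 and FALSE as 0
print(sum(z))          # 2
print(mean(z))         # proportion of TRUE values
# Factors are not numeric either
f <- factor(c("a", "b", "a"))
print(is.numeric(f))   # FALSE
```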

(4) character data type

    > m = c('1')
    > class(m)
    [1] "character"
    > n=c("1")
    > class(n)
    [1] "character"
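A character value that looks like a number is still text. This sketch (not from the original notes) shows that arithmetic on it fails until you convert it explicitly, and that non-numeric text converts to `NA`:

```r
m <- "1"
# Arithmetic on a string raises an error; capture its message instead of stopping
err <- tryCatch(m + 1, error = function(e) conditionMessage(e))
print(err)
# Convert explicitly first
num <- as.numeric(m)
print(num + 1)   # 2
# Text that is not a number becomes NA (R also emits a warning, suppressed here)
bad <- suppressWarnings(as.numeric("abc"))
print(is.na(bad))  # TRUE
```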

Are single and double quotes interchangeable? Comparing m and n shows they produce the same string:

    > m==n
    [1] TRUE

    > o=c("Hello","world")
    > o=="Hello"
    [1]  TRUE FALSE
    > o=='Hello'
    [1]  TRUE FALSE
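The main practical difference between the two quote styles is which quote character you can embed without escaping. A minimal sketch with illustrative strings:

```r
# Double quotes can contain a single quote without escaping
a <- "It's"
# Inside single quotes, escape the single quote with a backslash
b <- 'It\'s'
print(a == b)      # TRUE
# Single quotes can contain double quotes without escaping
q <- 'He said "hi"'
print(nchar(q))    # 12
```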

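(5) factor data type: a variable that stores character (categorical) data as integer codes, i.e. a discretized form of character data. The example below encodes weather as 晴天 (sunny), 陰天 (cloudy), 雨天 (rainy) and 起霧 (foggy).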

    > weather = factor(c(0,1,2,1,2,2,3),levels=c(0,1,2,3),labels=c("晴天","陰天","雨天","起霧"))
    > weather
    [1] 晴天 陰天 雨天 陰天 雨天 雨天 起霧
    Levels: 晴天 陰天 雨天 起霧
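The same construction with English labels substituted (the name `weather_en` and its labels are illustrative, not from the original) shows a few common ways to inspect a factor: its levels, the number of levels, and how often each level occurs.

```r
# Same codes and levels as the weather example, with English labels
weather_en <- factor(c(0, 1, 2, 1, 2, 2, 3),
                     levels = c(0, 1, 2, 3),
                     labels = c("Sunny", "Cloudy", "Rainy", "Foggy"))
print(levels(weather_en))   # "Sunny" "Cloudy" "Rainy" "Foggy"
print(nlevels(weather_en))  # 4
# table() counts the occurrences of each level
counts <- table(weather_en)
print(counts)
```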

Factors do not support arithmetic; for example, weather+1 returns NA for every element and issues a warning:

    > weather+1
    [1] NA NA NA NA NA NA NA
    Warning message:
    In Ops.factor(weather, 1) : ‘+’ not meaningful for factors

Converting a factor to numeric returns each element's level index (its internal code), starting from 1:

    > ind = as.numeric(weather)
    > ind
    [1] 1 2 3 2 3 3 4
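Because `as.numeric()` returns the level codes rather than the labels, it is easy to misuse on a factor whose labels are themselves numbers. A short sketch with illustrative values shows the difference and the usual way to recover the original values:

```r
f <- factor(c(10, 20, 10, 30))
print(as.numeric(f))                # 1 2 1 3: the internal level codes
print(as.numeric(as.character(f)))  # 10 20 10 30: the original values
# Equivalent, but converts each distinct level only once
print(as.numeric(levels(f))[f])     # 10 20 10 30
```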

Suppose we want to select only the elements whose index is 2 (陰天, cloudy). How do we filter them?

Compare the index sequence with ind==2. This is equivalent to evaluating c(1, 2, 3, 2, 3, 3, 4)==2:

    > c(1, 2, 3, 2, 3, 3, 4)==2
    [1] FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE

    > pos = (ind == 2)
    > pos
    [1] FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE

Pass the logical vector pos as an index into weather, just like array indexing, to keep only the matching elements:

    > weather[pos]
    [1] 陰天 陰天
    Levels: 晴天 陰天 雨天 起霧
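Going through the level index works, but a factor can also be compared with a label directly. This sketch uses English labels (an illustrative substitute for the original Chinese ones) and also shows `which()` for positions and `droplevels()` for removing unused levels from the subset:

```r
weather_en <- factor(c(0, 1, 2, 1, 2, 2, 3), levels = c(0, 1, 2, 3),
                     labels = c("Sunny", "Cloudy", "Rainy", "Foggy"))
# Compare against the label instead of the level index
pos <- weather_en == "Cloudy"
print(weather_en[pos])    # Cloudy Cloudy (all four levels are still listed)
print(which(pos))         # 2 4: positions of the matches
# droplevels() removes levels that no longer occur in the subset
print(levels(droplevels(weather_en[pos])))  # "Cloudy"
```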

Calling library() with no arguments lists the packages installed in your R libraries.

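After loading the Insurance data set (it ships with the MASS package), use ls() to list the objects in the workspace; the newly loaded Insurance appears alongside the variables created above: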
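The notes do not show the loading step itself. A minimal sketch, assuming the MASS package (bundled with most R installations as a recommended package) is available:

```r
# Insurance is a data set provided by the MASS package
library(MASS)
# data() copies it into the global environment, so ls() lists it
data(Insurance)
print("Insurance" %in% ls())  # TRUE
```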

    > ls()
    [1] "ind"       "Insurance" "pos"       "weather"

The data type of Insurance is data.frame:

    > class(Insurance)
    [1] "data.frame"

dim() returns the dimensions of the data (number of rows, number of columns):

    > dim(Insurance)
    [1] 64  5
    > dim(Insurance)[1]
    [1] 64
    > dim(Insurance)[2]
    [1] 5
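For reference, `nrow()` and `ncol()` are shorthands for the two elements of `dim()`, and `str()` summarizes every column in one call. This sketch assumes MASS is installed and reads the data as `MASS::Insurance`; the name `ins` is illustrative.

```r
ins <- MASS::Insurance
# nrow() and ncol() are shorthand for dim(ins)[1] and dim(ins)[2]
print(nrow(ins))  # 64
print(ncol(ins))  # 5
# str() shows each column's type and its first few values
str(ins)
```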

Inspecting the data

Get the column names

    > names(Insurance)
    [1] "District" "Group"    "Age"      "Holders"  "Claims"

Get the first two column names

(1) Using the [] operator

    > names(Insurance)[1:2]
    [1] "District" "Group"

(2) Using head()

    > head(names(Insurance), n=2)
    [1] "District" "Group"

So I figured head() can also be applied to the data frame itself to take its first n rows:

    > head(Insurance, n=5)
      District  Group   Age Holders Claims
    1        1    <1l   <25     197     38
    2        1    <1l 25-29     264     35
    3        1    <1l 30-35     246     20
    4        1    <1l   >35    1680    156
    5        1 1-1.5l   <25     284     63
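`head()` also accepts a negative `n`, which keeps everything except the last `|n|` rows. A brief sketch, assuming MASS is installed (`ins` and `first4` are illustrative names):

```r
ins <- MASS::Insurance
# A negative n keeps all rows except the last |n|: 64 - 60 = 4 rows
first4 <- head(ins, n = -60)
print(nrow(first4))  # 4
# head() works on plain vectors as well
print(head(ins$Claims, 3))
```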

And to take the last two rows, use tail():

    > tail(Insurance, n=2)
       District Group   Age Holders Claims
    63        4   >2l 30-35      25      8
    64        4   >2l   >35     114     33
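As the output above suggests, `tail()` keeps the original row names, and a negative `n` drops rows from the front instead. A minimal sketch, assuming MASS is installed:

```r
ins <- MASS::Insurance
last2 <- tail(ins, n = 2)
print(rownames(last2))            # "63" "64": original row names are kept
# A negative n drops the first |n| rows, leaving the last 4
print(nrow(tail(ins, n = -60)))   # 4
```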

Get the first 5 rows of Insurance with row indexing (an empty column index selects all columns)

    > Insurance[1:5,]
      District  Group   Age Holders Claims
    1        1    <1l   <25     197     38
    2        1    <1l 25-29     264     35
    3        1    <1l 30-35     246     20
    4        1    <1l   >35    1680    156
    5        1 1-1.5l   <25     284     63
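The same `[rows, columns]` indexing can pick columns by name, and negative row indices drop rows. A short sketch, assuming MASS is installed (`sub` is an illustrative name):

```r
ins <- MASS::Insurance
# Rows 1-5, only the Holders and Claims columns, selected by name
sub <- ins[1:5, c("Holders", "Claims")]
print(sub)
# Negative indices drop rows: everything except rows 1 and 2
print(nrow(ins[-c(1, 2), ]))  # 62
```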

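Get each column

The columns are:
1. District: district where the policyholder lives
2. Group: engine displacement of the insured car
3. Age: age group of the policyholder
4. Holders: number of policyholders
5. Claims: number of claims filed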

    > Insurance[,1]
     [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3
    [37] 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
    Levels: 1 2 3 4
    > Insurance[,2]
     [1] <1l    <1l    <1l    <1l    1-1.5l 1-1.5l 1-1.5l 1-1.5l 1.5-2l 1.5-2l
    [11] 1.5-2l 1.5-2l >2l    >2l    >2l    >2l    <1l    <1l    <1l    <1l
    [21] 1-1.5l 1-1.5l 1-1.5l 1-1.5l 1.5-2l 1.5-2l 1.5-2l 1.5-2l >2l    >2l
    [31] >2l    >2l    <1l    <1l    <1l    <1l    1-1.5l 1-1.5l 1-1.5l 1-1.5l
    [41] 1.5-2l 1.5-2l 1.5-2l 1.5-2l >2l    >2l    >2l    >2l    <1l    <1l
    [51] <1l    <1l    1-1.5l 1-1.5l 1-1.5l 1-1.5l 1.5-2l 1.5-2l 1.5-2l 1.5-2l
    [61] >2l    >2l    >2l    >2l
    Levels: <1l < 1-1.5l < 1.5-2l < >2l
    > Insurance[,3]
     [1] <25   25-29 30-35 >35   <25   25-29 30-35 >35   <25   25-29 30-35 >35
    [13] <25   25-29 30-35 >35   <25   25-29 30-35 >35   <25   25-29 30-35 >35
    [25] <25   25-29 30-35 >35   <25   25-29 30-35 >35   <25   25-29 30-35 >35
    [37] <25   25-29 30-35 >35   <25   25-29 30-35 >35   <25   25-29 30-35 >35
    [49] <25   25-29 30-35 >35   <25   25-29 30-35 >35   <25   25-29 30-35 >35
    [61] <25   25-29 30-35 >35
    Levels: <25 < 25-29 < 30-35 < >35
    > Insurance[,4]
     [1]  197  264  246 1680  284  536  696 3582  133  286  355 1640   24   71
    [15]   99  452   85  139  151  931  149  313  419 2443   66  175  221 1110
    [29]    9   48   72  322   35   73   89  648   53  155  240 1635   24   78
    [43]  121  692    7   29   43  245   20   33   40  316   31   81  122  724
    [57]   18   39   68  344    3   16   25  114
    > Insurance[,5]
     [1]  38  35  20 156  63  84  89 400  19  52  74 233   4  18  19  77  22  19
    [19]  22  87  25  51  49 290  14  46  39 143   4  15  12  53   5  11  10  67
    [37]  10  24  37 187   8  19  24 101   3   2   8  37   2   5   4  36   7  10
    [55]  22 102   5   7  16  63   0   6   8  33
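Besides `Insurance[,1]`, a column can be pulled out by name with `$` or `[[ ]]`. This sketch (assuming MASS is installed) checks that the three forms return the same vector, and shows that single brackets without a comma return a one-column data frame instead:

```r
ins <- MASS::Insurance
# Three equivalent ways to extract a single column as a vector
a <- ins[, 1]
b <- ins$District
d <- ins[["District"]]
print(identical(a, b) && identical(b, d))  # TRUE
# Single brackets without a comma keep the data.frame structure
print(class(ins[1]))  # "data.frame"
```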

Check the data type of each column

    > class(Insurance[,1])
    [1] "factor"
    > class(Insurance[,2])
    [1] "ordered" "factor"
    > class(Insurance[,3])
    [1] "ordered" "factor"
    > class(Insurance[,4])
    [1] "integer"
    > class(Insurance[,5])
    [1] "integer"
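Instead of calling `class()` five times, the check can run over all columns at once. Because ordered factors report two classes, this sketch (assuming MASS is installed) keeps only the first one so that `vapply()` can return a plain character vector:

```r
ins <- MASS::Insurance
# class() of an ordered factor has two entries, so keep only the first
first_class <- vapply(ins, function(col) class(col)[1], character(1))
print(first_class)
```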

Compute the claim ratio (Claims / Holders) for each row

    > ratio = Insurance[,5]/Insurance[,4]
    > ratio
     [1] 0.19289340 0.13257576 0.08130081 0.09285714 0.22183099 0.15671642
     [7] 0.12787356 0.11166946 0.14285714 0.18181818 0.20845070 0.14207317
    [13] 0.16666667 0.25352113 0.19191919 0.17035398 0.25882353 0.13669065
    [19] 0.14569536 0.09344791 0.16778523 0.16293930 0.11694511 0.11870651
    [25] 0.21212121 0.26285714 0.17647059 0.12882883 0.44444444 0.31250000
    [31] 0.16666667 0.16459627 0.14285714 0.15068493 0.11235955 0.10339506
    [37] 0.18867925 0.15483871 0.15416667 0.11437309 0.33333333 0.24358974
    [43] 0.19834711 0.14595376 0.42857143 0.06896552 0.18604651 0.15102041
    [49] 0.10000000 0.15151515 0.10000000 0.11392405 0.22580645 0.12345679
    [55] 0.18032787 0.14088398 0.27777778 0.17948718 0.23529412 0.18313953
    [61] 0.00000000 0.37500000 0.32000000 0.28947368
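A sketch of a few follow-ups to the per-row ratio, assuming MASS is installed: storing it as a new column, and computing pooled ratios (total claims over total holders) overall and per age group with `tapply()`. Pooling is a design choice made here; it weights each row by its number of holders, unlike a plain mean of the per-row ratios.

```r
ins <- MASS::Insurance
# Per-row claim ratio, stored as a new column
ins$ratio <- ins$Claims / ins$Holders
# Overall ratio: total claims divided by total policyholders
overall <- sum(ins$Claims) / sum(ins$Holders)
print(overall)
# Pooled ratio per age group: sum within each group, then divide
by_age <- tapply(ins$Claims, ins$Age, sum) / tapply(ins$Holders, ins$Age, sum)
print(round(by_age, 3))
```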

Age is divided into four levels

    > levels(Insurance$Age)
    [1] "<25"   "25-29" "30-35" ">35"
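Because Age is an ordered factor, comparison operators follow the level order shown above. A short sketch using a fresh copy with the original labels (assuming MASS is installed):

```r
ins <- MASS::Insurance
print(nlevels(ins$Age))  # 4
# Comparisons on an ordered factor follow the level order
young <- ins$Age < "30-35"   # TRUE for "<25" and "25-29"
print(table(ins$Age[young]))
```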

Rename the levels of Age to 少年 (juvenile), 青年 (young adult), 成人 (adult) and 中年 (middle-aged)

    > levels(Insurance$Age) = c("少年","青年","成人","中年")
    > levels(Insurance$Age)
    [1] "少年" "青年" "成人" "中年"
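Assigning `levels()` renames the labels position by position; the underlying codes of each row do not change. This sketch uses placeholder labels ("A" to "D" and "Over35" are hypothetical) on a fresh copy of the column, and also shows how to rename a single level by its current name, which avoids depending on the level order:

```r
age <- MASS::Insurance$Age
codes_before <- as.integer(age)
# Rename all levels by position (hypothetical placeholder labels)
levels(age) <- c("A", "B", "C", "D")
print(identical(as.integer(age), codes_before))  # TRUE: codes are untouched
# Rename one level, selected by its current name
levels(age)[levels(age) == "D"] <- "Over35"
print(levels(age))  # "A" "B" "C" "Over35"
```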

Get rows 2 to 5 of Insurance; the Age column now shows the new labels

    > Insurance[2:5,]
      District  Group  Age Holders Claims
    2        1    <1l 青年     264     35
    3        1    <1l 成人     246     20
    4        1    <1l 中年    1680    156
    5        1 1-1.5l 少年     284     63
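Rows can also be selected by a condition rather than by position. This sketch uses a fresh copy (`MASS::Insurance`, so the original ">35" label applies) and filters with `subset()`, then shows the equivalent bracket-indexing form. The threshold of 100 claims is an arbitrary illustration.

```r
ins <- MASS::Insurance
# subset() evaluates the condition using the data frame's columns
old_many <- subset(ins, Age == ">35" & Claims > 100)
print(old_many[, c("District", "Group", "Claims")])
# The same filter written with bracket indexing
same <- ins[ins$Age == ">35" & ins$Claims > 100, ]
```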

Posted by me1237guy on PIXNET (blog: 天天向上)
