Learning Python N-Dimensional Array - Part III－天天向上

This post continues from these two articles:

Learning Python N-Dimensional Array–Part I

Learning Python N-Dimensional Array-Part II
Next, we use NumPy functions to compute the sum, mean, and variance.

import numpy as np
a = np.random.randint(0, 10, size=(4, 5))
a
Out[11]:
array([[7, 5, 1, 1, 2],
       [8, 8, 2, 8, 2],
       [0, 3, 2, 7, 6],
       [3, 9, 7, 8, 2]])

Sum along axis 0 and along axis 1

np.sum(a, axis=0)
Out[13]: array([18, 25, 12, 24, 12])
np.sum(a, axis=1)
Out[14]: array([16, 28, 18, 29])

Sum of all elements

np.sum(a)
Out[12]: 91
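As a self-contained sketch, the same array can be typed in by hand (rather than drawn at random) so the sums are reproducible. The keepdims call at the end is an addition that does not appear in the original session.

```python
import numpy as np

# The array from the session above, typed in so the results are reproducible.
a = np.array([[7, 5, 1, 1, 2],
              [8, 8, 2, 8, 2],
              [0, 3, 2, 7, 6],
              [3, 9, 7, 8, 2]])

col_sums = np.sum(a, axis=0)   # collapse the rows: one sum per column
row_sums = np.sum(a, axis=1)   # collapse the columns: one sum per row
total = np.sum(a)              # no axis given: sum over every element

# keepdims=True keeps the reduced axis with length 1, which helps with broadcasting.
row_sums_2d = np.sum(a, axis=1, keepdims=True)

print(col_sums)           # [18 25 12 24 12]
print(row_sums)           # [16 28 18 29]
print(total)              # 91
print(row_sums_2d.shape)  # (4, 1)
```

A quick way to remember the axis argument: the axis you name is the one that disappears from the result's shape.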

Compute the mean of a along axis 0, along axis 1, and over all elements

In [15]: a
Out[15]:
array([[7, 5, 1, 1, 2],
       [8, 8, 2, 8, 2],
       [0, 3, 2, 7, 6],
       [3, 9, 7, 8, 2]])
In [16]: np.mean(a, axis=0)
Out[16]: array([ 4.5 ,  6.25,  3.  ,  6.  ,  3.  ])
In [17]: np.mean(a, axis=1)
Out[17]: array([ 3.2,  5.6,  3.6,  5.8])
In [18]: np.mean(a)
Out[18]: 4.5499999999999998

Compute the variance of a along axis 0, along axis 1, and over all elements

np.var(a, axis=0)
Out[19]: array([ 10.25  ,   5.6875,   5.5   ,   8.5   ,   3.    ])
np.var(a, axis=1)
Out[20]: array([ 5.76,  8.64,  6.64,  7.76])
np.var(a)
Out[21]: 8.5474999999999994

Compute the standard deviation of a along axis 0, along axis 1, and over all elements

np.std(a, axis=0)
Out[22]: array([ 3.20156212,  2.384848  ,  2.34520788,  2.91547595,  1.73205081])
np.std(a, axis=1)
Out[23]: array([ 2.4       ,  2.93938769,  2.57681975,  2.78567766])
np.std(a)
Out[24]: 2.9236107812087435
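To make the relationship between these statistics explicit, here is a small sketch that recomputes the variance by hand. It assumes the same array as above; the ddof=1 call (sample variance) is an extra that the original session does not use.

```python
import numpy as np

a = np.array([[7, 5, 1, 1, 2],
              [8, 8, 2, 8, 2],
              [0, 3, 2, 7, 6],
              [3, 9, 7, 8, 2]])

# Population variance by hand: the mean of the squared deviations from the mean.
manual_var = np.mean((a - a.mean()) ** 2)

# np.var uses the same definition by default (divides by N).
print(np.isclose(manual_var, np.var(a)))          # True

# The standard deviation is the square root of the variance.
print(np.isclose(np.sqrt(np.var(a)), np.std(a)))  # True

# ddof=1 divides by N - 1 instead of N, giving the sample variance.
sample_var = np.var(a, ddof=1)
print(sample_var > np.var(a))                     # True
```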

Compute the maximum of a along axis 0, along axis 1, and over all elements

a
Out[28]:
array([[7, 5, 1, 1, 2],
       [8, 8, 2, 8, 2],
       [0, 3, 2, 7, 6],
       [3, 9, 7, 8, 2]])
np.max(a, axis=0)
Out[29]: array([8, 9, 7, 8, 6])
np.max(a, axis=1)
Out[30]: array([7, 8, 7, 9])
np.max(a)
Out[31]: 9

Index of the maximum (a flat index into the flattened array)

np.argmax(a)
Out[32]: 16

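Following on from this: now that we know the index of the maximum of the 2-D array, how do we retrieve the corresponding value by hand?

arr holds the flat index of every element of the 2-D array a. Using Boolean indexing, we pick out the value whose flat index equals the argmax.

arr = np.arange(20).reshape((4, 5))
arr
Out[54]:
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])
a
Out[57]:
array([[7, 5, 1, 1, 2],
       [8, 8, 2, 8, 2],
       [0, 3, 2, 7, 6],
       [3, 9, 7, 8, 2]])
a[arr==np.argmax(a)]
Out[58]: array([9])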

A simpler way: ravel()

ravel() returns a flattened array. Convert a to 1-D, then index it with the flat index of the maximum.

a.ravel()[np.argmax(a)]
Out[70]: 9

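Alternatively, convert the 1-D index back to a 2-D index with unravel_index()

Here 16 is the (1-D) index of the maximum.

The corresponding 2-D index is (3, 1), that is, row 3, column 1.

ind = np.unravel_index(16, a.shape)
ind
Out[78]: (3, 1)
a
Out[79]:
array([[7, 5, 1, 1, 2],
       [8, 8, 2, 8, 2],
       [0, 3, 2, 7, 6],
       [3, 9, 7, 8, 2]])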
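The three lookups above can be put side by side in one short sketch. It reuses the same array; the variable names flat_idx, row, and col are chosen here for illustration.

```python
import numpy as np

a = np.array([[7, 5, 1, 1, 2],
              [8, 8, 2, 8, 2],
              [0, 3, 2, 7, 6],
              [3, 9, 7, 8, 2]])

flat_idx = np.argmax(a)                          # index into the flattened array
row, col = np.unravel_index(flat_idx, a.shape)   # convert back to (row, column)

print(int(flat_idx))        # 16
print(int(row), int(col))   # 3 1
print(a[row, col])          # 9, via the 2-D index
print(a.ravel()[flat_idx])  # 9, via the flat index
print(a.flat[flat_idx])     # 9, a.flat indexes the array as if it were 1-D
```

For a C-ordered array with 5 columns, the flat index is simply row * 5 + col, which is why 3 * 5 + 1 gives 16.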

Sort a along axis 0 (each column is sorted in place)

a
Out[80]:
array([[7, 5, 1, 1, 2],
       [8, 8, 2, 8, 2],
       [0, 3, 2, 7, 6],
       [3, 9, 7, 8, 2]])
a.sort(axis=0)
a
Out[86]:
array([[0, 3, 1, 1, 2],
       [3, 5, 2, 7, 2],
       [7, 8, 2, 8, 2],
       [8, 9, 7, 8, 6]])

Sort a along axis 1 (each row is sorted in place; note that a was already sorted along axis 0 above)

a.sort(axis=1)
a
Out[88]:
array([[0, 1, 1, 2, 3],
       [2, 2, 3, 5, 7],
       [2, 2, 7, 8, 8],
       [6, 7, 8, 8, 9]])
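Because a.sort() modifies the array in place, the second result above depends on the first. As a sketch of the alternative, np.sort() returns a sorted copy and leaves the input untouched; the names by_col, by_row, and chained are illustrative only.

```python
import numpy as np

a = np.array([[7, 5, 1, 1, 2],
              [8, 8, 2, 8, 2],
              [0, 3, 2, 7, 6],
              [3, 9, 7, 8, 2]])

by_col = np.sort(a, axis=0)   # each column ascending; a itself is unchanged
by_row = np.sort(a, axis=1)   # each row ascending, starting from the original a

# Reproduce the in-place session: sort along axis 0 first, then along axis 1.
chained = np.sort(np.sort(a, axis=0), axis=1)

print(by_col[:, 0])   # [0 3 7 8]
print(by_row[0])      # [1 1 2 5 7]
print(chained[0])     # [0 1 1 2 3]
print(a[0])           # [7 5 1 1 2], the original is untouched
```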

Return the unique values of a together with the index of each value's first occurrence

a = np.random.randint(10, size = (3,4))
a
Out[107]:
array([[6, 7, 9, 2],
       [4, 6, 6, 3],
       [1, 7, 4, 7]])
x, ind = np.unique(a, return_index=True)
x
Out[111]: array([1, 2, 3, 4, 6, 7, 9])
ind
Out[112]: array([8, 3, 7, 4, 0, 1, 2])
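The indices returned by return_index refer to the flattened array, which is easy to overlook with a 2-D input. The sketch below, using the same 3x4 array typed in by hand, checks this and converts the indices back to (row, column) pairs.

```python
import numpy as np

a = np.array([[6, 7, 9, 2],
              [4, 6, 6, 3],
              [1, 7, 4, 7]])

x, ind = np.unique(a, return_index=True)

# ind indexes the flattened array: a.ravel()[ind] gives back the unique values.
print(np.array_equal(a.ravel()[ind], x))   # True

# Convert the flat indices to 2-D positions of each first occurrence.
rows, cols = np.unravel_index(ind, a.shape)
print(rows.tolist())   # [2, 0, 1, 1, 0, 0, 0]
print(cols.tolist())   # [0, 3, 3, 0, 0, 1, 2]
print(np.array_equal(a[rows, cols], x))    # True
```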

Let a = [1, 1, 2, 2, 2, 3, 4, 4]

a = np.array([1, 1, 2, 2, 2, 3, 4, 4])
a
Out[128]: array([1, 1, 2, 2, 2, 3, 4, 4])

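Compute unique(a) and count how many times each unique element occurs

bincount counts the occurrences of every integer from 0 to 4 (the maximum value in a), so the first entry is the count for 0.

np.bincount(a)
Out[129]: array([0, 2, 3, 1, 2])
np.unique(a)
Out[130]: array([1, 2, 3, 4])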

Redefine a

a = [10, 1, 1, 2, 2, 2, 3, 4, 4], with values between 0 and 10

The occurrences are now counted for every integer from 0 to 10.

a = np.array([10, 1, 1, 2, 2, 2, 3, 4, 4])
np.bincount(a)
Out[132]: array([0, 2, 3, 1, 2, 0, 0, 0, 0, 0, 1])
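As a sketch of how bincount relates to unique: the positions of the nonzero counts are exactly the unique values. The return_counts and minlength arguments used below are real NumPy parameters, but they do not appear in the original session.

```python
import numpy as np

a = np.array([10, 1, 1, 2, 2, 2, 3, 4, 4])

counts = np.bincount(a)           # one slot per integer from 0 to a.max()
values = np.nonzero(counts)[0]    # integers that actually occur

print(values.tolist())            # [1, 2, 3, 4, 10]
print(counts[values].tolist())    # [2, 3, 1, 2, 1]

# np.unique can return the same information without the empty slots.
u, c = np.unique(a, return_counts=True)
print(np.array_equal(u, values) and np.array_equal(c, counts[values]))  # True

# minlength pads the result so that it has at least that many bins.
print(len(np.bincount(a, minlength=12)))   # 12
```

Note that bincount only accepts non-negative integers, while unique works for any sortable values.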

Masked arrays

import numpy.ma as ma
x = np.array([1, 2, 3, 5, 7, 4, 3, 2, 8, 0])
mask = x<5
mask
Out[138]: array([ True,  True,  True, False, False,  True,  True,  True, False,  True], dtype=bool)
mx = ma.array(x, mask=mask)
mx
Out[140]:
masked_array(data = [-- -- -- 5 7 -- -- -- 8 --],
             mask = [ True  True  True False False  True  True  True False  True],
       fill_value = 999999)

Compute the mean (masked elements are ignored)

y = mx.mean()
y
Out[145]: 6.666666666666667

The same mean can be obtained with np.mean() by selecting the unmasked elements with Boolean indexing

z = np.mean(x[~mask])
z
Out[149]: 6.666666666666667
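A few other numpy.ma helpers cover the same ground. The sketch below uses ma.masked_less, compressed(), count(), and filled(), none of which appear in the original session, on the same data.

```python
import numpy as np
import numpy.ma as ma

x = np.array([1, 2, 3, 5, 7, 4, 3, 2, 8, 0])

# masked_less builds the mask and the masked array in one step (masks x < 5).
mx = ma.masked_less(x, 5)

print(mx.compressed().tolist())   # [5, 7, 8], only the unmasked values
print(mx.count())                 # 3, the number of unmasked values
print(mx.filled(0).tolist())      # [0, 0, 0, 5, 7, 0, 0, 0, 8, 0]

# The masked mean agrees with the Boolean-indexing version.
print(np.isclose(mx.mean(), np.mean(x[x >= 5])))   # True
```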

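bincount with weights: instead of counting occurrences, the weights w that belong to each value of x are summed.

0: 0.1 + 1.2 = 1.3

1: 0.3 + 0.5 + 0.8 = 1.6

2: 0.2 + 0.4 = 0.6

x = np.array([0, 1, 2, 2, 1, 1, 0])
w = np.array([0.1, 0.3, 0.2, 0.4, 0.5, 0.8, 1.2])
np.bincount(x, w)
Out[5]: array([ 1.3,  1.6,  0.6])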
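To spell out what the weighted bincount does, the sketch below accumulates the same totals with an explicit loop and compares the two. The totals array and the loop are illustrative and not part of the original post.

```python
import numpy as np

x = np.array([0, 1, 2, 2, 1, 1, 0])
w = np.array([0.1, 0.3, 0.2, 0.4, 0.5, 0.8, 1.2])

# Explicit version: add each weight to the bin named by the matching x value.
totals = np.zeros(x.max() + 1)
for xi, wi in zip(x, w):
    totals[xi] += wi

weighted = np.bincount(x, weights=w)

print(np.allclose(totals, weighted))            # True
print(np.allclose(weighted, [1.3, 1.6, 0.6]))   # True
```

This pattern works as a simple group-by-sum whenever the group labels are small non-negative integers.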

histogram()

0 to 0.2: 19 values; 0.2 to 0.4: 22 values; and so on.

a = np.random.rand(100)
a
Out[7]:
array([ 0.00872626,  0.36558685,  0.68205683,  0.42889437,  0.97332189,
        0.51793388,  0.33187748,  0.39076355,  0.65933135,  0.46039967,
        0.4531602 ,  0.46032272,  0.2162054 ,  0.99834442,  0.99574178,
        0.84728636,  0.14715493,  0.11302403,  0.72354382,  0.42097522,
        0.14111554,  0.37278971,  0.5613764 ,  0.29387561,  0.34060089,
        0.87344041,  0.63322027,  0.52276657,  0.20584798,  0.41653945,
        0.1914504 ,  0.25949296,  0.97079113,  0.42865701,  0.40900406,
        0.99593667,  0.14859718,  0.32781547,  0.86623437,  0.09069545,
        0.58958441,  0.43301911,  0.07623798,  0.55077995,  0.32233891,
        0.22505729,  0.24731831,  0.75467141,  0.86785649,  0.26346466,
        0.47383062,  0.59548231,  0.17756108,  0.4445461 ,  0.09928862,
        0.19127033,  0.32028578,  0.376644  ,  0.43897254,  0.38783224,
        0.60179702,  0.52129171,  0.46613597,  0.54652293,  0.35948433,
        0.21664976,  0.95731711,  0.99504905,  0.59254467,  0.42166526,
        0.20300776,  0.81326924,  0.92572197,  0.8328689 ,  0.01605331,
        0.41221627,  0.97628396,  0.33769637,  0.13246742,  0.64917933,
        0.41369906,  0.0193705 ,  0.58424844,  0.37340307,  0.57927891,
        0.47939027,  0.99614169,  0.13922503,  0.72049764,  0.17861748,
        0.14567132,  0.15476356,  0.17753868,  0.93961619,  0.95160203,
        0.60532905,  0.7173731 ,  0.74216991,  0.41021054,  0.78688659])
b = np.histogram(a, bins=5, range=(0, 1))
b
Out[9]: (array([19, 22, 29, 12, 18]), array([ 0. ,  0.2,  0.4,  0.6,  0.8,  1. ]))
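np.histogram returns a pair: the counts per bin and the bin edges (one more edge than bins). A minimal sketch on a small hand-picked data set, chosen here for illustration so the counts are easy to verify by eye:

```python
import numpy as np

# Seven hand-picked values; the last one sits exactly on the upper edge.
data = np.array([0.05, 0.15, 0.25, 0.45, 0.55, 0.95, 1.0])

counts, edges = np.histogram(data, bins=5, range=(0, 1))

print(counts.tolist())        # [2, 1, 2, 0, 2]
print(len(edges))             # 6, always len(counts) + 1
print(np.allclose(edges, [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]))   # True

# Every bin is half-open [left, right) except the last, which includes its
# right edge, so the value 1.0 is counted in the final bin.
print(counts.sum() == data.size)   # True
```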

String vectors [2]

First, a look at basic number-to-string conversion

str(10)
Out[1]: '10'
str(10.2)
Out[2]: '10.2'
str('a')
Out[3]: 'a'
str('a')+str(10)
Out[4]: 'a10'
'a'+str(10)
Out[5]: 'a10'

String-to-number conversion

int('10')
Out[6]: 10

If the string holds a floating-point value, int() raises an error; use float() instead

int("10.2")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-8-81225a994e2f> in <module>()
----> 1 int("10.2")

float('10')
Out[13]: 10.0

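Difference between float and double (here double is NumPy's np.double, that is, float64). Both are 64-bit floating-point numbers; the two outputs differ only in how many digits each type printed in this older NumPy version.

float("10.2")
Out[9]: 10.2
np.double('10.2')
Out[11]: 10.199999999999999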
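The conversions above can be wrapped in a small helper that tries int() first and falls back to float(). The function name to_number is hypothetical and only meant as a sketch of the try/except pattern.

```python
def to_number(text):
    """Hypothetical helper: return an int when possible, otherwise a float."""
    try:
        return int(text)
    except ValueError:
        # int() rejects strings such as "10.2", so fall back to float().
        return float(text)


print(to_number("10"))      # 10
print(to_number("10.2"))    # 10.2

# To truncate a float-formatted string to an integer, convert in two steps.
print(int(float("10.2")))   # 10
```

A string that is not numeric at all (for example "abc") still raises ValueError from the float() call, which is usually the desired behavior.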

Create a 1x10 vector of strings (each element is '')

strs = ['' for x in np.arange(10)]
strs
Out[15]: ['', '', '', '', '', '', '', '', '', '']

Convert the numbers 0 to 9 to strings, store them in strs[i], and print each one

for i in np.arange(10):
    strs[i] = str(i)
    print(strs[i])
0
1
2
3
4
5
6
7
8
9

Modify the contents of strs[1]

strs[1] = 'abc'
strs
Out[30]: ['0', 'abc', '2', '3', '4', '5', '6', '7', '8', '9']

Search for the string 'abc' [3]

matching = [s for s in strs if "abc" in s]
matching
Out[37]: ['abc']

Next, we practice a commonly needed task: reading and writing files

Following the example in the book [1], read height.csv, a data file with 2 columns

(1) Read it with genfromtxt

Look closely and you will see that the first record was not read correctly: its first field is nan.

a=np.genfromtxt("height.csv", delimiter=',', dtype=['double','double'], names=('len', 'hei'))
a
Out[6]:
array([(nan, 173.4), (7.8, 126.2), (8.5, 131.2), (12.5, 155.0),
       (7.4, 126.8), (15.0, 170.4), (7.1, 121.8), (15.2, 169.3),
       (19.2, 176.8), (16.6, 175.2), (18.7, 175.9), (13.3, 160.4),
       (19.1, 174.6), (15.1, 169.9), (16.7, 173.1), (12.7, 155.8),
       (19.3, 175.4), (18.6, 174.4), (11.8, 148.5), (15.5, 172.5),
       (17.2, 175.2), (18.3, 175.6), (7.1, 123.1), (18.5, 170.8),
       (7.4, 125.0), (7.4, 128.4), (9.8, 140.9), (16.8, 175.8),
       (10.0, 142.8), (10.9, 146.3), (9.4, 137.2), (13.5, 163.2),
       (15.8, 174.7), (18.4, 174.3), (10.4, 143.6), (12.4, 153.3),
       (7.1, 127.2), (16.2, 171.9), (12.2, 156.6), (9.4, 135.4),
       (16.6, 172.4), (18.6, 176.8), (9.9, 140.2), (11.0, 148.0),
       (18.3, 173.0), (18.9, 172.0), (10.1, 143.1), (13.7, 165.0),
       (15.2, 169.9), (12.5, 153.6), (15.9, 178.2), (10.4, 143.7),
       (17.2, 173.9), (11.5, 151.1), (12.5, 154.1), (19.2, 178.8),
       (8.6, 132.1), (12.3, 153.6), (9.3, 137.2), (13.0, 161.0),
       (18.3, 173.8), (15.7, 176.3), (13.0, 161.3), (13.3, 160.0),
       (18.8, 174.6), (14.4, 166.6), (14.0, 164.9), (19.9, 173.9),
       (8.8, 134.5), (16.3, 171.4), (8.0, 133.0), (12.6, 153.2),
       (7.9, 126.4), (7.6, 131.2), (13.4, 161.0), (15.7, 172.7),
       (10.7, 144.1), (18.9, 175.7), (15.6, 173.4), (17.6, 175.3),
       (17.8, 176.7), (19.0, 173.0), (10.2, 142.1), (10.7, 143.5),
       (11.5, 147.2), (8.4, 130.6), (9.6, 139.7), (12.0, 151.4),
       (12.1, 147.8), (8.3, 131.0), (9.4, 134.2), (7.3, 123.5),
       (13.7, 163.3), (11.2, 145.9), (13.8, 164.2), (19.6, 175.9),
       (19.0, 172.2), (14.7, 169.1), (15.8, 173.9), (10.8, 145.0)],
      dtype=[('len', '<f8'), ('hei', '<f8')])

(2) Read it with loadtxt

a=np.loadtxt("height.csv", dtype={'names': ('width', 'height'), 'formats': (np.double, np.double)},delimiter={' ,',' '}, skiprows=0)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-fd748fa040a8> in <module>()
----> 1 a=np.loadtxt("height.csv", dtype={'names': ('width', 'height'), 'formats': (np.double, np.double)},delimiter={' ,',' '}, skiprows=0)

C:\Python27\lib\site-packages\numpy\lib\npyio.pyc in loadtxt(fname, dtype, comments, delimiter, converters, skiprows, usecols, unpack, ndmin)
    858
    859             # Convert each value according to its column and store
--> 860             items = [conv(val) for (conv, val) in zip(converters, vals)]
    861             # Then pack it according to the dtype's nesting
    862             items = pack_items(items, packing)

ValueError: could not convert string to float: 嚜� 18.0, 173.4

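loadtxt fares even worse: it fails on the very first record!

No matter whether I deleted the space before 18.0 or retyped the line by hand, the problem remained!

It turns out that height.csv had been saved with a BOM (byte order mark) [5].

Use the Encoding menu in Notepad++ to convert the file to UTF-8 without BOM, then save it again.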
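As an alternative to re-saving the file in an editor, the BOM can also be handled in code on Python 3. The sketch below is an assumption-laden illustration, not the method used in the post: it writes a tiny two-row stand-in file with a BOM to a temporary directory (the name height_bom.csv is hypothetical), then opens it with the standard utf-8-sig codec, which strips a leading BOM, and passes the open file to np.loadtxt.

```python
import codecs
import os
import tempfile

import numpy as np

# Create a small stand-in file that starts with a UTF-8 BOM (hypothetical name).
path = os.path.join(tempfile.mkdtemp(), "height_bom.csv")
with open(path, "wb") as f:
    f.write(codecs.BOM_UTF8 + b"18.0,173.4\n7.8,126.2\n")

# Detect the BOM by looking at the first three bytes of the file.
with open(path, "rb") as f:
    has_bom = f.read(3) == codecs.BOM_UTF8
print(has_bom)      # True

# The "utf-8-sig" codec drops a leading BOM, so the first field parses cleanly.
with open(path, encoding="utf-8-sig") as f:
    data = np.loadtxt(f, delimiter=",")

print(data.shape)   # (2, 2)
print(data[0, 0])   # 18.0
```

Checking the first bytes this way is also a quick diagnostic whenever only the first record of a text file fails to parse.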

(1) After removing the BOM, read the file with genfromtxt again

a=np.genfromtxt("height.csv", delimiter=',', dtype=['double','double'], names=('len', 'hei'))
a
Out[20]:
array([(18.0, 173.4), (7.8, 126.2), (8.5, 131.2), (12.5, 155.0),
       (7.4, 126.8), (15.0, 170.4), (7.1, 121.8), (15.2, 169.3),
       (19.2, 176.8), (16.6, 175.2), (18.7, 175.9), (13.3, 160.4),
       (19.1, 174.6), (15.1, 169.9), (16.7, 173.1), (12.7, 155.8),
       (19.3, 175.4), (18.6, 174.4), (11.8, 148.5), (15.5, 172.5),
       (17.2, 175.2), (18.3, 175.6), (7.1, 123.1), (18.5, 170.8),
       (7.4, 125.0), (7.4, 128.4), (9.8, 140.9), (16.8, 175.8),
       (10.0, 142.8), (10.9, 146.3), (9.4, 137.2), (13.5, 163.2),
       (15.8, 174.7), (18.4, 174.3), (10.4, 143.6), (12.4, 153.3),
       (7.1, 127.2), (16.2, 171.9), (12.2, 156.6), (9.4, 135.4),
       (16.6, 172.4), (18.6, 176.8), (9.9, 140.2), (11.0, 148.0),
       (18.3, 173.0), (18.9, 172.0), (10.1, 143.1), (13.7, 165.0),
       (15.2, 169.9), (12.5, 153.6), (15.9, 178.2), (10.4, 143.7),
       (17.2, 173.9), (11.5, 151.1), (12.5, 154.1), (19.2, 178.8),
       (8.6, 132.1), (12.3, 153.6), (9.3, 137.2), (13.0, 161.0),
       (18.3, 173.8), (15.7, 176.3), (13.0, 161.3), (13.3, 160.0),
       (18.8, 174.6), (14.4, 166.6), (14.0, 164.9), (19.9, 173.9),
       (8.8, 134.5), (16.3, 171.4), (8.0, 133.0), (12.6, 153.2),
       (7.9, 126.4), (7.6, 131.2), (13.4, 161.0), (15.7, 172.7),
       (10.7, 144.1), (18.9, 175.7), (15.6, 173.4), (17.6, 175.3),
       (17.8, 176.7), (19.0, 173.0), (10.2, 142.1), (10.7, 143.5),
       (11.5, 147.2), (8.4, 130.6), (9.6, 139.7), (12.0, 151.4),
       (12.1, 147.8), (8.3, 131.0), (9.4, 134.2), (7.3, 123.5),
       (13.7, 163.3), (11.2, 145.9), (13.8, 164.2), (19.6, 175.9),
       (19.0, 172.2), (14.7, 169.1), (15.8, 173.9), (10.8, 145.0)],
      dtype=[('len', '<f8'), ('hei', '<f8')])

(2) After removing the BOM, read the file with loadtxt again

a=np.loadtxt("height.csv", dtype={'names': ('width', 'height'), 'formats': (np.double, np.double)},delimiter=',', skiprows=0)
a
Out[23]:
array([(18.0, 173.4), (7.8, 126.2), (8.5, 131.2), (12.5, 155.0),
       (7.4, 126.8), (15.0, 170.4), (7.1, 121.8), (15.2, 169.3),
       (19.2, 176.8), (16.6, 175.2), (18.7, 175.9), (13.3, 160.4),
       (19.1, 174.6), (15.1, 169.9), (16.7, 173.1), (12.7, 155.8),
       (19.3, 175.4), (18.6, 174.4), (11.8, 148.5), (15.5, 172.5),
       (17.2, 175.2), (18.3, 175.6), (7.1, 123.1), (18.5, 170.8),
       (7.4, 125.0), (7.4, 128.4), (9.8, 140.9), (16.8, 175.8),
       (10.0, 142.8), (10.9, 146.3), (9.4, 137.2), (13.5, 163.2),
       (15.8, 174.7), (18.4, 174.3), (10.4, 143.6), (12.4, 153.3),
       (7.1, 127.2), (16.2, 171.9), (12.2, 156.6), (9.4, 135.4),
       (16.6, 172.4), (18.6, 176.8), (9.9, 140.2), (11.0, 148.0),
       (18.3, 173.0), (18.9, 172.0), (10.1, 143.1), (13.7, 165.0),
       (15.2, 169.9), (12.5, 153.6), (15.9, 178.2), (10.4, 143.7),
       (17.2, 173.9), (11.5, 151.1), (12.5, 154.1), (19.2, 178.8),
       (8.6, 132.1), (12.3, 153.6), (9.3, 137.2), (13.0, 161.0),
       (18.3, 173.8), (15.7, 176.3), (13.0, 161.3), (13.3, 160.0),
       (18.8, 174.6), (14.4, 166.6), (14.0, 164.9), (19.9, 173.9),
       (8.8, 134.5), (16.3, 171.4), (8.0, 133.0), (12.6, 153.2),
       (7.9, 126.4), (7.6, 131.2), (13.4, 161.0), (15.7, 172.7),
       (10.7, 144.1), (18.9, 175.7), (15.6, 173.4), (17.6, 175.3),
       (17.8, 176.7), (19.0, 173.0), (10.2, 142.1), (10.7, 143.5),
       (11.5, 147.2), (8.4, 130.6), (9.6, 139.7), (12.0, 151.4),
       (12.1, 147.8), (8.3, 131.0), (9.4, 134.2), (7.3, 123.5),
       (13.7, 163.3), (11.2, 145.9), (13.8, 164.2), (19.6, 175.9),
       (19.0, 172.2), (14.7, 169.1), (15.8, 173.9), (10.8, 145.0)],
      dtype=[('width', '<f8'), ('height', '<f8')])

--------------------------------------------------------

(1) Using genfromtxt() to read testData.csv, which mixes floats and strings [4]. Note that genfromtxt replaces the spaces in the field names with underscores.

b=np.genfromtxt('testData.csv', delimiter=',', dtype=None, names=('sepal length', 'sepal width', 'petal length', 'petal width', 'label'))
b
Out[31]:
array([(5.1, 3.5, 1.4, 0.2, 'Iris-setosa'),
       (4.9, 3.0, 1.4, 0.2, 'Iris-setosa'),
       (5.8, 2.7, 4.1, 1.0, 'Iris-versicolor'),
       (6.2, 2.2, 4.5, 1.5, 'Iris-versicolor'),
       (6.4, 3.1, 5.5, 1.8, 'Iris-virginica'),
       (6.0, 3.0, 4.8, 1.8, 'Iris-virginica')],
      dtype=[('sepal_length', '<f8'), ('sepal_width', '<f8'), ('petal_length', '<f8'), ('petal_width', '<f8'), ('label', 'S15')])

(2) Using loadtxt()

b=np.loadtxt("testData.csv", dtype={'names': ('sepal length', 'sepal width', 'petal length', 'petal width', 'label'), 'formats': (np.float64, np.float64, np.float64, np.float64, '|S15')},delimiter=',', skiprows=0)
b
Out[35]:
array([(5.1, 3.5, 1.4, 0.2, 'Iris-setosa'),
       (4.9, 3.0, 1.4, 0.2, 'Iris-setosa'),
       (5.8, 2.7, 4.1, 1.0, 'Iris-versicolor'),
       (6.2, 2.2, 4.5, 1.5, 'Iris-versicolor'),
       (6.4, 3.1, 5.5, 1.8, 'Iris-virginica'),
       (6.0, 3.0, 4.8, 1.8, 'Iris-virginica')],
      dtype=[('sepal length', '<f8'), ('sepal width', '<f8'), ('petal length', '<f8'), ('petal width', '<f8'), ('label', 'S15')])
b[0]
Out[36]: (5.1, 3.5, 1.4, 0.2, 'Iris-setosa')
b[0][0]
Out[37]: 5.0999999999999996
b[0][4]
Out[38]: 'Iris-setosa'
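Both calls return a structured array, whose columns can be accessed by field name as well as by position. The sketch below assumes Python 3 and a recent NumPy, and reads three of the rows from an in-memory string instead of testData.csv so that it runs on its own. The underscore field names and the encoding argument are choices made here, not taken from the original session.

```python
import io

import numpy as np

# Three of the rows shown above, held in memory instead of in testData.csv.
text = (u"5.1,3.5,1.4,0.2,Iris-setosa\n"
        u"4.9,3.0,1.4,0.2,Iris-setosa\n"
        u"5.8,2.7,4.1,1.0,Iris-versicolor\n")

names = ("sepal_length", "sepal_width", "petal_length", "petal_width", "label")

# dtype=None lets genfromtxt infer a type per column (floats and strings here).
b = np.genfromtxt(io.StringIO(text), delimiter=",", dtype=None,
                  names=names, encoding="utf-8")

print(b.dtype.names)               # the five field names
print(b["petal_length"].tolist())  # [1.4, 1.4, 4.1], one whole column

# Boolean indexing on one field selects rows; then take another field.
setosa = b[b["label"] == "Iris-setosa"]
print(len(setosa))                 # 2
print(np.isclose(setosa["sepal_length"].mean(), 5.0))   # True
```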

(3) Read only two of the columns

data = np.loadtxt('testData.csv', delimiter=',', usecols=[0, 2])
data
Out[47]:
array([[ 5.1,  1.4],
       [ 4.9,  1.4],
       [ 5.8,  4.1],
       [ 6.2,  4.5],
       [ 6.4,  5.5],
       [ 6. ,  4.8]])

Write the data to a file

np.savetxt("testData14.csv", data)

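The default format saved above is clearly not very readable (savetxt writes every value in scientific notation unless fmt is given), so we specify the format ourselves

data = np.loadtxt('testData.csv', delimiter=',', usecols=[0, 2])
data
Out[4]:
array([[ 5.1,  1.4],
       [ 4.9,  1.4],
       [ 5.8,  4.1],
       [ 6.2,  4.5],
       [ 6.4,  5.5],
       [ 6. ,  4.8]])
np.savetxt("twoData.csv", data, delimiter=",", fmt="%.2f, %.2f")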

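The output file then contains one record per line with two decimal places, for example 5.10, 1.40 for the first row.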
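To check what savetxt actually writes, the file can be read back as plain text. The sketch below uses three of the rows, a temporary directory, and a single "%.2f" format applied to every column. The header and comments arguments are additions for illustration and are not used in the original post.

```python
import os
import tempfile

import numpy as np

data = np.array([[5.1, 1.4],
                 [4.9, 1.4],
                 [5.8, 4.1]])

path = os.path.join(tempfile.mkdtemp(), "twoData.csv")

# One format for all columns, joined by the delimiter; comments="" stops
# savetxt from prefixing the header line with "# ".
np.savetxt(path, data, delimiter=",", fmt="%.2f",
           header="sepal_length,petal_length", comments="")

with open(path) as f:
    lines = f.read().splitlines()

print(lines[0])    # sepal_length,petal_length
print(lines[1:])   # ['5.10,1.40', '4.90,1.40', '5.80,4.10']
```

When fmt is a single specifier, delimiter controls the separator; when fmt is a full row template such as "%.2f, %.2f", the template itself supplies the separator.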

References

1. Python科學計算 (Python Scientific Computing)

2. Converting integer to string in Python?

3. Check if a Python list item contains a string inside another string

4. Loading text file containing both float and string using numpy.loadtxt

5. BOM BOM BOM

me1237guy
