
Continuing from these two posts:

Learning Python N-Dimensional Array–Part I


import numpy as np
 
a = np.random.randint(0, 10, size=(4, 5))
 
a
Out[11]: 
array([[7, 5, 1, 1, 2],
       [8, 8, 2, 8, 2],
       [0, 3, 2, 7, 6],
       [3, 9, 7, 8, 2]])

Sum along axis 0 and along axis 1:

np.sum(a, axis=0)
Out[13]: array([18, 25, 12, 24, 12])
 
np.sum(a, axis=1)
Out[14]: array([16, 28, 18, 29])

Sum of all elements:

np.sum(a)
Out[12]: 91
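The axis argument names the dimension that gets collapsed. The sketch below hard-codes the 4x5 array printed above so the numbers are reproducible; randint gives different values on each run, so this is an assumption for illustration. It checks that axis=0 adds the rows together. keepdims=True is an optional extra that keeps the collapsed axis with length 1.

```python
import numpy as np

# The 4x5 array printed above, hard-coded so the results are reproducible
a = np.array([[7, 5, 1, 1, 2],
              [8, 8, 2, 8, 2],
              [0, 3, 2, 7, 6],
              [3, 9, 7, 8, 2]])

col_sums = np.sum(a, axis=0)   # axis 0 (rows) is collapsed -> shape (5,)
row_sums = np.sum(a, axis=1)   # axis 1 (columns) is collapsed -> shape (4,)
total = np.sum(a)              # no axis: every element is added -> scalar

# Summing along axis 0 is the same as adding the four rows together
assert np.array_equal(col_sums, a[0] + a[1] + a[2] + a[3])

# keepdims=True keeps the collapsed axis with length 1, handy for broadcasting
row_sums_2d = np.sum(a, axis=1, keepdims=True)   # shape (4, 1)
print(col_sums, row_sums, total, row_sums_2d.shape)
```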

Mean of a along axis=0, along axis=1, and over all elements:

In [15]: a
Out[15]: 
array([[7, 5, 1, 1, 2],
       [8, 8, 2, 8, 2],
       [0, 3, 2, 7, 6],
       [3, 9, 7, 8, 2]])
 
In [16]: np.mean(a, axis=0)
Out[16]: array([ 4.5 ,  6.25,  3.  ,  6.  ,  3.  ])
 
In [17]: np.mean(a, axis=1)
Out[17]: array([ 3.2,  5.6,  3.6,  5.8])
 
In [18]: np.mean(a)
Out[18]: 4.5499999999999998

Variance of a along axis=0, along axis=1, and over all elements:

 
np.var(a, axis=0)
Out[19]: array([ 10.25  ,   5.6875,   5.5   ,   8.5   ,   3.    ])
 
np.var(a, axis=1)
Out[20]: array([ 5.76,  8.64,  6.64,  7.76])
 
np.var(a)
Out[21]: 8.5474999999999994
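np.var computes the population variance: the mean of the squared deviations from the mean, divided by N. The sketch below checks this by hand on the hard-coded array from above. It also shows ddof=1, which divides by N - 1 to give the sample variance.

```python
import numpy as np

a = np.array([[7, 5, 1, 1, 2],
              [8, 8, 2, 8, 2],
              [0, 3, 2, 7, 6],
              [3, 9, 7, 8, 2]])

# Population variance: mean of squared deviations from each column's mean
manual_var0 = ((a - a.mean(axis=0)) ** 2).mean(axis=0)
assert np.allclose(manual_var0, np.var(a, axis=0))

# ddof=1 divides by N - 1 instead of N, giving the sample variance
sample_var0 = np.var(a, axis=0, ddof=1)
n = a.shape[0]
assert np.allclose(sample_var0, manual_var0 * n / (n - 1))
print(manual_var0, sample_var0)
```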

Standard deviation of a along axis=0, along axis=1, and over all elements:

np.std(a, axis=0)
Out[22]: array([ 3.20156212,  2.384848  ,  2.34520788,  2.91547595,  1.73205081])
 
np.std(a, axis=1)
Out[23]: array([ 2.4       ,  2.93938769,  2.57681975,  2.78567766])
 
np.std(a)
Out[24]: 2.9236107812087435
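The standard deviation is the square root of the variance taken over the same axis. A short check on the hard-coded array follows. It also shows ddof=1 for the sample standard deviation, which is often what statistics texts mean by "standard deviation".

```python
import numpy as np

a = np.array([[7, 5, 1, 1, 2],
              [8, 8, 2, 8, 2],
              [0, 3, 2, 7, 6],
              [3, 9, 7, 8, 2]])

std_rows = np.std(a, axis=1)
# np.std is the square root of np.var over the same axis (same ddof)
assert np.allclose(std_rows, np.sqrt(np.var(a, axis=1)))

overall_std = np.std(a)          # population std over all 20 elements (ddof=0)
sample_std = np.std(a, ddof=1)   # sample std: divides by N - 1 = 19 instead of 20
print(std_rows, overall_std, sample_std)
```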

Maximum of a along axis=0, along axis=1, and over all elements:

a
Out[28]: 
array([[7, 5, 1, 1, 2],
       [8, 8, 2, 8, 2],
       [0, 3, 2, 7, 6],
       [3, 9, 7, 8, 2]])
 
np.max(a, axis=0)
Out[29]: array([8, 9, 7, 8, 6])
 
np.max(a, axis=1)
Out[30]: array([7, 8, 7, 9])
 
np.max(a)
Out[31]: 9

Index of the maximum (np.argmax without an axis returns an index into the flattened array):

 
np.argmax(a)
Out[32]: 16
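np.argmax also accepts an axis, and on ties it reports the first occurrence. The sketch below uses the same hard-coded array. It shows the flat index alongside the per-column and per-row positions of the maximum. Column 3 contains 8 twice, and argmax picks the earlier row.

```python
import numpy as np

a = np.array([[7, 5, 1, 1, 2],
              [8, 8, 2, 8, 2],
              [0, 3, 2, 7, 6],
              [3, 9, 7, 8, 2]])

flat_idx = np.argmax(a)           # index into the flattened (row-major) array
col_idx = np.argmax(a, axis=0)    # row position of the max in each column
row_idx = np.argmax(a, axis=1)    # column position of the max in each row
# On ties argmax returns the first occurrence (e.g. column 3 holds 8 at rows 1 and 3)
print(flat_idx, col_idx, row_idx)
```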

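Following on from this: we know the flat index of the maximum of the 2-D array. How do we retrieve the corresponding maximum value by hand?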

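arr holds the flat (1-D) index of every element of a. Comparing arr with np.argmax(a) gives a boolean mask that is True only at the maximum's position, and boolean indexing then pulls out that value: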

 
arr = np.arange(20).reshape((4, 5))
 
arr
Out[54]: 
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])
 
a
Out[57]: 
array([[7, 5, 1, 1, 2],
       [8, 8, 2, 8, 2],
       [0, 3, 2, 7, 6],
       [3, 9, 7, 8, 2]])
 
a[arr==np.argmax(a)]
Out[58]: array([9])
 

A simpler way: ravel()

ravel() returns a flattened array. Flatten a to 1-D, then index it directly with the 1-D argmax:

a.ravel()[np.argmax(a)]
Out[70]: 9
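Besides ravel(), NumPy offers two other ways to index an array as if it were 1-D: the a.flat iterator and np.take without an axis. The sketch below uses the hard-coded array from above and reads the maximum in all three ways. ravel() returns a view when it can, so it usually avoids a copy.

```python
import numpy as np

a = np.array([[7, 5, 1, 1, 2],
              [8, 8, 2, 8, 2],
              [0, 3, 2, 7, 6],
              [3, 9, 7, 8, 2]])

i = np.argmax(a)        # flat index of the maximum
v1 = a.ravel()[i]       # index the flattened array
v2 = a.flat[i]          # flat iterator, no explicit reshaping
v3 = np.take(a, i)      # without axis, take works on the flattened array
print(v1, v2, v3)
```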

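Alternatively, convert the 1-D index back into a 2-D index with unravel_index().

Here 16 is the flat (1-D) index of the maximum.

Its 2-D index is (3, 1), i.e. row 3, column 1 (counting from 0).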

 
ind = np.unravel_index(16, a.shape)
 
ind
Out[78]: (3, 1)
 
a
Out[79]: 
array([[7, 5, 1, 1, 2],
       [8, 8, 2, 8, 2],
       [0, 3, 2, 7, 6],
       [3, 9, 7, 8, 2]])
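To finish the round trip, the tuple returned by unravel_index can be used directly as an index into a. np.ravel_multi_index converts it back to the flat index. The sketch below uses the hard-coded array from above.

```python
import numpy as np

a = np.array([[7, 5, 1, 1, 2],
              [8, 8, 2, 8, 2],
              [0, 3, 2, 7, 6],
              [3, 9, 7, 8, 2]])

ind = np.unravel_index(np.argmax(a), a.shape)   # flat 16 -> (row 3, column 1)
max_val = a[ind]                                # a tuple index selects one element
back = np.ravel_multi_index(ind, a.shape)       # inverse: (3, 1) -> 16
print(ind, max_val, back)
```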

Sort a along axis 0 (each column is sorted independently; note that a.sort() sorts a in place):

a
Out[80]: 
array([[7, 5, 1, 1, 2],
       [8, 8, 2, 8, 2],
       [0, 3, 2, 7, 6],
       [3, 9, 7, 8, 2]])
 
a.sort(axis=0)
 
a
Out[86]: 
array([[0, 3, 1, 1, 2],
       [3, 5, 2, 7, 2],
       [7, 8, 2, 8, 2],
       [8, 9, 7, 8, 6]])

Then sort along axis 1 (each row is sorted independently):

 
a.sort(axis=1)
 
a
Out[88]: 
array([[0, 1, 1, 2, 3],
       [2, 2, 3, 5, 7],
       [2, 2, 7, 8, 8],
       [6, 7, 8, 8, 9]])
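Because a.sort() overwrites a, np.sort is the safer choice when the original order is still needed, since it returns a sorted copy. A related tool is np.argsort. Applied to one column, it gives the row order that keeps whole rows together. A minimal sketch, starting again from the unsorted array printed earlier:

```python
import numpy as np

a = np.array([[7, 5, 1, 1, 2],
              [8, 8, 2, 8, 2],
              [0, 3, 2, 7, 6],
              [3, 9, 7, 8, 2]])

by_col = np.sort(a, axis=0)     # sorted copy; a itself is unchanged
order = np.argsort(a[:, 0])     # row order that would sort column 0
rows_by_col0 = a[order]         # reorder whole rows by their first column
print(by_col, order, rows_by_col0, sep="\n")
```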

Return the unique values of a and, with return_index=True, the flat index of each value's first occurrence:

a = np.random.randint(10, size = (3,4))
 
a
Out[107]: 
array([[6, 7, 9, 2],
       [4, 6, 6, 3],
       [1, 7, 4, 7]])
 
x, ind = np.unique(a, return_index=True)
 
x
Out[111]: array([1, 2, 3, 4, 6, 7, 9])
 
ind
Out[112]: array([8, 3, 7, 4, 0, 1, 2])
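np.unique can also return the inverse mapping and per-value counts. The sketch below hard-codes the 3x4 array printed above. The shape of the inverse array has differed between NumPy versions for multi-dimensional input, so the code reshapes the rebuilt array explicitly rather than relying on either shape.

```python
import numpy as np

a = np.array([[6, 7, 9, 2],
              [4, 6, 6, 3],
              [1, 7, 4, 7]])

x, ind, inv, counts = np.unique(a, return_index=True,
                                return_inverse=True, return_counts=True)
# ind: flat position of each value's first occurrence, so a.ravel()[ind] equals x
# inv: for every element of a, its position in x; x[inv] rebuilds the data
rebuilt = x[inv].reshape(a.shape)
print(x, ind, counts)
```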

Next, let a = [1, 1, 2, 2, 2, 3, 4, 4]:

 
a = np.array([1, 1, 2, 2, 2, 3, 4, 4])
 
a
Out[128]: array([1, 1, 2, 2, 2, 3, 4, 4])

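Count how many times each unique value of a occurs with np.bincount().

bincount counts every integer from 0 up to max(a) = 4, so the result also has an entry (0) for the value 0, which never appears: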

np.bincount(a)
Out[129]: array([0, 2, 3, 1, 2])
 
np.unique(a)
Out[130]: array([1, 2, 3, 4])
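The two outputs above line up: the values that actually occur are the nonzero positions of the bincount result. The sketch below checks this against np.unique(..., return_counts=True). It also shows minlength, which pads the result with zeros when you need a fixed number of bins.

```python
import numpy as np

a = np.array([1, 1, 2, 2, 2, 3, 4, 4])

counts = np.bincount(a)              # length a.max() + 1; counts[k] = occurrences of k
present = np.nonzero(counts)[0]      # values that actually occur in a
values, uniq_counts = np.unique(a, return_counts=True)
padded = np.bincount(a, minlength=8) # at least 8 bins, extra ones are 0
print(counts, present, uniq_counts, padded)
```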

Redefine a = [10, 1, 1, 2, 2, 2, 3, 4, 4], whose values now range from 0 to 10.

The counts now cover every integer from 0 to 10:

a = np.array([10, 1, 1, 2, 2, 2, 3, 4, 4])
 
np.bincount(a)
Out[132]: array([0, 2, 3, 1, 2, 0, 0, 0, 0, 0, 1])

Masked arrays

import numpy.ma as ma
 
x = np.array([1, 2, 3, 5, 7, 4, 3, 2, 8, 0])
 
mask = x<5
 
mask
Out[138]: array([ True,  True,  True, False, False,  True,  True,  True, False,  True], dtype=bool)
 
mx = ma.array(x, mask=mask)
 
mx
Out[140]: 
masked_array(data = [-- -- -- 5 7 -- -- -- 8 --],
             mask = [ True  True  True False False  True  True  True False  True],
       fill_value = 999999)

Compute the mean. Masked elements are ignored, so this is the mean of 5, 7 and 8:

y =mx.mean()
 
y
Out[145]: 6.666666666666667

The same result can be obtained by applying np.mean() to the unmasked elements, selected with the inverted mask ~mask:

 
z=np.mean(x[~mask])
 
z
Out[149]: 6.666666666666667
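numpy.ma has a few more helpers that fit this example. masked_where builds the masked array in one call. compressed() returns only the unmasked values, and filled() substitutes a value at the masked positions. A minimal sketch using the same x:

```python
import numpy as np
import numpy.ma as ma

x = np.array([1, 2, 3, 5, 7, 4, 3, 2, 8, 0])

mx = ma.masked_where(x < 5, x)   # same as ma.array(x, mask=x < 5)
kept = mx.compressed()           # 1-D ndarray of the unmasked values only
filled = mx.filled(0)            # masked positions replaced by 0
mean = mx.mean()                 # masked values are ignored
print(kept, filled, mean)
```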

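np.bincount() also accepts a weights array. Instead of counting occurrences, it sums the weights of the elements that fall into each bin. With x and w below:

bin 0: 0.1 + 1.2 = 1.3

bin 1: 0.3 + 0.5 + 0.8 = 1.6

bin 2: 0.2 + 0.4 = 0.6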

 
x = np.array([0, 1, 2, 2, 1, 1, 0])
 
w = np.array([0.1, 0.3, 0.2, 0.4, 0.5, 0.8, 1.2])
 
np.bincount(x, w)
Out[5]: array([ 1.3,  1.6,  0.6])
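The weighted bincount is equivalent to adding each weight into the bin named by the matching element of x. The sketch below does that with an explicit loop. It then does the same with np.add.at, which performs unbuffered accumulation so repeated indices are all added.

```python
import numpy as np

x = np.array([0, 1, 2, 2, 1, 1, 0])
w = np.array([0.1, 0.3, 0.2, 0.4, 0.5, 0.8, 1.2])

sums = np.zeros(x.max() + 1)
for bin_idx, weight in zip(x, w):
    sums[bin_idx] += weight      # add each weight to the bin named by x

alt = np.zeros(x.max() + 1)
np.add.at(alt, x, w)             # vectorized: repeated indices accumulate
print(sums, alt, np.bincount(x, w))
```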

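np.histogram()

With bins=5 and range=(0, 1), the 100 random values are split into five equal-width bins: 19 values fall in [0, 0.2), 22 in [0.2, 0.4), and so on.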

a = np.random.rand(100)
 
a
Out[7]: 
array([ 0.00872626,  0.36558685,  0.68205683,  0.42889437,  0.97332189,
        0.51793388,  0.33187748,  0.39076355,  0.65933135,  0.46039967,
        0.4531602 ,  0.46032272,  0.2162054 ,  0.99834442,  0.99574178,
        0.84728636,  0.14715493,  0.11302403,  0.72354382,  0.42097522,
        0.14111554,  0.37278971,  0.5613764 ,  0.29387561,  0.34060089,
        0.87344041,  0.63322027,  0.52276657,  0.20584798,  0.41653945,
        0.1914504 ,  0.25949296,  0.97079113,  0.42865701,  0.40900406,
        0.99593667,  0.14859718,  0.32781547,  0.86623437,  0.09069545,
        0.58958441,  0.43301911,  0.07623798,  0.55077995,  0.32233891,
        0.22505729,  0.24731831,  0.75467141,  0.86785649,  0.26346466,
        0.47383062,  0.59548231,  0.17756108,  0.4445461 ,  0.09928862,
        0.19127033,  0.32028578,  0.376644  ,  0.43897254,  0.38783224,
        0.60179702,  0.52129171,  0.46613597,  0.54652293,  0.35948433,
        0.21664976,  0.95731711,  0.99504905,  0.59254467,  0.42166526,
        0.20300776,  0.81326924,  0.92572197,  0.8328689 ,  0.01605331,
        0.41221627,  0.97628396,  0.33769637,  0.13246742,  0.64917933,
        0.41369906,  0.0193705 ,  0.58424844,  0.37340307,  0.57927891,
        0.47939027,  0.99614169,  0.13922503,  0.72049764,  0.17861748,
        0.14567132,  0.15476356,  0.17753868,  0.93961619,  0.95160203,
        0.60532905,  0.7173731 ,  0.74216991,  0.41021054,  0.78688659])
 
b = np.histogram(a, bins=5, range=(0, 1))
 
b
Out[9]: (array([19, 22, 29, 12, 18]), array([ 0. ,  0.2,  0.4,  0.6,  0.8,  1. ]))
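To see exactly which values land in which bin, the sketch below rebuilds the histogram counts by hand. It uses a seeded generator, np.random.default_rng(0), which is an assumption for repeatability, so its counts will not match 19, 22, 29, 12, 18 above. Each value's bin comes from np.digitize against the inner bin edges. The bins are half-open [left, right), except the last one, which also includes 1.0.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded so the run is repeatable
a = rng.random(100)              # 100 floats in [0, 1)

counts, edges = np.histogram(a, bins=5, range=(0, 1))

# digitize against the inner edges [0.2, 0.4, 0.6, 0.8] gives each value's bin number 0..4
bin_of = np.digitize(a, edges[1:-1])
manual = np.bincount(bin_of, minlength=5)
print(counts, manual, edges)
```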

String vectors [2]

First, basic number-to-string conversion with str():

str(10)
Out[1]: '10'
 
str(10.2)
Out[2]: '10.2'
 
str('a')
Out[3]: 'a'
 
str('a')+str(10)
Out[4]: 'a10'
 
'a'+str(10)
Out[5]: 'a10'
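Python will not concatenate a str and an int implicitly, which is why str() is needed above. str.format and f-strings (Python 3.6+) do the conversion for you. A short sketch:

```python
n = 10

s1 = 'a' + str(n)        # explicit conversion, as above
s2 = 'a{}'.format(n)     # str.format converts the argument
s3 = f'a{n}'             # f-string (Python 3.6+)

failed = False
try:
    'a' + n              # str + int is not allowed
except TypeError:
    failed = True
print(s1, s2, s3, failed)
```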

String to number:

int('10')
Out[6]: 10

If the string holds a floating-point number, int() raises a ValueError; use float() instead:

 
int("10.2")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-8-81225a994e2f> in <module>()
----> 1 int("10.2")
 
float('10')
Out[13]: 10.0
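When an integer is wanted from a string like "10.2", one option is to parse it with float() first. int() then truncates toward zero, while round() gives the nearest integer. A small sketch that also catches the ValueError shown above:

```python
s = "10.2"

error_message = None
try:
    int(s)
except ValueError as exc:
    error_message = str(exc)     # "invalid literal for int() with base 10: '10.2'"

value = int(float(s))            # parse as float, then truncate toward zero
rounded = round(float("10.7"))   # nearest integer instead of truncation
negative = int(float("-10.7"))   # truncation goes toward zero, not down
print(error_message, value, rounded, negative)
```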
 

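float vs. double: despite the different-looking output below, there is no precision difference. Python's float is already a C double, and np.double is simply numpy.float64, so both hold exactly the same value. The longer 10.199999999999999 is only how older NumPy versions printed float64 values (17 significant digits); newer versions print the shortest representation, 10.2.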

float("10.2")
Out[9]: 10.2
 
np.double('10.2')
Out[11]: 10.199999999999999
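The equality can be checked directly. np.double is the same type object as np.float64, and a float64 compares equal to the Python float parsed from the same string. Formatting with 17 significant digits reproduces the long output above. That output shows that the stored binary value is only close to 10.2, for float and np.double alike.

```python
import numpy as np

x = float("10.2")
y = np.double("10.2")           # np.double is an alias of np.float64

assert np.double is np.float64
assert x == y                   # same IEEE 754 double value
long_form = f"{x:.17g}"         # 17 significant digits, as older NumPy printed
print(repr(y), long_form)
```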

Create a 1x10 string vector (a list of 10 empty strings ''):

 
strs = ['' for x in range(10)]
 
strs
Out[15]: ['', '', '', '', '', '', '', '', '', '']

Convert the numbers 0 to 9 to strings, store each one in strs[i], and print it:

for i in range(10):
    strs[i] = str(i)
    print(strs[i])
    
0
1
2
3
4
5
6
7
8
9
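The preallocate-then-fill pattern above can be collapsed into a single list comprehension. If a NumPy array of strings is preferred, astype(str) converts a numeric array in one call. A short sketch:

```python
import numpy as np

empty = [''] * 10                     # 10 empty strings, like the comprehension above
strs = [str(i) for i in range(10)]    # '0' ... '9' in one step
arr = np.arange(10).astype(str)       # NumPy array of strings instead of a list
print(empty, strs, arr)
```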

Modify the content of strs[1]:

strs[1]='abc'
 
strs
Out[30]: ['0', 'abc', '2', '3', '4', '5', '6', '7', '8', '9']

Find the elements that contain the substring 'abc':

 
matching = [ s for s in strs if "abc" in s]
 
matching
Out[37]: ['abc']
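The same `in` test also gives the positions of the matches (via enumerate) or a simple yes/no answer (via any). A minimal sketch using the list from above:

```python
strs = ['0', 'abc', '2', '3', '4', '5', '6', '7', '8', '9']

matching = [s for s in strs if 'abc' in s]                  # matching strings
positions = [i for i, s in enumerate(strs) if 'abc' in s]   # their indices
has_any = any('abc' in s for s in strs)                     # stops at first hit
print(matching, positions, has_any)
```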

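Next, practice a commonly needed task: reading and writing files.

Following the book's example [1], read height.csv, a file with two columns of data.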


(1) Reading with genfromtxt

Look closely and you will notice that the first value was not read correctly: it came back as nan.

a=np.genfromtxt("height.csv", delimiter=',', dtype=['double','double'], names=('len', 'hei'))
 
a
Out[6]: 
array([(nan, 173.4), (7.8, 126.2), (8.5, 131.2), (12.5, 155.0),
       (7.4, 126.8), (15.0, 170.4), (7.1, 121.8), (15.2, 169.3),
       (19.2, 176.8), (16.6, 175.2), (18.7, 175.9), (13.3, 160.4),
       (19.1, 174.6), (15.1, 169.9), (16.7, 173.1), (12.7, 155.8),
       (19.3, 175.4), (18.6, 174.4), (11.8, 148.5), (15.5, 172.5),
       (17.2, 175.2), (18.3, 175.6), (7.1, 123.1), (18.5, 170.8),
       (7.4, 125.0), (7.4, 128.4), (9.8, 140.9), (16.8, 175.8),
       (10.0, 142.8), (10.9, 146.3), (9.4, 137.2), (13.5, 163.2),
       (15.8, 174.7), (18.4, 174.3), (10.4, 143.6), (12.4, 153.3),
       (7.1, 127.2), (16.2, 171.9), (12.2, 156.6), (9.4, 135.4),
       (16.6, 172.4), (18.6, 176.8), (9.9, 140.2), (11.0, 148.0),
       (18.3, 173.0), (18.9, 172.0), (10.1, 143.1), (13.7, 165.0),
       (15.2, 169.9), (12.5, 153.6), (15.9, 178.2), (10.4, 143.7),
       (17.2, 173.9), (11.5, 151.1), (12.5, 154.1), (19.2, 178.8),
       (8.6, 132.1), (12.3, 153.6), (9.3, 137.2), (13.0, 161.0),
       (18.3, 173.8), (15.7, 176.3), (13.0, 161.3), (13.3, 160.0),
       (18.8, 174.6), (14.4, 166.6), (14.0, 164.9), (19.9, 173.9),
       (8.8, 134.5), (16.3, 171.4), (8.0, 133.0), (12.6, 153.2),
       (7.9, 126.4), (7.6, 131.2), (13.4, 161.0), (15.7, 172.7),
       (10.7, 144.1), (18.9, 175.7), (15.6, 173.4), (17.6, 175.3),
       (17.8, 176.7), (19.0, 173.0), (10.2, 142.1), (10.7, 143.5),
       (11.5, 147.2), (8.4, 130.6), (9.6, 139.7), (12.0, 151.4),
       (12.1, 147.8), (8.3, 131.0), (9.4, 134.2), (7.3, 123.5),
       (13.7, 163.3), (11.2, 145.9), (13.8, 164.2), (19.6, 175.9),
       (19.0, 172.2), (14.7, 169.1), (15.8, 173.9), (10.8, 145.0)], 
      dtype=[('len', '<f8'), ('hei', '<f8')])

(2) Reading with loadtxt

a=np.loadtxt("height.csv", dtype={'names': ('width', 'height'), 'formats': (np.double, np.double)},delimiter={' ,',' '}, skiprows=0)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-fd748fa040a8> in <module>()
----> 1 a=np.loadtxt("height.csv", dtype={'names': ('width', 'height'), 'formats': (np.double, np.double)},delimiter={' ,',' '}, skiprows=0)
 
C:\Python27\lib\site-packages\numpy\lib\npyio.pyc in loadtxt(fname, dtype, comments, delimiter, converters, skiprows, usecols, unpack, ndmin)
    858 
    859             # Convert each value according to its column and store
--> 860             items = [conv(val) for (conv, val) in zip(converters, vals)]
    861             # Then pack it according to the dtype's nesting
    862             items = pack_items(items, packing)
 
ValueError: could not convert string to float: 嚜� 18.0, 173.4

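loadtxt does even worse: it fails on the very first row!

Deleting the space in front of 18.0, or even retyping the line by hand, did not fix the problem.

The culprit turned out to be a BOM (byte order mark) at the start of height.csv. The garbled characters before 18.0 in the error message are the BOM bytes as rendered by the console. (The delimiter={' ,',' '} argument is also not a valid delimiter, since loadtxt expects a single string, which is likely why the whole line 18.0, 173.4 appears in the message; the later call uses delimiter=','.)

In Notepad++, convert the file's encoding to UTF-8 without BOM and save the file again.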
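As an alternative to re-saving the file, the BOM can be skipped from Python by opening the file with the 'utf-8-sig' codec and passing the file object to loadtxt. The sketch below writes a hypothetical two-row sample in a temporary directory (not the real height.csv). The sample has no spaces after the commas, and the code confirms that the first three bytes are the UTF-8 BOM before reading it back.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical two-row sample in the same two-column layout as height.csv
    path = os.path.join(tmp, "height_bom.csv")
    with open(path, "w", encoding="utf-8-sig") as f:   # 'utf-8-sig' writes a BOM first
        f.write("18.0,173.4\n7.8,126.2\n")

    with open(path, "rb") as f:
        head = f.read(3)   # the UTF-8 BOM is the three bytes EF BB BF

    # Decoding with 'utf-8-sig' drops the BOM, so the first value parses as 18.0
    with open(path, encoding="utf-8-sig") as f:
        data = np.loadtxt(f, delimiter=",")

print(head, data)
```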


(1) With the BOM removed, read the file again with genfromtxt

a=np.genfromtxt("height.csv", delimiter=',', dtype=['double','double'], names=('len', 'hei'))
 
a
Out[20]: 
array([(18.0, 173.4), (7.8, 126.2), (8.5, 131.2), (12.5, 155.0),
       (7.4, 126.8), (15.0, 170.4), (7.1, 121.8), (15.2, 169.3),
       (19.2, 176.8), (16.6, 175.2), (18.7, 175.9), (13.3, 160.4),
       (19.1, 174.6), (15.1, 169.9), (16.7, 173.1), (12.7, 155.8),
       (19.3, 175.4), (18.6, 174.4), (11.8, 148.5), (15.5, 172.5),
       (17.2, 175.2), (18.3, 175.6), (7.1, 123.1), (18.5, 170.8),
       (7.4, 125.0), (7.4, 128.4), (9.8, 140.9), (16.8, 175.8),
       (10.0, 142.8), (10.9, 146.3), (9.4, 137.2), (13.5, 163.2),
       (15.8, 174.7), (18.4, 174.3), (10.4, 143.6), (12.4, 153.3),
       (7.1, 127.2), (16.2, 171.9), (12.2, 156.6), (9.4, 135.4),
       (16.6, 172.4), (18.6, 176.8), (9.9, 140.2), (11.0, 148.0),
       (18.3, 173.0), (18.9, 172.0), (10.1, 143.1), (13.7, 165.0),
       (15.2, 169.9), (12.5, 153.6), (15.9, 178.2), (10.4, 143.7),
       (17.2, 173.9), (11.5, 151.1), (12.5, 154.1), (19.2, 178.8),
       (8.6, 132.1), (12.3, 153.6), (9.3, 137.2), (13.0, 161.0),
       (18.3, 173.8), (15.7, 176.3), (13.0, 161.3), (13.3, 160.0),
       (18.8, 174.6), (14.4, 166.6), (14.0, 164.9), (19.9, 173.9),
       (8.8, 134.5), (16.3, 171.4), (8.0, 133.0), (12.6, 153.2),
       (7.9, 126.4), (7.6, 131.2), (13.4, 161.0), (15.7, 172.7),
       (10.7, 144.1), (18.9, 175.7), (15.6, 173.4), (17.6, 175.3),
       (17.8, 176.7), (19.0, 173.0), (10.2, 142.1), (10.7, 143.5),
       (11.5, 147.2), (8.4, 130.6), (9.6, 139.7), (12.0, 151.4),
       (12.1, 147.8), (8.3, 131.0), (9.4, 134.2), (7.3, 123.5),
       (13.7, 163.3), (11.2, 145.9), (13.8, 164.2), (19.6, 175.9),
       (19.0, 172.2), (14.7, 169.1), (15.8, 173.9), (10.8, 145.0)], 
      dtype=[('len', '<f8'), ('hei', '<f8')])

(2) With the BOM removed, read the file again with loadtxt

a=np.loadtxt("height.csv", dtype={'names': ('width', 'height'), 'formats': (np.double, np.double)},delimiter=',', skiprows=0)
 
a
Out[23]: 
array([(18.0, 173.4), (7.8, 126.2), (8.5, 131.2), (12.5, 155.0),
       (7.4, 126.8), (15.0, 170.4), (7.1, 121.8), (15.2, 169.3),
       (19.2, 176.8), (16.6, 175.2), (18.7, 175.9), (13.3, 160.4),
       (19.1, 174.6), (15.1, 169.9), (16.7, 173.1), (12.7, 155.8),
       (19.3, 175.4), (18.6, 174.4), (11.8, 148.5), (15.5, 172.5),
       (17.2, 175.2), (18.3, 175.6), (7.1, 123.1), (18.5, 170.8),
       (7.4, 125.0), (7.4, 128.4), (9.8, 140.9), (16.8, 175.8),
       (10.0, 142.8), (10.9, 146.3), (9.4, 137.2), (13.5, 163.2),
       (15.8, 174.7), (18.4, 174.3), (10.4, 143.6), (12.4, 153.3),
       (7.1, 127.2), (16.2, 171.9), (12.2, 156.6), (9.4, 135.4),
       (16.6, 172.4), (18.6, 176.8), (9.9, 140.2), (11.0, 148.0),
       (18.3, 173.0), (18.9, 172.0), (10.1, 143.1), (13.7, 165.0),
       (15.2, 169.9), (12.5, 153.6), (15.9, 178.2), (10.4, 143.7),
       (17.2, 173.9), (11.5, 151.1), (12.5, 154.1), (19.2, 178.8),
       (8.6, 132.1), (12.3, 153.6), (9.3, 137.2), (13.0, 161.0),
       (18.3, 173.8), (15.7, 176.3), (13.0, 161.3), (13.3, 160.0),
       (18.8, 174.6), (14.4, 166.6), (14.0, 164.9), (19.9, 173.9),
       (8.8, 134.5), (16.3, 171.4), (8.0, 133.0), (12.6, 153.2),
       (7.9, 126.4), (7.6, 131.2), (13.4, 161.0), (15.7, 172.7),
       (10.7, 144.1), (18.9, 175.7), (15.6, 173.4), (17.6, 175.3),
       (17.8, 176.7), (19.0, 173.0), (10.2, 142.1), (10.7, 143.5),
       (11.5, 147.2), (8.4, 130.6), (9.6, 139.7), (12.0, 151.4),
       (12.1, 147.8), (8.3, 131.0), (9.4, 134.2), (7.3, 123.5),
       (13.7, 163.3), (11.2, 145.9), (13.8, 164.2), (19.6, 175.9),
       (19.0, 172.2), (14.7, 169.1), (15.8, 173.9), (10.8, 145.0)], 
      dtype=[('width', '<f8'), ('height', '<f8')])

--------------------------------------------------------

Next, read testData.csv, a file that mixes float and string columns [4].

(1) Using genfromtxt()

b=np.genfromtxt('testData.csv', delimiter=',', dtype=None, names=('sepal length', 'sepal width', 'petal length', 'petal width', 'label'))
 
b
Out[31]: 
array([(5.1, 3.5, 1.4, 0.2, 'Iris-setosa'),
       (4.9, 3.0, 1.4, 0.2, 'Iris-setosa'),
       (5.8, 2.7, 4.1, 1.0, 'Iris-versicolor'),
       (6.2, 2.2, 4.5, 1.5, 'Iris-versicolor'),
       (6.4, 3.1, 5.5, 1.8, 'Iris-virginica'),
       (6.0, 3.0, 4.8, 1.8, 'Iris-virginica')], 
      dtype=[('sepal_length', '<f8'), ('sepal_width', '<f8'), ('petal_length', '<f8'), ('petal_width', '<f8'), ('label', 'S15')])

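(2) Using loadtxt(). genfromtxt replaced the spaces in the field names with underscores (sepal_length), whereas loadtxt keeps the names exactly as given (sepal length):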

b=np.loadtxt("testData.csv", dtype={'names': ('sepal length', 'sepal width', 'petal length', 'petal width', 'label'), 'formats': (float, float, float, float, '|S15')},delimiter=',', skiprows=0)
 
b
Out[35]: 
array([(5.1, 3.5, 1.4, 0.2, 'Iris-setosa'),
       (4.9, 3.0, 1.4, 0.2, 'Iris-setosa'),
       (5.8, 2.7, 4.1, 1.0, 'Iris-versicolor'),
       (6.2, 2.2, 4.5, 1.5, 'Iris-versicolor'),
       (6.4, 3.1, 5.5, 1.8, 'Iris-virginica'),
       (6.0, 3.0, 4.8, 1.8, 'Iris-virginica')], 
      dtype=[('sepal length', '<f8'), ('sepal width', '<f8'), ('petal length', '<f8'), ('petal width', '<f8'), ('label', 'S15')])
 
b[0]
Out[36]: (5.1, 3.5, 1.4, 0.2, 'Iris-setosa')
 
b[0][0]
Out[37]: 5.0999999999999996
 
b[0][4]
Out[38]: 'Iris-setosa'
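Records of a structured array can also be accessed by field name, which is usually clearer than b[0][4]. The sketch below feeds three of the rows shown above through io.StringIO in place of testData.csv. It uses 'U15' (a Python 3 str) instead of '|S15' so that labels compare equal to ordinary strings. It also confirms the genfromtxt name-mangling noted above.

```python
import io

import numpy as np

# Sample rows copied from the output above; io.StringIO stands in for testData.csv
csv_text = (
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "5.8,2.7,4.1,1.0,Iris-versicolor\n"
)
names = ('sepal length', 'sepal width', 'petal length', 'petal width', 'label')

b = np.loadtxt(io.StringIO(csv_text), delimiter=',',
               dtype={'names': names,
                      'formats': (float, float, float, float, 'U15')})

lengths = b['sepal length']               # a whole column by field name
first_label = b[0]['label']               # one field of one record
setosa = b[b['label'] == 'Iris-setosa']   # boolean mask over records

# genfromtxt replaces spaces in field names with underscores by default
g = np.genfromtxt(io.StringIO(csv_text), delimiter=',', dtype=None,
                  names=names, encoding='utf-8')
print(lengths, first_label, len(setosa), g.dtype.names)
```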

(3) Read only two of the columns (columns 0 and 2, selected with usecols):

 
data = np.loadtxt('testData.csv', delimiter=',', usecols=[0, 2])
 
data
Out[47]: 
array([[ 5.1,  1.4],
       [ 4.9,  1.4],
       [ 5.8,  4.1],
       [ 6.2,  4.5],
       [ 6.4,  5.5],
       [ 6. ,  4.8]])
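With unpack=True, loadtxt returns the selected columns as separate 1-D arrays instead of one 2-D array, which is convenient for plotting or arithmetic on each column. A sketch using the same io.StringIO sample as a stand-in for testData.csv:

```python
import io

import numpy as np

csv_text = (
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "5.8,2.7,4.1,1.0,Iris-versicolor\n"
)

# Only columns 0 and 2 are parsed, so the string label column is never converted
sepal_len, petal_len = np.loadtxt(io.StringIO(csv_text), delimiter=',',
                                  usecols=[0, 2], unpack=True)
print(sepal_len, petal_len)
```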

Writing a file

np.savetxt("testData14.csv", data)

[Screenshot: contents of testData14.csv]

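The default format is clearly not very readable: savetxt writes every value as '%.18e', separated by spaces. Passing delimiter and fmt gives a cleaner file: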
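The difference is easy to see by writing to an in-memory io.StringIO buffer instead of a file. The sketch below compares the default output with fmt='%.2f' and delimiter=','. The header text is hypothetical. comments='' stops savetxt from prefixing the header line with '# '.

```python
import io

import numpy as np

data = np.array([[5.1, 1.4],
                 [4.9, 1.4]])

default_buf = io.StringIO()
np.savetxt(default_buf, data)   # defaults: fmt='%.18e', delimiter=' '

custom_buf = io.StringIO()
# fmt applies to every column; comments='' writes the header without '# '
np.savetxt(custom_buf, data, fmt='%.2f', delimiter=',',
           header='sepal_length,petal_length', comments='')

print(default_buf.getvalue())
print(custom_buf.getvalue())
```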

 
np.savetxt("twoData.csv", data, delimiter=",", fmt="%.2f, %.2f")

The output file looks like this:

[Screenshot: contents of twoData.csv]



References

1. Python科學計算 (Scientific Computing with Python)

2. Converting integer to string in Python?

3. Check if a Python list item contains a string inside another string

4. Loading text file containing both float and string using numpy.loadtxt

5. BOM BOM BOM


Posted by me1237guy on Pixnet (痞客邦)