Continuing from these two posts:
Learning Python N-Dimensional Array–Part I
Learning Python N-Dimensional Array-Part II
Next, we use NumPy's built-in functions to compute the sum, mean, and variance of an array.
import numpy as np
a = np.random.randint(0, 10, size=(4, 5))
a
Out[11]:
array([[7, 5, 1, 1, 2],
[8, 8, 2, 8, 2],
[0, 3, 2, 7, 6],
[3, 9, 7, 8, 2]])
Sum along axis=0 and axis=1:
np.sum(a, axis=0)
Out[13]: array([18, 25, 12, 24, 12])
np.sum(a, axis=1)
Out[14]: array([16, 28, 18, 29])
Sum of all elements:
np.sum(a)
Out[12]: 91
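To follow along with exactly the same numbers, a can be written out from the values printed above instead of being drawn with randint. The plain-Python loops below are only an illustration of what axis means: axis=0 collapses the rows and leaves one sum per column, while axis=1 collapses the columns and leaves one sum per row.

```python
import numpy as np

# The same 4x5 array as printed above, written literally so the results are reproducible
a = np.array([[7, 5, 1, 1, 2],
              [8, 8, 2, 8, 2],
              [0, 3, 2, 7, 6],
              [3, 9, 7, 8, 2]])

n_rows, n_cols = a.shape

# axis=0: walk down each column and add up its entries -> one value per column
col_sums = [sum(int(a[r, c]) for r in range(n_rows)) for c in range(n_cols)]

# axis=1: walk across each row and add up its entries -> one value per row
row_sums = [sum(int(a[r, c]) for c in range(n_cols)) for r in range(n_rows)]

print(col_sums)  # [18, 25, 12, 24, 12]
print(row_sums)  # [16, 28, 18, 29]
print(np.array_equal(col_sums, np.sum(a, axis=0)))  # True
print(np.array_equal(row_sums, np.sum(a, axis=1)))  # True
```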
Compute the mean of a along axis=0, axis=1, and over all elements:
In [15]: a
Out[15]:
array([[7, 5, 1, 1, 2],
[8, 8, 2, 8, 2],
[0, 3, 2, 7, 6],
[3, 9, 7, 8, 2]])
In [16]: np.mean(a, axis=0)
Out[16]: array([ 4.5 , 6.25, 3. , 6. , 3. ])
In [17]: np.mean(a, axis=1)
Out[17]: array([ 3.2, 5.6, 3.6, 5.8])
In [18]: np.mean(a)
Out[18]: 4.5499999999999998
Compute the variance of a along axis=0, axis=1, and over all elements:
np.var(a, axis=0)
Out[19]: array([ 10.25 , 5.6875, 5.5 , 8.5 , 3. ])
np.var(a, axis=1)
Out[20]: array([ 5.76, 8.64, 6.64, 7.76])
np.var(a)
Out[21]: 8.5474999999999994
Compute the standard deviation of a along axis=0, axis=1, and over all elements:
np.std(a, axis=0)
Out[22]: array([ 3.20156212, 2.384848 , 2.34520788, 2.91547595, 1.73205081])
np.std(a, axis=1)
Out[23]: array([ 2.4 , 2.93938769, 2.57681975, 2.78567766])
np.std(a)
Out[24]: 2.9236107812087435
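By default np.var() and np.std() divide by N, the number of values, i.e. they compute the population variance and standard deviation. If a sample estimate that divides by N - 1 is wanted instead, the ddof argument does that. A short check on the first column of a, [7, 8, 0, 3]:

```python
import numpy as np

col = np.array([7, 8, 0, 3])  # first column of a

# Default ddof=0: sum of squared deviations divided by N (41 / 4)
pop_var = np.var(col)
# ddof=1: divide by N - 1 instead (41 / 3), the sample variance
sample_var = np.var(col, ddof=1)

# The standard deviation is the square root of the variance with the same ddof
pop_std = np.std(col)

print(pop_var)
print(sample_var)
print(np.isclose(pop_std, np.sqrt(pop_var)))  # True
```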
Compute the maximum of a along axis=0, axis=1, and over all elements:
a
Out[28]:
array([[7, 5, 1, 1, 2],
[8, 8, 2, 8, 2],
[0, 3, 2, 7, 6],
[3, 9, 7, 8, 2]])
np.max(a, axis=0)
Out[29]: array([8, 9, 7, 8, 6])
np.max(a, axis=1)
Out[30]: array([7, 8, 7, 9])
np.max(a)
Out[31]: 9
Index of the maximum value:
np.argmax(a)
Out[32]: 16
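Without an axis argument, np.argmax() returns a position in the flattened array, which is why the answer above is the single number 16. Passing axis instead gives one index per column (axis=0) or per row (axis=1); when the maximum occurs more than once, the first occurrence wins:

```python
import numpy as np

a = np.array([[7, 5, 1, 1, 2],
              [8, 8, 2, 8, 2],
              [0, 3, 2, 7, 6],
              [3, 9, 7, 8, 2]])

flat_idx = np.argmax(a)          # index into a.ravel()
col_idx = np.argmax(a, axis=0)   # row index of the max in each column
row_idx = np.argmax(a, axis=1)   # column index of the max in each row

print(flat_idx)  # 16
print(col_idx)   # [1 3 3 1 2]
print(row_idx)   # [0 0 3 1]
```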
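Building on this: we already know the index of the maximum in the 2-D array, so how do we get the corresponding maximum value by hand?
arr holds the flat index of every element of the 2-D array a. With boolean indexing,
we pick out the value whose index equals the index of the maximum: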
arr = np.arange(20).reshape((4, 5))
arr
Out[54]:
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]])
a
Out[57]:
array([[7, 5, 1, 1, 2],
[8, 8, 2, 8, 2],
[0, 3, 2, 7, 6],
[3, 9, 7, 8, 2]])
a[arr==np.argmax(a)]
Out[58]: array([9])
A simpler way: ravel()
ravel() returns a flattened array. Flatten a to 1-D, then index it with the 1-D index of the maximum:
a.ravel()[np.argmax(a)]
Out[70]: 9
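Alternatively, convert the 1-D index back into a 2-D index with unravel_index().
Here 16 is the (1-D) index of the maximum,
and the corresponding 2-D index is (3, 1), i.e. row 3, column 1 (counting from 0):
ind = np.unravel_index(16, a.shape)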
ind
Out[78]: (3, 1)
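For a row-major (C-ordered) array with 5 columns, this conversion is plain integer arithmetic: row = 16 // 5 = 3 and column = 16 % 5 = 1. np.ravel_multi_index() goes the other way, and the resulting tuple can index a directly:

```python
import numpy as np

a = np.array([[7, 5, 1, 1, 2],
              [8, 8, 2, 8, 2],
              [0, 3, 2, 7, 6],
              [3, 9, 7, 8, 2]])

flat = int(np.argmax(a))                   # 16
n_cols = a.shape[1]

# Row-major layout: flat index = row * n_cols + col
row, col = divmod(flat, n_cols)            # (3, 1)
ind = np.unravel_index(flat, a.shape)      # the same pair, computed by NumPy

back = np.ravel_multi_index(ind, a.shape)  # and back to the flat index

print((row, col), tuple(int(i) for i in ind), int(back))
print(a[ind])  # the maximum value itself
```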
Sort a along axis=0 (each column is sorted independently; a.sort() works in place):
a
Out[80]:
array([[7, 5, 1, 1, 2],
[8, 8, 2, 8, 2],
[0, 3, 2, 7, 6],
[3, 9, 7, 8, 2]])
a.sort(axis=0)
a
Out[86]:
array([[0, 3, 1, 1, 2],
[3, 5, 2, 7, 2],
[7, 8, 2, 8, 2],
[8, 9, 7, 8, 6]])
Sort a along axis=1 (each row is sorted independently):
a.sort(axis=1)
a
Out[88]:
array([[0, 1, 1, 2, 3],
[2, 2, 3, 5, 7],
[2, 2, 7, 8, 8],
[6, 7, 8, 8, 9]])
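a.sort() sorts in place and returns None, which is why a has to be printed again after each call, and the original order is lost. If the original array should be kept, np.sort() returns a sorted copy instead:

```python
import numpy as np

a = np.array([[7, 5, 1, 1, 2],
              [8, 8, 2, 8, 2],
              [0, 3, 2, 7, 6],
              [3, 9, 7, 8, 2]])

cols_sorted = np.sort(a, axis=0)   # new array; a itself is unchanged
# Sorts each row of the *original* a (the session above sorted the
# already column-sorted array, so its axis=1 result differs)
rows_sorted = np.sort(a, axis=1)

result = a.copy()
returned = result.sort(axis=0)     # in-place method: modifies result, returns None

print(cols_sorted)
print(rows_sorted)
print(returned)                    # None
print(np.array_equal(result, cols_sorted))  # True
```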
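Return the unique values of a and, with return_index=True, the index (in the flattened array) of each value's first occurrence:
a = np.random.randint(10, size=(3, 4))
a
Out[107]:
array([[6, 7, 9, 2],
[4, 6, 6, 3],
[1, 7, 4, 7]])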
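x, ind = np.unique(a, return_index=True)
x
Out[111]: array([1, 2, 3, 4, 6, 7, 9])
ind
Out[112]: array([8, 3, 7, 4, 0, 1, 2])
Now let a = [1, 1, 2, 2, 2, 3, 4, 4]:
a = np.array([1, 1, 2, 2, 2, 3, 4, 4])
a
Out[128]: array([1, 1, 2, 2, 2, 3, 4, 4])
Compare unique(a) with the number of times each unique element occurs, as computed by np.bincount(a).
bincount reports a count for every integer from 0 up to the largest value (here 0 to 4), so 0 gets a count of 0 even though it never appears: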
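Conceptually, np.bincount() behaves like the loop below, a plain-Python sketch for illustration only (the helper name bincount_loop is made up here). It allocates one counter for each integer from 0 to max(x) and adds 1 for every element, or the matching weight when weights are given. np.unique(..., return_counts=True) reports the same counts, but only for the values that actually occur:

```python
import numpy as np

def bincount_loop(x, weights=None):
    """Hypothetical pure-Python equivalent of np.bincount for non-negative ints."""
    counts = [0] * (max(x) + 1)       # one slot per integer 0..max(x)
    for i, v in enumerate(x):
        # add 1 per occurrence, or the element's weight if weights are given
        counts[v] += 1 if weights is None else weights[i]
    return counts

a = [1, 1, 2, 2, 2, 3, 4, 4]
print(bincount_loop(a))               # [0, 2, 3, 1, 2]
print(np.bincount(a).tolist())        # [0, 2, 3, 1, 2]

# Only the values that occur, together with their counts
values, counts = np.unique(a, return_counts=True)
print(values.tolist(), counts.tolist())  # [1, 2, 3, 4] [2, 3, 1, 2]
```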
np.bincount(a)
Out[129]: array([0, 2, 3, 1, 2])
np.unique(a)
Out[130]: array([1, 2, 3, 4])
Redefine a as [10, 1, 1, 2, 2, 2, 3, 4, 4], with values between 0 and 10.
The counts now cover every integer from 0 to 10:
a = np.array([10, 1, 1, 2, 2, 2, 3, 4, 4])
np.bincount(a)
Out[132]: array([0, 2, 3, 1, 2, 0, 0, 0, 0, 0, 1])
Masked arrays
import numpy.ma as ma
x = np.array([1, 2, 3, 5, 7, 4, 3, 2, 8, 0])
mask = x<5
mask
Out[138]: array([ True, True, True, False, False, True, True, True, False, True], dtype=bool)
Entries where the mask is True are hidden, so only 5, 7 and 8 remain visible:
mx = ma.array(x, mask=mask)
mx
Out[140]:
masked_array(data = [-- -- -- 5 7 -- -- -- 8 --],
mask = [ True True True False False True True True False True],
fill_value = 999999)
Compute the mean of the visible values, (5 + 7 + 8) / 3:
y =mx.mean()
y
Out[145]: 6.666666666666667
np.mean() gives the same result when x is indexed with the inverted mask (~mask), which keeps only the unmasked values:
z=np.mean(x[~mask])
z
Out[149]: 6.666666666666667
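A few other numpy.ma helpers are handy here: compressed() returns only the unmasked values, filled() replaces the masked entries with a chosen value, and np.ma.masked_less() builds the same mask directly from a threshold:

```python
import numpy as np
import numpy.ma as ma

x = np.array([1, 2, 3, 5, 7, 4, 3, 2, 8, 0])
mx = ma.array(x, mask=x < 5)        # hide every value below 5

kept = mx.compressed()              # 1-D array of the unmasked values
filled = mx.filled(0)               # masked entries replaced by 0
same = ma.masked_less(x, 5)         # equivalent mask, built from the threshold

print(kept)                         # [5 7 8]
print(filled)                       # [0 0 0 5 7 0 0 0 8 0]
print(np.array_equal(same.mask, mx.mask), mx.mean())
```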
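np.bincount() also accepts weights: instead of adding 1 for each occurrence, it adds the weight w[i] to bin x[i]. For the x and w below, the bins work out as:
0: 0.1 + 1.2 = 1.3
1: 0.3 + 0.5 + 0.8 = 1.6
2: 0.2 + 0.4 = 0.6
x = np.array([0, 1, 2, 2, 1, 1, 0])
w = np.array([0.1, 0.3, 0.2, 0.4, 0.5, 0.8, 1.2])
np.bincount(x, w)
Out[5]: array([ 1.3, 1.6, 0.6])
histogram()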
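np.histogram() counts how many values fall into each bin. Here 100 random numbers in [0, 1) go into 5 equal-width bins: 19 values fall in 0 to 0.2, 22 in 0.2 to 0.4, and so on.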
a = np.random.rand(100)
a
Out[7]:
array([ 0.00872626, 0.36558685, 0.68205683, 0.42889437, 0.97332189,
0.51793388, 0.33187748, 0.39076355, 0.65933135, 0.46039967,
0.4531602 , 0.46032272, 0.2162054 , 0.99834442, 0.99574178,
0.84728636, 0.14715493, 0.11302403, 0.72354382, 0.42097522,
0.14111554, 0.37278971, 0.5613764 , 0.29387561, 0.34060089,
0.87344041, 0.63322027, 0.52276657, 0.20584798, 0.41653945,
0.1914504 , 0.25949296, 0.97079113, 0.42865701, 0.40900406,
0.99593667, 0.14859718, 0.32781547, 0.86623437, 0.09069545,
0.58958441, 0.43301911, 0.07623798, 0.55077995, 0.32233891,
0.22505729, 0.24731831, 0.75467141, 0.86785649, 0.26346466,
0.47383062, 0.59548231, 0.17756108, 0.4445461 , 0.09928862,
0.19127033, 0.32028578, 0.376644 , 0.43897254, 0.38783224,
0.60179702, 0.52129171, 0.46613597, 0.54652293, 0.35948433,
0.21664976, 0.95731711, 0.99504905, 0.59254467, 0.42166526,
0.20300776, 0.81326924, 0.92572197, 0.8328689 , 0.01605331,
0.41221627, 0.97628396, 0.33769637, 0.13246742, 0.64917933,
0.41369906, 0.0193705 , 0.58424844, 0.37340307, 0.57927891,
0.47939027, 0.99614169, 0.13922503, 0.72049764, 0.17861748,
0.14567132, 0.15476356, 0.17753868, 0.93961619, 0.95160203,
0.60532905, 0.7173731 , 0.74216991, 0.41021054, 0.78688659])
b = np.histogram(a, bins=5, range=(0, 1))
b
Out[9]: (array([19, 22, 29, 12, 18]), array([ 0. , 0.2, 0.4, 0.6, 0.8, 1. ]))
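The returned tuple holds the counts and the 6 bin edges. Each bin covers edges[i] <= value < edges[i+1], except that the last bin also includes its right edge (1.0). A small hand-checked example, using a few fixed values instead of random ones so the counts are predictable:

```python
import numpy as np

values = np.array([0.05, 0.1, 0.25, 0.5, 0.55, 0.7, 0.95, 1.0])
counts, edges = np.histogram(values, bins=5, range=(0, 1))

# Re-count by hand using the same edges
manual = []
for i in range(len(edges) - 1):
    lo, hi = edges[i], edges[i + 1]
    if i == len(edges) - 2:
        in_bin = (values >= lo) & (values <= hi)   # last bin is closed on the right
    else:
        in_bin = (values >= lo) & (values < hi)    # other bins are half-open
    manual.append(int(in_bin.sum()))

print(counts.tolist())  # [2, 1, 2, 1, 2]
print(manual)           # [2, 1, 2, 1, 2]
```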
String vectors [2]
First, basic conversion from numbers to strings:
str(10)
Out[1]: '10'
str(10.2)
Out[2]: '10.2'
str('a')
Out[3]: 'a'
str('a')+str(10)
Out[4]: 'a10'
'a'+str(10)
Out[5]: 'a10'
Converting strings to numbers:
int('10')
Out[6]: 10
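If the string holds a floating-point number, int() raises an error; use float() instead:
int("10.2")
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-8-81225a994e2f> in <module>()
----> 1 int("10.2")
ValueError: invalid literal for int() with base 10: '10.2'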
float('10')
Out[13]: 10.0
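If an integer is needed from a string such as "10.2", one option is to go through float() first; int() then truncates toward zero rather than rounding. A small sketch that also handles strings that are not numbers at all (the helper name to_number is hypothetical):

```python
def to_number(s):
    """Hypothetical helper: return an int if s is an integer string,
    a float if it is a floating-point string, and None otherwise."""
    try:
        return int(s)
    except ValueError:
        pass
    try:
        return float(s)
    except ValueError:
        return None

print(to_number("10"))      # 10
print(to_number("10.2"))    # 10.2
print(to_number("abc"))     # None
print(int(float("10.7")))   # 10 (truncated, not rounded)
```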
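float vs. double
Here double is NumPy's np.double, the same type as np.float64 (the session calls it without the np. prefix, so NumPy's names appear to have been imported directly). Python's float is also a 64-bit IEEE 754 double, so both calls produce exactly the same value. The longer output only reflects how older NumPy versions printed float64 scalars, with 17 significant digits, which exposes the binary rounding of 10.2:
float("10.2")
Out[9]: 10.2
double('10.2')
Out[11]: 10.199999999999999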
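A quick check that the two values are identical and only their printed form differed: np.double is the same type as np.float64, which is itself a subclass of Python's float. Formatting with 17 significant digits ('%.17g') reproduces the longer output seen above:

```python
import numpy as np

x = float("10.2")
y = np.double("10.2")

print(np.double is np.float64)   # True
print(isinstance(y, float))      # True: float64 subclasses Python's float
print(x == y)                    # True: exactly the same 64-bit value
print("%.17g" % x)               # 10.199999999999999
```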
Create a 1x10 string vector, i.e. a list whose 10 elements are all empty strings '':
strs = ['' for x in range(10)]
strs
Out[15]: ['', '', '', '', '', '', '', '', '', '']
Convert the numbers 0 to 9 to strings, store each one in strs[i], and print it:
for i in range(10):
    strs[i] = str(i)
    print(strs[i])
0
1
2
3
4
5
6
7
8
9
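The same list can be built more idiomatically in one step with a list comprehension over range(10). [''] * 10 is another common way to create the list of empty strings (safe here because strings are immutable):

```python
# A list of 10 empty strings, equivalent to ['' for x in range(10)]
empty = [''] * 10

# Build the filled list in one step: the string form of each number 0..9
strs = [str(i) for i in range(10)]

# Print all elements on one line instead of one per line
print(" ".join(strs))  # 0 1 2 3 4 5 6 7 8 9
```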
Change the content of strs[1]:
strs[1] = 'abc'
strs
Out[30]: ['0', 'abc', '2', '3', '4', '5', '6', '7', '8', '9']
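Search for the string 'abc' (keep every element that contains it):
matching = [s for s in strs if "abc" in s]
matching
Out[37]: ['abc']
Next, let's practice a commonly used task: reading and writing files.
Following an example from the book [1], read height.csv, a data file with two columns.
(1) Read it with genfromtxt
Look closely and you will notice that the first record was not read correctly (its first field came out as nan):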
a=np.genfromtxt("height.csv", delimiter=',', dtype=['double','double'], names=('len', 'hei'))
a
Out[6]:
array([(nan, 173.4), (7.8, 126.2), (8.5, 131.2), (12.5, 155.0),
(7.4, 126.8), (15.0, 170.4), (7.1, 121.8), (15.2, 169.3),
(19.2, 176.8), (16.6, 175.2), (18.7, 175.9), (13.3, 160.4),
(19.1, 174.6), (15.1, 169.9), (16.7, 173.1), (12.7, 155.8),
(19.3, 175.4), (18.6, 174.4), (11.8, 148.5), (15.5, 172.5),
(17.2, 175.2), (18.3, 175.6), (7.1, 123.1), (18.5, 170.8),
(7.4, 125.0), (7.4, 128.4), (9.8, 140.9), (16.8, 175.8),
(10.0, 142.8), (10.9, 146.3), (9.4, 137.2), (13.5, 163.2),
(15.8, 174.7), (18.4, 174.3), (10.4, 143.6), (12.4, 153.3),
(7.1, 127.2), (16.2, 171.9), (12.2, 156.6), (9.4, 135.4),
(16.6, 172.4), (18.6, 176.8), (9.9, 140.2), (11.0, 148.0),
(18.3, 173.0), (18.9, 172.0), (10.1, 143.1), (13.7, 165.0),
(15.2, 169.9), (12.5, 153.6), (15.9, 178.2), (10.4, 143.7),
(17.2, 173.9), (11.5, 151.1), (12.5, 154.1), (19.2, 178.8),
(8.6, 132.1), (12.3, 153.6), (9.3, 137.2), (13.0, 161.0),
(18.3, 173.8), (15.7, 176.3), (13.0, 161.3), (13.3, 160.0),
(18.8, 174.6), (14.4, 166.6), (14.0, 164.9), (19.9, 173.9),
(8.8, 134.5), (16.3, 171.4), (8.0, 133.0), (12.6, 153.2),
(7.9, 126.4), (7.6, 131.2), (13.4, 161.0), (15.7, 172.7),
(10.7, 144.1), (18.9, 175.7), (15.6, 173.4), (17.6, 175.3),
(17.8, 176.7), (19.0, 173.0), (10.2, 142.1), (10.7, 143.5),
(11.5, 147.2), (8.4, 130.6), (9.6, 139.7), (12.0, 151.4),
(12.1, 147.8), (8.3, 131.0), (9.4, 134.2), (7.3, 123.5),
(13.7, 163.3), (11.2, 145.9), (13.8, 164.2), (19.6, 175.9),
(19.0, 172.2), (14.7, 169.1), (15.8, 173.9), (10.8, 145.0)],
dtype=[('len', '<f8'), ('hei', '<f8')])
(2) Read it with loadtxt
a=np.loadtxt("height.csv", dtype={'names': ('width', 'height'), 'formats': (np.double, np.double)},delimiter={' ,',' '}, skiprows=0)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-7-fd748fa040a8> in <module>()
----> 1 a=np.loadtxt("height.csv", dtype={'names': ('width', 'height'), 'formats': (np.double, np.double)},delimiter={' ,',' '}, skiprows=0)
C:\Python27\lib\site-packages\numpy\lib\npyio.pyc in loadtxt(fname, dtype, comments, delimiter, converters, skiprows, usecols, unpack, ndmin)
    858
    859             # Convert each value according to its column and store
--> 860             items = [conv(val) for (conv, val) in zip(converters, vals)]
    861             # Then pack it according to the dtype's nesting
    862             items = pack_items(items, packing)
ValueError: could not convert string to float: 嚜� 18.0, 173.4
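loadtxt does even worse: it crashes on the very first record!
Whether I deleted the space in front of 18.0 or retyped the value by hand, the error stayed.
It turns out that height.csv starts with a BOM (byte order mark); the garbled characters in front of 18.0 in the error message are most likely those BOM bytes shown in a non-UTF-8 console encoding.
Note also that delimiter expects a single string such as ','; the set {' ,',' '} passed above is not a valid delimiter, and the corrected loadtxt call after the BOM fix uses delimiter=','.
Fix: in Notepad++, use the Encoding menu to convert the file to UTF-8 without BOM, then save it again.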
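Instead of re-saving the file in an editor, the BOM can also be skipped from Python: opening the file with the 'utf-8-sig' codec decodes UTF-8 and drops a leading BOM if there is one, and the open file can be passed straight to loadtxt. A minimal sketch, assuming Python 3; it writes a two-row stand-in for height.csv (a hypothetical temporary file, not the real data file) so it runs on its own:

```python
import codecs
import os
import tempfile

import numpy as np

# Hypothetical stand-in for height.csv: a UTF-8 BOM followed by two rows of data
path = os.path.join(tempfile.mkdtemp(), "height_bom.csv")
with open(path, "wb") as f:
    f.write(codecs.BOM_UTF8 + b"18.0,173.4\n7.8,126.2\n")

# Detect the BOM by looking at the first three bytes
with open(path, "rb") as f:
    has_bom = f.read(3) == codecs.BOM_UTF8

# 'utf-8-sig' strips the BOM while decoding, so loadtxt sees plain "18.0,173.4"
with open(path, encoding="utf-8-sig") as f:
    data = np.loadtxt(f, delimiter=",")

print(has_bom)   # True
print(data[0])   # first row, now parsed correctly
```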
(1) With the BOM removed, read it again with genfromtxt
a=np.genfromtxt("height.csv", delimiter=',', dtype=['double','double'], names=('len', 'hei'))
a
Out[20]:
array([(18.0, 173.4), (7.8, 126.2), (8.5, 131.2), (12.5, 155.0),
(7.4, 126.8), (15.0, 170.4), (7.1, 121.8), (15.2, 169.3),
(19.2, 176.8), (16.6, 175.2), (18.7, 175.9), (13.3, 160.4),
(19.1, 174.6), (15.1, 169.9), (16.7, 173.1), (12.7, 155.8),
(19.3, 175.4), (18.6, 174.4), (11.8, 148.5), (15.5, 172.5),
(17.2, 175.2), (18.3, 175.6), (7.1, 123.1), (18.5, 170.8),
(7.4, 125.0), (7.4, 128.4), (9.8, 140.9), (16.8, 175.8),
(10.0, 142.8), (10.9, 146.3), (9.4, 137.2), (13.5, 163.2),
(15.8, 174.7), (18.4, 174.3), (10.4, 143.6), (12.4, 153.3),
(7.1, 127.2), (16.2, 171.9), (12.2, 156.6), (9.4, 135.4),
(16.6, 172.4), (18.6, 176.8), (9.9, 140.2), (11.0, 148.0),
(18.3, 173.0), (18.9, 172.0), (10.1, 143.1), (13.7, 165.0),
(15.2, 169.9), (12.5, 153.6), (15.9, 178.2), (10.4, 143.7),
(17.2, 173.9), (11.5, 151.1), (12.5, 154.1), (19.2, 178.8),
(8.6, 132.1), (12.3, 153.6), (9.3, 137.2), (13.0, 161.0),
(18.3, 173.8), (15.7, 176.3), (13.0, 161.3), (13.3, 160.0),
(18.8, 174.6), (14.4, 166.6), (14.0, 164.9), (19.9, 173.9),
(8.8, 134.5), (16.3, 171.4), (8.0, 133.0), (12.6, 153.2),
(7.9, 126.4), (7.6, 131.2), (13.4, 161.0), (15.7, 172.7),
(10.7, 144.1), (18.9, 175.7), (15.6, 173.4), (17.6, 175.3),
(17.8, 176.7), (19.0, 173.0), (10.2, 142.1), (10.7, 143.5),
(11.5, 147.2), (8.4, 130.6), (9.6, 139.7), (12.0, 151.4),
(12.1, 147.8), (8.3, 131.0), (9.4, 134.2), (7.3, 123.5),
(13.7, 163.3), (11.2, 145.9), (13.8, 164.2), (19.6, 175.9),
(19.0, 172.2), (14.7, 169.1), (15.8, 173.9), (10.8, 145.0)],
dtype=[('len', '<f8'), ('hei', '<f8')])
(2) With the BOM removed, read it again with loadtxt
a=np.loadtxt("height.csv", dtype={'names': ('width', 'height'), 'formats': (np.double, np.double)},delimiter=',', skiprows=0)
a
Out[23]:
array([(18.0, 173.4), (7.8, 126.2), (8.5, 131.2), (12.5, 155.0),
(7.4, 126.8), (15.0, 170.4), (7.1, 121.8), (15.2, 169.3),
(19.2, 176.8), (16.6, 175.2), (18.7, 175.9), (13.3, 160.4),
(19.1, 174.6), (15.1, 169.9), (16.7, 173.1), (12.7, 155.8),
(19.3, 175.4), (18.6, 174.4), (11.8, 148.5), (15.5, 172.5),
(17.2, 175.2), (18.3, 175.6), (7.1, 123.1), (18.5, 170.8),
(7.4, 125.0), (7.4, 128.4), (9.8, 140.9), (16.8, 175.8),
(10.0, 142.8), (10.9, 146.3), (9.4, 137.2), (13.5, 163.2),
(15.8, 174.7), (18.4, 174.3), (10.4, 143.6), (12.4, 153.3),
(7.1, 127.2), (16.2, 171.9), (12.2, 156.6), (9.4, 135.4),
(16.6, 172.4), (18.6, 176.8), (9.9, 140.2), (11.0, 148.0),
(18.3, 173.0), (18.9, 172.0), (10.1, 143.1), (13.7, 165.0),
(15.2, 169.9), (12.5, 153.6), (15.9, 178.2), (10.4, 143.7),
(17.2, 173.9), (11.5, 151.1), (12.5, 154.1), (19.2, 178.8),
(8.6, 132.1), (12.3, 153.6), (9.3, 137.2), (13.0, 161.0),
(18.3, 173.8), (15.7, 176.3), (13.0, 161.3), (13.3, 160.0),
(18.8, 174.6), (14.4, 166.6), (14.0, 164.9), (19.9, 173.9),
(8.8, 134.5), (16.3, 171.4), (8.0, 133.0), (12.6, 153.2),
(7.9, 126.4), (7.6, 131.2), (13.4, 161.0), (15.7, 172.7),
(10.7, 144.1), (18.9, 175.7), (15.6, 173.4), (17.6, 175.3),
(17.8, 176.7), (19.0, 173.0), (10.2, 142.1), (10.7, 143.5),
(11.5, 147.2), (8.4, 130.6), (9.6, 139.7), (12.0, 151.4),
(12.1, 147.8), (8.3, 131.0), (9.4, 134.2), (7.3, 123.5),
(13.7, 163.3), (11.2, 145.9), (13.8, 164.2), (19.6, 175.9),
(19.0, 172.2), (14.7, 169.1), (15.8, 173.9), (10.8, 145.0)],
dtype=[('width', '<f8'), ('height', '<f8')])
--------------------------------------------------------
Next, read testData.csv, a few rows of Iris data (four numeric columns and a text label).
(1) Using genfromtxt()
b=np.genfromtxt('testData.csv', delimiter=',', dtype=None, names=('sepal length', 'sepal width', 'petal length', 'petal width', 'label'))
b
Out[31]:
array([(5.1, 3.5, 1.4, 0.2, 'Iris-setosa'),
(4.9, 3.0, 1.4, 0.2, 'Iris-setosa'),
(5.8, 2.7, 4.1, 1.0, 'Iris-versicolor'),
(6.2, 2.2, 4.5, 1.5, 'Iris-versicolor'),
(6.4, 3.1, 5.5, 1.8, 'Iris-virginica'),
(6.0, 3.0, 4.8, 1.8, 'Iris-virginica')],
dtype=[('sepal_length', '<f8'), ('sepal_width', '<f8'), ('petal_length', '<f8'), ('petal_width', '<f8'), ('label', 'S15')])
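Note that genfromtxt() replaced the spaces in the field names with underscores (sepal_length, sepal_width, ...), while loadtxt() keeps the names exactly as given.
(2) Using loadtxt()
b=np.loadtxt("testData.csv", dtype={'names': ('sepal length', 'sepal width', 'petal length', 'petal width', 'label'), 'formats': (np.float64, np.float64, np.float64, np.float64, '|S15')}, delimiter=',', skiprows=0)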
b
Out[35]:
array([(5.1, 3.5, 1.4, 0.2, 'Iris-setosa'),
(4.9, 3.0, 1.4, 0.2, 'Iris-setosa'),
(5.8, 2.7, 4.1, 1.0, 'Iris-versicolor'),
(6.2, 2.2, 4.5, 1.5, 'Iris-versicolor'),
(6.4, 3.1, 5.5, 1.8, 'Iris-virginica'),
(6.0, 3.0, 4.8, 1.8, 'Iris-virginica')],
dtype=[('sepal length', '<f8'), ('sepal width', '<f8'), ('petal length', '<f8'), ('petal width', '<f8'), ('label', 'S15')])
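Besides taking one record at a time, as in the lines that follow, a structured array can be indexed by field name to get a whole column as an ordinary array. A small self-contained sketch with two of the rows above; the label field uses the Unicode type 'U15' here, an assumption for Python 3 (the Python 2 session above used byte strings, 'S15'):

```python
import numpy as np

dt = np.dtype([("sepal length", np.float64), ("sepal width", np.float64),
               ("petal length", np.float64), ("petal width", np.float64),
               ("label", "U15")])

b = np.array([(5.1, 3.5, 1.4, 0.2, "Iris-setosa"),
              (5.8, 2.7, 4.1, 1.0, "Iris-versicolor")], dtype=dt)

lengths = b["sepal length"]   # one column, as a plain float64 array
labels = b["label"]
first_label = b[0]["label"]   # one field of a single record

print(b.dtype.names)
print(lengths, labels, first_label)
```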
b[0]
Out[36]: (5.1, 3.5, 1.4, 0.2, 'Iris-setosa')
b[0][0]
Out[37]: 5.0999999999999996
b[0][4]
Out[38]: 'Iris-setosa'
(3) Read only two of the columns (usecols=[0, 2])
data = np.loadtxt('testData.csv', delimiter=',', usecols=[0, 2])
data
Out[47]:
array([[ 5.1, 1.4],
[ 4.9, 1.4],
[ 5.8, 4.1],
[ 6.2, 4.5],
[ 6.4, 5.5],
[ 6. , 4.8]])
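With unpack=True, loadtxt returns the selected columns as separate arrays, which is convenient for assigning them to two variables. The sketch feeds a few rows in the same layout as testData.csv through io.StringIO so it runs without the file:

```python
import io

import numpy as np

# A few rows in the same layout as testData.csv
text = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "5.8,2.7,4.1,1.0,Iris-versicolor\n"
)

# Only columns 0 and 2 are parsed, so the text label never has to be converted
sepal_len, petal_len = np.loadtxt(text, delimiter=",", usecols=(0, 2), unpack=True)

print(sepal_len)  # [5.1 4.9 5.8]
print(petal_len)  # [1.4 1.4 4.1]
```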
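Writing a file
np.savetxt("testData14.csv", data)
Clearly, the format saved above is not very pleasant to read: by default savetxt writes every value with fmt='%.18e' (scientific notation with 18 digits after the decimal point), separated by spaces.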
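A format string gives tidier output. When fmt is a single string with one conversion per column, such as "%.2f, %.2f", savetxt applies it to the whole row and ignores delimiter:
np.savetxt("twoData.csv", data, delimiter=",", fmt="%.2f, %.2f")
The output is as follows: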
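A minimal sketch to see what such a file contains: it writes the six rows loaded above to a temporary folder (the paths are only for illustration) and reads the text back. It also compares a single "%.2f" with delimiter=",", which writes the same numbers without the space after the comma:

```python
import os
import tempfile

import numpy as np

data = np.array([[5.1, 1.4], [4.9, 1.4], [5.8, 4.1],
                 [6.2, 4.5], [6.4, 5.5], [6.0, 4.8]])
folder = tempfile.mkdtemp()

# Whole-row format: delimiter is ignored, the ", " comes from fmt itself
path_a = os.path.join(folder, "twoData.csv")
np.savetxt(path_a, data, delimiter=",", fmt="%.2f, %.2f")

# One format per value, joined with the delimiter
path_b = os.path.join(folder, "twoData_plain.csv")
np.savetxt(path_b, data, delimiter=",", fmt="%.2f")

with open(path_a) as f:
    lines_a = f.read().splitlines()
with open(path_b) as f:
    lines_b = f.read().splitlines()

print(lines_a[0])  # 5.10, 1.40
print(lines_b[0])  # 5.10,1.40
```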
References
1. Python科學計算
2. Converting integer to string in Python?
3. Check if a Python list item contains a string inside another string
4. Loading text file containing both float and string using numpy.loadtxt
5. BOM BOM BOM