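Goal: reduce feature dimensionality with principal component analysis (PCA)
Method:
Join tables: user_id ----> aisle
Cross-tabulation: count how many products each user bought in each aisle (fine-grained product category)
Dimensionality reduction: principal component analysis (PCA)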
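Data source: https://www.kaggle.com/c/instacart-market-basket-analysis/data
·order_products__prior.csv: the products contained in each order
  ◦Fields: order_id, product_id, add_to_cart_order, reordered
  ◦Meaning: order id, product id, position in which the product was added to the cart, whether the user had ordered this product before (1 = reordered)
·products.csv: product information
  ◦Fields: product_id, product_name, aisle_id, department_id
  ◦Meaning: product id, product name, aisle (fine-grained category) id, department (broad category) id
·orders.csv: the users' orders
  ◦Fields: order_id, user_id, eval_set, order_number, order_dow, order_hour_of_day, days_since_prior_order
  ◦Meaning: order id, user id, evaluation set the order belongs to (prior, train or test), sequence number of the order for that user, day of the week, hour of the day the order was placed, days since the user's previous order
·aisles.csv: the aisle (fine-grained category) each product belongs to
  ◦Fields: aisle_id, aisle
  ◦Meaning: aisle id, aisle name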
import numpy as np
import pandas as pd
aisles = pd.read_csv(r"E:\instacart-market-basket-analysis\aisles.csv",sep=",",encoding="utf-8")
orders = pd.read_csv(r"E:\instacart-market-basket-analysis\orders.csv",sep=",",encoding="utf-8")
products = pd.read_csv(r"E:\instacart-market-basket-analysis\products.csv",sep=",",encoding="utf-8")
order_products_prior = pd.read_csv(r"E:\instacart-market-basket-analysis\order_products__prior.csv",sep=",",encoding="utf-8")
display(aisles.head(3))
display(orders.head(3))
display(products.head(3))
display(order_products_prior.head(3))
| | aisle_id | aisle |
|---|---|---|
| 0 | 1 | prepared soups salads |
| 1 | 2 | specialty cheeses |
| 2 | 3 | energy granola bars |
| | order_id | user_id | eval_set | order_number | order_dow | order_hour_of_day | days_since_prior_order |
|---|---|---|---|---|---|---|---|
| 0 | 2539329 | 1 | prior | 1 | 2 | 8 | NaN |
| 1 | 2398795 | 1 | prior | 2 | 3 | 7 | 15.0 |
| 2 | 473747 | 1 | prior | 3 | 3 | 12 | 21.0 |
| | product_id | product_name | aisle_id | department_id |
|---|---|---|---|---|
| 0 | 1 | Chocolate Sandwich Cookies | 61 | 19 |
| 1 | 2 | All-Seasons Salt | 104 | 13 |
| 2 | 3 | Robust Golden Unsweetened Oolong Tea | 94 | 7 |
| | order_id | product_id | add_to_cart_order | reordered |
|---|---|---|---|---|
| 0 | 2 | 33120 | 1 | 1 |
| 1 | 2 | 28985 | 2 | 1 |
| 2 | 2 | 9327 | 3 | 0 |
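The cross-tabulation later on only needs the join keys plus `user_id` and `aisle`, so one option is to read fewer columns with narrower integer types. The following is a minimal sketch under that assumption, not part of the original notebook: it parses an in-memory copy of the three preview rows shown above instead of the real file, and the name `slim` is hypothetical.

```python
import io

import pandas as pd

# In-memory stand-in for order_products__prior.csv (the three preview rows above).
csv_text = """order_id,product_id,add_to_cart_order,reordered
2,33120,1,1
2,28985,2,1
2,9327,3,0
"""

# usecols drops columns that the later steps never touch;
# dtype stores the ids as 32-bit integers instead of the default 64-bit.
slim = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["order_id", "product_id"],
    dtype={"order_id": "int32", "product_id": "int32"},
)

print(slim.dtypes)
print(slim)
```

Whether this is worth doing depends on available memory; the original code reads every column with default types, which also works.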
# Join the four tables on their shared keys (inner joins).
data01 = pd.merge(orders, order_products_prior, how="inner", on="order_id")
data02 = pd.merge(data01, products, on="product_id")
data03 = pd.merge(data02, aisles, on="aisle_id")
display(data03.shape, data03.tail(10000))
(32434489, 14)
| | order_id | user_id | eval_set | order_number | order_dow | order_hour_of_day | days_since_prior_order | product_id | add_to_cart_order | reordered | product_name | aisle_id | department_id | aisle |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 32424489 | 2542240 | 75675 | prior | 12 | 5 | 12 | 5.0 | 44471 | 7 | 1 | Free & Clear Unscented Baby Wipes | 82 | 18 | baby accessories |
| 32424490 | 3260483 | 75675 | prior | 16 | 0 | 9 | 14.0 | 44471 | 21 | 1 | Free & Clear Unscented Baby Wipes | 82 | 18 | baby accessories |
| 32424491 | 2196407 | 75675 | prior | 30 | 0 | 11 | 12.0 | 44471 | 9 | 1 | Free & Clear Unscented Baby Wipes | 82 | 18 | baby accessories |
| 32424492 | 532672 | 75675 | prior | 38 | 5 | 13 | 7.0 | 44471 | 20 | 1 | Free & Clear Unscented Baby Wipes | 82 | 18 | baby accessories |
| 32424493 | 1705047 | 75675 | prior | 39 | 5 | 13 | 0.0 | 44471 | 20 | 1 | Free & Clear Unscented Baby Wipes | 82 | 18 | baby accessories |
| 32424494 | 998672 | 75675 | prior | 48 | 5 | 14 | 11.0 | 44471 | 13 | 1 | Free & Clear Unscented Baby Wipes | 82 | 18 | baby accessories |
| 32424495 | 2149746 | 75675 | prior | 49 | 6 | 9 | 8.0 | 44471 | 6 | 1 | Free & Clear Unscented Baby Wipes | 82 | 18 | baby accessories |
| 32424496 | 483804 | 75804 | prior | 12 | 6 | 15 | 4.0 | 44471 | 19 | 0 | Free & Clear Unscented Baby Wipes | 82 | 18 | baby accessories |
| 32424497 | 1783191 | 76027 | prior | 6 | 4 | 16 | 13.0 | 44471 | 13 | 0 | Free & Clear Unscented Baby Wipes | 82 | 18 | baby accessories |
| 32424498 | 3074202 | 76027 | prior | 7 | 2 | 15 | 5.0 | 44471 | 8 | 1 | Free & Clear Unscented Baby Wipes | 82 | 18 | baby accessories |
| 32424499 | 431155 | 76081 | prior | 8 | 0 | 14 | 16.0 | 44471 | 8 | 0 | Free & Clear Unscented Baby Wipes | 82 | 18 | baby accessories |
| 32424500 | 2879529 | 76238 | prior | 36 | 6 | 10 | 6.0 | 44471 | 25 | 0 | Free & Clear Unscented Baby Wipes | 82 | 18 | baby accessories |
| 32424501 | 1652877 | 76238 | prior | 39 | 5 | 10 | 6.0 | 44471 | 10 | 1 | Free & Clear Unscented Baby Wipes | 82 | 18 | baby accessories |
| 32424502 | 737972 | 76466 | prior | 20 | 0 | 10 | 7.0 | 44471 | 7 | 0 | Free & Clear Unscented Baby Wipes | 82 | 18 | baby accessories |
| 32424503 | 3154632 | 76556 | prior | 80 | 3 | 18 | 2.0 | 44471 | 7 | 0 | Free & Clear Unscented Baby Wipes | 82 | 18 | baby accessories |
| 32424504 | 1776861 | 76576 | prior | 7 | 0 | 15 | 7.0 | 44471 | 2 | 0 | Free & Clear Unscented Baby Wipes | 82 | 18 | baby accessories |
| 32424505 | 2695824 | 76726 | prior | 4 | 0 | 11 | 28.0 | 44471 | 26 | 0 | Free & Clear Unscented Baby Wipes | 82 | 18 | baby accessories |
| 32424506 | 3176388 | 76823 | prior | 1 | 6 | 12 | NaN | 44471 | 19 | 0 | Free & Clear Unscented Baby Wipes | 82 | 18 | baby accessories |
| 32424507 | 1441764 | 76866 | prior | 13 | 0 | 16 | 25.0 | 44471 | 7 | 0 | Free & Clear Unscented Baby Wipes | 82 | 18 | baby accessories |
| 32424508 | 2888446 | 76868 | prior | 17 | 5 | 10 | 16.0 | 44471 | 19 | 0 | Free & Clear Unscented Baby Wipes | 82 | 18 | baby accessories |
| 32424509 | 2670733 | 77148 | prior | 19 | 1 | 9 | 12.0 | 44471 | 24 | 0 | Free & Clear Unscented Baby Wipes | 82 | 18 | baby accessories |
| 32424510 | 2328300 | 77187 | prior | 1 | 1 | 9 | NaN | 44471 | 1 | 0 | Free & Clear Unscented Baby Wipes | 82 | 18 | baby accessories |
| 32424511 | 1923581 | 77229 | prior | 21 | 3 | 11 | 17.0 | 44471 | 2 | 0 | Free & Clear Unscented Baby Wipes | 82 | 18 | baby accessories |
| 32424512 | 2042750 | 77229 | prior | 24 | 0 | 14 | 12.0 | 44471 | 6 | 1 | Free & Clear Unscented Baby Wipes | 82 | 18 | baby accessories |
| 32424513 | 2685754 | 77238 | prior | 2 | 0 | 9 | 6.0 | 44471 | 5 | 0 | Free & Clear Unscented Baby Wipes | 82 | 18 | baby accessories |
| 32424514 | 1401197 | 77265 | prior | 6 | 1 | 5 | 9.0 | 44471 | 8 | 0 | Free & Clear Unscented Baby Wipes | 82 | 18 | baby accessories |
| 32424515 | 2917195 | 77265 | prior | 10 | 4 | 20 | 5.0 | 44471 | 4 | 1 | Free & Clear Unscented Baby Wipes | 82 | 18 | baby accessories |
| 32424516 | 1321674 | 77265 | prior | 31 | 0 | 10 | 11.0 | 44471 | 2 | 1 | Free & Clear Unscented Baby Wipes | 82 | 18 | baby accessories |
| 32424517 | 1268589 | 77265 | prior | 37 | 1 | 18 | 29.0 | 44471 | 7 | 1 | Free & Clear Unscented Baby Wipes | 82 | 18 | baby accessories |
| 32424518 | 3044303 | 77280 | prior | 23 | 4 | 23 | 1.0 | 44471 | 3 | 0 | Free & Clear Unscented Baby Wipes | 82 | 18 | baby accessories |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 32434459 | 814403 | 161964 | prior | 10 | 6 | 12 | 5.0 | 26478 | 20 | 0 | Frozen Apple Juice | 113 | 1 | frozen juice |
| 32434460 | 503516 | 175436 | prior | 4 | 5 | 16 | 13.0 | 26478 | 18 | 0 | Frozen Apple Juice | 113 | 1 | frozen juice |
| 32434461 | 385156 | 183189 | prior | 4 | 1 | 23 | 22.0 | 26478 | 2 | 0 | Frozen Apple Juice | 113 | 1 | frozen juice |
| 32434462 | 471382 | 85005 | prior | 7 | 5 | 0 | 13.0 | 24344 | 1 | 0 | Frozen Concentrate Non-Alcoholic Pina Colada | 113 | 1 | frozen juice |
| 32434463 | 1833016 | 92263 | prior | 5 | 2 | 13 | 8.0 | 24344 | 2 | 0 | Frozen Concentrate Non-Alcoholic Pina Colada | 113 | 1 | frozen juice |
| 32434464 | 2624885 | 136840 | prior | 2 | 6 | 10 | 4.0 | 24344 | 11 | 0 | Frozen Concentrate Non-Alcoholic Pina Colada | 113 | 1 | frozen juice |
| 32434465 | 1604793 | 136840 | prior | 6 | 5 | 10 | 3.0 | 24344 | 17 | 1 | Frozen Concentrate Non-Alcoholic Pina Colada | 113 | 1 | frozen juice |
| 32434466 | 3154099 | 136840 | prior | 16 | 2 | 16 | 3.0 | 24344 | 4 | 1 | Frozen Concentrate Non-Alcoholic Pina Colada | 113 | 1 | frozen juice |
| 32434467 | 3135581 | 151840 | prior | 70 | 0 | 9 | 1.0 | 24344 | 6 | 0 | Frozen Concentrate Non-Alcoholic Pina Colada | 113 | 1 | frozen juice |
| 32434468 | 3297537 | 181495 | prior | 2 | 1 | 14 | 15.0 | 24344 | 9 | 0 | Frozen Concentrate Non-Alcoholic Pina Colada | 113 | 1 | frozen juice |
| 32434469 | 823196 | 181495 | prior | 3 | 1 | 14 | 0.0 | 24344 | 1 | 1 | Frozen Concentrate Non-Alcoholic Pina Colada | 113 | 1 | frozen juice |
| 32434470 | 2471510 | 107801 | prior | 8 | 6 | 15 | 4.0 | 5500 | 19 | 0 | Blended Juice Beverage, Mango Orange | 113 | 1 | frozen juice |
| 32434471 | 2181814 | 135090 | prior | 5 | 3 | 14 | 10.0 | 5500 | 3 | 0 | Blended Juice Beverage, Mango Orange | 113 | 1 | frozen juice |
| 32434472 | 962734 | 167413 | prior | 1 | 1 | 12 | NaN | 5500 | 9 | 0 | Blended Juice Beverage, Mango Orange | 113 | 1 | frozen juice |
| 32434473 | 2928960 | 167413 | prior | 4 | 0 | 12 | 10.0 | 5500 | 3 | 1 | Blended Juice Beverage, Mango Orange | 113 | 1 | frozen juice |
| 32434474 | 1393242 | 167413 | prior | 5 | 0 | 12 | 7.0 | 5500 | 21 | 1 | Blended Juice Beverage, Mango Orange | 113 | 1 | frozen juice |
| 32434475 | 2601337 | 181750 | prior | 13 | 0 | 20 | 30.0 | 5500 | 2 | 0 | Blended Juice Beverage, Mango Orange | 113 | 1 | frozen juice |
| 32434476 | 2125702 | 109046 | prior | 3 | 3 | 16 | 8.0 | 2642 | 3 | 0 | Frozen Concentrated Orange Juice With Added Ca... | 113 | 1 | frozen juice |
| 32434477 | 2849065 | 138824 | prior | 1 | 6 | 13 | NaN | 2642 | 20 | 0 | Frozen Concentrated Orange Juice With Added Ca... | 113 | 1 | frozen juice |
| 32434478 | 2634996 | 138824 | prior | 6 | 0 | 16 | 28.0 | 2642 | 15 | 1 | Frozen Concentrated Orange Juice With Added Ca... | 113 | 1 | frozen juice |
| 32434479 | 1857751 | 181888 | prior | 2 | 0 | 7 | 10.0 | 2642 | 5 | 0 | Frozen Concentrated Orange Juice With Added Ca... | 113 | 1 | frozen juice |
| 32434480 | 2131276 | 181888 | prior | 7 | 1 | 11 | 8.0 | 2642 | 6 | 1 | Frozen Concentrated Orange Juice With Added Ca... | 113 | 1 | frozen juice |
| 32434481 | 1466142 | 181888 | prior | 9 | 3 | 14 | 16.0 | 2642 | 4 | 1 | Frozen Concentrated Orange Juice With Added Ca... | 113 | 1 | frozen juice |
| 32434482 | 1022794 | 204495 | prior | 48 | 0 | 9 | 5.0 | 2642 | 9 | 0 | Frozen Concentrated Orange Juice With Added Ca... | 113 | 1 | frozen juice |
| 32434483 | 3249444 | 204495 | prior | 50 | 6 | 14 | 4.0 | 2642 | 8 | 1 | Frozen Concentrated Orange Juice With Added Ca... | 113 | 1 | frozen juice |
| 32434484 | 2231925 | 204495 | prior | 51 | 1 | 15 | 9.0 | 2642 | 8 | 1 | Frozen Concentrated Orange Juice With Added Ca... | 113 | 1 | frozen juice |
| 32434485 | 327001 | 204495 | prior | 53 | 2 | 8 | 7.0 | 2642 | 1 | 1 | Frozen Concentrated Orange Juice With Added Ca... | 113 | 1 | frozen juice |
| 32434486 | 1997103 | 110030 | prior | 4 | 2 | 16 | 5.0 | 24189 | 8 | 0 | Tropical Fruit Smoothie Tasty American Favorites | 113 | 1 | frozen juice |
| 32434487 | 1362143 | 113181 | prior | 33 | 3 | 17 | 5.0 | 24189 | 12 | 0 | Tropical Fruit Smoothie Tasty American Favorites | 113 | 1 | frozen juice |
| 32434488 | 777464 | 179210 | prior | 7 | 5 | 15 | 20.0 | 24189 | 16 | 0 | Tropical Fruit Smoothie Tasty American Favorites | 113 | 1 | frozen juice |
10000 rows × 14 columns
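The joined table has 14 columns, but only `user_id` and `aisle` feed the cross-tabulation. As a hedged sketch (toy frames with made-up ids, not the Instacart data), the same chain of joins can carry just the key columns, and pandas' `validate` argument can check that each join is many-to-one as expected. The name `user_aisle` is hypothetical.

```python
import pandas as pd

# Toy stand-ins for the four tables; all ids and names are made up.
orders = pd.DataFrame({"order_id": [1, 2, 3], "user_id": [10, 10, 20]})
order_products = pd.DataFrame(
    {"order_id": [1, 1, 2, 3], "product_id": [100, 101, 100, 102]}
)
products = pd.DataFrame({"product_id": [100, 101, 102], "aisle_id": [7, 7, 8]})
aisles = pd.DataFrame({"aisle_id": [7, 8], "aisle": ["tea", "yogurt"]})

# Each merge is an inner join on a single key. validate="many_to_one" raises
# a MergeError if the right-hand table has duplicate keys.
user_aisle = (
    order_products
    .merge(orders, on="order_id", validate="many_to_one")
    .merge(products, on="product_id", validate="many_to_one")
    .merge(aisles, on="aisle_id", validate="many_to_one")
    .loc[:, ["user_id", "aisle"]]
)

print(user_aisle.shape)
print(user_aisle)
```

On the toy data every purchased product keeps exactly one row, so the result has one row per order line and two columns.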
data04 = pd.crosstab(data03["user_id"],data03["aisle"])
display(data04.shape,data04.head(10))
(206209, 134)
| user_id (rows) / aisle (columns) | air fresheners candles | asian foods | baby accessories | baby bath body care | baby food formula | bakery desserts | baking ingredients | baking supplies decor | beauty | beers coolers | ... | spreads | tea | tofu meat alternatives | tortillas flat bread | trail mix snack mix | trash bags liners | vitamins supplements | water seltzer sparkling water | white wines | yogurt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 2 | 0 | 3 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | ... | 3 | 1 | 1 | 0 | 0 | 0 | 0 | 2 | 0 | 42 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 4 | 1 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| 5 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 |
| 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 7 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 |
| 8 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 9 | 0 | 0 | 0 | 0 | 6 | 0 | 2 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 19 |
| 10 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 |
10 rows × 134 columns
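Each cell of the cross-tabulation is simply the number of joined rows with that `user_id` and `aisle`. To make that concrete, here is a small sketch on made-up purchases (not the Instacart data) showing that `pd.crosstab` gives the same counts as grouping, counting and unstacking. The names `purchases`, `via_crosstab` and `via_groupby` are hypothetical.

```python
import pandas as pd

# One row per purchased product: who bought it and from which aisle (toy data).
purchases = pd.DataFrame(
    {
        "user_id": [10, 10, 10, 20],
        "aisle": ["tea", "tea", "yogurt", "yogurt"],
    }
)

# crosstab counts the rows for every (user_id, aisle) pair and fills gaps with 0.
via_crosstab = pd.crosstab(purchases["user_id"], purchases["aisle"])

# The same table built by hand: group, count, then pivot aisles into columns.
via_groupby = (
    purchases.groupby(["user_id", "aisle"]).size().unstack(fill_value=0)
)

print(via_crosstab)
print((via_crosstab.values == via_groupby.values).all())
```

User 10 bought two tea products and one yogurt product, while user 20 never bought tea, so that cell is filled with 0 rather than left missing.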
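from sklearn.decomposition import PCA

data = data04
# A float n_components between 0 and 1 keeps just enough components
# for the cumulative explained variance to exceed that fraction (here 90%).
transfer = PCA(n_components=0.9)
xi = transfer.fit_transform(data)
print(xi.shape, transfer.explained_variance_ratio_)
# Name the principal components F1, F2, ..., Fk.
Fi = ["F" + str(i) for i in range(1, xi.shape[1] + 1)]
# Store the result under a new name so the merged table data02 is not overwritten.
data05 = pd.DataFrame(xi, columns=Fi)
display(data05.head(3))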
(206209, 27) [0.48237998 0.09585824 0.05185877 0.03590181 0.0293466 0.02393094
0.01899492 0.0183208 0.01487788 0.0134451 0.01121877 0.01102918
0.01052171 0.00980307 0.00832174 0.00726185 0.00712991 0.00683061
0.00640343 0.00580483 0.00534075 0.00487297 0.00477908 0.00462158
0.00444346 0.00413755 0.00408034]
| | F1 | F2 | F3 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | ... | F18 | F19 | F20 | F21 | F22 | F23 | F24 | F25 | F26 | F27 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -24.215659 | 2.429427 | -2.466370 | -0.145686 | 0.269042 | -1.432932 | 2.140677 | -2.738031 | -2.714316 | -1.743135 | ... | -3.225987 | -4.580076 | 0.777403 | -3.699129 | 1.907214 | 2.995386 | 0.772923 | 0.686800 | 1.694394 | -2.343230 |
| 1 | 6.463208 | 36.751116 | 8.382553 | 15.097530 | -6.920938 | -0.978375 | 6.011567 | 3.787725 | -8.180749 | -9.040861 | ... | -0.737606 | -0.737402 | 0.740042 | -0.091338 | 5.151285 | -4.584815 | -3.237894 | 4.121213 | 2.446897 | -4.283485 |
| 2 | -7.990302 | 2.404383 | -11.030064 | 0.672230 | -0.442368 | -2.823272 | -6.284140 | 6.512509 | -2.148634 | -1.585257 | ... | 5.434733 | -3.604842 | 4.282794 | -0.445834 | 3.039337 | -1.469566 | -2.946656 | 1.775345 | -0.444194 | 0.786666 |
3 rows × 27 columns
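Why 27 components? With `n_components=0.9`, PCA keeps components until their cumulative explained-variance ratio passes 0.9. The sketch below simply re-adds the 27 ratios printed above to check that reading. It uses only NumPy and the printed numbers, and the names `ratios`, `cumulative` and `n_needed` are hypothetical.

```python
import numpy as np

# Explained-variance ratios copied from the printed PCA output above.
ratios = np.array([
    0.48237998, 0.09585824, 0.05185877, 0.03590181, 0.0293466, 0.02393094,
    0.01899492, 0.0183208, 0.01487788, 0.0134451, 0.01121877, 0.01102918,
    0.01052171, 0.00980307, 0.00832174, 0.00726185, 0.00712991, 0.00683061,
    0.00640343, 0.00580483, 0.00534075, 0.00487297, 0.00477908, 0.00462158,
    0.00444346, 0.00413755, 0.00408034,
])

# Running total of variance explained by the first k components.
cumulative = np.cumsum(ratios)

# Smallest k whose cumulative ratio exceeds the 0.9 threshold.
n_needed = int(np.searchsorted(cumulative, 0.9, side="right") + 1)

print(n_needed)                          # 27
print(round(float(cumulative[-1]), 4))   # 0.9015
```

The first 26 components explain about 89.7% of the variance, so the 27th is the one that pushes the total past 90%. The first component alone accounts for roughly 48%.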