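Feature engineering case study: merging tables, cross-tabulation, and principal component analysis

Goal: reduce the feature dimensionality with principal component analysis (PCA)

Method:

Merge tables: link each user_id to the aisles of the products that user bought (user_id ----> aisle)

Cross-tabulation: build a table counting how many items each user bought from each aisle (fine-grained product category)

Dimensionality reduction: principal component analysis (PCA)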
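The three steps can be previewed end to end on a tiny dataset before working with the full files. The rows, ids, and aisle assignments below are made up for illustration. Only the column names follow the Instacart files, and the `toy_` variable names are hypothetical.

```python
import pandas as pd
from sklearn.decomposition import PCA

# Tiny made-up tables that mimic the Instacart schema
toy_orders = pd.DataFrame({"order_id": [10, 11, 12, 13], "user_id": [1, 1, 2, 3]})
toy_order_products = pd.DataFrame({
    "order_id":   [10, 10, 11, 12, 12, 13, 13, 13],
    "product_id": [100, 101, 100, 102, 103, 101, 102, 103],
})
toy_products = pd.DataFrame({"product_id": [100, 101, 102, 103],
                             "aisle_id":   [1, 2, 2, 3]})
toy_aisles = pd.DataFrame({"aisle_id": [1, 2, 3],
                           "aisle": ["fresh fruits", "yogurt", "water seltzer sparkling water"]})

# Step 1: merge so that every purchased item carries its user_id and aisle
toy_merged = (toy_orders.merge(toy_order_products, on="order_id")
                        .merge(toy_products, on="product_id")
                        .merge(toy_aisles, on="aisle_id"))

# Step 2: cross-tabulate users against aisles (number of items per aisle)
toy_user_aisle = pd.crosstab(toy_merged["user_id"], toy_merged["aisle"])

# Step 3: keep enough principal components to explain at least 90% of the variance
toy_reduced = PCA(n_components=0.9).fit_transform(toy_user_aisle)

print(toy_user_aisle)
print(toy_reduced.shape)  # (3, 2)
```

Here the first component alone explains about 88% of the variance, so PCA keeps a second one to pass the 90% threshold.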

Data source: https://www.kaggle.com/c/instacart-market-basket-analysis/data

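·order_products__prior.csv: the products contained in each prior order
    ◦ Fields: order_id, product_id, add_to_cart_order, reordered
    ◦ Meaning: order id, product id, position in which the product was added to the cart, whether the user had ordered this product before (1 = reordered, 0 = first time)
·products.csv: product information
    ◦ Fields: product_id, product_name, aisle_id, department_id
    ◦ Meaning: product id, product name, aisle (fine-grained category) id, department (top-level category) id
·orders.csv: the users' orders
    ◦ Fields: order_id, user_id, eval_set, order_number, order_dow, order_hour_of_day, days_since_prior_order
    ◦ Meaning: order id, user id, evaluation set the order belongs to (prior, train, or test), sequence number of the order for that user, day of the week, hour of the day the order was placed, days since that user's previous order
·aisles.csv: the aisle (fine-grained category) each product belongs to
    ◦ Fields: aisle_id, aisle
    ◦ Meaning: aisle id, aisle name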
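The files are linked through shared keys: orders and order_products__prior share order_id, order_products__prior and products share product_id, and products and aisles share aisle_id. Since products and aisles are lookup tables, each key should appear only once on their side, otherwise a merge silently duplicates purchase rows. pandas' `merge` can assert this with its `validate` argument. A sketch on made-up rows (the product ids come from the preview, the aisle ids are invented):

```python
import pandas as pd
from pandas.errors import MergeError

# Made-up rows: only the key columns matter here
toy_order_products = pd.DataFrame({"order_id": [2, 2, 3],
                                   "product_id": [33120, 28985, 33120]})
toy_products = pd.DataFrame({"product_id": [33120, 28985], "aisle_id": [1, 2]})

# product_id is unique in the lookup table, so many-to-one validation passes
ok = toy_order_products.merge(toy_products, on="product_id", validate="many_to_one")
print(len(ok))  # 3: one row per purchased item

# A duplicated lookup key would multiply rows; validate raises MergeError instead
dup_products = pd.concat([toy_products, toy_products.iloc[[0]]])
raised = False
try:
    toy_order_products.merge(dup_products, on="product_id", validate="many_to_one")
except MergeError:
    raised = True
print(raised)  # True
```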
import numpy as np
import pandas as pd
# Load the data
aisles = pd.read_csv(r"E:\instacart-market-basket-analysis\aisles.csv",sep=",",encoding="utf-8")
orders = pd.read_csv(r"E:\instacart-market-basket-analysis\orders.csv",sep=",",encoding="utf-8")
products = pd.read_csv(r"E:\instacart-market-basket-analysis\products.csv",sep=",",encoding="utf-8")
order_products_prior = pd.read_csv(r"E:\instacart-market-basket-analysis\order_products__prior.csv",sep=",",encoding="utf-8")
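order_products__prior.csv has roughly 32 million rows, and the rest of this walkthrough only uses its order_id and product_id columns. A lighter way to load it is to read just those columns with compact integer dtypes. The helper name `load_slim` and the int32 choice are assumptions made for this sketch (int32 is assumed large enough for the id ranges in this dataset); the demo reads an in-memory sample copied from the preview instead of the real file.

```python
import io
import pandas as pd

# Columns needed for the user x aisle table, with compact dtypes (int32 is an assumption)
PRIOR_COLS = {"order_id": "int32", "product_id": "int32"}
ORDER_COLS = {"order_id": "int32", "user_id": "int32"}

def load_slim(path_or_buf, dtypes):
    """Read a CSV, keeping only the columns listed in `dtypes`, with those dtypes."""
    return pd.read_csv(path_or_buf, usecols=list(dtypes), dtype=dtypes)

# In-memory sample with the same layout as order_products__prior.csv
sample = io.StringIO(
    "order_id,product_id,add_to_cart_order,reordered\n"
    "2,33120,1,1\n"
    "2,28985,2,1\n"
    "2,9327,3,0\n"
)
prior_sample = load_slim(sample, PRIOR_COLS)
print(prior_sample.dtypes)
```

On the real files this would be called as `load_slim(r"E:\instacart-market-basket-analysis\order_products__prior.csv", PRIOR_COLS)`, and likewise with `ORDER_COLS` for orders.csv.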
# Inspect the first rows of each table (display() is available in Jupyter/IPython)
display(aisles.head(3))
display(orders.head(3))
display(products.head(3))
display(order_products_prior.head(3))
   aisle_id                  aisle
0         1  prepared soups salads
1         2      specialty cheeses
2         3    energy granola bars

   order_id  user_id  eval_set  order_number  order_dow  order_hour_of_day  days_since_prior_order
0   2539329        1     prior             1          2                  8                     NaN
1   2398795        1     prior             2          3                  7                    15.0
2    473747        1     prior             3          3                 12                    21.0

   product_id                          product_name  aisle_id  department_id
0           1            Chocolate Sandwich Cookies        61             19
1           2                      All-Seasons Salt       104             13
2           3  Robust Golden Unsweetened Oolong Tea        94              7

   order_id  product_id  add_to_cart_order  reordered
0         2       33120                  1          1
1         2       28985                  2          1
2         2        9327                  3          0
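# Merge the tables so that every purchased item is linked to its user_id and aisle:
# orders -(order_id)-> order_products_prior -(product_id)-> products -(aisle_id)-> aisles
# (pd.merge returns only when the join is complete, so no time.sleep() is needed)
data01 = pd.merge(orders, order_products_prior, how="inner", on="order_id")
data02 = pd.merge(data01, products, on="product_id")
data03 = pd.merge(data02, aisles, on="aisle_id")
display(data03.shape, data03.tail(10000))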
(32434489, 14)
order_id user_id eval_set order_number order_dow order_hour_of_day days_since_prior_order product_id add_to_cart_order reordered product_name aisle_id department_id aisle
32424489 2542240 75675 prior 12 5 12 5.0 44471 7 1 Free & Clear Unscented Baby Wipes 82 18 baby accessories
32424490 3260483 75675 prior 16 0 9 14.0 44471 21 1 Free & Clear Unscented Baby Wipes 82 18 baby accessories
32424491 2196407 75675 prior 30 0 11 12.0 44471 9 1 Free & Clear Unscented Baby Wipes 82 18 baby accessories
32424492 532672 75675 prior 38 5 13 7.0 44471 20 1 Free & Clear Unscented Baby Wipes 82 18 baby accessories
32424493 1705047 75675 prior 39 5 13 0.0 44471 20 1 Free & Clear Unscented Baby Wipes 82 18 baby accessories
32424494 998672 75675 prior 48 5 14 11.0 44471 13 1 Free & Clear Unscented Baby Wipes 82 18 baby accessories
32424495 2149746 75675 prior 49 6 9 8.0 44471 6 1 Free & Clear Unscented Baby Wipes 82 18 baby accessories
32424496 483804 75804 prior 12 6 15 4.0 44471 19 0 Free & Clear Unscented Baby Wipes 82 18 baby accessories
32424497 1783191 76027 prior 6 4 16 13.0 44471 13 0 Free & Clear Unscented Baby Wipes 82 18 baby accessories
32424498 3074202 76027 prior 7 2 15 5.0 44471 8 1 Free & Clear Unscented Baby Wipes 82 18 baby accessories
32424499 431155 76081 prior 8 0 14 16.0 44471 8 0 Free & Clear Unscented Baby Wipes 82 18 baby accessories
32424500 2879529 76238 prior 36 6 10 6.0 44471 25 0 Free & Clear Unscented Baby Wipes 82 18 baby accessories
32424501 1652877 76238 prior 39 5 10 6.0 44471 10 1 Free & Clear Unscented Baby Wipes 82 18 baby accessories
32424502 737972 76466 prior 20 0 10 7.0 44471 7 0 Free & Clear Unscented Baby Wipes 82 18 baby accessories
32424503 3154632 76556 prior 80 3 18 2.0 44471 7 0 Free & Clear Unscented Baby Wipes 82 18 baby accessories
32424504 1776861 76576 prior 7 0 15 7.0 44471 2 0 Free & Clear Unscented Baby Wipes 82 18 baby accessories
32424505 2695824 76726 prior 4 0 11 28.0 44471 26 0 Free & Clear Unscented Baby Wipes 82 18 baby accessories
32424506 3176388 76823 prior 1 6 12 NaN 44471 19 0 Free & Clear Unscented Baby Wipes 82 18 baby accessories
32424507 1441764 76866 prior 13 0 16 25.0 44471 7 0 Free & Clear Unscented Baby Wipes 82 18 baby accessories
32424508 2888446 76868 prior 17 5 10 16.0 44471 19 0 Free & Clear Unscented Baby Wipes 82 18 baby accessories
32424509 2670733 77148 prior 19 1 9 12.0 44471 24 0 Free & Clear Unscented Baby Wipes 82 18 baby accessories
32424510 2328300 77187 prior 1 1 9 NaN 44471 1 0 Free & Clear Unscented Baby Wipes 82 18 baby accessories
32424511 1923581 77229 prior 21 3 11 17.0 44471 2 0 Free & Clear Unscented Baby Wipes 82 18 baby accessories
32424512 2042750 77229 prior 24 0 14 12.0 44471 6 1 Free & Clear Unscented Baby Wipes 82 18 baby accessories
32424513 2685754 77238 prior 2 0 9 6.0 44471 5 0 Free & Clear Unscented Baby Wipes 82 18 baby accessories
32424514 1401197 77265 prior 6 1 5 9.0 44471 8 0 Free & Clear Unscented Baby Wipes 82 18 baby accessories
32424515 2917195 77265 prior 10 4 20 5.0 44471 4 1 Free & Clear Unscented Baby Wipes 82 18 baby accessories
32424516 1321674 77265 prior 31 0 10 11.0 44471 2 1 Free & Clear Unscented Baby Wipes 82 18 baby accessories
32424517 1268589 77265 prior 37 1 18 29.0 44471 7 1 Free & Clear Unscented Baby Wipes 82 18 baby accessories
32424518 3044303 77280 prior 23 4 23 1.0 44471 3 0 Free & Clear Unscented Baby Wipes 82 18 baby accessories
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
32434459 814403 161964 prior 10 6 12 5.0 26478 20 0 Frozen Apple Juice 113 1 frozen juice
32434460 503516 175436 prior 4 5 16 13.0 26478 18 0 Frozen Apple Juice 113 1 frozen juice
32434461 385156 183189 prior 4 1 23 22.0 26478 2 0 Frozen Apple Juice 113 1 frozen juice
32434462 471382 85005 prior 7 5 0 13.0 24344 1 0 Frozen Concentrate Non-Alcoholic Pina Colada 113 1 frozen juice
32434463 1833016 92263 prior 5 2 13 8.0 24344 2 0 Frozen Concentrate Non-Alcoholic Pina Colada 113 1 frozen juice
32434464 2624885 136840 prior 2 6 10 4.0 24344 11 0 Frozen Concentrate Non-Alcoholic Pina Colada 113 1 frozen juice
32434465 1604793 136840 prior 6 5 10 3.0 24344 17 1 Frozen Concentrate Non-Alcoholic Pina Colada 113 1 frozen juice
32434466 3154099 136840 prior 16 2 16 3.0 24344 4 1 Frozen Concentrate Non-Alcoholic Pina Colada 113 1 frozen juice
32434467 3135581 151840 prior 70 0 9 1.0 24344 6 0 Frozen Concentrate Non-Alcoholic Pina Colada 113 1 frozen juice
32434468 3297537 181495 prior 2 1 14 15.0 24344 9 0 Frozen Concentrate Non-Alcoholic Pina Colada 113 1 frozen juice
32434469 823196 181495 prior 3 1 14 0.0 24344 1 1 Frozen Concentrate Non-Alcoholic Pina Colada 113 1 frozen juice
32434470 2471510 107801 prior 8 6 15 4.0 5500 19 0 Blended Juice Beverage, Mango Orange 113 1 frozen juice
32434471 2181814 135090 prior 5 3 14 10.0 5500 3 0 Blended Juice Beverage, Mango Orange 113 1 frozen juice
32434472 962734 167413 prior 1 1 12 NaN 5500 9 0 Blended Juice Beverage, Mango Orange 113 1 frozen juice
32434473 2928960 167413 prior 4 0 12 10.0 5500 3 1 Blended Juice Beverage, Mango Orange 113 1 frozen juice
32434474 1393242 167413 prior 5 0 12 7.0 5500 21 1 Blended Juice Beverage, Mango Orange 113 1 frozen juice
32434475 2601337 181750 prior 13 0 20 30.0 5500 2 0 Blended Juice Beverage, Mango Orange 113 1 frozen juice
32434476 2125702 109046 prior 3 3 16 8.0 2642 3 0 Frozen Concentrated Orange Juice With Added Ca... 113 1 frozen juice
32434477 2849065 138824 prior 1 6 13 NaN 2642 20 0 Frozen Concentrated Orange Juice With Added Ca... 113 1 frozen juice
32434478 2634996 138824 prior 6 0 16 28.0 2642 15 1 Frozen Concentrated Orange Juice With Added Ca... 113 1 frozen juice
32434479 1857751 181888 prior 2 0 7 10.0 2642 5 0 Frozen Concentrated Orange Juice With Added Ca... 113 1 frozen juice
32434480 2131276 181888 prior 7 1 11 8.0 2642 6 1 Frozen Concentrated Orange Juice With Added Ca... 113 1 frozen juice
32434481 1466142 181888 prior 9 3 14 16.0 2642 4 1 Frozen Concentrated Orange Juice With Added Ca... 113 1 frozen juice
32434482 1022794 204495 prior 48 0 9 5.0 2642 9 0 Frozen Concentrated Orange Juice With Added Ca... 113 1 frozen juice
32434483 3249444 204495 prior 50 6 14 4.0 2642 8 1 Frozen Concentrated Orange Juice With Added Ca... 113 1 frozen juice
32434484 2231925 204495 prior 51 1 15 9.0 2642 8 1 Frozen Concentrated Orange Juice With Added Ca... 113 1 frozen juice
32434485 327001 204495 prior 53 2 8 7.0 2642 1 1 Frozen Concentrated Orange Juice With Added Ca... 113 1 frozen juice
32434486 1997103 110030 prior 4 2 16 5.0 24189 8 0 Tropical Fruit Smoothie Tasty American Favorites 113 1 frozen juice
32434487 1362143 113181 prior 33 3 17 5.0 24189 12 0 Tropical Fruit Smoothie Tasty American Favorites 113 1 frozen juice
32434488 777464 179210 prior 7 5 15 20.0 24189 16 0 Tropical Fruit Smoothie Tasty American Favorites 113 1 frozen juice

10000 rows × 14 columns
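data03 carries all 14 columns for about 32 million rows, while the cross-tabulation only needs user_id and aisle. One way to keep the intermediate frames small is to select just the key columns before each merge. The sketch below uses the same join path as the code above; the function name `user_aisle_pairs` and the toy rows are hypothetical.

```python
import pandas as pd

def user_aisle_pairs(orders, order_products, products, aisles):
    """Return one (user_id, aisle) row per purchased item and no other columns."""
    return (
        orders[["order_id", "user_id"]]
        .merge(order_products[["order_id", "product_id"]], on="order_id")
        .merge(products[["product_id", "aisle_id"]], on="product_id")
        .merge(aisles[["aisle_id", "aisle"]], on="aisle_id")
        [["user_id", "aisle"]]
    )

# Made-up rows in the Instacart layout, including columns the crosstab does not need
toy_orders = pd.DataFrame({"order_id": [1, 2], "user_id": [7, 8], "order_dow": [2, 3]})
toy_order_products = pd.DataFrame({"order_id": [1, 1, 2], "product_id": [10, 11, 10],
                                   "reordered": [0, 1, 1]})
toy_products = pd.DataFrame({"product_id": [10, 11], "aisle_id": [1, 2],
                             "product_name": ["a", "b"]})
toy_aisles = pd.DataFrame({"aisle_id": [1, 2], "aisle": ["fresh fruits", "fresh vegetables"]})

pairs = user_aisle_pairs(toy_orders, toy_order_products, toy_products, toy_aisles)
print(pairs)
```

On the real data, `pairs = user_aisle_pairs(orders, order_products_prior, products, aisles)` followed by `pd.crosstab(pairs["user_id"], pairs["aisle"])` should give the same table as the crosstab below, without materializing the unused columns.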

# Build the cross-tabulation user_id ----> aisle: one row per user, one column per aisle, cell = number of items bought
data04 = pd.crosstab(data03["user_id"],data03["aisle"])
display(data04.shape,data04.head(10))
(206209, 134)
aisle air fresheners candles asian foods baby accessories baby bath body care baby food formula bakery desserts baking ingredients baking supplies decor beauty beers coolers ... spreads tea tofu meat alternatives tortillas flat bread trail mix snack mix trash bags liners vitamins supplements water seltzer sparkling water white wines yogurt
user_id
1 0 0 0 0 0 0 0 0 0 0 ... 1 0 0 0 0 0 0 0 0 1
2 0 3 0 0 0 0 2 0 0 0 ... 3 1 1 0 0 0 0 2 0 42
3 0 0 0 0 0 0 0 0 0 0 ... 4 1 0 0 0 0 0 2 0 0
4 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 1 0 0 0 1 0 0
5 0 2 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 3
6 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 2 0 0 0 ... 0 0 0 0 0 0 0 0 0 5
8 0 1 0 0 0 0 1 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
9 0 0 0 0 6 0 2 0 0 0 ... 0 0 0 0 0 0 0 2 0 19
10 0 1 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 2

10 rows × 134 columns
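Without a `values` argument, `pd.crosstab` simply counts the rows for each (user_id, aisle) pair. The same table can be built with `groupby(...).size().unstack(fill_value=0)`, which makes the counting explicit. A check on made-up rows:

```python
import pandas as pd

# Made-up (user_id, aisle) rows, one per purchased item
toy_df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3],
    "aisle": ["yogurt", "yogurt", "tea", "tea", "beauty", "yogurt"],
})

ct = pd.crosstab(toy_df["user_id"], toy_df["aisle"])
gb = toy_df.groupby(["user_id", "aisle"]).size().unstack(fill_value=0)

print(ct)
# Raises AssertionError if the tables differ (dtypes and axis names are not compared)
pd.testing.assert_frame_equal(ct, gb, check_dtype=False, check_names=False)
print("crosstab and groupby/size/unstack agree")
```

Most cells of the real 206209 × 134 table are zero, as the preview above suggests, because each user shops in only a small subset of aisles.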

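# Principal component analysis: keep enough components to explain 90% of the variance
from sklearn.decomposition import PCA
import pandas as pd

# 1. Data: the user x aisle cross-tabulation data04 built above
data = data04

# 2. Instantiate the transformer
# n_components: a float between 0 and 1 keeps enough components to explain at least that fraction of the variance;
#               an integer keeps exactly that many components
transfer = PCA(n_components=0.9)

# 3. Fit PCA and project the data onto the principal components
xi = transfer.fit_transform(data)

# Shape of the result and the share of variance explained by each new variable
print(xi.shape, transfer.explained_variance_ratio_)

# 4. Put the principal components into a DataFrame with columns F1, F2, ...
Fi = ["F" + str(i) for i in range(1, xi.shape[1] + 1)]
data_pca = pd.DataFrame(xi, columns=Fi)
display(data_pca.head(3))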
(206209, 27) [0.48237998 0.09585824 0.05185877 0.03590181 0.0293466  0.02393094
 0.01899492 0.0183208  0.01487788 0.0134451  0.01121877 0.01102918
 0.01052171 0.00980307 0.00832174 0.00726185 0.00712991 0.00683061
 0.00640343 0.00580483 0.00534075 0.00487297 0.00477908 0.00462158
 0.00444346 0.00413755 0.00408034]
F1 F2 F3 F4 F5 F6 F7 F8 F9 F10 ... F18 F19 F20 F21 F22 F23 F24 F25 F26 F27
0 -24.215659 2.429427 -2.466370 -0.145686 0.269042 -1.432932 2.140677 -2.738031 -2.714316 -1.743135 ... -3.225987 -4.580076 0.777403 -3.699129 1.907214 2.995386 0.772923 0.686800 1.694394 -2.343230
1 6.463208 36.751116 8.382553 15.097530 -6.920938 -0.978375 6.011567 3.787725 -8.180749 -9.040861 ... -0.737606 -0.737402 0.740042 -0.091338 5.151285 -4.584815 -3.237894 4.121213 2.446897 -4.283485
2 -7.990302 2.404383 -11.030064 0.672230 -0.442368 -2.823272 -6.284140 6.512509 -2.148634 -1.585257 ... 5.434733 -3.604842 4.282794 -0.445834 3.039337 -1.469566 -2.946656 1.775345 -0.444194 0.786666

3 rows × 27 columns
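With `n_components=0.9`, scikit-learn keeps the smallest number of components whose explained-variance ratios add up to more than 0.9. The 27 ratios printed above show how the 134 aisle columns collapse to 27: the first 26 components reach about 0.897, and the 27th brings the total to about 0.902.

```python
import numpy as np

# Explained-variance ratios printed by transfer.explained_variance_ratio_ above
ratios = np.array([
    0.48237998, 0.09585824, 0.05185877, 0.03590181, 0.0293466, 0.02393094,
    0.01899492, 0.0183208, 0.01487788, 0.0134451, 0.01121877, 0.01102918,
    0.01052171, 0.00980307, 0.00832174, 0.00726185, 0.00712991, 0.00683061,
    0.00640343, 0.00580483, 0.00534075, 0.00487297, 0.00477908, 0.00462158,
    0.00444346, 0.00413755, 0.00408034,
])

cumulative = np.cumsum(ratios)
# Position of the first cumulative value >= 0.9, converted to a component count
n_needed = int(np.searchsorted(cumulative, 0.9)) + 1
print(n_needed, round(float(cumulative[-2]), 4), round(float(cumulative[-1]), 4))  # 27 0.8974 0.9015
```

The first component alone accounts for about 48% of the variance. Because the counts are not standardized before PCA, aisles with large purchase counts tend to dominate the leading components.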
