A shell script runs a Python script, connects to Hive, and writes the submitted data into a table

Usage

1.cd /opt/zy
Run the commands in this directory as root.
2.
Query in SAP with:
Tcode: ZMMR0005
Purchase Org: *
PO Creating: 2017/3/1 (start date) 2017/6/31 (end date)
Vendor: 1000341
plant: *

A query like this returns every record whose ship date falls within 20170301-20170631, regardless of which month the arrival date lands in.

Export the data table from SAP and save it as a txt file delimited by "\t".
Upload the exported file to the /opt/zy directory with the rz command.
3. Run the command. Note that the argument must strictly follow the XXXXXXXXtoYYYYYYYY format, meaning startdate to enddate.
example:
[root@slave1 zy]# bash try2.sh 20170301to20170632
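Since the scripts split on the literal "to" without further checks, a stricter parse can catch malformed arguments early. A minimal sketch (the function name parse_range is my own, not part of the scripts below):

```python
from datetime import datetime

def parse_range(arg):
    # Split "XXXXXXXXtoYYYYYYYY" into start and end, as the scripts do with split("to")
    start, end = arg.split("to")
    # Reject halves that are not real YYYYMMDD dates
    for part in (start, end):
        datetime.strptime(part, "%Y%m%d")
    return start, end

print(parse_range("20170301to20170630"))
```

Under validation like this, the example argument 20170301to20170632 would be rejected (June has no day 32); the scripts themselves do not validate.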
4. Query the analysis result in Hue:
SELECT * from saplifttime WHERE querypocredatestart='XXXXXXXX' [and querypocredateend='YYYYYYYY'];
and run the command.
5. To inspect the raw data, query the pcg.sap table:
SELECT * from sap WHERE querypocredatestart='20170301';

Screenshot of the run:
(result screenshot)

Technical implementation

A shell script calls a Python script.

Shell script: try2.sh

#!/bin/sh
#echo $1
daterange=$1   # keep the argument in a variable for the substring extraction below
python3 /opt/zy/runtask.py $1   # run the Python script
startdate=${daterange:0:8}   # extract the query start date
#echo $startdate
enddate=${daterange:10:8}   # extract the query end date (8 chars after "to")
#echo $enddate
sed -i '1,3d' /opt/zy/$1.txt   # delete the first three lines, which are blank
sed 's/.\{1\}//' $1.txt > $1regular.txt   # strip the first character of each line, because the first column is empty
hdfs dfs -put -f /opt/zy/$1regular.txt /user/hive/pcg-data/zhouyi6_files   # upload the local file to the Hadoop cluster
hive -e "LOAD DATA INPATH '/user/hive/pcg-data/zhouyi6_files/$1regular.txt' INTO TABLE pcg.sap partition(querypocredatestart='$startdate',querypocredateend='$enddate')"   # load the file's data into the table
rm $1.txt   # remove the original local file, keeping only the reformatted copy

Notes:
1. Without -i, sed does not modify the file itself, so the second sed writes its result to a new file with the regular suffix.
2. In sed -i, -i edits the file in place instead of printing the result to the terminal.
3. hdfs dfs -put -f
The -f option will overwrite the destination if it already exists.
4. Running this script assumes the pcg.sap table has already been created; the DDL is:

CREATE TABLE SAP(`PO Cre Date` string,
`Vendor` string, 
`WW Partner` string, 
`Name of Vendor` string,
`PO Cre by` string, 
`Purch Doc Type` string,
`Purch Order` string,
`PO Item` string,
`Deletion Indicator in PO Item` string, 
`Request Shipment Day` string,
`Material` string,
`Short Text` string, 
`Plant` string, 
`Issuing Stor location` string,
`Receive Stor loaction` string, 
`PO item change date` string, 
`Delivery Priority` string,
`PO Qty` string,
`Total GR Qty` string,
`Still to be delivered` string,
`Delivery Note` string,
`Delivery Note Type (ASN or DN)` string, 
`Delivery Note item` string,
`Delivery Note qty` string, 
`Delivery Note Creation Date` string,
`Delivery Note ACK Date` string, 
`Incoterm` string, 
`Part Battery Indicator` string,
`BOL/AWBill` string, 
`Purchase order type` string, 
`Gr Date` string) 
partitioned by (`queryPoCreDateStart` string,`queryPoCreDateEnd` string)
row format delimited fields terminated by "\t" stored as textfile

Python script: runtask.py

import pandas as pd
import sys
data = pd.read_csv(sys.argv[1] + ".txt", sep="\t")
#print(data.columns)
data['Delivery Note Creation Date'] = pd.to_datetime(data['Delivery Note Creation Date'], format='%d.%m.%Y')
data['Gr Date'] = pd.to_datetime(data['Gr Date'], format='%d.%m.%Y')
data = data.drop(data[data['Delivery Note Creation Date'].isnull()].index.tolist())  # drop rows where this column is null
data = data.drop(data[data['Gr Date'].isnull()].index.tolist())  # drop rows where this column is null
data['delta'] = (data['Gr Date'] - data['Delivery Note Creation Date']).apply(lambda x: x.days)  # transit time difference
print(data['delta'].describe())
#sql_content="insert into table saplifttime values(%,%s,%s,%s,%s,%s,%s,%s,%s,%s)"%\
import hdfs
from impala.dbapi import connect
filename = sys.argv[1] + ".txt"
hdfspath = '/user/hive/pcg-data/zhouyi6_files'
client = hdfs.Client("http://10.100.208.222:50070")  # 50070
# 8888 is the port I use when logging in to the web UI
#print(client.status("/user/zhouyi", strict=True))  # inspect path info
#print(client.list("/user/zhouyi"))  # list the files in the directory
#client.upload(hdfs_path=hdfspath, local_path="/opt/zy/"+filename, overwrite=True)
# overwrite=True means delete any uploaded files if an error occurs during the upload.
conn = connect(host='10.100.208.222', port=21050, database='pcg')
cur = conn.cursor()
stdate, edate = sys.argv[1].split("to")
#print(sys.argv[1])
cnt = str(data['delta'].describe()[0])
mean = str(data['delta'].describe()[1])
std = str(data['delta'].describe()[2])
mini = str(data['delta'].describe()[3])
twentyfive = str(data['delta'].describe()[4])
fifty = str(data['delta'].describe()[5])
seventyfive = str(data['delta'].describe()[6])
maxm = str(data['delta'].describe()[7])
args = [stdate, edate, cnt, mean, std, mini, twentyfive, fifty, seventyfive, maxm]
print(args)

# a working SQL example
#sql_content="insert into table saplifttime values("+str(5555)+",'20200607','22','4.2','9.88','1','2','5','10','9999999999999')"
sql_content = "insert into table saplifttime values(?,?,?,?,?,?,?,?,?,?)"
cur.execute(sql_content, args)  # insert the computed result into pcg.saplifttime
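The eight separate describe() calls in the script above recompute the same statistics each time. A sketch of computing them once, on a toy series standing in for data['delta'] (values invented):

```python
import pandas as pd

s = pd.Series([3, 0, 5, 2])   # toy stand-in for data['delta']
desc = s.describe()           # count, mean, std, min, 25%, 50%, 75%, max, in that order
args_stats = [str(v) for v in desc.tolist()]
print(args_stats)             # eight strings, beginning with the count
```

desc.tolist() preserves describe()'s ordering, so the list lines up with the cnt..maxm columns of saplifttime.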

Notes:
1. cur.execute assumes the pcg.saplifttime table has already been created; the DDL is:

CREATE TABLE SAPLifttime(querypocredatestart STRING, querypocredateend STRING, cnt STRING,
mean STRING, std STRING, minimum STRING, 25percent STRING, 50percent STRING,
75percent STRING, maxmum STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t" STORED AS Textfile

2. Calculation logic:
Step 1: treat the "Delivery Note Creation Date" field as the ship date; drop any row where it is empty
Step 2: treat the "Gr Date" field as the arrival date; drop any row where it is empty
Step 3: transit time = Gr Date - Delivery Note Creation Date
Step 4: compute cnt, mean, std, minimum, 25%, 50%, 75%, maxmum over the transit times
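The four steps can be sketched on a couple of toy rows in the DD.MM.YYYY format of the SAP export (the dates are invented for illustration):

```python
import pandas as pd

data = pd.DataFrame({
    "Delivery Note Creation Date": ["01.03.2017", "05.03.2017", None],
    "Gr Date": ["04.03.2017", "05.03.2017", "10.03.2017"],
})
data["Delivery Note Creation Date"] = pd.to_datetime(data["Delivery Note Creation Date"], format="%d.%m.%Y")
data["Gr Date"] = pd.to_datetime(data["Gr Date"], format="%d.%m.%Y")
# Steps 1-2: drop rows where either date is missing
# (dropna is an equivalent shortcut for the drop-by-index calls in runtask.py)
data = data.dropna(subset=["Delivery Note Creation Date", "Gr Date"])
# Step 3: transit time in days
data["delta"] = (data["Gr Date"] - data["Delivery Note Creation Date"]).dt.days
# Step 4: summary statistics
print(data["delta"].describe())
```

The third toy row is dropped because its ship date is missing, leaving transit times of 3 and 0 days.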

Pitfalls:
1. All my table columns are STRING. For the values placeholders I first tried %s and %d, but they never matched the format of the corresponding Python values. Switching to ? placeholders fixed it.
2. Writing cur.execute(sql, args) this way is much clearer: there is no need to concatenate a very long SQL string, which is very easy to get wrong.
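The qmark pattern from pitfall 1 can be demonstrated without a live Impala connection. sqlite3 below stands in only because it also accepts ? placeholders; the table and values are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE saplifttime (querypocredatestart TEXT, cnt TEXT)")
args = ["20170301", "55"]
# The driver substitutes each ? with the matching element of args;
# no manual string concatenation or quoting is needed
cur.execute("INSERT INTO saplifttime VALUES (?, ?)", args)
cur.execute("SELECT * FROM saplifttime")
rows = cur.fetchall()
print(rows)
```

Whether a given driver accepts qmark, %s, or both depends on its DB-API paramstyle; the author reports ? working with impyla here.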
