6th China Software Cup: WiFi Probe Data Analysis

Hello Spark_WIFIProbe_Analyse

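Big-data analysis of WiFi probe data, built on Hadoop and Spark.

Written in Scala, version 2.11.0.

Import the database tanzhen.
Copy the contents of the lib package into /home/examples.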

// Analyze the data

spark-submit --master spark://master:7077 --name DataAnalyse --class DataAnalyse --executor-memory 1G --total-executor-cores 2 --jars /home/examples/mysql.jar  /home/examples/WIFIAnalyse.jar

// Store the JSON data

spark-submit --master spark://master:7077 --name JsonTanZhen --class JsonTanZhen --executor-memory 1G --total-executor-cores 2 --jars /home/examples/mysql.jar  /home/examples/WIFIAnalyse.jar hdfs://master:55555/input/data*.txt

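Probe Data Analysis: Full Version

1. Foot traffic: overall traffic for the store or area, and its trend
2. Store entries: traffic that enters the store or area, and its trend
3. Entry rate: the share of all foot traffic that enters the store or area, and its trend
4. Dwell time: how long customers who enter the store stay inside
5. Bounce rate: customers who leave shortly after entering the store, and their share of total foot traffic
6. Deep-visit rate: customers who visit the store in depth, and their share of total foot traffic (determined from the location track or the dwell time)
7. New and returning customers: customers who enter the store for the first time, or for the second time or more, within a given period
8. Visit cycle: the interval between a customer's visit to the store or area and their previous visit
9. Customer activity: customers grouped by the interval since their last visit into activity levels (high, medium, low, dormant)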


Foot Traffic


1. Yesterday's foot traffic for store id, sql="select count(distinct mac) as count from data where to_days(now())-to_days(time) = 1 and id=?"

2. Seven-day foot traffic for store id, sql="select count(distinct mac) as count from data where DATE_SUB(CURDATE(), INTERVAL 7 DAY) <= date(time) and id=?"

3. Monthly (last 30 days) foot traffic for store id, sql="select count(distinct mac) as count from data where DATE_SUB(CURDATE(), INTERVAL 30 DAY) <= date(time) and id=?"

4. Previous calendar month's foot traffic for store id, sql="select count(distinct mac) as count from data where PERIOD_DIFF(date_format(now(),'%Y%m') , date_format(time, '%Y%m' ) ) =1 and id=?"
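
The time windows in queries 1 to 3 can be illustrated without a database. The following is a minimal sketch only: the `Sighting` case class and the `FootTraffic` object are hypothetical names, not part of the project, and the data is assumed to be already loaded into memory.

```scala
import java.time.LocalDate

object FootTraffic {
  // Hypothetical in-memory row: store id, device MAC, and the date of the sighting.
  final case class Sighting(storeId: Int, mac: String, day: LocalDate)

  // Distinct MACs seen exactly one day before `today` (mirrors query 1).
  def yesterday(rows: Seq[Sighting], storeId: Int, today: LocalDate): Int =
    rows.filter(r => r.storeId == storeId && r.day == today.minusDays(1))
      .map(_.mac).distinct.size

  // Distinct MACs seen on or after `today - days` (mirrors queries 2 and 3).
  def lastDays(rows: Seq[Sighting], storeId: Int, today: LocalDate, days: Int): Int = {
    val cutoff = today.minusDays(days.toLong)
    rows.filter(r => r.storeId == storeId && !r.day.isBefore(cutoff))
      .map(_.mac).distinct.size
  }
}
```

Counting distinct MAC addresses, rather than rows, is what keeps a device that is sighted many times from being counted as several visitors.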


Store Entries


5. Yesterday's store entries for store id, sql="select count(distinct mac) as count from data where to_days(now())-to_days(time) = 1 and ranges<=300 and id=?"

6. Seven-day store entries for store id, sql="select count(distinct mac) as count from data where DATE_SUB(CURDATE(), INTERVAL 7 DAY) <= date(time) and ranges<=300 and id=?"

7. Monthly (last 30 days) store entries for store id, sql="select count(distinct mac) as count from data where DATE_SUB(CURDATE(), INTERVAL 30 DAY) <= date(time) and ranges<=300 and id=?"

8. Previous calendar month's store entries for store id, sql="select count(distinct mac) as count from data where PERIOD_DIFF(date_format(now(),'%Y%m') , date_format(time, '%Y%m' ) ) =1 and ranges<=300 and id=?"


Entry Rate

Computed on the client side, as store entries divided by foot traffic.
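
Since the entry rate is the share of foot traffic that enters the store, the client-side step might look like the sketch below. The `EntryRate` object and its guard against zero traffic are assumptions made for illustration; the source does not show the client code.

```scala
object EntryRate {
  // Entry rate = store entries / total foot traffic.
  // Returns 0.0 when there is no traffic, to avoid dividing by zero (an assumed convention).
  def rate(entries: Long, traffic: Long): Double =
    if (traffic <= 0) 0.0 else entries.toDouble / traffic.toDouble
}
```

For example, `EntryRate.rate(30, 120)` evaluates to `0.25`, that is, a quarter of the passing devices came within the entry range.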


Dwell Time


9. Yesterday's dwell-time distribution for store id, sql="select case when cha>=0 and cha<15 then 'a' when cha>=15 and cha<30 then 'b' when cha>=30 and cha<45 then 'c' when cha>=45 and cha<60 then 'd' when cha>=60 then 'e' end as type,count(*) as count from (SELECT TIMESTAMPDIFF(MINUTE, min(time), max(time)) as cha from data where to_days(now())-to_days(time) = 1 and ranges<=300 and id=? GROUP by mac) as total group by (case when cha>=0 and cha<15 then 'a' when cha>=15 and cha<30 then 'b' when cha>=30 and cha<45 then 'c' when cha>=45 and cha<60 then 'd' when cha>=60 then 'e' end)"
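
The bucket labels in query 9 stand for dwell times of 0 to 15, 15 to 30, 30 to 45, 45 to 60, and 60 or more minutes. As a rough sketch, assuming per-visitor dwell times in minutes are already available, the same bucketing can be written as follows (`DwellTime` is a hypothetical helper, not project code):

```scala
object DwellTime {
  // Maps a dwell time in minutes to the label used by query 9; negative input has no bucket.
  def bucket(minutes: Long): Option[Char] =
    if (minutes < 0) None
    else if (minutes < 15) Some('a')
    else if (minutes < 30) Some('b')
    else if (minutes < 45) Some('c')
    else if (minutes < 60) Some('d')
    else Some('e')

  // Counts visitors per bucket, given one dwell time per visitor.
  def histogram(dwellMinutes: Seq[Long]): Map[Char, Int] =
    dwellMinutes.flatMap(m => bucket(m))
      .groupBy(identity)
      .map { case (label, hits) => label -> hits.size }
}
```

Buckets with no visitors are simply absent from the resulting map, which matches how the SQL `group by` omits empty groups.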


New and Returning Customers


10. Yesterday's returning customers for store id, sql="select count(*) as count from (select mac,count(*) as count from
(select mac from data where to_days(now())-to_days(time) = 1 and ranges<=300 and id=? group by mac
union all
select mac from data where to_days(now())-to_days(time) > 1 and ranges<=300 and id=? group by mac)as total group by mac having count>1)as totals"


11. Yesterday's new customers for store id, sql="select count(distinct d.mac) as count from data d
where to_days(now())-to_days(d.time) = 1 and d.ranges<=300 and d.id=?
and not exists
(select 1 from data p where p.mac = d.mac and to_days(now())-to_days(p.time) > 1 and p.ranges<=300 and p.id=?)"
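
The distinction between new and returning customers comes down to set operations on MAC addresses. The sketch below assumes the two MAC sets (yesterday's visitors, and visitors seen earlier) have already been fetched; `NewVsReturning` is a hypothetical name used only for illustration.

```scala
object NewVsReturning {
  // Splits yesterday's visitors into (new, returning).
  // A MAC seen only before yesterday belongs to neither group.
  def split(yesterday: Set[String], earlier: Set[String]): (Set[String], Set[String]) = {
    val returning = yesterday intersect earlier // seen yesterday and also before
    val fresh = yesterday diff earlier          // seen yesterday for the first time
    (fresh, returning)
  }
}
```

A new-customer count therefore has to start from yesterday's visitors; counting every MAC that appears in exactly one of the two sets would also pick up past visitors who did not come yesterday.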


Bounce Count


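12. Yesterday's bounce rate for store id (the query returns the number of visitors who stayed less than 5 minutes), sql="select count(*) as count from (SELECT TIMESTAMPDIFF(MINUTE, min(time), max(time)) as cha from data where to_days(now())-to_days(time) = 1 and ranges<=300 and id=? GROUP by mac)as total where cha>=0 and cha<5"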


Deep-Visit Count


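13. Yesterday's deep-visit rate for store id (the query returns the number of visitors who stayed 30 minutes or longer), sql="select count(*) as count from (SELECT TIMESTAMPDIFF(MINUTE, min(time), max(time)) as cha from data where to_days(now())-to_days(time) = 1 and ranges<=300 and id=? GROUP by mac)as total where cha>=30"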
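
Queries 12 and 13 return head counts; turning them into rates means dividing by total foot traffic, as the metric definitions state. A minimal sketch of that step, assuming per-visitor dwell times in minutes are available (`VisitDepth` and its member names are hypothetical):

```scala
object VisitDepth {
  // Thresholds as used in queries 12 and 13:
  // under 5 minutes is a bounce, 30 minutes or more is a deep visit.
  val BounceBelowMinutes = 5L
  val DeepFromMinutes = 30L

  def bounceCount(dwell: Seq[Long]): Int =
    dwell.count(m => m >= 0 && m < BounceBelowMinutes)

  def deepCount(dwell: Seq[Long]): Int =
    dwell.count(_ >= DeepFromMinutes)

  // Share of total foot traffic; 0.0 when there is no traffic (an assumed convention).
  def share(count: Int, totalTraffic: Int): Double =
    if (totalTraffic <= 0) 0.0 else count.toDouble / totalTraffic
}
```

Note that the denominator is total foot traffic, not the number of visitors who entered the store.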


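Visit Cycle


14. Visit cycle for store id:
activity over the previous seven days (number of visitors, grouped by how many of those days they visited)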

sql="select case when cha>=1 and cha<2 then 'a' when cha>=2 and cha<3 then 'b' when cha>=3 and cha<4 then 'c' when cha>=4 and cha<7 then 'd' when cha>=7 then 'e' end type,count(*) as count from (select mac,count(*) as cha from
(select mac from data where to_days(now())-to_days(time) = 1 and ranges<=300 and id=? group by mac
union all
select mac from data where to_days(now())-to_days(time) = 2 and ranges<=300 and id=? group by mac
union all
select mac from data where to_days(now())-to_days(time) = 3 and ranges<=300 and id=? group by mac
union all
select mac from data where to_days(now())-to_days(time) = 4 and ranges<=300 and id=? group by mac
union all
select mac from data where to_days(now())-to_days(time) = 5 and ranges<=300 and id=? group by mac
union all
select mac from data where to_days(now())-to_days(time) = 6 and ranges<=300 and id=? group by mac
union all
select mac from data where to_days(now())-to_days(time) = 7 and ranges<=300 and id=? group by mac)as total group by mac)as totals group by (case when cha>=1 and cha<2 then 'a' when cha>=2 and cha<3 then 'b' when cha>=3 and cha<4 then 'c' when cha>=4 and cha<7 then 'd' when cha>=7 then 'e' end)"


sql="select mac,count(*) as count from
(select mac from data where to_days(now())-to_days(time) = 1 and ranges<=300 and id=? group by mac
union all
select mac from data where to_days(now())-to_days(time) = 2 and ranges<=300 and id=? group by mac
union all
select mac from data where to_days(now())-to_days(time) = 3 and ranges<=300 and id=? group by mac
union all
select mac from data where to_days(now())-to_days(time) = 4 and ranges<=300 and id=? group by mac
union all
select mac from data where to_days(now())-to_days(time) = 5 and ranges<=300 and id=? group by mac
union all
select mac from data where to_days(now())-to_days(time) = 6 and ranges<=300 and id=? group by mac
union all
select mac from data where to_days(now())-to_days(time) = 7 and ranges<=300 and id=? group by mac)as total group by mac"
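
Both statements above count, for each MAC, on how many of the previous seven days it was seen; the first then groups those counts into buckets. The sketch below reproduces that logic in memory. It assumes each sighting has been reduced to a hypothetical `(mac, daysAgo)` pair, where `daysAgo = 1` means yesterday; `VisitCycle` is likewise an illustrative name.

```scala
object VisitCycle {
  // Number of distinct days (within the previous 7) on which each MAC was seen.
  def daysVisited(sightings: Seq[(String, Int)]): Map[String, Int] =
    sightings.filter { case (_, daysAgo) => daysAgo >= 1 && daysAgo <= 7 }
      .distinct // several sightings on one day count once
      .groupBy(_._1)
      .map { case (mac, rows) => mac -> rows.size }

  // Bucket labels as in the first statement: 1, 2, 3, 4 to 6, and 7 days.
  def bucket(days: Int): Option[Char] =
    if (days < 1) None
    else if (days == 1) Some('a')
    else if (days == 2) Some('b')
    else if (days == 3) Some('c')
    else if (days < 7) Some('d')
    else Some('e')

  // Number of visitors per bucket.
  def histogram(sightings: Seq[(String, Int)]): Map[Char, Int] =
    daysVisited(sightings).values.toSeq
      .flatMap(d => bucket(d))
      .groupBy(identity)
      .map { case (label, hits) => label -> hits.size }
}
```

The `distinct` step plays the role of the per-day `group by mac` in each branch of the `union all`.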


Customer Activity


Computed on the client side.
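
Customer activity groups customers into high, medium, low, and dormant levels by the interval since their last visit. The source does not give the cut-off values, so the 7, 30, and 90 day thresholds in this sketch are purely illustrative assumptions, as are the `CustomerActivity` names.

```scala
object CustomerActivity {
  sealed trait Level
  case object High extends Level
  case object Medium extends Level
  case object Low extends Level
  case object Dormant extends Level

  // ASSUMPTION: the 7/30/90-day cut-offs are illustrative only;
  // the original document does not specify them.
  def classify(daysSinceLastVisit: Int): Level =
    if (daysSinceLastVisit <= 7) High
    else if (daysSinceLastVisit <= 30) Medium
    else if (daysSinceLastVisit <= 90) Low
    else Dormant
}
```

In practice the thresholds would be tuned to the store's typical visit frequency.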

