Flink User Portrait

The components we will use are Hadoop 2.6, HBase 1.0.0, MySQL 8, ZooKeeper 3.4.5, Kafka 2.1.0, Flink 1.13, and Canal 1.1.5. For convenience, everything here runs as a pseudo-cluster or standalone install.

A simple Hadoop 2.6 installation

hadoop-env.sh

export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_172.jdk/Contents/Home

core-site.xml

<configuration>
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://127.0.0.1:9000</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/Users/admin/Downloads/hadoop2</value>
</property>
</configuration>

hdfs-site.xml

<configuration>
	<property>
		<name>dfs.replication</name>
		<value>1</value>
	</property>
	<property>
		<name>dfs.permissions</name>
		<value>false</value>
	</property>
</configuration>

Run in the bin directory:

./hdfs namenode -format

Then run in the sbin directory:

./start-dfs.sh

Access URL:

http://127.0.0.1:50070/
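If the NameNode came up correctly, HDFS should also answer on the command line, e.g. in the bin directory:

./hdfs dfs -ls /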

ZooKeeper 3.4.5 installation

zoo.cfg

dataDir=/Users/admin/Downloads/zookeeper/data

Run in the bin directory:

./zkServer.sh start
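You can confirm it is running with:

./zkServer.sh status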

HBase 1.0.0 installation

hbase-env.sh

export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_172.jdk/Contents/Home

hbase-site.xml

<configuration>
	<property>
		<name>hbase.rootdir</name>
		<value>hdfs://127.0.0.1:9000/hbase</value>
	</property>
	<property>
		<name>hbase.cluster.distributed</name>
		<value>true</value>
	</property>
	<property>
		<name>hbase.zookeeper.quorum</name>
		<value>localhost</value>
	</property>
	<property>
		<name>dfs.replication</name>
		<value>1</value>
	</property>
</configuration>

Run in the bin directory:

./start-hbase.sh

Access URL:

http://127.0.0.1:60010/

Kafka 2.1.0 installation

server.properties

log.dirs=/Users/admin/Downloads/kafka-logs

Run in the bin directory:

./kafka-server-start.sh ../config/server.properties 
./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test
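To verify the topic end to end, type a few messages into a console producer in another terminal; they should show up in the consumer above:

./kafka-console-producer.sh --broker-list localhost:9092 --topic test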

MySQL 8 installation with Docker

Create a folder (mine is called mysql-bin) and in it a file my.cnf with the following content:

[client]
socket = /var/sock/mysqld/mysqld.sock
[mysql]
socket = /var/sock/mysqld/mysqld.sock
[mysqld]
skip-host-cache
skip-name-resolve
datadir = /var/lib/mysql
user = mysql
port = 3306
bind-address = 0.0.0.0
socket = /var/sock/mysqld/mysqld.sock
pid-file = /var/run/mysqld/mysqld.pid
general_log_file = /var/log/mysql/query.log
slow_query_log_file = /var/log/mysql/slow.log
log-error = /var/log/mysql/error.log
log-bin=mysql-bin
binlog-format=ROW
server-id=1
!includedir /etc/my.cnf.d/
!includedir /etc/mysql/conf.d/
!includedir /etc/mysql/docker-default.d/

Start command:

docker run -d --name mysql -e MYSQL_ROOT_PASSWORD=abcd123 -p 3306:3306 -v /Users/admin/Downloads/mysql-bin/my.cnf:/etc/my.cnf docker.io/cytopia/mysql-8.0
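Once the container is up, you can confirm that the binlog is enabled (using the container name and root password from the command above):

docker exec -it mysql mysql -uroot -pabcd123 -e "SHOW VARIABLES LIKE 'log_bin'"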

Create a database named portrait.

Create the table:

DROP TABLE IF EXISTS `user_info`;
CREATE TABLE `user_info` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `account` varchar(255) DEFAULT NULL,
  `password` varchar(255) DEFAULT NULL,
  `sex` varchar(255) DEFAULT NULL,
  `age` int(11) DEFAULT NULL,
  `phone` varchar(255) DEFAULT NULL,
  `status` int(255) DEFAULT NULL COMMENT 'Membership status: 0 regular member, 1 silver member, 2 gold member',
  `wechat_account` varchar(255) DEFAULT NULL,
  `zhifubao_account` varchar(255) DEFAULT NULL,
  `email` varchar(255) DEFAULT NULL,
  `create_time` datetime DEFAULT NULL,
  `update_time` datetime DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8mb4;

SET FOREIGN_KEY_CHECKS = 1;

Canal 1.1.5 installation

canal.properties

canal.zkServers = 127.0.0.1:2181
canal.serverMode = kafka

In instance.properties under the conf/example directory:

canal.instance.master.address=127.0.0.1:3306
canal.instance.dbUsername=root
canal.instance.dbPassword=abcd123
canal.instance.defaultDatabaseName=portrait
canal.mq.topic=test

Run in the bin directory:

./startup.sh
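If the instance starts cleanly, its log (logs/example/example.log under the Canal home, by default) should show Canal connecting to MySQL as a replication client:

tail -f ../logs/example/example.log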

Now, when we insert a row into the database:

insert into user_info (account,password,sex,age,phone,status,wechat_account,zhifubao_account,email,create_time,update_time) 
values ('abcd','1234','男',24,'13873697762',0,'火名之月','abstart','[email protected]','2021-09-10','2021-10-11')

the Kafka consumer shows the row-change event emitted by Canal (the sqlType values are java.sql.Types codes: -5 BIGINT, 12 VARCHAR, 4 INTEGER, 93 TIMESTAMP):

[2021-11-05 15:13:05,173] INFO [GroupMetadataManager brokerId=0] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
{
"data":[
{
"id":"8",
"account":"abcd",
"password":"1234",
"sex":"男",
"age":"24",
"phone":"13873697762",
"status":"0",
"wechat_account":"火名之月",
"zhifubao_account":"abstart",
"email":"[email protected]",
"create_time":"2021-09-10 00:00:00",
"update_time":"2021-10-11 00:00:00"
}
],
"database":"portrait",
"es":1636096762000,
"id":11,
"isDdl":false,
"mysqlType":{
"id":"bigint(0)",
"account":"varchar(255)",
"password":"varchar(255)",
"sex":"varchar(255)",
"age":"int(0)",
"phone":"varchar(255)",
"status":"int(255)",
"wechat_account":"varchar(255)",
"zhifubao_account":"varchar(255)",
"email":"varchar(255)",
"create_time":"datetime(0)",
"update_time":"datetime(0)"
},
"old":null,
"pkNames":[
"id"
],
"sql":"",
"sqlType":{
"id":-5,
"account":12,
"password":12,
"sex":12,
"age":4,
"phone":12,
"status":4,
"wechat_account":12,
"zhifubao_account":12,
"email":12,
"create_time":93,
"update_time":93
},
"table":"user_info",
"ts":1636096762605,
"type":"INSERT"
}

Stream-processing the messages with Flink

The Java dependencies. For more detail on Flink itself, see the earlier article Flink技術整理; since 1.13.0 is used here rather than the 1.7.2 used there, some of the old APIs are no longer available.

<properties>
   <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
   <flink.version>1.13.0</flink.version>
   <alink.version>1.4.0</alink.version>
   <fastjson.version>1.2.74</fastjson.version>
   <java.version>1.8</java.version>
   <scala.version>2.11.12</scala.version>
   <hadoop.version>2.6.0</hadoop.version>
   <hbase.version>1.0.0</hbase.version>
   <scala.binary.version>2.11</scala.binary.version>
   <maven.compiler.source>${java.version}</maven.compiler.source>
   <maven.compiler.target>${java.version}</maven.compiler.target>
</properties>
<dependencies>
   <!-- Apache Flink dependencies -->
   <!-- These dependencies are provided, because they should not be packaged into the JAR file. -->
   <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-java</artifactId>
      <version>${flink.version}</version>
      <scope>provided</scope>
   </dependency>
   <dependency>
      <groupId>com.alibaba.alink</groupId>
      <artifactId>alink_core_flink-1.13_2.11</artifactId>
      <version>${alink.version}</version>
   </dependency>
   <dependency>
      <groupId>ru.yandex.clickhouse</groupId>
      <artifactId>clickhouse-jdbc</artifactId>
      <version>0.1.40</version>
   </dependency>
   <dependency>
      <groupId>com.alibaba</groupId>
      <artifactId>fastjson</artifactId>
      <version>${fastjson.version}</version>
   </dependency>
   <dependency>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
      <version>15.0</version>
   </dependency>
   <dependency>
      <groupId>org.apache.commons</groupId>
      <artifactId>commons-compress</artifactId>
      <version>1.21</version>
   </dependency>
   <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
      <version>${flink.version}</version>
   </dependency>
   <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-clients_${scala.binary.version}</artifactId>
      <version>${flink.version}</version>
   </dependency>
   <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-scala_${scala.binary.version}</artifactId>
      <version>${flink.version}</version>
   </dependency>
   <dependency>
      <groupId>org.apache.hbase</groupId>
      <artifactId>hbase-client</artifactId>
      <version>${hbase.version}</version>
   </dependency>
   <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-streaming-scala_${scala.binary.version}</artifactId>
      <version>${flink.version}</version>
   </dependency>
   <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.version}</version>
   </dependency>
   <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-table-planner_${scala.binary.version}</artifactId>
      <version>${flink.version}</version>
   </dependency>
   <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>${hadoop.version}</version>
   </dependency>
   <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>${hadoop.version}</version>
   </dependency>
   <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-hdfs</artifactId>
      <version>${hadoop.version}</version>
   </dependency>
   <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-connector-kafka_2.11</artifactId>
      <version>${flink.version}</version>
   </dependency>
   <dependency>
      <groupId>org.apache.kafka</groupId>
      <artifactId>kafka-clients</artifactId>
      <version>1.1.1</version>
   </dependency>
   <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-statebackend-rocksdb_2.11</artifactId>
      <version>${flink.version}</version>
   </dependency>
   <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-connector-elasticsearch6_2.11</artifactId>
      <version>${flink.version}</version>
   </dependency>
   <dependency>
      <groupId>mysql</groupId>
      <artifactId>mysql-connector-java</artifactId>
      <version>8.0.11</version>
   </dependency>
   <dependency>
      <groupId>org.projectlombok</groupId>
      <artifactId>lombok</artifactId>
      <version>1.18.16</version>
      <optional>true</optional>
   </dependency>
   <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-log4j12</artifactId>
      <version>1.7.7</version>
      <scope>runtime</scope>
   </dependency>
   <dependency>
      <groupId>log4j</groupId>
      <artifactId>log4j</artifactId>
      <version>1.2.17</version>
      <scope>runtime</scope>
   </dependency>
</dependencies>

First, let's read the Kafka messages with Flink:

public class Test {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers","127.0.0.1:9092");
        properties.setProperty("group.id","portrait");
        FlinkKafkaConsumer<String> myConsumer = new FlinkKafkaConsumer<>("test",
                new SimpleStringSchema(),properties);

        DataStreamSource<String> data = env.addSource(myConsumer);
        env.enableCheckpointing(5000);
        data.print();
        env.execute("portrait test");
    }
}

The output:

16:39:42,070 INFO  org.apache.kafka.clients.consumer.internals.AbstractCoordinator  - [Consumer clientId=consumer-21, groupId=portrait] Discovered group coordinator admindembp.lan:9092 (id: 2147483647 rack: null)
15> {"data":[{"id":"8","account":"abcd","password":"1234","sex":"男","age":"24","phone":"13873697762","status":"0","wechat_account":"火名之月","zhifubao_account":"abstart","email":"[email protected]","create_time":"2021-09-10 00:00:00","update_time":"2021-10-11 00:00:00"}],"database":"portrait","es":1636096762000,"id":11,"isDdl":false,"mysqlType":{"id":"bigint(0)","account":"varchar(255)","password":"varchar(255)","sex":"varchar(255)","age":"int(0)","phone":"varchar(255)","status":"int(255)","wechat_account":"varchar(255)","zhifubao_account":"varchar(255)","email":"varchar(255)","create_time":"datetime(0)","update_time":"datetime(0)"},"old":null,"pkNames":["id"],"sql":"","sqlType":{"id":-5,"account":12,"password":12,"sex":12,"age":4,"phone":12,"status":4,"wechat_account":12,"zhifubao_account":12,"email":12,"create_time":93,"update_time":93},"table":"user_info","ts":1636096762605,"type":"INSERT"}

Now let's parse this data and store it in HBase.

Create a UserInfo entity class:

@Data
@ToString
public class UserInfo {
    private Long id;
    private String account;
    private String password;
    private String sex;
    private Integer age;
    private String phone;
    private Integer status;
    private String wechatAccount;
    private String zhifubaoAccount;
    private String email;
    private Date createTime;
    private Date updateTime;
}

And an HBase utility class:

@Slf4j
public class HbaseUtil {
    private static Admin admin = null;
    private static Connection conn = null;

    static {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.rootdir","hdfs://127.0.0.1:9000/hbase");
        conf.set("hbase.zookeeper.quorum","127.0.0.1");
        conf.set("hbase.client.scanner.timeout.period","600000");
        conf.set("hbase.rpc.timeout","600000");
        try {
            conn = ConnectionFactory.createConnection(conf);
            admin = conn.getAdmin();
        }catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void createTable(String tableName,String famliyname) throws IOException {
        HTableDescriptor tab = new HTableDescriptor(tableName);
        HColumnDescriptor colDesc = new HColumnDescriptor(famliyname);
        tab.addFamily(colDesc);
        admin.createTable(tab);
        log.info("over");
    }

    public static void put(String tablename, String rowkey, String famliyname, Map<String,String> datamap) throws IOException {
        Table table = conn.getTable(TableName.valueOf(tablename));
        byte[] rowkeybyte = Bytes.toBytes(rowkey);
        Put put = new Put(rowkeybyte);
        if (datamap != null) {
            Set<Map.Entry<String,String>> set = datamap.entrySet();
            for (Map.Entry<String,String> entry : set) {
                String key = entry.getKey();
                Object value = entry.getValue();
                put.addColumn(Bytes.toBytes(famliyname),Bytes.toBytes(key),
                        Bytes.toBytes(value + ""));
            }
        }
        table.put(put);
        table.close();
        log.info("OK");
    }

    public static String getdata(String tablename,String rowkey,
                                 String famliyname,String colmn) throws IOException {
        Table table = conn.getTable(TableName.valueOf(tablename));
        byte[] rowkeybyte = Bytes.toBytes(rowkey);
        Get get = new Get(rowkeybyte);
        Result result = table.get(get);
        byte[] resultbytes = result.getValue(famliyname.getBytes(),colmn.getBytes());
        if (resultbytes == null) {
            return null;
        }
        return new String(resultbytes);
    }

    public static void putdata(String tablename,String rowkey,
                               String famliyname,String colum,
                               String data) throws IOException {
        Table table = conn.getTable(TableName.valueOf(tablename));
        Put put = new Put(rowkey.getBytes());
        put.addColumn(famliyname.getBytes(),colum.getBytes(),data.getBytes());
        table.put(put);
    }

    public static void main(String[] args) throws IOException {
//        createTable("testinfo","time");
        putdata("testinfo","1","time","info","ty");
//        Map<String,String> datamap = new HashMap<>();
//        datamap.put("info1","ty1");
//        datamap.put("info2","ty2");
//        put("testinfo","2","time",datamap);
        String result = getdata("testinfo","1","time","info");
        log.info(result);
    }
}

Run in HBase's bin directory:

./hbase shell
create "user_info","info"
Then a Flink job parses the Canal messages and writes each row into HBase:

@Slf4j
public class TranferAnaly {

    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers","127.0.0.1:9092");
        properties.setProperty("group.id","portrait");
        FlinkKafkaConsumer<String> myConsumer = new FlinkKafkaConsumer<>("test",
                new SimpleStringSchema(),properties);

        DataStreamSource<String> data = env.addSource(myConsumer);
        env.enableCheckpointing(5000);
        DataStream<String> map = data.map(s -> {
            JSONObject jsonObject = JSONObject.parseObject(s);
            String type = jsonObject.getString("type");
            String table = jsonObject.getString("table");
            String database = jsonObject.getString("database");
            String data1 = jsonObject.getString("data");
            List<UserInfo> list = JSONObject.parseArray(data1,UserInfo.class);
            log.info(list.toString());
            for (UserInfo userInfo : list) {
                String tablename = table;
                String rowkey = userInfo.getId() + "";
                String famliyname = "info";
                Map<String,String> datamap = JSONObject.parseObject(JSONObject.toJSONString(userInfo),Map.class);
                datamap.put("database",database);
                datamap.put("typebefore",HbaseUtil.getdata(tablename,rowkey,famliyname,"typecurrent"));
                datamap.put("typecurrent",type);
                HbaseUtil.put(tablename,rowkey,famliyname,datamap);
            }
            // a Flink MapFunction must not emit null records, so emit an empty string instead
            return "";
        });
//        map.print();
        env.execute("portrait test");
    }
}

Querying it in HBase:

scan 'user_info'
ROW                                                          COLUMN+CELL                                                                                                                                                                      
 12                                                          column=info:account, timestamp=1636105093607, value=abcd                                                                                                                         
 12                                                          column=info:age, timestamp=1636105093607, value=24                                                                                                                               
 12                                                          column=info:createTime, timestamp=1636105093607, value=1631203200000                                                                                                             
 12                                                          column=info:database, timestamp=1636105093607, value=portrait                                                                                                                    
 12                                                          column=info:email, timestamp=1636105093607, [email protected]                                                                                                                  
 12                                                          column=info:id, timestamp=1636105093607, value=12                                                                                                                                
 12                                                          column=info:password, timestamp=1636105093607, value=1234                                                                                                                        
 12                                                          column=info:phone, timestamp=1636105093607, value=13873697762                                                                                                                    
 12                                                          column=info:sex, timestamp=1636105093607, value=\xE7\x94\xB7                                                                                                                     
 12                                                          column=info:status, timestamp=1636105093607, value=0                                                                                                                             
 12                                                          column=info:typebefore, timestamp=1636105093607, value=null                                                                                                                      
 12                                                          column=info:typecurrent, timestamp=1636105093607, value=INSERT                                                                                                                   
 12                                                          column=info:updateTime, timestamp=1636105093607, value=1633881600000                                                                                                             
 12                                                          column=info:wechatAccount, timestamp=1636105093607, value=\xE7\x81\xAB\xE5\x90\x8D\xE4\xB9\x8B\xE6\x9C\x88                                                                       
 12                                                          column=info:zhifubaoAccount, timestamp=1636105093607, value=abstart                                                                                                              
1 row(s) in 0.0110 seconds

Now let's forward the data onward.

Run in Kafka's bin directory to create a new topic:

./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic user_info
./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic user_info

Add a Kafka utility class:

@Slf4j
public class KafkaUtil {
    private static Properties getProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers","127.0.0.1:9092");
        props.put("acks","all");
        props.put("retries",2);
        props.put("linger.ms",1000);
        props.put("client.id","producer-syn-1");
        props.put("key.serializer","org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer","org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }

    public static void sendData(String topicName,String data) throws ExecutionException, InterruptedException {
        // Note: creating a producer per message keeps the demo simple but is expensive;
        // a shared long-lived producer would be preferable in real code.
        KafkaProducer<String,String> producer = new KafkaProducer<>(getProps());
        ProducerRecord<String,String> record = new ProducerRecord<>(topicName,data);
        Future<RecordMetadata> metadataFuture = producer.send(record);
        RecordMetadata recordMetadata = metadataFuture.get();
        log.info("topic:" + recordMetadata.topic());
        log.info("partition:" + recordMetadata.partition());
        log.info("offset:" + recordMetadata.offset());
        producer.close();
    }
}

Then send the messages out:

@Slf4j
public class TranferAnaly {

    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers","127.0.0.1:9092");
        properties.setProperty("group.id","portrait");
        FlinkKafkaConsumer<String> myConsumer = new FlinkKafkaConsumer<>("test",
                new SimpleStringSchema(),properties);

        DataStreamSource<String> data = env.addSource(myConsumer);
        env.enableCheckpointing(5000);
        DataStream<String> map = data.map(s -> {
            JSONObject jsonObject = JSONObject.parseObject(s);
            String type = jsonObject.getString("type");
            String table = jsonObject.getString("table");
            String database = jsonObject.getString("database");
            String data1 = jsonObject.getString("data");
            List<UserInfo> list = JSONObject.parseArray(data1,UserInfo.class);
            List<Map<String,String>> listdata = new ArrayList<>();
            log.info(list.toString());
            for (UserInfo userInfo : list) {
                String tablename = table;
                String rowkey = userInfo.getId() + "";
                String famliyname = "info";
                Map<String,String> datamap = JSONObject.parseObject(JSONObject.toJSONString(userInfo),Map.class);
                datamap.put("database",database);
                datamap.put("typebefore",HbaseUtil.getdata(tablename,rowkey,famliyname,"typecurrent"));
                datamap.put("typecurrent",type);
                datamap.put("tablename",table);
                HbaseUtil.put(tablename,rowkey,famliyname,datamap);
                listdata.add(datamap);
            }
            return JSONObject.toJSONString(listdata);
        });
        map.addSink(new SinkFunction<String>() {
            @Override
            public void invoke(String value, Context context) throws Exception {
                List<Map> data = JSONObject.parseArray(value,Map.class);
                for (Map<String,String> map : data) {
                    String tablename = map.get("tablename");
                    KafkaUtil.sendData(tablename,JSONObject.toJSONString(map));
                }
            }
        });
        env.execute("portrait test");
    }
}

On the Kafka consumer side:

[2021-11-05 20:11:26,869] INFO [GroupCoordinator 0]: Assignment received from leader for group console-consumer-47692 for generation 1 (kafka.coordinator.group.GroupCoordinator)
{"wechatAccount":"火名之月","sex":"男","zhifubaoAccount":"abstart","updateTime":1633881600000,"password":"1234","database":"portrait","createTime":1631203200000,"phone":"13873697762","typecurrent":"INSERT","id":15,"tablename":"user_info","account":"abcd","age":24,"email":"[email protected]","status":0}

Creating the user-portrait Years (decade) label

Create a decade-label entity class:

@Data
public class Years {
    private Long userid;
    private String yearsFlag;
    private Long numbers = 0L;
    private String groupField;
}

Create a YearsUntil utility class:

public class YearsUntil {
    public static String getYears(Integer age) {
        Calendar calendar = Calendar.getInstance();
        calendar.setTime(new Date());
        calendar.add(Calendar.YEAR,-age);
        Date newDate = calendar.getTime();
        DateFormat dateFormat = new SimpleDateFormat("yyyy");
        String newDateString = dateFormat.format(newDate);
        Integer newDateInteger = Integer.parseInt(newDateString);
        String yearBaseType = "未知";
        if (newDateInteger >= 1940 && newDateInteger < 1950) {
            yearBaseType = "40後";
        }else if (newDateInteger >= 1950 && newDateInteger < 1960) {
            yearBaseType = "50後";
        }else if (newDateInteger >= 1960 && newDateInteger < 1970) {
            yearBaseType = "60後";
        }else if (newDateInteger >= 1970 && newDateInteger < 1980) {
            yearBaseType = "70後";
        }else if (newDateInteger >= 1980 && newDateInteger < 1990) {
            yearBaseType = "80後";
        }else if (newDateInteger >= 1990 && newDateInteger < 2000) {
            yearBaseType = "90後";
        }else if (newDateInteger >= 2000 && newDateInteger < 2010) {
            yearBaseType = "00後";
        }else if (newDateInteger >= 2010 && newDateInteger < 2020) {
            yearBaseType = "10後";
        }
        return yearBaseType;
    }
}
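A quick sanity check of the decade computation (assuming it runs in 2021, so an age of 24 maps to a birth year of 1997):

public class YearsUntilTest {
    public static void main(String[] args) {
        // 2021 - 24 = 1997, which falls in [1990, 2000), so this prints 90後
        System.out.println(YearsUntil.getYears(24));
    }
}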

Create a ClickUntil interface:

public interface ClickUntil {
    void saveData(String tablename,Map<String,String> data,Set<String> fields);
    ResultSet getQueryResult(String database, String sql) throws Exception;
}

And an implementation class:

public class DefaultClickUntil implements ClickUntil {
    private static ClickUntil instance = new DefaultClickUntil();

    public static ClickUntil createInstance() {
        return instance;
    }

    private DefaultClickUntil() {

    }

    @Override
    public void saveData(String tablename, Map<String, String> data, Set<String> fields) {

    }

    @Override
    public ResultSet getQueryResult(String database, String sql) throws Exception {
        return null;
    }
}

We leave the interface methods unimplemented here; another implementation class will take its place later on.
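For orientation only, here is a minimal sketch of what a ClickHouse-backed implementation might look like, using the clickhouse-jdbc driver already declared in the pom. The URL, database, and plain-INSERT semantics are illustrative assumptions, not the final implementation:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class ClickHouseClickUntil implements ClickUntil {
    // assumed local ClickHouse HTTP endpoint; adjust to your environment
    private static final String URL = "jdbc:clickhouse://127.0.0.1:8123/default";

    @Override
    public void saveData(String tablename, Map<String, String> data, Set<String> fields) {
        // the fields parameter (key columns) is ignored in this plain-INSERT sketch
        String columns = String.join(",", data.keySet());
        String placeholders = data.keySet().stream().map(k -> "?")
                .collect(Collectors.joining(","));
        String sql = "INSERT INTO " + tablename + " (" + columns + ") VALUES (" + placeholders + ")";
        try (Connection conn = DriverManager.getConnection(URL);
             PreparedStatement ps = conn.prepareStatement(sql)) {
            int i = 1;
            for (String key : data.keySet()) {
                ps.setString(i++, data.get(key));
            }
            ps.execute();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    @Override
    public ResultSet getQueryResult(String database, String sql) throws Exception {
        Connection conn = DriverManager.getConnection("jdbc:clickhouse://127.0.0.1:8123/" + database);
        return conn.createStatement().executeQuery(sql);
    }
}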

A ClickUntilFactory factory class:

public class ClickUntilFactory {
    public static ClickUntil createClickUntil() {
        return DefaultClickUntil.createInstance();
    }
}

A DateUntil utility class:

public class DateUntil {
    public static String getByInterMinute(String timeInfo) {
        Long timeMillons = Long.parseLong(timeInfo);
        Date date = new Date(timeMillons);
        DateFormat dateFormatMinute = new SimpleDateFormat("mm");
        DateFormat dateFormatHour = new SimpleDateFormat("yyyyMMddHH");
        String minute = dateFormatMinute.format(date);
        String hour = dateFormatHour.format(date);
        Long minuteLong = Long.parseLong(minute);
        String replaceMinute = "";
        if (minuteLong >= 0 && minuteLong < 5) {
            replaceMinute = "05";
        }else if (minuteLong >= 5 && minuteLong < 10) {
            replaceMinute = "10";
        }else if (minuteLong >= 10 && minuteLong < 15) {
            replaceMinute = "15";
        }else if (minuteLong >= 15 && minuteLong < 20) {
            replaceMinute = "20";
        }else if (minuteLong >= 20 && minuteLong < 25) {
            replaceMinute = "25";
        }else if (minuteLong >= 25 && minuteLong < 30) {
            replaceMinute = "30";
        }else if (minuteLong >= 30 && minuteLong < 35) {
            replaceMinute = "35";
        }else if (minuteLong >= 35 && minuteLong < 40) {
            replaceMinute = "40";
        }else if (minuteLong >= 40 && minuteLong < 45) {
            replaceMinute = "45";
        }else if (minuteLong >= 45 && minuteLong < 50) {
            replaceMinute = "50";
        }else if (minuteLong >= 50 && minuteLong < 55) {
            replaceMinute = "55";
        }else if (minuteLong >= 55 && minuteLong < 60) {
            replaceMinute = "60";
        }
        return hour + replaceMinute;
    }

    public static Long getCurrentFiveMinuteInterStart(Long visitTime) throws ParseException {
        String timeString = getByInterMinute(visitTime + "");
        DateFormat dateFormat = new SimpleDateFormat("yyyyMMddHHmm");
        Date date = dateFormat.parse(timeString);
        return date.getTime();
    }
}
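For example, a timestamp whose minute falls in [10, 15) is bucketed into the "15" slot:

public class DateUntilTest {
    public static void main(String[] args) throws Exception {
        // 2021-11-05 15:13:05 -> minute 13 is in [10, 15), so the bucket is yyyyMMddHH + "15"
        long ts = new java.text.SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
                .parse("2021-11-05 15:13:05").getTime();
        System.out.println(DateUntil.getByInterMinute(ts + "")); // 202111051515
    }
}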

A YearsAnalyMap transformation class implementing the MapFunction interface:

public class YearsAnalyMap implements MapFunction<String,Years> {
    private ClickUntil clickUntil = ClickUntilFactory.createClickUntil();

    @Override
    @SuppressWarnings("unchecked")
    public Years map(String s) throws Exception {
        Map<String,String> datamap = JSONObject.parseObject(s,Map.class);
        String typecurrent = datamap.get("typecurrent");
        Years years = new Years();
        if (typecurrent.equals("INSERT")) {
            UserInfo userInfo = JSONObject.parseObject(s,UserInfo.class);
            String yearLabel = YearsUntil.getYears(userInfo.getAge());
            Map<String,String> mapdata = new HashMap<>();
            mapdata.put("userid",userInfo.getId() + "");
            mapdata.put("yearlabel",yearLabel);
            Set<String> fields = new HashSet<>();
            fields.add("userid");
            clickUntil.saveData("user_info",mapdata,fields);
            String fiveMinute = DateUntil.getByInterMinute(System.currentTimeMillis() + "");
            String groupField = "yearlable==" + fiveMinute + "==" + yearLabel;
            Long numbers = 1L;
            years.setGroupField(groupField);
            years.setNumbers(numbers);
        }
        return years;
    }
}

Finally, the Flink stream job for the decade label:

public class YearsAnaly {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers","127.0.0.1:9092");
        properties.setProperty("group.id","portrait");
        FlinkKafkaConsumer<String> myConsumer = new FlinkKafkaConsumer<>("user_info",
                new SimpleStringSchema(),properties);
        DataStreamSource<String> data = env.addSource(myConsumer);
        env.enableCheckpointing(5000);
        data.map(new YearsAnalyMap());
        // without execute() the job graph is never submitted
        env.execute("portrait years");
    }
}

Now let's aggregate the decade-label counts in 5-minute windows and store the results (the Sink).

Add a YearsAnalyReduce aggregation class implementing the ReduceFunction interface:

public class YearsAnalyReduce implements ReduceFunction<Years> {
    @Override
    public Years reduce(Years years, Years t1) throws Exception {
        Long numbers1 = 0L;
        String groupField = "";
        if (years != null) {
            numbers1 = years.getNumbers();
            groupField = years.getGroupField();
        }
        Long numbers2 = 0L;
        if (t1 != null) {
            numbers2 = t1.getNumbers();
            groupField = t1.getGroupField();
        }
        if (StringUtils.isNotBlank(groupField)) {
            Years years1 = new Years();
            years1.setGroupField(groupField);
            years1.setNumbers(numbers1 + numbers2);
            return years1;
        }
        return null;
    }
}

A YearsAnalySink storage class implementing the SinkFunction interface:

public class YearsAnalySink implements SinkFunction<Years> {
    private ClickUntil clickUntil = ClickUntilFactory.createClickUntil();

    @Override
    public void invoke(Years value, Context context) throws Exception {
        if (value != null) {
            String groupField = value.getGroupField();
            String[] groupFields = groupField.split("==");
            String timeinfo = groupFields[1];
            String yearlabel = groupFields[2];
            Long numbers = value.getNumbers();
            String tablename = "yearslabel_info";
            Map<String, String> dataMap = new HashMap<>();
            dataMap.put("timeinfo",timeinfo);
            dataMap.put("yearslabel",yearlabel);
            dataMap.put("numbers",numbers + "");
            Set<String> fields = new HashSet<>();
            fields.add("timeinfo");
            fields.add("numbers");
            clickUntil.saveData(tablename,dataMap,fields);
        }
    }
}

Then the Flink stream job:

public class YearsAnaly {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers","127.0.0.1:9092");
        properties.setProperty("group.id","portrait");
        FlinkKafkaConsumer<String> myConsumer = new FlinkKafkaConsumer<>("user_info",
                new SimpleStringSchema(),properties);
        DataStreamSource<String> data = env.addSource(myConsumer);
        env.enableCheckpointing(5000);
        DataStream<Years> map = data.map(new YearsAnalyMap());
        // use a keyed window here: timeWindowAll would discard the keying and mix the labels' counts
        DataStream<Years> reduce = map.keyBy(Years::getGroupField)
                .window(TumblingProcessingTimeWindows.of(Time.minutes(5)))
                .reduce(new YearsAnalyReduce());
        reduce.addSink(new YearsAnalySink());
        env.execute("portrait years");
    }
}

The point of this is to see how the users' decade labels shift across different time windows.

Creating the user-portrait mobile carrier label

Create a mobile carrier utility class, CarrierUntil:

public class CarrierUntil {
    /**
     * China Telecom number validation. Prefixes: 133, 153, 180, 181, 189, 177, 1700, 173, 199
     **/
    private static final String CHINA_TELECOM_PATTERN = "(^1(33|53|77|73|99|8[019])\\d{8}$)|(^1700\\d{7}$)";

    /**
     * China Unicom number validation. Prefixes: 130, 131, 132, 155, 156, 185, 186, 145, 176, 1709
     **/
    private static final String CHINA_UNICOM_PATTERN = "(^1(3[0-2]|4[5]|5[56]|7[6]|8[56])\\d{8}$)|(^1709\\d{7}$)";

    /**
     * China Mobile number validation.
     * Prefixes: 134, 135, 136, 137, 138, 139, 150, 151, 152, 157, 158, 159, 182, 183, 184, 187, 188, 147, 178, 1705
     **/
    private static final String CHINA_MOBILE_PATTERN = "(^1(3[4-9]|4[7]|5[0-27-9]|7[8]|8[2-478])\\d{8}$)|(^1705\\d{7}$)";

    /**
     * 0 unknown, 1 China Mobile, 2 China Unicom, 3 China Telecom
     * @param telphone
     * @return
     */
    public static Integer getCarrierByTel(String telphone) {
        boolean b1 = StringUtils.isNotBlank(telphone) && match(CHINA_MOBILE_PATTERN, telphone);
        if (b1) {
            return 1;
        }
        b1 = StringUtils.isNotBlank(telphone) && match(CHINA_UNICOM_PATTERN, telphone);
        if (b1) {
            return 2;
        }
        b1 = StringUtils.isNotBlank(telphone) && match(CHINA_TELECOM_PATTERN, telphone);
        if (b1) {
            return 3;
        }
        return 0;
    }

    private static boolean match(String regex, String tel) {
        return Pattern.matches(regex, tel);
    }
}
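A quick check against the sample phone number from earlier:

public class CarrierUntilTest {
    public static void main(String[] args) {
        // 138xxxxxxxx matches the China Mobile pattern, so this prints 1
        System.out.println(CarrierUntil.getCarrierByTel("13873697762"));
    }
}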

A carrier-label entity class:

@Data
public class Carrier {
    private Long userid;
    private String carrierName;
    private Long numbers = 0L;
    private String groupField;
}

A CarrierAnalyMap transformation class implementing the MapFunction interface:

public class CarrierAnalyMap implements MapFunction<String,Carrier> {
    private ClickUntil clickUntil = ClickUntilFactory.createClickUntil();

    @Override
    @SuppressWarnings("unchecked")
    public Carrier map(String s) throws Exception {
        Map<String,String> datamap = JSONObject.parseObject(s,Map.class);
        String typecurrent = datamap.get("typecurrent");
        Carrier carrier = new Carrier();
        if (typecurrent.equals("INSERT")) {
            UserInfo userInfo = JSONObject.parseObject(s,UserInfo.class);
            String telphone = userInfo.getPhone();
            Integer carrierInteger = CarrierUntil.getCarrierByTel(telphone);
            String carrierLabel = "";
            switch (carrierInteger) {
                case 0:
                    carrierLabel = "未知";
                    break;
                case 1:
                    carrierLabel = "移動";
                    break;
                case 2:
                    carrierLabel = "聯通";
                    break;
                case 3:
                    carrierLabel = "電信";
                    break;
                default:
                    break;
            }
            Map<String,String> mapdata = new HashMap<>();
            mapdata.put("userid",userInfo.getId() + "");
            mapdata.put("carrierlabel",carrierLabel);
            Set<String> fields = new HashSet<>();
            fields.add("userid");
            clickUntil.saveData("user_info",mapdata,fields);
            String fiveMinute = DateUntil.getByInterMinute(System.currentTimeMillis() + "");
            String groupField = "carrierlabel==" + fiveMinute + "==" + carrierLabel;
            Long numbers = 1L;
            carrier.setGroupField(groupField);
            carrier.setNumbers(numbers);
        }
        return carrier;
    }
}

A CarrierAnalyReduce aggregation class implementing the ReduceFunction interface:

public class CarrierAnalyReduce implements ReduceFunction<Carrier> {
    @Override
    public Carrier reduce(Carrier carrier, Carrier t1) throws Exception {
        Long numbers1 = 0L;
        String groupField = "";
        if (carrier != null) {
            numbers1 = carrier.getNumbers();
            groupField = carrier.getGroupField();
        }
        Long numbers2 = 0L;
        if (t1 != null) {
            numbers2 = t1.getNumbers();
            groupField = t1.getGroupField();
        }
        if (StringUtils.isNotBlank(groupField)) {
            Carrier carrier1 = new Carrier();
            carrier1.setGroupField(groupField);
            carrier1.setNumbers(numbers1 + numbers2);
            return carrier1;
        }
        return null;
    }
}

A CarrierAnalySink storage class implementing the SinkFunction interface:

public class CarrierAnalySink implements SinkFunction<Carrier> {
    private ClickUntil clickUntil = ClickUntilFactory.createClickUntil();

    @Override
    public void invoke(Carrier value, Context context) throws Exception {
        if (value != null) {
            String groupField = value.getGroupField();
            String[] groupFields = groupField.split("==");
            String timeinfo = groupFields[1];
            String carrierLabel = groupFields[2];
            Long numbers = value.getNumbers();
            String tablename = "carrierlabel_info";
            Map<String, String> dataMap = new HashMap<>();
            dataMap.put("timeinfo",timeinfo);
            dataMap.put("carrierlabel",carrierLabel);
            dataMap.put("numbers",numbers + "");
            Set<String> fields = new HashSet<>();
            fields.add("timeinfo");
            fields.add("numbers");
            clickUntil.saveData(tablename,dataMap,fields);
        }
    }
}

Then the Flink stream job:

public class CarrierAnaly {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers","127.0.0.1:9092");
        properties.setProperty("group.id","portrait");
        FlinkKafkaConsumer<String> myConsumer = new FlinkKafkaConsumer<>("user_info",
                new SimpleStringSchema(),properties);
        DataStreamSource<String> data = env.addSource(myConsumer);
        env.enableCheckpointing(5000);
        DataStream<Carrier> map = data.map(new CarrierAnalyMap());
        // keyed window again, so the counts stay separated per groupField
        DataStream<Carrier> reduce = map.keyBy(Carrier::getGroupField)
                .window(TumblingProcessingTimeWindows.of(Time.minutes(5)))
                .reduce(new CarrierAnalyReduce());
        reduce.addSink(new CarrierAnalySink());
        env.execute("portrait carrier");
    }
}

Creating the user-portrait membership label

The membership-label entity class:

@Data
public class Member {
    private Long userid;
    private String memberFlag;
    private Long numbers = 0L;
    private String groupField;
}

A MemberAnalyMap transformation class implementing the MapFunction interface:

public class MemberAnalyMap implements MapFunction<String,Member> {
    private ClickUntil clickUntil = ClickUntilFactory.createClickUntil();

    @Override
    @SuppressWarnings("unchecked")
    public Member map(String s) throws Exception {
        Map<String,String> datamap = JSONObject.parseObject(s,Map.class);
        String typecurrent = datamap.get("typecurrent");
        Member member = new Member();
        if (typecurrent.equals("INSERT")) {
            UserInfo userInfo = JSONObject.parseObject(s,UserInfo.class);
            Integer memberInteger = userInfo.getStatus();
            String memberLabel = "";
            switch (memberInteger) {
                case 0:
                    memberLabel = "普通會員";
                    break;
                case 1:
                    memberLabel = "白銀會員";
                    break;
                case 2:
                    memberLabel = "黃金會員";
                    break;
                default:
                    break;
            }
            Map<String,String> mapdata = new HashMap<>();
            mapdata.put("userid",userInfo.getId() + "");
            mapdata.put("memberlabel",memberLabel);
            Set<String> fields = new HashSet<>();
            fields.add("userid");
            clickUntil.saveData("user_info",mapdata,fields);
            String fiveMinute = DateUntil.getByInterMinute(System.currentTimeMillis() + "");
            String groupField = "memberlable==" + fiveMinute + "==" + memberLabel;
            Long numbers = 1L;
            member.setGroupField(groupField);
            member.setNumbers(numbers);
        }
        return member;
    }
}

A MemberAnalyReduce aggregation class implementing the ReduceFunction interface:

public class MemberAnalyReduce implements ReduceFunction<Member> {
    @Override
    public Member reduce(Member member, Member t1) throws Exception {
        Long numbers1 = 0L;
        String groupField = "";
        if (member != null) {
            numbers1 = member.getNumbers();
            groupField = member.getGroupField();
        }
        Long numbers2 = 0L;
        if (t1 != null) {
            numbers2 = t1.getNumbers();
            groupField = t1.getGroupField();
        }
        if (StringUtils.isNotBlank(groupField)) {
            Member member1 = new Member();
            member1.setGroupField(groupField);
            member1.setNumbers(numbers1 + numbers2);
            return member1;
        }
        return null;
    }
}

A MemberAnalySink storage class implementing the SinkFunction interface:

public class MemberAnalySink implements SinkFunction<Member> {
    private ClickUntil clickUntil = ClickUntilFactory.createClickUntil();

    @Override
    public void invoke(Member value, Context context) throws Exception {
        if (value != null) {
            String groupField = value.getGroupField();
            String[] groupFields = groupField.split("==");
            String timeinfo = groupFields[1];
            String memberLabel = groupFields[2];
            Long numbers = value.getNumbers();
            String tablename = "memberlabel_info";
            Map<String, String> dataMap = new HashMap<>();
            dataMap.put("timeinfo",timeinfo);
            dataMap.put("memberlabel",memberLabel);
            dataMap.put("numbers",numbers + "");
            Set<String> fields = new HashSet<>();
            fields.add("timeinfo");
            fields.add("numbers");
            clickUntil.saveData(tablename,dataMap,fields);
        }
    }
}

Then the Flink stream job:

public class MemberAnaly {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers","127.0.0.1:9092");
        properties.setProperty("group.id","portrait");
        FlinkKafkaConsumer<String> myConsumer = new FlinkKafkaConsumer<>("user_info",
                new SimpleStringSchema(),properties);
        DataStreamSource<String> data = env.addSource(myConsumer);
        env.enableCheckpointing(5000);
        DataStream<Member> map = data.map(new MemberAnalyMap());
        // keyed window again, so the counts stay separated per groupField
        DataStream<Member> reduce = map.keyBy(Member::getGroupField)
                .window(TumblingProcessingTimeWindows.of(Time.minutes(5)))
                .reduce(new MemberAnalyReduce());
        reduce.addSink(new MemberAnalySink());
        env.execute("portrait member");
    }
}

User-portrait behavioral features

Here we analyze several user behaviors and build portrait labels from them:

  1. Browsing a product: channel id, product id, product category id, browse time, dwell time, user id, device type (1 PC, 2 WeChat mini program, 3 app), deviceId.
  2. Favoriting a product: channel id, product id, product category id, operation time, operation type (favorite, cancel), user id, device type (1 PC, 2 WeChat mini program, 3 app).
  3. Cart operations: channel id, product id, product category id, operation time, operation type (add, remove), user id, device type (1 PC, 2 WeChat mini program, 3 app).
  4. Following a product: channel id, product id, product category id, operation time, operation type (follow, unfollow), user id, device type (1 PC, 2 WeChat mini program, 3 app).

Define entity classes for the four behaviors:

/**
 * Browsing action
 */
@Data
public class ScanOpertor {
    /**
     * Channel id
     */
    private Long channelId;
    /**
     * Product type id
     */
    private Long productTypeId;
    /**
     * Product id
     */
    private Long productId;
    /**
     * Browse time
     */
    private Long scanTime;
    /**
     * Dwell time
     */
    private Long stayTime;
    /**
     * User id
     */
    private Long userId;
    /**
     * Device type
     */
    private Integer deviceType;
    /**
     * Device id
     */
    private String deviceId;
}
/**
 * Favorite action
 */
@Data
public class CollectOpertor {
    /**
     * Channel id
     */
    private Long channelId;
    /**
     * Product type id
     */
    private Long productTypeId;
    /**
     * Product id
     */
    private Long productId;
    /**
     * Operation time
     */
    private Long opertorTime;
    /**
     * Operation type
     */
    private Integer opertorType;
    /**
     * User id
     */
    private Long userId;
    /**
     * Device type
     */
    private Integer deviceType;
    /**
     * Device id
     */
    private String deviceId;
}
/**
 * Cart action
 */
@Data
public class CartOpertor {
    /**
     * Channel id
     */
    private Long channelId;
    /**
     * Product type id
     */
    private Long productTypeId;
    /**
     * Product id
     */
    private Long productId;
    /**
     * Operation time
     */
    private Long opertorTime;
    /**
     * Operation type
     */
    private Integer opertorType;
    /**
     * User id
     */
    private Long userId;
    /**
     * Device type
     */
    private Integer deviceType;
    /**
     * Device id
     */
    private String deviceId;
}
/**
 * Follow action
 */
@Data
public class AttentionOpertor {
    /**
     * Channel id
     */
    private Long channelId;
    /**
     * Product type id
     */
    private Long productTypeId;
    /**
     * Product id
     */
    private Long productId;
    /**
     * Operation time
     */
    private Long opertorTime;
    /**
     * Operation type
     */
    private Integer opertorType;
    /**
     * User id
     */
    private Long userId;
    /**
     * Device type
     */
    private Integer deviceType;
    /**
     * Device id
     */
    private String deviceId;
}

Run in Kafka's bin directory:

./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic scan
./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic collection
./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic cart
./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic attention

Create a product table:

DROP TABLE IF EXISTS `product`;
CREATE TABLE `product` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `product_type_id` bigint(20) DEFAULT NULL,
  `product_name` varchar(255) DEFAULT NULL,
  `product_title` varchar(255) DEFAULT NULL,
  `product_price` decimal(28,10) DEFAULT NULL,
  `product_desc` varchar(255) DEFAULT NULL,
  `merchant_id` bigint(20) DEFAULT NULL,
  `create_time` datetime DEFAULT NULL,
  `update_time` datetime DEFAULT NULL,
  `product_place` varchar(255) DEFAULT NULL,
  `product_brand` varchar(255) DEFAULT NULL,
  `product_weight` decimal(28,10) DEFAULT NULL,
  `product_specification` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

And its entity class:

@Data
public class Product {
    private Long id;
    private Long productTypeId;
    private String productName;
    private String productTitle;
    private BigDecimal productPrice;
    private String productDesc;
    private Long merchantId;
    private Date createTime;
    private Date updateTime;
    private String productPlace;
    private String productBrand;
    private Double productWeight;
    private String productSpecification;
}

A product type table:

DROP TABLE IF EXISTS `product_type`;
CREATE TABLE `product_type` (
  `id` bigint(20) NOT NULL,
  `product_type_name` varchar(255) DEFAULT NULL,
  `product_type_desc` varchar(255) DEFAULT NULL,
  `product_type_parent_id` bigint(20) DEFAULT NULL,
  `product_type_level` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

And its entity class:

@Data
public class ProductType {
    private Long id;
    private String productTypeName;
    private String productTypeDesc;
    private Long productTypeParentId;
    private Integer productTypeLevel;
}

An order table:

DROP TABLE IF EXISTS `order`;
CREATE TABLE `order` (
  `id` bigint(20) NOT NULL,
  `amount` decimal(28,10) DEFAULT NULL,
  `user_id` bigint(20) DEFAULT NULL,
  `product_id` bigint(20) DEFAULT NULL,
  `product_type_id` int(11) DEFAULT NULL,
  `merchant_id` bigint(20) DEFAULT NULL,
  `create_time` datetime DEFAULT NULL,
  `pay_time` datetime DEFAULT NULL,
  `pay_status` int(11) DEFAULT NULL COMMENT '0 unpaid, 1 paid, 2 refunded',
  `address` varchar(1000) DEFAULT NULL,
  `telphone` varchar(255) DEFAULT NULL,
  `username` varchar(255) DEFAULT NULL,
  `trade_number` varchar(255) DEFAULT NULL,
  `pay_type` int(255) DEFAULT NULL COMMENT '0 Alipay, 1 UnionPay, 2 WeChat',
  `number` int(11) DEFAULT NULL,
  `order_status` int(255) DEFAULT NULL COMMENT '0 submitted, 1 paid, 2 cancelled, 3 deleted',
  `update_time` datetime DEFAULT NULL,
  `advister_id` bigint(20) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

And its entity class:

@Data
public class Order {
    private Long id;
    private BigDecimal amount;
    private Long userId;
    private Long productId;
    private Long productTypeId;
    private Long merchantId;
    private Date createTime;
    private Date payTime;
    private Integer payStatus; // Payment status: 0 unpaid, 1 paid, 2 refunded
    private String address;
    private String telphone;
    private String username;
    private String tradeNumber;
    private Integer payType;
    private Integer number;
    private Integer orderStatus;
    private Date updateTime;
    private Long advisterId; // Advertisement id
}

Run in HBase's bin directory:

./hbase shell
create "product","info"
create "product_type","info"
create "order","info"

Run in Kafka's bin directory:

./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic product
./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic product_type
./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic order

Since multiple tables are now involved, TranferAnaly is modified as follows:

@Slf4j
public class TranferAnaly {

    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers","127.0.0.1:9092");
        properties.setProperty("group.id","portrait");
        FlinkKafkaConsumer<String> myConsumer = new FlinkKafkaConsumer<>("test",
                new SimpleStringSchema(),properties);

        DataStreamSource<String> data = env.addSource(myConsumer);
        env.enableCheckpointing(5000);
        DataStream<JSONObject> dataJson = data.map(s -> JSONObject.parseObject(s))
                .filter(json -> json.getString("type").equals("INSERT"));
        DataStream<String> map = dataJson.map(jsonObject -> {
            String type = jsonObject.getString("type");
            String table = jsonObject.getString("table");
            String database = jsonObject.getString("database");
            String data1 = jsonObject.getString("data");
            JSONArray jsonArray = JSONObject.parseArray(data1);
            List<Map<String, String>> listdata = new ArrayList<>();
            for (int i = 0; i < jsonArray.size(); i++) {
                JSONObject jsonObject1 = jsonArray.getJSONObject(i);
                String tablename = table;
                String rowkey = jsonObject1.getString("id");
                String famliyname = "info";
                Map<String, String> datamap = JSONObject.parseObject(JSONObject.toJSONString(jsonObject1), Map.class);
                datamap.put("database", database);
                String typebefore = HbaseUtil.getdata(tablename, rowkey, famliyname, "typecurrent");
                datamap.put("typebefore", typebefore);
                datamap.put("typecurrent", type);
                datamap.put("tablename", table);
                HbaseUtil.put(tablename, rowkey, famliyname, datamap);
                listdata.add(datamap);
            }
            return JSONObject.toJSONString(listdata);
        });
        map.addSink(new SinkFunction<String>() {
            @Override
            public void invoke(String value, Context context) throws Exception {
                List<Map> data = JSONObject.parseArray(value,Map.class);
                for (Map<String,String> map : data) {
                    String tablename = map.get("tablename");
                    KafkaUtil.sendData(tablename,JSONObject.toJSONString(map));
                }
            }
        });
        env.execute("portrait tranfer");
    }
}

Create a Spring Boot project to collect the business data

Dependencies:

<properties>
   <java.version>1.8</java.version>
   <fastjson.version>1.2.74</fastjson.version>
</properties>
<dependencies>
   <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter-web</artifactId>
   </dependency>
   <dependency>
      <groupId>com.alibaba</groupId>
      <artifactId>fastjson</artifactId>
      <version>${fastjson.version}</version>
   </dependency>
   <dependency>
      <groupId>org.springframework.kafka</groupId>
      <artifactId>spring-kafka</artifactId>
   </dependency>
   <dependency>
      <groupId>org.projectlombok</groupId>
      <artifactId>lombok</artifactId>
      <optional>true</optional>
   </dependency>
   <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter-test</artifactId>
      <scope>test</scope>
   </dependency>
</dependencies>

Configuration:

spring:
  kafka:
    bootstrap-servers: 127.0.0.1:9092
    producer:
      retries: 0
      batch-size: 16384
      buffer-memory: 33554432
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
      value-serializer: org.apache.kafka.common.serialization.StringSerializer
      acks: -1
    consumer:
      group-id: portrait
      auto-offset-reset: earliest
      enable-auto-commit: false
      auto-commit-interval: 100
      key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      value-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      max-poll-records: 10
    listener:
      concurrency: 3
      type: batch
      ack-mode: manual

The Kafka producer:

@Component
@Slf4j
public class KafkaProducer {
    @Autowired
    private KafkaTemplate<String,String> kafkaTemplate;

    @SuppressWarnings("unchecked")
    public void produce(String topic,String message) {
        try {
            ListenableFuture future = kafkaTemplate.send(topic, message);
            SuccessCallback<SendResult<String,String>> successCallback = new SuccessCallback<SendResult<String, String>>() {
                @Override
                public void onSuccess(@Nullable SendResult<String, String> result) {
                    log.info("發送消息成功");
                }
            };
            FailureCallback failureCallback = new FailureCallback() {
                @Override
                public void onFailure(Throwable ex) {
                    log.error("發送消息失敗",ex);
                    produce(topic,message);
                }
            };
            future.addCallback(successCallback,failureCallback);
        } catch (Exception e) {
            log.error("發送消息異常",e);
        }
    }
}

Data collection controller

@RestController
public class DataController {
    @Autowired
    private KafkaProducer kafkaProducer;

    @PostMapping("/revicedata")
    public void reviceData(@RequestBody String data) {
        JSONObject jsonObject = JSONObject.parseObject(data);
        String type = jsonObject.getString("type");
        String topic = "";
        switch (type) {
            case "0":
                topic = "scan";
                break;
            case "1":
                topic = "collection";
                break;
            case "2":
                topic = "cart";
                break;
            case "3":
                topic = "attention";
                break;
            default:
                break;
        }
        kafkaProducer.produce(topic,data);
    }
}
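
A behavior event can then be posted to this endpoint, for example with curl; the port is Spring Boot's default, and the payload fields other than type are assumptions based on the ScanOpertor fields used later.

curl -X POST -H "Content-Type: application/json" -d '{"type":"0","userId":1,"productTypeId":2}' http://127.0.0.1:8080/revicedata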

This is really just a simple simulation of user behavior; we should build a logging microservice to collect every user action. For details see the AOP原理與自實現 article, which can be adapted here by replacing its log class, shown below,

@Builder
@Data
@NoArgsConstructor
@AllArgsConstructor
public class Log implements Serializable {

   private static final long serialVersionUID = -5398795297842978376L;

   private Long id;
   private String username;
   /** module */
   private String module;
   /** parameter values */
   private String params;
   private String remark;
   private Boolean flag;
   private Date createTime;
   private String ip;
   private String area;
}

with the various operation classes above. A few other changes are also needed, which we will not walk through here; in addition, replace RabbitMQ with Kafka.
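
As a rough illustration of that idea, a behavior-collecting aspect might look like the sketch below; this is not the original article's code, and the pointcut expression and the "behavior" topic name are assumptions.

@Aspect
@Component
public class BehaviorLogAspect {
    @Autowired
    private KafkaProducer kafkaProducer;

    //intercept every controller method; adjust the package to the real project
    @Around("execution(* com.example.portrait.controller..*(..))")
    public Object collectBehavior(ProceedingJoinPoint joinPoint) throws Throwable {
        Object result = joinPoint.proceed();
        //record the invoked method and its arguments as the behavior event
        JSONObject event = new JSONObject();
        event.put("method", joinPoint.getSignature().toShortString());
        event.put("params", JSONObject.toJSONString(joinPoint.getArgs()));
        event.put("createTime", System.currentTimeMillis());
        //"behavior" is an assumed topic name
        kafkaProducer.produce("behavior", event.toJSONString());
        return result;
    }
}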

Creating the user-portrait product-category preference label

Create a product-type label entity class

@Data
public class ProductTypeLabel {
    private Long userid;
    private String productTypeId;
    private Long numbers = 0L;
    private String groupField;
}

Add a method to the DateUntil utility class that returns the start of the hour containing the given timestamp.

public static Long getCurrentHourStart(Long visitTime) throws ParseException {
    Date date = new Date(visitTime);
    DateFormat dateFormat = new SimpleDateFormat("yyyyMMdd HH");
    Date filterTime = dateFormat.parse(dateFormat.format(date));
    return filterTime.getTime();
}

Create a ProductTypeAnalyMap transformation class that implements the MapFunction interface

public class ProductTypeAnalyMap implements MapFunction<String,ProductTypeLabel> {
    @Override
    public ProductTypeLabel map(String s) throws Exception {
        ScanOpertor scanOpertor = JSONObject.parseObject(s, ScanOpertor.class);
        Long userid = scanOpertor.getUserId();
        Long productTypeId = scanOpertor.getProductTypeId();
        String tablename = "user_info";
        String rowkey = userid + "";
        String famliyname = "info";
        String colum = "producttypelist";
        //fetch the user's historical product-type preferences
        String productTypeListString = HbaseUtil.getdata(tablename, rowkey, famliyname, colum);
        List<Map> temp = new ArrayList<>();
        List<Map<String,Long>> result = new ArrayList<>();
        if (StringUtils.isNotBlank(productTypeListString)) {
            temp = JSONObject.parseArray(productTypeListString,Map.class);
        }
        boolean found = false;
        for (Map map : temp) {
            Long productTypeId1 = Long.parseLong(map.get("key").toString());
            Long value = Long.parseLong(map.get("value").toString());
            //if the new product type matches a historical one, add 1 to its preference value
            if (productTypeId.equals(productTypeId1)) {
                value++;
                map.put("value",value);
                found = true;
            }
            result.add(map);
        }
        //a type the user has never viewed before starts with a preference value of 1
        if (!found) {
            Map<String,Long> first = new HashMap<>();
            first.put("key",productTypeId);
            first.put("value",1L);
            result.add(first);
        }
        //sort by preference value, descending; parse via toString because JSON
        //deserialization may yield Integer rather than Long values
        Collections.sort(result,(o1,o2) -> {
            Long value1 = Long.parseLong(o1.get("value").toString());
            Long value2 = Long.parseLong(o2.get("value").toString());
            return value2.compareTo(value1);
        });
        if (result.size() > 5) {
            result = result.subList(0,5);
        }
        String data = JSONObject.toJSONString(result);
        HbaseUtil.putdata(tablename,rowkey,famliyname,colum,data);
        ProductTypeLabel productTypeLabel = new ProductTypeLabel();
        //format: productType==hourTimestamp==productTypeId
        String groupField = "productType==" + DateUntil.getCurrentHourStart(System.currentTimeMillis())
                + "==" + productTypeId;
        productTypeLabel.setUserid(userid);
        productTypeLabel.setProductTypeId(productTypeId + "");
        productTypeLabel.setNumbers(1L);
        productTypeLabel.setGroupField(groupField);
        return productTypeLabel;
    }
}

A ProductTypeAnalyReduce aggregation class that implements the ReduceFunction interface

public class ProductTypeAnalyReduce implements ReduceFunction<ProductTypeLabel> {
    @Override
    public ProductTypeLabel reduce(ProductTypeLabel productTypeLabel, ProductTypeLabel t1) throws Exception {
        Long numbers1 = 0L;
        String groupField = "";
        if (productTypeLabel != null) {
            numbers1 = productTypeLabel.getNumbers();
            groupField = productTypeLabel.getGroupField();
        }
        Long numbers2 = 0L;
        if (t1 != null) {
            numbers2 = t1.getNumbers();
            groupField = t1.getGroupField();
        }
        if (StringUtils.isNotBlank(groupField)) {
            ProductTypeLabel productTypeLabel1 = new ProductTypeLabel();
            productTypeLabel1.setGroupField(groupField);
            productTypeLabel1.setNumbers(numbers1 + numbers2);
            return productTypeLabel1;
        }
        return null;
    }
}

A ProductTypeAnalySink storage class that implements the SinkFunction interface

public class ProductTypeAnalySink implements SinkFunction<ProductTypeLabel> {
    private ClickUntil clickUntil = ClickUntilFactory.createClickUntil();

    @Override
    public void invoke(ProductTypeLabel value, Context context) throws Exception {
        if (value != null) {
            String groupField = value.getGroupField();
            String[] groupFields = groupField.split("==");
            String timeinfo = groupFields[1];
            String productTypeLabel = groupFields[2];
            Long numbers = value.getNumbers();
            String tablename = "producttypelabel_info";
            Map<String, String> dataMap = new HashMap<>();
            dataMap.put("timeinfo",timeinfo);
            dataMap.put("producttypelabel",productTypeLabel);
            dataMap.put("numbers",numbers + "");
            Set<String> fields = new HashSet<>();
            fields.add("timeinfo");
            fields.add("numbers");
            clickUntil.saveData(tablename,dataMap,fields);
        }
    }
}

Then the Flink streaming job

public class ProductTypeAnaly {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers","127.0.0.1:9092");
        properties.setProperty("group.id","portrait");
        FlinkKafkaConsumer<String> myConsumer = new FlinkKafkaConsumer<>("scan",
                new SimpleStringSchema(),properties);
        DataStreamSource<String> data = env.addSource(myConsumer);
        env.enableCheckpointing(5000);
        DataStream<ProductTypeLabel> map = data.map(new ProductTypeAnalyMap());
        DataStream<ProductTypeLabel> reduce = map.keyBy(ProductTypeLabel::getGroupField)
                //per-key processing-time windows; timeWindowAll here would discard the keying
                .window(TumblingProcessingTimeWindows.of(Time.hours(1)))
                .reduce(new ProductTypeAnalyReduce());
        reduce.addSink(new ProductTypeAnalySink());
        env.execute("portrait scan");
    }
}

Creating the user-portrait "hesitation product" label (products the user keeps deliberating over)

Create a hesitation-product label entity class

@Data
public class TangleProduct {
    private Long userid;
    private String productId;
    private Long numbers = 0L;
    private String groupField;
}

A TangleProductAnalyMap transformation class that implements the MapFunction interface

public class TangleProductAnalyMap implements MapFunction<String,TangleProduct> {
    @Override
    public TangleProduct map(String s) throws Exception {
        CartOpertor cartOpertor = JSONObject.parseObject(s, CartOpertor.class);
        Long userid = cartOpertor.getUserId();
        Long productId = cartOpertor.getProductId();
        String tablename = "user_info";
        String rowkey = userid + "";
        String famliyname = "info";
        String colum = "tangleproducts";
        //fetch the products the user has historically hesitated over
        String tangleProducts = HbaseUtil.getdata(tablename, rowkey, famliyname, colum);
        List<Map> temp = new ArrayList<>();
        List<Map<String,Long>> result = new ArrayList<>();
        if (StringUtils.isNotBlank(tangleProducts)) {
            temp = JSONObject.parseArray(tangleProducts,Map.class);
        }
        boolean found = false;
        for (Map map : temp) {
            Long productId1 = Long.parseLong(map.get("key").toString());
            Long value = Long.parseLong(map.get("value").toString());
            //if the new product matches a historical one, add 1 to its value
            if (productId.equals(productId1)) {
                value++;
                map.put("value",value);
                found = true;
            }
            result.add(map);
        }
        //a product never seen before starts with a value of 1
        if (!found) {
            Map<String,Long> first = new HashMap<>();
            first.put("key",productId);
            first.put("value",1L);
            result.add(first);
        }
        //sort by value, descending; parse via toString because JSON deserialization
        //may yield Integer rather than Long values
        Collections.sort(result,(o1, o2) -> {
            Long value1 = Long.parseLong(o1.get("value").toString());
            Long value2 = Long.parseLong(o2.get("value").toString());
            return value2.compareTo(value1);
        });
        if (result.size() > 5) {
            result = result.subList(0,5);
        }
        String data = JSONObject.toJSONString(result);
        HbaseUtil.putdata(tablename,rowkey,famliyname,colum,data);
        TangleProduct tangleProduct = new TangleProduct();
        //format: tangleProduct==hourTimestamp==productId
        String groupField = "tangleProduct==" + DateUntil.getCurrentHourStart(System.currentTimeMillis())
                + "==" + productId;
        tangleProduct.setUserid(userid);
        tangleProduct.setProductId(productId + "");
        tangleProduct.setNumbers(1L);
        tangleProduct.setGroupField(groupField);
        return tangleProduct;
    }
}

A TangleProductAnalyReduct aggregation class that implements the ReduceFunction interface

public class TangleProductAnalyReduct implements ReduceFunction<TangleProduct> {
    @Override
    public TangleProduct reduce(TangleProduct tangleProduct, TangleProduct t1) throws Exception {
        Long numbers1 = 0L;
        String groupField = "";
        if (tangleProduct != null) {
            numbers1 = tangleProduct.getNumbers();
            groupField = tangleProduct.getGroupField();
        }
        Long numbers2 = 0L;
        if (t1 != null) {
            numbers2 = t1.getNumbers();
            groupField = t1.getGroupField();
        }
        if (StringUtils.isNotBlank(groupField)) {
            TangleProduct tangleProduct1 = new TangleProduct();
            tangleProduct1.setGroupField(groupField);
            tangleProduct1.setNumbers(numbers1 + numbers2);
            return tangleProduct1;
        }
        return null;
    }
}

A TangleProductAnalySink storage class that implements the SinkFunction interface

public class TangleProductAnalySink implements SinkFunction<TangleProduct> {
    private ClickUntil clickUntil = ClickUntilFactory.createClickUntil();

    @Override
    public void invoke(TangleProduct value, Context context) throws Exception {
        if (value != null) {
            String groupField = value.getGroupField();
            String[] groupFields = groupField.split("==");
            String timeinfo = groupFields[1];
            String tangleProductLabel = groupFields[2];
            Long numbers = value.getNumbers();
            String tablename = "tangleproductlabel_info";
            Map<String, String> dataMap = new HashMap<>();
            dataMap.put("timeinfo",timeinfo);
            dataMap.put("tangleproductlabel",tangleProductLabel);
            dataMap.put("numbers",numbers + "");
            Set<String> fields = new HashSet<>();
            fields.add("timeinfo");
            fields.add("numbers");
            clickUntil.saveData(tablename,dataMap,fields);
        }
    }
}

Then the Flink streaming job

public class TangleProductAnaly {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers","127.0.0.1:9092");
        properties.setProperty("group.id","portrait");
        FlinkKafkaConsumer<String> myConsumer = new FlinkKafkaConsumer<>("cart",
                new SimpleStringSchema(),properties);
        DataStreamSource<String> data = env.addSource(myConsumer);
        env.enableCheckpointing(5000);
        DataStream<TangleProduct> map = data.map(new TangleProductAnalyMap());
        DataStream<TangleProduct> reduce = map.keyBy(TangleProduct::getGroupField)
                //per-key processing-time windows; timeWindowAll here would discard the keying
                .window(TumblingProcessingTimeWindows.of(Time.hours(1)))
                .reduce(new TangleProductAnalyReduct());
        reduce.addSink(new TangleProductAnalySink());
        env.execute("portrait cart");
    }
}

Predicting user gender: building the portrait's gender label

Predicting a user's gender is really a binary-classification problem, so any classification algorithm will do (logistic regression, naive Bayes, GBDT, LightGBM, and so on), but first we need training and test data sets. Users tend to fill in their gender casually, so not every recorded gender is correct; we therefore predict the gender and attach a portrait gender label to the test data.

We need a few indicators to build the feature data set; the features are:

user id
number of orders
order frequency
views of men's clothing
views of children's clothing
views of elderly clothing
views of women's clothing
average order amount
product-view frequency

The label data set is just

label  0 = male, 1 = female

A user's order count, order frequency (average number per month), and average order amount can all be computed directly from the database, while the view counts come from HBase. Join these together and write the rows to train.csv. For this file the gender must be accurate, obtained through channels that yield the user's real gender.
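
As a rough sketch of that assembly step (not the original tooling: the orders table, its columns, and the HBase counter column names are all assumptions made for illustration):

public class TrainCsvBuilder {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:mysql://127.0.0.1:3306/portrait", "root", "abcd123");
        Statement stmt = conn.createStatement();
        //order count, orders per month and average amount, aggregated per user
        ResultSet rs = stmt.executeQuery(
                "select user_id, count(*) as ordernums," +
                " count(*) / greatest(timestampdiff(month, min(create_time), now()), 1) as orderintenums," +
                " avg(amount) as ordermountavg from orders group by user_id");
        try (PrintWriter out = new PrintWriter("/Users/admin/Downloads/train.csv")) {
            out.println("userid,ordernums,orderintenums,manclothes,childrenclothes,oldclothes,womenclothes,ordermountavg,productscannums,label");
            while (rs.next()) {
                String rowkey = rs.getLong("user_id") + "";
                out.println(String.join(",",
                        rowkey,
                        rs.getLong("ordernums") + "",
                        rs.getLong("orderintenums") + "",
                        count(rowkey, "manclothesnums"),
                        count(rowkey, "childrenclothesnums"),
                        count(rowkey, "oldclothesnums"),
                        count(rowkey, "womenclothesnums"),
                        rs.getDouble("ordermountavg") + "",
                        count(rowkey, "productscannums"),
                        //the verified gender as the label: 0 = male, 1 = female
                        count(rowkey, "verifiedsex")));
            }
        }
        conn.close();
    }

    //read one counter column from the user_info table in HBase, defaulting to 0
    private static String count(String rowkey, String colum) throws Exception {
        String value = HbaseUtil.getdata("user_info", rowkey, "info", colum);
        return StringUtils.isNotBlank(value) ? value : "0";
    }
}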

Run in hadoop's bin directory

./hdfs dfs -put /Users/admin/Downloads/train.csv /

Rows whose gender is judged unreliable go into test.csv instead; test.csv has no label column. Run in hadoop's bin directory

./hdfs dfs -put /Users/admin/Downloads/test.csv /

Create a gender label class

@Data
public class Sex {
    private Long userid; //user id
    private Long ordernums; //number of orders
    private Long orderintenums; //order frequency
    private Long manClothes; //views of men's clothing
    private Long chidrenClothes; //views of children's clothing
    private Long oldClothes; //views of elderly clothing
    private Long womenClothes; //views of women's clothing
    private Double ordermountavg; //average order amount
    private Long productscannums; //product-view frequency
    private Integer label; //0 male, 1 female
    private String groupField;
    private String sex;
    private Long numbers;
}

A SexAnalyMap transformation class

public class SexAnalyMap implements MapFunction<Tuple10<Long, Long, Long, Long, Long, Long, Long, Double, Long, Integer>,Sex> {

    @Override
    public Sex map(Tuple10<Long, Long, Long, Long, Long, Long, Long, Double, Long, Integer> value) throws Exception {
        Random random = new Random();
        String groupField = "sex==" + random.nextInt(100);
        Sex sex = new Sex();
        sex.setUserid(value.getField(0));
        sex.setOrdernums(value.getField(1));
        sex.setOrderintenums(value.getField(2));
        sex.setManClothes(value.getField(3));
        sex.setChidrenClothes(value.getField(4));
        sex.setOldClothes(value.getField(5));
        sex.setWomenClothes(value.getField(6));
        sex.setOrdermountavg(value.getField(7));
        sex.setProductscannums(value.getField(8));
        sex.setLabel(value.getField(9));
        sex.setGroupField(groupField);
        return sex;
    }
}

Add a new method to DateUntil, returning the start of the week containing the given timestamp

public static Long getCurrentWeekStart(Long visitTime) {
    Calendar cal =Calendar.getInstance();
    if (visitTime != null) {
        cal.setTimeInMillis(visitTime);
    }
    cal.set(Calendar.DAY_OF_WEEK, Calendar.MONDAY);
    cal.set(Calendar.HOUR_OF_DAY, 0);
    cal.set(Calendar.MINUTE, 0);
    cal.set(Calendar.SECOND, 0);
    cal.set(Calendar.MILLISECOND, 0);
    return cal.getTimeInMillis();
}

A SexSaveMap transformation class that implements the MapFunction interface and stores the predicted gender labels for the test data set in HBase. Grouping by week makes it possible to track how the gender distribution changes.

public class SexSaveMap implements MapFunction<Sex,Sex> {
    @Override
    public Sex map(Sex value) throws Exception {
        if (value.getLabel() == 0) {
            value.setSex("男");
        }else if (value.getLabel() == 1) {
            value.setSex("女");
        }
        String tablename = "user_info";
        String rowkey = value.getUserid() + "";
        String famliyname = "info";
        String colum = "sexlabel";
        HbaseUtil.putdata(tablename,rowkey,famliyname,colum,value.getSex());
        Long timeinfo = DateUntil.getCurrentWeekStart(System.currentTimeMillis());
        String groupField = "sexlabel==" + timeinfo + "==" + value.getSex();
        Long numbers = 1L;
        value.setGroupField(groupField);
        value.setNumbers(numbers);
        return value;
    }
}

A SexAnalyReduct aggregation class that implements the ReduceFunction interface

public class SexAnalyReduct implements ReduceFunction<Sex> {
    @Override
    public Sex reduce(Sex value1, Sex value2) throws Exception {
        Long numbers1 = 0L;
        String groupField = "";
        if (value1 != null) {
            numbers1 = value1.getNumbers();
            groupField = value1.getGroupField();
        }
        Long numbers2 = 0L;
        if (value2 != null) {
            numbers2 = value2.getNumbers();
            groupField = value2.getGroupField();
        }
        if (StringUtils.isNotBlank(groupField)) {
            Sex sex = new Sex();
            sex.setGroupField(groupField);
            sex.setNumbers(numbers1 + numbers2);
            return sex;
        }
        return null;
    }
}

Then the Flink batch job. Note that the DataSet batch API has no SinkFunction interface, so results are collected and written out in the driver instead; Alink's logistic regression is used to predict the gender of the test data set.

public class SexAnaly {
    private static ClickUntil clickUntil = ClickUntilFactory.createClickUntil();

    public static void main(String[] args) throws Exception {
        String filePath = "hdfs://127.0.0.1:9000/train.csv";
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<Tuple10<Long, Long, Long, Long, Long, Long, Long, Double, Long, Integer>> fileSourceTrain = env.readCsvFile(filePath).ignoreFirstLine()
                .types(Long.class, Long.class, Long.class, Long.class, Long.class, Long.class,
                        Long.class, Double.class, Long.class, Integer.class);
        DataSet<Sex> mapTrain = fileSourceTrain.map(new SexAnalyMap());
        List<Sex> sexes = mapTrain.collect();
        List<Row> df = sexes.stream().map(sex -> Row.of(sex.getUserid(), sex.getOrdernums(),
                sex.getOrderintenums(), sex.getManClothes(), sex.getChidrenClothes(),
                sex.getOldClothes(), sex.getWomenClothes(), sex.getOrdermountavg(),
                sex.getProductscannums(), sex.getLabel()))
                .collect(Collectors.toList());
        BatchOperator<?> input = new MemSourceBatchOp(df,"f0 long,f1 long,f2 long," +
                "f3 long,f4 long,f5 long,f6 long,f7 double,f8 long,f9 int");
        //train a logistic-regression model on the data
        BatchOperator<?> lr = new LogisticRegressionTrainBatchOp()
                .setFeatureCols("f0","f1","f2","f3","f4","f5","f6","f7","f8")
                .setLabelCol("f9");
        BatchOperator model = input.link(lr);
        String testFilePath = "hdfs://127.0.0.1:9000/test.csv";
        DataSet<Tuple9<Long, Long, Long, Long, Long, Long, Long, Double, Long>> fileSourceTest = env.readCsvFile(testFilePath).ignoreFirstLine()
                .types(Long.class, Long.class, Long.class, Long.class, Long.class, Long.class,
                Long.class, Double.class, Long.class);
        List<Sex> testSexes = fileSourceTest.map(new SexTestMap()).collect();
        List<Row> testDf = testSexes.stream().map(sex -> Row.of(sex.getUserid(), sex.getOrdernums(),
                sex.getOrderintenums(), sex.getManClothes(), sex.getChidrenClothes(),
                sex.getOldClothes(), sex.getWomenClothes(), sex.getOrdermountavg(),
                sex.getProductscannums()))
                .collect(Collectors.toList());
        BatchOperator<?> testInput = new MemSourceBatchOp(testDf,"f0 long,f1 long,f2 long," +
                "f3 long,f4 long,f5 long,f6 long,f7 double,f8 long");
        BatchOperator dataTest = testInput;
        BatchOperator <?> predictor = new LogisticRegressionPredictBatchOp().setPredictionCol("pred");
        //predict on the test data
        List<Row> predicts = predictor.linkFrom(model, dataTest).collect();
        List<Sex> predictSexes = predicts.stream().map(row -> {
            Sex sex = new Sex();
            sex.setUserid((Long) row.getField(0));
            sex.setOrdernums((Long) row.getField(1));
            sex.setOrderintenums((Long) row.getField(2));
            sex.setManClothes((Long) row.getField(3));
            sex.setChidrenClothes((Long) row.getField(4));
            sex.setOldClothes((Long) row.getField(5));
            sex.setWomenClothes((Long) row.getField(6));
            sex.setOrdermountavg((Double) row.getField(7));
            sex.setProductscannums((Long) row.getField(8));
            sex.setLabel((Integer) row.getField(9));
            return sex;
        }).collect(Collectors.toList());
        DataSet<Sex> predictSource = env.fromCollection(predictSexes);
        DataSet<Sex> mapSave = predictSource.map(new SexSaveMap());
        DataSet<Sex> reduce = mapSave.groupBy(Sex::getGroupField).reduce(new SexAnalyReduct());
        List<Sex> saveList = reduce.collect();
        for (Sex sex : saveList) {
            String groupField = sex.getGroupField();
            String[] groupFields = groupField.split("==");
            String timeinfo = groupFields[1];
            String sexLabel = groupFields[2];
            Long numbers = sex.getNumbers();
            String tablename = "sexlabel_info";
            Map<String, String> dataMap = new HashMap<>();
            dataMap.put("timeinfo", timeinfo);
            dataMap.put("sexlabel", sexLabel);
            dataMap.put("numbers", numbers + "");
            Set<String> fields = new HashSet<>();
            fields.add("timeinfo");
            fields.add("numbers");
            clickUntil.saveData(tablename,dataMap,fields);
        }
    }
}

My test data set contains 3 rows. Open HBase and scan the info:sexlabel column of user_info:

scan 'user_info',{COLUMNS=>'info:sexlabel'}
ROW                            COLUMN+CELL                                                                             
 1                             column=info:sexlabel, timestamp=1636522706964, value=\xE7\x94\xB7                       
 2                             column=info:sexlabel, timestamp=1636522706954, value=\xE5\xA5\xB3                       
 3                             column=info:sexlabel, timestamp=1636522706966, value=\xE7\x94\xB7                       

ClickHouse Docker installation and deployment

ClickHouse plays the role Hive plays in the Hadoop ecosystem, but as a columnar engine it is a considerably more powerful data warehouse.

Install ClickHouse with Docker

docker pull yandex/clickhouse-client
docker pull yandex/clickhouse-server
docker run -d --name ck-server -p 8123:8123 -p 9001:9000 -p 9009:9009 --ulimit nofile=262144:262144 -v /Users/admin/Downloads/clickhouse_database/:/var/lib/clickhouse yandex/clickhouse-server

Enter the ClickHouse client

docker exec -it ck-server clickhouse-client

Create a database named test

create database test ENGINE=Ordinary;
use test;

Create two tables

create table testo2(id UInt16,col1 String,col2 String,create_date date)ENGINE=MergeTree(create_date,(id),8192);
create table test(id UInt16,name String,create_date Date)ENGINE=MergeTree(create_date,(id),8192);

The legacy MergeTree(create_date,(id),8192) engine signature used here takes a date column (ClickHouse partitions by its month), the primary key, and an index granularity of 8192, which is the default. On current ClickHouse versions the equivalent declaration is ENGINE = MergeTree() PARTITION BY toYYYYMM(create_date) ORDER BY id.

Now insert four rows into the test table

insert into test(id,name,create_date) values(1,'小白','2021-10-10');
insert into test(id,name,create_date) values(2,'小黃','2021-10-10');
insert into test(id,name,create_date) values(3,'小花','2021-10-10');
insert into test(id,name,create_date) values(4,'小王','2021-10-10');

Query the test table

select * from test;

Result

Query id: 15b1e461-5287-455b-a483-31dd3ed1ae84

┌─id─┬─name─┬─create_date─┐
│  1 │ 小白 │  2021-10-10 │
└────┴──────┴─────────────┘
┌─id─┬─name─┬─create_date─┐
│  3 │ 小花 │  2021-10-10 │
└────┴──────┴─────────────┘
┌─id─┬─name─┬─create_date─┐
│  2 │ 小黃 │  2021-10-10 │
└────┴──────┴─────────────┘
┌─id─┬─name─┬─create_date─┐
│  4 │ 小王 │  2021-10-10 │
└────┴──────┴─────────────┘

4 rows in set. Elapsed: 0.016 sec. 

Exit ClickHouse and create a new folder

mkdir ck-config
cd ck-config/
docker cp ck-server:/etc/clickhouse-server/config.xml ./
vim config.xml

Find <listen_host>::</listen_host>, remove the comment markers around it, and save config.xml

Stop and remove ck-server, then start the container again so that ClickHouse can be reached from outside.

docker stop ck-server
docker rm ck-server
docker run -d --name ck-server --ulimit nofile=262144:262144 -p 8123:8123 -p 9001:9000 -p 9009:9009 -v /Users/admin/Downloads/ck-config/config.xml:/etc/clickhouse-server/config.xml -v /Users/admin/Downloads/clickhouse_database/:/var/lib/clickhouse yandex/clickhouse-server

Java dependency

<dependency>
   <groupId>ru.yandex.clickhouse</groupId>
   <artifactId>clickhouse-jdbc</artifactId>
   <version>0.1.40</version>
</dependency>

Write a test class that connects to ClickHouse

public class ClickHouseTest {
    public static void main(String[] args) {
        String sql = "select create_date,count(1) as numbers from test where id != 1 group by create_date";
        exeSql(sql);
    }


    public static void exeSql(String sql){
        String address = "jdbc:clickhouse://127.0.0.1:8123/test";
        Connection connection = null;
        Statement statement = null;
        ResultSet results = null;
        try {
            Class.forName("ru.yandex.clickhouse.ClickHouseDriver");
            connection = DriverManager.getConnection(address);
            statement = connection.createStatement();
            long begin = System.currentTimeMillis();
            results = statement.executeQuery(sql);
            long end = System.currentTimeMillis();
            System.out.println("執行("+sql+")耗時:"+(end-begin)+"ms");
            ResultSetMetaData rsmd = results.getMetaData();
            List<Map> list = new ArrayList();
            while(results.next()){
                Map map = new HashMap();
                for(int i = 1;i<=rsmd.getColumnCount();i++){
                    map.put(rsmd.getColumnName(i),results.getString(rsmd.getColumnName(i)));
                }
                list.add(map);
            }
            for(Map map : list){
                System.err.println(map);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }finally {//close the connection
            try {
                if(results!=null){
                    results.close();
                }
                if(statement!=null){
                    statement.close();
                }
                if(connection!=null){
                    connection.close();
                }
            } catch (SQLException e) {
                e.printStackTrace();
            }
        }
    }
}

Run result

{numbers=3, create_date=2021-10-10}
executing (select create_date,count(1) as numbers from test where id != 1 group by create_date) took 20ms

Earlier we defined a ClickUntil interface whose DefaultClickUntil implementation never actually implemented the interface methods; we now replace it with another implementation

@Slf4j
public class ClickHouseUntil implements ClickUntil {
    private static ClickUntil instance = new ClickHouseUntil();

    private ClickHouseUntil() {
    }

    public static ClickUntil createInstance() {
        return instance;
    }

    @Override
    public void saveData(String tablename, Map<String, String> data,Set<String> fields) {
        String resultsql = "insert into ";
        resultsql += tablename +" (";
        String valuesql = "(";
        Set<Map.Entry<String,String>> sets =  data.entrySet();
        for(Map.Entry<String,String> map:sets){
            String fieldName = map.getKey();
            String valuestring = map.getValue();
            resultsql += fieldName + ",";
            if(fields.contains(fieldName)){
                valuesql += valuestring + ",";
            }else {
                valuesql += "'"+valuestring + "'" + ",";
            }

        }
        resultsql = resultsql.substring(0,resultsql.length() - 1) + ")";
        valuesql = valuesql.substring(0,valuesql.length() - 1) + ")";
        resultsql = resultsql + " values "+ valuesql;
        log.info(resultsql);
        //try-with-resources so the connection and statement are always closed
        try (Connection connection = getConnection("jdbc:clickhouse://127.0.0.1:8123/test","ru.yandex.clickhouse.ClickHouseDriver");
             Statement statement = connection.createStatement()) {
            statement.execute(resultsql);//execute the sql statement
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private Connection getConnection(String addressParam, String driverClassNameParam) throws Exception {
        String address = addressParam;
        Class.forName(driverClassNameParam);
        Connection connection  = DriverManager.getConnection(address);
        return connection;
    }

    @Override
    public ResultSet getQueryResult(String database, String sql) throws Exception {
        Connection connection = getConnection("jdbc:clickhouse://127.0.0.1:8123/" + database,"ru.yandex.clickhouse.ClickHouseDriver");
        Statement statement = connection.createStatement();
        ResultSet resultSet = statement.executeQuery(sql);
        return resultSet;
    }
}

Modify the factory class

public class ClickUntilFactory {
    public static ClickUntil createClickUntil() {
        return ClickHouseUntil.createInstance();
    }
}
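
As a usage sketch (an assumed call, not project code; the values happen to match the extra row visible in the query below):

Map<String, String> dataMap = new HashMap<>();
dataMap.put("id", "111");
dataMap.put("name", "xiaobai");
dataMap.put("create_date", "2018-09-07");
//column names listed in fields are written without quotes (numeric columns)
Set<String> fields = new HashSet<>();
fields.add("id");
ClickUntilFactory.createClickUntil().saveData("test", dataMap, fields);
//logs something like: insert into test (name,id,create_date) values ('xiaobai',111,'2018-09-07')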

Run it, then query inside clickhouse

select * from test;

SELECT *
FROM test

Query id: 120a64ed-e2b9-4906-9cfd-e04bba081813

┌─id─┬─name─┬─create_date─┐
│  1 │ 小白 │  2021-10-10 │
│  2 │ 小黃 │  2021-10-10 │
│  3 │ 小花 │  2021-10-10 │
│  4 │ 小王 │  2021-10-10 │
└────┴──────┴─────────────┘
┌──id─┬─name────┬─create_date─┐
│ 111 │ xiaobai │  2018-09-07 │
└─────┴─────────┴─────────────┘

5 rows in set. Elapsed: 0.014 sec. 

Creating the user-portrait marketing-sensitivity label

Create an advertisement operation entity class

/**
 * Advertisement
 */
@Data
public class AdvisterOpertor {
    private Long advisterId; //advertisement id
    private Long productId; //product id
    private Long clickTime; //click time
    private Long publishTime; //publish time
    private Long stayTime; //dwell time
    private Long userId; //user id
    private Integer deviceType; //device type (0 PC, 1 WeChat mini-program, 2 app)
    private String deviceId; //device id
    private Integer advisterType; //advertisement type (0 animation, 1 plain text, 2 video, 3 text plus animation)
    private Integer isStar; //whether a celebrity appears (0 no, 1 yes)
}

A marketing-sensitivity entity class

/**
 * Marketing sensitivity
 */
@Data
public class MarketSensitivity {
    private Long userId; //user id
    private Long advisterId; //advertisement id
    private Integer advisterType; //advertisement type
    private String advisterTypeName; //advertisement type name
    private Integer orderNums; //number of orders
    private Integer adviserNums; //number of ad clicks
    private String groupField;
    private Long timeInfo;
    private String sensitivityFlag; //marketing-sensitivity label
    private Long advisterTypeNums; //count of ads of the same type
}

Run in kafka's bin directory

./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic adviser
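
The streaming job further down also consumes an order topic, so create that one the same way:

./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic order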

Modify the controller class in the earlier Spring Boot project so it can also simulate advertisement logs.

@RestController
public class DataController {
    @Autowired
    private KafkaProducer kafkaProducer;

    @PostMapping("/revicedata")
    public void reviceData(@RequestBody String data) {
        JSONObject jsonObject = JSONObject.parseObject(data);
        String type = jsonObject.getString("type");
        String topic = "";
        switch (type) {
            case "0":
                topic = "scan";
                break;
            case "1":
                topic = "collection";
                break;
            case "2":
                topic = "cart";
                break;
            case "3":
                topic = "attention";
                break;
            default:
                topic = "adviser";
                break;
        }
        kafkaProducer.produce(topic,data);
    }
}

A MarketSensitivityAnalyMap transformation class that implements the MapFunction interface.

public class MarketSensitivityAnalyMap implements MapFunction<JSONObject,MarketSensitivity> {
    @Override
    public MarketSensitivity map(JSONObject value) throws Exception {
        String adviser = value.getString("adviser");
        String orderStr = value.getString("order");
        AdvisterOpertor advisterOpertor = JSONObject.parseObject(adviser,AdvisterOpertor.class);
        Long userId = advisterOpertor.getUserId();
        Long advisterId = advisterOpertor.getAdvisterId();
        Integer advisterType = advisterOpertor.getAdvisterType();
        Order order = JSONObject.parseObject(orderStr,Order.class);
        Integer orderNums = 0;
        if (order != null) {
            orderNums = 1;
        }
        Integer adviserNums = 1;
        Long timeInfo = DateUntil.getCurrentHourStart(System.currentTimeMillis());
        MarketSensitivity marketSensitivity = new MarketSensitivity();
        marketSensitivity.setAdviserNums(adviserNums);
        marketSensitivity.setOrderNums(orderNums);
        marketSensitivity.setUserId(userId);
        marketSensitivity.setAdvisterId(advisterId);
        marketSensitivity.setTimeInfo(timeInfo);
        String fieldGroup = "MarketSensitivity==" + timeInfo + "==" + userId + "==" + advisterId;
        marketSensitivity.setGroupField(fieldGroup);
        marketSensitivity.setAdvisterType(advisterType);
        return marketSensitivity;
    }
}

A MarketSensitivityAnalyReduce aggregation class that implements ReduceFunction

public class MarketSensitivityAnalyReduce implements ReduceFunction<MarketSensitivity> {
    @Override
    public MarketSensitivity reduce(MarketSensitivity value1, MarketSensitivity value2) throws Exception {
        Long userId = value1.getUserId();
        String groupField = value1.getGroupField();
        Long advisterId = value1.getAdvisterId();
        Long timeInfo = value1.getTimeInfo();
        Integer advisterType = value1.getAdvisterType();
        Integer advisterNums1 = value1.getAdviserNums();
        Integer orderNums1 = value1.getOrderNums();

        Integer advisterNums2 = value2.getAdviserNums();
        Integer orderNums2 = value2.getOrderNums();

        MarketSensitivity marketSensitivity = new MarketSensitivity();
        marketSensitivity.setUserId(userId);
        marketSensitivity.setGroupField(groupField);
        marketSensitivity.setAdvisterId(advisterId);
        marketSensitivity.setTimeInfo(timeInfo);
        marketSensitivity.setAdviserNums(advisterNums1 + advisterNums2);
        marketSensitivity.setOrderNums(orderNums1 + orderNums2);
        marketSensitivity.setAdvisterType(advisterType);

        return marketSensitivity;
    }
}

A MarketSensitivityAnalySink storage class that implements the SinkFunction interface

public class MarketSensitivityAnalySink implements SinkFunction<MarketSensitivity> {
    private ClickUntil clickUntil = ClickUntilFactory.createClickUntil();

    @Override
    public void invoke(MarketSensitivity value, Context context) throws Exception {
        if (value != null) {
            String groupField = value.getGroupField();
            String[] groupFields = groupField.split("==");
            String timeinfo = groupFields[1];
            String userId = groupFields[2];
            String advisterId = groupFields[3];
            Integer advisterNums = value.getAdviserNums();
            Integer orderNums = value.getOrderNums();
            String sensitivityFlag = "";
            if (advisterNums <= 2 && orderNums == 0) {
                //at most 2 ad clicks and no order: the user is not sensitive to ads
                sensitivityFlag = "不敏感";
            }else if ((advisterNums > 5 && orderNums == 0)
                    || (advisterNums > 1 && advisterNums <= 5 && orderNums == 1)) {
                //more than 5 clicks without an order, or 2 to 5 clicks with a single
                //order: the user's sensitivity to ads is average
                sensitivityFlag = "一般";
            }else if ((advisterNums > 1 && orderNums > 1)
                    || (advisterNums > 5 && orderNums == 1)) {
                //more than 1 click with several orders, or more than 5 clicks with a
                //single order: the user is highly sensitive to ads
                sensitivityFlag = "非常敏感";
            }
            //clicks in (2,5] with no order fall through and keep an empty flag
            String tablename = "userAdvSensitivity_info";
            Map<String,String> dataMap = new HashMap<>();
            dataMap.put("userId",userId);
            dataMap.put("advisterId",advisterId);
            dataMap.put("advisterNums",advisterNums + "");
            dataMap.put("orderNums",orderNums + "");
            dataMap.put("sensitivityFlag",sensitivityFlag);
            Set<String> fields = new HashSet<>();
            fields.add("userId");
            fields.add("advisterId");
            fields.add("advisterNums");
            fields.add("orderNums");
            clickUntil.saveData(tablename,dataMap,fields);
        }
    }
}

Then the Flink streaming job. It joins two streams, the ad-click stream and the order stream, to profile how sensitive each user is to marketing ads.

/**
 * Marketing-ad sensitivity
 */
public class MarketSensitivityAnaly {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers","127.0.0.1:9092");
        properties.setProperty("group.id","portrait");
        FlinkKafkaConsumer<String> myConsumerAdv = new FlinkKafkaConsumer<>("adviser",
                new SimpleStringSchema(),properties);
        DataStreamSource<String> dataAdv = env.addSource(myConsumerAdv);
        env.enableCheckpointing(5000);

        FlinkKafkaConsumer<String> myConsumerOrder = new FlinkKafkaConsumer<>("order",
                new SimpleStringSchema(),properties);
        DataStreamSource<String> dataOrder = env.addSource(myConsumerOrder);
        //join the ad-click stream with the order stream
        DataStream<JSONObject> dataJoin = dataAdv.join(dataOrder).where(new KeySelector<String, String>() {
            @Override
            public String getKey(String value) throws Exception {
                AdvisterOpertor advisterOpertor = JSONObject.parseObject(value, AdvisterOpertor.class);
                Long advisterId = advisterOpertor.getAdvisterId();
                return advisterOpertor.getUserId() + "==" + advisterId;
            }
        }).equalTo(new KeySelector<String, String>() {
            @Override
            public String getKey(String value) throws Exception {
                Order order = JSONObject.parseObject(value, Order.class);
                Long advisterId = order.getAdvisterId();
                return order.getUserId() + "==" + advisterId;
            }
        }).window(TumblingProcessingTimeWindows.of(Time.hours(1))) //processing time; without timestamps and watermarks, event-time windows would never fire
                .apply(new JoinFunction<String, String, JSONObject>() {
                    @Override
                    public JSONObject join(String first, String second) throws Exception {
                        JSONObject jsonObject = new JSONObject();
                        jsonObject.put("adviser", first);
                        jsonObject.put("order", second);
                        return jsonObject;
                    }
                });
        DataStream<MarketSensitivity> map = dataJoin.map(new MarketSensitivityAnalyMap());
        DataStream<MarketSensitivity> reduct = map.keyBy(MarketSensitivity::getGroupField)
                //per-key processing-time windows; timeWindowAll here would discard the keying
                .window(TumblingProcessingTimeWindows.of(Time.hours(1)))
                .reduce(new MarketSensitivityAnalyReduce());
        reduct.addSink(new MarketSensitivityAnalySink());
        env.execute("portrait market sensitivity");
    }
}

Creating the user-portrait advertisement-type marketing-sensitivity label

Create an AdvisterTypeMarketSensitivityAnalyMap transformation class that implements the MapFunction interface

public class AdvisterTypeMarketSensitivityAnalyMap implements MapFunction<MarketSensitivity,MarketSensitivity> {

    @Override
    public MarketSensitivity map(MarketSensitivity value) throws Exception {
        Long timeInfo = value.getTimeInfo();
        Integer advisterType = value.getAdvisterType();
        Integer advisterNums = value.getAdviserNums();
        Integer orderNums = value.getOrderNums();
        String sensitivityFlag = "";
        if (advisterNums <= 2 && orderNums == 0) {
            //at most 2 ad clicks and no order: the user is not sensitive to ads
            sensitivityFlag = "不敏感";
        }else if ((advisterNums > 5 && orderNums == 0)
                || (advisterNums > 1 && advisterNums <= 5 && orderNums == 1)) {
            //more than 5 clicks without an order, or 2 to 5 clicks with a single
            //order: the user's sensitivity to ads is average
            sensitivityFlag = "一般";
        }else if ((advisterNums > 1 && orderNums > 1)
                || (advisterNums > 5 && orderNums == 1)) {
            //more than 1 click with several orders, or more than 5 clicks with a
            //single order: the user is highly sensitive to ads
            sensitivityFlag = "非常敏感";
        }
        String advisterTypeName = "";
        switch (advisterType) {
            case 0:
                advisterTypeName = "動畫";
                break;
            case 1:
                advisterTypeName = "純文字";
                break;
            case 2:
                advisterTypeName = "視頻";
                break;
            case 3:
                advisterTypeName = "文字加動畫";
                break;
            default:
                break;
        }
        String groupField = "advisterType==" + timeInfo + "==" + advisterType
                + "==" + sensitivityFlag;
        Long advisterTypeNums = 1L;
        MarketSensitivity marketSensitivity = new MarketSensitivity();
        marketSensitivity.setGroupField(groupField);
        marketSensitivity.setAdvisterTypeName(advisterTypeName);
        marketSensitivity.setTimeInfo(timeInfo);
        marketSensitivity.setAdvisterTypeNums(advisterTypeNums);
        marketSensitivity.setSensitivityFlag(sensitivityFlag);

        return marketSensitivity;
    }
}

An AdvisterTypeMarketSensitivityAnalyReduce aggregation class that implements the ReduceFunction interface

public class AdvisterTypeMarketSensitivityAnalyReduce implements ReduceFunction<MarketSensitivity> {
    @Override
    public MarketSensitivity reduce(MarketSensitivity value1, MarketSensitivity value2) throws Exception {
        String advisterTypeName = value1.getAdvisterTypeName();
        Long timeInfo = value1.getTimeInfo();
        String sensitivityFlag = value1.getSensitivityFlag();
        String groupField = value1.getGroupField();

        Long advisterTypeNums1 = value1.getAdvisterTypeNums();
        Long advisterTypeNums2 = value2.getAdvisterTypeNums();

        MarketSensitivity marketSensitivity = new MarketSensitivity();
        marketSensitivity.setAdvisterTypeName(advisterTypeName);
        marketSensitivity.setTimeInfo(timeInfo);
        marketSensitivity.setGroupField(groupField);
        marketSensitivity.setSensitivityFlag(sensitivityFlag);
        marketSensitivity.setAdvisterTypeNums(advisterTypeNums1 + advisterTypeNums2);
        return marketSensitivity;
    }
}

An AdvisterTypeMarketSensitivityAnalySink label-storage class that implements SinkFunction

public class AdvisterTypeMarketSensitivityAnalySink implements SinkFunction<MarketSensitivity> {
    private ClickUntil clickUntil = ClickUntilFactory.createClickUntil();

    @Override
    public void invoke(MarketSensitivity value, Context context) throws Exception {
        if (value != null) {
            String advisterTypeName = value.getAdvisterTypeName();
            Long timeInfo = value.getTimeInfo();
            String sensitivityFlag = value.getSensitivityFlag();
            Long advisterTypeNums = value.getAdvisterTypeNums();

            String tablename = "advistertype_info";
            Map<String, String> dataMap = new HashMap<>();
            dataMap.put("advistertypename",advisterTypeName);
            dataMap.put("timeinfo",timeInfo + "");
            dataMap.put("sensitivityflag",sensitivityFlag);
            dataMap.put("advistertypenums",advisterTypeNums + "");
            Set<String> fields = new HashSet<>();
            fields.add("timeinfo");
            fields.add("advistertypenums");
            clickUntil.saveData(tablename,dataMap,fields);
        }
    }
}

Then modify the MarketSensitivityAnaly Flink streaming job

/**
 * Marketing-ad sensitivity
 */
public class MarketSensitivityAnaly {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers","127.0.0.1:9092");
        properties.setProperty("group.id","portrait");
        FlinkKafkaConsumer<String> myConsumerAdv = new FlinkKafkaConsumer<>("adviser",
                new SimpleStringSchema(),properties);
        DataStreamSource<String> dataAdv = env.addSource(myConsumerAdv);
        env.enableCheckpointing(5000);

        FlinkKafkaConsumer<String> myConsumerOrder = new FlinkKafkaConsumer<>("order",
                new SimpleStringSchema(),properties);
        DataStreamSource<String> dataOrder = env.addSource(myConsumerOrder);
        //join the ad-click stream with the order stream
        DataStream<JSONObject> dataJoin = dataAdv.join(dataOrder).where(new KeySelector<String, String>() {
            @Override
            public String getKey(String value) throws Exception {
                AdvisterOpertor advisterOpertor = JSONObject.parseObject(value, AdvisterOpertor.class);
                Long advisterId = advisterOpertor.getAdvisterId();
                return advisterOpertor.getUserId() + "==" + advisterId;
            }
        }).equalTo(new KeySelector<String, String>() {
            @Override
            public String getKey(String value) throws Exception {
                Order order = JSONObject.parseObject(value, Order.class);
                Long advisterId = order.getAdvisterId();
                return order.getUserId() + "==" + advisterId;
            }
        }).window(TumblingProcessingTimeWindows.of(Time.hours(1))) //processing time; without timestamps and watermarks, event-time windows would never fire
                .apply(new JoinFunction<String, String, JSONObject>() {
                    @Override
                    public JSONObject join(String first, String second) throws Exception {
                        JSONObject jsonObject = new JSONObject();
                        jsonObject.put("adviser", first);
                        jsonObject.put("order", second);
                        return jsonObject;
                    }
                });
        DataStream<MarketSensitivity> map = dataJoin.map(new MarketSensitivityAnalyMap());
        DataStream<MarketSensitivity> reduct = map.keyBy(MarketSensitivity::getGroupField)
                //per-key processing-time windows; timeWindowAll here would discard the keying
                .window(TumblingProcessingTimeWindows.of(Time.hours(1)))
                .reduce(new MarketSensitivityAnalyReduce());
        reduct.addSink(new MarketSensitivityAnalySink());
        
        DataStream<MarketSensitivity> advisterTypeMap = reduct.map(new AdvisterTypeMarketSensitivityAnalyMap());
        DataStream<MarketSensitivity> advisterTypeReduce = advisterTypeMap.keyBy(MarketSensitivity::getGroupField)
                .window(TumblingProcessingTimeWindows.of(Time.hours(1)))
                .reduce(new AdvisterTypeMarketSensitivityAnalyReduce());
        advisterTypeReduce.addSink(new AdvisterTypeMarketSensitivityAnalySink());

        env.execute("portrait market sensitivity");
    }
}