Syncing data from MySQL to Elasticsearch with canal

Environment:

MySQL 5.7, Elasticsearch 7.4.2, canal.deployer-1.1.5

The goal here is to use canal to sync data modified in MySQL into Elasticsearch.

1. MySQL configuration

1.1 Edit the MySQL configuration file

[root@localhost local]# vim /etc/my.cnf
[root@localhost local]# systemctl restart mysqld

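my.cnf (added lines). See also: the official MySQL documentation

# Enable the binary log
log_bin = mysql-bin
# Set the server ID
server_id = 1
# Do not record the context of each SQL statement; only record which rows were modified and what they were changed to
binlog_format = ROW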

After editing the configuration file, restart MySQL. If it fails to start, check the log with:

 cat /var/log/mysqld.log

1.2 Check that log_bin is enabled:

[root@localhost local]# mysql -uroot -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.30-log MySQL Community Server (GPL)

Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show variables like '%log_bin%';
+---------------------------------+----------------------------------+
| Variable_name                   | Value                            |
+---------------------------------+----------------------------------+
| log_bin                         | ON                               |
| log_bin_basename                | /var/lib/mysql/mysql-bin         |
| log_bin_index                   | /var/lib/mysql/mysql-bin.index   |
| log_bin_trust_function_creators | OFF                              |
| log_bin_use_v1_row_events       | OFF                              |
| sql_log_bin                     | ON                               |
+---------------------------------+----------------------------------+
6 rows in set (0.03 sec)
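
The output above only covers log_bin, while section 1.1 also sets binlog_format and server_id. As a minimal sketch, all three values could be compared in one place as shown below. BinlogSettingsCheck and its problems method are hypothetical names, not part of MySQL or canal, and the map is assumed to be filled from the output of SHOW VARIABLES (by hand or over JDBC, which is not shown here).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: checks the server variables configured in section 1.1.
class BinlogSettingsCheck {

    // Returns human-readable problems; an empty list means the three settings match section 1.1.
    static List<String> problems(Map<String, String> variables) {
        List<String> problems = new ArrayList<>();
        if (!"ON".equalsIgnoreCase(variables.get("log_bin"))) {
            problems.add("log_bin is not ON");
        }
        if (!"ROW".equalsIgnoreCase(variables.get("binlog_format"))) {
            problems.add("binlog_format is not ROW");
        }
        String serverId = variables.get("server_id");
        if (serverId == null || serverId.trim().isEmpty() || "0".equals(serverId.trim())) {
            problems.add("server_id is not set");
        }
        return problems;
    }

    public static void main(String[] args) {
        // Values as they would appear in SHOW VARIABLES after applying the my.cnf changes.
        Map<String, String> vars = new LinkedHashMap<>();
        vars.put("log_bin", "ON");
        vars.put("binlog_format", "ROW");
        vars.put("server_id", "1");
        System.out.println("problems: " + problems(vars));
    }
}
```

If the list is not empty, the my.cnf changes were probably not picked up, for example because MySQL was not restarted.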

1.3 Create the canal account and grant it privileges

mysql> grant select,replication slave,replication client on *.* to 'canal'@'%' identified by 'canal';
Query OK, 0 rows affected, 1 warning (0.01 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.01 sec)

2. Canal server configuration

See also: the official canal QuickStart guide

[root@localhost canal]# ll
total 8
drwxr-xr-x. 2 root root   76 May 23 20:28 bin
drwxr-xr-x. 5 root root  123 May 23 20:28 conf
drwxr-xr-x. 2 root root 4096 May 23 20:28 lib
drwxrwxrwx. 2 root root    6 Oct  9  2019 logs
[root@localhost canal]# cd conf/
[root@localhost conf]# ll
total 16
-rwxrwxrwx. 1 root root  291 Aug 31  2019 canal_local.properties
-rwxrwxrwx. 1 root root 5259 Sep 30  2019 canal.properties
drwxrwxrwx. 2 root root   33 May 23 20:28 example
-rwxrwxrwx. 1 root root 3262 Sep 16  2019 logback.xml
drwxrwxrwx. 2 root root   39 May 23 20:28 metrics
drwxrwxrwx. 3 root root  149 May 23 20:28 spring
[root@localhost conf]# cd example/
[root@localhost example]# ll
total 4
-rwxrwxrwx. 1 root root 2036 Sep 30  2019 instance.properties
[root@localhost example]# vi instance.properties
[root@localhost example]# cd ..
[root@localhost conf]# cd ..
[root@localhost canal]# cd bin
[root@localhost bin]# ls
restart.sh  startup.bat  startup.sh  stop.sh
[root@localhost bin]# ./startup.sh
... (output omitted)
cd to /usr/local/canal/bin for continue
# The commands below check whether canal started successfully (any one of them is enough)
[root@localhost bin]# ps -ef | grep canal
... (output omitted)
[root@localhost bin]# netstat -an | grep 11111
tcp        0      0 0.0.0.0:11111           0.0.0.0:*               LISTEN
... (output omitted)
[root@localhost canal]# cd logs/
[root@localhost logs]# ls
canal  example
[root@localhost logs]# cd example/
[root@localhost example]# ll
total 192
-rw-r--r--. 1 root root 90509 May 23 20:41 example.log
[root@localhost example]# tail -f example.log
... (output omitted)
2020-05-23 20:41:39.517 [main] INFO  c.a.otter.canal.instance.core.AbstractCanalInstance - start successful....
2020-05-23 20:41:39.751 [destination = example , address = /127.0.0.1:3306 , EventParser] WARN  c.a.o.c.p.inbound.mysql.rds.RdsBinlogEventParserProxy - ---> begin to find start position, it will be long time for reset or first position
2020-05-23 20:41:39.751 [destination = example , address = /127.0.0.1:3306 , EventParser] WARN  c.a.o.c.p.inbound.mysql.rds.RdsBinlogEventParserProxy - prepare to find start position just show master status
[root@localhost canal]# cat /usr/local/canal/logs/canal/canal.log
2020-05-23 20:41:38.118 [main] INFO  com.alibaba.otter.canal.deployer.CanalController - ## start the canal server[172.17.0.1(172.17.0.1):11111]
2020-05-23 20:41:39.646 [main] INFO  com.alibaba.otter.canal.deployer.CanalStarter - ## the canal server is running now ......
2020-05-23 20:48:44.964 [canal-instance-scan-0] INFO  com.alibaba.otter.canal.deployer.CanalController - auto notify stop example successful.
2020-05-23 20:48:45.957 [canal-instance-scan-0] INFO  com.alibaba.otter.canal.deployer.CanalController - auto notify start example successful.
2020-05-23 20:48:45.957 [canal-instance-scan-0] INFO  com.alibaba.otter.canal.deployer.CanalController - auto notify reload example successful.

instance.properties (excerpt):

#################################################
## mysql serverId , v1.0.26+ will autoGen
canal.instance.mysql.slaveId=2

# username/password
canal.instance.dbUsername=canal
canal.instance.dbPassword=canal

3. Canal Java client

3.1 pom.xml

        <dependency>
            <groupId>com.baomidou</groupId>
            <artifactId>mybatis-plus-boot-starter</artifactId>
            <version>3.3.1</version>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
        </dependency>
        <dependency>
            <groupId>com.alibaba.otter</groupId>
            <artifactId>canal.client</artifactId>
            <version>1.1.4</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba.otter</groupId>
            <artifactId>canal.common</artifactId>
            <version>1.1.4</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba.otter</groupId>
            <artifactId>canal.protocol</artifactId>
            <version>1.1.4</version>
        </dependency>
        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>elasticsearch-rest-high-level-client</artifactId>
            <version>7.4.2</version>
        </dependency>

3.2 Configuring the canal Java client's connection to the canal server

package com.lucifer.dianping.canal;

import com.alibaba.otter.canal.client.CanalConnector;
import com.alibaba.otter.canal.client.CanalConnectors;
import com.google.common.collect.Lists;
import org.springframework.beans.factory.DisposableBean;
import org.springframework.context.annotation.Bean;
import org.springframework.stereotype.Component;

import java.net.InetSocketAddress;

/**
 * author: lucifer
 * date: 2020/5/23 21:16
 * description: configures the canal client's connection to the canal server
 */
@Component
public class CanalClient implements DisposableBean {

    private CanalConnector canalConnector;

    @Bean
    public CanalConnector getCanalConnector() {
        canalConnector = CanalConnectors.newClusterConnector(Lists.newArrayList(
                new InetSocketAddress("192.168.24.133", 11111)),
                "example", "canal", "canal"
        );
        canalConnector.connect();
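        // No filter is passed here; to restrict the subscription, pass a filter in the form {database}.{table} to subscribe(String)
        canalConnector.subscribe();
        // Roll back to the position where consumption was last interrupted
        canalConnector.rollback();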
        return canalConnector;
    }


    /**
     * Disconnects the canal client when the Spring container is destroyed,
     * so that the canal connection is not leaked.
     *
     * @throws Exception
     */
    @Override
    public void destroy() throws Exception {
        if (canalConnector!=null){
            canalConnector.disconnect();
        }
    }
}
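
The subscribe() call above passes no filter. canal filters are comma-separated regular expressions matched against names of the form {database}.{table}. The sketch below approximates that matching with java.util.regex so that a filter string can be tried out before it is passed to subscribe(String). TableFilterSketch is a hypothetical helper, not canal's own implementation, and it ignores details such as case handling.

```java
import java.util.regex.Pattern;

// Hypothetical helper that approximates how a canal-style filter selects tables.
class TableFilterSketch {

    // Returns true if any comma-separated expression in the filter fully matches "schema.table".
    static boolean matches(String filter, String schema, String table) {
        String fullName = schema + "." + table;
        for (String expression : filter.split(",")) {
            String trimmed = expression.trim();
            if (!trimmed.isEmpty() && Pattern.matches(trimmed, fullName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // In Java source the dot is escaped as "\\." so the regex sees "\.".
        String onlyUserTable = "dianping\\.user";
        String wholeDatabase = "dianping\\..*";

        System.out.println(matches(onlyUserTable, "dianping", "user"));
        System.out.println(matches(onlyUserTable, "dianping", "shop"));
        System.out.println(matches(wholeDatabase, "dianping", "shop"));
    }
}
```

With a filter such as "dianping\\.user" on the subscription, the database and table checks in the client code become a second line of defense rather than the only one.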

Connection succeeded, as shown in the screenshot:

application.yml:

server:
  port: 8010
spring:
  datasource:
    driver-class-name: com.mysql.cj.jdbc.Driver
    url: jdbc:mysql://192.168.24.133:3306/dianping?autoReconnect=true&useUnicode=true&createDatabaseIfNotExist=true&characterEncoding=utf8&serverTimezone=UTC
    username: root
    password: 123456
    type: com.alibaba.druid.pool.DruidDataSource
  elasticsearch:
    rest:
      uris: 192.168.24.133:9200

4. Integrating with Elasticsearch

4.1 Create the Elasticsearch index:

# Create the index
PUT user

# Query
GET /user/_search
{
  "query": {
    "match_all": {}
  }
}

4.2 Inserting MySQL data into Elasticsearch through canal

package com.lucifer.dianping.canal;

import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import com.alibaba.fastjson.serializer.SerializerFeature;
import com.alibaba.otter.canal.client.CanalConnector;
import com.alibaba.otter.canal.protocol.CanalEntry;
import com.alibaba.otter.canal.protocol.Message;
import com.baomidou.mybatisplus.core.conditions.query.QueryWrapper;
import com.google.protobuf.InvalidProtocolBufferException;
import com.lucifer.dianping.mapper.UserMapper;
import com.lucifer.dianping.pojo.User;
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.lang3.StringUtils;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.beans.BeansException;
import org.springframework.context.ApplicationContext;
import org.springframework.context.ApplicationContextAware;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

import javax.annotation.Resource;
import java.io.IOException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * author: lucifer
 * date: 2020/5/23 21:44
 * description: TODO
 */
@Slf4j
@Component
public class CanalScheduling implements Runnable, ApplicationContextAware {

    private ApplicationContext applicationContext;

    @Resource
    private UserMapper userMapper;

    @Resource
    private RestHighLevelClient restHighLevelClient;

    @Resource
    private CanalConnector canalConnector;

    @Scheduled(fixedDelay = 100) // fixedDelay is in milliseconds: runs again 100 ms after the previous run completes
    @Override
    public void run() {
        long batchId = -1;
        try {
            // Maximum number of entries to fetch per batch
            int batchSize = 1000;
            Message message = canalConnector.getWithoutAck(batchSize);
            // Batch ID
            batchId = message.getId();
            List<CanalEntry.Entry> entries = message.getEntries();
            if (batchId != -1 && entries.size() > 0) {
                entries.forEach(entry -> {
                    // my.cnf sets binlog_format = ROW, so only ROWDATA entries are parsed here
                    if (entry.getEntryType() == CanalEntry.EntryType.ROWDATA) {
                        // Parse and handle the entry
                        publishCanalEvent(entry);
                    }
                });
            }
            canalConnector.ack(batchId);
        } catch (Exception e) {
            e.printStackTrace();
            canalConnector.rollback(batchId);
        }
    }

    private void publishCanalEvent(CanalEntry.Entry entry) {
        // CanalEntry.EntryType entryType = entry.getEntryType();
        // Table name
        String tableName = entry.getHeader().getTableName();
        // Database (schema) name
        String database = entry.getHeader().getSchemaName();
        CanalEntry.RowChange rowChange = null;
        try {
            rowChange = CanalEntry.RowChange.parseFrom(entry.getStoreValue());
        } catch (InvalidProtocolBufferException e) {
            e.printStackTrace();
            return;
        }
        rowChange.getRowDatasList().forEach(rowData -> {
            // The row image before the change is also available here
            List<CanalEntry.Column> beforeColumnsList = rowData.getBeforeColumnsList();
            beforeColumnsList.stream().forEach(column -> {
                log.info("Before change: name:{}, value:{}", column.getName(),column.getValue());
            });
            // Get the row image after the change
            List<CanalEntry.Column> afterColumnsList = rowData.getAfterColumnsList();
           /* String primaryKey = "id";
            CanalEntry.Column idColumn = afterColumnsList.stream().filter(column ->
                    column.getIsKey() && primaryKey.equals(column.getName())).findFirst().orElse(null);*/
            Map<String, Object> columnsToMap = parseColumnsToMap(afterColumnsList);
            try {
                // Index into Elasticsearch
                indexES(columnsToMap, database, tableName);
            } catch (IOException e) {
                e.printStackTrace();
            }
        });
    }

    Map<String, Object> parseColumnsToMap(List<CanalEntry.Column> columns) {
        Map<String, Object> map = new HashMap<>();
        columns.forEach(column -> {
            if (column == null) {
                return;
            }
            log.info("After change: name:{}, value:{}", column.getName(),column.getValue());
            map.put(column.getName(), column.getValue());
        });
        return map;
    }

    /**
     * Notes:
     * Problem 1: java.lang.IllegalArgumentException: The number of object passed must be even but was [1]
     *      This happens with the following code:
     *       User user = userMapper.selectById(new Integer((String) dataMap.get("id")));
     *      .....
     *      indexRequest.source(user);
     *      source(Object...) expects alternating field names and values, so a single entity object is rejected.
     *      The code below therefore passes a Map instead: indexRequest.source(map).
     * <p>
     * Problem 2: cannot write xcontent for unknown value of type class java.sql.Timestamp
     *      This happens with the following code:
     *      QueryWrapper<User> queryWrapper = new QueryWrapper<>();
     *      queryWrapper.ge("id", new Integer((String) dataMap.get("id")));
     *      List<Map<String, Object>> maps = userMapper.selectMaps(queryWrapper);
     *      for (Map<String, Object> map : maps) {
     *      IndexRequest indexRequest = new IndexRequest();
     *      indexRequest.id(String.valueOf(map.get("id")));
     *      indexRequest.source(map);
     *      restHighLevelClient.index(indexRequest, RequestOptions.DEFAULT);
     * }
     *      In the User entity the time fields are of type Date, but in the maps returned by
     *      userMapper.selectMaps(queryWrapper) the updateAt and createAt fields are
     *      java.sql.Timestamp instead of the Date type declared in the entity.
     *      Elasticsearch 7.4.2 cannot handle the Timestamp type, so the code below uses a different approach.
     *      <p>
     * Problem 3: Found interface org.elasticsearch.common.bytes.BytesReference, but class was expected
     *      Console output:
     *      java.lang.IncompatibleClassChangeError: Found interface org.elasticsearch.common.bytes.BytesReference, but class was expected
     *      at org.elasticsearch.client.RequestConverters.index(RequestConverters.java:340) ~[elasticsearch-rest-high-level-client-7.4.2.jar:7.6.2]
     *      at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1450) ~[elasticsearch-rest-high-level-client-7.4.2.jar:7.6.2]
     *      at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1424) ~[elasticsearch-rest-high-level-client-7.4.2.jar:7.6.2]
     * Cause (my guess): probably a version mismatch.
     *      Spring Boot 2.3.0.RELEASE manages the Elasticsearch dependencies at version 7.6.2, whereas my
     *      elasticsearch-rest-high-level-client is 7.4.2, matching the Elasticsearch version I installed.
     *      So I override the Elasticsearch version that Spring Boot 2.3.0.RELEASE provides by default, by adding the following to pom.xml:
     * <properties>
     * <elasticsearch.version>7.4.2</elasticsearch.version>
     * </properties>
     * This resolves the problem.
     */
    private void indexES(Map<String, Object> dataMap, String database, String table) throws IOException {
        log.info("dataMap:{},database:{},table:{}", dataMap, database, table);
        // Skip data that is not from the "dianping" database
        if (!StringUtils.equals("dianping", database)) {
            return;
        }
        // Only rows from the user table are handled
        if (StringUtils.equals("user", table)) {
            // Use mybatis-plus to load the row by id, then convert it to a Map
            User user = userMapper.selectById(Integer.valueOf((String) dataMap.get("id")));
            Map<String, Object> map = JSON.parseObject(JSON.toJSONString(user, SerializerFeature.WriteNullStringAsEmpty,
                    SerializerFeature.WriteNullNumberAsZero, SerializerFeature.WriteMapNullValue), Map.class);
            IndexRequest indexRequest = new IndexRequest("user");
            indexRequest.id(String.valueOf(map.get("id")));
            indexRequest.source(map);
            restHighLevelClient.index(indexRequest, RequestOptions.DEFAULT);
        } else {
            return;
        }
    }


    @Override
    public void setApplicationContext(ApplicationContext applicationContext) throws BeansException {
        this.applicationContext = applicationContext;
    }
}
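
Problem 2 in the comment above comes from java.sql.Timestamp values ending up in the map handed to indexRequest.source(map). The author works around it by reloading the entity and round-tripping it through fastjson. As an alternative sketch, not the approach used in this article, date values could be turned into ISO-8601 strings before indexing. TimestampSanitizer is a hypothetical helper, and the choice of UTC ISO-8601 strings is an assumption about what the index should store.

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: replaces date-like values with ISO-8601 strings.
class TimestampSanitizer {

    // Returns a copy of the row in which every java.util.Date (including its subclass
    // java.sql.Timestamp) is replaced by a UTC ISO-8601 string; other values are kept as-is.
    static Map<String, Object> sanitize(Map<String, Object> row) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : row.entrySet()) {
            Object value = entry.getValue();
            if (value instanceof Date) {
                // getTime() is used instead of toInstant() because java.sql.Date does not support toInstant().
                Instant instant = Instant.ofEpochMilli(((Date) value).getTime());
                result.put(entry.getKey(), DateTimeFormatter.ISO_INSTANT.format(instant));
            } else {
                result.put(entry.getKey(), value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", 1);
        row.put("create_at", new java.sql.Timestamp(System.currentTimeMillis()));
        System.out.println(sanitize(row));
    }
}
```

The sanitized map contains only strings and other plain values, which is the kind of input that indexRequest.source(map) already accepts in the code above.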

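Screenshot for problem 2:

Screenshot for problem 3:

Here you can see that Spring Boot 2.3.0.RELEASE manages the Elasticsearch dependency at version 7.6.2:

Data in the MySQL user table:

Modify a row:

Console output: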

2020-05-24 00:15:16.168  INFO 21540 --- [   scheduling-1] c.l.dianping.canal.CanalScheduling       : dataMap:{password=123, gender=1, telphone=1234567, nick_name=Lucifer, update_at=2020-05-23 21:02:08, id=1, create_at=2020-05-24 00:15:17},database:dianping,table:user
2020-05-24 00:15:16.194 DEBUG 21540 --- [   scheduling-1] c.l.d.mapper.UserMapper.selectById       : ==>  Preparing: SELECT id,create_at,update_at,telphone,password,nick_name,gender FROM user WHERE id=? 
2020-05-24 00:15:16.195 DEBUG 21540 --- [   scheduling-1] c.l.d.mapper.UserMapper.selectById       : ==> Parameters: 1(Integer)
2020-05-24 00:15:16.199 DEBUG 21540 --- [   scheduling-1] c.l.d.mapper.UserMapper.selectById       : <==      Total: 1
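
The log above comes from an update, where the after-image of the row carries the id. For a DELETE event canal supplies only the before-image, so code that always reads the id from the after-image finds nothing to look up. The sketch below shows one way to choose the row image by event type. EventType here is a local stand-in for canal's CanalEntry.EventType, and RowImageSketch, Action and plan are hypothetical names used only for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class RowImageSketch {

    // Stand-in for canal's CanalEntry.EventType; only the row-level types used here are modeled.
    enum EventType { INSERT, UPDATE, DELETE }

    // Hypothetical value object describing the Elasticsearch operation to perform.
    static final class Action {
        final String kind; // "index" or "delete"
        final String id;   // document id, taken from the row's "id" column

        Action(String kind, String id) {
            this.kind = kind;
            this.id = id;
        }

        @Override
        public String toString() {
            return kind + ":" + id;
        }
    }

    // Picks the row image that actually contains data for the given event type.
    static Action plan(EventType type, Map<String, String> before, Map<String, String> after) {
        // A deleted row exists only in the before-image; inserted and updated rows are read from the after-image.
        Map<String, String> image = (type == EventType.DELETE) ? before : after;
        String id = image.get("id");
        if (id == null) {
            throw new IllegalArgumentException("row image has no id column");
        }
        return new Action(type == EventType.DELETE ? "delete" : "index", id);
    }

    public static void main(String[] args) {
        Map<String, String> row = new HashMap<>();
        row.put("id", "1");
        System.out.println(plan(EventType.UPDATE, row, row));
        System.out.println(plan(EventType.DELETE, row, Collections.emptyMap()));
    }
}
```

In the real client the event type would come from rowChange.getEventType(), and a "delete" action would presumably map to a DeleteRequest against the user index instead of an IndexRequest.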

Modify it again to:

Query Elasticsearch (viewed here in Kibana):

 