Importing Hive Statistical Analysis Results into a MySQL Table (Part 3): Using a Hive UDF or GenericUDF

        In the previous two posts I covered two approaches for importing Hive analysis results into a MySQL table: importing with Sqoop, and using the Hive and MySQL JDBC drivers. Here I introduce a third, and widely used, approach: using a Hive user-defined function (UDF or GenericUDF) to insert each record into the database table.

I. Using the UDF Approach

        The UDF approach is simple to implement: extend the UDF class and write an evaluate method.

       1. Write the implementation class

package com.gxnzx.hive.udf;

import org.apache.hadoop.hive.ql.exec.UDF;

import com.gxnzx.hive.util.DBSqlHelper;

public class AnalyzeStatistics extends UDF {

        public String evaluate(String clxxbh, String hphm) {
                // jtxx2 is the target table in the MySQL database
                String sql = "insert into jtxx2 values(?,?)";
                // insert one record into the database
                if (DBSqlHelper.addBatch(sql, clxxbh, hphm)) {
                        return clxxbh + "  SUCCESS  " + hphm;
                } else {
                        return clxxbh + "  FAILED  " + hphm;
                }
        }
}
        2. Database helper method

public static boolean addBatch(String sql, String clxxbh, String hphm) {
        boolean flag = false;
        try {
                conn = DBSqlHelper.getConn(); // open (or reuse) a database connection
                ps = conn.prepareStatement(sql);

                ps.setString(1, clxxbh);
                ps.setString(2, hphm);
                System.out.println(ps.toString());
                ps.execute();
                flag = true;
        } catch (Exception e) {
                e.printStackTrace();
        } finally {
                try {
                        if (ps != null) {
                                ps.close();
                        }
                } catch (SQLException e) {
                        e.printStackTrace();
                }
        }
        return flag;
}
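
        The addBatch method above lives in a DBSqlHelper class whose getConn() method and shared conn/ps fields I have not listed in full. The following is only a minimal sketch of what that helper can look like; the connection URL, user name, and password are placeholder assumptions (they match the transport database used in the GenericUDF test later on) and should be replaced with your own.

package com.gxnzx.hive.util;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class DBSqlHelper {

        // shared connection and statement reused by addBatch (a sketch; values are assumptions)
        static Connection conn;
        static PreparedStatement ps;

        private static final String URL = "jdbc:mysql://192.168.2.133:3306/transport";
        private static final String USER = "hive";
        private static final String PASSWORD = "hive";

        // return the shared connection, opening it on first use
        public static Connection getConn() throws Exception {
                if (conn == null || conn.isClosed()) {
                        Class.forName("com.mysql.jdbc.Driver"); // driver class of mysql-connector-java 5.1.x
                        conn = DriverManager.getConnection(URL, USER, PASSWORD);
                }
                return conn;
        }

        // addBatch(...) as shown in step 2 above
}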
       3. Package the project into a JAR with Eclipse and add it to the Hive classpath

hive> add jar hiveudf2.jar;
       4. Add the MySQL JDBC driver JAR to the Hive classpath

hive> add jar /home/hadoopUser/cloud/hive/apache-hive-0.13.1-bin/lib/mysql-connector-java-5.1.18-bin.jar;
       5. Create a Hive temporary function

hive> create temporary function  analyze as 'com.gxnzx.hive.udf.AnalyzeStatistics';
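
        Note that the target table jtxx2 must already exist in MySQL before running the test. Below is a minimal sketch that creates it through the hypothetical DBSqlHelper sketched above; the column names follow the MySQL output in step 7, while the VARCHAR lengths and charset are assumptions.

package com.gxnzx.hive.util;

import java.sql.Statement;

// one-off, hypothetical setup utility; column lengths and charset are assumptions
public class CreateJtxx2 {
        public static void main(String[] args) throws Exception {
                Statement stmt = DBSqlHelper.getConn().createStatement();
                stmt.execute("CREATE TABLE IF NOT EXISTS jtxx2 ("
                                + "cllxbh VARCHAR(64), "
                                + "hphm VARCHAR(32)) DEFAULT CHARSET=utf8");
                stmt.close();
        }
}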
        6. Test

hive> select analyze(clxxbh,hphm) from transjtxx_hbase limit 10;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1428394594787_0034, Tracking URL = http://secondmgt:8088/proxy/application_1428394594787_0034/
Kill Command = /home/hadoopUser/cloud/hadoop/programs/hadoop-2.2.0/bin/hadoop job  -kill job_1428394594787_0034
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-04-23 10:15:34,355 Stage-1 map = 0%,  reduce = 0%
2015-04-23 10:15:51,032 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 7.14 sec
MapReduce Total cumulative CPU time: 7 seconds 140 msec
Ended Job = job_1428394594787_0034
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 7.14 sec   HDFS Read: 256 HDFS Write: 532 SUCCESS
Total MapReduce CPU Time Spent: 7 seconds 140 msec
OK
32100017000000000220140317000015  SUCCESS  魯Q58182
32100017000000000220140317000016  SUCCESS  魯QV4662
32100017000000000220140317000019  SUCCESS  蘇LL8128
32100017000000000220140317000020  SUCCESS  蘇CAH367
32100017000000000220140317000023  SUCCESS  魯Q7899W
32100017000000000220140317000029  SUCCESS  蘇HN3819
32100017000000000220140317000038  SUCCESS  魯C01576
32100017000000000220140317000044  SUCCESS  蘇DT9178
32100017000000000220140317000049  SUCCESS  蘇LZ1112
32100017000000000220140317000052  SUCCESS  蘇K9795警
Time taken: 35.815 seconds, Fetched: 10 row(s)
           7. Check the data in the MySQL table

mysql> select * from jtxx2;
+----------------------------------+-------------+
| cllxbh                           | hphm        |
+----------------------------------+-------------+
| 32100017000000000220140317000015 | 魯Q58182    |
| 32100017000000000220140317000016 | 魯QV4662    |
| 32100017000000000220140317000019 | 蘇LL8128    |
| 32100017000000000220140317000020 | 蘇CAH367    |
| 32100017000000000220140317000023 | 魯Q7899W    |
| 32100017000000000220140317000029 | 蘇HN3819    |
| 32100017000000000220140317000038 | 魯C01576    |
| 32100017000000000220140317000044 | 蘇DT9178    |
| 32100017000000000220140317000049 | 蘇LZ1112    |
| 32100017000000000220140317000052 | 蘇K9795警   |
+----------------------------------+-------------+
10 rows in set (0.00 sec)
II. Using the GenericUDF Approach

        The GenericUDF approach is more involved to implement; I adapted the code below from someone else's example:

         1. Write the function class

package com.gxnzx.hive.main;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.UDFType;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde.Constants;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.StringObjectInspector;
import org.apache.hadoop.io.IntWritable;

/**
 * AnalyzeGenericUDFDBOutput is designed to output data directly from Hive to a
 * JDBC datastore. This UDF is useful for exporting small to medium summaries
 * that have a unique key.
 * 
 * Due to the nature of hadoop, individual mappers, reducers or entire jobs can
 * fail. If a failure occurs a mapper or reducer may be retried. This UDF has no
 * way of detecting failures or rolling back a transaction. Consequently, you
 * should only use this to export to a table with a unique key. The unique
 * key should safeguard against duplicate data.
 * 
 * To use this UDF, follow these four steps. First, package the UDF into a jar
 * file; second, use Hive's add jar feature to add the UDF jar file to the
 * current class path; third, use add jar again to add the JDBC driver jar file
 * to the current class path; fourth, use create temporary function to create a
 * temporary function backed by the UDF class.
 * 
 * Examples for MySQL: hive> add jar udf.jar hive> add jar
 * mysql-connector-java-5.1.18-bin.jar hive> create temporary function
 * analyzedboutput as 'com.gxnzx.hive.main.AnalyzeGenericUDFDBOutput'
 */
@Description(name = "analyzedboutput", value = "_FUNC_(jdbcstring,username,password,preparedstatement,[arguments])"
		+ " - sends data to a jdbc driver", extended = "argument 0 is the JDBC connection string\n"
		+ "argument 1 is the database user name\n"
		+ "argument 2 is the database user's password\n"
		+ "argument 3 is an SQL query to be used in the PreparedStatement\n"
		+ "argument (4-n) The remaining arguments must be primitive and are "
		+ "passed to the PreparedStatement object\n")
@UDFType(deterministic = false)
public class AnalyzeGenericUDFDBOutput extends GenericUDF {
	private static final Log LOG = LogFactory
			.getLog(AnalyzeGenericUDFDBOutput.class.getName());

	private transient ObjectInspector[] argumentOI;
	private transient Connection connection = null;
	private String url;
	private String user;
	private String pass;
	private final IntWritable result = new IntWritable(-1);

	/**
	 * @param arguments
	 *            argument 0 is the JDBC connection string argument 1 is the
	 *            user name argument 2 is the password argument 3 is an SQL
	 *            query to be used in the PreparedStatement argument (4-n) The
	 *            remaining arguments must be primitive and are passed to the
	 *            PreparedStatement object
	 */
	@Override
	public ObjectInspector initialize(ObjectInspector[] arguments)
			throws UDFArgumentTypeException {
		argumentOI = arguments;

		// this should be connection
		// url,username,password,query,column1[,columnn]*
		for (int i = 0; i < 4; i++) {
			if (arguments[i].getCategory() == ObjectInspector.Category.PRIMITIVE) {
				PrimitiveObjectInspector poi = ((PrimitiveObjectInspector) arguments[i]);

				if (!(poi.getPrimitiveCategory() == PrimitiveObjectInspector.PrimitiveCategory.STRING)) {
					throw new UDFArgumentTypeException(i,
							"The argument of function should be \""
									+ Constants.STRING_TYPE_NAME + "\", but \""
									+ arguments[i].getTypeName()
									+ "\" is found");
				}
			}
		}
		for (int i = 4; i < arguments.length; i++) {
			if (arguments[i].getCategory() != ObjectInspector.Category.PRIMITIVE) {
				throw new UDFArgumentTypeException(i,
						"The argument of function should be primative"
								+ ", but \"" + arguments[i].getTypeName()
								+ "\" is found");
			}
		}

		return PrimitiveObjectInspectorFactory.writableIntObjectInspector;
	}

	/**
	 * @return 0 on success, 1 on an SQL error, 2 on a connection error
	 */
	@Override
	public Object evaluate(DeferredObject[] arguments) throws HiveException {

		url = ((StringObjectInspector) argumentOI[0])
				.getPrimitiveJavaObject(arguments[0].get());
		user = ((StringObjectInspector) argumentOI[1])
				.getPrimitiveJavaObject(arguments[1].get());
		pass = ((StringObjectInspector) argumentOI[2])
				.getPrimitiveJavaObject(arguments[2].get());

		try {
			connection = DriverManager.getConnection(url, user, pass);
		} catch (SQLException ex) {
			LOG.error("Driver loading or connection issue", ex);
			result.set(2);
		}

		if (connection != null) {
			try {

				PreparedStatement ps = connection
						.prepareStatement(((StringObjectInspector) argumentOI[3])
								.getPrimitiveJavaObject(arguments[3].get()));
				for (int i = 4; i < arguments.length; ++i) {
					PrimitiveObjectInspector poi = ((PrimitiveObjectInspector) argumentOI[i]);
					ps.setObject(i - 3,
							poi.getPrimitiveJavaObject(arguments[i].get()));
				}
				ps.execute();
				ps.close();
				result.set(0);
			} catch (SQLException e) {
				LOG.error("Underlying SQL exception", e);
				result.set(1);
			} finally {
				try {
					connection.close();
				} catch (Exception ex) {
					LOG.error("Underlying SQL exception during close", ex);
				}
			}
		}

		return result;
	}

	@Override
	public String getDisplayString(String[] children) {
		StringBuilder sb = new StringBuilder();
		sb.append("dboutput(");
		if (children.length > 0) {
			sb.append(children[0]);
			for (int i = 1; i < children.length; i++) {
				sb.append(",");
				sb.append(children[i]);
			}
		}
		sb.append(")");
		return sb.toString();
	}

}
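
        One caveat: evaluate() calls DriverManager.getConnection() directly and never loads the MySQL driver class explicitly. With the driver JAR added through add jar this normally works, but if the function keeps returning 2 (connection failure) it can help to register the driver explicitly. A minimal sketch of a guard that could be placed at the top of evaluate(); com.mysql.jdbc.Driver is the driver class of mysql-connector-java 5.1.x:

try {
        // make sure the MySQL driver is registered with DriverManager
        Class.forName("com.mysql.jdbc.Driver");
} catch (ClassNotFoundException e) {
        LOG.error("MySQL driver class not found on the class path", e);
}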
          2. Package the program into a JAR and add it to the Hive classpath

hive> add jar  hiveGenericUdf.jar;
          3. Add the MySQL JDBC driver JAR file

hive> add jar /home/hadoopUser/cloud/hive/apache-hive-0.13.1-bin/lib/mysql-connector-java-5.1.18-bin.jar;
          4. Create the temporary function

hive> create temporary function analyzedboutput as 'com.gxnzx.hive.main.AnalyzeGenericUDFDBOutput';
          5. Test

hive> select analyzedboutput('jdbc:mysql://192.168.2.133:3306/transport','hive','hive','insert into jtxx2 values(?,?)',clxxbh,hphm) from transjtxx_hbase limit 5;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1428394594787_0043, Tracking URL = http://secondmgt:8088/proxy/application_1428394594787_0043/
Kill Command = /home/hadoopUser/cloud/hadoop/programs/hadoop-2.2.0/bin/hadoop job  -kill job_1428394594787_0043
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-04-23 22:01:46,205 Stage-1 map = 0%,  reduce = 0%
2015-04-23 22:02:01,985 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 9.37 sec
MapReduce Total cumulative CPU time: 9 seconds 370 msec
Ended Job = job_1428394594787_0043
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 9.37 sec   HDFS Read: 256 HDFS Write: 10 SUCCESS
Total MapReduce CPU Time Spent: 9 seconds 370 msec
OK
0
0
0
0
0
Time taken: 32.118 seconds, Fetched: 5 row(s)
     The six arguments to analyzedboutput are: the MySQL JDBC connection string, the MySQL user name, the password, the SQL insert statement, and the two query columns from the Hive table, clxxbh and hphm.

    6. Check the data in the MySQL table

mysql> select * from jtxx2;
Empty set (0.00 sec)

mysql> select * from jtxx2;
+----------------------------------+-----------+
| cllxbh                           | hphm      |
+----------------------------------+-----------+
| 32100017000000000220140317000015 | 魯Q58182  |
| 32100017000000000220140317000016 | 魯QV4662  |
| 32100017000000000220140317000019 | 蘇LL8128  |
| 32100017000000000220140317000020 | 蘇CAH367  |
| 32100017000000000220140317000023 | 魯Q7899W  |
+----------------------------------+-----------+
5 rows in set (0.00 sec)