Hive Basic Syntax and Usage

Original

qq_40178533

2020-06-16 02:30

Hive Syntax

CREATE TABLE Statements

Method 1: create a new table from scratch (the most common form):

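create [EXTERNAL] table vv_stat_fact
(
userid  string,
stat_date string,
tryvv int,
sucvv int,
ptime float
)
PARTITIONED BY (  -- optional; creates a partitioned table
  dt string)
clustered by (userid) into 3000 buckets  -- optional; bucketing
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'  -- optional; column delimiter (Hive's default is '\001')
STORED AS rcfile  -- optional; storage file format, default is textfile
location '/testdata/';  -- optional; HDFS storage path. Files already under this path are read as table data (for a partitioned table the partitions must still be registered). Defaults to a directory under Hive's warehouse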

Method 2: create a table from an existing table

create table dianxin_as_S AS select * from dianxin_503 limit 10;

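Note that a table created with CREATE TABLE ... AS SELECT cannot be an external table. The table after select must already exist, and the data is loaded as part of table creation: a MapReduce job is started to read the source table and write the rows into the new table.

CREATE TABLE ... LIKE copies only the table definition, not the data, and the new table may be external:

CREATE EXTERNAL TABLE IF NOT EXISTS dianxin_like LIKE dianxin_503;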

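Method 3: create a partitioned table
A partitioned table declares its partition columns at creation time. In practice, each partition is a subdirectory under the table's directory on HDFS. When a query names the partitions it needs, only those partitions are read, which avoids a full table scan and improves query efficiency.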

CREATE TABLE page_view(viewTime INT, ip STRING ) PARTITIONED BY (dt STRING, country STRING) ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001';
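
To make the partition pruning described above concrete, here is a minimal sketch against the page_view table. The partition values ('2020-06-16', 'CN') are hypothetical and chosen only for illustration.

```sql
-- Hypothetical partition values, for illustration only.
-- Registers the partition; on HDFS it maps to the subdirectory
-- .../page_view/dt=2020-06-16/country=CN under the table directory.
ALTER TABLE page_view ADD IF NOT EXISTS PARTITION (dt='2020-06-16', country='CN');

-- List the partitions Hive knows about for this table.
SHOW PARTITIONS page_view;

-- Filtering on the partition columns lets Hive read only the matching
-- subdirectory instead of scanning the whole table.
SELECT viewTime, ip
FROM page_view
WHERE dt = '2020-06-16' AND country = 'CN';
```

A query that omits dt and country from the WHERE clause would have to scan every partition.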

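Loading Data

Using the load data command
Import from HDFS. The path may be a directory, in which case every file under it is imported, but all files must share the same format.
load data inpath '/test/' into table dianxin_test;
Import from the local file system:
load data local inpath '/test/' into table dianxin_test;
Table-to-table loading (insert into appends; insert overwrite replaces the existing data):
create table IF NOT EXISTS dianxin_test2 as select * from dianxin_test;
insert into table dianxin_test2 select * from dianxin_test;
insert overwrite table dianxin_test2 select * from dianxin_test;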

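Notes:
1. If the CREATE TABLE statement does not specify a storage path, then for both external and managed (internal) tables the path defaults to hive/warehouse/xx.db/<table name>. Data loaded from HDFS with load data inpath is moved into the table's storage directory. It is moved, not copied, so the source files disappear from their original location. (With load data local inpath, the local file is copied.)
2. Dropping an external table removes only its metadata: the files are not deleted, and neither is the directory.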
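
One way to check the two notes above on a real table is to look at its metadata before and after loading. This is a sketch that reuses the dianxin_test table and the /test/ path from the examples above; it assumes /test/ exists on HDFS.

```sql
-- Moves the files under /test/ on HDFS into the table's storage directory;
-- afterwards they are no longer at their original location.
load data inpath '/test/' into table dianxin_test;

-- In the output, "Location:" shows the storage directory and
-- "Table Type:" shows MANAGED_TABLE or EXTERNAL_TABLE.
describe formatted dianxin_test;
```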

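Hive DDL Statements

Create a database: create database xxxxx;
List databases: show databases;
Drop a database: drop database tmp;
Force-drop a database together with its tables: drop database tmp cascade;
List tables: SHOW TABLES;
View a table's metadata:
desc test_table;
describe extended test_table;
describe formatted test_table;
Show the CREATE TABLE statement: show create table table_XXX;
Rename a table:
alter table test_table rename to new_table;
Change a column's data type (the column name is given twice: old name, then new name): alter table lv_test change column colxx colxx string;
Add and drop partitions:
alter table test_table add partition (pt=xxxx);
alter table test_table drop if exists partition(…);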
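
As an illustration of the partition statements above, here is a sketch with a concrete value filled in. The partition value '20200616' is hypothetical (the original leaves it unspecified), and test_table is assumed to be partitioned by a single string column pt.

```sql
-- Hypothetical partition value; assumes test_table is partitioned by (pt string).
alter table test_table add if not exists partition (pt='20200616');

-- Confirm the partition was registered.
show partitions test_table;

-- Remove it again; "if exists" avoids an error when the partition is absent.
alter table test_table drop if exists partition (pt='20200616');
```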

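Hive DML Statements

Hive Functions

Common Functions

if function: if(condition, value_if_true, value_if_false)
case when expression: case when ... end
Date functions: to_date, ...
String functions: concat, concat_ws
Aggregate functions: sum, count, ...
Null checks: is null, is not null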
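
The functions listed above can be combined in ordinary queries. The sketch below uses the vv_stat_fact table from the CREATE TABLE section. The threshold of 60 and the labels are hypothetical, and stat_date is assumed to hold a date or timestamp string such as '2020-06-16 02:30:00'.

```sql
-- Row-level functions: if, case when, to_date, concat_ws, null checks.
SELECT
  userid,
  to_date(stat_date)                   AS stat_day,      -- date part of the string
  concat_ws('_', userid, dt)           AS user_day_key,  -- join with a separator
  if(sucvv > 0, 'played', 'none')      AS play_flag,
  CASE WHEN ptime IS NULL THEN 'unknown'
       WHEN ptime >= 60   THEN 'long'                    -- hypothetical threshold
       ELSE 'short' END                AS ptime_bucket
FROM vv_stat_fact
WHERE userid IS NOT NULL;

-- Aggregate functions: one output row per partition value.
SELECT dt, count(*) AS row_cnt, sum(tryvv) AS total_tryvv
FROM vv_stat_fact
GROUP BY dt;
```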

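Advanced Functions
Window functions: compute values over a window within each group of rows
row_number(), etc.
select * from (select name,date_time,row_number() over(partition by name order by cost desc) as rn from window_t)a where rn=1;
Typically used to get the top N rows within each group.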
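
The query above keeps only the single highest-cost row per name. A sketch of the general top-N form, here with N = 3 and the same window_t table (assumed to have the columns name, date_time and cost):

```sql
-- Number the rows of each name from highest to lowest cost,
-- then keep the first three of every group.
SELECT name, date_time, cost, rn
FROM (
  SELECT name, date_time, cost,
         row_number() OVER (PARTITION BY name ORDER BY cost DESC) AS rn
  FROM window_t
) a
WHERE rn <= 3;
```

row_number() assigns distinct numbers even when costs tie; if tied rows should share a position, rank() or dense_rank() can be used in its place.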

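Hive User-Defined Functions (UDF)

A UDF can be used directly in a select statement to format the query results before they are output.
Keep the following points in mind when writing a UDF:
- A custom UDF must extend org.apache.hadoop.hive.ql.exec.UDF.
- It must provide an evaluate method.
Steps
- Package the program and copy it to the target machine;
- Enter the Hive client and add the jar: add jar /usr/local/testdata/hive_UP.jar;
- Create a temporary function: hive>CREATE TEMPORARY FUNCTION f_up as 'hive_demo.hive_udf';
Run an HQL query:
select f_up(line) from wc_test;
Drop the temporary function: hive> DROP TEMPORARY FUNCTION f_up;
Note: a UDF can only handle one-in, one-out operations (one output value per input row). For many-in, one-out operations (aggregation), implement a UDAF instead.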

Connecting to Hive over JDBC

First start the metastore: hive --service metastore
Start hiveserver2: hive --service hiveserver2
Add the Maven dependency

 <dependency>
             <groupId>org.apache.hive</groupId>
             <artifactId>hive-jdbc</artifactId>
             <version>1.2.1</version>
 </dependency>

Java code

package hive;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class hive_jdbc {
	public static void main(String[] args) throws Exception {
		// Load the HiveServer2 JDBC driver
		Class.forName("org.apache.hive.jdbc.HiveDriver");
		// try-with-resources closes the result set, statement and connection
		// in reverse order, even if the query throws.
		// executeQuery is for statements that return rows.
		try (Connection con = DriverManager.getConnection("jdbc:hive2://master:10000/zhangsan");
				Statement cs = con.createStatement();
				ResultSet rs = cs.executeQuery("select * from zhangsan.air_id")) {
			// execute is typically used for DDL operations
			//cs.execute("create table testxx");
			while (rs.next()) {
				// JDBC column indexes start at 1
				String word = rs.getString(1);
				String count = rs.getString(2);
				System.out.println(word + ", " + count);
			}
		}
	}
}
