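1. Creating Hive databases and tables, and importing data

1.1. Creating a database

Hive ships with a default database:

Database name: default

Database directory: hdfs://hdp20-01:9000/user/hive/warehouse

Create a new database:

create database db_order;

Once the database is created, Hive generates a database directory in HDFS:

hdfs://hdp20-01:9000/user/hive/warehouse/db_order.db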
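To confirm where Hive placed the new database, statements like the following can be run in the Hive shell. This is a minimal sketch that assumes the db_order database from above; the if not exists guard is an addition that only makes the create statement safe to rerun.

```sql
-- Create the database only if it is not there yet (safe to rerun)
create database if not exists db_order;

-- List all databases; default and db_order should both appear
show databases;

-- Show the database's metadata, including its HDFS location
describe database db_order;
```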

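1.2. Creating tables

1.2.1. Basic create-table statement

use db_order;

create table t_order(id string,create_time string,amount float,uid string);

Once the table is created, Hive generates a table directory inside the owning database directory:

/user/hive/warehouse/db_order.db/t_order

However, a table created this way makes Hive assume that the fields in the data files are separated by ^A (the control character \001, Hive's default field delimiter).

If the data files are comma-separated, the statement must declare the delimiter explicitly:

create table t_order(id string,create_time string,amount float,uid string)
row format delimited
fields terminated by ',';

This declares that the field delimiter in the table's data files is ",".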
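As an illustration of why the delimiter matters, here is a sketch that loads a comma-separated file into t_order. The file path /root/order.data.1 and the sample row are hypothetical and are not taken from the original notes.

```sql
-- Hypothetical line in /root/order.data.1 (comma-separated):
--   o001,2017-08-04 10:21:33,98.5,u1001

-- Load the file into the comma-delimited t_order table
load data local inpath '/root/order.data.1' into table t_order;

-- With the delimiter declared, each field lands in its own column.
-- Without it, Hive would look for ^A, find none, and typically put
-- the whole line into id and leave the remaining columns NULL.
select id, create_time, amount, uid from t_order limit 5;

-- Inspect the table's storage details, including the field delimiter
describe formatted t_order;
```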

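1.2.2. Dropping a table

drop table t_order;

Dropping this table has two effects:

Hive removes the table's information from the metastore database;

Hive also deletes the table's directory from HDFS (this applies to managed tables; see 1.2.3 for external tables).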

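1.2.3. Managed (internal) tables and external tables

Managed table (MANAGED_TABLE): the table directory follows Hive's conventions and lives under Hive's warehouse directory /user/hive/warehouse

External table (EXTERNAL_TABLE): the table directory is specified by the user who creates the table

create external table t_access(ip string,url string,access_time string)
row format delimited
fields terminated by ','
location '/access/log';

Differences between external and managed tables:

1. A managed table's directory is under Hive's warehouse directory, whereas an external table's directory is chosen by the user

2. Dropping a managed table: Hive removes the metadata and deletes the table's data directory

3. Dropping an external table: Hive removes only the metadata; the data directory is left in place

In a Hive data warehouse, the lowest-level tables take their data from external systems. To avoid interfering with how those systems work, create external tables in Hive that map onto the data directories those systems produce.

The tables produced by the subsequent ETL steps are then best created as managed tables.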
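The drop behaviour of an external table can be checked with a short session like the one below. It is a sketch that assumes the external t_access table defined above and the Hive CLI, whose dfs command passes its arguments through to HDFS.

```sql
-- Table Type in the output reads EXTERNAL_TABLE or MANAGED_TABLE,
-- and Location shows the table's data directory
describe formatted t_access;

-- Dropping the external table removes only its metadata
drop table t_access;

-- The files under the user-specified location should still be in HDFS
dfs -ls /access/log;
```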

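1.2.4. Partitioned tables

A partitioned table, in essence, stores its data files in partition subdirectories under the table directory, so that at query time the MR job can process only the data in the relevant partition subdirectories and read less data.

For example, a website generates browsing records every day. These records belong in one table, but sometimes we only need to analyse the records for a single day.

In that case the table can be created as a partitioned table, with each day's data loaded into its own partition.

Each daily partition directory needs a name, which is derived from the partition column and its value.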

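1.2.4.1. Example with one partition column

1. Create a partitioned table

create table t_access(ip string,url string,access_time string)
partitioned by(dt string)
row format delimited
fields terminated by ',';

Note: a partition column must not duplicate a column already present in the table definition.

2. Load data into partitions

load data local inpath '/root/access.log.2017-08-04.log' into table t_access partition(dt='20170804');

load data local inpath '/root/access.log.2017-08-05.log' into table t_access partition(dt='20170805');

3. Query partition data

a. Count the total PV (page views) for August 4:

select count(*) from t_access where dt='20170804';

In essence: the partition column is used like an ordinary table column, so a where clause can select the partition.

b. Count the total PV across all data in the table:

select count(*) from t_access;

In essence: simply omit the partition condition.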
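To see how partitions map onto directories, something like the following can be tried in the Hive CLI. It is a sketch that assumes the partitioned t_access table is a managed table in db_order under the default warehouse path; adjust the path if it lives elsewhere.

```sql
-- List the partitions Hive knows about for this table;
-- each one is reported as column=value, e.g. dt=20170804
show partitions t_access;

-- Each partition is a subdirectory named column=value under the table directory
dfs -ls /user/hive/warehouse/db_order.db/t_access;

-- Per-day PV over a range of partitions; only the matching
-- partition directories need to be read
select dt, count(*) as pv
from t_access
where dt >= '20170804' and dt <= '20170805'
group by dt;
```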

1.2.4.2. Example with multiple partition columns

Create the table:

create table t_partition(id int,name string,age int)
partitioned by(department string,sex string,howold int)
row format delimited fields terminated by ',';

Load data:

load data local inpath '/root/p1.dat' into table t_partition partition(department='xiangsheng',sex='male',howold=20);
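With several partition columns, Hive nests one directory level per column, in the order the columns are declared. The queries below are a sketch against the t_partition table above and show that filtering on only some of the partition columns is also possible.

```sql
-- Nested directories, one level per partition column:
--   .../t_partition/department=xiangsheng/sex=male/howold=20/
show partitions t_partition;

-- Filtering on a subset of the partition columns still limits
-- which partition directories are scanned
select id, name, age
from t_partition
where department='xiangsheng' and sex='male';

-- Partition columns can be selected like ordinary columns
select id, name, department, sex, howold from t_partition;
```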

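1.2.5. CTAS table-creation syntax

A table can be created from an existing table:

1. Copy only the structure with like:

create table t_user_2 like t_user;

The new table t_user_2 has the same structure as the source table t_user, but contains no data.

2. Create the table and insert data in one step (CTAS, create table ... as select):

create table t_access_user
as
select ip,url from t_access;

t_access_user is created with columns derived from the select list, and the query result is inserted into the new table.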
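CTAS can also carry a storage clause and a filter. The following is a sketch only: the table name t_access_user_seq is hypothetical, and the where clause assumes the partitioned version of t_access with its dt column.

```sql
-- CTAS with an explicit storage format and a filter;
-- t_access_user_seq is a hypothetical table name
create table t_access_user_seq
stored as sequencefile
as
select ip, url
from t_access
where dt='20170804';

-- Check the columns that were derived from the select list
describe t_access_user_seq;
```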

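1.3. Importing and exporting data

1.3.1. Importing data files into a Hive table

Method 1: use HDFS commands to put the file into the table directory manually.

Method 2: in Hive's interactive shell, use a Hive command to load local data into the table directory

hive> load data local inpath '/root/order.data.2' into table t_order;

Method 3: use a Hive command to load a data file already in HDFS into the table directory

hive> load data inpath '/access.log.2017-08-06.log' into table t_access partition(dt='20170806');

Note the difference between loading a local file and loading an HDFS file:

Local file loaded into a table: the file is copied

HDFS file loaded into a table: the file is moved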
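Method 1 can be carried out without leaving the Hive CLI, since its dfs command passes arguments through to HDFS. The session below is a sketch: the files /root/order.data.1 and /root/access.log.2017-08-07.log are hypothetical, and the paths assume managed tables in db_order under the default warehouse directory.

```sql
-- Method 1 from inside the Hive CLI: copy a local file into the table directory
dfs -put /root/order.data.1 /user/hive/warehouse/db_order.db/t_order/;

-- For a partitioned table, a manually created partition directory
-- is not visible to queries until it is registered in the metastore
dfs -mkdir -p /user/hive/warehouse/db_order.db/t_access/dt=20170807;
dfs -put /root/access.log.2017-08-07.log /user/hive/warehouse/db_order.db/t_access/dt=20170807/;
alter table t_access add if not exists partition (dt='20170807');

-- overwrite replaces the table's existing files instead of adding to them
load data local inpath '/root/order.data.2' overwrite into table t_order;
```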

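1.3.2. Exporting data from a Hive table to files at a given path

1. Export data from a Hive table to files in HDFS

insert overwrite directory '/root/access-data'
row format delimited fields terminated by ','
select * from t_access;

2. Export data from a Hive table to files on the local disk

insert overwrite local directory '/root/access-data'
row format delimited fields terminated by ','
select * from t_access limit 100000;

Note: overwrite replaces whatever is already in the target directory.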
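The exported files can be inspected from the Hive CLI. This is a sketch: the file name 000000_0 matches the one used in the demonstration in 1.3.3, but the actual names and number of output files depend on how the job ran.

```sql
-- List and preview the files exported to the local disk
-- (a leading ! runs a shell command from the Hive CLI)
!ls /root/access-data;
!head -n 5 /root/access-data/000000_0;

-- For the HDFS export, list the target directory with dfs instead
dfs -ls /root/access-data;
```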

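1.3.3. Hive file formats

Hive supports many file formats, including: SEQUENCEFILE | TEXTFILE | PARQUET | RCFILE

The format is chosen with stored as when the table is created (the three statements below are alternatives, since they use the same table name):

create table t_pq(movie string,rate int) stored as textfile;

create table t_pq(movie string,rate int) stored as sequencefile;

create table t_pq(movie string,rate int) stored as parquetfile;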

Demonstration:

1. First create a table stored as text files

create table t_access_text(ip string,url string,access_time string)
row format delimited fields terminated by ','
stored as textfile;

Load text data into the table:

load data local inpath '/root/access-data/000000_0' into table t_access_text;

2. Create a table stored as sequence files:

create table t_access_seq(ip string,url string,access_time string)
stored as sequencefile;

Select from the text table and insert into the sequencefile table; the data files Hive generates are then in sequencefile format:

insert into t_access_seq
select * from t_access_text;

3. Create a table stored as parquet files:

create table t_access_parq(ip string,url string,access_time string)
stored as parquetfile;
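By analogy with step 2, the parquet table can be filled from the text table. The statements below are a sketch that assumes t_access_text has already been loaded as shown above.

```sql
-- Fill the parquet table from the text table; Hive writes parquet files
insert into table t_access_parq
select * from t_access_text;

-- The row counts of the source and converted tables should match
select count(*) from t_access_text;
select count(*) from t_access_parq;

-- InputFormat, OutputFormat and SerDe Library in the output
-- show which storage format the table uses
describe formatted t_access_parq;
```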

