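I have used a number of these Hive functions at work, so here is a summary:
1、 Splitting and merging rows
1. Splitting: explode(ARRAY)
Returns: multiple rows, one per array element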
SELECT explode(myCol) AS myNewCol FROM myTable;
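Notes: 1. When a UDTF is used directly in the select list, the select cannot contain any other expressions;
2. UDTFs cannot be nested;
3. A UDTF used this way does not support GROUP BY / CLUSTER BY / DISTRIBUTE BY / SORT BY;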
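The usual way around these restrictions is LATERAL VIEW, which joins each exploded row back to its source row so that other columns and GROUP BY can be used. A minimal sketch, reusing myTable and myCol from the query above; the id column and the aliases t and exploded are hypothetical names chosen for illustration:

```sql
-- Select another column alongside the exploded values.
-- "id" is an assumed column; myTable and myCol come from the example above.
SELECT t.id, exploded.myNewCol
FROM myTable t
LATERAL VIEW explode(t.myCol) exploded AS myNewCol;

-- Aggregate over the exploded values, which a bare UDTF call does not allow.
SELECT exploded.myNewCol, count(*) AS cnt
FROM myTable t
LATERAL VIEW explode(t.myCol) exploded AS myNewCol
GROUP BY exploded.myNewCol;
```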
2. Merging with deduplication: collect_set(col)
Returns: an array with duplicate values removed
Raw data in xjt2:
1 one
2 two
2 two
2 two
3 three
4 four
4 four
1 oneone
1 one two
2 twotwo
2 twoone
select id,collect_set(name) from xjt2 group by id;
1 ["one two","oneone","one"]
2 ["twoone","twotwo","two"]
3 ["three"]
4 ["four"]
select collect_set(name) from xjt2;
["three","one","four","two"]
Note: when collect_set(col) appears in a select statement together with other columns, those other columns must be listed in GROUP BY;
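A common follow-up is to flatten the collected array into a single delimited string. The sketch below assumes a Hive version in which sort_array exists and concat_ws accepts an array of strings (both are available in reasonably recent releases). Sorting first makes the output order predictable, since collect_set itself does not guarantee any order:

```sql
-- One row per id, with its distinct names joined by commas.
-- xjt2, id and name are the table and columns from the example above.
SELECT id,
       concat_ws(',', sort_array(collect_set(name))) AS names
FROM xjt2
GROUP BY id;
```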
2、 Date and time functions
1. Getting the current Unix timestamp: unix_timestamp()
Return type: BIGINT
select unix_timestamp() from xjt1;
1383012276
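2. Converting a date to a timestamp: unix_timestamp(string date)
Return type: BIGINT; returns 0 if the conversion fails (the date must be in yyyy-MM-dd HH:mm:ss format)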
select unix_timestamp('2013-01-13 00:00:00') from xjt1;
1358006400
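3. Converting a date in a given format (pattern) to a timestamp: unix_timestamp(string date, string pattern)
Return type: BIGINT; returns 0 if the conversion fails (the pattern must match the layout of the date string)
select unix_timestamp('20130113','yyyyMMdd') from xjt1;
1358006400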
4. Converting a Unix timestamp to a date: from_unixtime(BIGINT, 'format')
select from_unixtime(unix_timestamp(),'yyyyMMdd') from xjt1;
20131029
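Chaining the two functions is a handy way to convert a date string from one layout to another. A small sketch using the same xjt1 table as the queries above; the input literal is made up for illustration:

```sql
-- Parse a compact yyyyMMdd string, then print it back as yyyy-MM-dd.
-- Both calls use the session time zone, so the calendar date is preserved.
SELECT from_unixtime(unix_timestamp('20131029', 'yyyyMMdd'), 'yyyy-MM-dd')
FROM xjt1;
```

Under these assumptions each row of the result should read 2013-10-29.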
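5. Extracting the date to_date(), the year year(), the month month(), and the day of the month day()
Return type: STRING for to_date(); INT for year(), month(), and day()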
select to_date('1990-10-10 00:00:00') from xjt1;
1990-10-10
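The other three functions in this group take the same kind of date or timestamp string. A short sketch, again selecting from xjt1 as the examples here do; the timestamp literal is made up for illustration:

```sql
-- Pull the individual date parts out of one timestamp string.
SELECT year('2013-10-29 08:30:00')  AS y,   -- 2013
       month('2013-10-29 08:30:00') AS m,   -- 10
       day('2013-10-29 08:30:00')   AS d    -- 29
FROM xjt1;
```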
6. Adding days to a date: date_add(string startdate, int days)
Return type: STRING
select date_add('2013-10-29',10) from xjt1;
2013-11-08
7. Subtracting days from a date: date_sub(string startdate, int days)
Return type: STRING
select date_sub('2013-10-29',10) from xjt1;
2013-10-19
8. Comparing dates: datediff(string enddate, string startdate)
Return type: INT (the end date minus the start date, in days; the end date goes first)
select datediff('2013-10-29','2013-12-10') from xjt1;
-42
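These date functions are often combined to filter on a rolling window. The sketch below assumes a hypothetical table logs with a string column dt in yyyy-MM-dd format; neither name comes from the examples above:

```sql
-- Keep rows whose dt falls within the last 7 days, today included.
-- from_unixtime(unix_timestamp()) gives the current time as a string,
-- and to_date() trims it to the date part.
SELECT *
FROM logs
WHERE datediff(to_date(from_unixtime(unix_timestamp())), dt) BETWEEN 0 AND 6;
```

Because the end date goes first, the difference is 0 for today and grows as dt moves further into the past.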
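3、 Conditional function CASE
Return type: T (the type of the result expressions c, e, and f)
Syntax: CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END
Description: if a equals b, returns c; if a equals d, returns e; otherwise returns f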
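As an illustration of the syntax, the sketch below maps the name column of the xjt2 table used earlier to a number. The aliases and the chosen result values are made up for the example. The second query uses the searched form, CASE WHEN condition THEN ..., which Hive also supports:

```sql
-- Simple form: compare one expression against a list of values.
SELECT name,
       CASE name
         WHEN 'one' THEN 1
         WHEN 'two' THEN 2
         ELSE 0
       END AS name_as_number
FROM xjt2;

-- Searched form: each WHEN carries its own boolean condition.
SELECT name,
       CASE
         WHEN length(name) > 3 THEN 'long'
         ELSE 'short'
       END AS name_length
FROM xjt2;
```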
4、 String splitting function split(string str, string pat)
Return type: ARRAY of strings; pat is treated as a regular expression
select split('hello world hello hive',' ') from xjt1;
["hello","world","hello","hive"]
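Since the pattern is a regular expression, delimiters that are regex metacharacters need escaping, and the resulting array can be fed straight into explode. A brief sketch with made-up input strings, selecting from xjt1 as above:

```sql
-- "|" is a regex metacharacter, so it is escaped; the result is ["a","b","c"].
SELECT split('a|b|c', '\\|') FROM xjt1;

-- Turn the split words into one row per word.
SELECT explode(split('hello world hello hive', ' ')) AS word FROM xjt1;
```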