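Operator | Description |
CompareFilter.CompareOp.LESS | Matches values less than the given value |
CompareFilter.CompareOp.LESS_OR_EQUAL | Matches values less than or equal to the given value |
CompareFilter.CompareOp.EQUAL | Matches values equal to the given value |
CompareFilter.CompareOp.NOT_EQUAL | Matches values not equal to the given value |
CompareFilter.CompareOp.GREATER_OR_EQUAL | Matches values greater than or equal to the given value |
CompareFilter.CompareOp.GREATER | Matches values greater than the given value |
CompareFilter.CompareOp.NO_OP | Excludes all values |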
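Comparator | Description |
BinaryComparator | Compares the current value with the threshold using Bytes.compareTo() |
BinaryPrefixComparator | Prefix match: compares only the leading bytes of the current value with the threshold |
NullComparator | Performs no comparison; only checks whether the current value is null |
BitComparator | Performs a bit-level comparison using the AND, OR and XOR operations provided by the BitwiseOp class |
RegexStringComparator | Matches the table data against a regular expression that is supplied when the comparator is instantiated |
SubstringComparator | Treats the threshold and the table data as String instances and matches them with contains() |
LongComparator | Compares the current value with the threshold as a long |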
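BinaryComparator orders values byte by byte, which is easy to misread when row keys contain numbers. The following is a stand-alone sketch of that ordering; it does not call HBase. `compareBytes` is a hypothetical helper that imitates an unsigned lexicographic comparison, and the row keys are made up for illustration.

```scala
object ByteOrderSketch {
  // Lexicographic comparison of two byte arrays, treating every byte as unsigned.
  // Hypothetical helper that imitates byte-wise ordering; it is not the HBase method.
  def compareBytes(a: Array[Byte], b: Array[Byte]): Int = {
    val n = math.min(a.length, b.length)
    var i = 0
    while (i < n) {
      val x = a(i) & 0xff
      val y = b(i) & 0xff
      if (x != y) return x - y
      i += 1
    }
    // All shared bytes are equal: the shorter array sorts first.
    a.length - b.length
  }

  def utf8(s: String): Array[Byte] = s.getBytes("UTF-8")

  // What a LESS_OR_EQUAL comparison against a threshold amounts to.
  def lessOrEqual(value: String, threshold: String): Boolean =
    compareBytes(utf8(value), utf8(threshold)) <= 0

  def main(args: Array[String]): Unit = {
    val rows = Seq("row-1", "row-100", "row-22", "row-3")
    // "row-100" passes because '1' sorts before '2'; "row-3" does not.
    println(rows.filter(lessOrEqual(_, "row-22"))) // List(row-1, row-100, row-22)
  }
}
```

The comparison is on bytes, not on the numeric meaning of the key, so "row-100" counts as smaller than "row-22".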
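Filter | Description |
RowFilter | Filters data by row key, keeping only the rows that match the condition |
FamilyFilter | Filters data by column family, keeping only the columns in matching families |
QualifierFilter | Filters data by column name (qualifier) |
ValueFilter | Filters data by value |
DependentColumnFilter | Picks a reference column and uses its timestamp as the filter condition. Every column in a row is compared with the timestamp of the reference column, so the filter returns the columns that were modified together with it |
DependentColumnFilter has three constructors:
DependentColumnFilter(final byte [] family, final byte [] qualifier)
DependentColumnFilter(final byte [] family, final byte [] qualifier, final boolean dropDependentColumn)
DependentColumnFilter(final byte [] family, final byte[] qualifier, final boolean dropDependentColumn, final CompareOp valueCompareOp, final ByteArrayComparable valueComparator)
dropDependentColumn controls whether the reference column itself is returned or dropped from the result. The valueCompareOp and valueComparator parameters make it possible to filter on the value as well.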
import scala.collection.JavaConverters._
import org.apache.hadoop.hbase.client.{HTable, Scan}
import org.apache.hadoop.hbase.filter._
import org.apache.hadoop.hbase.util.Bytes

// HandleHbase.conf and tableName are defined elsewhere in the project
val table = new HTable(HandleHbase.conf, tableName)
var scan = new Scan()
scan.addColumn(Bytes.toBytes("cf0"), Bytes.toBytes("qual6"))

println("--------------------row filter BinaryComparator -------------------------")
// Keep the rows whose key is less than or equal to "row-22" in byte order
val rowFilter1 = new RowFilter(CompareFilter.CompareOp.LESS_OR_EQUAL,
  new BinaryComparator(Bytes.toBytes("row-22")))
scan.setFilter(rowFilter1)
val rowScanner1 = table.getScanner(scan)
for (res <- rowScanner1.iterator().asScala) {
  println(res)
}
rowScanner1.close()

println("--------------------row filter SubstringComparator -------------------------")
// Keep the rows whose key contains "-3"
val rowFilter2 = new RowFilter(CompareFilter.CompareOp.EQUAL,
  new SubstringComparator("-3"))
scan.setFilter(rowFilter2)
val rowScanner2 = table.getScanner(scan)
for (res <- rowScanner2.iterator().asScala) {
  println(res)
}
rowScanner2.close()

// Start over with a fresh scan limited to the row range [row-2, row-3)
scan = new Scan()
scan.setStartRow(Bytes.toBytes("row-2"))
scan.setStopRow(Bytes.toBytes("row-3"))

println("--------------------family filter BinaryComparator -------------------------")
val familyFilter1 = new FamilyFilter(CompareFilter.CompareOp.EQUAL,
  new BinaryComparator(Bytes.toBytes("cf1")))
scan.setFilter(familyFilter1)
val familyScanner1 = table.getScanner(scan)
for (res <- familyScanner1.iterator().asScala) {
  println(res)
}
familyScanner1.close()

println("--------------------qualifier filter BinaryComparator -------------------------")
/** Output every column named qual1, regardless of its column family */
val qualifierFilter1 = new QualifierFilter(CompareFilter.CompareOp.EQUAL,
  new BinaryComparator(Bytes.toBytes("qual1")))
scan.setFilter(qualifierFilter1)
val qualifierScanner1 = table.getScanner(scan)
for (res <- qualifierScanner1.iterator().asScala) {
  println(res)
}
qualifierScanner1.close()

println("--------------------value filter BinaryComparator -------------------------")
val valueFilter1 = new ValueFilter(CompareFilter.CompareOp.EQUAL,
  new BinaryComparator(Bytes.toBytes("val2")))
scan.setFilter(valueFilter1)
val valueScanner1 = table.getScanner(scan)
for (res <- valueScanner1.iterator().asScala) {
  println(res)
}
valueScanner1.close()

println("--------------------dependent column filter BinaryComparator -------------------------")
/** Use cf0:qual1 as the reference column and output the columns modified together with it */
val dependentFilter1 = new DependentColumnFilter(Bytes.toBytes("cf0"), Bytes.toBytes("qual1"))
scan.setFilter(dependentFilter1)
val dependentScanner1 = table.getScanner(scan)
for (res <- dependentScanner1.iterator().asScala) {
  println(res)
}
dependentScanner1.close()
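The reference-column behaviour of DependentColumnFilter can be hard to picture from a scan alone. The sketch below imitates it on in-memory data, assuming the rule is "keep the cells whose timestamp equals a timestamp of the reference column". `SimpleCell` and `dependentColumns` are hypothetical names for this illustration and are not HBase classes.

```scala
object DependentColumnSketch {
  // Hypothetical, simplified cell; not the HBase Cell class.
  final case class SimpleCell(family: String, qualifier: String, timestamp: Long, value: String)

  // Keep the cells of one row whose timestamp matches a timestamp of the reference column.
  // With dropDependentColumn = true the reference column itself is removed from the result.
  def dependentColumns(row: Seq[SimpleCell], family: String, qualifier: String,
                       dropDependentColumn: Boolean = false): Seq[SimpleCell] = {
    def isReference(c: SimpleCell): Boolean = c.family == family && c.qualifier == qualifier
    val referenceTimestamps = row.filter(isReference).map(_.timestamp).toSet
    row.filter { c =>
      referenceTimestamps.contains(c.timestamp) && !(dropDependentColumn && isReference(c))
    }
  }

  def main(args: Array[String]): Unit = {
    val row = Seq(
      SimpleCell("cf0", "qual1", 100L, "val1"),
      SimpleCell("cf0", "qual2", 100L, "val2"), // written together with cf0:qual1
      SimpleCell("cf1", "qual3", 250L, "val3")  // written at a different time
    )
    println(dependentColumns(row, "cf0", "qual1").map(_.qualifier)) // List(qual1, qual2)
    println(dependentColumns(row, "cf0", "qual1", dropDependentColumn = true).map(_.qualifier)) // List(qual2)
  }
}
```

In this model a row without the reference column yields nothing at all, because no timestamp can match.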
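Dedicated Filters
Filter | Description |
SingleColumnValueFilter | Uses a column family, column and value as the match condition. Only rows matching that family, column and value are kept; the rest are dropped. setFilterIfMissing decides whether rows that lack the column are kept |
SingleColumnValueExcludeFilter | Works like SingleColumnValueFilter, except that the reference column is not kept in the result |
PrefixFilter | Matches a row key prefix; rows whose key starts with the prefix are kept. RowFilter can do the same, but this filter is more convenient |
PageFilter | Pages the results by row; each scan returns a fixed number of matching rows |
KeyOnlyFilter | Returns only the key of each KeyValue, without the value |
FirstKeyOnlyFilter | Returns only the first column of each row |
InclusiveStopFilter | A scan with setStartRow and setStopRow covers a half-open range (start included, stop excluded); this filter includes the stop row as well |
TimestampsFilter | Takes a set of timestamps (versions) and returns the values whose version matches one of them |
ColumnCountGetFilter | Limits how many columns are returned per row |
ColumnPaginationFilter | Similar to PageFilter, but pages over the columns of a row |
ColumnPrefixFilter | Filters by column name prefix |
RandomRowFilter | Keeps a random selection of rows |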
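The effect of setFilterIfMissing on SingleColumnValueFilter, and the difference from SingleColumnValueExcludeFilter, can be sketched on in-memory rows. This is only an illustration of the semantics described in the table, not HBase code: `Row`, `keepRow` and `filterRows` are hypothetical, and a row is modelled as a map from "family:qualifier" to value.

```scala
object SingleColumnValueSketch {
  // Hypothetical row model: "family:qualifier" -> value. Not an HBase type.
  type Row = Map[String, String]

  // A row passes when the reference column equals the expected value.
  // Rows that lack the column pass only when filterIfMissing is false.
  def keepRow(row: Row, column: String, expected: String, filterIfMissing: Boolean): Boolean =
    row.get(column) match {
      case Some(value) => value == expected
      case None        => !filterIfMissing
    }

  // excludeColumn = true imitates the Exclude variant: the reference column is dropped.
  def filterRows(rows: Seq[Row], column: String, expected: String,
                 filterIfMissing: Boolean, excludeColumn: Boolean = false): Seq[Row] =
    rows.filter(keepRow(_, column, expected, filterIfMissing))
      .map(row => if (excludeColumn) row - column else row)

  def main(args: Array[String]): Unit = {
    val rows: Seq[Row] = Seq(
      Map("cf0:qual3" -> "val2", "cf0:qual4" -> "a"),
      Map("cf0:qual3" -> "val9", "cf0:qual4" -> "b"),
      Map("cf0:qual4" -> "c") // lacks the reference column
    )
    println(filterRows(rows, "cf0:qual3", "val2", filterIfMissing = false).size) // 2
    println(filterRows(rows, "cf0:qual3", "val2", filterIfMissing = true).size)  // 1
  }
}
```

Leaving filterIfMissing at false lets rows without the reference column through, which is often not what is wanted.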
import scala.collection.JavaConverters._
import org.apache.hadoop.hbase.client.{HTable, Scan}
import org.apache.hadoop.hbase.filter._
import org.apache.hadoop.hbase.util.Bytes

// hbaseHandle.conf and tableName are defined elsewhere in the project
val table = new HTable(hbaseHandle.conf, tableName)
val scan = new Scan()
scan.setStartRow(Bytes.toBytes("row-1"))
scan.setStopRow(Bytes.toBytes("row-2"))

println("--------------------1.single column value filter -------------------------")
/** Match on column family, column and value; only the matching rows remain */
val singleColumnValueFilter = new SingleColumnValueFilter(Bytes.toBytes("cf0"),
  Bytes.toBytes("qual3"), CompareFilter.CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes("val2")))
// Also drop the rows that do not contain cf0:qual3
singleColumnValueFilter.setFilterIfMissing(true)
scan.setFilter(singleColumnValueFilter)
val singleColumnValueScanner = table.getScanner(scan)
for (res <- singleColumnValueScanner.iterator().asScala) {
  println(res)
}
singleColumnValueScanner.close()

println("--------------------2.single column value exclude filter -------------------------")
/** Same match condition, but the column used for matching is not kept in the result */
val singleColumnValueExcludeFilter = new SingleColumnValueExcludeFilter(Bytes.toBytes("cf0"),
  Bytes.toBytes("qual3"), CompareFilter.CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes("val2")))
singleColumnValueExcludeFilter.setFilterIfMissing(true)
scan.setFilter(singleColumnValueExcludeFilter)
val singleColumnValueExcludeScanner = table.getScanner(scan)
for (res <- singleColumnValueExcludeScanner.iterator().asScala) {
  println(res)
}
singleColumnValueExcludeScanner.close()

println("--------------------3.prefix filter -------------------------")
/** Match a row key prefix */
val prefixFilter = new PrefixFilter(Bytes.toBytes("row-11"))
scan.setFilter(prefixFilter)
val prefixScanner = table.getScanner(scan)
for (res <- prefixScanner.iterator().asScala) {
  println(res)
}
prefixScanner.close()
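Decorating Filters (wrapper classes that add extra behavior to another Filter)
Filter | Description |
SkipFilter | Many filters drop individual values and still return what is left of the row. When the filter wrapped by SkipFilter drops any value in a row, the whole row is skipped |
WhileMatchFilter | Similar to SkipFilter, but it stops the scan as soon as the first piece of data is filtered out |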
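The difference between a plain filter and the two decorators can be shown on in-memory rows. This is a simplified sketch of the behaviour described above and does not use HBase: `Row`, `plainValueFilter`, `skip` and `whileMatch` are hypothetical names, and `whileMatch` is modelled at row level only, as if it wrapped a row key filter.

```scala
object DecoratorSketch {
  // Hypothetical row model: (row key, cell values). Not an HBase type.
  type Row = (String, Seq[String])

  // A plain value filter drops single cells and can leave partial rows behind.
  def plainValueFilter(rows: Seq[Row], keep: String => Boolean): Seq[Row] =
    rows.map { case (key, cells) => (key, cells.filter(keep)) }.filter(_._2.nonEmpty)

  // SkipFilter-like behaviour: one rejected cell removes the whole row.
  def skip(rows: Seq[Row], keep: String => Boolean): Seq[Row] =
    rows.filter { case (_, cells) => cells.forall(keep) }

  // WhileMatchFilter-like behaviour: the scan ends at the first rejected row.
  def whileMatch(rows: Seq[Row], keepRow: String => Boolean): Seq[Row] =
    rows.takeWhile { case (key, _) => keepRow(key) }

  def main(args: Array[String]): Unit = {
    val rows: Seq[Row] = Seq(
      ("row-1", Seq("1", "2")),
      ("row-2", Seq("0", "3")),
      ("row-3", Seq("4"))
    )
    val nonZero: String => Boolean = _ != "0"
    println(plainValueFilter(rows, nonZero).map(_._1)) // List(row-1, row-2, row-3)
    println(skip(rows, nonZero).map(_._1))             // List(row-1, row-3)
    println(whileMatch(rows, _ != "row-2").map(_._1))  // List(row-1)
  }
}
```

Note that with `whileMatch` row-3 is never reached, even though it would have passed the condition on its own.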
FilterList(final Filter... rowFilters)
FilterList(final Operator operator)
FilterList(final Operator operator, final List<Filter> rowFilters)
FilterList(final Operator operator, final Filter... rowFilters)
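Operator | Description |
MUST_PASS_ALL | A value is included in the result only if every filter accepts it; equivalent to AND |
MUST_PASS_ONE | A value is included in the result as soon as one filter accepts it; equivalent to OR |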
// `table` is the HTable opened in the earlier examples
val scan = new Scan()
scan.setStartRow(Bytes.toBytes("row-2"))
scan.setStopRow(Bytes.toBytes("row-3"))
val familyFilter1 = new FamilyFilter(CompareFilter.CompareOp.EQUAL,
  new BinaryComparator(Bytes.toBytes("cf1")))
val qualifierFilter1 = new QualifierFilter(CompareFilter.CompareOp.EQUAL,
  new BinaryComparator(Bytes.toBytes("qual1")))

println("------------------test MUST_PASS_ALL---------------------")
// AND: only columns in family cf1 that are also named qual1
val filterList1 = new FilterList(FilterList.Operator.MUST_PASS_ALL, familyFilter1, qualifierFilter1)
scan.setFilter(filterList1)
val filterScanner1 = table.getScanner(scan)
for (res <- filterScanner1.iterator().asScala) {
  println(res)
}
filterScanner1.close()

println("------------------test MUST_PASS_ONE---------------------")
// OR: columns in family cf1, or columns named qual1 in any family
val filterList2 = new FilterList(FilterList.Operator.MUST_PASS_ONE, familyFilter1, qualifierFilter1)
scan.setFilter(filterList2)
val filterScanner2 = table.getScanner(scan)
for (res <- filterScanner2.iterator().asScala) {
  println(res)
}
filterScanner2.close()
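Custom Filters
Method | Description |
public boolean filterRowKey(byte[] data, int offset, int length) | Decides whether a row is filtered based on its row key. Returning true filters the row out; returning false keeps it. |
public ReturnCode filterKeyValue(final Cell v) | Called after the previous method has decided to keep the row. The KeyValues (Cells) of the row are then checked one by one. The method returns a value of the ReturnCode enum, whose values include INCLUDE (keep the cell), SKIP (drop the cell and move to the next one), NEXT_COL (skip the rest of the current column), NEXT_ROW (skip the rest of the current row) and SEEK_NEXT_USING_HINT (seek to the position given by the filter's hint). |
public void filterRowCells(List<Cell> kvs) | Called once the row key and the cells of a row have passed the previous two methods. It gives access to the KeyValue instances those methods selected. DependentColumnFilter uses it to drop the data that does not match the reference column. |
public boolean filterRow() | Runs after all the methods above. PageFilter uses it to check whether the number of rows returned in one paging iteration has reached the page size, and returns true once it has. The default return value is false, in which case the current row is included in the result. |
public void reset() | Resets the filter for each new row during iteration. |
public boolean filterAllRemaining() | Returning true ends the whole scan. It can be used to cut a scan short and so optimize the query. |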
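The order in which these methods are called is easier to follow in code. The sketch below is a toy scan loop written for this note, not the HBase implementation: `SimpleFilter`, `SimpleCell`, the reduced `ReturnCode` type and `PageLikeFilter` are all hypothetical, and the call order is an approximation of the description above.

```scala
object FilterLifecycleSketch {
  // Reduced, hypothetical version of the ReturnCode enum.
  sealed trait ReturnCode
  case object Include extends ReturnCode
  case object Skip extends ReturnCode
  case object NextRow extends ReturnCode

  final case class SimpleCell(qualifier: String, value: String)
  type Row = (String, Seq[SimpleCell])

  // Hypothetical stand-in for a custom filter; defaults keep everything.
  trait SimpleFilter {
    def reset(): Unit = ()
    def filterAllRemaining: Boolean = false
    def filterRowKey(rowKey: String): Boolean = false
    def filterKeyValue(cell: SimpleCell): ReturnCode = Include
    def filterRowCells(cells: Seq[SimpleCell]): Seq[SimpleCell] = cells
    def filterRow(): Boolean = false
  }

  // Toy scan loop: reset, row key check, per-cell check, row-level checks.
  def scan(rows: Seq[Row], f: SimpleFilter): Seq[Row] = {
    val out = Seq.newBuilder[Row]
    val it = rows.iterator
    while (it.hasNext && !f.filterAllRemaining) {
      val (key, cells) = it.next()
      f.reset()
      if (!f.filterRowKey(key)) {
        val kept = Seq.newBuilder[SimpleCell]
        val cellIt = cells.iterator
        var rowDone = false
        while (cellIt.hasNext && !rowDone) {
          val cell = cellIt.next()
          f.filterKeyValue(cell) match {
            case Include => kept += cell
            case Skip    => ()
            case NextRow => rowDone = true
          }
        }
        val finalCells = f.filterRowCells(kept.result())
        val dropRow = f.filterRow()
        if (!dropRow && finalCells.nonEmpty) out += ((key, finalCells))
      }
    }
    out.result()
  }

  // Page-like filter: counts accepted rows in filterRow and ends the scan when full.
  final class PageLikeFilter(pageSize: Int) extends SimpleFilter {
    private var rowsAccepted = 0
    override def filterAllRemaining: Boolean = rowsAccepted >= pageSize
    override def filterRow(): Boolean = { rowsAccepted += 1; rowsAccepted > pageSize }
  }

  def main(args: Array[String]): Unit = {
    val rows: Seq[Row] = (1 to 5).map(i => (s"row-$i", Seq(SimpleCell("qual1", s"val$i"))))
    println(scan(rows, new PageLikeFilter(2)).map(_._1)) // List(row-1, row-2)
  }
}
```

A real custom filter would extend the HBase filter base class and be deployed on the region servers; the toy loop only shows how the callbacks interact.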
import scala.collection.JavaConverters._
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{HTable, Scan}
import org.apache.hadoop.hbase.filter._
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object test_filter1 {
  def main(args: Array[String]): Unit = {
    Logger.getLogger("org.apache.spark").setLevel(Level.ERROR)
    // Spark settings
    val conf = new SparkConf().setMaster("local[2]").setAppName("HbaseTest")
    conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    val sc = new SparkContext(conf)
    val hbaseConf = HBaseConfiguration.create()
    val sqlContext = new SQLContext(sc)
    // HBase settings
    hbaseConf.set("hbase.rootdir", "hdfs://192.168.10.228/hbase")
    hbaseConf.set("hbase.zookeeper.quorum", "192.168.10.228,192.168.10.229,192.168.10.230,192.168.10.231,192.168.10.232")
    hbaseConf.set("hbase.zookeeper.property.clientPort", "2181")
    hbaseConf.set("hbase.master", "192.168.10.230")
    // Names of the HBase tables
    val tableName = "deppon_test"
    val out_tbl = "deppon_tt"
    // Table to query in HBase
    hbaseConf.set(TableInputFormat.INPUT_TABLE, tableName)
    // Open the table
    val table = new HTable(hbaseConf, tableName)
    val scan = new Scan()
    // 1. Specify the column family and the column to return
    //scan.addColumn(Bytes.toBytes("basicmod"),Bytes.toBytes("pv"))
    // 2. Set the row key range (start and stop)
    //scan.setStartRow(Bytes.toBytes(""))
    //scan.setStopRow(Bytes.toBytes(""))
    /* 3. Set the filters; each needs a column family, a column name and a condition.
       A Scan holds a single filter: a second setFilter call overwrites the first. */
    val filter1 = new SingleColumnValueFilter(Bytes.toBytes("basicmod"),
      Bytes.toBytes("pv"), CompareFilter.CompareOp.GREATER_OR_EQUAL, new BinaryComparator(Bytes.toBytes("5")))
    filter1.setFilterIfMissing(true)
    val filter2 = new SingleColumnValueFilter(Bytes.toBytes("basicmod"),
      Bytes.toBytes("createtime"), CompareFilter.CompareOp.GREATER_OR_EQUAL, new BinaryComparator(Bytes.toBytes("20150411")))
    filter2.setFilterIfMissing(true)
    /**
     * 4. A FilterList makes it possible to apply several filters at once
     */
    val filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL, filter1, filter2)
    scan.setFilter(filterList)
    // Get the scanner for the table
    val singleColumnValueScanner = table.getScanner(scan)
    /**
     * Convert the scan results and register them as a table.
     * result_rdd iterates over Result objects, which hold the row data;
     * the fields needed for the table are extracted from them.
     */
    // Converting to a List here lets the implicit conversion apply; why this is needed to create the table is not yet clear to me
    val result_rdd = singleColumnValueScanner.iterator().asScala
    val table_nm = result_rdd.map { x =>
      val key = Bytes.toString(x.getRow)
      val cookieid = Bytes.toString(x.getValue("basicmod".getBytes, "cookieid".getBytes))
      val createtime = Bytes.toString(x.getValue("basicmod".getBytes, "createtime".getBytes))
      val pv = Bytes.toString(x.getValue("basicmod".getBytes, "pv".getBytes))
      (key, cookieid, createtime, pv)
    }.toList
    // Import the implicit conversions
    import sqlContext.implicits._
    // Build the DataFrame
    val tbl_rdd = table_nm.map(x => tbl_test2(x._1, x._2, x._3, x._4)).toDF()
    // Register the table
    tbl_rdd.registerTempTable("person_test")
    sqlContext.sql("select * from person_test").show()
    //
    // for(res <- singleColumnValueScanner.iterator().asScala){
    //   println(res)
    // }
    singleColumnValueScanner.close()
    table.close()
    sc.stop()
  }
}
case class tbl_test2(id: String, cookieid: String, createtime: String, pv: String)
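One point to watch in the example above: the pv filter compares the stored bytes of a string against the bytes of "5", so the result follows character order and not numeric order. The sketch below shows the effect with plain strings and one possible workaround, fixed-width zero padding. It assumes the values are stored as ASCII digit strings; `greaterOrEqual` and `pad` are hypothetical helpers, and for ASCII digits String.compareTo gives the same order as a byte-wise comparison.

```scala
object StringNumberOrderSketch {
  // For ASCII digit strings, String.compareTo orders the same way as a byte-wise comparison.
  def greaterOrEqual(value: String, threshold: String): Boolean =
    value.compareTo(threshold) >= 0

  // Hypothetical helper: left-pad a number with zeros to a fixed width.
  def pad(n: Long, width: Int = 10): String = ("%0" + width + "d").format(n)

  def main(args: Array[String]): Unit = {
    println(greaterOrEqual("10", "5"))         // false: '1' sorts before '5'
    println(greaterOrEqual("7", "5"))          // true
    println(greaterOrEqual(pad(10), pad(5)))   // true: "0000000010" vs "0000000005"
    println(greaterOrEqual("20150412", "20150411")) // true: equal-width dates compare as expected
  }
}
```

The createtime filter is unaffected because all dates have the same width. Storing pv as an 8-byte long and comparing with LongComparator would be another option, at the cost of changing how the data is written.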