Custom Keys in Hadoop

Introduction to custom keys

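A custom key in Hadoop is built from Writable types. If you use plain Java data types for its fields, they still have to be converted to Writable types in the end.
A custom key must implement the WritableComparable interface. For the reason, see the article
"Hadoop's Writable serialization interface".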

Custom key example

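import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.io.WritableUtils;

public class MyKeyWritable implements WritableComparable<MyKeyWritable> {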
    private static final Text.Comparator TEXT_COMPARATOR = new Text.Comparator();
    private static final IntWritable.Comparator INT_COMPARATOR = new IntWritable.Comparator();

    private Text value;
    private IntWritable flag;


    public MyKeyWritable() {
        this.set(new Text(), new IntWritable());
    }

    public MyKeyWritable(Text value, IntWritable flag) {
        this.set(value, flag);
    }

    public void set(Text value, IntWritable flag) {
        this.value = value;
        this.flag = flag;
    }

    public Text getValue() {
        return value;
    }

    public IntWritable getFlag() {
        return flag;
    }

    public void write(DataOutput out) throws IOException {
        this.value.write(out);
        this.flag.write(out);
    }

    public void readFields(DataInput in) throws IOException {
        this.value.readFields(in);
        this.flag.readFields(in);
    }

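    @Override
    public int hashCode() {
        // Must agree with equals(); the default HashPartitioner uses this value.
        return this.value.hashCode() * 163 + this.flag.hashCode();
    }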

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof MyKeyWritable))
            return false;
        MyKeyWritable sw = (MyKeyWritable) obj;
        return this.value.equals(sw.value) && this.flag.equals(sw.flag);
    }

    @Override
    public String toString() {
        return this.value.toString() + "|" + this.flag.get();
    }

    public static class Comparator extends WritableComparator {
        public Comparator() {
            super(MyKeyWritable.class);
        }
        @Override
        public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
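            try {
                // Text is serialized as a VInt byte length followed by the bytes,
                // so the field size is the VInt's own size plus the length it encodes.
                int thisValueLen = WritableUtils.decodeVIntSize(b1[s1]) + readVInt(b1, s1);
                int thatValueLen = WritableUtils.decodeVIntSize(b2[s2]) + readVInt(b2, s2);

                // Negative, zero, or positive as the first argument is less than,
                // equal to, or greater than the second.
                int res1 = TEXT_COMPARATOR.compare(b1, s1, thisValueLen, b2, s2, thatValueLen);
                if (res1 != 0)
                    return res1;
                // The Text fields are equal, so compare the IntWritable that follows.
                return INT_COMPARATOR.compare(b1, s1 + thisValueLen, l1 - thisValueLen,
                        b2, s2 + thatValueLen, l2 - thatValueLen);
            } catch (IOException e) {
                throw new IllegalArgumentException(e);
            }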
        }
    }

    public int compareTo(MyKeyWritable o) {
        int res = this.value.compareTo(o.value);
        if (res != 0)
            return res;
        return this.flag.compareTo(o.flag);
    }

    static {
        WritableComparator.define(MyKeyWritable.class, new Comparator());
    }
}

Analysis

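The custom key implements the WritableComparable interface: it provides the write(DataOutput out) and readFields(DataInput in) methods of Writable, the compareTo(T o) method of Comparable, and it overrides Object's equals(Object obj). With that, the custom key is complete. Because equals is overridden, hashCode() should be overridden to match it, since the default HashPartitioner uses the hash code to assign keys to reducers.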
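To make the write/readFields contract concrete, here is a minimal sketch that runs on the JDK alone. PairKeySketch is a hypothetical stand-in, not a Hadoop class: it copies the method shapes of Writable and Comparable but stores a String and an int instead of Text and IntWritable. It serializes a key, rebuilds it from the bytes, and the copy should be equal to the original with the same hash code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Objects;

// Hypothetical stand-in for a Hadoop key, built on the JDK only.
final class PairKeySketch implements Comparable<PairKeySketch> {
    private String value = "";
    private int flag;

    PairKeySketch() { }

    PairKeySketch(String value, int flag) {
        this.value = value;
        this.flag = flag;
    }

    // Same shape as Writable.write: fields are written in a fixed order.
    void write(DataOutput out) throws IOException {
        out.writeUTF(value);
        out.writeInt(flag);
    }

    // Same shape as Writable.readFields: fields are read back in that order.
    void readFields(DataInput in) throws IOException {
        value = in.readUTF();
        flag = in.readInt();
    }

    @Override
    public int compareTo(PairKeySketch o) {
        int res = value.compareTo(o.value);
        return res != 0 ? res : Integer.compare(flag, o.flag);
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof PairKeySketch)) return false;
        PairKeySketch other = (PairKeySketch) obj;
        return value.equals(other.value) && flag == other.flag;
    }

    // Derived from the same fields as equals, so equal keys hash alike.
    @Override
    public int hashCode() {
        return Objects.hash(value, flag);
    }

    // Serialize a key, then rebuild a fresh instance from the bytes.
    static PairKeySketch roundTrip(PairKeySketch key) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            key.write(new DataOutputStream(buffer));
            PairKeySketch copy = new PairKeySketch();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        PairKeySketch original = new PairKeySketch("hadoop", 1);
        PairKeySketch copy = roundTrip(original);
        System.out.println(original.equals(copy));
        System.out.println(original.hashCode() == copy.hashCode());
    }
}
```

The no-argument constructor matters here for the same reason it does in the real key: the framework creates an empty instance first and then fills it through readFields.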

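Why implement WritableComparator as a nested class?
compareTo(MyKeyWritable o) is implemented, but it can only compare objects. While the data is passed along it has already been serialized into a byte stream, so a comparison through compareTo first has to deserialize those bytes back into objects. Deserialization costs resources and performance. To make the comparison more efficient, extend WritableComparator or implement the RawComparator interface and override its compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) method. The keys are then compared directly as bytes, with no deserialization, which is faster.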
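The idea of comparing serialized keys without rebuilding objects can be sketched with the JDK alone. RawCompareSketch and its byte layout (a 2-byte length, the UTF-8 bytes, then a 4-byte big-endian int) are assumptions made for illustration. Hadoop's Text uses a variable-length prefix instead, but the technique is the same: read the length prefix, compare the string bytes, and look at the int only when the strings are equal.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of raw (byte-level) key comparison.
final class RawCompareSketch {

    // Layout: 2-byte length, UTF-8 bytes, 4-byte big-endian int.
    // Only suitable for strings shorter than 65536 bytes.
    static byte[] serialize(String value, int flag) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
            out.writeShort(utf8.length);
            out.write(utf8);
            out.writeInt(flag);
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static int readUnsignedShort(byte[] b, int s) {
        return ((b[s] & 0xFF) << 8) | (b[s + 1] & 0xFF);
    }

    static int readInt(byte[] b, int s) {
        return ((b[s] & 0xFF) << 24) | ((b[s + 1] & 0xFF) << 16)
                | ((b[s + 2] & 0xFF) << 8) | (b[s + 3] & 0xFF);
    }

    // Same parameter shape as RawComparator.compare. l1 and l2 are unused
    // here because every field length can be derived from the bytes.
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int len1 = readUnsignedShort(b1, s1);
        int len2 = readUnsignedShort(b2, s2);
        // Compare the string bytes lexicographically as unsigned values.
        int common = Math.min(len1, len2);
        for (int i = 0; i < common; i++) {
            int x = b1[s1 + 2 + i] & 0xFF;
            int y = b2[s2 + 2 + i] & 0xFF;
            if (x != y) return x - y;
        }
        if (len1 != len2) return len1 - len2;
        // Strings are equal: compare the int stored right after them.
        return Integer.compare(readInt(b1, s1 + 2 + len1),
                readInt(b2, s2 + 2 + len2));
    }

    static int compare(byte[] a, byte[] b) {
        return compare(a, 0, a.length, b, 0, b.length);
    }

    public static void main(String[] args) {
        System.out.println(compare(serialize("a", 1), serialize("b", 0)) < 0);
        System.out.println(compare(serialize("a", 1), serialize("a", 2)) < 0);
    }
}
```

No key object is created during the comparison. That is the saving the raw comparator offers during the sort.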

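TEXT_COMPARATOR and INT_COMPARATOR are the WritableComparator implementations nested inside Text and IntWritable. We can use them directly; the custom comparator simply combines them for its own purposes. (Browsing the source code is a good way to understand them.)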
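To split the raw bytes between the two comparators, the custom comparator has to know where the Text field ends. Text serializes its byte length as a variable-length integer (VInt), so the field size is the size of the VInt itself plus the length it encodes. The sketch below re-implements that decoding on the JDK alone, based on my reading of the VInt layout. VIntLengthSketch is a hypothetical class, and in real code you would call WritableUtils.decodeVIntSize and WritableComparator.readVInt instead.

```java
// Hypothetical re-implementation of VInt decoding, for illustration only.
final class VIntLengthSketch {

    // Number of bytes the VInt occupies, derived from its first byte.
    static int decodeVIntSize(byte first) {
        if (first >= -112) return 1;            // the value fits in one byte
        if (first < -120) return -119 - first;  // negative multi-byte value
        return -111 - first;                    // positive multi-byte value
    }

    // Decode the VInt that starts at bytes[start].
    static int readVInt(byte[] bytes, int start) {
        int first = bytes[start];
        if (first >= -112) return first;
        boolean negative = first < -120;
        int len = negative ? -(first + 120) : -(first + 112);
        long v = 0;
        for (int i = 0; i < len; i++) {
            v = (v << 8) | (bytes[start + 1 + i] & 0xFF);
        }
        return (int) (negative ? ~v : v);
    }

    // Total size of a serialized Text field: length prefix plus payload.
    static int textFieldLength(byte[] bytes, int start) {
        return decodeVIntSize(bytes[start]) + readVInt(bytes, start);
    }

    public static void main(String[] args) {
        // "hello": a one-byte prefix (5) followed by five payload bytes.
        byte[] shortText = {5, 'h', 'e', 'l', 'l', 'o'};
        System.out.println(textFieldLength(shortText, 0));

        // A 200-byte string needs a two-byte prefix; only the prefix is shown.
        byte[] longPrefix = {-113, (byte) 200};
        System.out.println(textFieldLength(longPrefix, 0));
    }
}
```

The IntWritable starts at the offset this length gives, which is why the comparator passes s1 + thisValueLen to INT_COMPARATOR.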

The following code registers this comparator:
static {
    WritableComparator.define(MyKeyWritable.class, new Comparator());
}
