Compression
LZO or Snappy (20% better than LZO) - Block (default)
Serialization
Avro didn’t work well - deserialization issue
Developed a configurable serialization mechanism that uses JSON except Data type
Secondary Indexes
Were using ITHBase and IHBase from contrib - doesn’t work well
Redesigned schema without need for index
We still need it though
Performance
Several tunable parameters
Hardware (Hadoop + HBase)
DataNode - 24GB RAM, 8 cores, 4 * 1TB (64GB, 24 cores, 8 * 2TB)
6 mappers and 6 reducers per node (16 mappers, 4 reducers)
Memory allocation by process
DataNode - 1GB (2GB)
TaskTracker - 1GB (2GB)
MapTasks - 6 * 1GB (16 * 1.5GB)
ReduceTasks - 6 * 1GB (4 * 1.5GB)
RegionServer - 8GB (24GB)
Total Allocation - 24GB (64GB)
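
The per-process figures above can be checked with a little arithmetic. The sketch below is only an illustration: the dictionary and function names are hypothetical, and the numbers are copied from the list. The itemized processes sum to 22GB on the smaller node and 58GB on the larger one, so the stated totals of 24GB and 64GB match physical RAM. Reading the difference as headroom for the OS and other unlisted processes is an assumption, not something the slides state.

```python
# Per-process memory figures copied from the list above.
# Each entry is (number of processes, GB per process).
# The names small_node / large_node are hypothetical labels for the
# 24GB and 64GB hardware configurations.
small_node = {
    "DataNode": (1, 1.0),
    "TaskTracker": (1, 1.0),
    "MapTasks": (6, 1.0),
    "ReduceTasks": (6, 1.0),
    "RegionServer": (1, 8.0),
}
large_node = {
    "DataNode": (1, 2.0),
    "TaskTracker": (1, 2.0),
    "MapTasks": (16, 1.5),
    "ReduceTasks": (4, 1.5),
    "RegionServer": (1, 24.0),
}


def allocated_gb(budget):
    """Sum the memory claimed by every listed process, in GB."""
    return sum(count * gb for count, gb in budget.values())


def headroom_gb(budget, physical_ram_gb):
    """RAM left after the listed processes (assumed to cover OS and the rest)."""
    return physical_ram_gb - allocated_gb(budget)


if __name__ == "__main__":
    for label, budget, ram in (("24GB node", small_node, 24),
                               ("64GB node", large_node, 64)):
        print(label, "allocated:", allocated_gb(budget),
              "headroom:", headroom_gb(budget, ram))
```

Changing the mapper or reducer counts in the dictionaries shows quickly whether a proposed slot configuration still fits in RAM alongside the RegionServer heap.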
Deployment
Do not run ZK instances on DN; have a separate ZK quorum (3 minimum)
Do not run HMaster on NN
Avoid SPOF for HMaster (run additional master(s))
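
The "3 minimum" for the ZK quorum follows from ZooKeeper's majority rule: an ensemble stays available only while more than half of its servers are up. A minimal sketch of that arithmetic (the function names are hypothetical, chosen for illustration):

```python
def majority(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can fail while a majority is still reachable."""
    return ensemble_size - majority(ensemble_size)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(n, "servers: majority", majority(n),
              "tolerates", tolerated_failures(n), "failure(s)")
```

One or two servers tolerate no failures, three is the smallest ensemble that survives one, and a fourth server adds no extra tolerance over three, which is why odd ensemble sizes are the usual choice.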