Big Data Core Technology Source Code Analysis: Avro, Part 2

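With the Avro trunk source checked out, the first thing to analyze is

the source under avro-trunk_src\lang\java

The source tree includes the modules avro, compiler, ipc, mapred, protobuf, thrift, and so on.

We start with the avro module.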

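The top-level classes are centered on JsonProperties [the top-level abstract class],

Schema and Protocol [both extend JsonProperties],

SchemaNormalization and SchemaBuilder,

and the Exception classes.

This layout shows why the Avro core works with JSON-format schemas: Schema and Protocol both inherit their JSON property storage from JsonProperties.

The Type enum inside Schema lists the schema types that are supported: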

public enum Type {
    RECORD, ENUM, ARRAY, MAP, UNION, FIXED, STRING, BYTES,
      INT, LONG, FLOAT, DOUBLE, BOOLEAN, NULL;
    private String name;
    private Type() { this.name = this.name().toLowerCase(); }
    public String getName() { return name; }
  };
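
The lower-casing in the constructor matters because schema type names in Avro JSON are written in lower case ("record", "string", and so on). A minimal sketch of the same pattern, using a hypothetical three-constant enum named SketchType rather than the real Schema.Type:

```java
import java.util.Locale;

// Hypothetical stand-in for Schema.Type, reduced to three constants.
enum SketchType {
  RECORD, STRING, NULL;

  private final String name;

  // Each constant caches the lower-case form of its own identifier once,
  // at class-initialization time.
  SketchType() { this.name = name().toLowerCase(Locale.ROOT); }

  String getName() { return name; }
}

class SketchTypeDemo {
  public static void main(String[] args) {
    for (SketchType t : SketchType.values()) {
      System.out.println(t + " -> " + t.getName());
    }
  }
}
```

Passing Locale.ROOT is a choice of this sketch; it keeps the result independent of the default locale of the JVM.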

Protocol, in turn, contains two kinds of message: Message and TwoWayMessage.

 

Internally, JsonProperties keeps its properties in a map:

Map<String,JsonNode> props = new LinkedHashMap<String,JsonNode>(1);

Note the two synchronized methods:

public synchronized JsonNode getJsonProp(String name) {
    return props.get(name);
  }

public synchronized void addProp(String name, JsonNode value) {}

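Because both methods lock the same object, reads and writes of the property map are synchronized with each other.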
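
The locking scheme can be sketched without Avro on the classpath. PropsSketch below is a hypothetical stand-in in which String replaces JsonNode. The body of addProp is elided in the excerpt above, so the write-once policy shown here is an assumption of this sketch and not a quotation of the source.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for JsonProperties; String replaces JsonNode.
class PropsSketch {
  // LinkedHashMap keeps the properties in insertion order.
  private final Map<String, String> props = new LinkedHashMap<String, String>(1);

  // Reader and writer both lock "this", so a reader never observes the
  // map in the middle of a put.
  public synchronized String getProp(String name) {
    return props.get(name);
  }

  // Assumed policy for this sketch: a property can be set once; setting it
  // again to a different value is rejected, setting the same value is a no-op.
  public synchronized void addProp(String name, String value) {
    if (value == null) {
      return;
    }
    String old = props.get(name);
    if (old == null) {
      props.put(name, value);
    } else if (!old.equals(value)) {
      throw new IllegalStateException("Can't overwrite property: " + name);
    }
  }
}
```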

Message and TwoWayMessage are defined inside Protocol as follows:

public class Message extends JsonProperties {
    private String name;
    private String doc;
    private Schema request;

TwoWayMessage is as follows:

private class TwoWayMessage extends Message {
    private Schema response;
    private Schema errors;

SchemaBuilder, as its name suggests, builds Schema objects.

It contains a Builder for each of the schema types.

It also contains the FieldDefault family of classes

and Completion:

private abstract static class Completion<R> {
    protected abstract R complete(Schema schema);
  }

FieldDefault is defined as follows:

 private static abstract class FieldDefault<R, S extends FieldDefault<R, S>> extends Completion<S> {
    private final FieldBuilder<R> field;
    private Schema schema;
    protected FieldDefault(FieldBuilder<R> field) {
      this.field = field;
    }
   
    /** Completes this field with no default value **/
    public final FieldAssembler<R> noDefault() {
      return field.completeField(schema);
    }
   
    private FieldAssembler<R> usingDefault(Object defaultVal) {
      return field.completeField(schema, defaultVal);
    }
   
    @Override
    protected final S complete(Schema schema) {
      this.schema = schema;
      return self();
    }
   
    protected abstract S self();
  }
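
The bound S extends FieldDefault<R, S> together with the abstract self() is the self-typed generic idiom: complete() can return the concrete subclass, so methods that exist only on that subclass stay reachable without a cast. A reduced sketch of the idiom follows; Step, IntStep and intDefault are hypothetical names, and a String stands in for Schema.

```java
// Hypothetical names throughout; only the shape mirrors FieldDefault.
abstract class Step<S extends Step<S>> {
  protected String schema;

  // Stores the finished schema and returns the concrete subtype, so the
  // caller keeps access to subtype-specific methods.
  final S complete(String schema) {
    this.schema = schema;
    return self();
  }

  // Each concrete subclass returns "this", typed as itself.
  protected abstract S self();
}

final class IntStep extends Step<IntStep> {
  @Override
  protected IntStep self() {
    return this;
  }

  // Only meaningful for an int field, so it lives on the subtype.
  String intDefault(int value) {
    return schema + " default " + value;
  }
}
```

With these definitions, new IntStep().complete("int").intDefault(7) compiles as one chain, because complete returns IntStep rather than the base type.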

Note the last method, toJsonNode:

 // create default value JsonNodes from objects
  private static JsonNode toJsonNode(Object o) {
    try {
      String s;
      if (o instanceof ByteBuffer) {
        // special case since GenericData.toString() is incorrect for bytes
        // note that this does not handle the case of a default value with nested bytes
        ByteBuffer bytes = ((ByteBuffer) o);
        bytes.mark();
        byte[] data = new byte[bytes.remaining()];
        bytes.get(data);
        bytes.reset(); // put the buffer back the way we got it
        s = new String(data, "ISO-8859-1");
        char[] quoted = JsonStringEncoder.getInstance().quoteAsString(s);
        s = "\"" + new String(quoted) + "\"";
      } else {
        s = GenericData.get().toString(o);
      }
      return new ObjectMapper().readTree(s);
    } catch (IOException e) {
      throw new SchemaBuilderException(e);
    }
  }

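This method converts an Object into a JsonNode. A java.nio ByteBuffer is special-cased: its bytes are decoded as ISO-8859-1 and quoted as a JSON string, while every other value goes through GenericData.toString().

The JsonNode here is org.codehaus.jackson.JsonNode.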
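
ISO-8859-1 maps every byte value 0 to 255 onto the character with the same code point, so the bytes survive the trip through a String unchanged, and mark()/reset() leave the buffer position where the caller had it. A minimal sketch of just that step, with the hypothetical names BytesDefaultSketch and toLatin1 and without the Jackson quoting:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper isolating the ByteBuffer branch of toJsonNode.
class BytesDefaultSketch {
  // Copies the remaining bytes into a String without disturbing the
  // caller's view of the buffer.
  static String toLatin1(ByteBuffer bytes) {
    bytes.mark();                             // remember the current position
    byte[] data = new byte[bytes.remaining()];
    bytes.get(data);                          // advances the position to the limit
    bytes.reset();                            // put the position back
    return new String(data, StandardCharsets.ISO_8859_1);
  }
}
```

Encoding the returned String with ISO-8859-1 again yields the original bytes, including values above 0x7f.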

 

Next, the remaining packages under avro.

They include data, file, generic, io, ipc, reflect, specific, tool, and util.

package data:

It includes

Json

which contains a Writer and a Reader

RecordBuilder

public interface RecordBuilder<T> {
  T build();
}

RecordBuilderBase

public abstract class RecordBuilderBase<T extends IndexedRecord>
  implements RecordBuilder<T>

RecordBuilderBase provides template methods for validation.

ErrorBuilder

A Builder interface that extends RecordBuilder:

public interface ErrorBuilder<T> extends RecordBuilder<T> {
 
  /** Gets the value */
  Object getValue();
 
  /** Sets the value */
  ErrorBuilder<T> setValue(Object value);
 
  /** Checks whether the value has been set */
  boolean hasValue();
 
  /** Clears the value */
  ErrorBuilder<T> clearValue();
 
  /** Gets the error cause */
  Throwable getCause();
 
  /** Sets the error cause */
  ErrorBuilder<T> setCause(Throwable cause);
 
  /** Checks whether the cause has been set */
  boolean hasCause();
 
  /** Clears the cause */
  ErrorBuilder<T> clearCause();

}

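The file package contains the following class hierarchy.

The abstract class Codec (Codec.java) declares compression and decompression, along with getName, equals, hashCode, and so on.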

public abstract class Codec {
  /** Name of the codec; written to the file's metadata. */
  public abstract String getName();
  /** Compresses the input data */
  public abstract ByteBuffer compress(ByteBuffer uncompressedData) throws IOException;
  /** Decompress the data  */
  public abstract ByteBuffer decompress(ByteBuffer compressedData) throws IOException;
  /**
   * Codecs must implement an equals() method.  Two codecs, A and B are equal
   * if: the result of A and B decompressing content compressed by A is the same
   * AND the result of A and B decompressing content compressed by B is the same
   **/
  @Override
  public abstract boolean equals(Object other);
  /**
   * Codecs must implement a hashCode() method that is consistent with equals().*/
  @Override
  public abstract int hashCode();
  @Override
  public String toString() {
    return getName();
  }
}
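
To make the contract concrete, here is a sketch of a class with the same method shapes built only on java.util.zip. DeflateSketch is a hypothetical name and this is not the DeflateCodec from Avro; it assumes array-backed buffers and treats all instances as equal, since any instance can decompress the output of any other.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.Inflater;
import java.util.zip.InflaterOutputStream;

// Hypothetical class with Codec-shaped methods; not Avro's DeflateCodec.
class DeflateSketch {
  private final int level;

  DeflateSketch(int level) { this.level = level; }

  String getName() { return "deflate"; }

  // Assumes an array-backed buffer (hasArray() returns true).
  ByteBuffer compress(ByteBuffer data) throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    Deflater deflater = new Deflater(level, true);   // true: raw DEFLATE, no zlib header
    try (DeflaterOutputStream out = new DeflaterOutputStream(sink, deflater)) {
      out.write(data.array(), data.arrayOffset() + data.position(), data.remaining());
    } finally {
      deflater.end();                                // release native memory
    }
    return ByteBuffer.wrap(sink.toByteArray());
  }

  ByteBuffer decompress(ByteBuffer data) throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    Inflater inflater = new Inflater(true);          // must match the raw format above
    try (InflaterOutputStream out = new InflaterOutputStream(sink, inflater)) {
      out.write(data.array(), data.arrayOffset() + data.position(), data.remaining());
    } finally {
      inflater.end();
    }
    return ByteBuffer.wrap(sink.toByteArray());
  }

  // The compression level does not affect what can be decompressed,
  // so every instance compares equal.
  @Override
  public boolean equals(Object other) { return other instanceof DeflateSketch; }

  @Override
  public int hashCode() { return getName().hashCode(); }
}
```

Neither method moves the position of the buffer it is given, so the same input buffer can be reused after the call.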

The subclasses are:

public class BZip2Codec extends Codec: implements bzip2 compression and decompression.

It depends on org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream
                and org.apache.commons.compress.compressors.bzip2.BZip2CompressorOutputStream.

class DeflateCodec extends Codec: implements DEFLATE (RFC1951) compression and decompression.

final class NullCodec extends Codec: implements the "null" (pass through) codec.

class SnappyCodec extends Codec: implements Snappy compression and decompression.

It uses a CRC32 internally: CRC32 crc32 = new CRC32();

Note the modifiers on these four subclasses: one is public, two declare no access modifier (package-private), and one is final (and likewise package-private).

There is also an abstract factory for Codec:

public abstract class CodecFactory

Its createInstance is the abstract factory method:

/** Creates internal Codec. */
  protected abstract Codec createInstance();

Factory registration:

 public static CodecFactory addCodec(String name, CodecFactory c) {
    return REGISTERED.put(name, c);
  }

Looking up a factory by name:

public static CodecFactory fromString(String s) {
    CodecFactory o = REGISTERED.get(s);
    if (o == null) {
      throw new AvroRuntimeException("Unrecognized codec: " + s);
    }
    return o;
  }
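
The registration and lookup pair can be sketched as a small name-to-factory registry. FactorySketch is a hypothetical stand-in whose product is a plain String; the map type and the exception type are choices of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for CodecFactory; the product is just a String.
abstract class FactorySketch {
  private static final Map<String, FactorySketch> REGISTERED =
      new HashMap<String, FactorySketch>();

  // The abstract factory method each concrete factory implements.
  protected abstract String createInstance();

  // Returns the factory previously registered under this name, or null if
  // there was none, because Map.put returns the old mapping.
  static FactorySketch addCodec(String name, FactorySketch factory) {
    return REGISTERED.put(name, factory);
  }

  // Fails loudly on an unknown name instead of returning null.
  static FactorySketch fromString(String name) {
    FactorySketch factory = REGISTERED.get(name);
    if (factory == null) {
      throw new IllegalArgumentException("Unrecognized codec: " + name);
    }
    return factory;
  }
}
```

One consequence of returning the result of put is that registering a second factory under an existing name silently replaces the first and hands the old one back to the caller.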

as well as the corresponding concrete factory instances:

 public static CodecFactory nullCodec() {
    return NullCodec.OPTION;
  }

  /** Deflate codec, with specific compression.
   * compressionLevel should be between 1 and 9, inclusive. */
  public static CodecFactory deflateCodec(int compressionLevel) {
    return new DeflateCodec.Option(compressionLevel);
  }

  /** Snappy codec.*/
  public static CodecFactory snappyCodec() {
    return new SnappyCodec.Option();
  }

  /** bzip2 codec.*/
  public static CodecFactory bzip2Codec() {
    return new BZip2Codec.Option();
  }

Two interfaces related to files:

SeekableInput

public interface SeekableInput extends Closeable {

  /** Set the position for the next {@link java.io.InputStream#read(byte[],int,int) read()}. */
  void seek(long p) throws IOException;

  /** Return the position of the next {@link java.io.InputStream#read(byte[],int,int) read()}. */
  long tell() throws IOException;

  /** Return the length of the file. */
  long length() throws IOException;

  /** Equivalent to {@link java.io.InputStream#read(byte[],int,int)}. */
  int read(byte[] b, int off, int len) throws IOException;
}

It declares four methods:

seek, tell, length, read

One implementation is SeekableFileInput:

public class SeekableFileInput
  extends FileInputStream implements SeekableInput {

  public SeekableFileInput(File file) throws IOException { super(file); }
  public SeekableFileInput(FileDescriptor fd) throws IOException { super(fd); }

  public void seek(long p) throws IOException { getChannel().position(p); }
  public long tell() throws IOException { return getChannel().position(); }
  public long length() throws IOException { return getChannel().size(); }

}

The other implementation is SeekableByteArrayInput:

public class SeekableByteArrayInput extends ByteArrayInputStream implements SeekableInput {

    public SeekableByteArrayInput(byte[] data) {
        super(data);
    }

    public long length() throws IOException {
        return this.count;
    }

    public void seek(long p) throws IOException {
        this.reset();
        this.skip(p);
    }

    public long tell() throws IOException {
        return this.pos;
    }
}
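
The seek in this class works through reset() followed by skip(p). For a ByteArrayInputStream, reset() returns to the marked position, which is offset 0 only as long as mark() has never been called. A usage sketch with a hypothetical stand-in class, SeekSketch, that repeats the three methods so it runs without Avro:

```java
import java.io.ByteArrayInputStream;

// Hypothetical stand-in repeating the three methods shown above.
class SeekSketch extends ByteArrayInputStream {
  SeekSketch(byte[] data) { super(data); }

  long length() { return count; }   // count: one past the last valid index

  void seek(long p) {
    reset();                        // back to the mark (offset 0 unless mark() was called)
    skip(p);                        // then forward p bytes, capped at the end
  }

  long tell() { return pos; }       // pos: index of the next byte to read
}

class SeekSketchDemo {
  public static void main(String[] args) {
    SeekSketch in = new SeekSketch(new byte[] {10, 20, 30, 40, 50});
    in.seek(3);
    System.out.println(in.read());  // byte at offset 3
    System.out.println(in.tell());  // position after that read
  }
}
```

If a caller invokes mark() at a nonzero position, a later seek(p) lands at that mark plus p rather than at absolute offset p.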

The other interface is FileReader. Besides getSchema, it declares four methods: next, sync, pastSync, and tell.

public interface FileReader<D> extends Iterator<D>, Iterable<D>, Closeable {
  /** Return the schema for data in this file. */
  Schema getSchema();

   D next(D reuse) throws IOException;
  void sync(long position) throws IOException;
  boolean pastSync(long position) throws IOException;
  long tell() throws IOException;

}

The implementing classes include:

DataFileReader

public class DataFileReader<D>
  extends DataFileStream<D> implements FileReader<D> {}

and a second version, DataFileReader12:

/** Read files written by Avro version 1.2. */
public class DataFileReader12<D> implements FileReader<D>, Closeable {}

Several methods in this class are worth a closer look:

@Override
  public synchronized D next(D reuse) throws IOException {
    while (blockCount == 0) {                     // at start of block

      if (in.tell() == in.length())               // at eof
        return null;

      skipSync();                                 // skip a sync

      blockCount = vin.readLong();                // read blockCount
        
      if (blockCount == FOOTER_BLOCK) {
        seek(vin.readLong()+in.tell());           // skip a footer
      }
    }
    blockCount--;
    return reader.read(reuse, vin);
  }

public synchronized void seek(long position) throws IOException {
    in.seek(position);
    blockCount = 0;
    blockStart = position;
    vin = DecoderFactory.get().binaryDecoder(in, vin);
  }

  /** Move to the next synchronization point after a position. */
  @Override
  public synchronized void sync(long position) throws IOException {
    if (in.tell()+SYNC_SIZE >= in.length()) {
      seek(in.length());
      return;
    }
    in.seek(position);
    vin.readFixed(syncBuffer);
    for (int i = 0; in.tell() < in.length(); i++) {
      int j = 0;
      for (; j < sync.length; j++) {
        if (sync[j] != syncBuffer[(i+j)%sync.length])
          break;
      }
      if (j == sync.length) {                     // position before sync
        seek(in.tell() - SYNC_SIZE);
        return;
      }
      syncBuffer[i%sync.length] = (byte)in.read();
    }
    seek(in.length());
  }
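
The loop in sync() treats syncBuffer as a circular window: slot i % sync.length always holds the oldest byte, so the window is compared starting from that slot and the oldest byte is overwritten by the next one read. A simplified sketch of the same scan over an in-memory array follows. SyncScanSketch and findSync are hypothetical names; the sketch returns -1 when nothing is found instead of seeking to the end of the file, and it also tests the final window, which is a choice of this sketch.

```java
// Hypothetical helper; scans a byte array instead of a SeekableInput.
class SyncScanSketch {
  // Returns the offset of the first occurrence of sync at or after from,
  // or -1 if there is none.
  static int findSync(byte[] data, int from, byte[] sync) {
    int n = sync.length;
    if (from < 0 || from + n > data.length) {
      return -1;
    }
    byte[] window = new byte[n];
    System.arraycopy(data, from, window, 0, n);
    int next = from + n;                 // index of the next unread byte
    for (int i = 0; ; i++) {
      int j = 0;
      // The logical start of the circular window is slot i % n.
      while (j < n && sync[j] == window[(i + j) % n]) {
        j++;
      }
      if (j == n) {
        return next - n;                 // the marker starts n bytes before the read position
      }
      if (next == data.length) {
        return -1;                       // input exhausted
      }
      window[i % n] = data[next++];      // overwrite the oldest byte
    }
  }
}
```

Each input byte is copied into the window once, and no byte is ever re-read from the input, which is what lets the original run against a forward-reading stream.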

and the constructor:

 public DataFileReader12(SeekableInput sin, DatumReader<D> reader)
    throws IOException {
    this.in = new DataFileReader.SeekableInputStream(sin);

    byte[] magic = new byte[4];
    in.read(magic);
    if (!Arrays.equals(MAGIC, magic))
      throw new IOException("Not a data file.");

    long length = in.length();
    in.seek(length-4);
    int footerSize=(in.read()<<24)+(in.read()<<16)+(in.read()<<8)+in.read();
    seek(length-footerSize);
    long l = vin.readMapStart();
    if (l > 0) {
      do {
        for (long i = 0; i < l; i++) {
          String key = vin.readString(null).toString();
          ByteBuffer value = vin.readBytes(null);
          byte[] bb = new byte[value.remaining()];
          value.get(bb);
          meta.put(key, bb);
        }
      } while ((l = vin.mapNext()) != 0);
    }

    this.sync = getMeta(SYNC);
    this.count = getMetaLong(COUNT);
    String codec = getMetaString(CODEC);
    if (codec != null && ! codec.equals(NULL_CODEC)) {
      throw new IOException("Unknown codec: " + codec);
    }
    this.schema = Schema.parse(getMetaString(SCHEMA));
    this.reader = reader;

    reader.setSchema(schema);

    seek(MAGIC.length);         // seek to start
  }
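
The constructor locates the metadata by reading the last four bytes of the file as a big-endian integer, the footer size, and then seeking to length minus that size. A sketch of the arithmetic on a byte array, with the hypothetical names FooterSketch, readFooterSize and footerStart; the & 0xff masks are needed here because Java bytes are signed, whereas InputStream.read() already returns values from 0 to 255:

```java
// Hypothetical helper; works on an in-memory copy of the file.
class FooterSketch {
  // Combines the last four bytes, most significant first.
  static int readFooterSize(byte[] file) {
    int n = file.length;
    return ((file[n - 4] & 0xff) << 24)
         + ((file[n - 3] & 0xff) << 16)
         + ((file[n - 2] & 0xff) << 8)
         +  (file[n - 1] & 0xff);
  }

  // Offset at which the footer begins, measured from the start of the file.
  static long footerStart(byte[] file) {
    return (long) file.length - readFooterSize(file);
  }
}
```

The same value is what ByteBuffer.getInt() returns for those four bytes in its default big-endian order.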

There is also

DataFileStream, which implements Iterator:

public class DataFileStream<D> implements Iterator<D>, Iterable<D>, Closeable {

Its core method:

@Override
  public boolean hasNext() {
    try {
      if (blockRemaining == 0) {
        // check that the previous block was finished
        if (null != datumIn) {
          boolean atEnd = datumIn.isEnd();
          if (!atEnd) {
            throw new IOException("Block read partially, the data may be corrupt");
          }
        }
        if (hasNextBlock()) {
          block = nextRawBlock(block);
          block.decompressUsing(codec);
          blockBuffer = block.getAsByteBuffer();
          datumIn = DecoderFactory.get().binaryDecoder(
              blockBuffer.array(), blockBuffer.arrayOffset() +
              blockBuffer.position(), blockBuffer.remaining(), datumIn);
        }
      }
      return blockRemaining != 0;
    } catch (EOFException e) {                    // at EOF
      return false;
    } catch (IOException e) {
      throw new AvroRuntimeException(e);
    }
  }
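
The shape of hasNext() is lazy block loading: the source is touched only when the current block is used up, and the answer is whether any items remain afterwards. A reduced sketch of that shape, using the hypothetical class BlockIterSketch with int arrays as blocks. Decompression and decoding are left out, and the while loop that skips empty blocks is a choice of this sketch, where the method above uses a single if.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for DataFileStream; a block is an int array.
class BlockIterSketch implements Iterator<Integer> {
  private final Iterator<int[]> blocks;
  private int[] block = new int[0];
  private int blockRemaining = 0;

  BlockIterSketch(List<int[]> blocks) {
    this.blocks = blocks.iterator();
  }

  @Override
  public boolean hasNext() {
    // Load a new block only when the current one is exhausted.
    while (blockRemaining == 0 && blocks.hasNext()) {
      block = blocks.next();
      blockRemaining = block.length;
    }
    return blockRemaining != 0;
  }

  @Override
  public Integer next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    int value = block[block.length - blockRemaining];
    blockRemaining--;
    return value;
  }
}
```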

and a DataFileWriter:

public class DataFileWriter<D> implements Closeable, Flushable {

Core methods:

/** Flush the current state of the file. */
  @Override
  public void flush() throws IOException {
    sync();
    vout.flush();
  }

  public void close() throws IOException {
    if (isOpen) {
      flush();
      out.close();
      isOpen = false;
    }
  }

and the LengthLimitedInputStream.java class:

class LengthLimitedInputStream extends FilterInputStream {}

More analysis to follow...

 

 
