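With the source under Avro trunk checked out, the first thing to analyze is
the code under avro-trunk_src\lang\java.
Its modules include avro, compiler, ipc, mapred, protobuf, thrift, and others.
We start with the avro module.
The top-level classes center on JsonProperties [the top-level abstract class],
Schema and Protocol [both extend JsonProperties],
SchemaNormalization and SchemaBuilder,
and the exception classes.
These classes show why Avro's core is built around schemas written in JSON.
Schema itself lists the schema types that are supported: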
public enum Type {
RECORD, ENUM, ARRAY, MAP, UNION, FIXED, STRING, BYTES,
INT, LONG, FLOAT, DOUBLE, BOOLEAN, NULL;
private String name;
private Type() { this.name = this.name().toLowerCase(); }
public String getName() { return name; }
};
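To make the lower-casing in the constructor concrete, here is a minimal sketch using only the JDK. MiniType and fromName are hypothetical names invented for this illustration and are not part of Avro; the lookup map is likewise an assumption about how such names might be resolved, not a description of how Schema does it.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for Schema.Type; only a few constants are shown.
enum MiniType {
    RECORD, ARRAY, STRING, NULL;

    private final String name;

    // Locale.ROOT keeps the result independent of the JVM's default locale.
    MiniType() { this.name = name().toLowerCase(Locale.ROOT); }

    public String getName() { return name; }

    // Reverse index from the lower-case JSON name to the constant.
    private static final Map<String, MiniType> BY_NAME = new HashMap<>();
    static {
        for (MiniType t : values()) BY_NAME.put(t.getName(), t);
    }

    /** Returns the constant for a JSON type name, or null if the name is unknown. */
    public static MiniType fromName(String jsonName) { return BY_NAME.get(jsonName); }
}

class MiniTypeDemo {
    public static void main(String[] args) {
        System.out.println(MiniType.RECORD.getName());
        System.out.println(MiniType.fromName("null"));
    }
}
```

The lower-case form is what appears as the "type" value in a JSON schema, which is why the enum stores it once at construction time.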
Protocol, in turn, contains two kinds of Message.
JsonProperties keeps its properties in an internal map:
Map<String,JsonNode> props = new LinkedHashMap<String,JsonNode>(1);
Two synchronized methods are worth noting:
public synchronized JsonNode getJsonProp(String name) {
return props.get(name);
}
and
public synchronized void addProp(String name, JsonNode value) {}
Together they synchronize reads and writes of the property map.
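The locking pattern can be sketched with the JDK alone. MiniProperties is a hypothetical, simplified stand-in: values are plain Strings instead of JsonNode, and because the body of addProp is not shown above, the plain put() here is an assumption rather than Avro's actual logic.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for JsonProperties.
class MiniProperties {
    // LinkedHashMap keeps properties in insertion order.
    private final Map<String, String> props = new LinkedHashMap<>(1);

    // Both accessors lock on "this", so a reader never observes the map mid-update.
    public synchronized String getProp(String name) {
        return props.get(name);
    }

    // Assumption: a plain put; the real addProp body is elided in the notes.
    public synchronized void addProp(String name, String value) {
        props.put(name, value);
    }

    public synchronized int size() {
        return props.size();
    }
}

class MiniPropertiesDemo {
    public static void main(String[] args) throws InterruptedException {
        MiniProperties p = new MiniProperties();
        // Two writers add disjoint keys concurrently.
        Thread a = new Thread(() -> { for (int i = 0; i < 1000; i++) p.addProp("a" + i, "x"); });
        Thread b = new Thread(() -> { for (int i = 0; i < 1000; i++) p.addProp("b" + i, "y"); });
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(p.size());
    }
}
```

Without the synchronized keyword, concurrent writers could corrupt the LinkedHashMap, since it is not thread-safe on its own.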
Message and TwoWayMessage are defined in Protocol as follows:
public class Message extends JsonProperties {
private String name;
private String doc;
private Schema request;
TwoWayMessage is defined as:
private class TwoWayMessage extends Message {
private Schema response;
private Schema errors;
SchemaBuilder, as the name suggests, creates Schema instances.
It contains a builder for each kind of type,
as well as the FieldDefault family and Completion:
private abstract static class Completion<R> {
protected abstract R complete(Schema schema);
}
FieldDefault is defined as follows:
private static abstract class FieldDefault<R, S extends FieldDefault<R, S>> extends Completion<S> {
private final FieldBuilder<R> field;
private Schema schema;
protected FieldDefault(FieldBuilder<R> field) {
this.field = field;
}
/** Completes this field with no default value **/
public final FieldAssembler<R> noDefault() {
return field.completeField(schema);
}
private FieldAssembler<R> usingDefault(Object defaultVal) {
return field.completeField(schema, defaultVal);
}
@Override
protected final S complete(Schema schema) {
this.schema = schema;
return self();
}
protected abstract S self();
}
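The recursive bound S extends FieldDefault<R, S> together with the abstract self() is the usual self-typed builder idiom: methods declared on the base class can return the concrete subclass type. Below is a minimal sketch of that idiom; BaseOption, IntOption, doc, and withDefault are hypothetical names made up for this example and do not appear in SchemaBuilder.

```java
// Hypothetical classes illustrating the recursive bound S extends BaseOption<S>.
abstract class BaseOption<S extends BaseOption<S>> {
    protected String doc;

    /** Returns this, typed as the concrete subclass. */
    protected abstract S self();

    // Declared once on the base class, yet returns the subclass type.
    public final S doc(String doc) {
        this.doc = doc;
        return self();
    }
}

final class IntOption extends BaseOption<IntOption> {
    private int defaultValue;

    @Override
    protected IntOption self() { return this; }

    public IntOption withDefault(int value) {
        this.defaultValue = value;
        return this;
    }

    public String describe() { return doc + "=" + defaultValue; }
}

class SelfTypeDemo {
    public static void main(String[] args) {
        // withDefault() is still reachable after doc(), because doc() returns IntOption.
        System.out.println(new IntOption().doc("retries").withDefault(3).describe());
    }
}
```

Without the self type, doc() would return the base class and the chained call to withDefault() would not compile.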
Note the last method:
// create default value JsonNodes from objects
private static JsonNode toJsonNode(Object o) {
try {
String s;
if (o instanceof ByteBuffer) {
// special case since GenericData.toString() is incorrect for bytes
// note that this does not handle the case of a default value with nested bytes
ByteBuffer bytes = ((ByteBuffer) o);
bytes.mark();
byte[] data = new byte[bytes.remaining()];
bytes.get(data);
bytes.reset(); // put the buffer back the way we got it
s = new String(data, "ISO-8859-1");
char[] quoted = JsonStringEncoder.getInstance().quoteAsString(s);
s = "\"" + new String(quoted) + "\"";
} else {
s = GenericData.get().toString(o);
}
return new ObjectMapper().readTree(s);
} catch (IOException e) {
throw new SchemaBuilderException(e);
}
}
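This method converts an Object into a JsonNode. A java.nio ByteBuffer is copied out, decoded as ISO-8859-1, and quoted as a JSON string; anything else is rendered with GenericData.toString(). The resulting text is then parsed with ObjectMapper.readTree().
The JsonNode here is org.codehaus.jackson.JsonNode.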
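The ByteBuffer branch relies on two details: mark()/reset() leaves the caller's buffer position untouched, and ISO-8859-1 maps every byte to exactly one char, so no byte is lost. A small JDK-only sketch of just that step follows; BytesDefaultDemo and toLatin1 are hypothetical names, and the JSON quoting done by Jackson is deliberately left out.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class BytesDefaultDemo {
    /** Copies the remaining bytes into a String without moving the buffer's position. */
    static String toLatin1(ByteBuffer bytes) {
        bytes.mark();                              // remember the current position
        byte[] data = new byte[bytes.remaining()];
        bytes.get(data);                           // this advances the position
        bytes.reset();                             // put the buffer back the way we got it
        return new String(data, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] raw = {0x41, (byte) 0xFF, 0x00};
        ByteBuffer buf = ByteBuffer.wrap(raw);
        String s = toLatin1(buf);
        // One char per byte, and the buffer is still fully readable.
        System.out.println(s.length() + " chars, " + buf.remaining() + " bytes remaining");
        // Encoding back with the same charset restores the original bytes.
        System.out.println(Arrays.equals(raw, s.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```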
Next, the remaining packages under avro:
data, file, generic, io, ipc, reflect, specific, tool, util
package data:
It contains:
Json
which includes a Writer and a Reader
RecordBuilder
public interface RecordBuilder<T> {
T build();
}
RecordBuilderBase
public abstract class RecordBuilderBase<T extends IndexedRecord>
implements RecordBuilder<T>
This builder base class provides template methods for validation.
ErrorBuilder
A builder interface that extends RecordBuilder:
public interface ErrorBuilder<T> extends RecordBuilder<T> {
/** Gets the value */
Object getValue();
/** Sets the value */
ErrorBuilder<T> setValue(Object value);
/** Checks whether the value has been set */
boolean hasValue();
/** Clears the value */
ErrorBuilder<T> clearValue();
/** Gets the error cause */
Throwable getCause();
/** Sets the error cause */
ErrorBuilder<T> setCause(Throwable cause);
/** Checks whether the cause has been set */
boolean hasCause();
/** Clears the cause */
ErrorBuilder<T> clearCause();
}
The file package contains the following class hierarchy.
The abstract class Codec (Codec.java) declares compress, decompress, getName, equals, hashCode, and so on:
public abstract class Codec {
/** Name of the codec; written to the file's metadata. */
public abstract String getName();
/** Compresses the input data */
public abstract ByteBuffer compress(ByteBuffer uncompressedData) throws IOException;
/** Decompress the data */
public abstract ByteBuffer decompress(ByteBuffer compressedData) throws IOException;
/**
* Codecs must implement an equals() method. Two codecs, A and B are equal
* if: the result of A and B decompressing content compressed by A is the same
* AND the result of A and B decompressing content compressed by B is the same
**/
@Override
public abstract boolean equals(Object other);
/**
* Codecs must implement a hashCode() method that is consistent with equals().*/
@Override
public abstract int hashCode();
@Override
public String toString() {
return getName();
}
}
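To see what the contract asks of a subclass, here is a hedged sketch built only on java.util.zip. MiniCodec and MiniDeflateCodec are hypothetical names; the sketch uses the zlib-wrapped format for simplicity, which is an assumption of this example and not necessarily what Avro's DeflateCodec writes. It also assumes array-backed buffers.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical mirror of the Codec contract shown above.
abstract class MiniCodec {
    public abstract String getName();
    public abstract ByteBuffer compress(ByteBuffer in) throws IOException;
    public abstract ByteBuffer decompress(ByteBuffer in) throws IOException;
}

final class MiniDeflateCodec extends MiniCodec {
    private final int level;

    MiniDeflateCodec(int level) { this.level = level; }

    @Override public String getName() { return "deflate"; }

    @Override public ByteBuffer compress(ByteBuffer in) {
        Deflater deflater = new Deflater(level); // zlib-wrapped output (assumption)
        deflater.setInput(in.array(), in.arrayOffset() + in.position(), in.remaining());
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        while (!deflater.finished()) {
            out.write(chunk, 0, deflater.deflate(chunk));
        }
        deflater.end();
        return ByteBuffer.wrap(out.toByteArray());
    }

    @Override public ByteBuffer decompress(ByteBuffer in) throws IOException {
        Inflater inflater = new Inflater();
        inflater.setInput(in.array(), in.arrayOffset() + in.position(), in.remaining());
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                if (n == 0 && inflater.needsInput()) {
                    throw new IOException("Truncated compressed data");
                }
                out.write(chunk, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IOException(e);
        } finally {
            inflater.end();
        }
        return ByteBuffer.wrap(out.toByteArray());
    }

    // The level only affects how hard the compressor works; any instance can read
    // any other instance's output, so all instances count as equal under the contract.
    @Override public boolean equals(Object other) { return other instanceof MiniDeflateCodec; }

    @Override public int hashCode() { return getName().hashCode(); }
}
```

The equals() here follows the rule quoted in the Javadoc above: two codecs are equal when each can decompress what the other compressed, which is why the compression level is ignored.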
Its subclasses are:
public class BZip2Codec extends Codec: implements bzip2 compression and decompression.
It depends on org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream
and org.apache.commons.compress.compressors.bzip2.BZip2CompressorOutputStream.
class DeflateCodec extends Codec: implements DEFLATE (RFC1951) compression and decompression.
final class NullCodec extends Codec: implements the "null" (pass-through) codec.
class SnappyCodec extends Codec: implements Snappy compression and decompression.
It uses a CRC32 checksum internally: CRC32 crc32 = new CRC32();
Note the modifiers on these four subclasses: only BZip2Codec is public; DeflateCodec and SnappyCodec declare no access modifier (package-private); NullCodec is final and likewise declares no access modifier.
There is also an abstract factory for Codec:
public abstract class CodecFactory
Its createInstance is the abstract factory method:
/** Creates internal Codec. */
protected abstract Codec createInstance();
Factory registration:
public static CodecFactory addCodec(String name, CodecFactory c) {
return REGISTERED.put(name, c);
}
Looking up a factory by name:
public static CodecFactory fromString(String s) {
CodecFactory o = REGISTERED.get(s);
if (o == null) {
throw new AvroRuntimeException("Unrecognized codec: " + s);
}
return o;
}
And the corresponding concrete factory instances:
public static CodecFactory nullCodec() {
return NullCodec.OPTION;
}
/** Deflate codec, with specific compression.
* compressionLevel should be between 1 and 9, inclusive. */
public static CodecFactory deflateCodec(int compressionLevel) {
return new DeflateCodec.Option(compressionLevel);
}
/** Snappy codec.*/
public static CodecFactory snappyCodec() {
return new SnappyCodec.Option();
}
/** bzip2 codec.*/
public static CodecFactory bzip2Codec() {
return new BZip2Codec.Option();
}
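The registration, lookup, and creation pieces fit together as in the following JDK-only sketch. MiniFactory and fixed are hypothetical names, and the product is a plain String instead of a Codec to keep the example short; only the shape of addCodec and fromString is taken from the excerpts above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical miniature of the pattern: a name-to-factory registry
// plus an abstract creation method.
abstract class MiniFactory {
    private static final Map<String, MiniFactory> REGISTERED = new HashMap<>();

    /** Factory method: each concrete factory decides what to build. */
    protected abstract String createInstance();

    /** Registers a factory; returns the one previously bound to the name, or null. */
    public static MiniFactory addCodec(String name, MiniFactory factory) {
        return REGISTERED.put(name, factory);
    }

    /** Looks a factory up by name and fails loudly on unknown names. */
    public static MiniFactory fromString(String name) {
        MiniFactory factory = REGISTERED.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("Unrecognized codec: " + name);
        }
        return factory;
    }

    /** Convenience helper (hypothetical): a factory that always yields the given product. */
    static MiniFactory fixed(final String product) {
        return new MiniFactory() {
            @Override protected String createInstance() { return product; }
        };
    }
}

class MiniFactoryDemo {
    public static void main(String[] args) {
        MiniFactory.addCodec("demo-null", MiniFactory.fixed("pass-through"));
        System.out.println(MiniFactory.fromString("demo-null").createInstance());
    }
}
```

Because addCodec simply returns the result of Map.put, its return value is the factory that was registered under that name before, or null when the name was free.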
Two interfaces relate to file access.
SeekableInput
public interface SeekableInput extends Closeable {
/** Set the position for the next {@link java.io.InputStream#read(byte[],int,int) read()}. */
void seek(long p) throws IOException;
/** Return the position of the next
{@link java.io.InputStream#read(byte[],int,int) read()}. */
long tell() throws IOException;
/** Return the length of the file. */
long length() throws IOException;
/** Equivalent to {@link java.io.InputStream#read(byte[],int,int)}. */
int read(byte[] b, int off, int len) throws IOException;
}
It declares four methods: seek, tell, length, and read.
One implementation is SeekableFileInput:
public class SeekableFileInput
extends FileInputStream implements SeekableInput {
public SeekableFileInput(File file) throws IOException { super(file); }
public SeekableFileInput(FileDescriptor fd) throws IOException { super(fd); }
public void seek(long p) throws IOException { getChannel().position(p); }
public long tell() throws IOException { return getChannel().position(); }
public long length() throws IOException { return getChannel().size(); }
}
Another implementation is SeekableByteArrayInput:
public class SeekableByteArrayInput extends ByteArrayInputStream implements SeekableInput {
public SeekableByteArrayInput(byte[] data) {
super(data);
}
public long length() throws IOException {
return this.count;
}
public void seek(long p) throws IOException {
this.reset();
this.skip(p);
}
public long tell() throws IOException {
return this.pos;
}
}
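The byte-array variant gets seek for free from two inherited fields, pos and count, plus reset() and skip(). The sketch below re-creates the idea without the Avro interface; MiniSeekableInput is a hypothetical name. One caveat worth noting as an observation rather than a documented guarantee: reset() returns to the mark, which is 0 only as long as nobody has called mark() on the stream.

```java
import java.io.ByteArrayInputStream;

// Hypothetical re-creation of the pattern above, without the Avro interface.
class MiniSeekableInput extends ByteArrayInputStream {
    MiniSeekableInput(byte[] data) { super(data); }

    // count: index one past the last valid byte of the backing array.
    long length() { return count; }

    // pos: index of the next byte that read() will return.
    long tell() { return pos; }

    // reset() jumps to the mark (0 unless mark() was called), then skip() moves forward.
    void seek(long p) {
        reset();
        skip(p);
    }
}

class MiniSeekableInputDemo {
    public static void main(String[] args) {
        MiniSeekableInput in = new MiniSeekableInput(new byte[] {10, 20, 30, 40});
        in.seek(2);
        System.out.println(in.read()); // reads the byte at index 2
        System.out.println(in.tell());
    }
}
```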
The other interface is FileReader, which declares getSchema plus next, sync, pastSync, and tell:
public interface FileReader<D> extends Iterator<D>, Iterable<D>, Closeable {
/** Return the schema for data in this file. */
Schema getSchema();
D next(D reuse) throws IOException;
void sync(long position) throws IOException;
boolean pastSync(long position) throws IOException;
long tell() throws IOException;
}
Its implementations include:
DataFileReader
public class DataFileReader<D>
extends DataFileStream<D> implements FileReader<D> {}
and another version, DataFileReader12:
/** Read files written by Avro version 1.2. */
public class DataFileReader12<D> implements FileReader<D>, Closeable {}
Several methods in this class are worth a look:
@Override
public synchronized D next(D reuse) throws IOException {
while (blockCount == 0) { // at start of block
if (in.tell() == in.length()) // at eof
return null;
skipSync(); // skip a sync
blockCount = vin.readLong(); // read blockCount
if (blockCount == FOOTER_BLOCK) {
seek(vin.readLong()+in.tell()); // skip a footer
}
}
blockCount--;
return reader.read(reuse, vin);
}
public synchronized void seek(long position) throws IOException {
in.seek(position);
blockCount = 0;
blockStart = position;
vin = DecoderFactory.get().binaryDecoder(in, vin);
}
/** Move to the next synchronization point after a position. */
@Override
public synchronized void sync(long position) throws IOException {
if (in.tell()+SYNC_SIZE >= in.length()) {
seek(in.length());
return;
}
in.seek(position);
vin.readFixed(syncBuffer);
for (int i = 0; in.tell() < in.length(); i++) {
int j = 0;
for (; j < sync.length; j++) {
if (sync[j] != syncBuffer[(i+j)%sync.length])
break;
}
if (j == sync.length) { // position before sync
seek(in.tell() - SYNC_SIZE);
return;
}
syncBuffer[i%sync.length] = (byte)in.read();
}
seek(in.length());
}
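The inner loop of sync() treats syncBuffer as a circular window: index (i+j) % length reads the window in logical order, and each new byte overwrites the oldest slot. The same scan over an in-memory array can be sketched as below. SyncScanDemo and findSync are hypothetical names, and this is a simplification: it also tests the final window at the end of the data, which the loop above does not.

```java
class SyncScanDemo {
    /**
     * Returns the offset of the first occurrence of sync in data at or after from,
     * or -1 if there is none.
     */
    static int findSync(byte[] data, int from, byte[] sync) {
        int n = sync.length;
        if (from + n > data.length) return -1;
        byte[] window = new byte[n];
        System.arraycopy(data, from, window, 0, n);
        int next = from + n; // index of the next unread byte
        for (int i = 0; ; i++) {
            int j = 0;
            // The window is circular: logical element j lives at (i + j) % n.
            while (j < n && sync[j] == window[(i + j) % n]) j++;
            if (j == n) return next - n;   // start offset of the marker
            if (next >= data.length) return -1;
            window[i % n] = data[next++];  // overwrite the oldest byte
        }
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, 3, 9, 8, 7, 4};
        byte[] sync = {9, 8, 7};
        System.out.println(findSync(data, 0, sync));
    }
}
```

Only one byte is read per step and the window is never shifted, which is what lets the original work directly on a stream.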
as well as the constructor:
public DataFileReader12(SeekableInput sin, DatumReader<D> reader)
throws IOException {
this.in = new DataFileReader.SeekableInputStream(sin);
byte[] magic = new byte[4];
in.read(magic);
if (!Arrays.equals(MAGIC, magic))
throw new IOException("Not a data file.");
long length = in.length();
in.seek(length-4);
int footerSize=(in.read()<<24)+(in.read()<<16)+(in.read()<<8)+in.read();
seek(length-footerSize);
long l = vin.readMapStart();
if (l > 0) {
do {
for (long i = 0; i < l; i++) {
String key = vin.readString(null).toString();
ByteBuffer value = vin.readBytes(null);
byte[] bb = new byte[value.remaining()];
value.get(bb);
meta.put(key, bb);
}
} while ((l = vin.mapNext()) != 0);
}
this.sync = getMeta(SYNC);
this.count = getMetaLong(COUNT);
String codec = getMetaString(CODEC);
if (codec != null && ! codec.equals(NULL_CODEC)) {
throw new IOException("Unknown codec: " + codec);
}
this.schema = Schema.parse(getMetaString(SCHEMA));
this.reader = reader;
reader.setSchema(schema);
seek(MAGIC.length); // seek to start
}
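The constructor finds the footer by reading the last four bytes of the file as a big-endian int. A JDK-only sketch of that arithmetic follows; FooterSizeDemo and readTrailingInt are hypothetical names, and the sample bytes are made up. The & 0xFF mask plays the role of InputStream.read(), which returns each byte as a value from 0 to 255.

```java
import java.nio.ByteBuffer;

class FooterSizeDemo {
    /** Reads the last four bytes of data as a big-endian int. */
    static int readTrailingInt(byte[] data) {
        int p = data.length - 4;
        return ((data[p] & 0xFF) << 24)
             + ((data[p + 1] & 0xFF) << 16)
             + ((data[p + 2] & 0xFF) << 8)
             +  (data[p + 3] & 0xFF);
    }

    public static void main(String[] args) {
        // Made-up content: three payload bytes followed by a 4-byte size field.
        byte[] file = {7, 7, 7, 0, 0, 1, 2};
        int manual = readTrailingInt(file);
        // ByteBuffer is big-endian by default, so it should agree with the manual shifts.
        int viaBuffer = ByteBuffer.wrap(file, file.length - 4, 4).getInt();
        System.out.println(manual + " " + viaBuffer);
    }
}
```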
There is also DataFileStream, which implements Iterator and Iterable:
public class DataFileStream<D> implements Iterator<D>, Iterable<D>, Closeable {
Its core method:
@Override
public boolean hasNext() {
try {
if (blockRemaining == 0) {
// check that the previous block was finished
if (null != datumIn) {
boolean atEnd = datumIn.isEnd();
if (!atEnd) {
throw new IOException("Block read partially, the data may be corrupt");
}
}
if (hasNextBlock()) {
block = nextRawBlock(block);
block.decompressUsing(codec);
blockBuffer = block.getAsByteBuffer();
datumIn = DecoderFactory.get().binaryDecoder(
blockBuffer.array(), blockBuffer.arrayOffset() +
blockBuffer.position(), blockBuffer.remaining(), datumIn);
}
}
return blockRemaining != 0;
} catch (EOFException e) { // at EOF
return false;
} catch (IOException e) {
throw new AvroRuntimeException(e);
}
}
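Stripped of decoding and decompression, hasNext() is a lazy block loader: when the current block is exhausted it pulls the next one, and it reports whether any items remain. A hedged sketch over in-memory int arrays is shown below; BlockIterator is a hypothetical name, and looping past empty blocks is a choice made for this example, not something taken from the method above.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical iterator over items that arrive grouped into blocks.
class BlockIterator implements Iterator<Integer> {
    private final Iterator<int[]> blocks;
    private int[] block = new int[0];
    private int blockRemaining = 0;

    BlockIterator(List<int[]> blocks) { this.blocks = blocks.iterator(); }

    @Override
    public boolean hasNext() {
        // Load blocks lazily; empty blocks are skipped (an assumption of this sketch).
        while (blockRemaining == 0 && blocks.hasNext()) {
            block = blocks.next();
            blockRemaining = block.length;
        }
        return blockRemaining != 0;
    }

    @Override
    public Integer next() {
        if (!hasNext()) throw new NoSuchElementException();
        int value = block[block.length - blockRemaining];
        blockRemaining--;
        return value;
    }
}

class BlockIteratorDemo {
    public static void main(String[] args) {
        Iterator<Integer> it =
            new BlockIterator(Arrays.asList(new int[] {1, 2}, new int[0], new int[] {3}));
        while (it.hasNext()) System.out.println(it.next());
    }
}
```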
and a DataFileWriter:
public class DataFileWriter<D> implements Closeable, Flushable {
Core methods:
/** Flush the current state of the file. */
@Override
public void flush() throws IOException {
sync();
vout.flush();
}
public void close() throws IOException {
if (isOpen) {
flush();
out.close();
isOpen = false;
}
}
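The isOpen guard makes close() idempotent: the first call flushes and closes the underlying stream, later calls do nothing. A minimal sketch of the same guard follows. MiniWriter, append, and the pending buffer are hypothetical and stand in for the block buffering that the real writer does.

```java
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.Flushable;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical writer that buffers bytes and pushes them out on flush().
class MiniWriter implements Closeable, Flushable {
    private final OutputStream out;
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
    private boolean isOpen = true;

    MiniWriter(OutputStream out) { this.out = out; }

    void append(int b) {
        if (!isOpen) throw new IllegalStateException("writer is closed");
        pending.write(b);
    }

    @Override
    public void flush() throws IOException {
        pending.writeTo(out); // hand buffered bytes to the underlying stream
        pending.reset();
        out.flush();
    }

    @Override
    public void close() throws IOException {
        if (isOpen) {         // a second close() is a no-op
            flush();
            out.close();
            isOpen = false;
        }
    }
}

class MiniWriterDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        MiniWriter w = new MiniWriter(sink);
        w.append(1);
        w.append(2);
        w.close();
        w.close(); // safe to call again
        System.out.println(sink.size());
    }
}
```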
as well as the class LengthLimitedInputStream.java:
class LengthLimitedInputStream extends FilterInputStream {}
More analysis to follow...