A Source-Code Walkthrough of Copying Map Task Output

Copying the output of a map task is implemented by the ReduceTask method long copyOutput(MapOutputLocation loc), which goes through the following steps:

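1. Check whether the output has already been copied. If it has, return -2 (CopyResult.OBSOLETE) to indicate that the data to copy is obsolete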

// check if we still need to copy the output from this location

if (copiedMapOutputs.contains(loc.getTaskId()) ||

obsoleteMapIds.contains(loc.getTaskAttemptId())) {

return CopyResult.OBSOLETE;

}

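2. Build the path and file name of the map output, plus the path of the local temporary file that holds the fetched remote data

// map output file name: output/map_<taskId>.out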

Path filename =

new Path(String.format(

MapOutputFile.REDUCE_INPUT_FILE_FORMAT_STRING,

TaskTracker.OUTPUT, loc.getTaskId().getId()));

// Copy the map output to a temp file whose name is unique to this attempt

// name of the local temporary file the data is copied to

Path tmpMapOutput = new Path(filename+"-"+id);
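To make the two names concrete, here is a minimal sketch of how they are formed. It assumes that REDUCE_INPUT_FILE_FORMAT_STRING is "%s/map_%d.out", as the comment above suggests, and that id is an integer identifying the copier. The class MapOutputNames and its methods are hypothetical and are not part of Hadoop.

```java
import java.util.Locale;

final class MapOutputNames {
    // Assumed value of MapOutputFile.REDUCE_INPUT_FILE_FORMAT_STRING.
    static final String FORMAT = "%s/map_%d.out";

    /** Final local file name for the output of the given map task number. */
    static String finalName(String outputDir, int mapTaskNumber) {
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return String.format(Locale.ROOT, FORMAT, outputDir, mapTaskNumber);
    }

    /** Temporary name: the final name followed by "-" and the copier id. */
    static String tempName(String outputDir, int mapTaskNumber, int copierId) {
        return finalName(outputDir, mapTaskNumber) + "-" + copierId;
    }

    public static void main(String[] args) {
        System.out.println(finalName("output", 7));   // output/map_7.out
        System.out.println(tempName("output", 7, 0)); // output/map_7.out-0
    }
}
```

Because the temporary name carries the copier id, two copiers fetching the same map output write to different files until one of them renames its file to the final name.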

3. Copy the data

This step is mainly carried out by the function getMapOutput(), which is described in detail further below

// Copy the map output

MapOutput mapOutput = getMapOutput(loc, tmpMapOutput,

reduceId.getTaskID().getId());

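4. Inside a block synchronized on ReduceTask.this, do the following

synchronized (ReduceTask.this) { /* sub-steps 1) to 4) below run inside this block */ }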

1) Check again whether this map output has already been copied; if it has, discard the data just fetched

if (copiedMapOutputs.contains(loc.getTaskId())) {

mapOutput.discard();

return CopyResult.OBSOLETE;

}

2) Check whether the size of the original map output (bytes) is 0; if it is, delete what the copy produced

// Special case: discard empty map-outputs

if (bytes == 0) {

try {

mapOutput.discard();

} catch (IOException ioe) {

LOG.info("Couldn't discard output of " + loc.getTaskId());

}

// Note that we successfully copied the map-output

noteCopiedMapOutput(loc.getTaskId());

return bytes;

}

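3) Handle the copied data, which is either in memory or in a local file

a. If the data was copied into memory, add the in-memory map output to the list of in-memory outputs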

// Process map-output

if (mapOutput.inMemory) {

// Save it in the synchronized list of map-outputs

mapOutputsFilesInMemory.add(mapOutput);

}

b. If the data was stored in a local file, rename the temporary file to the final file name

// Rename the temporary file to the final file;

// ensure it is on the same partition

// rename the temporary file produced by the copy to its final name

tmpMapOutput = mapOutput.file;

// i.e. rename a temporary file such as output/map_<taskId>.out-0

// to a file such as output/map_<taskId>.out

filename = new Path(tmpMapOutput.getParent(), filename.getName());

if (!localFileSys.rename(tmpMapOutput, filename)) {

localFileSys.delete(tmpMapOutput, true);

bytes = -1;

throw new IOException("Failed to rename map output " +

tmpMapOutput + " to " + filename);

}
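The rename-or-clean-up logic above can be tried outside Hadoop. The following is a sketch that swaps in java.nio.file for Hadoop's FileSystem API, so the calls differ from the ones in the source; the class RenameTempOutput and the method promote are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class RenameTempOutput {
    /**
     * Renames tmp to finalName inside the same directory.
     * If the rename fails, the temporary file is deleted and the error is rethrown.
     */
    static Path promote(Path tmp, String finalName) throws IOException {
        // Same parent directory, so the rename stays on one partition.
        Path target = tmp.resolveSibling(finalName);
        try {
            return Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            Files.deleteIfExists(tmp);
            throw new IOException("Failed to rename map output " + tmp + " to " + target, e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("shuffle");
        Path tmp = Files.write(dir.resolve("map_7.out-0"), new byte[] {1, 2, 3});
        Path done = promote(tmp, "map_7.out");
        System.out.println(done.getFileName() + " exists: " + Files.exists(done));
    }
}
```

Keeping the temporary and the final file in the same directory is what makes a plain rename sufficient: no data has to be copied a second time.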

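4) Add the map task just copied to the set of copied tasks, and update the number of map outputs still to copy

// Note that we successfully copied the map-output

// add this task id to copiedMapOutputs

// and set the number of map outputs still to copy to (total - number already copied)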

noteCopiedMapOutput(loc.getTaskId());

The body of this method is:

/**

* Save the map taskid whose output we just copied.

* This function assumes that it has been synchronized on ReduceTask.this.

* @param taskId map taskid

*/

private void noteCopiedMapOutput(TaskID taskId) {

copiedMapOutputs.add(taskId);

ramManager.setNumCopiedMapOutputs(numMaps - copiedMapOutputs.size());

}
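The second check inside the lock matters because more than one copier may fetch output for the same map task, and only the first one to finish should be recorded. The sketch below isolates that bookkeeping. CopiedOutputs is a hypothetical stand-in for copiedMapOutputs and numMaps, with plain integers in place of TaskID; it illustrates the idea and is not the Hadoop code.

```java
import java.util.HashSet;
import java.util.Set;

final class CopiedOutputs {
    private final Set<Integer> copied = new HashSet<>();
    private final int numMaps;

    CopiedOutputs(int numMaps) {
        this.numMaps = numMaps;
    }

    /**
     * Records a finished copy. Returns true if this call recorded it,
     * false if another copier had already recorded the same map task.
     */
    synchronized boolean noteCopied(int mapTaskNumber) {
        return copied.add(mapTaskNumber);
    }

    /** Number of map outputs still to fetch: total minus already copied. */
    synchronized int remaining() {
        return numMaps - copied.size();
    }

    public static void main(String[] args) {
        CopiedOutputs outputs = new CopiedOutputs(3);
        System.out.println(outputs.noteCopied(0)); // true: first copy of map 0
        System.out.println(outputs.noteCopied(0)); // false: duplicate, caller should discard
        System.out.println(outputs.remaining());   // 2
    }
}
```

A caller that gets false back is in the same position as the OBSOLETE branch above: its data is redundant and can be discarded.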

getMapOutput is the main method that performs the copy. What follows is a walkthrough of its source. Its signature is

private MapOutput getMapOutput(MapOutputLocation mapOutputLoc,

Path filename, int reduce)

throws IOException, InterruptedException

Implementation steps:

1. Obtain the connection to the map output location and its input stream

// Connect

URL url = mapOutputLoc.getOutputLocation();

URLConnection connection = url.openConnection();

InputStream input = setupSecureConnection(mapOutputLoc, connection);

2. Check that the map output at this location is the one we want

// Validate header from map output

TaskAttemptID mapId = null;

try {

mapId =

TaskAttemptID.forName(connection.getHeaderField(FROM_MAP_TASK));

} catch (IllegalArgumentException ia) {

LOG.warn("Invalid map id ", ia);

return null;

}

TaskAttemptID expectedMapId = mapOutputLoc.getTaskAttemptId();

if (!mapId.equals(expectedMapId)) {

LOG.warn("data from wrong map:" + mapId +

" arrived to reduce task " + reduce +

", where as expected map output should be from " + expectedMapId);

return null;

}

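If it is, execution continues. If it is not, the location we fetched from is wrong and the method returns null

3. Check that the lengths of the map output are not negative, for both the compressed and the uncompressed size

// uncompressed data length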

long decompressedLength =

Long.parseLong(connection.getHeaderField(RAW_MAP_OUTPUT_LENGTH));

// compressed data length

long compressedLength =

Long.parseLong(connection.getHeaderField(MAP_OUTPUT_LENGTH));

if (compressedLength < 0 || decompressedLength < 0) {

LOG.warn(getName() + " invalid lengths in map output header: id: " +

mapId + " compressed len: " + compressedLength +

", decompressed len: " + decompressedLength);

return null;

}

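4. Check that the map output partition belongs to this reduce task

// Check that the output is meant for this reduce task. My understanding is that each partition of the map-side output records the id of its reduce task; this still needs to be confirmed against the map-side output code

// A guess: when the job initializes its tasks, all map task IDs and reduce task IDs have already been created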

int forReduce =

(int)Integer.parseInt(connection.getHeaderField(FOR_REDUCE_TASK));

// reduce holds the id of the current reduce task

if (forReduce != reduce) {

LOG.warn("data for the wrong reduce: " + forReduce +

" with compressed len: " + compressedLength +

", decompressed len: " + decompressedLength +

" arrived to reduce task " + reduce);

return null;

}
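Steps 2 to 4 are all checks on response headers, so they can be sketched as one predicate over a header map. The header name strings below are placeholders chosen for this sketch (the real values are the FROM_MAP_TASK, RAW_MAP_OUTPUT_LENGTH, MAP_OUTPUT_LENGTH and FOR_REDUCE_TASK constants in the Hadoop source), and ShuffleHeaderCheck is a hypothetical class. Unlike the original, the sketch also treats a missing or malformed numeric header as invalid instead of letting the parse exception propagate.

```java
import java.util.HashMap;
import java.util.Map;

final class ShuffleHeaderCheck {
    // Placeholder header names for this sketch only.
    static final String FROM_MAP = "from-map-task";
    static final String RAW_LENGTH = "raw-map-output-length";
    static final String LENGTH = "map-output-length";
    static final String FOR_REDUCE = "for-reduce-task";

    /** Returns true only if every header matches what this reduce task expects. */
    static boolean isValid(Map<String, String> headers, String expectedMapId, int reduce) {
        if (!expectedMapId.equals(headers.get(FROM_MAP))) {
            return false; // data from the wrong map attempt
        }
        try {
            long raw = Long.parseLong(headers.get(RAW_LENGTH));
            long compressed = Long.parseLong(headers.get(LENGTH));
            int forReduce = Integer.parseInt(headers.get(FOR_REDUCE));
            // Lengths must not be negative, and the partition must be ours.
            return raw >= 0 && compressed >= 0 && forReduce == reduce;
        } catch (NumberFormatException e) {
            return false; // missing or malformed numeric header
        }
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        headers.put(FROM_MAP, "attempt_0001_m_000007_0");
        headers.put(RAW_LENGTH, "2048");
        headers.put(LENGTH, "1024");
        headers.put(FOR_REDUCE, "3");
        System.out.println(isValid(headers, "attempt_0001_m_000007_0", 3)); // true
        System.out.println(isValid(headers, "attempt_0001_m_000007_0", 4)); // false
    }
}
```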

5. Copy the data

This step breaks down into the following sub-steps:

1) Check whether the data to copy can be held in memory

//We will put a file in memory if it meets certain criteria:

//1. The size of the (decompressed) file should be less than 25% of

// the total inmem fs

//2. There is space available in the inmem fs

// Check if this map-output can be saved in-memory

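// Compare the uncompressed size of the output with the largest size allowed in memory: if it is smaller, the output can go into memory; if it is larger, it cannot

// the limit is 0.25 times the size of the in-memory shuffle buffer, which is derived from the mapred.job.reduce.total.mem.bytes setting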

boolean shuffleInMemory = ramManager.canFitInMemory(decompressedLength);
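The criterion can be sketched as a single comparison. This is an illustration under stated assumptions: the 0.25 fraction comes from the comment above, the Integer.MAX_VALUE guard is added because the data ends up in a byte array, and the class InMemoryLimit with its constructor argument is hypothetical rather than the actual ram manager.

```java
final class InMemoryLimit {
    // Fraction of the shuffle buffer a single map output may occupy (from the comment above).
    static final double MAX_SINGLE_SHUFFLE_FRACTION = 0.25;

    private final long maxSingleShuffleLimit;

    /** bufferBytes is the total size of the in-memory shuffle buffer (an assumed input). */
    InMemoryLimit(long bufferBytes) {
        this.maxSingleShuffleLimit = (long) (bufferBytes * MAX_SINGLE_SHUFFLE_FRACTION);
    }

    /** True if one map output of this (decompressed) size may be shuffled into memory. */
    boolean canFitInMemory(long requestedSize) {
        return requestedSize < Integer.MAX_VALUE && requestedSize < maxSingleShuffleLimit;
    }

    public static void main(String[] args) {
        InMemoryLimit limit = new InMemoryLimit(1000);
        System.out.println(limit.canFitInMemory(249)); // true
        System.out.println(limit.canFitInMemory(250)); // false: not below the 250-byte limit
    }
}
```

Note that this check looks only at the size of one output against a fixed limit. Whether the buffer currently has room is a separate question, handled when the memory is reserved.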

2) Copy the data into memory

if (shuffleInMemory) {

if (LOG.isDebugEnabled()) {

LOG.debug("Shuffling " + decompressedLength + " bytes (" +

compressedLength + " raw bytes) " +

"into RAM from " + mapOutputLoc.getTaskAttemptId());

}

mapOutput = shuffleInMemory(mapOutputLoc, connection, input,

(int)decompressedLength,

(int)compressedLength);

}

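A detailed look at the source of shuffleInMemory:

a) Check whether there is enough memory to hold the data. If there is not, the thread waits until enough memory is available, is then notified, and continues

/**

* If there is not enough memory, reserve calls wait; once space has been released and the thread is woken up, the method returns.

* It returns true if no wait was needed, and false if the thread waited and was woken up before returning.

*/

// Reserve ram for the map-output

boolean createdNow = ramManager.reserve(mapOutputLength, input);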
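The wait-until-space-is-free behaviour of reserve can be sketched with Java's built-in monitor methods. SimpleRamManager below is a hypothetical, much reduced stand-in. It omits what the real method does with the input stream it is given, and it assumes the caller has already checked that a single request is no larger than the whole buffer; otherwise reserve would wait forever.

```java
final class SimpleRamManager {
    private final long maxSize;
    private long used;

    SimpleRamManager(long maxSize) {
        this.maxSize = maxSize;
    }

    /**
     * Reserves size bytes, waiting while the buffer is too full.
     * Returns true if the space was reserved without waiting, false if the caller had to wait.
     */
    synchronized boolean reserve(long size) throws InterruptedException {
        boolean reservedImmediately = true;
        while (used + size > maxSize) {
            reservedImmediately = false;
            wait(); // woken up by unreserve()
        }
        used += size;
        return reservedImmediately;
    }

    /** Releases size bytes and wakes up any waiting copier. */
    synchronized void unreserve(long size) {
        used -= size;
        notifyAll();
    }

    synchronized long used() {
        return used;
    }

    public static void main(String[] args) throws InterruptedException {
        SimpleRamManager manager = new SimpleRamManager(100);
        System.out.println(manager.reserve(60)); // true: enough room, no wait
        manager.unreserve(60);
        System.out.println(manager.used());      // 0
    }
}
```

The wait sits in a while loop rather than an if, so that a woken thread re-checks the condition before taking the space.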

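b) Reconnect

If createdNow is true, there was enough memory, the thread never waited, and no reconnect is needed. If it is false, the thread waited and was woken up again, and the original connection has been closed in the meantime

// Reconnect if we need to

// the thread had to wait for space and the connection to the map output node was closed, so we must reconnect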

if (!createdNow) {

// Reconnect

try {

connection = mapOutputLoc.getOutputLocation().openConnection();

input = setupSecureConnection(mapOutputLoc, connection);

} catch (IOException ioe) {

LOG.info("Failed reopen connection to fetch map-output from " +

mapOutputLoc.getHost());

// Inform the ram-manager

ramManager.closeInMemoryFile(mapOutputLength);

ramManager.unreserve(mapOutputLength);

throw ioe;

}

}

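c) Work out the length of the actual data: the stream also carries checksum information, which has to be excluded

// Isolate the actual data, because the input stream contains both checksum information and the real data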

IFileInputStream checksumIn =

new IFileInputStream(input,compressedLength);

input = checksumIn;

d) If the data is compressed, wrap the input stream in a decompressing stream

// Are map-outputs compressed?

if (codec != null) {

decompressor.reset();

input = codec.createInputStream(input, decompressor);

}

e) Copy the data

// Copy map-output into an in-memory buffer

byte[] shuffleData = new byte[mapOutputLength];

MapOutput mapOutput =

new MapOutput(mapOutputLoc.getTaskId(),

mapOutputLoc.getTaskAttemptId(), shuffleData, compressedLength);

int bytesRead = 0;

try {

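// n is the number of bytes actually read; a single read may return fewer bytes than the total length,

// so the loop below keeps reading, while the destination array keeps its full, initially allocated length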

int n = input.read(shuffleData, 0, shuffleData.length);

while (n > 0) {

bytesRead += n;

shuffleClientMetrics.inputBytes(n);

// indicate we're making progress

reporter.progress();

n = input.read(shuffleData, bytesRead,

(shuffleData.length-bytesRead));

}

if (LOG.isDebugEnabled()) {

LOG.debug("Read " + bytesRead + " bytes from map-output for " +

mapOutputLoc.getTaskAttemptId());

}

input.close();

} catch (IOException ioe) {

LOG.info("Failed to shuffle from " + mapOutputLoc.getTaskAttemptId(),

ioe);

// Inform the ram-manager

ramManager.closeInMemoryFile(mapOutputLength);

ramManager.unreserve(mapOutputLength);

// Discard the map-output

try {

mapOutput.discard();

} catch (IOException ignored) {

LOG.info("Failed to discard map-output from " +

mapOutputLoc.getTaskAttemptId(), ignored);

}

mapOutput = null;

// Close the streams

IOUtils.cleanup(LOG, input);

// Re-throw

readError = true;

throw ioe;

}

// Close the in-memory file

ramManager.closeInMemoryFile(mapOutputLength);

f) Check that the length of the copied data equals the length of the original output; if it does not, discard the copied data
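The read loop in e) and the length check in f) can be run in isolation. The sketch below uses only java.io. ReadFully and readExactly are hypothetical names, and the anonymous stream in main only simulates the short reads a network connection may produce.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

final class ReadFully {
    /** Reads expectedLength bytes into a new array; throws if the stream ends early. */
    static byte[] readExactly(InputStream in, int expectedLength) throws IOException {
        byte[] data = new byte[expectedLength];
        int bytesRead = 0;
        // A single read may return fewer bytes than requested, so keep reading
        // into the remaining part of the array until nothing more arrives.
        int n = in.read(data, 0, data.length);
        while (n > 0) {
            bytesRead += n;
            n = in.read(data, bytesRead, data.length - bytesRead);
        }
        if (bytesRead != expectedLength) {
            throw new IOException("Incomplete map output received ("
                    + bytesRead + " instead of " + expectedLength + ")");
        }
        return data;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = {1, 2, 3, 4, 5, 6, 7};
        InputStream chunked = new ByteArrayInputStream(payload) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, 3)); // at most 3 bytes per call
            }
        };
        System.out.println(readExactly(chunked, payload.length).length); // 7
    }
}
```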

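3) Copy the data to disk

This code is relatively simple and is not discussed in detail. Like the in-memory copy, it has two overall steps:

Step 1: copy the data

Step 2: check that the data lengths match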

private MapOutput shuffleToDisk(MapOutputLocation mapOutputLoc,

InputStream input,

Path filename,

long mapOutputLength)

throws IOException {

// Find out a suitable location for the output on local-filesystem

Path localFilename =

lDirAlloc.getLocalPathForWrite(filename.toUri().getPath(),

mapOutputLength, conf);

MapOutput mapOutput =

new MapOutput(mapOutputLoc.getTaskId(), mapOutputLoc.getTaskAttemptId(),

conf, localFileSys.makeQualified(localFilename),

mapOutputLength);

// Copy data to local-disk

OutputStream output = null;

long bytesRead = 0;

try {

output = rfs.create(localFilename);

byte[] buf = new byte[64 * 1024];

int n = -1;

try {

n = input.read(buf, 0, buf.length);

} catch (IOException ioe) {

readError = true;

throw ioe;

}

while (n > 0) {

bytesRead += n;

shuffleClientMetrics.inputBytes(n);

output.write(buf, 0, n);

// indicate we're making progress

reporter.progress();

try {

n = input.read(buf, 0, buf.length);

} catch (IOException ioe) {

readError = true;

throw ioe;

}

}

LOG.info("Read " + bytesRead + " bytes from map-output for " +

mapOutputLoc.getTaskAttemptId());

output.close();

input.close();

} catch (IOException ioe) {

LOG.info("Failed to shuffle from " + mapOutputLoc.getTaskAttemptId(),

ioe);

// Discard the map-output

try {

mapOutput.discard();

} catch (IOException ignored) {

LOG.info("Failed to discard map-output from " +

mapOutputLoc.getTaskAttemptId(), ignored);

}

mapOutput = null;

// Close the streams

IOUtils.cleanup(LOG, input, output);

// Re-throw

throw ioe;

}

// Sanity check

if (bytesRead != mapOutputLength) {

try {

mapOutput.discard();

} catch (Exception ioe) {

// IGNORED because we are cleaning up

LOG.info("Failed to discard map-output from " +

mapOutputLoc.getTaskAttemptId(), ioe);

} catch (Throwable t) {

String msg = getTaskID() + " : Failed in shuffle to disk :"

+ StringUtils.stringifyException(t);

reportFatalError(getTaskID(), t, msg);

}

mapOutput = null;

throw new IOException("Incomplete map output received for " +

mapOutputLoc.getTaskAttemptId() + " from " +

mapOutputLoc.getOutputLocation() + " (" +

bytesRead + " instead of " +

mapOutputLength + ")"

);

}

return mapOutput;

}
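For comparison, the same copy-then-verify shape can be written with try-with-resources, which closes both streams automatically in place of the explicit IOUtils.cleanup call. This is a sketch using java.nio.file instead of Hadoop's file system classes; CopyToDisk and copy are hypothetical names, and the progress reporting and metrics of the original are left out.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class CopyToDisk {
    /** Copies in to target and returns the byte count; removes target on any failure. */
    static long copy(InputStream in, Path target, long expectedLength) throws IOException {
        long bytesRead = 0;
        byte[] buf = new byte[64 * 1024]; // same buffer size as the original
        try (InputStream src = in; OutputStream out = Files.newOutputStream(target)) {
            int n;
            while ((n = src.read(buf, 0, buf.length)) > 0) {
                out.write(buf, 0, n);
                bytesRead += n;
            }
        } catch (IOException e) {
            Files.deleteIfExists(target); // discard the partial output
            throw e;
        }
        // Sanity check: the number of bytes written must match the announced length.
        if (bytesRead != expectedLength) {
            Files.deleteIfExists(target);
            throw new IOException("Incomplete map output received ("
                    + bytesRead + " instead of " + expectedLength + ")");
        }
        return bytesRead;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[100_000];
        Path target = Files.createTempFile("map_7", ".out");
        System.out.println(copy(new ByteArrayInputStream(payload), target, payload.length)); // 100000
    }
}
```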
