HDFS API Operations

> HDFS dynamic capacity expansion:

Size of each block stored in HDFS: 128 MB by default.

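Parameter: dfs.blocksize, default 128 MB, the size of each block. This is client-side behavior: the client that writes a file cuts it into blocks of the size configured on that client, so the block size of stored data is decided by the client, not by the DataNodes.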

dfs.replication: the number of replicas, default 3. This is also decided by the client.
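To make the two client-side parameters concrete, here is a small plain-Java model (not Hadoop code; the class and method names are hypothetical) of how a 300 MB file is cut into 128 MB blocks and how much raw disk the replicas take. It assumes the defaults above (dfs.blocksize = 128 MB, dfs.replication = 3) and shows that a partly filled last block only occupies its real length.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model, not Hadoop code: how a client cuts a file into blocks of
// dfs.blocksize, and how much raw disk space the replicas take.
class BlockSplitSketch {
    static final long MB = 1024L * 1024L;

    // Length of each block; every block is blockSize long except possibly the last one
    static List<Long> splitIntoBlocks(long fileLength, long blockSize) {
        List<Long> blocks = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += blockSize) {
            blocks.add(Math.min(blockSize, fileLength - offset));
        }
        return blocks;
    }

    // Each block is stored `replication` times; a short last block only uses its real length
    static long rawBytesStored(List<Long> blocks, int replication) {
        long total = 0;
        for (long length : blocks) {
            total += length * replication;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Long> blocks = splitIntoBlocks(300 * MB, 128 * MB); // dfs.blocksize = 128 MB
        for (long length : blocks) {
            System.out.println(length / MB + " MB");             // 128 MB, 128 MB, 44 MB
        }
        // dfs.replication = 3 -> 900 MB of raw disk in total
        System.out.println(rawBytesStored(blocks, 3) / MB + " MB raw");
    }
}
```

So a 300 MB file becomes three blocks, and the third block uses 44 MB per replica, not 128 MB.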

Metadata storage directory:

dfs.namenode.name.dir

file://${hadoop.tmp.dir}/dfs/name

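By default, the metadata is stored in the dfs/name subdirectory of the directory set by hadoop.tmp.dir (which itself defaults to /tmp/hadoop-${user.name}, a location the operating system may clean out).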

The better practice is to configure several directories, so that losing one disk does not lose the metadata:

<property>
<name>dfs.namenode.name.dir</name>
<value>/mnt/disk1,/mnt/disk2,nfs://xxx/xxxx</value>
</property>

The DataNode's block-file directories should likewise be placed on disks dedicated to data storage.

dfs.datanode.data.dir also defaults to file://${hadoop.tmp.dir}/dfs/data.

<property>
<name>dfs.datanode.data.dir</name>
<value>/mnt/disk1,/mnt/disk2,nfs://xxx/xxxx</value>
</property>

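Although the two settings look alike, they mean different things. For the NameNode, every piece of metadata is written to every listed directory, so each directory holds a full copy. For the DataNode, all the listed directories are treated as one pooled storage: one block is written to one directory, the next block to the next directory, and so on in turn.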
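The difference between the two directory lists can be sketched in plain Java (not Hadoop code; the class and method names are hypothetical). The NameNode side mirrors every record into every directory; the DataNode side spreads blocks round-robin, which matches Hadoop's default DataNode volume-choosing policy. The sketch ignores free space, failed disks and other real-world checks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model, not Hadoop code: how the NameNode and the DataNode
// use a comma-separated list of storage directories.
class StorageDirSketch {
    // dfs.namenode.name.dir: every metadata record is written to EVERY directory,
    // so each directory ends up with a complete copy
    static Map<String, List<String>> writeMetadata(List<String> nameDirs, List<String> records) {
        Map<String, List<String>> contents = new LinkedHashMap<>();
        for (String dir : nameDirs) {
            contents.put(dir, new ArrayList<>(records));
        }
        return contents;
    }

    // dfs.datanode.data.dir: the directories form one pool; each new block
    // goes to the next directory in turn (round-robin)
    static Map<String, List<String>> writeBlocks(List<String> dataDirs, List<String> blockIds) {
        Map<String, List<String>> contents = new LinkedHashMap<>();
        for (String dir : dataDirs) {
            contents.put(dir, new ArrayList<>());
        }
        for (int i = 0; i < blockIds.size(); i++) {
            String dir = dataDirs.get(i % dataDirs.size());
            contents.get(dir).add(blockIds.get(i));
        }
        return contents;
    }

    public static void main(String[] args) {
        List<String> dirs = Arrays.asList("/mnt/disk1", "/mnt/disk2");
        System.out.println(writeMetadata(dirs, Arrays.asList("mkdir /a", "create /a/x")));
        // {/mnt/disk1=[mkdir /a, create /a/x], /mnt/disk2=[mkdir /a, create /a/x]}
        System.out.println(writeBlocks(dirs, Arrays.asList("blk_1", "blk_2", "blk_3")));
        // {/mnt/disk1=[blk_1, blk_3], /mnt/disk2=[blk_2]}
    }
}
```

Mirroring trades disk space for safety (any one surviving directory is enough), while round-robin adds capacity and spreads I/O across disks.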

The first API program:

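import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsUpload {
    public static void main(String[] args) throws Exception {
        // 1. Get the entry point: a FileSystem client
        Configuration conf = new Configuration();
        // Parameters set here are client-side: files written by this client get 2 replicas
        conf.set("dfs.replication", "2");

        // Connect to the NameNode, acting as user "root"
        FileSystem dfsClient = FileSystem.get(new URI("hdfs://172.16.214.128:9000"), conf, "root");

        // 2. Upload a local file to HDFS
        Path fromLocalPath = new Path("/Users/xuxliu/Downloads/hadoopConfigcopy.docx");
        Path targetHdfsPath = new Path("/test/test.doc");
        dfsClient.copyFromLocalFile(fromLocalPath, targetHdfsPath);
        dfsClient.close();
    }
}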

Problem encountered: Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=xuxliu, access=WRITE, inode="/":root:supergroup:drwxr-xr-x

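This is a permission problem. Permission checking is enabled by default, and without an explicit user name the client identifies itself as the local OS user (here xuxliu), who may not write to "/" (owned by root:supergroup, mode drwxr-xr-x). We did not hit this on the command line because the commands there ran as root. One fix is to pass the user name explicitly, as the code above does with FileSystem.get(uri, conf, "root"); the other, acceptable only on a test cluster, is to turn permission checking off in hdfs-site.xml: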

<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>

The NameNode must be restarted for this change to take effect.
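Why exactly is WRITE denied? The inode in the error has owner root, group supergroup and mode drwxr-xr-x, and HDFS checks permissions in a POSIX-like way: the first rwx triplet applies to the owner, the second to group members, the third to everyone else. The plain-Java sketch below (not Hadoop's real checker; it ignores the HDFS superuser, ACLs and sticky bits, and the method name is hypothetical) assumes xuxliu is not in supergroup, so the "other" triplet r-x applies, and it contains no w.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Plain-Java model, not Hadoop's permission checker: pick the rwx triplet
// that applies to the user, then look for the requested action in it.
class PermissionSketch {
    // mode: the 9 permission characters without the leading 'd', e.g. "rwxr-xr-x"
    // action: 'r', 'w' or 'x'
    static boolean isAllowed(String user, Set<String> userGroups,
                             String owner, String group, String mode, char action) {
        String bits;
        if (user.equals(owner)) {
            bits = mode.substring(0, 3);   // owner triplet
        } else if (userGroups.contains(group)) {
            bits = mode.substring(3, 6);   // group triplet
        } else {
            bits = mode.substring(6, 9);   // "other" triplet
        }
        int position = "rwx".indexOf(action);
        return position >= 0 && bits.charAt(position) == action;
    }

    public static void main(String[] args) {
        // Hypothetical group membership for xuxliu: not in supergroup
        Set<String> groups = new HashSet<>(Arrays.asList("staff"));
        // inode="/":root:supergroup:drwxr-xr-x
        System.out.println(isAllowed("xuxliu", groups, "root", "supergroup", "rwxr-xr-x", 'w')); // false
        System.out.println(isAllowed("root", groups, "root", "supergroup", "rwxr-xr-x", 'w'));   // true
    }
}
```

Acting as root makes the owner triplet rwx apply, which is why passing "root" to FileSystem.get avoids the error without disabling permissions.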

API test code (JUnit 4):

import java.io.FileOutputStream;
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

public class HdfsClientTest {

    private FileSystem dfsClient;

    @Before
    public void init() throws Exception {
        Configuration conf = new Configuration();
        conf.set("dfs.replication", "2");
        dfsClient = FileSystem.get(new URI("hdfs://172.16.214.128:9000"), conf, "root");
    }

    // Close the client once after every test instead of inside each test
    @After
    public void close() throws IOException {
        dfsClient.close();
    }

    /**
     * Download a file from HDFS to the local file system
     */
    @Test
    public void testGetFile() throws IOException {
        Path targetLocalPath = new Path("/Users/xuxliu/Downloads/");
        Path srcHdfsPath = new Path("/test/test.doc");
        dfsClient.copyToLocalFile(srcHdfsPath, targetLocalPath);
    }

    @Test
    public void testMkdir() throws IOException {
        // Also creates any missing parent directories, like mkdir -p
        boolean created = dfsClient.mkdirs(new Path("/xxx/yyyy"));
        System.out.println("mkdirs: " + created);
    }

    @Test
    public void testDelete() throws IOException {
        // delete(Path f, boolean recursive): recursive must be true to delete a non-empty directory
        boolean deleted = dfsClient.delete(new Path("/xxx"), true);
        System.out.println("deleted: " + deleted);
    }

    @Test
    public void testRenameDir() throws IOException {
        dfsClient.rename(new Path("/xxx"), new Path("/xxxuuu"));
    }

    @Test
    public void testListDir() throws IOException {
        // listFiles(Path f, boolean recursive): with recursive = true, returns every
        // file in the subtree (files only, no directories)
        RemoteIterator<LocatedFileStatus> iterator = dfsClient.listFiles(new Path("/"), true);
        while (iterator.hasNext()) {
            LocatedFileStatus next = iterator.next();
            System.out.println(next.getPath());
            // getBlockLocations() returns an array: print its elements, not the array reference
            System.out.println(Arrays.toString(next.getBlockLocations()));
            System.out.println(next.getBlockSize());
            System.out.println(next.getOwner());
        }
    }

    @Test
    public void testDirInfo() throws IOException {
        // listStatus lists the files and directories directly under the path; it does not recurse
        FileStatus[] fileStatuses = dfsClient.listStatus(new Path("/"));
        for (FileStatus fs : fileStatuses) {
            System.out.println(fs);
            // Example output:
            // FileStatus{path=hdfs://172.16.214.128:9000/a.txt; isDirectory=false;
            // length=15; replication=2; blocksize=134217728; modification_time=1550903769312;
            // access_time=1550903768982; owner=root; group=supergroup; permission=rw-r--r--;
            // isSymlink=false}
        }
    }

    @Test
    public void testGetPartOfFile() throws IOException {
        try (FSDataInputStream inputStream = dfsClient.open(new Path("/qingshu.txt"));
             FileOutputStream outputStream = new FileOutputStream("/Users/xuxliu/Downloads/qingshu.txt")) {

            // Start reading at byte offset 20
            inputStream.seek(20);

            byte[] buffer = new byte[10];
            int len;
            int reads = 0;
            // read() returns the number of bytes actually read, or -1 at end of file
            while ((len = inputStream.read(buffer)) != -1) {
                // Write only the bytes read this time; a read may fill less than the whole buffer
                outputStream.write(buffer, 0, len);
                // Stop after 20 reads (at most 200 bytes)
                if (++reads == 20) {
                    break;
                }
            }
        }
    }

    // Write data to HDFS through an output stream
    @Test
    public void testWriteDataToHdfs() throws IOException {
        // create(Path) opens a new HDFS file for writing, overwriting any existing file
        try (FSDataOutputStream hdfsOut = dfsClient.create(new Path("/qingshu2.txt"))) {
            hdfsOut.write("77777777777777777".getBytes(StandardCharsets.UTF_8));
            hdfsOut.write("88888888888888888".getBytes(StandardCharsets.UTF_8));
            hdfsOut.flush();
        }
    }
}

HDFS read/write mechanism:

Write:

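The client sends a write request (with the target path) to the NameNode, and the NameNode replies whether the path can be written (for example, whether the parent directory exists and the file does not already exist).
The client then asks the NameNode where to write the first block.
The NameNode returns a block ID and the DataNodes that should hold its replicas.
The client connects to only the first DataNode and requests a block transfer. That DataNode asks the next DataNode whether it is ready to receive data, and so on down the chain, forming a pipeline. Once the "ready" replies come back, the client is told the pipeline is ready.
The client reads the local file and streams the data over the network to the first DataNode; each DataNode stores what it receives and also forwards it to the next DataNode in the pipeline.
When the first block is done, the client writes the file's second block by repeating the steps above.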
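The pipeline step can be sketched as a single-process Java model (not Hadoop code; all names are hypothetical). The client only ever talks to the first DataNode; each node stores a packet and forwards it downstream. In real HDFS the packets flow asynchronously and acknowledgments travel back up the pipeline in reverse order; here the boolean returned by receive() stands in for that ack, and the tiny packet size is purely illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java, single-process model of the write pipeline (not Hadoop code).
class WritePipelineSketch {
    static final class DataNode {
        final String name;
        final DataNode next;                            // downstream node, null at the end
        final List<String> stored = new ArrayList<>();  // packets written to this node's disk

        DataNode(String name, DataNode next) {
            this.name = name;
            this.next = next;
        }

        // Store the packet, forward it downstream, and return the ack:
        // true only if this node and every node after it stored the packet
        boolean receive(String packet) {
            stored.add(packet);
            return next == null || next.receive(packet);
        }
    }

    // Chain the DataNodes the NameNode chose for a block: first -> second -> third
    static DataNode buildPipeline(List<String> names) {
        DataNode head = null;
        for (int i = names.size() - 1; i >= 0; i--) {
            head = new DataNode(names.get(i), head);
        }
        return head;
    }

    // Client: cut the block into packets and send each one to the FIRST DataNode only
    static int sendBlock(DataNode head, String blockData, int packetSize) {
        int acked = 0;
        for (int start = 0; start < blockData.length(); start += packetSize) {
            String packet = blockData.substring(start, Math.min(start + packetSize, blockData.length()));
            if (head.receive(packet)) {
                acked++;
            }
        }
        return acked;
    }

    public static void main(String[] args) {
        DataNode head = buildPipeline(Arrays.asList("dn1", "dn2", "dn3"));
        System.out.println("acked packets: " + sendBlock(head, "hello world", 4)); // 3
        for (DataNode node = head; node != null; node = node.next) {
            System.out.println(node.name + " " + node.stored); // [hell, o wo, rld] on every node
        }
    }
}
```

The client's upload bandwidth is spent once per block, not once per replica, which is the point of chaining the DataNodes.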

Read (download):

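The client sends a read request to the NameNode.
The NameNode uses the request parameters (the path) to check whether the file exists; if it does, it returns the file's metadata, i.e. its blocks and the DataNodes that hold each block.
The client uses this metadata to fetch the data directly from the DataNodes.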
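The read steps above can be modeled in plain Java (not Hadoop code; names are hypothetical): the NameNode only returns block metadata, and the client pulls each block, in order, from a DataNode that holds a replica. Real HDFS orders the replicas by network distance from the client; this sketch simply tries them in the listed order and falls back to the next one if a DataNode is missing.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model, not Hadoop code: the NameNode hands out metadata only,
// and the client reads the actual bytes from DataNodes.
class ReadPathSketch {
    // One block of a file: its ID and the DataNodes that hold a replica
    static final class BlockInfo {
        final String blockId;
        final List<String> hosts;

        BlockInfo(String blockId, List<String> hosts) {
            this.blockId = blockId;
            this.hosts = hosts;
        }
    }

    // NameNode: path -> ordered list of blocks (metadata only)
    final Map<String, List<BlockInfo>> namespace = new HashMap<>();
    // DataNodes: host -> (block ID -> block contents)
    final Map<String, Map<String, String>> dataNodes = new HashMap<>();

    // NameNode side: check the file exists and return its block metadata
    List<BlockInfo> getBlockLocations(String path) {
        List<BlockInfo> blocks = namespace.get(path);
        if (blocks == null) {
            throw new IllegalArgumentException("File does not exist: " + path);
        }
        return blocks;
    }

    // Client side: fetch the blocks in order, each from the first replica that has it
    String readFile(String path) {
        StringBuilder content = new StringBuilder();
        for (BlockInfo block : getBlockLocations(path)) {
            String data = null;
            for (String host : block.hosts) {
                Map<String, String> disk = dataNodes.get(host);
                if (disk != null && disk.containsKey(block.blockId)) {
                    data = disk.get(block.blockId);
                    break; // got this block; no need to try other replicas
                }
            }
            if (data == null) {
                throw new IllegalStateException("No reachable replica of " + block.blockId);
            }
            content.append(data);
        }
        return content.toString();
    }

    public static void main(String[] args) {
        ReadPathSketch cluster = new ReadPathSketch();
        cluster.namespace.put("/a.txt", Arrays.asList(
                new BlockInfo("blk_1", Arrays.asList("dn1", "dn2")),
                new BlockInfo("blk_2", Arrays.asList("dn2", "dn3"))));
        // dn1 is down (absent), so blk_1 is read from its replica on dn2
        cluster.dataNodes.put("dn2", new HashMap<>());
        cluster.dataNodes.get("dn2").put("blk_1", "hello ");
        cluster.dataNodes.put("dn3", new HashMap<>());
        cluster.dataNodes.get("dn3").put("blk_2", "world");
        System.out.println(cluster.readFile("/a.txt")); // hello world
    }
}
```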

How the NameNode manages metadata:

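What metadata is: the descriptive information about files in HDFS, such as the path, block (blk) information, block locations, length and replica count. The NameNode keeps this data in memory, so if the NameNode goes down, the in-memory copy is lost, which is dangerous.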

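Every change to the metadata is caused by a client operation. The NameNode updates its in-memory metadata as clients operate, and it also records each operation on disk (the edit log) so that a crash does not lose it. Periodically, the accumulated operations are merged into the on-disk metadata image (the fsimage). This merge is done by the SecondaryNameNode: when a trigger point is reached, the operations recorded during that period are written into the fsimage, and the NameNode starts recording new operations afresh. If the NameNode goes down, the metadata can be restored by loading the fsimage and replaying the recorded operations.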

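The SecondaryNameNode performs this checkpoint operation. By default it is triggered every hour (dfs.namenode.checkpoint.period = 3600 seconds) or once 1,000,000 operations have accumulated since the last checkpoint (dfs.namenode.checkpoint.txns), whichever comes first.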
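A minimal single-process model of the edit log + fsimage scheme (not Hadoop code; the class and method names are hypothetical): every operation is logged before it is applied in memory, a checkpoint folds the log into the image and starts a fresh log, and recovery loads the image and replays the log. In real HDFS the SecondaryNameNode downloads the fsimage and edit log from the NameNode, merges them and uploads the new fsimage; the sketch collapses those steps into one method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model, not Hadoop code: edit log + fsimage + checkpoint + recovery.
class CheckpointSketch {
    // One logged client operation: "create" (with a length) or "delete"
    static final class Op {
        final String type;
        final String path;
        final long length;

        Op(String type, String path, long length) {
            this.type = type;
            this.path = path;
            this.length = length;
        }
    }

    // In NameNode memory: path -> file length (lost if the process dies)
    Map<String, Long> memory = new TreeMap<>();
    // On disk: the last checkpointed image, and the operations logged since then
    Map<String, Long> fsimage = new TreeMap<>();
    List<Op> editLog = new ArrayList<>();

    void create(String path, long length) {
        log(new Op("create", path, length));
    }

    void delete(String path) {
        log(new Op("delete", path, 0));
    }

    // An operation is appended to the edit log first, then applied in memory
    private void log(Op op) {
        editLog.add(op);
        apply(memory, op);
    }

    static void apply(Map<String, Long> namespace, Op op) {
        if (op.type.equals("create")) {
            namespace.put(op.path, op.length);
        } else {
            namespace.remove(op.path);
        }
    }

    // Checkpoint: merge the logged operations into a new fsimage, then start an empty log
    void checkpoint() {
        Map<String, Long> merged = new TreeMap<>(fsimage);
        for (Op op : editLog) {
            apply(merged, op);
        }
        fsimage = merged;
        editLog = new ArrayList<>();
    }

    // After a crash: load the fsimage and replay the operations logged since the checkpoint
    Map<String, Long> recover() {
        Map<String, Long> namespace = new TreeMap<>(fsimage);
        for (Op op : editLog) {
            apply(namespace, op);
        }
        return namespace;
    }

    public static void main(String[] args) {
        CheckpointSketch nn = new CheckpointSketch();
        nn.create("/a.txt", 15);
        nn.create("/b.txt", 20);
        nn.checkpoint();                  // fsimage now holds both files, edit log is empty
        nn.delete("/a.txt");
        nn.create("/c.txt", 5);           // two operations in the edit log, not yet in fsimage
        System.out.println(nn.fsimage);   // {/a.txt=15, /b.txt=20}
        System.out.println(nn.recover()); // {/b.txt=20, /c.txt=5}
    }
}
```

The longer the gap between checkpoints, the more edits must be replayed on restart, which is why the trigger is both time-based and count-based.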
