Java: downloading a web page from a URL to a server or local disk

Original

2020-02-28 10:13

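Either a file stream or a character stream works here. When I used a file stream, however, some of the JavaScript code ended up being treated as a plain string by the saved page.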

Source code:

import java.io.*;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// The class name is arbitrary; the original post shows only the two methods.
public class PageDownloader {

    public static void getPutFile(String urlPath, String downloadDir) throws Exception {
        try {
            // URL of the page to fetch
            URL url = new URL(urlPath);
            // Open a connection to the page
            URLConnection conn = url.openConnection();

            // Local file that will receive the page
            File file = new File(downloadDir);
            if (!file.exists()) {
                System.out.println("Target file does not exist!");
                try {
                    // Create the target file automatically if it is missing
                    file.createNewFile();
                    System.out.println("File created automatically!");
                } catch (IOException e) {
                    System.out.println("Failed to create the file automatically!");
                }
            }

            // Pages are usually read line by line for analysis, so the byte stream
            // is wrapped in a buffered character stream. The encoding has to be
            // handled during this conversion.
            // The output stream is opened once (append mode) instead of once per line.
            try (InputStream is = conn.getInputStream();
                 BufferedReader br = new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8));
                 OutputStream out = new FileOutputStream(file, true)) {

                System.out.println(conn.getContentEncoding());

                String line;
                // Read line by line, print, and write to the file
                while ((line = br.readLine()) != null) {
                    System.out.println(line);
                    // File stream variant tried earlier:
                    // FileOutputStream fileOutputStream = new FileOutputStream(file, true);
                    // fileOutputStream.write(line.getBytes());
                    // fileOutputStream.close();

                    // Byte stream: write with the same encoding used for reading
                    out.write(line.getBytes(StandardCharsets.UTF_8));
                    out.write('\r'); // \r\n is a line break
                    out.write('\n');
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        try {
            getPutFile("http://news.baidu.com/", "C:\\Users\\hunuo\\Desktop\\index123.html");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
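
The introduction notes that a character stream can be used instead of a byte stream for the write side. As a rough sketch of that variant (not the author's code), the helper below copies text line by line through a `BufferedWriter` with an explicit UTF-8 charset. The class name `CharStreamCopy` and the method `copyLines` are hypothetical names introduced only for this illustration, and the demo reads from an in-memory string rather than a live URL so it can run without a network connection.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, not part of the original post.
public class CharStreamCopy {

    // Copies text line by line from `source` to `target` as UTF-8,
    // ending each line with \r\n as in the code above.
    // Returns the number of lines written.
    public static int copyLines(Reader source, Path target) throws IOException {
        int count = 0;
        try (BufferedReader br = new BufferedReader(source);
             BufferedWriter bw = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                bw.write(line);
                bw.write("\r\n");
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // A temp file stands in for the download target.
        Path target = Files.createTempFile("page", ".html");
        Reader page = new StringReader("<html>\n<body>hello</body>\n</html>");
        int lines = copyLines(page, target);
        System.out.println(lines + " lines written to " + target);
    }
}
```

To use it with a real page, pass `new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8)` as the source. Unlike the append-mode `FileOutputStream` above, `Files.newBufferedWriter` with default options overwrites an existing file, so repeated runs do not accumulate duplicate copies of the page.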
