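The pytube project
Recently the foreign teacher at my daughter's kindergarten needed a complete set of the YouTube teaching songs "Singing Walrus Music". Once the request for help went out in the parents' group chat, the programmer dad naturally had to get it sorted out properly.
GitHub repository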
https://github.com/nficano/pytube
Documentation
https://python-pytube.readthedocs.io
Installation
pip install pytube
Quick start
from pytube import YouTube
YouTube('http://youtube.com/watch?v=9bZkp7q19f0').streams.first().download()
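- According to the author of pytube, the first() method picks the highest-resolution video for download, but in my own tests the result was not ideal.
- YouTube is built on DASH streams. DASH splits video and audio into independent streams: for example, the video may be offered as 480p and 720p, and the audio as 44100 Hz and 22050 Hz. The following code lists every available stream, including the DASH Representation entries: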
yt = YouTube('http://youtube.com/watch?v=9bZkp7q19f0')
yt.streams.all()
[<Stream: itag="22" mime_type="video/mp4" res="720p" fps="30fps" vcodec="avc1.64001F" acodec="mp4a.40.2">,
<Stream: itag="43" mime_type="video/webm" res="360p" fps="30fps" vcodec="vp8.0" acodec="vorbis">,
<Stream: itag="18" mime_type="video/mp4" res="360p" fps="30fps" vcodec="avc1.42001E" acodec="mp4a.40.2">,
<Stream: itag="36" mime_type="video/3gpp" res="240p" fps="30fps" vcodec="mp4v.20.3" acodec="mp4a.40.2">,
<Stream: itag="17" mime_type="video/3gpp" res="144p" fps="30fps" vcodec="mp4v.20.3" acodec="mp4a.40.2">,
<Stream: itag="137" mime_type="video/mp4" res="1080p" fps="30fps" vcodec="avc1.640028">,
<Stream: itag="248" mime_type="video/webm" res="1080p" fps="30fps" vcodec="vp9">,
<Stream: itag="136" mime_type="video/mp4" res="720p" fps="30fps" vcodec="avc1.4d401f">,
<Stream: itag="247" mime_type="video/webm" res="720p" fps="30fps" vcodec="vp9">,
<Stream: itag="135" mime_type="video/mp4" res="480p" fps="30fps" vcodec="avc1.4d401e">,
<Stream: itag="244" mime_type="video/webm" res="480p" fps="30fps" vcodec="vp9">,
<Stream: itag="134" mime_type="video/mp4" res="360p" fps="30fps" vcodec="avc1.4d401e">,
<Stream: itag="243" mime_type="video/webm" res="360p" fps="30fps" vcodec="vp9">,
<Stream: itag="133" mime_type="video/mp4" res="240p" fps="30fps" vcodec="avc1.4d4015">,
<Stream: itag="242" mime_type="video/webm" res="240p" fps="30fps" vcodec="vp9">,
<Stream: itag="160" mime_type="video/mp4" res="144p" fps="30fps" vcodec="avc1.4d400c">,
<Stream: itag="278" mime_type="video/webm" res="144p" fps="30fps" vcodec="vp9">,
<Stream: itag="140" mime_type="audio/mp4" abr="128kbps" acodec="mp4a.40.2">,
<Stream: itag="171" mime_type="audio/webm" abr="128kbps" acodec="vorbis">,
<Stream: itag="249" mime_type="audio/webm" abr="50kbps" acodec="opus">,
<Stream: itag="250" mime_type="audio/webm" abr="70kbps" acodec="opus">,
<Stream: itag="251" mime_type="audio/webm" abr="160kbps" acodec="opus">]
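- In this list, itag="22" is a 720p video file that also carries audio (acodec="mp4a.40.2"), whereas itag="136", also 720p, is a silent video file.
- Back to pytube's first() method: it prefers video sources with audio already mixed in, and only then considers the silent video sources. This leads to an extreme case in which first() crudely picks a low-resolution mixed source and ignores the high-definition sources.
- I rewrote the stream selection logic myself, as explained later.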
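The author's own rewrite is not shown at this point. Purely as an illustration, here is a minimal sketch of one possible selection rule: take the highest resolution, and on a tie prefer the stream that already contains audio. FakeStream, height and pick_best_video are hypothetical names made up for this example; the sketch only assumes that each stream exposes itag, resolution and is_progressive attributes, as pytube's Stream objects do.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for a pytube Stream, holding only what this sketch reads.
FakeStream = namedtuple("FakeStream", "itag resolution is_progressive")


def height(stream):
    """Return the numeric height of a stream ("720p" -> 720), or 0 if unknown."""
    match = re.match(r"(\d+)p", stream.resolution or "")
    return int(match.group(1)) if match else 0


def pick_best_video(streams):
    """Pick the highest resolution; on a tie prefer the stream with audio."""
    candidates = [s for s in streams if height(s) > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda s: (height(s), s.is_progressive))


# A few entries modelled on the listing above.
sample = [
    FakeStream(22, "720p", True),
    FakeStream(18, "360p", True),
    FakeStream(137, "1080p", False),
    FakeStream(136, "720p", False),
    FakeStream(140, None, False),  # audio-only stream, no resolution
]

print(pick_best_video(sample).itag)  # 137
```

With real pytube objects the call would look like pick_best_video(yt.streams). Note that a silent 1080p result still needs a separate audio stream.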
Filtering streams
- pytube offers several strategies for filtering streams
1. Traditional video sources with audio mixed in
- Set the parameter
progressive=True
yt.streams.filter(progressive=True).all()
[<Stream: itag="22" mime_type="video/mp4" res="720p" fps="30fps" vcodec="avc1.64001F" acodec="mp4a.40.2">,
<Stream: itag="43" mime_type="video/webm" res="360p" fps="30fps" vcodec="vp8.0" acodec="vorbis">,
<Stream: itag="18" mime_type="video/mp4" res="360p" fps="30fps" vcodec="avc1.42001E" acodec="mp4a.40.2">,
<Stream: itag="36" mime_type="video/3gpp" res="240p" fps="30fps" vcodec="mp4v.20.3" acodec="mp4a.40.2">,
<Stream: itag="17" mime_type="video/3gpp" res="144p" fps="30fps" vcodec="mp4v.20.3" acodec="mp4a.40.2">]
2. DASH stream sources
- Set the parameter
adaptive=True
yt.streams.filter(adaptive=True).all()
[<Stream: itag="137" mime_type="video/mp4" res="1080p" fps="30fps" vcodec="avc1.640028">,
<Stream: itag="248" mime_type="video/webm" res="1080p" fps="30fps" vcodec="vp9">,
<Stream: itag="136" mime_type="video/mp4" res="720p" fps="30fps" vcodec="avc1.4d401f">,
<Stream: itag="247" mime_type="video/webm" res="720p" fps="30fps" vcodec="vp9">,
<Stream: itag="135" mime_type="video/mp4" res="480p" fps="30fps" vcodec="avc1.4d401e">,
<Stream: itag="244" mime_type="video/webm" res="480p" fps="30fps" vcodec="vp9">,
<Stream: itag="134" mime_type="video/mp4" res="360p" fps="30fps" vcodec="avc1.4d401e">,
<Stream: itag="243" mime_type="video/webm" res="360p" fps="30fps" vcodec="vp9">,
<Stream: itag="133" mime_type="video/mp4" res="240p" fps="30fps" vcodec="avc1.4d4015">,
<Stream: itag="242" mime_type="video/webm" res="240p" fps="30fps" vcodec="vp9">,
<Stream: itag="160" mime_type="video/mp4" res="144p" fps="30fps" vcodec="avc1.4d400c">,
<Stream: itag="278" mime_type="video/webm" res="144p" fps="30fps" vcodec="vp9">,
<Stream: itag="140" mime_type="audio/mp4" abr="128kbps" acodec="mp4a.40.2">,
<Stream: itag="171" mime_type="audio/webm" abr="128kbps" acodec="vorbis">,
<Stream: itag="249" mime_type="audio/webm" abr="50kbps" acodec="opus">,
<Stream: itag="250" mime_type="audio/webm" abr="70kbps" acodec="opus">,
<Stream: itag="251" mime_type="audio/webm" abr="160kbps" acodec="opus">]
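The difference between the two listings can be read straight off the printed lines: a progressive stream carries both a vcodec and an acodec, while an adaptive stream carries only one of them. The sketch below parses a few of those printed lines as plain text and classifies them. It is not how pytube decides internally; parse_stream_line and kind are hypothetical helpers written for this example.

```python
import re

# Three of the lines printed above, copied as plain text.
LISTING = '''
<Stream: itag="22" mime_type="video/mp4" res="720p" fps="30fps" vcodec="avc1.64001F" acodec="mp4a.40.2">
<Stream: itag="136" mime_type="video/mp4" res="720p" fps="30fps" vcodec="avc1.4d401f">
<Stream: itag="140" mime_type="audio/mp4" abr="128kbps" acodec="mp4a.40.2">
'''


def parse_stream_line(line):
    """Turn one '<Stream: key="value" ...>' line into a dict of its fields."""
    return dict(re.findall(r'(\w+)="([^"]*)"', line))


def kind(fields):
    """Classify a stream by which codecs it carries."""
    has_video = "vcodec" in fields
    has_audio = "acodec" in fields
    if has_video and has_audio:
        return "progressive"
    return "adaptive video" if has_video else "adaptive audio"


streams = [parse_stream_line(line) for line in LISTING.splitlines() if line.strip()]
for fields in streams:
    print(fields["itag"], kind(fields))
```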
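3. Other filter conditions
- only_audio=True: audio streams only
- only_video=True: video streams only
- subtype='mp4': files with the "mp4" extension, both audio and video
- res="720p": video at 720p resolution
- abr="64kbps": streams whose audio bitrate is 64kbps
- video_codec="vp9": video encoded with vp9
- audio_codec="vorbis": audio encoded with vorbis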
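These filters narrow the list but do not rank what is left. As a small illustration, the sketch below picks the audio stream with the highest bitrate, optionally restricted to one subtype. AudioStream, kbps and best_audio are hypothetical names for this example; the only assumption about real streams is that they expose itag, mime_type and abr, as in the listings above.

```python
from collections import namedtuple

# Hypothetical stand-in for a pytube Stream, holding only what this sketch reads.
AudioStream = namedtuple("AudioStream", "itag mime_type abr")


def kbps(stream):
    """Convert an abr string such as "128kbps" to 128; 0 when it is missing."""
    digits = (stream.abr or "").replace("kbps", "")
    return int(digits) if digits.isdigit() else 0


def best_audio(streams, subtype=None):
    """Return the audio stream with the highest bitrate, or None if there is none."""
    candidates = [
        s for s in streams
        if s.mime_type.startswith("audio/")
        and (subtype is None or s.mime_type.endswith("/" + subtype))
    ]
    return max(candidates, key=kbps, default=None)


# The five audio entries from the listing above.
audio_sample = [
    AudioStream(140, "audio/mp4", "128kbps"),
    AudioStream(171, "audio/webm", "128kbps"),
    AudioStream(249, "audio/webm", "50kbps"),
    AudioStream(250, "audio/webm", "70kbps"),
    AudioStream(251, "audio/webm", "160kbps"),
]

print(best_audio(audio_sample).itag)         # 251
print(best_audio(audio_sample, "mp4").itag)  # 140
```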
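Downloading by itag
- YouTube gives each type of stream source its own id, called an itag
- The get_by_itag method returns the stream for a given itag, which can then be downloaded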
yt.streams.get_by_itag(22)
- The itag codes correspond to the following types
itag Code | Container | Content | Resolution | Bitrate | Range | VR / 3D |
---|---|---|---|---|---|---|
5 | flv | audio/video | 240p | - | - | - |
6 | flv | audio/video | 270p | - | - | - |
17 | 3gp | audio/video | 144p | - | - | - |
18 | mp4 | audio/video | 360p | - | - | - |
22 | mp4 | audio/video | 720p | - | - | - |
34 | flv | audio/video | 360p | - | - | - |
35 | flv | audio/video | 480p | - | - | - |
36 | 3gp | audio/video | 180p | - | - | - |
37 | mp4 | audio/video | 1080p | - | - | - |
38 | mp4 | audio/video | 3072p | - | - | - |
43 | webm | audio/video | 360p | - | - | - |
44 | webm | audio/video | 480p | - | - | - |
45 | webm | audio/video | 720p | - | - | - |
46 | webm | audio/video | 1080p | - | - | - |
82 | mp4 | audio/video | 360p | - | - | 3D |
83 | mp4 | audio/video | 480p | - | - | 3D |
84 | mp4 | audio/video | 720p | - | - | 3D |
85 | mp4 | audio/video | 1080p | - | - | 3D |
92 | hls | audio/video | 240p | - | - | 3D |
93 | hls | audio/video | 360p | - | - | 3D |
94 | hls | audio/video | 480p | - | - | 3D |
95 | hls | audio/video | 720p | - | - | 3D |
96 | hls | audio/video | 1080p | - | - | - |
100 | webm | audio/video | 360p | - | - | 3D |
101 | webm | audio/video | 480p | - | - | 3D |
102 | webm | audio/video | 720p | - | - | 3D |
132 | hls | audio/video | 240p | - | - | |
133 | mp4 | video | 240p | - | - | |
134 | mp4 | video | 360p | - | - | |
135 | mp4 | video | 480p | - | - | |
136 | mp4 | video | 720p | - | - | |
137 | mp4 | video | 1080p | - | - | |
138 | mp4 | video | 2160p60 | - | - | |
139 | m4a | audio | - | 48k | - | |
140 | m4a | audio | - | 128k | - | |
141 | m4a | audio | - | 256k | - | |
151 | hls | audio/video | 72p | - | - | |
160 | mp4 | video | 144p | - | - | |
167 | webm | video | 360p | - | - | |
168 | webm | video | 480p | - | - | |
169 | webm | video | 1080p | - | - | |
171 | webm | audio | - | 128k | - | |
218 | webm | video | 480p | - | - | |
219 | webm | video | 144p | - | - | |
242 | webm | video | 240p | - | - | |
243 | webm | video | 360p | - | - | |
244 | webm | video | 480p | - | - | |
245 | webm | video | 480p | - | - | |
246 | webm | video | 480p | - | - | |
247 | webm | video | 720p | - | - | |
248 | webm | video | 1080p | - | - | |
249 | webm | audio | - | 50k | - | |
250 | webm | audio | - | 70k | - | |
251 | webm | audio | - | 160k | - | |
264 | mp4 | video | 1440p | - | - | |
266 | mp4 | video | 2160p60 | - | - | |
271 | webm | video | 1440p | - | - | |
272 | webm | video | 4320p | - | - | |
278 | webm | video | 144p | - | - | |
298 | mp4 | video | 720p60 | - | - | |
299 | mp4 | video | 1080p60 | - | - | |
302 | webm | video | 720p60 | - | - | |
303 | webm | video | 1080p60 | - | - | |
308 | webm | video | 1440p60 | - | - | |
313 | webm | video | 2160p | - | - | |
315 | webm | video | 2160p60 | - | - | |
330 | webm | video | 144p60 | - | hdr | |
331 | webm | video | 240p60 | - | hdr | |
332 | webm | video | 360p60 | - | hdr | |
333 | webm | video | 480p60 | - | hdr | |
334 | webm | video | 720p60 | - | hdr | |
335 | webm | video | 1080p60 | - | hdr | |
336 | webm | video | 1440p60 | - | hdr | |
337 | webm | video | 2160p60 | - | hdr | |
394 | mp4 | video | 144p | - | - | |
395 | mp4 | video | 240p | - | - | |
396 | mp4 | video | 360p | - | - | |
397 | mp4 | video | 480p | - | - | |
398 | mp4 | video | 720p | - | - | |
399 | mp4 | video | 1080p | - | - | |
400 | mp4 | video | 1440p | - | - | |
401 | mp4 | video | 2160p | - | - | |
402 | mp4 | video | 2880p | - | - | |
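A table like this is handy as a lookup in code, for example to tell whether a given itag will need a separate audio download. The sketch below copies only a handful of rows from the table; ITAG_TABLE, describe and needs_separate_audio are hypothetical names for this example and are not part of pytube.

```python
# A few rows copied from the table above: itag -> (container, content, resolution).
ITAG_TABLE = {
    18: ("mp4", "audio/video", "360p"),
    22: ("mp4", "audio/video", "720p"),
    136: ("mp4", "video", "720p"),
    137: ("mp4", "video", "1080p"),
    140: ("m4a", "audio", None),
    251: ("webm", "audio", None),
}


def describe(itag):
    """Return a one-line description of a known itag."""
    container, content, resolution = ITAG_TABLE[itag]
    return f"itag {itag}: {container}, {content}, {resolution or 'no video'}"


def needs_separate_audio(itag):
    """True when the itag is a video-only stream."""
    return ITAG_TABLE[itag][1] == "video"


print(describe(22))
print(needs_separate_audio(137))  # True
```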
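About the network
- Because we have to be shielded from the poison of Western capitalist thinking, requests to YouTube are often unstable
- Two errors are common: HTTPError and URLError
- PyCharm users may also run into ConnectionResetError
- The first two need to be imported:
from urllib.error import HTTPError, URLError
- Then wrap the request in a while loop with try…except… so that it is retried until it succeeds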
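# Excerpt from a method of the author's own class: self.logger and url come from that context.
yt = None
while True:
    try:
        yt = YouTube(url)
        break
    except HTTPError:
        # HTTPError is a subclass of URLError, so it has to be caught first.
        self.logger.error("Request failed once: HTTPError")
        continue
    except URLError:
        self.logger.error("Request failed once: URLError")
        continue
streams = yt.streams.filter(subtype='mp4').all()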
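A while True loop retries forever if the network never recovers. As a variation, here is a minimal sketch of a bounded retry helper that also covers ConnectionResetError and waits a little longer after each failure. The retry function and its attempts, delay and sleep parameters are assumptions made for this example, not part of pytube.

```python
import time
from urllib.error import HTTPError, URLError


def retry(action, attempts=5, delay=1.0, sleep=time.sleep):
    """Call action() until it succeeds or the attempts run out.

    The last error is re-raised if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except (HTTPError, URLError, ConnectionResetError):
            if attempt == attempts:
                raise
            sleep(delay * attempt)  # wait a little longer after each failure


# Demonstration with a function that fails twice before succeeding.
calls = []


def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise URLError("temporary failure")
    return "ok"


print(retry(flaky, delay=0))  # ok
```

With pytube the call would be yt = retry(lambda: YouTube(url)).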
Downloading the video
- Once a stream that meets the conditions has been chosen, it can be downloaded directly with
download
from pytube import YouTube
yt=YouTube('http://youtube.com/watch?v=9bZkp7q19f0')
mp4=yt.streams.first()
mp4.download(output_path, filename, filename_prefix)
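- download accepts three parameters:
- output_path: the directory the video is written to
- filename: the output file name; it defaults to the video title and does not need an extension
- filename_prefix: a prefix for the file name. Its main use is to tell audio and video apart: both downloads end up with the same name and the same format, so the earlier file is overwritten by the later one. Adding a prefix keeps them separate, for example "audio_FilmTitle.mp4" for the audio and "video_FilmTitle.mp4" for the video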
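Once the audio and video have been saved under different prefixes, they still have to be combined into one file. The post does not say how, so the following is only a sketch under the assumption that ffmpeg is installed and on the PATH. prefixed_path and ffmpeg_merge_command are hypothetical helpers that only build the file names and the command line.

```python
import os


def prefixed_path(output_path, prefix, title, extension="mp4"):
    """Build a path such as <output_path>/audio_FilmTitle.mp4."""
    return os.path.join(output_path, f"{prefix}{title}.{extension}")


def ffmpeg_merge_command(video_file, audio_file, merged_file):
    """Build an ffmpeg command that combines one video and one audio file."""
    # "-c copy" copies both tracks as they are, without re-encoding.
    return ["ffmpeg", "-i", video_file, "-i", audio_file, "-c", "copy", merged_file]


video = prefixed_path("downloads", "video_", "FilmTitle")
audio = prefixed_path("downloads", "audio_", "FilmTitle")
merged = prefixed_path("downloads", "", "FilmTitle")
print(" ".join(ffmpeg_merge_command(video, audio, merged)))
```

The list returned by ffmpeg_merge_command can be passed to subprocess.run(command, check=True) to perform the merge.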