

Computer Vision, abbreviated CV


Wikipedia (as of June 2019) defines it as:

Computer vision is an interdisciplinary field that deals with how computers can be made to gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to automate tasks that the human visual system can do. "Computer vision is concerned with the automatic extraction, analysis and understanding of useful information from a single image or a sequence of images. It involves the development of a theoretical and algorithmic basis to achieve automatic visual understanding." As a scientific discipline, computer vision is concerned with the theory behind artificial systems that extract information from images. The image data can take many forms, such as video sequences, views from multiple cameras, or multi-dimensional data from a medical scanner. As a technological discipline, computer vision seeks to apply its theories and models for the construction of computer vision systems.

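In short, computer vision is an interdisciplinary field concerned with how computers can gain high-level understanding from digital images or video. As engineering, it aims to automate tasks the human visual system can do; as a science, it studies the theory behind artificial systems that extract information from images; as a technology, it applies those theories and models to build computer vision systems.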

Computer Graphics, abbreviated CG

Wikipedia (as of June 2019) defines it as:

Computer graphics is a sub-field of Computer Science which studies methods for digitally synthesizing and manipulating visual content. Although the term often refers to the study of three-dimensional computer graphics, it also encompasses two-dimensional graphics and image processing.


Image Processing, abbreviated IP

Wikipedia and HRS Academy define it as:

In imaging science, image processing is processing of images using mathematical operations by using any form of signal processing for which the input is an image, a series of images, or a video, such as a photograph or video frame; the output of image processing may be either an image or a set of characteristics or parameters related to the image. Most image-processing techniques involve treating the image as a two-dimensional signal and applying standard signal-processing techniques to it. Images are also processed as three-dimensional signals where the third-dimension being time or the z-axis.


Image processing usually refers to digital image processing, but optical and analog image processing also are possible. Image processing is a method to convert an image into digital form and perform some operations on it, in order to get an enhanced image or to extract some useful information from it. It is a type of signal dispensation in which input is image, like video frame or photograph and output may be image or characteristics associated with that image. Usually Image Processing system includes treating images as two dimensional signals while applying already set signal processing methods to them. The acquisition of images (producing the input image in the first place) is referred to as imaging.


Wikipedia's definition of digital image processing (as of June 2019) is:

In computer science, digital image processing is the use of computer algorithms to perform image processing on digital images. As a subcategory or field of digital signal processing, digital image processing has many advantages over analog image processing. It allows a much wider range of algorithms to be applied to the input data and can avoid problems such as the build-up of noise and signal distortion during processing. Since images are defined over two dimensions (perhaps more) digital image processing may be modeled in the form of multidimensional systems.



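2.1 A concise summary

        Computer Graphics and Computer Vision are two directions of the same process. Computer Graphics turns abstract semantic information into images; Computer Vision extracts abstract semantic information from images. Image Processing studies how one image, or a set of images, is transformed into another and how images relate to one another, independent of semantic information.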

2.2 From the input/output perspective


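  1. Computer Graphics (CG). The input is a description of a virtual scene, usually an array of polygons. Each polygon is typically a triangle with three vertices, and each vertex carries 3D coordinates, texture coordinates, an RGB color, and so on. The output is an image, that is, a 2D array of pixels.
  2. Computer Vision (CV). The input is an image or an image sequence, usually from a camera, a webcam, or a video file. The output is an understanding of the real world that the images depict, such as detected faces or recognized license plates.
  3. Digital Image Processing (DIP). The input is an image and the output is also an image. Applying a filter to a picture in Photoshop is a typical example. Common operations include blurring, grayscale conversion, and contrast enhancement.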


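Put more compactly:

  1. Computer graphics takes a model as input and produces an image (pixels).
  2. Computer vision takes an image (photos or video captured by a camera) as input and produces a model.
  3. Digital image processing takes an image (pixels) as input and produces an image (pixels).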
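
To make the first direction (model in, pixels out) concrete, here is a minimal sketch of a rasterizer for a single triangle. It is an illustration under simplifying assumptions, not how any particular renderer works: vertices are given directly in 2D pixel coordinates (a real pipeline would project 3D positions first), texture coordinates are omitted, and the names `Vertex`, `edge`, and `rasterize` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Vertex:
    x: float      # position, already in pixel coordinates (assumption: no 3D projection)
    y: float
    color: tuple  # (r, g, b), each in 0..255

def edge(ax, ay, bx, by, px, py):
    """Signed area term: tells which side of the edge a->b the point p lies on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def rasterize(tri, width, height, background=(0, 0, 0)):
    """Model in, pixels out: fill one triangle, interpolating vertex colors."""
    v0, v1, v2 = tri
    area = edge(v0.x, v0.y, v1.x, v1.y, v2.x, v2.y)
    image = [[background] * width for _ in range(height)]
    if area == 0:                       # degenerate triangle covers no pixels
        return image
    for y in range(height):
        for x in range(width):
            px, py = x + 0.5, y + 0.5   # sample at the pixel centre
            # Barycentric weights of the sample with respect to v0, v1, v2.
            w0 = edge(v1.x, v1.y, v2.x, v2.y, px, py) / area
            w1 = edge(v2.x, v2.y, v0.x, v0.y, px, py) / area
            w2 = edge(v0.x, v0.y, v1.x, v1.y, px, py) / area
            if w0 >= 0 and w1 >= 0 and w2 >= 0:   # inside or on the border
                image[y][x] = tuple(
                    round(w0 * c0 + w1 * c1 + w2 * c2)
                    for c0, c1, c2 in zip(v0.color, v1.color, v2.color))
    return image

triangle = (Vertex(0, 0, (255, 0, 0)),
            Vertex(8, 0, (0, 255, 0)),
            Vertex(0, 8, (0, 0, 255)))
frame = rasterize(triangle, 8, 8)   # an 8x8 image: a list of rows of (r, g, b) pixels
```

The scene description here is three vertices with colors; the result is nothing but a grid of pixels, which is exactly the input a vision or image-processing step would start from.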


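  • CG also uses DIP. Modern 3D games layer full-screen post-processing effects on top of the rendered frame to make it more expressive; the principle is DIP, with the computation moved to the GPU.
  • CV relies on DIP even more heavily for the grunt work, such as preprocessing the photos that are to be recognized.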
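
Both bullets come down to image-to-image functions. The sketch below shows three of the operations named earlier (grayscale conversion, contrast enhancement, blurring) on a tiny image. Plain Python lists stand in for image buffers so the example has no dependencies; the function names are hypothetical, and a real system would run such passes on the GPU or through an imaging library.

```python
# An image is a list of rows. Color pixels are (r, g, b) tuples in 0..255,
# grayscale pixels are single integers in 0..255.

def to_grayscale(rgb_image):
    """Weighted average of the channels, using the Rec. 601 luma weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_image]

def stretch_contrast(gray_image):
    """Linearly rescale so the darkest pixel maps to 0 and the brightest to 255."""
    flat = [p for row in gray_image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:                         # uniform image: nothing to stretch
        return [row[:] for row in gray_image]
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in gray_image]

def box_blur(gray_image):
    """3x3 mean filter; border pixels average over the neighbours that exist."""
    h, w = len(gray_image), len(gray_image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [gray_image[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(round(sum(window) / len(window)))
        out.append(row)
    return out

rgb = [[(10, 10, 10), (200, 200, 200)],
       [(60, 60, 60), (120, 120, 120)]]
gray = to_grayscale(rgb)            # preprocessing step a CV pipeline might start with
enhanced = stretch_contrast(gray)   # contrast enhancement
blurred = box_blur(enhanced)        # a simple full-frame blur, as in a post effect
```

Every function takes an image and returns an image of the same size, which is what separates these steps from the rendering and recognition stages they support.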

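        Finally, a hot topic of recent years, augmented reality (AR), needs CG and CV, and DIP as well. It uses DIP for preprocessing, CV to recognize the tracked object and recover its pose, and CG to overlay virtual 3D objects.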
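
The three stages of that AR description can be sketched as a toy pipeline. This is a heavily simplified illustration, not a real AR system: the "marker" is just a bright pixel region, the "pose" is only its 2D position (real AR recovers a full 3D pose), and the "virtual object" is a hollow square. All function names are hypothetical.

```python
def threshold(gray, level=128):
    """DIP: image in, image out. Keep only bright pixels as a binary mask."""
    return [[1 if p >= level else 0 for p in row] for row in gray]

def locate_marker(mask):
    """CV: image in, model out. Return the (x, y) centroid of the marker, or None."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pts:
        return None
    return (sum(x for x, _ in pts) / len(pts),
            sum(y for _, y in pts) / len(pts))

def overlay(gray, center, value=255, half=1):
    """CG: model in, pixels out. Draw a hollow square around the given centre."""
    out = [row[:] for row in gray]           # copy so the camera frame is untouched
    cx, cy = round(center[0]), round(center[1])
    for y in range(cy - half, cy + half + 1):
        for x in range(cx - half, cx + half + 1):
            on_border = abs(x - cx) == half or abs(y - cy) == half
            if on_border and 0 <= y < len(out) and 0 <= x < len(out[0]):
                out[y][x] = value
    return out

# A 5x5 "camera frame": dark background with one bright marker pixel at x=3, y=1.
camera_frame = [[20] * 5 for _ in range(5)]
camera_frame[1][3] = 200

mask = threshold(camera_frame)            # DIP: preprocessing
pose = locate_marker(mask)                # CV: where is the tracked object?
augmented = overlay(camera_frame, pose)   # CG: draw the virtual object there
```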



2.3 From the nature of the problem



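  1. Computer Graphics is a forward problem (Z|X): given the light positions, the object shapes, and the surface properties, how do you simulate an environment from the known state of those variables?
  2. Computer Vision is the opposite, an inverse problem (X|Z): all you have are observations (measurements), and from the information at each pixel (color, depth) you must estimate the features and state of the objects and the environment, such as object motion (tracking), 3D structure (structure from motion, SfM), and object category (classification and segmentation).
  3. Image Processing sits between the two and includes problems of both kinds. In state-of-the-art research, however, Image Processing leans toward Computer Vision, or at least looks more like a subfield of it.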

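        As CV advances and ever more capable camera sensors appear (depth cameras, event cameras), many algorithms are now shared among these three areas of research, but their motivations have not changed much.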


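        Because these fields advance together, Graphics and Computer Vision overlap more and more. If Computer Vision can estimate more and more variables, more and more accurately, from the observations (images), then feeding those variables into Graphics algorithms can simulate a scene that matches the real environment.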
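
The forward/inverse pairing, and the idea of feeding estimated variables back into a graphics step, can be illustrated with a toy scene containing a single disk. This is a minimal sketch under strong assumptions (a noise-free binary image, one fully visible object); `render_disk` and `estimate_disk` are hypothetical names.

```python
import math

def render_disk(cx, cy, radius, size):
    """Forward problem (graphics): scene parameters -> pixels."""
    return [[1 if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2 else 0
             for x in range(size)]
            for y in range(size)]

def estimate_disk(image):
    """Inverse problem (vision): pixels -> estimated scene parameters."""
    pts = [(x, y) for y, row in enumerate(image) for x, v in enumerate(row) if v]
    n = len(pts)
    if n == 0:
        return None                       # nothing observed, nothing to estimate
    cx = sum(x for x, _ in pts) / n       # centroid of the lit pixels
    cy = sum(y for _, y in pts) / n
    radius = math.sqrt(n / math.pi)       # a disk of radius r covers about pi * r^2 pixels
    return cx, cy, radius

# "Reality": a disk whose parameters the vision step does not get to see.
observed = render_disk(cx=10, cy=12, radius=5, size=24)

# Vision estimates the variables from the pixels alone ...
est_cx, est_cy, est_r = estimate_disk(observed)

# ... and graphics turns the estimates back into an image of the scene.
resynthesized = render_disk(est_cx, est_cy, est_r, size=24)
```

The estimated radius is only approximate because it is recovered from a pixel count, which is typical of inverse problems: the observations constrain the hidden variables without pinning them down exactly.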


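2.4 From the perspective of the ultimate goal

The ultimate goal of Computer Vision is to imitate how the human eye and brain understand the real things they see. The keywords are "real" and "understand"; face recognition is an example.
The ultimate goal of Computer Graphics is to create visual perceptions that are not real. The keywords are "not real" and "create"; 3D special effects are an example.
The ultimate goal of Image Processing is image transformation at the pixel level. The key is converting one image into another, which draws on signal processing; adding a filter to a picture is an example.

2.5 From the perspective of academic classification

Computer Science/ Artificial Intelligence/ Computer Vision
Computer Science/ Computer Graphics and Visualization
Electrical Engineering/ Signal Processing/ Digital Signal Processing/ Digital Image Processing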








