The term “Computer Vision” (or “Machine Vision”) refers to the ability of a computer or machine to see things almost as a human does. Computer vision is different from a CCTV (closed-circuit television) camera: a CCTV camera only streams what it captures to a monitor, whereas in computer vision the computer uses algorithms to interpret the objects in the stream and can, for example, detect faces, humans or animals in pictures, videos or even live video streams. The first widely used tools for computer vision appeared in the early 2000s. OpenCV, created by Intel Corporation and released in 2000, was arguably the first popular one.
One of the first practical object detection algorithms was the “Viola-Jones algorithm”, introduced in 2001 by Paul Viola and Michael Jones. It is based on “Haar-like features”: each important feature of an object is approximated by a Haar-like shape (a simple pattern of adjacent light and dark rectangles), and if a region of a picture or video frame has all of these Haar-like features, that region is taken to contain the desired object. A sliding window of a specified size (chosen by the programmer) starts at the top left of the image or video frame, moves one step to the right at a time, and moves down to the next row when it reaches the end of the current one. Each window that contains the desired object is marked with a colored rectangle. “Haar-like features” and the “sliding window” are shown in the pictures below.
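
The scan described above can also be sketched in a few lines of Python. This is only a toy illustration, not the real Viola-Jones detector: it checks a single hand-written Haar-like feature (left half of the window brighter than the right half) against a hand-picked threshold, whereas the real algorithm combines many features learned from training data. The function names, the `toy` image and the threshold value are all assumptions made up for this example. The summed-area table ("integral image") is used so that each rectangle sum costs only four lookups.

```python
def integral_image(img):
    """Return a summed-area table with an extra zero row and column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the rectangle with top-left (x, y), width w, height h."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Haar-like edge feature: sum of the left half minus sum of the right half."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


def sliding_windows(img_w, img_h, win_w, win_h, step):
    """Yield top-left corners, moving left to right, then down to the next row."""
    for y in range(0, img_h - win_h + 1, step):
        for x in range(0, img_w - win_w + 1, step):
            yield x, y


def detect(img, win_w, win_h, step, threshold):
    """Return (x, y, w, h) for every window whose feature value reaches the threshold."""
    ii = integral_image(img)
    hits = []
    for x, y in sliding_windows(len(img[0]), len(img), win_w, win_h, step):
        if two_rect_feature(ii, x, y, win_w, win_h) >= threshold:
            hits.append((x, y, win_w, win_h))
    return hits


# Hypothetical 6x4 grayscale image with a bright vertical band in columns 2-3.
toy = [[0, 0, 9, 9, 0, 0] for _ in range(4)]
print(detect(toy, win_w=4, win_h=4, step=1, threshold=50))  # [(2, 0, 4, 4)]
```

Only the window starting at column 2 has a bright left half and a dark right half, so it is the only one reported. In a real application each reported rectangle would be drawn on the image.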
The algorithm above works well for simple tasks such as face detection or car detection in images or videos. But when an image or video contains many kinds of objects and you want to detect all of them (for example humans, every kind of animal by name, and objects such as cars, motorcycles, chairs, windows and so on), you need a more advanced tool. After 2006-2007, with the introduction of deep learning, the growth of CNNs (Convolutional Neural Networks) and advances in CPU and GPU speed, deep learning methods came into action to solve more complex computer vision problems.
In the years that followed, several powerful deep learning frameworks were created for Python and C++ (Keras, TensorFlow and PyTorch in Python, Caffe in C++). These frameworks have helped machine learning scientists a great deal in training their models and solving complex problems in computer vision and other fields. Advanced competitions such as the “ImageNet Large Scale Visual Recognition Challenge (ILSVRC)” have been held in recent years to encourage machine learning researchers to invent more efficient models for detecting objects in pictures and videos. Training a model for a problem like ImageNet, even with the fastest computers and GPUs, takes a couple of days and sometimes a couple of weeks. Thanks to the open-source software community, many researchers have published their source code and trained models on GitHub and elsewhere, so anyone can download them to learn more or to try to improve the model. Deep learning currently seems to be the only solution for advanced and complex vision problems. It obviously has a high computational cost, and researchers are trying to reduce that cost with new, more advanced algorithms.