What is Computer Vision?

Computer vision is a field of Artificial Intelligence and Computer Science that aims to help computers see and understand images the way humans do.

Building a computer that sees like a human is not easy. Not only is the task complicated, but we still do not fully understand how human vision works.

Few people considered it easy, with one famous exception: AI pioneer Marvin Minsky, who in 1966 instructed his students to "connect the camera to a computer and let it describe what it sees." That was 50 years ago, and the research is still unfinished.

Simulating human vision can be divided into three sequential stages, mirroring how humans see: simulating the eye (acquisition: difficult), simulating the visual cortex (processing: very difficult), and simulating the rest of the brain (analysis: hardest).


Eye simulation is where we have achieved the most success. Over the years, we have created sensors and image processors that match, and in some respects exceed, the capabilities of the human eye.

Larger, near-perfect lenses and ever-smaller semiconductor pixels make today's cameras remarkably accurate and responsive. A camera can capture thousands of pictures per second and resolve detail at a distance with high accuracy. Yet despite their high fidelity, these devices are in one important sense no better than 19th-century pinhole cameras: they merely record the distribution of photons arriving from a given direction. Even the best camera sensor cannot recognize a ball, let alone catch one. In other words, hardware is limited without software, and software has so far been by far the hardest part. Still, today's cameras are versatile and provide the best foundation for research.


The brain is built from the ground up for seeing. More of it is devoted to visual tasks than to any other work, and this specialization goes all the way down to the cellular level: billions of cells work together to capture signals and extract patterns from them.

A group of neurons signals to other groups when there is a difference along a line at a certain angle, or motion that is faster or in another direction. Higher-level networks combine these patterns into meta-patterns: a circle, moving upward. More information is added gradually: the circle is white, with black lines, and it is growing in size. With each new piece of information, the picture becomes clearer.
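The idea of cells that react to a difference along a line at a certain angle can be loosely illustrated with two classic Sobel edge filters, one tuned to left-right changes and one to top-bottom changes. This is an analogy only, not a model of real neurons. The sketch below is pure Python under that assumption; the toy image and the function names (`apply_kernel`, `edge_map`) are hypothetical. Note that the reported angle is the direction of strongest change, which is perpendicular to the edge itself.

```python
import math

# Sobel kernels: each responds to contrast along one axis, a crude
# stand-in for cells tuned to edges at a particular angle.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]  # responds to left-right changes (vertical edges)
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]   # responds to top-bottom changes (horizontal edges)


def apply_kernel(image, kernel, r, c):
    """Correlate a 3x3 kernel with the neighbourhood centred on (r, c)."""
    return sum(kernel[i][j] * image[r + i - 1][c + j - 1]
               for i in range(3) for j in range(3))


def edge_map(image):
    """Return (magnitude, angle_in_degrees) for every interior pixel."""
    rows, cols = len(image), len(image[0])
    result = []
    for r in range(1, rows - 1):
        row = []
        for c in range(1, cols - 1):
            gx = apply_kernel(image, SOBEL_X, r, c)
            gy = apply_kernel(image, SOBEL_Y, r, c)
            magnitude = math.hypot(gx, gy)              # edge strength
            angle = math.degrees(math.atan2(gy, gx))    # direction of strongest change
            row.append((magnitude, angle))
        result.append(row)
    return result


# Hypothetical 5x5 grayscale image: dark on the left, bright on the right,
# i.e. a single vertical edge between columns 1 and 2.
image = [[0, 0, 255, 255, 255] for _ in range(5)]

for row in edge_map(image):
    print([round(m) for m, _ in row])
```

Each of the three printed rows is `[1020, 1020, 0]`: the filters fire where the dark-to-bright boundary falls inside the 3x3 window and stay silent over the uniform bright region. Real systems stack many such responses and combine them into higher-level patterns, much as the paragraph above describes.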

Early computer vision research, finding these webs of neurons too complicated to comprehend, took a different route: a top-down approach. A book looks like this, so look for this pattern; unless it is lying on its side, in which case it looks more like that; and so on.

For some objects this works well enough, but imagine describing every object from multiple perspectives, with all its variations in colour, motion, and more. Reaching even a baby's level of recognition this way would require an immensely large amount of data.

A bottom-up approach that mimics the way the brain works seems more promising. The computer applies a sequence of transformations to the image and works out the contours, the objects they imply, the viewing angle, the movement, and so on. The process involves a great deal of computation and statistics, but in the end it depends on the images the computer was trained on, just as the human brain depends on what it has seen before.

The image above (from the E-lab at Purdue University) shows a computer highlighting objects that, according to its algorithm, look similar enough to other examples of those objects, with some degree of statistical certainty.
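Matching what the computer sees against examples it was trained on, and attaching a degree of statistical certainty to the match, can be illustrated with a deliberately simple technique: nearest-centroid matching with softmax confidences. This is a swapped-in stand-in chosen for brevity. It is not the method behind the Purdue image, and it is not a neural network. The tiny 3x3 patterns, the labels, and the `temperature` parameter below are all hypothetical.

```python
import math

# Hypothetical toy training set: flattened 3x3 binary images, labelled by shape.
TRAINING = {
    "vertical bar": [
        [0, 1, 0, 0, 1, 0, 0, 1, 0],
        [0, 1, 0, 0, 1, 0, 0, 1, 1],  # same bar with one noisy pixel
        [1, 1, 0, 0, 1, 0, 0, 1, 0],
    ],
    "horizontal bar": [
        [0, 0, 0, 1, 1, 1, 0, 0, 0],
        [0, 0, 0, 1, 1, 1, 1, 0, 0],
        [0, 0, 1, 1, 1, 1, 0, 0, 0],
    ],
}


def centroid(examples):
    """Average the training examples pixel by pixel into one prototype."""
    n = len(examples)
    return [sum(pixels) / n for pixels in zip(*examples)]


def distance(a, b):
    """Euclidean distance between two flattened images."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(pattern, prototypes, temperature=0.5):
    """Return (best_label, confidences) via a softmax over negative distances.

    A smaller (hypothetical) temperature makes the confidences more decisive.
    """
    scores = {label: math.exp(-distance(pattern, proto) / temperature)
              for label, proto in prototypes.items()}
    total = sum(scores.values())
    confidences = {label: s / total for label, s in scores.items()}
    best = max(confidences, key=confidences.get)
    return best, confidences


# "Training": collapse each label's examples into one averaged prototype.
prototypes = {label: centroid(ex) for label, ex in TRAINING.items()}

# A noisy vertical bar that does not appear in the training set.
unseen = [0, 1, 0, 0, 1, 0, 1, 1, 0]
label, confidences = classify(unseen, prototypes)
print(label, round(confidences[label], 2))
```

The unseen pattern is labelled `vertical bar` with a confidence well above one half, because it sits much closer to the averaged vertical examples than to the horizontal ones. The confidences always sum to 1, which is what lets the result be read as a degree of certainty rather than a yes/no answer.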

Proponents of this approach may say "I told you so," but until recent years, creating and running artificial neural networks was very difficult because of the enormous amount of computation required. Progress in parallel computing has eased this difficulty, and recent years have seen an explosion of research into, and use of, systems that mimic the human brain. Pattern recognition keeps accelerating, and we are making continuous progress.


Of course, you could build a system that recognizes an apple from every angle, in any situation, whether still or moving, bitten or intact, and it still would not recognize an orange. Nor could it tell you what an apple is: whether it is edible, whether it is big or small, or what it is used for. In other words, even good software and hardware cannot operate without an operating system.

That operating system is the rest of the brain: long-term and short-term memory, sensory input, attention, perception, lessons about interacting with the world, and more, all written into a neural network more complicated than anything we have ever seen, in ways we still cannot understand.

This is where computer science and artificial intelligence come together. Computer scientists, engineers, psychologists, neuroscientists, and philosophers still have not worked out how the brain works, let alone how to simulate it. Yet even in its infancy, computer vision is extremely useful. It is in cameras that recognize your face (as with Face ID) and your smile. It helps self-driving cars recognize signs and pedestrians. It is in factory robots that recognize products and work alongside humans. The road to machines that see like humans is long, but what they already do along the way is amazing.

We at Hachinet Software are a Vietnamese IT outsourcing company providing software solutions and business systems to companies and factories in Japan and around the world. Our expertise includes:

Frontend: HTML5, CSS3, Bootstrap, AngularJS, CoffeeScript
Backend: Java, C#, ASP.NET, C++, VB.NET, COBOL, Python, Ruby, PHP
Mobile: iOS, Android
Web technologies: Ruby, .NET, PHP, C#, Java, COBOL ...
Web application development

Medical system
Reservation system

Contact Us
The personal information you submit will be used only for responding to inquiries, providing information on products and services, and providing useful information, and will not be used for any other purpose. If you are interested in our services or are looking for an outsourcing partner in Vietnam, please feel free to contact us by email at contact@hachinet.com