{"id":1178492,"date":"2021-10-15T12:00:21","date_gmt":"2021-10-15T16:00:21","guid":{"rendered":"https:\/\/www.kdnuggets.com\/?p=133890"},"modified":"2021-10-15T12:00:21","modified_gmt":"2021-10-15T16:00:21","slug":"how-our-obsession-with-algorithms-broke-computer-vision-and-how-synthetic-computer-vision-can-fix-it","status":"publish","type":"station","link":"https:\/\/platodata.io\/plato-data\/how-our-obsession-with-algorithms-broke-computer-vision-and-how-synthetic-computer-vision-can-fix-it\/","title":{"rendered":"How our Obsession with Algorithms Broke Computer Vision: And how Synthetic Computer Vision can fix it"},"content":{"rendered":"\n
\n

How our Obsession with Algorithms Broke Computer Vision: And how Synthetic Computer Vision can fix it

Deep Learning radically improved Machine Learning as a whole. The Data-Centric revolution is about to do the same. In this post, we'll take a look at the pitfalls of mainstream Computer Vision (CV) and discuss why Synthetic Computer Vision (SCV) is the future.


By Paul Pop, Co-founder and CEO at Neurolabs

\"Figure\"
\nSynthetic Computer Vision aims to translate what\u2019s in the Virtual world back to the Real world. (Image by author)<\/span><\/center>
\n <\/p>\n

🗣️ The Current State of Computer Vision

 
As of today, there have been over $15B worth of investments in over 1,800 Computer Vision startups in the past 8 years, according to Crunchbase. More than 20 of these companies are currently valued above $1B, and there is a lot more to come, according to Forbes.

Why are these companies valued so highly? To put it simply, they are teaching computers how to see. By doing so, they are automating tasks that have previously been accomplished using human sight.

This boom followed a 2012 technology inflection point in Computer Vision, with the advent of Neural Networks: algorithms that mimic the human brain and are trained using colossal amounts of human-labelled data. Since 2012, algorithms have steadily improved and have become a match for humans in many visual tasks, for example counting objects, lip reading or cancer screening.

In the 10 years that followed, everybody did their part: academia led the way with better algorithms, while large companies invested in an army of humans who diligently labelled these image datasets. Some of these efforts were even open-sourced for the benefit of the community, such as ImageNet, a 14-million-image dataset.

Unfortunately, now that these systems are being deployed to production, we are hitting a brick wall:

1. The labelled data that we have is unreliable. A systematic study by MIT researchers of popular ML datasets found an average labelling error rate of 5.93% for ImageNet and an average of 3.4% across other datasets.
2. There is little effort dedicated to solving the data problem. The intellectual efforts of academia are almost entirely focused on algorithm development, ignoring the fundamental need for good data; a guesstimate by Andrew Ng puts the ratio at 99% algorithm focus vs 1% data.
3. Computer Vision algorithms don't generalise well from one domain to another. An algorithm trained to detect cars in the south of France will struggle to detect the same car in snowy Norway. Likewise, a system trained on specific cameras might fail with another camera make and model.

♟️ Searching for inspiration

     
As early as 1946, Alan Turing suggested chess as a benchmark for computer capabilities; it has since been thoroughly researched and has received a lot of media attention.

A commonly accepted way to measure performance in chess is through the Elo rating system, which provides a valid comparison of player skills. The graph below shows world champions and chess engines. Human performance has hovered around the 2800 rating for the past 50 years, and was surpassed by computers around 2010.
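For readers unfamiliar with the mechanics, the Elo system predicts an expected score for each player from the rating gap and then nudges ratings towards the actual result. Below is a minimal sketch in Python; the K-factor of 32 and the example ratings are illustrative assumptions, not values from any official rating list.

```python
def elo_expected(r_a: float, r_b: float) -> float:
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0) -> float:
    """Rating of A after one game (score_a: 1 = win, 0.5 = draw, 0 = loss).
    K = 32 is an illustrative choice; real federations vary it."""
    return r_a + k * (score_a - elo_expected(r_a, r_b))


# A 2800-rated human is expected to score only ~5% against a 3300-rated engine,
# so even a loss barely moves the rating.
print(round(elo_expected(2800, 3300), 3))      # ~0.053
print(round(elo_update(2800, 3300, 0.0), 1))   # ~2798.3
```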

Until the last decade, we humans designed chess algorithms to play based on rules we could specify and understand. The Deep Learning revolution allowed us to break beyond human understanding, bringing a leap forward, just as it did for Computer Vision.

    \"Figure\"
    \nChess engine and human ELO ratings (Image by author)<\/span><\/center>
    \n <\/p>\n

As good as the progress of Deep Learning chess engines was, it has now been surpassed by the next level of chess engine: AlphaZero from DeepMind. What is more impressive is that AlphaZero did not use any human-sourced data to achieve this performance. It was built without any knowledge of historical chess games, or any human guidance for finding optimal moves. AlphaZero was both the teacher and the student: it taught itself how to play the game better by competing against itself and learning through the process.

AlphaZero won against Stockfish 8, the best engine at the time, without losing a single game, and kept that edge even when AlphaZero was given an order of magnitude less time to compute its next move.

Considering the remarkable improvements that AlphaZero brought, one has to wonder: can we translate its success in chess to Computer Vision?

📰 The new wave: Data-Centric AI

     
Within the new paradigm of Data-Centric AI, the goal is not to create better algorithms, but to increase performance by changing the data itself. Even if we disregard the hurdle of obtaining and labelling image datasets in the first place, questions still remain around the quality of the data: are we uniformly covering all possible use cases? Is the data covering edge cases?

If we are to follow the path of Data-Centric Computer Vision, we must be in control of the data sourcing process. The data needs to be balanced, and we need a good understanding of the parameters that influence what a Computer Vision model learns.

Let's take a simple example in which we look at controlling 3 such parameters: camera angle, lighting and occlusion. Can you imagine gathering a real dataset in which you have to diligently control the values of only these 3 parameters whilst gathering thousands of relevant images? With real data, the task is Sisyphean.
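With synthetic data, by contrast, that control becomes a matter of enumeration. The sketch below walks a balanced grid over the three parameters and emits one scene specification per combination; `render_scene`, the parameter names and the specific values are hypothetical placeholders rather than a real rendering-engine API.

```python
from itertools import product

# Illustrative parameter grid; the values are placeholders, not a recipe.
camera_angles_deg = [0, 30, 60, 90]           # camera elevation above the object
lighting_lux      = [200, 800, 3200, 12800]   # dim interior up to bright daylight
occlusion_ratios  = [0.0, 0.2, 0.4]           # fraction of the object hidden


def render_scene(angle_deg, lux, occlusion):
    """Stand-in for a call into a rendering engine (Unity, Blender, ...).
    A real implementation would return an image plus its labels."""
    return {"camera_angle_deg": angle_deg,
            "lighting_lux": lux,
            "occlusion_ratio": occlusion}


# 4 x 4 x 3 = 48 perfectly balanced scene specifications; collecting real photos
# that cover every combination evenly would be close to impossible.
dataset = [render_scene(a, l, o)
           for a, l, o in product(camera_angles_deg, lighting_lux, occlusion_ratios)]
print(len(dataset))  # 48
```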

💾 How do we manage data today?

     
In the past 5 years, we have made tremendous progress in optimising the data gathering process and the quality of the data labels. Moreover, we have learned to make the most of our datasets by using a variety of data augmentation techniques: given an image in our dataset, we apply mathematical functions to it in order to create more variety in our data.
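As a minimal sketch of what such a pipeline can look like, the snippet below uses torchvision (one common choice among several augmentation libraries) to turn a single image into several randomly perturbed variants; the particular transforms and their parameters are illustrative, not a recommended recipe.

```python
from PIL import Image
from torchvision import transforms

# Illustrative augmentation pipeline: each call yields a slightly different
# view of the same underlying image.
augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),   # random crop and resize
    transforms.RandomHorizontalFlip(p=0.5),                # mirror half the time
    transforms.ColorJitter(brightness=0.3, contrast=0.3),  # vary the lighting
    transforms.RandomRotation(degrees=10),                 # small camera tilt
])

image = Image.new("RGB", (640, 480), color=(128, 128, 128))  # stand-in for a real photo
augmented_views = [augment(image) for _ in range(8)]          # eight new variants
```

Augmentation of this kind stretches an existing dataset, but it can only remix what was photographed; it cannot introduce viewpoints, lighting conditions or object arrangements that were never captured in the first place.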

There are now over 400 companies with a total market value of $1.3T (a little over the market value of Facebook) catering to the data needs of our latest algorithms.

But does the current path lead to a dead end? Are we reaching the limits of the algorithms built on top of human-sourced datasets? Like in chess, as long as we're using human-sourced data as input for our algorithms, we're bound by design not to significantly surpass our own abilities.

In chess, the post-Deep Learning breakthrough came once we stopped building on suboptimal human data and allowed the machines to build their own data in order to optimise what they learn. In Computer Vision, we must do the same: allow the machine to generate the data it needs to optimise its own learning.

🏔 What's next for Computer Vision?

     
The truly scalable way of creating training data is through Virtual Reality engines. In terms of fidelity, the output has become indistinguishable from the real world, and it gives full scene control to the user. This allows the user to generate smart data that is truly useful for a Computer Vision model to learn from. Synthetic Data can become the bedrock needed for the new Data-Centric AI framework.
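One practical consequence of full scene control, at least in principle, is that annotations can be computed rather than drawn by hand: the engine knows exactly where every object sits. The sketch below projects a known 3D object centre through a simple pinhole camera to obtain a 2D keypoint label; the camera intrinsics and the object position are made-up values for illustration only.

```python
import numpy as np

# Hypothetical pinhole camera intrinsics (focal lengths and principal point, in pixels).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])


def project_point(point_cam, intrinsics):
    """Project a 3D point given in camera coordinates to pixel coordinates."""
    u, v, w = intrinsics @ point_cam
    return u / w, v / w


# The renderer places the object, so the label is exact by construction.
object_centre_cam = np.array([0.1, -0.05, 2.0])   # metres in front of the camera
x, y = project_point(object_centre_cam, K)
print(f"ground-truth keypoint: ({x:.1f}, {y:.1f}) px")   # (360.0, 220.0)
```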

We have good reasons to believe that the time for wide adoption of visual Synthetic Data is now.