# How our Obsession with Algorithms Broke Computer Vision: And how Synthetic Computer Vision can fix it

*Published October 15, 2021*
Deep Learning radically improved Machine Learning as a whole. The Data-Centric revolution is about to do the same. In this post, we'll take a look at the pitfalls of mainstream Computer Vision (CV) and discuss why Synthetic Computer Vision (SCV) is the future.
By **Paul Pop**, Co-founder and CEO at Neurolabs

Why are these companies valued so highly? To put it simply, they are teaching computers how to see, and by doing so they are automating tasks that were previously accomplished using human sight.

This boom followed a 2012 **technology inflection point** in Computer Vision: the advent of Neural Networks, algorithms that mimic the human brain and are trained using colossal amounts of human-labelled data. Since 2012, algorithms have steadily improved and have become a match for humans in many visual tasks, for example counting objects, lip reading or cancer screening.

In the 10 years that followed, everybody did their part: academia led the way with better algorithms, and large companies invested in an army of humans who diligently labelled these image datasets. Some of these efforts were even open-sourced for the benefit of the community, such as ImageNet, a 14-million-image dataset.

Unfortunately, now that these systems are being deployed to production, we are hitting a brick wall.

A commonly accepted way to measure performance in chess is the Elo rating system, which provides a valid comparison of player skill. The graph below shows world champions and chess engines. Human performance has hovered around a 2800 rating for the past 50 years, and was surpassed by computers around 2010.

Until the last decade, we humans designed chess algorithms to play based on rules we could design and understand. The Deep Learning revolution allowed us to break beyond human understanding, bringing a leap forward, just like it has for Computer Vision.

As good as the progress of Deep Learning chess engines was, it has now been surpassed by the next level of chess engine: **AlphaZero** from DeepMind.
What's more impressive is that AlphaZero **did not use any human-sourced data** to achieve this performance. It was built without any knowledge of historical chess games, or any human guidance for finding optimal moves. AlphaZero was both the teacher and the student: it taught itself how to play the game better by competing against itself and learning through the process.

AlphaZero won against Stockfish 8, the best engine at the time, without losing a single game, and kept that edge even when AlphaZero was given an order of magnitude less time to compute its next move.

Considering AlphaZero's remarkable improvements, one has to wonder: **Can we translate its success in chess to Computer Vision?**

If we are to follow the path of Data-Centric Computer Vision, we must be in control of the data-sourcing process. The data needs to be balanced, and we need a good understanding of the parameters that influence what a Computer Vision model learns.

Let's take a simple example in which we look at controlling 3 such parameters: camera angle, lighting and occlusion. Can you imagine gathering a real dataset in which you have to diligently control the values of only these 3 parameters, whilst gathering thousands of relevant images? With real data, the task is Sisyphean.

There are now over 400 companies with a total market value of $1.3T (a little over the market value of Facebook) catering to the data needs of our latest algorithms.

But does the current path lead to a dead end? Are we reaching the limits of algorithms built on top of human-sourced datasets?
Like in chess, as long as we're using human-sourced data as input for our algorithms, we are bound by design not to significantly surpass our own abilities.

In chess, the post-Deep-Learning breakthrough came once we stopped building on suboptimal human data and allowed the machines to build their own data in order to optimise what they learn. In Computer Vision we must do the same: *allow the machines to generate the data they need to optimise their own learning.*

We have good reasons to believe that the time for wide adoption of visual Synthetic Data is now.

> Why not go a step further? What about a world in which humans are not needed to **label images** for Computer Vision? In Synthetic Computer Vision (SCV), we train Computer Vision models using Virtual Reality engines and deploy the models in the real world.

DeepMind showed that AlphaZero was only the start of the road, as they applied the same principles to Go, StarCraft and protein folding. Today, we have all the necessary building blocks to build an *AlphaZero for Computer Vision*: a self-learning system that, **by design, is not limited by human input**. A system that is capable of creating and manipulating virtual scenes through which it teaches itself how to solve Visual Automation tasks.

It's 2021 and we are only at the beginning of the road. Keep in mind that Synthetic Data is **only one part of the puzzle** that waits to be solved!
*Synthetic Computer Vision aims to translate what's in the Virtual world back to the Real world. (Image by author)*
## 🗣️ The Current State of Computer Vision
As of today, there have been over $15B worth of investments in more than 1,800 Computer Vision startups in the past 8 years, according to Crunchbase. More than 20 of these companies are currently valued above $1B, and there is a lot more to come, according to Forbes.
## ♟️ Searching for inspiration
As early as 1946, Alan Turing suggested chess as a benchmark for computer capabilities. It has since been thoroughly researched and has received a lot of media attention.
*Chess engine and human Elo ratings (Image by author)*
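As an aside, the Elo ratings in this chart rest on a simple expected-score formula. A minimal sketch (the K-factor of 32 is a common convention, not something specified in this article):

```python
def elo_expected(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def elo_update(rating: float, expected: float, actual: float, k: float = 32.0) -> float:
    """New rating after one game: actual is 1 for a win, 0.5 for a draw, 0 for a loss."""
    return rating + k * (actual - expected)

# A 2800-rated human facing a 3200-rated engine:
e = elo_expected(2800, 3200)         # roughly a 9% expected score
new_rating = elo_update(2800, e, 0)  # the human's rating after losing the game
```

A 400-point gap corresponds to a 10:1 expected-score ratio, which is why the engines' post-2010 ratings put humans decisively out of reach.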
## 📰 The new wave: Data-Centric AI
Within the new paradigm of Data-Centric AI, the goal is not to create better algorithms, but to increase performance by changing the data itself. Even if we disregard the hurdle of obtaining and labelling image datasets in the first place, questions remain around the quality of the data: are we uniformly covering all possible use cases? Is the data covering edge cases?

## 💾 How do we manage data today?
In the past 5 years, we have made tremendous progress in optimising the data-gathering process and the quality of data labels. Moreover, we have learned to make the most of our datasets by using a variety of *data augmentation* techniques: given an image in our dataset, we apply mathematical functions to it in order to create more variety in our data.

## 🏔 What's next for Computer Vision?
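Before looking ahead, the *data augmentation* idea from the previous section can be sketched in a few lines. A minimal illustration using only NumPy; the specific transforms are illustrative choices, not ones prescribed by the article:

```python
import numpy as np

def augment(image: np.ndarray) -> list:
    """Produce simple augmented variants of an H x W x 3 image array."""
    return [
        np.fliplr(image),              # horizontal mirror
        np.rot90(image),               # 90-degree rotation
        np.clip(image * 1.2, 0, 255),  # brightness increase
        image[10:-10, 10:-10],         # border crop (simulates zoom / partial framing)
    ]

image = np.random.randint(0, 256, size=(64, 64, 3)).astype(np.float64)
augmented = augment(image)
print(len(augmented))  # 4 extra training samples from one labelled image
```

Each transform preserves the label, which is what makes augmentation essentially free extra data.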
The truly scalable way of creating training data is through **Virtual Reality engines**. In terms of fidelity, their output has become indistinguishable from the real world, and they give the user full control of the scene. This allows the user to generate **smart data** that is truly useful for a Computer Vision model to learn from. Synthetic Data can become the bedrock of the new **Data-Centric AI framework**.
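Full scene control means parameters like the three from the earlier example (camera angle, lighting, occlusion) can be swept programmatically instead of hunted for in the wild. A hypothetical sketch, where `render_scene` stands in for whatever rendering-engine API is actually used:

```python
from itertools import product

# Illustrative parameter grids; a real pipeline would use far finer steps.
camera_angles = [0, 30, 60, 90]      # degrees around the object
lighting_levels = [0.25, 0.5, 1.0]   # relative light intensity
occlusion_ratios = [0.0, 0.2, 0.4]   # fraction of the object hidden

def render_scene(angle: int, light: float, occlusion: float) -> dict:
    """Placeholder for a Virtual Reality engine call: returns a rendered sample
    whose ground-truth label is known automatically (no human labelling)."""
    return {"params": (angle, light, occlusion), "label": "object"}

# Every combination appears exactly once, so the dataset is balanced by construction.
dataset = [render_scene(a, l, o)
           for a, l, o in product(camera_angles, lighting_levels, occlusion_ratios)]
print(len(dataset))  # 4 * 3 * 3 = 36 pre-labelled samples
```

The point of the sketch: with real photographs, balancing even three parameters is Sisyphean; with a rendering engine, balance falls out of a loop.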
## 👁️‍🗨️ Synthetic Computer Vision (SCV)
Having access to the right tools to build our own data, we can envision a world in which Computer Vision algorithms are developed and trained without the tedious process of manual data labelling. Gartner predicts that Synthetic Data will be more predominant than real data within the next 3 years.
## 🚀 The future is bright
With Synthetic Computer Vision, we build in Virtual Reality and deploy in the Real world. The same way AlphaZero taught itself what is important in chess, we let the algorithms decide what they need to see in order to learn optimally.
## 🔬 Beyond RGB images
Reality is much more than what the human eye can see. The algorithms we have built are mostly focused on what a human can understand and label. But it does not have to be like that: we can build algorithms for sensors that measure beyond human perception, and we can train those algorithms programmatically in Virtual Reality, without having doubts over their validity.

## 🤓 Smarter, not harder
Instead of building larger models and using more computational power to solve our problems, we can be smart about how we source the data our algorithms learn from. Algorithms don't need more of the same data to learn; they need a variety of everything.

## 🔭 The pioneers in Synthetic Data generation
The foundation of Synthetic Computer Vision is the **Synthetic Data** it is built upon. There are roughly 30 early-stage companies operating in the visual Synthetic Data generation space. Some focus on a specific use case in one vertical, while the majority operate horizontally across multiple verticals.
*Synthetic Data companies grouped by focus (Image by author)*

## ❓ Questions for you, Dear Reader
**Bio:** **Paul Pop** is Co-founder and CEO at Neurolabs. He has a background in Computer Science and AI from the University of Edinburgh and has been working in Computer Vision for the past decade. Whilst at Hudl, he led the team that built the Computer Vision player-tracking system used in most European football leagues today.