The Issue | Computers are nowhere near understanding complex visual situations
Image Recognition (or “Computer Vision”) makes it possible for applications using deep learning algorithms to understand imagery, similarly to the way NLP (Natural Language Processing) allows a computer to understand text. It is a machine’s ability to identify objects, places, people, writing and actions in images. However, what comes naturally to a human brain is, at least at this moment in time, incredibly difficult for computers. For a machine, learning to understand what it sees is a difficult challenge: a computer needs to be able to identify and differentiate objects, recognize human faces, interpret various situations and so on.
Consider the following example:
A human being would identify this as a violent situation in which ISIS operatives are leading captives to their death. A computer algorithm, however (even a well-trained one), might label this situation as “people at the beach”. This is, of course, correct in a sense, and yet it demonstrates the machine’s inability to understand what the image is actually about and what its key elements are.
The Solution | Shifting from Situational Understanding to Object Identification
Computer scientists will continue to push towards achieving General AI (computers with a holistic, cross-disciplinary understanding of the way things work, capable of analyzing and understanding complex situations). But until we reach that point in time, we’re still relying on Narrow AI – computers which can solve very specific problems. With this in mind, we’re now tackling visual brand safety by training algorithms to not necessarily understand a complex situation in its entirety but to identify specific objects or elements which could be indicators of unsafe content. In the above example, a computer might not be able to fully understand the situation (ISIS prisoners being marched to their death), but we can train our algorithms to identify prison uniforms, which have a distinct look and color.
So, How is Visual Brand Safety Actually Done, in Practice?
The visual brand safety process aims to produce a predictive “risk” score based on as much data as possible. This is not a deterministic decision, but with enough data we can achieve a very accurate prediction. Initially, the system scans the image and identifies any elements which are considered potentially unsafe for advertisers. These include nudity, weapons, explosions, blood and many other elements which we can train our algorithms to identify with great precision. Each unsafe element identified is given an individual “risk” score, to which we add further data we can pull from the page (URL date, meta-data, additional textual data).
In the example below, our algorithm would identify (a) 2 guns, (b) a military uniform and (c) 2 prisoner uniforms.
Once these are grouped together with the additional page-level data, we can provide an aggregated score for that specific image and determine the safety level of that piece of content. If we’re running visual brand safety on video, we can also add audio elements (expletive language, gunfire or explosions, screaming…) to the equation and further increase the accuracy of our prediction.
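To make the scoring idea more concrete, here is a minimal sketch of how per-element risk scores might be combined with page-level and audio signals into one aggregated score. Everything in it is an assumption made for illustration: the `Detection` type, the `ELEMENT_RISK` weights, the `image_risk` function and the "noisy-OR" combination rule are hypothetical and are not a description of any production system.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str         # element found in the image, e.g. "gun"
    confidence: float  # detector confidence, between 0 and 1


# Hypothetical per-element risk weights (0 = harmless, 1 = clearly unsafe).
# The values are illustrative only.
ELEMENT_RISK = {
    "gun": 0.8,
    "military_uniform": 0.4,
    "prisoner_uniform": 0.6,
}


def combine(risks):
    """Noisy-OR: each extra signal raises the total, which stays within [0, 1]."""
    safe = 1.0
    for risk in risks:
        safe *= 1.0 - risk
    return 1.0 - safe


def image_risk(detections, page_risk=0.0, audio_risk=0.0):
    """Aggregate element, page-level and (for video) audio risk into one score."""
    element_risks = [
        ELEMENT_RISK.get(d.label, 0.0) * d.confidence  # unknown labels add no risk
        for d in detections
    ]
    return combine(element_risks + [page_risk, audio_risk])


if __name__ == "__main__":
    # The elements from the example: 2 guns, a military uniform, 2 prisoner uniforms.
    detections = [
        Detection("gun", 0.95),
        Detection("gun", 0.90),
        Detection("military_uniform", 0.85),
        Detection("prisoner_uniform", 0.90),
        Detection("prisoner_uniform", 0.88),
    ]
    print(f"Aggregated risk: {image_risk(detections, page_risk=0.3):.2f}")
```

The combination rule is chosen here only because it matches the behaviour described above: no single signal is decisive, but every additional unsafe element or page-level signal pushes the prediction towards "unsafe". A real system would more likely learn the weights and the combination from labelled data.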
Looking to the Future
As general AI continues to develop, we will be able to transition from object/element identification to real situational understanding. At the moment though, brand-safety should adopt more image recognition tools and develop capabilities which will increase prediction accuracy and help advertisers stay safe in less textual, UGC environments.