June 2017

Deep attention networks are one of the core technologies we use at D.Day Labs. The idea (in a nutshell) is to focus the neural network on the important features of the data (text or even an image). Our research team incorporated an attention network into the data processing pipeline in DataSense. Attention networks allow DataSense to achieve state-of-the-art accuracy in classification.

One of the biggest criticisms of deep learning is its inability to explain how a prediction is made. It is almost impossible to find the mathematical reasoning behind a decision, let alone reasoning that makes sense to people. This also poses a GDPR risk, as discussed in a recent Oxford paper.

For the first time, we have released an AI-driven product that sheds light on the way machines “think”. Amir Balaish, our data science leader and an attention networks expert, gave a talk at a recent PyData conference (see below). Amir explained the technical basis and mathematical grounds of attention networks, with real-world uses in text and image processing.

Although we cannot share our clients’ data, we can show our results on public data. The Yelp dataset is widely used among NLP researchers; the task is to predict whether a restaurant review is positive or negative.

In the examples below, we processed the reviews with a hierarchical attention network. The words in red are the ones that received the most attention, that is, the words that were most influential in the network’s decision.
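To make the idea of "most influential words" concrete, here is a minimal sketch of how attention weights can be used to pick the words to highlight. It is not the DataSense implementation: the word vectors, the context vector, and the function names are made up for illustration, and the scoring is simplified to a plain dot product (a real hierarchical attention network learns these vectors and applies attention at both the word and the sentence level).

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(word_vectors, context_vector):
    """Score each word by its dot product with a context vector,
    then normalize the scores with softmax."""
    scores = [
        sum(w * c for w, c in zip(vec, context_vector))
        for vec in word_vectors
    ]
    return softmax(scores)

def top_words(words, weights, k=2):
    """Return the k words with the largest attention weights."""
    ranked = sorted(zip(words, weights), key=lambda p: p[1], reverse=True)
    return [word for word, _ in ranked[:k]]

# Hypothetical toy values; in a trained network these are learned.
words = ["the", "food", "was", "terrible"]
word_vectors = [[0.1, 0.0], [0.5, 0.2], [0.0, 0.1], [2.0, 1.5]]
context_vector = [1.0, 1.0]

weights = attention_weights(word_vectors, context_vector)
highlighted = top_words(words, weights, k=2)
print(highlighted)
```

The words returned by `top_words` are the ones a visualization like ours would color in red; the weights themselves can also be mapped to color intensity.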

A bad review:

A good review:

And of course, Amir’s talk.

 

Scanned documents are a key data source in big enterprises, such as your bank or insurance company. Documents that are not filled in correctly pose an immense legal risk: any party to a contract might deny having agreed to its terms. Until now, documents were validated manually. Manual review is error-prone and slow, so it is impractical to validate documents on a large scale. Another aspect is GDPR consent, which can take the form of a scanned document (for instance, as part of a know-your-client [KYC] process).

At D.Day Labs, we believe that data intelligence is key to every aspect of data processing. Due to increasing demand from our client base, we developed scanned-document validation, as can be seen in the images below. Correctly filled fields are marked in green, while missing fields are marked in red. DataSense first classifies the document category using artificial intelligence. Then our algorithms use unconventional deep learning and computer vision to find the missing fields.
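The two-step flow described above (classify the document, then check its fields) can be sketched roughly as follows. This is only an illustration of the final marking step under stated assumptions: the category name, the field names, and the `mark_fields` helper are hypothetical, and the classification and field detection, which DataSense performs with deep learning and computer vision, are replaced here by hard-coded inputs.

```python
# Hypothetical mapping from document category to the fields it must contain.
REQUIRED_FIELDS = {
    "consent_form": ["full_name", "date", "signature"],
}

def mark_fields(category, detected_fields):
    """Mark each required field green if a non-empty value was detected,
    red if it is missing or empty."""
    marks = {}
    for field in REQUIRED_FIELDS[category]:
        value = detected_fields.get(field)
        marks[field] = "green" if value else "red"
    return marks

# Assume an upstream model classified the scan as a consent form
# and extracted these values (the signature was left blank).
detected = {"full_name": "J. Doe", "date": "2017-06-01", "signature": ""}
marks = mark_fields("consent_form", detected)
print(marks)
```

Keeping the required fields in a per-category table means that supporting a new document type only requires a new entry, not new validation logic.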

Artificial intelligence creates new opportunities to solve old problems in data governance. Stay tuned for more research updates!