Deep Learning Surface

Deep Learning is a tool in the Machine Learning (ML) toolbelt, which is itself a tool in the AI and Data Science toolbelts. Think of it as an algorithm subset of a larger picture of algorithms, and its area of expertise is solving some of the more complex problems out there, like natural language processing (NLP), computer vision and automatic speech recognition (ASR). Think of when you talk to the customer service computer voice on the phone instead of pushing a button.

Why am I writing about this? Because it was the topic of my tech talk at Zipfian this week. I chose Deep Learning because I have an interest in making technology smarter, and I was clued into this area of ML as being more advanced at getting computers to act in a more intelligent and human way.

My research was only able to skim the surface because it is an involved topic that would take some time to study above and beyond what Zipfian is covering. Below is a summary of some key points I covered in my talk, along with some additional insights. Also, the presentation slides are at this link.

Deep Learning in a nutshell:

  • Learning algorithms that model high-level abstractions
  • Neural networks with many layers are the main structures
  • Term took off in 2006 when Geoff Hinton demonstrated the impact deep neural nets could have

Experts

To their credit, Hinton and others have been working on research in this field since the '80s despite lack of interest and minimal funding. It's been a hard road for them that has finally started to pay off. AI and neural networks in general have actually been explored since the '50s, but the biggest problems in that space have been computer speed and power. It really wasn't until the last decade that significant progress and impact have been seen. For example, Google has a project called Brain that can search for specific subjects in videos (like cat images).

I mention Hinton because he's seen as a central driver of Deep Learning and many look to him to see what's next. He also organized the Neural Computation and Adaptive Perception (NCAP) group in 2004, which is invite-only and includes some of the top researchers and talent in the field. The goal was to help move Deep Learning research forward faster. In fact, many of those NCAP members have been hired in the last few years by some of the top companies diving deep into the research. For example:

  • Hinton and Andrew Ng at Google
  • Yann LeCun at Facebook
  • Terrence Sejnowski at the US BRAIN Initiative

It's a field that has technically been around for a while, but it's really taking off now with what technology is capable of.

Structure

Regarding the structure, neural networks are complex and were originally modeled after the brain. They are highly connected nodes (processing elements) that process inputs based on statistical, adaptive weights. Basically, you pass in some chaotic set of inputs (with lots of noise) and the neural net puts it together as an output. It's like assembling a puzzle with all the pieces you feed it.
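To make the node-and-weights idea a bit more concrete, here is a minimal sketch in Python with NumPy (my own toy example, not code from any of the references below): a single node weights its inputs, sums them, and squashes the result with an activation function.

```python
import numpy as np

def node_output(inputs, weights, bias):
    """A single 'node': weight each input, sum them, then squash with a sigmoid."""
    weighted_sum = np.dot(weights, inputs) + bias
    return 1.0 / (1.0 + np.exp(-weighted_sum))  # sigmoid activation, output in (0, 1)

# Three noisy input signals ("puzzle pieces") and the node's current adaptive weights.
inputs = np.array([0.9, -0.3, 0.5])
weights = np.array([0.4, 0.1, -0.7])
bias = 0.1

print(node_output(inputs, weights, bias))  # a single value between 0 and 1
```

A full network is just many of these nodes wired together in layers, which is where the "highly connected" part comes in.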

Below is a diagram of a neural net from a presentation Hinton posted.

The overall goal of neural networks is feature engineering. It's about defining the key attributes/characteristics of the pieces that make up the puzzle you are constructing, and determining how to weight and use them to drive the overall result. For example, a feature could be a flat edge, and you would weight nodes (apply rules) to place those pieces as the boundary of the puzzle. The nodes would have some idea of how to pick up pieces and put them down to create the puzzle.

In order to define weights for the nodes, the neural net model is pre-trained on how to put the puzzle together, and the pre-training is driven by an objective function. An objective function is the mathematical score that the training process tries to optimize, so it can pick the best weights from the available alternatives. The function changes depending on the goals of the network. For example, you will have a different set of objectives for automatic speech recognition if you have an audience in the US vs. Australia. So your objectives take those differences into account to help adjust node weights through each training example and improve the output.
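As a simplified sketch of what "driven by an objective function" looks like in practice (again my own illustrative example, using a single node and made-up numbers): the objective here is squared error between the node's output and a known target, and each pass over the training example nudges the weights in the direction that lowers that error.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def objective(output, target):
    """Squared-error objective: smaller means the output is closer to the target."""
    return 0.5 * (output - target) ** 2

# One training example: inputs plus the output we want the node to produce.
inputs, target = np.array([0.9, -0.3, 0.5]), 1.0
weights, bias, learning_rate = np.array([0.4, 0.1, -0.7]), 0.1, 0.5

for step in range(100):
    output = sigmoid(np.dot(weights, inputs) + bias)
    # Gradient of the objective with respect to the weights (chain rule for a sigmoid node).
    error_signal = (output - target) * output * (1 - output)
    weights -= learning_rate * error_signal * inputs
    bias -= learning_rate * error_signal

print(objective(sigmoid(np.dot(weights, inputs) + bias), target))  # should now be close to 0
```

Swap in a different objective (say, one tuned for Australian accents vs. US accents) and the same training loop will pull the weights toward a different solution.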

A couple of other concepts regarding neural nets and Deep Learning are feedforward structures and backpropagation (backward propagation of errors). A feedforward structure passes inputs forward through a layer of nodes, where the nodes act independently on the inputs and the learning is unsupervised. So nodes can't see what each other is holding in regards to pieces and can only use their pre-trained weights to put the pieces in the place they think is best for the output. Restricted Boltzmann Machines and denoising autoencoders are examples of structures used in this unsupervised pre-training.
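To give a feel for the denoising-autoencoder idea, here is a purely structural sketch (my own toy example, with random, untrained weights): the input is deliberately corrupted with noise, pushed forward through an encoding layer, and decoded back into a reconstruction. A real denoising autoencoder would train these weights so the reconstruction matches the clean input.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A clean input vector and a corrupted copy (some values zeroed out as "noise").
clean = rng.random(8)
corrupted = clean * (rng.random(8) > 0.3)

# Random (untrained) weights for an encoding layer of 4 nodes and a decoding layer back to 8.
encode_weights = rng.normal(size=(4, 8))
decode_weights = rng.normal(size=(8, 4))

hidden = sigmoid(encode_weights @ corrupted)        # feedforward pass: encode
reconstruction = sigmoid(decode_weights @ hidden)   # feedforward pass: decode

print(np.round(clean, 2))
print(np.round(reconstruction, 2))  # untrained weights, so this won't match the clean input yet
```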

Backpropagation is used with multi-layered / stacked structures and supervised learning. It tweaks all the weights in the neural network based on the outputs and the defined labels for the data. Backprop can look at the output of the nodes at different points in the process of constructing the final picture (seeing how the pieces are starting to fit together). If the picture seems to have errors / pieces not coming together, it can adjust weights in the nodes throughout the network to improve results. Gradient descent is the optimization technique regularly paired with backprop: backprop works out how much each weight contributed to the error, and gradient descent uses that to update the weights. Example networks trained this way include Deep Belief Networks and Convolutional Neural Networks (regularly used in image and video processing).
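And here is a minimal sketch of backpropagation itself (again my own toy example, not any particular paper's network): a tiny two-layer network where the error at the output is propagated backwards through the layers, and gradient descent uses those error signals to adjust every weight.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
x, target = rng.random(3), np.array([1.0])            # one labeled training example
w1, w2 = rng.normal(size=(4, 3)), rng.normal(size=(1, 4))
learning_rate = 0.5

for step in range(200):
    # Forward pass through both layers.
    hidden = sigmoid(w1 @ x)
    output = sigmoid(w2 @ hidden)

    # Backward pass: propagate the output error back through each layer.
    output_delta = (output - target) * output * (1 - output)
    hidden_delta = (w2.T @ output_delta) * hidden * (1 - hidden)

    # Gradient descent step on every weight in the network.
    w2 -= learning_rate * np.outer(output_delta, hidden)
    w1 -= learning_rate * np.outer(hidden_delta, x)

print(output)  # should end up close to the target of 1.0
```

The puzzle analogy holds up here: the backward pass is the network checking how badly the pieces fit and telling every node how to adjust its grip.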

Last Thoughts

So I for one would love to see a Data from Star Trek: The Next Generation or a Samantha from Her, but neural nets are a long way off from creating that level of "smart tech". Plus, as mentioned above, they are one tool in the bigger picture of AI. They are a very cool tool and definitely beat out other algorithms in regards to the complexity of problems they can solve. They are fantastic at classification, prediction, pattern recognition and optimization, but they are weak in areas like logical inference, integrating abstract knowledge ('sibling' or 'identical to') and making sense of stories.

On the whole, Deep Learning is a fascinating space for the problems it can handle and is continuing to solve. It will be interesting to see what problems it solves next (especially with such big names putting research dollars behind it). Below are references I used to put together this overview, and there is plenty more material on the web for additional information.

References

Below are the references I used while researching the topic. It's not an exhaustive list, but it is a good start.

Side Note on Zipfian

On the whole it was another hectic week. In a very short note, we covered graph theory, NetworkX, the k-means algorithm, and clustering overall. There was a lot more detail to all of that, but considering my coverage above, I'm leaving the insight at that for this week.
