Deep learning architectures

Connectionist architectures have existed for more than 70 years, but new architectures and graphics processing units (GPUs) brought them to the forefront of artificial intelligence. The last two decades gave us deep learning architectures, which greatly expanded the number and type of problems neural networks can address. This article introduces five of the most popular deep learning architectures: recurrent neural networks (RNNs), long short-term memory (LSTM)/gated recurrent unit (GRU) networks, convolutional neural networks (CNNs), deep belief networks (DBNs), and deep stacking networks (DSNs). It then explores open source software options for deep learning.

Deep learning isn’t a single approach but rather a class of algorithms and topologies that you can apply to a broad spectrum of problems. While deep learning is certainly not new, it is experiencing explosive growth because of the intersection of deeply layered neural networks and the use of GPUs to accelerate their execution. Big data has also fed this growth. Because deep learning relies on supervised learning algorithms (those that train neural networks with example data and reward them based on their success), more training data generally produces better deep learning models.

Deep learning and the rise of the GPU

Deep learning consists of deep networks of varying topologies. Neural networks have been around for quite a while, but the development of networks with numerous layers (each providing some function, such as feature extraction) made them more practical to use. Adding layers means more interconnections and weights between and within the layers. This is where GPUs benefit deep learning: they make it practical to train and execute these deep networks, a task at which general-purpose processors are less efficient.
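To make the growth in weights concrete, here is a minimal sketch that counts the trainable parameters of a fully connected feed-forward network. The function name count_parameters and the layer sizes are illustrative assumptions, not figures from this article, and the sketch assumes every layer is densely connected to the next and has one bias per neuron.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected feed-forward network.

    layer_sizes lists the number of units per layer, input layer first.
    Assumes dense connections between consecutive layers and one bias
    per non-input neuron (an illustrative simplification).
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per connection
        total += n_out         # one bias per neuron in the receiving layer
    return total


# Hypothetical layer sizes chosen only for illustration.
shallow = count_parameters([784, 128, 10])
deep = count_parameters([784, 512, 512, 512, 10])
print(shallow, deep)
```

Under these assumptions, the deeper and wider network has roughly nine times as many parameters as the shallow one, and each of them must be read and updated at every training step.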

GPUs differ from traditional multicore processors in a few key ways. First, a traditional processor might contain 4-24 general-purpose cores, but a GPU might contain 1,000-4,000 specialized data processing cores.

The high density of cores makes the GPU highly parallel (that is, it can perform many computations at once) compared with traditional CPUs. This makes GPUs ideal for large neural networks in which many neurons can be computed at once (a traditional CPU can compute a considerably smaller number in parallel). GPUs also excel at floating-point vector operations, which suits neural networks well because a neuron's computation is little more than vector multiplication and addition. All of these characteristics make neural networks on GPUs what’s called embarrassingly parallel (that is, perfectly parallel, where little or no effort is required to parallelize the task).
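The following sketch illustrates why a layer of neurons parallelizes so easily. Each neuron computes a dot product of the inputs with its own weights, adds a bias, and applies an activation function, and no neuron depends on another neuron in the same layer. The function name dense_layer and the choice of a sigmoid activation are assumptions made for illustration. The code runs sequentially on a CPU; it only shows the independence that a GPU would exploit.

```python
import math


def dense_layer(inputs, weights, biases):
    """Compute the outputs of one fully connected layer.

    weights holds one row per neuron. Each loop iteration reads only the
    shared inputs and that neuron's own row, so the iterations could run
    in any order or all at once.
    """
    outputs = []
    for w_row, b in zip(weights, biases):
        # Vector multiplication and addition: dot product plus bias.
        pre_activation = sum(w * x for w, x in zip(w_row, inputs)) + b
        # Sigmoid activation (an assumed choice for this sketch).
        outputs.append(1.0 / (1.0 + math.exp(-pre_activation)))
    return outputs


# Two neurons, two inputs; the values are arbitrary.
activations = dense_layer(
    inputs=[1.0, 2.0],
    weights=[[0.5, -0.25], [0.0, 0.0]],
    biases=[0.0, 0.0],
)
print(activations)
```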

Deep learning architectures

The architectures and algorithms that are used in deep learning are many and varied. This section explores five of the deep learning architectures spanning the past 20 years. Notably, LSTM and CNN are two of the oldest approaches in this list but also two of the most used in various applications.

Recurrent neural networks

The RNN is one of the foundational network architectures from which other deep learning architectures are built. The primary difference between a typical multilayer network and a recurrent network is that, rather than having only feed-forward connections, a recurrent network might have connections that feed back into prior layers (or into the same layer). This feedback allows RNNs to maintain a memory of past inputs and to model problems in time.

RNNs consist of a rich set of architectures (we’ll look at one popular topology called LSTM next). The key differentiator is feedback within the network, which could manifest itself from a hidden layer, the output layer, or some combination thereof.
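As a rough illustration of feedback from a hidden layer into itself, the sketch below implements one simple recurrent form, in which the new hidden state is tanh of the weighted input plus the weighted previous hidden state plus a bias. The names rnn_step and run_rnn, the tanh activation, and the weight values are assumptions for this example; other RNN topologies wire the feedback differently.

```python
import math


def rnn_step(x, h_prev, w_xh, w_hh, b_h):
    """Advance a simple recurrent layer by one time step.

    x is the current input vector and h_prev the previous hidden state.
    w_xh[j][i] weights input i into hidden unit j; w_hh[j][k] weights
    previous hidden unit k into hidden unit j (the feedback connection).
    """
    h_new = []
    for j in range(len(h_prev)):
        total = b_h[j]
        total += sum(w_xh[j][i] * x[i] for i in range(len(x)))
        total += sum(w_hh[j][k] * h_prev[k] for k in range(len(h_prev)))
        h_new.append(math.tanh(total))
    return h_new


def run_rnn(sequence, hidden_size, w_xh, w_hh, b_h):
    """Feed a sequence through the layer, starting from a zero state."""
    h = [0.0] * hidden_size
    for x in sequence:
        h = rnn_step(x, h, w_xh, w_hh, b_h)
    return h


# One input and one hidden unit; the weights are arbitrary.
w_xh, w_hh, b_h = [[1.0]], [[0.5]], [0.0]
after_pulse = run_rnn([[1.0], [0.0]], 1, w_xh, w_hh, b_h)
after_silence = run_rnn([[0.0], [0.0]], 1, w_xh, w_hh, b_h)
print(after_pulse, after_silence)
```

Both sequences end with the same input of zero, yet the final hidden states differ. The feedback weight carries a trace of the earlier input forward, which is the memory of past inputs described above.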

LSTM/GRU networks

The LSTM was created in 1997 by Hochreiter and Schmidhuber, but it has grown in popularity in recent years as an RNN architecture for various applications. You’ll find LSTMs in products that you use every day, such as smartphones. IBM applied LSTMs in IBM Watson® for milestone-setting conversational speech recognition.

The LSTM departed from typical neuron-based neural network architectures and instead introduced the concept of a memory cell. The memory cell can retain its value for a short or long time as a function of its inputs, which allows the cell to remember what’s important and not just its last computed value.

The LSTM memory cell contains three gates that control how information flows into or out of the cell. The input gate controls when new information can flow into the memory. The forget gate controls when an existing piece of information is forgotten, allowing the cell to remember new data. Finally, the output gate controls when the information that is contained in the cell is used in the output from the cell. The cell also contains weights, which control each gate. The training algorithm, commonly backpropagation through time (BPTT), optimizes these weights based on the resulting network output error.
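The three gates can be sketched with the commonly used LSTM update equations, shown here for a single memory cell with scalar input and state. This is a minimal illustration under stated assumptions, not a production implementation. The function name lstm_step, the parameter dictionary layout, and the hand-picked weights are all hypothetical, and training with BPTT is omitted.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """Advance a single-unit LSTM cell by one time step.

    params maps each of "input", "forget", "output", and "candidate" to
    a tuple (w_x, w_h, b): input weight, recurrent weight, and bias.
    Returns the new output h and the new cell state c.
    """
    def pre_activation(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre_activation("input"))        # how much new information enters
    f = sigmoid(pre_activation("forget"))       # how much old state is kept
    o = sigmoid(pre_activation("output"))       # how much state is exposed
    g = math.tanh(pre_activation("candidate"))  # proposed new information

    c = f * c_prev + i * g   # updated memory cell contents
    h = o * math.tanh(c)     # output of the cell
    return h, c


# Hand-picked biases: input gate nearly closed, forget gate nearly open.
hold = {
    "input": (0.0, 0.0, -10.0),
    "forget": (0.0, 0.0, 10.0),
    "output": (0.0, 0.0, 10.0),
    "candidate": (1.0, 0.0, 0.0),
}
# Same cell, but with the forget gate driven toward zero.
erase = dict(hold, forget=(0.0, 0.0, -10.0))

_, c_held = lstm_step(x=1.0, h_prev=0.0, c_prev=0.8, params=hold)
_, c_erased = lstm_step(x=1.0, h_prev=0.0, c_prev=0.8, params=erase)
print(c_held, c_erased)
```

With the first set of weights the cell state stays close to its previous value of 0.8 despite the new input. With the second set it drops close to zero. In a trained network these gate weights are learned rather than chosen by hand.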

Media Contact:

Lina James

Managing Editor

Mail Id: computersci@scholarlypub.com 

American Journal of Computer Science and Engineering Survey

Whatsapp number: + 1-504-608-2390