An illustrated guide to dynamic neural networks for beginners


SOURCE: ANALYTICSINDIAMAG.COM
OCT 10, 2021

Dynamic neural networks are a rapidly emerging research topic in deep learning. Traditional static neural networks are trained once and then run with fixed parameters and a fixed computation for every input. In practice, however, inputs and deployment environments vary widely, so it is useful to have models that adjust themselves automatically to the input and the environment. Dynamic neural networks are designed to adapt in exactly this way. In this article, we will discuss dynamic neural networks in detail along with their popular categories. The major points to be covered in this article are listed below.

Table of Contents

  1. Introduction
  2. What are the Dynamic Neural Networks?
  3. Advantages of Dynamic Neural Networks
  4. Categories of Dynamic Neural Networks
    1. Sample-wise Dynamic Networks
    2. Spatial-wise Dynamic Networks
    3. Temporal-wise Dynamic Networks

Let’s begin by understanding the context in which dynamic neural networks arise.

Introduction

Deep neural networks are now used for a wide range of problems in computer vision and natural language processing, and models such as ResNet, VGG, and GoogLeNet perform very well. Most of these models, however, work in a static manner: their computational graphs and parameters are fixed once training is finished, which limits their interpretability, efficiency, and representation power. Dynamic neural networks can overcome these limitations of static neural networks.

What are the Dynamic Neural Networks?

Dynamic neural networks can be seen as an improvement on static neural networks: by adding decision mechanisms, the network can adapt its structure or parameters to each input and produce better results. These decision mechanisms let the network choose which computation to perform on a given input, so that it reaches the required output more accurately and with greater representation power. Such networks do not follow one fixed computation path; they learn from the input and the environment and can change the path accordingly, which gives good output without spending unnecessary computation. Because this adaptation happens at run time, while the model is processing data, we put the word ‘dynamic’ in front of ‘neural network’. The next section lists the advantages of dynamic neural networks, which will make the idea clearer.

Advantages of Dynamic Neural Networks

The following are the advantages of dynamic neural networks:

  • Efficiency: dynamic neural networks can allocate computation on demand at test time by selectively activating their components. For example, a small or easy input can be given less computation and a large or hard input more. Besides this computational efficiency, dynamic models can also be data efficient, for instance in few-shot learning.
  • Representation power: dynamic models have input-dependent architectures and parameters, which gives them an enlarged parameter space and improved representation power. For example, with only a small increase in computation, a model can apply feature-conditioned attention weights in a convolutional neural network.
  • Adaptiveness: dynamic neural networks can reach different trade-offs between accuracy and efficiency at run time, depending on the computational cost they are allowed. That is why they adapt well to different environments and machines: a change in the environment does not stop the model from running, and the accuracy and efficiency can be adjusted to the available budget.
  • Compatibility: advanced techniques for improving performance, such as algorithm optimization, data preprocessing, and architecture design optimization, work well with static models and help them achieve state-of-the-art performance. These techniques are also compatible with dynamic neural networks. In addition, acceleration techniques developed for static models, such as network pruning and low-rank approximation, can be used with dynamic neural networks to improve their efficiency further.
  • Interpretability: neural networks are inspired by the human brain, and the brain makes its decisions dynamically. What static models lack is exactly this ability to decide dynamically. A network needs to process its input as the brain does, observing which parts of the input are useful for further processing and which are not. Dynamic neural networks can do this, and this is why we call them data-dependent neural networks.
  • Generality: dynamic models can be used for a wide range of applications such as image classification, image segmentation, and object detection. Many dynamic models are based on general approaches, so techniques developed for computer vision can be applied to many vision problems and can also be carried over to NLP problems.
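The representation-power point above can be made concrete with a small sketch of input-dependent parameters. This is only an illustration under assumptions, not the method of any particular paper: the functions `softmax`, `dot`, and `dynamic_linear`, the two candidate weight vectors, and the scoring rule are all hypothetical names and values chosen for the example. The idea is that the effective weights are a mixture of several candidate weight sets, and the mixture coefficients (attention weights) are computed from the input itself.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

def dynamic_linear(x, candidate_weights, scoring_weights):
    """Linear unit whose weights depend on the input (hypothetical sketch).

    x                 -- input feature vector
    candidate_weights -- K candidate weight vectors
    scoring_weights   -- K vectors used to score each candidate from x
    Returns (output, attention), where attention sums to 1.
    """
    # Feature-conditioned attention: one score per candidate, computed from x.
    attention = softmax([dot(s, x) for s in scoring_weights])
    # Effective weights = attention-weighted sum of the candidates.
    mixed = [
        sum(a * w[i] for a, w in zip(attention, candidate_weights))
        for i in range(len(x))
    ]
    return dot(mixed, x), attention

# Toy values, assumed purely for illustration.
CANDIDATES = [[1.0, 0.0], [0.0, 1.0]]
SCORERS = [[5.0, 0.0], [0.0, 5.0]]

out_a, att_a = dynamic_linear([2.0, 0.0], CANDIDATES, SCORERS)
out_b, att_b = dynamic_linear([0.0, 2.0], CANDIDATES, SCORERS)
print(att_a, att_b)  # each input puts almost all attention on a different candidate
```

The parameter count grows with the number of candidates, but each input is still processed by a single mixed weight vector, so the extra computation is limited to the scoring and mixing steps.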

Categories of Dynamic Neural Networks

Dynamic neural networks fall into three categories. Let us discuss each of them in turn.

Sample-wise Dynamic Networks

This type of dynamic neural network allocates computation separately for every sample. If a sample is easy, the network can still predict accurately while spending less computation; if a sample is difficult, the network can spend more computation to reach better accuracy. Sample-wise networks come in two forms: some adapt the architecture to the sample, while others adapt the network parameters and keep the computational graph fixed. The major goal in both cases is to avoid redundant computation and to increase representation power at minimal cost.

Acceleration techniques for static models keep the computation constant: the network performs the same computation for any kind of data. A dynamic architecture instead allows conditional computation, which can be obtained by adjusting the depth or width of the network or by dynamic routing within a supernetwork. Such a network saves computation on easy inputs and reserves its full capacity for hard inputs. If the input is easy, an output can be produced from the shallow part of the model without running all the layers; skipping the remaining layers saves their computation, and the output for the easy input is still accurate.
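The idea of producing a shallow output for easy inputs can be sketched as an early-exit loop. This is a minimal illustration under assumptions: `early_exit_predict`, `toy_layer`, `toy_head`, and the confidence threshold are hypothetical, and real networks use learned layers and classifiers rather than these toy functions.

```python
import math

def early_exit_predict(x, layers, exit_heads, threshold=0.9):
    """Run layers in order and stop at the first confident exit head.

    layers     -- list of functions mapping features to features
    exit_heads -- one classifier per layer, returning class probabilities
    Returns (predicted_class, number_of_layers_used).
    """
    if not layers or len(layers) != len(exit_heads):
        raise ValueError("need one exit head per layer and at least one layer")
    h = x
    for depth, (layer, head) in enumerate(zip(layers, exit_heads), start=1):
        h = layer(h)
        probs = head(h)
        confidence = max(probs)
        # Exit early if confident; the last layer always produces an answer.
        if confidence >= threshold or depth == len(layers):
            return probs.index(confidence), depth

def toy_layer(h):
    """Hypothetical layer: amplifies the evidence carried by the feature."""
    return 2.0 * h

def toy_head(h):
    """Hypothetical two-class head based on a sigmoid of the feature."""
    p = 1.0 / (1.0 + math.exp(-h))
    return [1.0 - p, p]

LAYERS = [toy_layer, toy_layer, toy_layer]
HEADS = [toy_head, toy_head, toy_head]

print(early_exit_predict(3.0, LAYERS, HEADS))   # clear input: exits after 1 layer
print(early_exit_predict(0.3, LAYERS, HEADS))   # ambiguous input: uses all 3 layers
```

Raising the threshold makes the network exit later on average, which trades extra computation for more confident predictions.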

We can skip layers in the following ways:

  • Layer skipping based on halting score – in this type, a halting score computed from the features decides whether the features are passed on to the next layer or not. The image below shows that feature x4 does not rely on the halting score.


  • Layer skipping based on a gate function – in this type, a gate function decides, on the basis of intermediate features, whether to execute the block.


The above image represents the block architecture of a sample-wise dynamic network with a gate module.

  • Layer skipping based on a policy network – a separate policy network generates the skipping decisions for the layers in the main network.


The above image represents a block architecture of a sample-wise dynamic network with a policy network.
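Two of the layer-skipping strategies above can be sketched in a few lines. This is a simplified illustration under assumptions, not a reference implementation: `gated_residual_forward`, `halting_forward`, the toy block, gate, and halting unit are hypothetical, and the halting rule shown (accumulate a score per layer and stop once it passes a threshold) is one simple way such a score could be used. In real networks the gate and the halting unit are small learned modules.

```python
def gated_residual_forward(x, blocks, gates):
    """Residual network in which each gate decides whether to run its block.

    Returns (output, list of booleans telling which blocks were executed).
    """
    h = x
    executed = []
    for block, gate in zip(blocks, gates):
        if gate(h):
            h = h + block(h)        # residual update
            executed.append(True)
        else:
            executed.append(False)  # identity shortcut: the block is skipped
    return h, executed

def halting_forward(x, layers, halting_unit, threshold=0.99):
    """Run layers until the accumulated halting score reaches the threshold.

    Returns (output, number_of_layers_used); later layers are skipped.
    """
    h = x
    cumulative = 0.0
    used = 0
    for layer in layers:
        h = layer(h)
        used += 1
        cumulative += halting_unit(h)
        if cumulative >= threshold:
            break
    return h, used

# Hypothetical toy components, chosen only to make the behaviour visible.
def add_one_block(h):
    return 1.0

def small_feature_gate(h):
    return h < 2.0  # run the block only while the feature is still small

def add_one_layer(h):
    return h + 1.0

def constant_halting_unit(h):
    return 0.4

BLOCKS = [add_one_block] * 3
GATES = [small_feature_gate] * 3
print(gated_residual_forward(0.0, BLOCKS, GATES))
print(halting_forward(0.0, [add_one_layer] * 4, constant_halting_unit))
```

A policy network plays the same role as the gates here, except that it produces all skipping decisions up front from the input instead of one decision per block from intermediate features.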

Spatial-wise Dynamic Networks

This type of dynamic neural network is designed mainly for computer vision problems. In most image processing tasks, not all pixels of the image contribute equally to the result, yet static models spend the same computation on every pixel. This is a drawback of static neural networks, because much of the computation is spent on regions that add little to the output. Spatial-wise dynamic networks perform spatially dynamic computation, which reduces this redundancy. In other words, working only on those locations that are responsible for the output can give a model high accuracy with less computation.

Spatial-wise dynamic networks adapt inference to the different locations in an image. The convolutional layers and filters process locations dynamically, at a granularity that can range from single pixels to regions to the whole image. At the pixel level, the adaptation is achieved by varying the depth and width of the network per pixel. In short, these models provide a spatial allocation of computation.
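Spatial allocation of computation can be sketched with a binary mask that marks which pixels are worth processing. This is a minimal illustration under assumptions: `threshold_mask` and `sparse_pixel_apply` are hypothetical helpers, and the brightness threshold stands in for the learned mask generator a real model would use.

```python
def threshold_mask(image, threshold):
    """Hypothetical mask: 1 for pixels brighter than the threshold, else 0."""
    return [[1 if p > threshold else 0 for p in row] for row in image]

def sparse_pixel_apply(image, mask, op):
    """Apply `op` only where the mask is 1 and copy other pixels unchanged.

    Returns (output_image, number_of_pixels_actually_computed).
    """
    output = []
    computed = 0
    for img_row, mask_row in zip(image, mask):
        out_row = []
        for pixel, keep in zip(img_row, mask_row):
            if keep:
                out_row.append(op(pixel))  # expensive computation happens here
                computed += 1
            else:
                out_row.append(pixel)      # skipped location
        output.append(out_row)
    return output, computed

def double(pixel):
    """Stand-in for an expensive per-pixel operation."""
    return pixel * 2

IMAGE = [
    [0, 9, 0],
    [0, 8, 7],
    [0, 0, 0],
]
MASK = threshold_mask(IMAGE, threshold=5)
result, pixels_computed = sparse_pixel_apply(IMAGE, MASK, double)
print(result, pixels_computed)  # only 3 of the 9 pixels are processed
```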

Spatial-wise dynamic networks can be divided into three types according to how they work.

  • Pixel-level dynamic network – this is the most common kind of spatial-wise dynamic network. The model performs computation at the pixel level and adapts the computation to each pixel. This can be achieved in two ways: by making the model architecture dynamic for specific pixels, or by making the parameters dynamic for specific pixels.


The above image is a representation of the pixel-level dynamic network, where the black portion of the image determines the pixels (green) that are required for computation.

  • Region-level dynamic network

Pixel-level dynamic networks need a fine-grained level of computation that can slow down image processing in practice, and running them efficiently may require additional hardware acceleration. As an alternative, we can use a region-level dynamic network, in which the model adapts its computation to regions or patches of the input image. This can be done by applying parameterized transformations to a region of the feature map or by learning patch-level dynamic transformations.


The above image is a representation of the region-level dynamic network where the region selection module generates the transformation parameters and the selected region is further processed by the network.

  • Resolution-level dynamic networks

The methods discussed above have to divide feature maps into different areas to make the computation adaptive, because they work on pixels or patches of the image. Resolution is another major factor that models can exploit in image processing. In this type of model, the whole image is processed, but its feature representation is computed at an adaptive resolution. A low-resolution representation can be enough for an easy sample, while a hard sample needs a high-resolution one. Hence the resolution-level dynamic network exploits spatial redundancy from the perspective of feature resolution.
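The resolution-level idea can be sketched as "try the cheap low-resolution input first, and only fall back to full resolution when the prediction is not confident". This is an illustration under assumptions: `downsample`, `adaptive_resolution_predict`, the toy classifier, and the threshold are all hypothetical, and real models usually adapt the resolution of intermediate feature maps rather than of a raw toy image.

```python
def downsample(image, factor=2):
    """Average-pool a 2D list by `factor` (height and width must be divisible)."""
    height, width = len(image), len(image[0])
    return [
        [
            sum(image[r + dr][c + dc] for dr in range(factor) for dc in range(factor))
            / factor ** 2
            for c in range(0, width, factor)
        ]
        for r in range(0, height, factor)
    ]

def adaptive_resolution_predict(image, classifier, threshold=0.8, factor=2):
    """Classify at low resolution first; use full resolution only if unsure.

    `classifier` returns (label, confidence). Returns (label, resolution_used).
    """
    label, confidence = classifier(downsample(image, factor))
    if confidence >= threshold:
        return label, "low"
    label, confidence = classifier(image)
    return label, "high"

def toy_classifier(image):
    """Hypothetical classifier: looks for any pixel brighter than 0.5."""
    brightest = max(max(row) for row in image)
    label = "object" if brightest > 0.5 else "background"
    confidence = abs(brightest - 0.5) * 2
    return label, confidence

EASY = [[1.0] * 4 for _ in range(4)]      # large bright object
HARD = [[0.0] * 4 for _ in range(4)]
HARD[1][2] = 1.0                          # a single bright pixel

print(adaptive_resolution_predict(EASY, toy_classifier))
print(adaptive_resolution_predict(HARD, toy_classifier))
```

In the toy example the single bright pixel is blurred away by downsampling, so the low-resolution prediction is not confident and the full-resolution image is used instead.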

Temporal-wise Dynamic Networks

As the name suggests, adaptive computation can also be applied to data that is temporal or sequential, such as time series or text. In spatial-wise dynamic models, we saw how the computation adapts to features such as the pixels, patches, and resolution of an image. In the same way, we can build a model whose computation adapts along the sequence: it identifies which portions of the sequential data strongly affect the result and which have little or no effect, so that they can be treated differently and computation is spent where it matters. A model that is trained to distinguish between such portions of sequential data and to adapt its computation to them can be called a temporal-wise dynamic network. These models can be divided into two types according to the data they use:

  • RNN-based Dynamic Text Processing – a traditional static RNN reads the whole input sequentially and updates its hidden state at every time step. A dynamic RNN can instead use an adaptive reading procedure that avoids spending computation on tokens that are irrelevant to the output. This can be achieved by early skipping, where the model has already learned which tokens are relevant to the output, or by jumping to arbitrary locations, where the model makes its skipping decisions in real time. Since RNN models are well known for learning from sequential data, using them dynamically on time series or text makes a strong tool for better performance.
  • Temporal-wise Dynamic Video Recognition – video can also be treated as sequential data, where the inputs are sequentially organized frames. For this kind of data, temporal-wise dynamic networks allocate computation adaptively, so that the model learns from the informative frames and skips those that are not required. This is similar to RNN-based dynamic text processing: adaptive computation is achieved either by dynamically updating the hidden state at each time step of a recurrent model or by an adaptive pre-sampling procedure that selects keyframes.


The above image shows block diagrams of the steps of temporal-wise dynamic networks. The first three approaches allocate computation dynamically at each step by skipping the update, partially updating the state, or performing conditional computation in a hierarchical structure. The agent in (d) decides where to read in the next step.
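The "skip the update" strategy can be sketched with a tiny recurrent loop that leaves its hidden state untouched for tokens judged irrelevant. This is a minimal illustration under assumptions: `skip_rnn`, the stop-word list, the word scores, and the update rule are hypothetical stand-ins for the learned skipping decision, embeddings, and recurrent cell of a real model.

```python
def skip_rnn(tokens, should_skip, embed, update, h0=0.0):
    """Recurrent loop that skips the state update for irrelevant tokens.

    should_skip(token, h) -- decides whether to skip this time step
    embed(token)          -- maps a token to an input value
    update(h, x)          -- recurrent state update
    Returns (final_hidden_state, number_of_updates_performed).
    """
    h = h0
    updates = 0
    for token in tokens:
        if should_skip(token, h):
            continue  # keep the previous hidden state, spend no computation
        h = update(h, embed(token))
        updates += 1
    return h, updates

# Hypothetical toy components for a sentiment-like score.
STOP_WORDS = {"the", "was", "a"}
WORD_SCORES = {"great": 1.0, "bad": -1.0}

def skip_stop_words(token, h):
    return token in STOP_WORDS

def embed_word(token):
    return WORD_SCORES.get(token, 0.0)

def decay_update(h, x):
    return 0.5 * h + x

state, steps = skip_rnn("the movie was great".split(),
                        skip_stop_words, embed_word, decay_update)
print(state, steps)  # 2 updates instead of 4
```

The same loop applies to video if the tokens are frames and the skipping decision selects keyframes.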

Final words

In this article, we have gone through an overview of dynamic neural networks and seen that they can be divided into three categories. Temporal-wise and spatial-wise dynamic networks are task-specific: they are used for modelling sequential data and images respectively. Sample-wise dynamic networks can be used for predictive analysis, because they adapt their computation to easy and hard samples, which saves computation and improves representation power. We have also seen some of the advantages of dynamic neural networks and how they help to improve overall performance.