
Dynamic Capacity Networks in Embedded Systems

George Ryrstedt, LTH

Abstract:

Machine learning is becoming more prevalent on embedded hardware. These systems often lack resources such as memory capacity and processing power, which creates a need for machine learning networks that are economical with resources while maintaining a useful level of accuracy. Dynamic Capacity Networks (DCN) have previously been studied for their ability to reduce processing time in larger systems. The purpose of this thesis was to study the potential of a DCN-like attention mechanism and network design in embedded systems.

This approach uses three different networks: a coarse, a fine, and a top network. The entire input is passed through the coarse network. The output of the coarse network is then used as part of an attention mechanism that selects parts of the original image, and only these parts are passed into the fine network. The outputs of these two networks are then merged and passed into the top network. Two implementations were written, one using a convolutional network as the fine network and the other using a fully connected network for the same purpose.

Several datasets were used to evaluate the approach, and it appears able to provide useful improvements for some of them but not all. A modified version of the German Traffic Sign Recognition Dataset was used for testing, in which all images of traffic signs were downsized to 32x32 pixels and converted to 8-bit greyscale. With this dataset, the network using the fully connected fine network proved highly effective at improving accuracy compared to using only the coarse and top networks, and the attention mechanism significantly increased inference speed compared to passing the entire input through the fine network. A modified version of the Galaxy Zoo dataset was also used, in which all images were downsized to 212x212 pixels and converted to 16-bit greyscale. On this task, neither the fully connected fine network nor the convolutional variant showed any improvement in accuracy.
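
The following is a minimal sketch of the coarse/fine/top forward pass described above, written in PyTorch purely for illustration; it is not the thesis implementation. A simple activation-magnitude saliency stands in for the attention mechanism, and the patch size, feature widths, number of attended patches, and 32x32 greyscale input (as in the modified traffic sign dataset) are illustrative assumptions.

    # Illustrative DCN-style forward pass (assumed PyTorch, not the thesis code).
    import torch
    import torch.nn as nn

    PATCH = 8          # assumed side length of an attended patch
    N_PATCHES = 4      # assumed number of patches passed to the fine network

    class DCNSketch(nn.Module):
        def __init__(self, n_classes=43):
            super().__init__()
            # Coarse network: a cheap convolution over the full 32x32 input,
            # average-pooled so that each output cell covers one 8x8 patch.
            self.coarse = nn.Sequential(
                nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                nn.AvgPool2d(PATCH),
            )
            # Fine network: here a fully connected net applied per patch,
            # mirroring the thesis variant with a fully connected fine network.
            self.fine = nn.Sequential(
                nn.Flatten(), nn.Linear(PATCH * PATCH, 32), nn.ReLU(),
            )
            # Top network: classifies the merged coarse + fine representation.
            self.top = nn.Linear(8 * 4 * 4 + N_PATCHES * 32, n_classes)

        def forward(self, x):                      # x: (B, 1, 32, 32)
            c = self.coarse(x)                     # (B, 8, 4, 4), one cell per 8x8 patch
            saliency = c.abs().sum(dim=1)          # (B, 4, 4) crude attention scores
            idx = saliency.flatten(1).topk(N_PATCHES, dim=1).indices
            fine_feats = []
            for b in range(x.size(0)):
                feats = []
                for i in idx[b]:
                    # Map the flat index back to a patch location and cut that
                    # patch out of the original image.
                    row, col = divmod(int(i), c.size(-1))
                    patch = x[b:b+1, :, row*PATCH:(row+1)*PATCH,
                              col*PATCH:(col+1)*PATCH]
                    feats.append(self.fine(patch))  # only selected patches go through the fine net
                fine_feats.append(torch.cat(feats, dim=1))
            # Merge coarse and fine features and classify with the top network.
            merged = torch.cat([c.flatten(1), torch.cat(fine_feats, dim=0)], dim=1)
            return self.top(merged)

    # Usage: logits = DCNSketch()(torch.randn(2, 1, 32, 32))

The point of the design, as in the thesis, is that the fine network only ever sees N_PATCHES small patches instead of the whole image, which is where the inference-speed saving comes from.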