Abstract
Since 2012, with the introduction of Convolutional Neural Networks (CNNs) for image recognition,
great improvements have been made in accuracy, topologies, and understanding of the
challenges associated with image recognition. Applications where CNNs have already
proven successful include self-driving cars, image tagging, and face recognition. Most of the
development has centered on increasing precision while keeping the number of computations as low
as possible. However, most topologies and applications still require expensive and power-hungry
graphics processing units (GPUs) to deliver fast responses. These systems are therefore
usually located either where the data is generated or in specially
designed data centers. Recently, there has been growing interest in, and research on, low-resource
architectures for use in embedded systems, although most of this work remains
theoretical.
In addition, although CNNs have applications other than image recognition, image recognition
in particular has proven quite controversial because of the use (or misuse) that some companies and
governments have made of it, while most of the research done by universities has remained more
theoretical.
The objective of this bachelor thesis is to use current theoretical knowledge about CNNs
to demonstrate their use in embedded systems while at the same time developing an application that
can benefit society as a whole. To this end, the thesis aims to take
photos of signs from the Swedish Sign Language alphabet and identify the letter
associated with each sign in a real-time system.
For that purpose, large amounts of data have been collected, analyzed, and processed, and the
embedded-friendly ZynqNet CNN topology has been modified to fit the application.
Together, these allow more than 85% of the images to be correctly identified using a regular GPU
training system.
In addition, a custom, high-throughput hardware accelerator for that topology has been
designed for implementation on an FPGA. Precision similar to that of the GPU has been obtained
while reducing space, weight, and power consumption. The FPGA accelerator will also reach
real-time performance, computing the results for each image in less than 1 second.