Speech recognition is the translation of human speech
into text by computers, by means of computational methods.
In this research review we examine three different methods
that are used in the speech recognition field and we
investigate the accuracy they achieve on different data sets.
We analyze state-of-the-art deep neural networks (DNNs),
which have evolved into complex architectures and
achieve significant results in many cases.
Afterward, we explain convolutional neural networks
(CNNs) and explore their potential in
this field. Finally, we present recent research
on highway deep neural networks (HDNNs), which
seem to be more flexible for resource-constrained
platforms. Overall, we critically compare
these methods and show their strengths and limitations.
We conclude that each method has its
advantages and its weaknesses, and that the methods
are used for different purposes.
I. Introduction
Machine learning (ML) is a field of computer science
that gives computers the ability to learn through
different algorithms and techniques without being explicitly
programmed. Automatic speech recognition (ASR) is closely
related to ML because it uses ML methodologies and
procedures [1, 2, 3]. ASR has been around for decades,
but only recently has it developed rapidly, thanks to
advances in both machine learning methods and computer
hardware. New ML techniques made speech recognition
accurate enough to be useful outside carefully controlled
environments, so it can now be deployed easily in many
electronic devices (e.g., computers and smartphones).
Speech is the most important mode of communication
between human beings, which is why efforts have been made
since the early part of the previous century
to make computers do what only humans could perceive.
Research has been conducted over the past five decades,
driven mainly by the desire to automate tasks
using machines [2]. Different theories, such as
probabilistic modeling and reasoning, pattern recognition,
and artificial neural networks, influenced
researchers and helped to advance ASR.
The first major advance in the history of ASR occurred
in the mid-1970s with the introduction of the
expectation-maximization (EM) algorithm [4] for training
hidden Markov models (HMMs). The EM technique made it
possible to develop the first speech recognition systems
using Gaussian mixture models (GMMs). Despite
all the advantages of GMMs, they are statistically
inefficient for modeling data that lie on or near a nonlinear
manifold in the data space. This problem could be solved
by artificial neural networks, but the computer hardware of
that era did not allow complex neural networks to be built.
As a result, most speech recognition systems were based
on HMMs, and later on the neural network and hidden
Markov model (NN/HMM) hybrid architecture, first
investigated in the early 1990s [5]. Since the 2000s,
improvements in computer hardware and the invention of
new machine learning algorithms have made it possible
to train DNNs. DNNs with many hidden
layers have been shown to outperform GMMs on a variety
of speech recognition benchmarks [6]. Other, more complex
neural architectures, such as recurrent neural networks
with long short-term memory units (LSTM-RNNs) [7] and
CNNs, seem to have their own benefits and applications.
In this literature review we present three types of artificial
neural networks (DNNs, CNNs, and HDNNs). We
analyze each method, explain how it is trained, and
discuss its advantages and disadvantages.
Finally, we compare these methods, identifying where each
one is more suitable and what its limitations are.
Furthermore we draw some conclusions from these comparisons
and we carefully suggest some probable future
II. Methods
A. Deep Neural Networks
B. Convolutional Neural Networks
C. Highway Deep Neural Networks
HDNNs are depth-gated feed-forward neural networks
[9]. They are distinguished from conventional
DNNs in two main ways. Firstly, they use far fewer
model parameters, and secondly, they use two types of gate
functions to facilitate the information flow through different
layers.
Informatics Research Review (s1736880)
HDNNs build on the conventional multi-layer network with L hidden layers.
In the first layer, the input is transformed using the
first layer's parameters, followed by a nonlinear activation
function. In each subsequent layer, the previous hidden layer
is transformed using the current layer's parameters, again followed
by a nonlinear activation function (e.g., the sigmoid function).
The output layer has its own parameters and an output
function, usually the softmax, which yields
the posterior probability of each class given the input
feature. Given target labels, the network is usually trained
by gradient descent to minimize a loss function such as
cross-entropy. However, as the number of hidden layers increases,
the error surface becomes increasingly non-convex,
and gradient-based optimization algorithms with random
initialization become more likely to find a poor local
minimum [23]. Furthermore, the variance of the back-propagated
gradients may become small in the lower layers
if the model parameters are not initialized properly [24].
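The forward pass and loss described above can be sketched in plain Python. This is only a minimal illustration under stated assumptions: the layer sizes, the uniform weight initialization, and all function names (`affine`, `init_layer`, `forward`, `cross_entropy`) are hypothetical choices made for this sketch and are not taken from the reviewed papers.

```python
import math
import random


def sigmoid(z):
    """Logistic sigmoid applied element-wise to a list of floats."""
    return [1.0 / (1.0 + math.exp(-v)) for v in z]


def affine(W, b, x):
    """Compute W x + b for a weight matrix W (list of rows) and bias b."""
    return [sum(w * v for w, v in zip(row, x)) + bi for row, bi in zip(W, b)]


def softmax(z):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases (an illustrative assumption)."""
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


def forward(x, hidden_layers, output_layer):
    """Each hidden layer transforms the previous activation; softmax on top."""
    h = x
    for W, b in hidden_layers:
        h = sigmoid(affine(W, b, h))
    W_out, b_out = output_layer
    return softmax(affine(W_out, b_out, h))


def cross_entropy(posteriors, target):
    """Negative log-probability assigned to the target class index."""
    return -math.log(posteriors[target])


rng = random.Random(0)
sizes = [4, 8, 8]  # input dimension, then L = 2 hidden layers (hypothetical)
hidden = [init_layer(sizes[i + 1], sizes[i], rng) for i in range(len(sizes) - 1)]
output = init_layer(3, sizes[-1], rng)  # 3 output classes (hypothetical)

posteriors = forward([0.5, -1.0, 0.25, 2.0], hidden, output)
loss = cross_entropy(posteriors, target=1)
```

Training would then adjust the weights by gradient descent to reduce `loss`; the gradient computation is omitted here to keep the sketch short.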
Highway deep neural networks (HDNNs) [17] were
proposed to enable very deep networks to be trained by
augmenting the hidden layers with gate functions. These are
the transform gate, which scales the original hidden activations,
and the carry gate, which scales the input before passing it
directly to the next hidden layer.
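A single gated layer of this kind can be sketched as follows. The sketch assumes the common highway formulation in which each gate is a sigmoid of its own affine transform of the layer input, and in which the layer input and output have the same width so that they can be combined element-wise. The function name `highway_layer` and the parameter values are hypothetical, chosen only to show the two extreme gate settings.

```python
import math


def sigmoid(z):
    """Logistic sigmoid applied element-wise to a list of floats."""
    return [1.0 / (1.0 + math.exp(-v)) for v in z]


def affine(W, b, x):
    """Compute W x + b for a weight matrix W (list of rows) and bias b."""
    return [sum(w * v for w, v in zip(row, x)) + bi for row, bi in zip(W, b)]


def highway_layer(h_prev, hidden_params, transform_params, carry_params):
    """One highway layer; every (W, b) pair maps width n to width n."""
    candidate = sigmoid(affine(*hidden_params, h_prev))  # usual hidden activation
    t = sigmoid(affine(*transform_params, h_prev))       # transform gate in (0, 1)
    c = sigmoid(affine(*carry_params, h_prev))           # carry gate in (0, 1)
    # Transform gate scales the hidden activation; carry gate scales the input.
    return [ti * ai + ci * xi for ti, ai, ci, xi in zip(t, candidate, c, h_prev)]


n = 3
zeros = [[0.0] * n for _ in range(n)]
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
h_prev = [0.2, -0.7, 1.5]
hidden_params = (identity, [0.0] * n)

# Carry gate open, transform gate closed: the input passes through almost unchanged.
carried = highway_layer(
    h_prev, hidden_params, (zeros, [-20.0] * n), (zeros, [20.0] * n)
)

# Transform gate open, carry gate closed: behaves like an ordinary sigmoid layer.
transformed = highway_layer(
    h_prev, hidden_params, (zeros, [20.0] * n), (zeros, [-20.0] * n)
)
```

With the gate biases pushed to these extremes, `carried` is close to `h_prev` and `transformed` is close to `sigmoid(h_prev)`; in a trained network the gates would take intermediate, input-dependent values.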
