Speech Recognition Using Shallow Neural Network Classification Project

Added on  2022/08/21

AI Summary
This project focuses on developing a speech recognition system to detect Parkinson's disease using a shallow neural network classification approach. The research utilizes the UCI machine learning repository's Parkinson Speech Dataset, which contains voice sample characteristics from both Parkinson's disease patients and healthy individuals. The methodology involves loading the dataset into MATLAB, preprocessing the data, and training the neural network. The project explores different neural network algorithms and quality measures such as performance scores, cross-entropy plots, and confusion matrices to evaluate the accuracy of the classification. The goal is to identify Parkinson's disease through voice analysis, potentially reducing the need for costly medical tests and enabling early detection. The project also discusses the scope of interest, dataset introduction, proposed methodology, algorithms, and evaluation metrics.
Running head: Speech Recognition Using Shallow Neural Network Classification
Speech Recognition Using Shallow Neural Network Classification
Name of the Student
Name of the University
Author Note
Introduction to research problem:
Speech recognition is a subfield of computational linguistics that develops methods
enabling machines to recognize spoken human language and convert it into text. It is also
known as automatic speech recognition, speech-to-text conversion or computer speech
recognition. A typical speech recognition system is trained on isolated words or a
vocabulary spoken by a person. The system analyses the voice of that particular person and
is then fine-tuned to that voice to increase recognition accuracy. A system trained on a
sample voice in this way is known as a speaker-dependent speech recognition system, whereas
a system that needs no training voice is known as a speaker-independent speech recognition
system. Applications of speech recognition include voice user interfaces such as voice
dialling, call routing, domotic appliance control and keyword searching. This research,
however, addresses a different kind of speech recognition application, in which a person is
identified as healthy or diseased from the attributes of their speech. Specifically, the
software should recognize from the attributes of a voice whether a person with a particular
disease or a healthy person is speaking to the system. Rather than detecting multiple
diseases, the software will be implemented for a single disease, Parkinson’s disease, in
which patients undergo voice changes along with other physical changes (Yu and Deng 2016).
The project will therefore use a dataset containing voice sample attributes of Parkinson’s
disease patients and of healthy individuals, which will be analysed by a neural network
that outputs whether a person is diseased or healthy. The system is treated here as
speaker-dependent because voice samples of healthy and diseased subjects are used to train
the neural network, which is then tested on a separate set of subjects.
Scope of interest:
The scope of this project is to detect patients with Parkinson’s disease from their voice,
without requiring a medical test. This reduces the cost and time of testing for Parkinson’s
disease and allows medication to be provided to patients quickly. Parkinson’s disease is a
nervous system disorder that mainly affects movement. Its symptoms worsen gradually: at an
early stage a tremor may be felt in just one hand, and at later stages the disease causes
stiffness, slow movement and slowed reflexes. There is no permanent cure for Parkinson’s
disease, but proper medication can significantly improve patients’ symptoms (Ascherio and
Schwarzschild 2016). The symptoms of Parkinson’s disease include tremor, bradykinesia, rigid
muscles, impaired balance and posture, loss of reflexes, changes in voice and changes in
handwriting. This project therefore aims to assist patients by detecting Parkinson’s disease
at an early stage using only a voice sample, which benefits individuals who cannot find a
medical testing facility near their residence or cannot afford the cost of testing.
Dataset introduction:
The dataset used by the software to classify voice samples as Parkinson’s disease positive
or negative is retrieved from the UCI Machine Learning Repository (Parkinson Speech Dataset
with Multiple Types of Sound Recordings). The training data contain voice sample
characteristics of 20 PD (Parkinson’s disease) patients and 20 healthy individuals who
attended the Department of Neurology of the Cerrahpasa Faculty of Medicine, Istanbul
University (UCI Machine Learning Repository: Parkinson Speech Dataset with Multiple Types of
Sound Recordings Data Set 2020). All healthy subjects and PD patients were instructed to
produce multiple types of sound, including numbers, sustained vowels, specific words and
short sentences. From each voice sample, 26 linear and frequency-based features were
extracted and the data were gathered in an Excel sheet. In addition, each patient has a
UPDRS (Unified Parkinson’s Disease Rating Scale) score determined by an expert physician,
which is stored alongside the 26 features. The test dataset was collected by the same
physician, under the same conditions, from 28 PD patients. In the test set, however, the 28
PD patients were asked only to say the sustained vowels ‘a’ and ‘o’ three times each, giving
a total of 168 recordings. The same 26 features were extracted from these 168 recordings.
The test set is independent of the training set because no PD patient appears in both. The
26 voice samples recorded per subject for the training data are described below.
1: sustained vowel (a)
2: sustained vowel (o)
3: sustained vowel (u)
4 to 13: numbers from 1 to 10
14 to 17: short sentences
18 to 26: words
Test data set voice sample description:
1 to 3: sustained vowel (a)
4 to 6: sustained vowel (o)
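The recording counts implied by this description can be checked with a few lines of arithmetic. The sketch below is in Python purely for illustration (the project itself uses MATLAB). The test total of 168 is stated in the text, while the training total is derived here from 40 subjects with 26 voice samples each.

```python
# Recording counts implied by the dataset description.
TRAIN_SUBJECTS = 20 + 20         # 20 PD patients + 20 healthy individuals
SAMPLES_PER_TRAIN_SUBJECT = 26   # vowels, numbers, short sentences, words

TEST_SUBJECTS = 28               # PD patients only
TEST_VOWELS = 2                  # sustained 'a' and 'o'
REPETITIONS = 3                  # each vowel is spoken three times

train_recordings = TRAIN_SUBJECTS * SAMPLES_PER_TRAIN_SUBJECT
test_recordings = TEST_SUBJECTS * TEST_VOWELS * REPETITIONS

print(train_recordings)  # 1040 (derived, not stated in the text)
print(test_recordings)   # 168 (matches the stated total)
```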
Variable information for training data:
1st column: id number of subject
2nd to 27th column: features of voice
28th column: UPDRS score
29th column: information of class (1 = PD, 0 = healthy)
Test data file variable information:
1st column: id number of subject
2nd to 27th column: features of voice
28th column: information of class (1 = PD, 0 = healthy)
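The column layouts above can be made concrete with a small parsing sketch. It is written in Python for illustration only, and it assumes that each recording is stored as one comma-separated row, which the text does not confirm (it only mentions an Excel sheet). The names `Recording` and `parse_row` are hypothetical.

```python
from typing import List, NamedTuple, Optional

N_FEATURES = 26  # voice features per recording, as described above


class Recording(NamedTuple):
    subject_id: int
    features: List[float]
    updrs: Optional[float]  # present only in the training file
    label: int              # 1 = PD, 0 = healthy


def parse_row(line: str, has_updrs: bool) -> Recording:
    """Split one comma-separated row into the documented columns."""
    fields = [f.strip() for f in line.split(",")]
    expected = 1 + N_FEATURES + (1 if has_updrs else 0) + 1
    if len(fields) != expected:
        raise ValueError(f"expected {expected} columns, got {len(fields)}")
    subject_id = int(fields[0])
    features = [float(v) for v in fields[1:1 + N_FEATURES]]
    updrs = float(fields[1 + N_FEATURES]) if has_updrs else None
    label = int(fields[-1])
    if label not in (0, 1):
        raise ValueError(f"class must be 0 or 1, got {label}")
    return Recording(subject_id, features, updrs, label)


# Synthetic rows (not real data): 29 columns for training, 28 for test.
train_row = ",".join(["7"] + ["0.5"] * N_FEATURES + ["23", "1"])
test_row = ",".join(["3"] + ["0.1"] * N_FEATURES + ["1"])
print(parse_row(train_row, has_updrs=True).updrs)   # 23.0
print(parse_row(test_row, has_updrs=False).updrs)   # None
```

Checking the column count before conversion makes a training row passed with the test layout (or the reverse) fail loudly instead of silently shifting the class column.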
Proposed methodology:
First, the entire dataset, containing both training and test data, is loaded into a
programming environment. MATLAB is chosen for this project because it has many built-in
libraries and functions for neural networks and artificial intelligence, which are needed
for training and testing with the chosen Parkinson’s data. The dataset will then be
pre-processed to remove any missing or invalid instances, and the binary class variable will
be split into two columns so that MATLAB’s built-in training functions can be applied to the
data. The dataset is already divided into training and test sets, so there is no need to
specify training, testing and validation ratios. The test voice samples are also used for
validation (Kim and Gofman 2018). The accuracy of the test or validation results depends on
the initial weights and bias vectors of the neural network, so different initial weights and
biases should be tried by trial and error until the desired classification accuracy is
achieved. Accuracy can also be improved by increasing the size of the hidden layer in the
network and/or the number of training vectors.
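The two pre-processing steps described above (dropping missing or invalid instances and splitting the binary class into two columns) might look roughly as follows. This is a Python sketch for illustration rather than the MATLAB code of the project. The helper names and the column order (healthy, PD) are assumptions, and the rows are synthetic.

```python
import math
from typing import List, Optional, Sequence, Tuple


def is_valid(features: Sequence[Optional[float]], label: Optional[int]) -> bool:
    """Keep a row only if no value is missing or NaN and the class is 0 or 1."""
    if label not in (0, 1):
        return False
    return all(v is not None and not math.isnan(v) for v in features)


def one_hot(label: int) -> Tuple[int, int]:
    """Split the binary class into two target columns: (healthy, PD)."""
    return (1, 0) if label == 0 else (0, 1)


def preprocess(rows) -> Tuple[List[List[float]], List[Tuple[int, int]]]:
    """Drop invalid rows and return (inputs, two-column targets)."""
    inputs, targets = [], []
    for features, label in rows:
        if is_valid(features, label):
            inputs.append(list(features))
            targets.append(one_hot(label))
    return inputs, targets


# Synthetic rows with two features each, for brevity.
rows = [
    ([0.2, 1.5], 1),           # valid PD row
    ([0.4, float("nan")], 0),  # missing feature -> dropped
    ([0.1, 0.9], 0),           # valid healthy row
    ([0.3, 0.7], 2),           # wrong class code -> dropped
]
inputs, targets = preprocess(rows)
print(inputs)   # [[0.2, 1.5], [0.1, 0.9]]
print(targets)  # [(0, 1), (1, 0)]
```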
Algorithm/s to be applied:
Sometimes the results do not improve much even after trying different initial weights, a
larger hidden layer and more training vectors. In that case a different training algorithm
should be chosen by analysing the data and the test results. The default training algorithm
of the neural network is the scaled conjugate gradient algorithm, so if it does not achieve
the desired classification accuracy, another algorithm must be tried. The other built-in
algorithms provided by MATLAB are Levenberg-Marquardt, Bayesian Regularization, BFGS
Quasi-Newton, Resilient Backpropagation, Conjugate Gradient with Powell/Beale Restarts,
Fletcher-Powell Conjugate Gradient, Polak-Ribiére Conjugate Gradient, One Step Secant,
Variable Learning Rate Gradient Descent and Gradient Descent (Cömert and Kocamaz 2017). This
wide variety of algorithms makes it more likely that a satisfactory accuracy can be reached
for many types of data.
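The fallback strategy described above (start with the default algorithm and move to another one only if the target accuracy is not reached) can be sketched as a simple loop. The sketch is in Python for illustration. The strings in the table are the MATLAB training function names for the listed algorithms, whereas the `score` callback (which would train a network with the given function and return its test accuracy) and the accuracy values in the usage example are hypothetical placeholders, not results.

```python
from typing import Callable, Optional, Tuple

# Algorithm name -> MATLAB training function name, default algorithm first.
TRAINING_FUNCTIONS = {
    "Scaled Conjugate Gradient": "trainscg",
    "Levenberg-Marquardt": "trainlm",
    "Bayesian Regularization": "trainbr",
    "BFGS Quasi-Newton": "trainbfg",
    "Resilient Backpropagation": "trainrp",
    "Conjugate Gradient with Powell/Beale Restarts": "traincgb",
    "Fletcher-Powell Conjugate Gradient": "traincgf",
    "Polak-Ribiere Conjugate Gradient": "traincgp",
    "One Step Secant": "trainoss",
    "Variable Learning Rate Gradient Descent": "traingdx",
    "Gradient Descent": "traingd",
}


def pick_algorithm(score: Callable[[str], float],
                   target: float) -> Tuple[Optional[str], float]:
    """Try algorithms in order and stop at the first one that reaches target.

    Returns the best (name, accuracy) pair seen, so a result is available
    even if no algorithm reaches the target.
    """
    best_name, best_acc = None, -1.0
    for name, fcn in TRAINING_FUNCTIONS.items():
        acc = score(fcn)
        if acc > best_acc:
            best_name, best_acc = name, acc
        if acc >= target:
            break
    return best_name, best_acc


# Placeholder accuracies, made up for the example only.
fake_scores = {"trainscg": 0.70, "trainlm": 0.82, "trainbr": 0.91}
result = pick_algorithm(lambda fcn: fake_scores.get(fcn, 0.5), target=0.80)
print(result)  # ('Levenberg-Marquardt', 0.82)
```

Stopping at the first algorithm that meets the target keeps the number of training runs small. Removing the `break` would instead compare all eleven algorithms.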
Quality measures for evaluation:
The quality measures used to assess the accuracy of the results are the performance score
on the test results, the cross-entropy plot of training, testing, validation and best
performance, and the confusion matrix. The number of epochs that gives the best performance
can be read from the cross-entropy plot, and that number will be used in the training model.
The true positive and false negative percentages of the confusion matrix indicate the
performance of the neural network in the test and validation phase: they show how many
instances of the test set the network and training algorithm classify correctly and how many
they miss.
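The three measures can be illustrated with a short Python sketch. It uses the textbook definition of mean binary cross-entropy, which may differ in normalisation from the value MATLAB reports, and all numbers in the usage example are synthetic.

```python
import math
from typing import Sequence, Tuple


def cross_entropy(p_pd: Sequence[float], labels: Sequence[int]) -> float:
    """Mean binary cross-entropy; p_pd[i] is the predicted probability of PD."""
    eps = 1e-12  # keeps log() finite for probabilities of exactly 0 or 1
    total = 0.0
    for p, y in zip(p_pd, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def confusion_counts(pred: Sequence[int],
                     labels: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FN, FP, TN) with PD (class 1) as the positive class."""
    pairs = list(zip(pred, labels))
    tp = sum(1 for p, y in pairs if p == 1 and y == 1)
    fn = sum(1 for p, y in pairs if p == 0 and y == 1)
    fp = sum(1 for p, y in pairs if p == 1 and y == 0)
    tn = sum(1 for p, y in pairs if p == 0 and y == 0)
    return tp, fn, fp, tn


def best_epoch(val_losses: Sequence[float]) -> int:
    """1-based epoch with the lowest validation cross-entropy."""
    return min(range(len(val_losses)), key=val_losses.__getitem__) + 1


labels = [1, 1, 1, 0]
pred = [1, 1, 0, 0]
tp, fn, fp, tn = confusion_counts(pred, labels)
print(tp, fn, fp, tn)                                # 2 1 0 1
print(round(100 * tp / (tp + fn), 1))                # 66.7 (true positive %)
print(best_epoch([0.69, 0.52, 0.41, 0.44, 0.50]))    # 3
```

If a test set contains only PD recordings, as described for the 168 test recordings, the false positive and true negative counts are zero by construction, which is why the true positive and false negative percentages carry the information.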
References:
Archive.ics.uci.edu, 2020. UCI Machine Learning Repository: Parkinson Speech Dataset with
Multiple Types of Sound Recordings Data Set. [online] Available at:
<https://archive.ics.uci.edu/ml/datasets/Parkinson+Speech+Dataset+with++Multiple+Types+of+Sound+Recordings#>
[Accessed 19 March 2020].
Cömert, Z. and Kocamaz, A.F., 2017. A study of artificial neural network training
algorithms for classification of cardiotocography signals. Bitlis Eren University Journal of
Science and Technology, 7(2), pp.93-103.
Yu, D. and Deng, L., 2016. Automatic Speech Recognition. Springer London Limited.
Ascherio, A. and Schwarzschild, M.A., 2016. The epidemiology of Parkinson's disease: risk
factors and prevention. The Lancet Neurology, 15(12), pp.1257-1272.
Kim, D.E. and Gofman, M., 2018, January. Comparison of shallow and deep neural networks
for network intrusion detection. In 2018 IEEE 8th Annual Computing and Communication
Workshop and Conference (CCWC) (pp. 204-208). IEEE.