Cryptographic Key Generation from Voice: A Summary Report (2019)

Added on 2025/04/28

INDEX
Contents
I. Introduction
II. “Voice” as a Biometric Data
III. Adopted Criteria for the Key Generation
A. Automatic Speech Recognition (ASR) and the related Issues
B. Application of the Derived Cryptographic Keys
IV. Techniques used for Generating the Cryptographic Keys
A. Generation of Key using the Biometrics Data
B. Automatic Speech Processing and Generation Technique
V. Applied Algorithms
A. Frame Segmentation
B. Mapping of the Segments
C. Evaluation & Results
VI. Future Work
VII. Recommendations
VIII. Conclusion
IX. References

I. Introduction
The paper proposes a cryptographic technique that generates a key from the user’s voice, a form
of biometric data. Humans struggle to memorize long passwords and patterns, let alone
cryptographic keys, and this limits the strength of practical security systems. To address this
limitation, a cryptographic key is generated from the user’s biometric data together with a
password.
Successful implementation of such a technique would help resolve these security issues. The
combination of the cryptographic key and an ordinary password can be used in many fields, such
as authentication, ID verification, privacy in personal networks, and file and document
encryption. The main focus is on using the user’s voice as the biometric information from which
the password key is derived.
II. “Voice” as a Biometric Data
Biometrics are used in many industries and research fields for verification and security
purposes. Several types of biometric data are used to recognize and verify an individual’s
identity. Commonly used traits include the retina, iris, fingerprint, face, ear, DNA, and voice,
although many other biometric recognition techniques exist.
The paper studies voice as the biometric data. The reasons for selecting voice for cryptographic
key generation are:
• Users are familiar with their own voice and use it regularly.
• Voice is frequently used for communication.
• Many modern devices support voice activation and integration.
• Capturing it needs only a microphone, which is a common and cheap device.
• The key is tied to the voice as well as to the password: because voices vary from user to
user, the same password spoken by a different person yields a different key.
Although voice offers a lot of variation and versatility for key generation, it also has a few
significant drawbacks. The password can easily be recorded by someone while the user is speaking
it. Most common biometrics have similar drawbacks: fingerprints can be lifted from scanner
surfaces, and the face and iris can be captured by a camera. Every technology has limitations,
and attackers will keep tampering with security networks and systems, so the main focus should
be on dealing with such issues and risks. Most speech recognition systems generate a string or
array of words based entirely on what the device heard, so the main problem is identifying the
target sound correctly. When there is a lot of noise in the environment around the device, it is
hard for the device to distinguish the user’s password from other voices arriving at the same
time.
III. Adopted Criteria for the Key Generation
The development of the security system and its associated software focuses mainly on usability
and improved security. To meet the usability requirements of voice verification, the device
simply generates a key derived from the user’s speech patterns. The spoken input does not need
to be perfectly accurate, because key generation will inevitably be affected by environmental
noise. Speaking the password at close range helps the device regenerate the key without
significant delay.
The main concern in implementing the device is keeping it physically safe. A device that stores
all the information about the keys and the users’ voice patterns could be stolen by an attacker,
who would then be able to retrieve everything stored on it and use it to manipulate important
data. Therefore, the device does not hold any plaintext data before the user speaks his/her
password.
An attacker can always attempt to cryptanalyze the stored information and derived keys by brute
force. To make this harder, the device generates a 46-bit password key from a spoken password
of just 2 seconds. This key can be lengthened further to 56 bits with the help of salting
techniques. The keys could be made longer still, but this might increase the generation time on
the device. It could be done efficiently if the developers understood the entropy pattern in the
voices of different users.
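To put these key lengths in perspective, the short Python sketch below compares the brute-force search space of a 46-bit and a 56-bit key. The guessing rate of one billion keys per second is a purely hypothetical figure chosen for illustration; it does not come from the paper.

```python
def keyspace(bits: int) -> int:
    """Number of distinct keys an attacker must try in the worst case."""
    return 2 ** bits


def hours_to_exhaust(bits: int, guesses_per_second: float) -> float:
    """Worst-case time, in hours, to try every key at a fixed guessing rate."""
    return keyspace(bits) / guesses_per_second / 3600


# Hypothetical attacker speed, assumed only for this illustration.
RATE = 1e9

for bits in (46, 56):
    print(f"{bits}-bit key: {keyspace(bits):,} candidates, "
          f"{hours_to_exhaust(bits, RATE):,.1f} hours at {RATE:.0e} guesses/s")
```

Each extra bit doubles the search space, so the 10 additional bits multiply the attacker's work by 1024. At the assumed rate this is the difference between roughly 20 hours and a little over two years.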

A. Automatic Speech Recognition (ASR) and the related Issues
ASR is the use of computing technologies, both hardware and software, to recognize and verify
the human voice. It is used in many devices designed for authentication and recognition. The
technology converts the uttered words directly into text, which is then compared with the
password previously stored in system memory.
The main drawback of this approach is that it requires a stored password or stored voice data to
verify a user’s speech input. ASR is therefore excluded from the proposed technology because:
• The small vocabulary available in ASR (i.e., 104) can easily be searched by an attacker.
• ASR needs a large amount of volatile memory on the devices that run it.
B. Application of the Derived Cryptographic Keys
Cryptographic keys generated from users’ voice patterns and spoken words can be used effectively
in secure telephony. The password uttered by the user is converted into a key, which triggers a
semi-random process that produces a second key to be kept private. The private key forms a pair
with the one generated when the user speaks the password. The main use of this key pair is in
decrypting voice communication. The user does not have to carry a token to activate the device;
the user’s voice acts as the keying tool.
Beyond secure telephony, the same device and techniques can be used to decrypt encrypted email
received on a mobile device, so that people can read encrypted email on their phones. The
plaintext of an email is thus never exposed outside the mobile phone, and it stays protected
even if the phone gets stolen. Continued advances in technology will make it practical to
execute the algorithm proposed in this paper.
IV. Techniques used for Generating the Cryptographic Keys
This part of the summary report explains the techniques and methods used to generate the
password keys.

A. Generation of Key using the Biometrics Data
Methods for generating a key from biometric data can be divided into two main stages. In the
first stage, a measuring instrument or device captures the biometric data: it collects the raw
data from the user and computes a feature descriptor. The m-bit strings produced by different
users should differ from one another. In the second stage, the descriptor is used for the key
generation process. The second stage does not depend on the type of biometric data, but the
first stage does. The main goal when processing the features collected from the user’s voice is
to capture the description of the features reliably. Various techniques are combined to generate
an accurate key.
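The two stages can be pictured with the minimal Python sketch below. It is only an illustration under stated assumptions: the thresholding rule in `feature_descriptor` and the use of SHA-256 with a salt in `derive_key` are stand-ins chosen here, not the scheme described in the paper, and both function names are hypothetical.

```python
import hashlib


def feature_descriptor(features, thresholds):
    """Stage 1 (depends on the biometric): turn measured features into an
    m-bit string, one bit per feature. The bit is 1 when the feature exceeds
    its threshold. The thresholding rule is an assumption for illustration."""
    return "".join("1" if f > t else "0" for f, t in zip(features, thresholds))


def derive_key(descriptor: str, salt: bytes, key_bytes: int = 16) -> bytes:
    """Stage 2 (independent of the biometric): map the descriptor to a key.
    Hashing with SHA-256 is a stand-in, not the paper's construction."""
    digest = hashlib.sha256(salt + descriptor.encode("ascii")).digest()
    return digest[:key_bytes]


# Made-up feature values and thresholds.
descriptor = feature_descriptor([0.9, 0.1, 0.5], [0.5, 0.5, 0.4])
key = derive_key(descriptor, salt=b"example-salt")
print(descriptor, key.hex())
```

Because stage 2 only sees the bit string, it could be reused unchanged with a descriptor computed from a different biometric.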
B. Automatic Speech Processing and Generation Technique
Speaker verification (SV) and ASR have been researched for a long time. This background helps
the reader understand how a computer receives and interprets voice information from the
environment. The input sound is captured by a microphone connected to a computer port. The
microphone contains a sensitive diaphragm that vibrates whenever a sound wave strikes it. These
vibrations are converted into electrical signals, which are transferred to the computer. A
converter then turns the analog signals into digital ones that the computer can process [3, 5].
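The analog-to-digital step can be sketched in a few lines of Python. The example samples a continuous signal at a fixed rate and rounds each sample to a signed integer level. The 440 Hz test tone, the 8 kHz sampling rate, and the 16-bit depth are assumptions chosen for illustration, and `sample_and_quantize` is a hypothetical helper name.

```python
import math


def sample_and_quantize(signal, duration_s, sample_rate_hz, bits=16):
    """Sample signal(t), expected in [-1.0, 1.0], and quantize each sample
    to a signed integer with the given bit depth."""
    max_level = 2 ** (bits - 1) - 1
    n_samples = round(duration_s * sample_rate_hz)
    pcm = []
    for i in range(n_samples):
        x = signal(i / sample_rate_hz)
        x = max(-1.0, min(1.0, x))        # clip anything outside the range
        pcm.append(round(x * max_level))  # map to an integer level
    return pcm


def tone(t):
    """A 440 Hz sine wave standing in for the microphone's analog output."""
    return math.sin(2 * math.pi * 440 * t)


pcm = sample_and_quantize(tone, duration_s=0.01, sample_rate_hz=8000)
print(len(pcm), pcm[:5])
```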
Various studies of ASR devices compare speech against acoustic speech models. Such models can
easily be built from the voices of a few users. The main requirement of the proposed device is a
model that is independent of both users and texts, because if user-specific models were stored,
it would be very easy for attackers to extract information and codes from them [3]. The
technique used in the system therefore takes its data from a large database of speech and voice
recordings. This in turn needs efficient methods to compress the data using a lossless
compression technique [5].
V. Applied Algorithms
This part of the report explains the algorithms used to derive a cryptographic key from the
input voice precisely and efficiently. The algorithms allow cryptography to be built into
voice-based security devices in a reliable way, with a focus on creating a key that is secure
against cryptanalysis and data decoding attacks. The approach starts by dividing the utterance
into frames under successive time windows. Frames that correspond to silence between words and
at the end of the speech are identified, and removing them is necessary for better accuracy.
Various elements are then used to construct the keys. The remaining challenge is to derive a
descriptor from the input frames in such a way that it can be reconstructed from a new
utterance [1, 2].
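Framing and silence removal can be sketched as follows. This is a minimal illustration that assumes silence is detected by a simple average-energy threshold; the report does not say how silent frames are identified, so the threshold rule, the frame length, and the function names are all assumptions.

```python
def split_into_frames(samples, frame_len, hop):
    """Cut the sample sequence into fixed-length frames, advancing by hop
    samples each time. Trailing samples that do not fill a frame are dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def frame_energy(frame):
    """Average squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def drop_silence(frames, threshold):
    """Keep only frames whose energy reaches the (assumed) threshold."""
    return [f for f in frames if frame_energy(f) >= threshold]


# Toy signal: silence, a burst of speech-like samples, then silence again.
samples = [0, 0, 0, 0, 5, -5, 5, -5, 0, 0, 0, 0]
frames = split_into_frames(samples, frame_len=4, hop=4)
voiced = drop_silence(frames, threshold=1.0)
print(len(frames), voiced)
```

In practice the threshold would have to be tuned to the microphone and the background noise level.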
Development starts by segmenting the incoming frames into contiguous subsequences of the input
speech. This step is necessary to determine a continuous speech pattern for the password. The
small segments produced by the device are then examined and their features are observed [2].
The key generation process is divided into the following steps:
1. Capture the speech or utterance
2. Prepare a window
3. Derive the spectrum
4. Quantize the acoustic space
5. Segment the utterance
6. Finalize the segments
7. Generate the feature descriptor
8. Produce the final feature descriptor
9. Build the table lookup structure
10. Recover the key and decrypt the file
A. Frame Segmentation
The utterance captured by the device is segmented using an acoustic model built from a variety
of speech samples. A segment is a run of consecutive speech frames of the password. Several
mathematical formulations are used to compute accurate likelihoods for the segments, and an
optimal segmentation technique is then used to map the speech and utterance patterns [4].

The algorithm that finds the best segmentation of the utterance is iterative. The utterance is
first divided into segments of equal length. The algorithm then repeatedly refines the segment
ranges so that the segmentation improves with each pass. When the algorithm finishes, it outputs
a set of refined segments. It takes approximately three iterations [4]. Heuristics used
alongside the algorithm also guide the segmentation process; for example, any segment that
contains only one frame is discarded by the system.
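The iterative refinement can be illustrated with the Python sketch below. It is a simplification under stated assumptions: each frame is reduced to a single number, and a boundary is moved to the split that minimizes squared error around the two neighbouring segment means. The report describes likelihoods under an acoustic model instead, so the cost function and all names here are stand-ins.

```python
def mean(values):
    return sum(values) / len(values)


def equal_boundaries(n_frames, n_segments):
    """Initial boundaries that cut the frames into near-equal segments."""
    return [k * n_frames // n_segments for k in range(n_segments + 1)]


def refine(frames, bounds):
    """One pass: move each interior boundary to the split point that best
    separates the current means of its two neighbouring segments."""
    bounds = list(bounds)
    for k in range(1, len(bounds) - 1):
        lo, hi = bounds[k - 1], bounds[k + 1]
        left_mean = mean(frames[lo:bounds[k]])
        right_mean = mean(frames[bounds[k]:hi])

        def cost(split):
            return (sum((x - left_mean) ** 2 for x in frames[lo:split]) +
                    sum((x - right_mean) ** 2 for x in frames[split:hi]))

        bounds[k] = min(range(lo + 1, hi), key=cost)
    return bounds


def segment(frames, n_segments, iterations=3):
    """Start from equal-length segments, refine for a few passes, then
    discard any segment that contains a single frame."""
    if len(frames) < n_segments:
        raise ValueError("need at least one frame per segment")
    bounds = equal_boundaries(len(frames), n_segments)
    for _ in range(iterations):
        bounds = refine(frames, bounds)
    segments = [frames[a:b] for a, b in zip(bounds, bounds[1:])]
    return [s for s in segments if len(s) > 1]


# Toy per-frame feature values with three visibly different regions.
frames = [1, 1, 1, 1, 1, 9, 9, 5, 5, 5, 5, 5]
print(segment(frames, n_segments=3))
```

On this toy input the boundaries settle after the first pass, and the later passes leave them unchanged.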
B. Mapping of the Segments
Once the segments have been generated by the algorithm, the next aim is to describe their
features. These features are mostly similar when the voice inputs come from a single user. The
features identified from the segments are used to generate the descriptor for the key
generation system. The main approaches proposed here ignore one of the features from each
segment and then generate a single feature descriptor from the selected features. The algorithm
should at no point rely on plaintext data stored in device memory; any such reliance would give
an attacker many ways to capture important information. Developers can choose among several
descriptor development features [8]. These features include:
• Parity of the centroids
• Centroid distances
• Relative position through the centroid
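The report names these features without defining them, so the Python sketch below shows only one plausible reading of the first two, with one bit per segment. It assumes scalar segment values and a small set of hypothetical centroids: the parity bit is the parity of the index of the nearest centroid, and the distance bit records whether the segment mean lies farther than a threshold from that centroid. The third feature is not sketched.

```python
def segment_mean(segment):
    return sum(segment) / len(segment)


def nearest_centroid(value, centroids):
    """Index of the centroid closest to value (ties go to the lower index)."""
    return min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))


def centroid_parity_bits(segments, centroids):
    """One bit per segment: parity of the nearest centroid's index."""
    return [nearest_centroid(segment_mean(s), centroids) % 2 for s in segments]


def centroid_distance_bits(segments, centroids, threshold):
    """One bit per segment: 1 if the segment mean is farther than threshold
    from its nearest centroid, else 0."""
    bits = []
    for s in segments:
        m = segment_mean(s)
        c = centroids[nearest_centroid(m, centroids)]
        bits.append(1 if abs(m - c) > threshold else 0)
    return bits


# Hypothetical centroids and segments, for illustration only.
centroids = [1.0, 5.0, 9.0]
segments = [[1, 1], [9, 9], [5, 5]]
print(centroid_parity_bits(segments, centroids))
print(centroid_distance_bits([[1, 3], [9, 9]], centroids, threshold=0.5))
```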
C. Evaluation & Results
The main metric used in the evaluation is the false-negative rate, which can be understood as
the percentage of attempts made by the users that end in failure. For a particular user’s
utterance, the device attempts to derive the user’s key using the algorithms. Even after a
successful attempt, it repeats the process to attain accurate results, since precision is the
key aim of the login device. Alternative descriptors are generated because the sound produced by
the user can be distorted by environmental noise. Even with such corrective actions, false
negatives are still encountered by the device [8]. They can be caused by poor-quality devices,
microphone problems, or improper and inconsistent pronunciation by the user.
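As a small worked example, the false-negative percentage over a series of login attempts by a legitimate user can be computed as below. The attempt outcomes are made up for illustration and are not results from the paper.

```python
def false_negative_rate(attempt_succeeded):
    """Percentage of attempts by a legitimate user that failed to
    regenerate the key. Takes a sequence of booleans, one per attempt."""
    if not attempt_succeeded:
        raise ValueError("at least one attempt is required")
    failures = sum(1 for ok in attempt_succeeded if not ok)
    return 100.0 * failures / len(attempt_succeeded)


# Hypothetical outcomes of four attempts: three successes, one failure.
print(false_negative_rate([True, True, False, True]))
```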
The system to be developed and implemented should be able to identify false negatives among
multiple input values. The entropy can be guessed in several ways, and the more accurately it
is guessed, the lower the overall false-negative percentage of the device. The detailed analysis
suggests that the proposed system could be a good fit for security systems in various
industries. It is therefore very important to design a system that can guess the entropy of the
system and of voices more accurately.
VI. Future Work
This project can be a first step in the investigation of cryptography and key generation from
speech and voice utterances. The main task for the future is to study the feature descriptors
and find a way to increase their length. Longer descriptors will provide better security for
login systems. The proposed goal is to reach a descriptor length that removes any need for
encryption, which would add strength to login devices based on voice detection technology. A
better empirical analysis of the proposal is also needed. The development effort should involve
more people, many improvements can be proposed, and various alternative approaches should be
analyzed [6, 7].
VII. Recommendations
The capabilities of voice detection technologies are improving at a very fast pace, and speech
identification systems and related technologies are being implemented in many important
industries. The problem is to find an optimal methodology for developing these technologies.
The devices should tightly integrate the speech signals with the features captured by the
receivers. Companies and institutes developing such devices should understand the importance of
making the device easier to use for people with disabilities. These technologies should provide
a completely hands-free experience so that users with a disability can easily access the device.
The device should also be sensitive enough to pick up input voice signals without straining the
user’s throat. The technology, if developed efficiently, can also be used in computing systems,
which will save users a lot of time. For this to happen, the entropy guessing methodology and
algorithms should be properly enhanced [6].
VIII. Conclusion
The main focus of the paper was on implementing cryptographic keys in voice recognition devices
and systems, and on the methodologies used to improve the accuracy of these devices. The paper
gives detailed information about the algorithms to be used for key generation in a
cryptographic structure. These algorithms can be used to implement cryptography in voice
identification devices in a reliable way. They focus on generating a key that is safe from
cryptanalysis and from brute-force attacks aimed at decoding data. The key should remain secure
even if the attacker captures the login/authentication device.
The report identified various problems that can be encountered in developing this technology.
Much work is needed on an algorithm that is capable of guessing the entropy of the user’s voice
and speech patterns. Strong integration between the descriptors and the guessed entropy is the
key to designing a highly accurate voice-recognition device. The results of the empirical
evaluation played an important role in the analysis of the security system: they showed how
important a highly reliable authentication device is to the key generation process and the
entropy.

IX. References
[1] Paulini, Marco, "Multi-bit allocation: preparing voice biometrics for template protection."
Odyssey 2016. 2016.
[2] Moosavi, Sanaz Rahimi, et al. "Cryptographic key generation using ECG signal." 2017
14th IEEE Annual Consumer Communications & Networking Conference (CCNC).
IEEE, 2017.
[3] Yu, Dong, and Li Deng, Automatic Speech Recognition. Springer London Limited, 2016.
[4] Sadkhan, Eng Sattar B., Baheeja K. Al-Shukur, and Ali K. Mattar, "Survey of biometrie
based key generation to enhance security of cryptosystems." 2016 Al-Sadeq International
Conference on Multidisciplinary in IT and Communication Science and Applications
(AIC-MITCSA). IEEE, 2016.
[5] Vitaladevuni, and Shiv Naga Prasad, "Audio output masking for improved automatic
speech recognition." U.S. Patent No. 9,704,478. 11 Jul. 2017.
[6] Hammond, S. "How to improve voice and speech recognition accuracy." Appen Blog.
[online] Appen. Available at:
https://appen.com/blog/when-speech-recognition-goes-wrong/ (Accessed 9 Apr. 2019).
[7] E. Schwartz, PDAs learn to listen up. InfoWorld.com, February 4, 2000.
[8] G. I. Davida, Y. Frankel, and B. J. Matt, On enabling secure applications through off-line
biometric identification. In Proceedings of the 1998 IEEE Symposium on Security and
Privacy, pages 148-157, 1998.