CNSCC.160 – Focus on Speech Processing
Practical Session for Week 6
Audio Data and Quantization Effects using MATLAB
In previous practical sessions, you familiarised yourselves with the MATLAB environment.
You are now expected to know how to operate MATLAB and how to use its basic commands.
This practical session focuses on audio signals instead of images and will help you
to learn some basic operations for speech and audio data, including how to read/write
audio files, play their audio content and plot figures.
Basic MATLAB Operations for Speech and Audio Data
In these notes, all commands that can be typed in MATLAB are typed in a plain font. You
can cut and paste commands from this document directly into the MATLAB command
window. Alternatively you can create your own MATLAB command file with:
edit exer1.m
You can then type or cut-and-paste in commands. Save your work and then run the contents
of the file at the MATLAB prompt:
exer1
For more information on any MATLAB commands, you can access the help pages either
from the main menu, or by typing “doc <MATLAB command>” or “help <MATLAB
command>”. For example, type the following line if you want to learn about the syntax of the
MATLAB command “plot”:
doc plot
To read a speech/audio data file, use “audioread()”, i.e.
[Y, sr] = audioread(filename);
This loads the WAV file specified by the string variable <filename>, returning the sampled
data in variable Y. Include the “.wav” extension in the filename. The command also returns
the sample rate (sr) in samples/second or, equivalently, Hertz (Hz).
Let us read an audio file called ‘bach16.wav’, which can be downloaded from Moodle. You
will use the command:
[Y, sr] = audioread('bach16.wav');
Y            MATLAB vector containing the speech/audio data.
sr           The sampling rate (16,000 Hertz for the Bach file).
bach16.wav   The filename from which the data is loaded.
In the main MATLAB window, use the menu to display the Workspace Window (e.g., go to
“Desktop” and select “Workspace”). Alternatively, type the following command at the
MATLAB command prompt:
whos
Note the size of the variable Y (i.e., the number of samples it contains).
To save data in WAV format, use the command “audiowrite()”. For example, type:
audiowrite('myfile.wav', Y, sr);
where Y and sr are as above and ‘myfile.wav’ is the name of the file where the speech data
will be stored.
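As a quick check of the read/write commands above, the following sketch writes a synthetic tone to a temporary WAV file and reads it back. The 440 Hz tone, the 16,000 Hz rate and the temporary file name are assumptions chosen for illustration; they are not part of the Moodle material.

```matlab
% Round trip: write a synthetic tone to a WAV file, then read it back.
sr = 16000;                       % sampling rate in Hz (assumed for this sketch)
t = (0:sr-1)'/sr;                 % one second of time stamps, as a column vector
Y = 0.5*sin(2*pi*440*t);          % 440 Hz tone with amplitude inside [-1, 1]
fname = [tempname '.wav'];        % hypothetical temporary file name
audiowrite(fname, Y, sr);         % argument order: filename, data, sample rate
[Y2, sr2] = audioread(fname);     % read the same file back
num_samples = size(Y2, 1);        % rows are samples, columns are channels
duration_s = num_samples/sr2;     % duration of the file in seconds
delete(fname);                    % tidy up the temporary file
```

The data read back should match the data written up to the rounding introduced by the file's sample format.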
To listen to the audio/speech (you need headphones or PC speakers), use the command:
soundsc(Y, sr);
To change the sampling rate of the data use the command:
d = resample(Y, p, q);
This command is part of the Signal Processing Toolbox. It will resample the sequence in vector Y
at p/q times the original sampling rate. The result is placed in the output variable d. Both
variables p and q should be positive integers selected by you. In other words, Y is
interpolated by a factor p and then decimated by a factor q. For example, define:
example_vector = [1 2 3 4];
and then type:
interpolate = resample(example_vector, 3, 1)
When the contents of interpolate are displayed, observe that the new vector has three
times the length of example_vector because p=3 (while q=1). Effectively, two new
samples have been estimated by interpolation and inserted between each original sample.
Now, try the following line:
decimate = resample(interpolate, 1, 4)
and observe that decimate contains only 3 elements, i.e., it has 1/4 of the length of the
vector interpolate. In this case, decimate is a coarse approximation of
interpolate that keeps only a quarter as many samples. Remember that if we need to
sample at p/q times the original rate, then resample(Y, p, q) generates a better
estimate than resample(Y, p, 1) followed by resample(..., 1, q) applied to that result.
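The two routes described above can be tried side by side. This is a minimal sketch on a synthetic 200 Hz sine wave; the signal, the rate of 8,000 Hz and the choice p=3, q=2 are assumptions for illustration, and the Signal Processing Toolbox is required.

```matlab
% Compare one-step resampling with interpolate-then-decimate.
sr = 8000;                                     % original sampling rate in Hz (assumed)
n = 0:799;                                     % 800 sample indices
x = sin(2*pi*200*n/sr);                        % synthetic 200 Hz sine wave
p = 3; q = 2;                                  % resample to p/q = 1.5 times the rate
one_step = resample(x, p, q);                  % single call
two_step = resample(resample(x, p, 1), 1, q);  % interpolate, then decimate
len_one = length(one_step);                    % length after the single call
len_two = length(two_step);                    % length after the two calls
new_rate = sr*p/q;                             % sampling rate of both results
max_diff = max(abs(one_step - two_step));      % how far the two routes disagree
```

Both results have the same length and the same new sampling rate; max_diff shows how much the sample values differ between the two routes.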
Listen to the resampled data but remember to alter the sampling rate appropriately in the
soundsc() command. For example:
soundsc(d,(p/q)*sr);
The amplitude of the samples in the WAV file can be plotted with:
plot(Y)
The x-axis displays the sample number (which is related to time). The y-coordinate shows
the amplitude of the samples. To change the x-axis from sample number to time, close the
previous figure, calculate the time and plot Y again:
time = (1:length(Y))*(1/sr);
plot(time, Y);
Your plot can be improved by adding labels and a title. To do this, use the following
commands:
title('WAV File Plot');
xlabel('Time [Sec]');
ylabel('Amplitude');
What to include in the report
Exercise 1 (total: 6+6=12 marks)
Part 1.a (6 marks)
Download the three sample speech WAV files that are available on Moodle
(‘speech01.wav’, ‘speech02.wav’ and ‘speech03.wav’).
For each of these sample WAV files, read the data into MATLAB, listen to the sample and
then check the size of the loaded data. Given that you can find the sampling rate (sr),
what is the time interval between samples (e.g., between Y(1) and Y(2))? What is
the entire duration of each WAV file in seconds?
Select one of the sample WAV files (state which). The file contains 3.4 kHz bandwidth
speech sampled at 8 kHz. Read the data into a MATLAB variable called DataA.
Perform the following:
o Resample DataA to 4 kHz, i.e., use the resample command to reduce the number
of samples by half. Call the resampled data DataB.
o Resample DataA to 16 kHz. Call this DataC.
o Resample DataB and DataC back to 8 kHz, and produce DataD and DataE
respectively. Figure 1 makes this process clearer.
o Plot the waveforms of the reproduced DataD and DataE and compare them with
the plot of the original data in DataA.
Figure 1: The re-sampling process
In some cases, you will be plotting fewer samples than are in the original data. For this
reason, you may find the following hint useful.
time = (1:length(Y))*(1/sr);
from_s = 1; to_s = 500;
plot(time(from_s:to_s), Y(from_s:to_s));
xlabel('Time [Sec]');
ylabel('Amplitude');
where from_s is the index of the first sample and to_s is the index of the last sample that
you wish to plot. The maximum value of to_s is equal to length(Y).
Experiment with different values of from_s and to_s. For example, plot all the data, half
the data as well as data lasting for 0.5 seconds. Select values of from_s and to_s that
give good comparison between the original and re-sampled data.
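To select a segment by duration rather than by sample count, the duration can be converted into a number of samples using the sampling rate. The sketch below does this for 0.5 seconds of a synthetic signal; the signal and the 8,000 Hz rate are assumptions standing in for your own Y and sr.

```matlab
% Pick from_s and to_s so that the plotted segment lasts 0.5 seconds.
sr = 8000;                               % sampling rate in Hz (assumed)
Y = sin(2*pi*100*(0:2*sr-1)/sr);         % two seconds of a synthetic 100 Hz sine
time = (1:length(Y))*(1/sr);             % time axis, as in the hint above
from_s = 1;                              % index of the first sample to plot
to_s = from_s + round(0.5*sr) - 1;       % 0.5 s corresponds to 0.5*sr samples
to_s = min(to_s, length(Y));             % never index past the end of the data
segment = Y(from_s:to_s);                % samples to plot
segment_time = time(from_s:to_s);        % matching time stamps
% plot(segment_time, segment);           % uncomment to draw the segment
```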
Part 1.b (6 marks)
Listen to DataA, DataD and DataE.
Describe and comment on your listening experience and relate this to the Nyquist
Theorem.
Calculate and provide the signal to noise ratio (SNR) between DataA and DataD and
then DataA and DataE using the formula:
snr = 10*log10((sum(abs(Original).^2)/sum(abs(Original-Recovered).^2)));
In this formula, Original is the data from DataA and Recovered is the data from
DataD and then DataE. You will need to edit the above command to match your choice of
MATLAB variable names for each of the two SNRs required.
As before, relate your SNR results to the Nyquist theorem.
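The SNR formula requires Original and Recovered to have the same length, and resampling down and back up may leave the recovered vector a sample longer than the original. The following sketch, on synthetic data, trims both vectors to a common length before applying the formula. The test signal and the 10% amplitude error are assumptions chosen so that the expected result is easy to verify by hand.

```matlab
% SNR in dB between an original signal and a recovered version of it.
Original = sin(2*pi*(0:999)/50);               % synthetic reference signal
Recovered = [1.1*Original, 0];                 % 10% amplitude error, one extra sample
n = min(length(Original), length(Recovered));  % common length of the two vectors
Original = Original(1:n);                      % trim both to the common length
Recovered = Recovered(1:n);
snr_db = 10*log10(sum(abs(Original).^2)/sum(abs(Original - Recovered).^2));
```

Here the error is one tenth of the signal amplitude, so the power ratio is 100 and snr_db comes out at 20 dB.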
Introduction to Quantization
In signal processing, quantization is the process of approximating a continuous range of
values (or a very large set of possible discrete values) by a relatively small set of discrete
symbols or integer values.
This task examines quantization for audio/speech data in MATLAB and explores the impact
of quantization on audio/speech data.
You are going to use a MATLAB function to alter the quantization levels in a WAV file.
Download the file ‘QUAN_demo.m’ from Moodle. At the MATLAB command prompt, type:
edit QUAN_demo.m
Carefully examine the MATLAB code of this function and work out how the function
performs the quantization.
This function achieves quantization by adjusting the quantizer amplitude range to be the
same as that of the input speech signals.
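For orientation, a generic uniform quantizer whose range matches the input amplitude range can be sketched as follows. This is not the code in QUAN_demo.m, which may differ in its details; the variable names and the mid-point reconstruction are assumptions made for this illustration.

```matlab
% Generic uniform quantizer with R bits over the amplitude range of the input.
R = 3;                               % bits per sample (assumed for this sketch)
x = sin(2*pi*(0:99)/100);            % synthetic input signal
levels = 2^R;                        % number of quantization levels
xmin = min(x); xmax = max(x);        % quantizer range taken from the input itself
step = (xmax - xmin)/levels;         % width of each quantization interval
idx = floor((x - xmin)/step);        % interval index of each sample, counted from 0
idx = min(idx, levels - 1);          % keep the largest sample in the top interval
xq = xmin + (idx + 0.5)*step;        % reconstruct at the centre of each interval
max_err = max(abs(x - xq));          % largest quantization error
```

With this scheme the output takes at most 2^R distinct values and the error never exceeds half a step.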
Exercise 2 (total: 5+5+3=13 marks)
The function QUAN_demo.m will be used in this exercise.
Part 2.a (5 marks)
Clear all data in the MATLAB workspace:
clear all;
In the command window call the QUAN_demo.m function as follows:
[snr, ori_data, quan_data, sr] = QUAN_demo(3,filename);
where filename is one of the speech WAV files (state which file you selected).
Listen to the two types of data, the original file and the quantized version with 3 bits per
sample.
soundsc(ori_data, sr);
soundsc(quan_data, sr);
Comment on the differences you hear.
Part 2.b (5 marks)
Using the following MATLAB commands, plot the amplitude of the original signal and its
quantized version within a single window:
subplot(2,1,1);
plot(ori_data);
title('Amplitude of Original Signal');
xlabel('Sample Number');
ylabel('Amplitude');
subplot(2,1,2);
plot(quan_data);
title('Amplitude of Quantized Signal');
xlabel('Sample Number');
ylabel('Amplitude');
Plot the SNR as a function of the number of quantization bits R, where R varies from 1
to 10. This requires you to call the quantization function once for each value of R, that is,
R=1,2,3,…,10, and record the resulting SNR value. You may wish to automate this
process using a loop:
SNR = zeros(1, 10);
for R=1:10
[snr, ori_data, quan_data, sr] = QUAN_demo(R, filename);
SNR(R) = <ADD YOUR CODE HERE>
end;
Note that the returned snr value from QUAN_demo is not in decibels (dB). You will
need to convert this value into dB, as explained before.
Part 2.c (3 marks)
Listen to the output quantized speech files as R varies from 1 to 10. What value of R (i.e.,
bits/sample) and corresponding bit rate provides, in your opinion, the level of quality that
you associate with that in the fixed public telephone network (landline)? (Hint: bit rate =
sample rate × bits/sample)
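The hint can be applied as in the sketch below. The figures used, 16,000 samples/second and 16 bits/sample, are arbitrary values chosen for illustration and are not intended as the answer to this part.

```matlab
% Bit rate from sample rate and bits per sample.
sr = 16000;                        % sample rate in samples/second (assumed)
bits_per_sample = 16;              % bits per sample (assumed)
bit_rate = sr*bits_per_sample;     % bit rate in bits/second
bit_rate_kbps = bit_rate/1000;     % the same rate in kbit/s
```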