How Analog Becomes Digital...
Analog sound is transformed into digital data by mathematics and electronics: hardware captures the signal, and a processor performs the calculations that carry out the transformation.
When sound is created, it releases energy in the form of acoustic pressure. Talking produces changes in acoustic pressure. A microphone picks up these changes and transmits them to a sound card in a computer, which processes them into digital data.
The Sine Wave
Several different elements work together to make recording sound possible. Sound travels through the air in the shape of a sine wave. The Nyquist theorem states that any sine wave can be reconstructed as long as it is sampled at more than twice its frequency. Humans can hear frequencies from about 20 Hz to 20,000 Hz. The Nyquist sampling rate explains why the best digital audio is recorded at 44.1 kHz: a 44.1 kHz rate can capture any frequency up to 22.05 kHz, comfortably above the upper limit of human hearing. The Nyquist plot of the frequency domain is also used in electronic feedback-circuit analysis, together with trigonometric methods. These ideas are the basis of the mathematical calculations that the sound card's processor uses to transform analog information into digital data.
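The sampling idea above can be sketched in a few lines of numpy. This is a minimal illustration, not part of the original page: a pure 1 kHz tone is sampled at 44.1 kHz (well above its 2 kHz Nyquist rate), and a Fourier transform of the samples recovers the original frequency.

```python
import numpy as np

def sample_tone(freq_hz, rate_hz, duration_s=1.0):
    """Sample a pure sine tone of freq_hz at rate_hz samples per second."""
    t = np.arange(0, duration_s, 1.0 / rate_hz)
    return np.sin(2 * np.pi * freq_hz * t)

# A 1 kHz tone sampled at 44.1 kHz: well above its Nyquist rate of 2 kHz,
# so the dominant frequency in the sampled data matches the original tone.
samples = sample_tone(1000, 44100)
spectrum = np.abs(np.fft.rfft(samples))
freqs = np.fft.rfftfreq(len(samples), d=1.0 / 44100)
peak = freqs[np.argmax(spectrum)]  # recovered frequency of the tone, in Hz
```

Sampling the same tone below 2 kHz would instead produce an alias: a different, lower frequency would appear at the peak.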
The Fourier transform is a mathematical operation that decomposes a signal into the sine waves that make it up. One tool for viewing the result is a spectrogram, which visually shows the sound wave as a frequency-versus-time graph, with amplitude shown as intensity. Frequency, amplitude, phase, and time are all part of the sound wave, and these factors are much easier to observe on a spectrogram than in the raw waveform.
How does sound travel into the computer?
A microphone picks up sound waves with a diaphragm, which moves with the acoustic pressure. These very small movements of the diaphragm are converted into a voltage. The voltage is transmitted to a sound card, which samples it, converts the samples into electronic pulses, and records them. The analog-to-digital signal processor handles these computations. The sound card needs a fast processor to capture all the peaks and troughs of the original waveform, and it runs independently of the computer's main processor.
What is quantizing your voice?
Pulse Code Modulation is the most common method of encoding (quantizing) an analog voice signal into a digital (binary) bit stream. Alec Reeves built on Nyquist's sampling work: instead of using Bell's "voice-shaped current," Reeves proposed that sound be sampled at steady intervals, with the value of each sample represented as a binary number and transmitted as pulses. This scheme is what came to be called Pulse Code Modulation.
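Reeves's idea of steady-interval samples encoded as binary numbers can be sketched directly. This is an added illustration, assuming samples already normalized to the range -1.0 to 1.0: each sample is rounded to the nearest 16-bit signed integer code, which is exactly the form PCM audio takes on a CD or in a WAV file.

```python
import numpy as np

def pcm_quantize(signal, bits=16):
    """Quantize samples in [-1.0, 1.0] to signed integer PCM codes."""
    levels = 2 ** (bits - 1)  # 32768 for 16-bit PCM
    dtype = np.int16 if bits <= 16 else np.int32
    return np.round(signal * (levels - 1)).astype(dtype)

# Eight steady-interval samples of one sine-wave cycle, as binary codes.
t = np.linspace(0, 1, 8, endpoint=False)
codes = pcm_quantize(np.sin(2 * np.pi * t))
```

The peak of the wave becomes the code 32767, the trough -32767, and the zero crossings 0; transmitting those integers as pulses is Pulse Code Modulation.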
The Pulse Code Modulation (PCM) scheme has been enhanced over the last 50 years. A more efficient variant is adaptive differential pulse-code modulation (ADPCM). ADPCM quantizes the difference between successive speech samples and encodes each difference in 4 bits, yielding a compression ratio of 4:1 relative to 16-bit PCM. ADPCM was standardized in the mid-1980s (as CCITT Recommendation G.721, later folded into G.726); it is distinct from A-law PCM, the older companded-PCM standard (G.711), with which it is sometimes confused. Since the 1980s there have been several further improvements and name changes to the standard.
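The core differential idea can be sketched without the adaptive machinery. This is a deliberately simplified DPCM illustration with a fixed step size, not the actual G.726 algorithm (which adapts the step as the signal changes): each 4-bit code records how far the signal moved from the decoder's running prediction.

```python
import numpy as np

def dpcm_encode(signal, step=0.1):
    """Simplified differential PCM: quantize the difference between each
    sample and the running reconstruction (real ADPCM also adapts `step`)."""
    codes, prediction = [], 0.0
    for x in signal:
        diff = x - prediction
        code = int(np.clip(round(diff / step), -8, 7))  # 4-bit code range
        codes.append(code)
        prediction += code * step  # mirror what the decoder will compute
    return codes

def dpcm_decode(codes, step=0.1):
    """Rebuild the waveform by accumulating the quantized differences."""
    out, prediction = [], 0.0
    for code in codes:
        prediction += code * step
        out.append(prediction)
    return out

signal = np.sin(2 * np.pi * np.linspace(0, 1, 100))
decoded = np.array(dpcm_decode(dpcm_encode(signal)))
```

Because speech changes slowly between samples, the differences stay small, and 4 bits per difference suffice where straight PCM would need 16 bits per sample.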
An overarching problem in analog-to-digital speech is that most sound arrives not as single tones but as multi-frequency tones, while Nyquist's original analysis considered only single-frequency tones. Certain vowel sounds in English are dominated by a single frequency band, such as ma, maw, mow, and moo. Others have two, such as mat, met, mate, and meat. Consonants carry higher frequencies than vowels. This complexity of many frequencies within a single vowel makes it difficult to calculate the correct quantization in the sound card, and these difficulties are why pattern matching for words evolved.

Below are two examples of audio-spectrum analysis of wave files. One picture shows how a voice singing a single tone is represented on a spectrograph; the other shows how a spoken sentence is represented. You can see how dramatically the two differ. This difference reflects the complexity of human language, and it must be accounted for in the calculations of the sound card's processor and in the software that uses this information.
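The multi-frequency point can be demonstrated numerically. In this added sketch, a vowel-like sound is crudely modeled as two simultaneous tones (real speech contains many more components); a Fourier transform of the mixture shows two distinct peaks, something no single-tone description can capture.

```python
import numpy as np

rate = 8000
t = np.arange(0, 1.0, 1.0 / rate)
# A crude vowel-like mixture: two tones at once. The 300 Hz and 2200 Hz
# components here are illustrative values, not measured vowel formants.
two_tone = np.sin(2 * np.pi * 300 * t) + 0.8 * np.sin(2 * np.pi * 2200 * t)

spectrum = np.abs(np.fft.rfft(two_tone))
freqs = np.fft.rfftfreq(len(t), d=1.0 / rate)
# The two strongest frequency bins land on both component tones.
top_two = set(freqs[np.argsort(spectrum)[-2:]])
```

A single-tone signal would show one peak; the mixture shows both, which is why speech software must analyze whole frequency patterns rather than individual tones.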
©2003 St. Norbert College