SPEECH TO MUSIC - Blog

FIN

4/29/2018

I cannot believe it's over. The highs and lows of the past three months have been incredible. What an experience. I thought the presentation went well, and I really hope people could follow my explanations of the math behind my project.

To whoever gets this project next year: please continue what I have started. I believe this work is the start of finishing this project.

This project was so rewarding. I cannot thank Dr. Pankratz, Dr. McVey, and Dr. Meyer enough.

For the last time
-Chance Browning

SO CLOSE!

4/22/2018

HUGE NEWS! I was grinding pretty hard on Friday night, again trying to get the FFT frequencies of the chunks of a wav file. And I finally did it! I was testing with my sine wave wav file, which has a known frequency of 440 Hz. I ran it through my algorithm and got 440 Hz for every chunk. I thought, well, maybe it's just reading the same portion over and over or something like that. So I tested it with several music files and speech files. Each time through the loop I was getting different frequencies, which is what we want! I have not tested it together with the pitch shifting yet, so I am hoping the transitions do not sound too bad.
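For whoever picks this up next: here is a minimal sketch of the chunk-by-chunk idea, written in Python rather than the project's C#. It splits the samples into fixed-size chunks, takes an FFT of each chunk, and converts the strongest bin to Hz. The chunk size of 4096, the hand-written radix-2 FFT, and the helper names `fft` and `dominant_frequencies` are all assumptions for illustration. The project itself used MathNet's FFT.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def dominant_frequencies(samples, sample_rate, chunk_size=4096):
    """Return the strongest frequency (Hz) in each full chunk of samples.

    Assumption: chunk_size is a power of two; any leftover tail is ignored.
    """
    freqs = []
    for start in range(0, len(samples) - chunk_size + 1, chunk_size):
        spectrum = fft(samples[start:start + chunk_size])
        # For real input only bins 1..N/2 are unique; bin 0 is the DC offset.
        mags = [abs(c) for c in spectrum[1:chunk_size // 2]]
        peak_bin = 1 + mags.index(max(mags))
        # Bin k corresponds to k * sample_rate / chunk_size Hz.
        freqs.append(peak_bin * sample_rate / chunk_size)
    return freqs
```

With a one-second 440 Hz test tone at 44,100 samples per second, every chunk reports a value within one bin width (44100 / 4096, about 10.8 Hz) of 440. Smaller chunks track changes faster but blur the frequency. Larger chunks do the opposite.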

For the rest of the night, I am going to be pulling together my code that gets the frequency of each chunk and my method for shifting pitches in chunks. This should be the final step before the project presentation on Thursday, which I will obviously have to finish as well. I am kind of bummed that I will not get to the functionality that lets the user select the pieces they want. But hey, I made some real progress, learned a lot, and hopefully next year's victim will figure out what I couldn't!

Things are FINALLY coming together

We Got Something

4/17/2018

This week was a grind. I had an FFT that I thought was working, but on further inspection, it was wrong. The FFT output is supposed to be an array where the indices of the "peaks" (the large values in the array) indicate the frequencies that are most dominant throughout the wav file (or a portion of the file).

I was eventually able to find an FFT that did what I wanted, but there was an issue with it. While testing, I was using a sine wave with a known frequency of 440 Hz. But when I checked for the largest value in the array, I found it at index 4400, which was a little strange. So I tried a wave with a frequency of 220 Hz, and again the largest value was at index 2200. This happened with every wav file I used. I tried many different things, but I could not get this factor of 10 to go away, so for now I left it be. Dr. Pankratz believes he may know why it is happening, but at this point, as long as it stays consistent, I don't see a problem with it. If a peak occurs at a low index (like 10 or 11), the frequency would be about 1 Hz, which a human cannot hear anyway. Because of this, I do not see an issue with the factor of 10.
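One plausible explanation for the factor of 10 is an assumption on my part, not something these posts confirm. Bin k of an N-point FFT sits at f_k = k * f_s / N Hz, where f_s is the sample rate. If the transform covered a 10-second file, then N = 10 * f_s, each bin is 0.1 Hz wide, and a 440 Hz tone peaks at bin 4400. Under that reading the factor equals the clip length in seconds, so it would only stay at 10 for 10-second clips. The helper names below are hypothetical.

```python
def bin_to_hz(k, sample_rate, n):
    """Frequency in Hz represented by bin k of an n-point FFT."""
    return k * sample_rate / n


def hz_to_bin(freq, sample_rate, n):
    """Nearest FFT bin index for a given frequency."""
    return round(freq * n / sample_rate)


rate = 44100          # assumed sample rate (samples per second)
n = 10 * rate         # assumed: the FFT was taken over a 10-second file
print(hz_to_bin(440, rate, n))    # 4400
print(bin_to_hz(2200, rate, n))   # 220.0
print(bin_to_hz(10, rate, n))     # 1.0
```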

This FFT, which by the way is a method in the MathNet DLL, works on data read by the NAudio class AudioFileReader, which reads the whole wav file. My concern is that the FFT result gives me the frequency, but the PitchShift class will be using a different array, gathered in a different way. This worries me a bit for when I start breaking up the files and determining the frequency of each section.

There are some concerns stated above, but I actually did get my current algorithm to work: it shifted the frequency of a speech file to match the frequency of the 440 Hz sine wave, and it sounds great! I still need to work on the math I have for shifting up and down by octaves. I need to handle octaves separately because the pitch shift cannot go more than one octave up or down. So if a sound needs to be shifted by more than an octave, some math has to be done on the back end to get a consistent sound.
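A minimal sketch of that back-end math. It expresses the needed change as a ratio (target / source), peels off whole octaves (factors of 2), and leaves a remainder inside the one-octave range the shifter can handle. How the whole octaves would then be applied, for example with repeated shifter passes, is an assumption, as is the hypothetical helper name `split_shift`.

```python
import math


def split_shift(source_hz, target_hz):
    """Split a pitch ratio into whole octaves plus a remainder in (0.5, 2).

    Returns (octaves, remainder) with target/source == 2**octaves * remainder.
    """
    ratio = target_hz / source_hz
    # int() rounds toward zero, so the remainder keeps the same direction
    # (up or down) as the overall shift and stays within one octave.
    octaves = int(math.log2(ratio))
    remainder = ratio / (2 ** octaves)
    return octaves, remainder


print(split_shift(110, 440))   # (2, 1.0): exactly two octaves up
print(split_shift(440, 100))   # two octaves down, then a small shift down
```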

I hope to post again later this week about my progress.

- Chance Browning

Frequency pt 2. Too Many Black Boxes

4/8/2018

I think I finally got the frequency. Now I know what you are thinking: didn't you already do this? Well, I thought I had already done it too, but then I realized, by looking closer at the documentation, that I was actually getting the decibel level of the wav files. After a few Google searches, I found that decibels are vastly different from Hz (the unit of frequency).
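To make the difference concrete: a decibel figure describes level (how loud), while hertz counts cycles per second (pitch). In this small Python sketch, which is not the project's code and uses hypothetical helper names, two sine waves at different pitches have the same dB level. Halving the amplitude lowers the level by about 6 dB without changing the pitch at all. The level is measured as RMS relative to a full scale of 1.0, which is an assumed convention.

```python
import math


def rms_dbfs(samples):
    """Level of a signal in dB relative to full scale (1.0), from its RMS."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms)


def sine(freq, amp, rate=8000, seconds=1):
    """A test tone: `freq` Hz at peak amplitude `amp`."""
    return [amp * math.sin(2 * math.pi * freq * i / rate)
            for i in range(rate * seconds)]


print(round(rms_dbfs(sine(440, 0.5)), 2))    # -9.03
print(round(rms_dbfs(sine(880, 0.5)), 2))    # -9.03: same level, higher pitch
print(round(rms_dbfs(sine(440, 0.25)), 2))   # -15.05: quieter, same pitch
```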

I also fear that I will still be unable to use my newfound frequencies with my PitchShift function. I think this because even small numbers passed to PitchShift make drastic changes, and the numbers I get from this new implementation are fairly large. The source I got this code from also has a pretty rough description of what the methods I took are supposed to return. I tried my best to copy how they used those methods, but the way they use them is slightly different from mine, and that makes me worry a little.
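One hedged way around the "large numbers" problem is to convert detected frequencies into a relative amount before handing them to a shifter. Many pitch shifters take a factor near 1.0 (2.0 means one octave up) or a count of semitones. Whether this project's PitchShift expects either form is an assumption; the posts don't say. The helper names are hypothetical.

```python
import math


def pitch_ratio(source_hz, target_hz):
    """Multiplicative pitch factor: 1.0 = unchanged, 2.0 = one octave up."""
    return target_hz / source_hz


def semitones(source_hz, target_hz):
    """The same change in equal-tempered semitones (12 per octave)."""
    return 12 * math.log2(target_hz / source_hz)


print(pitch_ratio(220, 440))             # 2.0
print(semitones(220, 440))               # 12.0
print(round(semitones(440, 466.16), 2))  # 1.0 (one semitone up)
```

Either form turns a pair of frequencies in the hundreds of Hz into a small number, which matches the scale a shifter that "makes drastic changes with small numbers" seems to expect.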

Another fear I have at the moment is not knowing what the FFT is supposed to be doing. Over this past week I did some reading on FFTs, and the more I read, the more confused I got. A lot of it has to do with the jargon. I hope to clear this up with DCP; we will be meeting to discuss FFTs, and hopefully I will get a better understanding of them.

I plan on posting another blog entry soon with more details about how these new frequencies and my new discoveries about FFTs are coming along.

I really hope this post makes sense in the morning, for I am far too tired to really comprehend much more tonight.

Till next time,
Chance Browning

Too Slow!

4/2/2018

So, I will keep this short, but I believe I have the correct algorithm; it's just too slow! I will try to briefly explain what I am doing. Like I said last time, I am able to create a double array from both the speech and the music wav files, each array the same size. My plan was to loop over the arrays and, at each point, take the difference between the two, alter a copy of the speech wave so that the whole file's pitch is shifted by that difference, and then copy only that portion from the shifted copy back onto the speech file. Seems like it should work, but here is the problem: it takes so long that it never finishes. It is hard to tell whether it is stuck or just taking a really, really long time. The reason it is so slow is that the double arrays are about 1.7 million entries long.
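A rough operation count shows why this stalls. If each of the roughly 1.7 million positions triggers a pitch shift of the whole file, the work grows with the square of the file length. Shifting fixed-size chunks once each keeps it roughly linear. The sketch assumes, for simplicity, that shifting n samples costs about n operations. The chunk size of 4096 and the helper names are hypothetical.

```python
def per_point_cost(n_samples):
    """One full-file shift (about n_samples of work) for every sample position."""
    return n_samples * n_samples


def per_chunk_cost(n_samples, chunk_size):
    """One shift per chunk, each touching only chunk_size samples."""
    n_chunks = -(-n_samples // chunk_size)  # ceiling division
    return n_chunks * chunk_size


n = 1_700_000
print(f"{per_point_cost(n):.2e}")        # 2.89e+12
print(f"{per_chunk_cost(n, 4096):.2e}")  # 1.70e+06
```

That is a gap of roughly a million times. This is why choosing a "number of sample points" (in effect, a chunk size) matters far more than micro-optimizing the loop.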

To combat this problem, I need to find a number of sample points that lets the algorithm produce a good final product without taking too much time. Too much time in my book would be anything over 5 minutes. Another idea I had was to let the user pick how many sample points they would like to use. The problem with that is if they pick something small, don't like the result, and want to try again, the user will spend a lot of time just sitting there looking at the screen. I would also like to implement a window that tells the user the files are being processed, so that they are not just staring at the screen wondering whether it is working.

I want to meet with Dr. Meyer sometime at the beginning of the week to talk about the number of sample points, and I want to talk to Dr. McVey about the sample points as well as about having a "Processing Window".

Until next time,
- Chance Browning

Rollin'

3/27/2018

HUGE things to talk about this week. Lots of progress, and the right moves made. I came into this past week knowing it was going to be a grind. I didn't have too much going on in my other classes, so I really started grinding out this project.

Coming into this past week, I had the idea of using the SoundTouch class, a class that can change the tempo and pitch of a wav file. I got it working and was like "YES, HERE WE GO!" I had finally altered a wav file. But there was a problem: I didn't know how to display the altered file like the original files. I thought I should save it and then open it using NAudio's wave reader object. Another problem arose: how could I save it? I couldn't pull the data out of the altered file to save it, and I didn't know how to write a header. Very frustrating.
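For reference, the header that was the sticking point is small and fixed for plain PCM audio. It is 44 bytes: a "RIFF"/"WAVE" preamble, a 16-byte "fmt " chunk, and the header of the "data" chunk that the raw sample bytes follow. This is a minimal sketch using Python's struct module, not NAudio (NAudio's own WaveFileWriter class handles this for you). It assumes uncompressed PCM, and the helper names are hypothetical.

```python
import struct


def wav_header(num_data_bytes, sample_rate=44100, channels=1, bits_per_sample=16):
    """Build the 44-byte header of a canonical PCM WAV file (assumes PCM)."""
    block_align = channels * bits_per_sample // 8
    byte_rate = sample_rate * block_align
    return (
        # RIFF size = everything after this field: "WAVE" + fmt chunk + data chunk
        b"RIFF" + struct.pack("<I", 36 + num_data_bytes) + b"WAVE"
        # "fmt " chunk: 16 bytes of format info; audio format 1 means PCM
        + b"fmt " + struct.pack("<IHHIIHH", 16, 1, channels, sample_rate,
                                byte_rate, block_align, bits_per_sample)
        # "data" chunk header; the raw sample bytes come right after it
        + b"data" + struct.pack("<I", num_data_bytes)
    )


def save_wav(path, pcm_bytes, sample_rate=44100, channels=1, bits_per_sample=16):
    """Write raw PCM bytes to `path` with a header in front of them."""
    with open(path, "wb") as f:
        f.write(wav_header(len(pcm_bytes), sample_rate, channels, bits_per_sample))
        f.write(pcm_bytes)
```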

Come Friday, I decided to scrap the SoundTouch class. Kind of risky, considering I was able to produce some results with it. But I found an amazing PitchShift project. It was very simple and included a class that could shift the pitch of a whole wav file AND save that file. Worked like a charm. Then, because I could save it, I could display the output wav file like all the others. DONE.

Next task: how do I compare the frequency of the speech file to the music file? Once again, I got pretty lucky and found an implementation of a Fourier Transform that takes the bytes of the wav file (produced in our PitchShift project) and converts their frequencies to doubles. So now I have arrays of doubles that represent frequency.
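One hedged note for next year. In most WAV-handling code, the step that turns bytes into doubles is plain PCM decoding, which happens before any Fourier transform. Those doubles are sample values (amplitudes over time), and frequencies only appear after a transform is run on them. This is a minimal sketch, assuming 16-bit little-endian PCM; the helper name is hypothetical.

```python
import struct


def pcm16_to_doubles(data):
    """Decode 16-bit little-endian PCM bytes into floats in [-1.0, 1.0).

    Each pair of bytes is one signed sample; a trailing odd byte is ignored.
    """
    count = len(data) // 2
    samples = struct.unpack(f"<{count}h", data[:count * 2])
    return [s / 32768.0 for s in samples]


print(pcm16_to_doubles(b"\x00\x00\xff\x7f\x00\x80"))  # [0.0, 0.999969482421875, -1.0]
```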

For this next week, I need to find the best way to interpret this data. Right now I am thinking that if I take the difference and alter the speech double array, then find a way to convert that double array back into a wav file, that should do the trick! I also want to meet with Dr. Meyer to talk about the sample points and how to make my Fourier Transform algorithm faster.
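Going back from doubles to a wav file is mostly scaling and clipping. This is a minimal Python sketch in which the standard `wave` module writes the header. It assumes mono, 16-bit output and sample values in [-1.0, 1.0]; the helper name `doubles_to_wav` is hypothetical and not part of the PitchShift project.

```python
import struct
import wave


def doubles_to_wav(path, samples, sample_rate=44100):
    """Write floats in [-1.0, 1.0] as a mono 16-bit PCM WAV file."""
    # Clip out-of-range values, then scale to the signed 16-bit range.
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)          # 2 bytes = 16 bits per sample
        w.setframerate(sample_rate)
        w.writeframes(struct.pack(f"<{len(ints)}h", *ints))
```

Clipping matters here. Adding a difference to the speech samples can push values past 1.0, and without clipping those values would overflow the 16-bit range.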

For now,
Chance Browning

Spring Break! Woo!

3/19/2018

This past week was spring break, and I spent mine at home with friends and family. I also spent it sitting in front of this laptop screen, busily trying to find answers and solutions for this project. All I found was more and more questions. I tried implementing a SoundTouch and SoundProcessor class, with little to no success.

The SoundTouch class has a C# wrapper class (because the library is actually written in C++). The class compiles, but once I try to use an object of the class, I get an error message, and I am not sure what to do about it. I want to meet with Dr. McVey about this and hopefully find a solution.

Looking further ahead, I can see that this SoundTouch class can help me change the pitch, but not detect it. Unless I am missing something, it only solves half of the problem.

This week is going to be a grind and this project is my #1 priority. I need to make some strides this week in order to have a better product for next week.

Houston, We Have Bytes!

3/4/2018

Bytes may be a small item in the grand scheme of things, but this week's breakthrough was huge, and I am excited that I have met my goal for the week. The highs, lows, confusion, and accomplishments of programming can all be seen throughout my findings (or struggles) this past week.

I started out with big hopes for the week, having found an auto-tune project that looked to be everything I wanted and needed for my own project. I set up a meeting with Dr. McVey to discuss the code and break it down to see how it could be of use to me. When we first started looking, we both thought the project was perfect: it used libraries I was already using and contained useful, descriptive functions.

However, there was a problem with the way the project was presented: there was no source code. The author presented the code in pieces, with a paragraph or two between each piece. The paragraphs were helpful, but there were no actual files or source code, which posed an issue. We couldn't figure out all of the libraries he was using. Nonetheless, we gave it a shot. So one night I made a couple of classes like the project said, added them to a project, and then went searching for all of the other classes that needed to be added. This took way too much time; I was searching for classes for what seemed like hours. At first it didn't seem too bad (add one or two classes and be done), but the further I dug in, the more classes I had to add. Then, by the time I got to what seemed to be the end, there were errors in the code. These were errors I could not fix, because I realized I had absolutely no idea what was going on.

Not knowing what is going on does not sit well with me. So I decided to go back and take a look at an earlier forum post. It talked about using the WaveFileReader and its classes to read chunks of data. The big breakthrough was realizing that I didn't need to add that class because it was already a part of the NAudio DLL; I had already used it earlier when displaying the wave. By using this object, I was able to get the byte array of the data.

Above is an example of some of the data I was able to pull from the wav file. The first number is the number of RiffChunks in a list object pulled from the WaveFileReader class. I was surprised to see that there is only one chunk. I want to look more into the size of these chunks. This is only about a 30-second audio clip, and bytes are very small, so I just thought there would be more chunks. The second label there is the actual bytes in the RiffChunk. This is pretty cool, considering this is what the wav is made up of. I need to dig more into the bytes and how they correlate with the sound, but this seems to be a mystery to most people.

Here is another example:

This is using a completely different song, the whole song, which is over 4 minutes long, but there is still only one RiffChunk. Again, this is something I have to look into. I am not sure what makes up a chunk.
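Some background that may help here. A WAV file is a RIFF container: a 12-byte "RIFF....WAVE" preamble followed by chunks. Each chunk is a 4-character ASCII ID, a 4-byte little-endian size, and a payload. A typical file has a "fmt " chunk and a single "data" chunk holding every sample, because the 32-bit size field lets one chunk hold up to about 4 GB. For scale, and assuming CD-quality stereo, 30 seconds is 44,100 × 2 channels × 2 bytes × 30 = 5,292,000 bytes. One possibility worth checking (an assumption, not verified here) is that the list NAudio exposes holds only the extra chunks beyond "fmt " and "data". The Python sketch below lists every chunk in a file; the helper name is hypothetical.

```python
import struct


def list_riff_chunks(data):
    """Return (chunk_id, size) pairs for the chunks inside RIFF/WAVE bytes."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    chunks = []
    pos = 12  # skip "RIFF", the 4-byte total size, and "WAVE"
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4].decode("ascii", errors="replace")
        (size,) = struct.unpack("<I", data[pos + 4:pos + 8])
        chunks.append((chunk_id, size))
        # The payload follows the 8-byte chunk header; odd sizes get a pad byte.
        pos += 8 + size + (size & 1)
    return chunks


# Usage:
# with open("song.wav", "rb") as f:
#     print(list_riff_chunks(f.read()))
```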

Goals for this week: research RiffChunks and start to combine wav files. I think I am going to first try averaging the byte arrays into one and see what happens. I did a little research tonight and saw that people suggest that to do this you must convert the bytes of both wav files into shorts. Now, I am not sure what shorts are, but we are going to find out.
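A "short" is a 16-bit signed integer (range -32,768 to 32,767), which is exactly how a 16-bit PCM sample is stored. Averaging the raw bytes one at a time would mix the high and low halves of different samples, and that is why converting to shorts first is the usual advice. This is a minimal sketch, assuming both inputs are 16-bit little-endian PCM with the same sample rate and channel count; the helper name `mix_pcm16` is hypothetical.

```python
import struct


def mix_pcm16(a, b):
    """Average two 16-bit little-endian PCM byte strings sample by sample.

    The result is as long as the shorter input.
    """
    n = min(len(a), len(b)) // 2
    xs = struct.unpack(f"<{n}h", a[:n * 2])
    ys = struct.unpack(f"<{n}h", b[:n * 2])
    # Averaging (rather than adding) keeps every result inside the 16-bit range.
    mixed = [(x + y) // 2 for x, y in zip(xs, ys)]
    return struct.pack(f"<{n}h", *mixed)
```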

-Chance Browning

Progress In Data

2/25/2018

A successful week of research under the belt. Huge finds setting up for big progress this week. All that and more in this blog post!

The week started with some research. I am trying to detect the pitch of a WAV file, and I found an approach that seemed promising: some open-source code built on the NAudio DLL I have been using. Perfect! But there were some problems.

I met with Dr. McVey to look over the code, and even she had a hard time deciphering some of it. The problem with the code is poor documentation. There are few to no comments, and some of the decisions made puzzled us both: grabbing chunks to read (awesome), but then resetting to go back to what it just read (what??). Nonetheless, it was a promising start.

I told Dr. McVey that I would continue to look into the code I showed her, but my curiosity kept me searching Google for more code. I then stumbled upon something fantastic: an open-source auto-tuner using the same NAudio DLL. PERFECT! This code showed so much promise, with great documentation and positive feedback from other users. I have not met with Dr. McVey to look over this code yet, but I hope to do that soon, as well as finish reading the whole document. Things are looking up.

From there, I met with Dr. Meyer again. We talked about Fast Fourier Transforms. I was still confused about how these would be incorporated into my project. I understand their purpose, to reduce noise in the wav file, but I fail to see how I can do this. I had a great meeting with Dr. Meyer; we met for almost an hour and covered some huge topics. Not only did we talk about FFTs, we talked about ways to progress through this project. He raised questions that made me think about the direction of the project and what to expect at different points. The meeting was helpful, and I look forward to meeting with Dr. Meyer more regularly, as he has expressed great interest in my project.

Goals for this week: get a byte array to pull raw data from my wav file, and be able to register that data. I think this is a legitimate goal for the week with the code I have found. I am very excited to continue digging into this source code and implementing it.

-Chance Browning

Hello World

2/18/2018

This week I was able to get a Hello World up and running. My idea for a Hello World program was to be able to see the WAV file. To start, I went back and looked at a previous capstone similar to this project. In that project, the student used the NAudio DLL, so that is where I started.

Digging into NAudio, I found that it has an object called a waveViewer. The waveViewer is very customizable. I watched a few tutorial videos and had some pretty decent-looking programs.

At this point, I am going to stick with something that looks like the first one. There are some cool things I could do with customization, like changing the color, getting a cursor to appear, and even being able to zoom in and trim. I ran into a few issues with trimming and the cursor, but I think they will be easy fixes. I like the first test better because I can pick the number of sample points. I am curious whether I can override some functions in order to incorporate a Fast Fourier Transform.
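Picking "the number of sample points" for a waveform view usually means collapsing many audio samples into each drawn point. A common approach keeps the minimum and maximum of each bucket so that peaks stay visible. This is assumed here, and NAudio's waveViewer may do something different internally. The Python helper name `waveform_points` is hypothetical.

```python
def waveform_points(samples, num_points):
    """Collapse samples into at most num_points (min, max) pairs for drawing."""
    n = len(samples)
    k = min(num_points, n)
    points = []
    for i in range(k):
        # Integer bucket edges spread all n samples evenly over k buckets.
        start, end = i * n // k, (i + 1) * n // k
        bucket = samples[start:end]
        points.append((min(bucket), max(bucket)))
    return points


print(waveform_points(list(range(10)), 5))  # [(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]
```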

During my mini poster session, I was given a few good ideas and things to look into: some ideas about how I should approach changing the pitch, and some tools and applications that already do it. The next thing I want to accomplish is to pull information from the WAV file and find out whether I can detect the pitch. At the same time, I am going to start digging deep into the Fast Fourier Transform. I want to do some more research and hopefully find a time to sit down with Dr. Meyer once again.

- Chance Browning