There are a bunch of spectral processing programs out there that can do creative stuff with sound; this is just one particular application.

There are really two things the Gaussian blur does, which I believe are two separate effects. Firstly, it blurs horizontally, which can be a pretty weird effect if you push it quite a bit (that is, until you have the equivalent of less than 10 pixels per second): it sounds like the sound is slowed down, but if you have an idea what the original sound is, you can tell it's going at its normal rate. I believe that effect works well on speech. I haven't tested that in a while, but if I recall correctly, if you push the blurring quite a bit then a "regular" instrument such as this electric piano should sound a bit more like a flute or like strings. Not quite sure.

Chaircrusher: You're right about transient sounds being the first casualties in that sort of processing, but it's not quite as simple as "pitch is preserved, transients are lost". It's actually not really like FFTs. I mean, sure, Photosounder uses FFTs, but only for the sake of speed; it doesn't have to, and it doesn't use the concept directly. You're right in that FFTs will do a great job at identifying the harmonic content of a sound, but the thing is, FFTs are run on entire chunks of sound, which is not exactly the case here.

Basically, while you can get all the frequency resolution you want with an FFT on the whole chunk of sound, you'll have no time resolution at all, i.e. you can't tell when something happens in time. Therefore, in programs such as Photosounder, you have to balance frequency resolution with time resolution and find a satisfying compromise. Photosounder 1.1 actually only has a frequency resolution (or should I say pitch resolution) of 24 pixels per octave, and the time resolution varies depending on the frequency, which is why bass sounds are usually poorly retranscribed, but overall it's the best one-size-fits-all compromise I could find.
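To make the time/frequency trade-off a bit more concrete, here is a minimal sketch using a plain windowed FFT in NumPy. It is not how Photosounder itself works (the comment says it doesn't use the FFT concept directly); the sample rate, the two test tones, the window sizes and the helper names `resolutions` and `count_peaks` are all assumptions made up for illustration.

```python
import numpy as np

def resolutions(sample_rate, window_size):
    """Return (frequency bin spacing in Hz, window duration in seconds)."""
    return sample_rate / window_size, window_size / sample_rate

def count_peaks(signal, window_size, threshold=0.5):
    """Count spectral peaks in one Hann-windowed frame.

    A peak is a local maximum whose magnitude exceeds
    threshold * (largest magnitude in the frame).
    """
    frame = signal[:window_size] * np.hanning(window_size)
    mag = np.abs(np.fft.rfft(frame))
    mag = mag / mag.max()
    mid = mag[1:-1]
    is_peak = (mid > mag[:-2]) & (mid >= mag[2:]) & (mid > threshold)
    return int(is_peak.sum())

# Hypothetical test signal: two tones only 10 Hz apart, 2 seconds long.
sample_rate = 8000
t = np.arange(2 * sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 440 * t) + np.sin(2 * np.pi * 450 * t)

for window_size in (256, 8192):
    df, dt = resolutions(sample_rate, window_size)
    peaks = count_peaks(signal, window_size)
    print(f"window={window_size}: {df:.2f} Hz per bin, "
          f"{dt * 1000:.0f} ms per frame, {peaks} peak(s) found")
```

With the short window, each frame covers only 32 ms, so events are well located in time, but the bins are about 31 Hz wide and the two tones should merge into a single peak. With the long window the tones should separate, but each frame now spans about a second, so you can no longer tell when something happened inside it.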
I haven't messed around with Photosounder yet, but I know a little about FFTs and IFFTs. If you do any processing at all in the frequency domain, you're going to have blurred transients. To test this, run a drum loop through Photosounder (convert to an image and then back again) and compare with the original. Part of the problem is that the photo resolution, while it might be huge in terms of your computer screen, is not 'wide' enough to accurately represent sounds. FFTs will do a near-perfect job of identifying the harmonic content of a sound, but if you, e.g., have a 2-second sound converted to a 2000-pixel-wide image, you will get out 2000 one-millisecond slices, each containing the spectral content of a millisecond of the original sound. Given that the transients that represent percussive note attacks are sometimes only a fraction of a millisecond long, some smearing is inevitable, even if you don't monkey with the image in Photoshop.

But yeah, this is great for getting all sorts of crazy vague blurry effects.

It's certainly not a requirement that your images be powers of two. There may be some cases where it speeds things up (e.g. …). Smaller images will train significantly faster, and possibly even converge quicker (all other factors held constant), as you will be able to train on bigger batches (e.g. 100-1000 images in one pass, which you might not be able to do on a single machine with high-res imagery).

As to whether to resize, you need to ask yourself if every pixel in that image is critical to your task. Often this is not the case - you can probably resize a photo of a bus down to, say, 128x128 and still recognize that it's a bus. Using smaller images can also help your network generalise better, as there is less data to overfit.

A technique often used in image classification networks is to perform distortions (e.g. random cropping, scaling & brightness adjustment) on images to (a) convert odd-sized images to a constant size, (b) synthesize more data and (c) encourage the network to generalise.
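The distortions mentioned in the answer (random cropping, scaling and brightness adjustment) could look roughly like the sketch below. It uses only NumPy with a nearest-neighbour resize to stay self-contained; the function names `resize_nearest` and `random_distort`, the crop range, the brightness range and the 128x128 target are assumptions for illustration, and a real pipeline would more likely use the augmentation tools of whatever framework you train with.

```python
import numpy as np

def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize of an (H, W) or (H, W, C) array."""
    h, w = image.shape[:2]
    rows = np.arange(out_h) * h // out_h   # source row for each output row
    cols = np.arange(out_w) * w // out_w   # source column for each output column
    return image[rows[:, None], cols]

def random_distort(image, out_size, rng, min_crop=0.6, max_brightness=0.2):
    """Random crop, rescale to out_size x out_size, random brightness gain."""
    h, w = image.shape[:2]
    frac = rng.uniform(min_crop, 1.0)              # fraction of each side to keep
    ch, cw = max(1, int(h * frac)), max(1, int(w * frac))
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    crop = image[top:top + ch, left:left + cw]
    scaled = resize_nearest(crop, out_size, out_size).astype(np.float32)
    gain = 1.0 + rng.uniform(-max_brightness, max_brightness)
    return np.clip(scaled * gain, 0, 255).astype(np.uint8)

# Stand-in for an odd-sized photo (random pixels, not real data).
rng = np.random.default_rng(0)
photo = rng.integers(0, 256, size=(300, 451, 3), dtype=np.uint8)

# Eight different constant-size training samples from one source image.
batch = np.stack([random_distort(photo, 128, rng) for _ in range(8)])
print(batch.shape, batch.dtype)
```

Every call yields a differently cropped and brightened image of the same fixed size, which covers all three goals at once: constant input size, more training data, and some pressure on the network to generalise.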