Audio Time Stretching and Pitch Scaling



Time stretching is the process of changing the speed or duration of an audio signal without affecting its pitch. Pitch scaling is the opposite: the process of changing the pitch without affecting the speed. Pitch shift is pitch scaling implemented in an effects unit and intended for live performance. Pitch control is a simpler process which affects pitch and speed simultaneously by slowing down or speeding up a recording.

These processes are often used to match the pitches and tempos of two pre-recorded clips for mixing when the clips cannot be reperformed or resampled. Time stretching is often used to adjust radio commercials[1] and the audio of television advertisements[2] to fit exactly into the 30 or 60 seconds available. It can be used to conform longer material to a designated time slot, such as a 1-hour broadcast.

Resampling

The simplest way to change the duration or pitch of a digital audio clip is through sample rate conversion. This is a mathematical operation that effectively rebuilds a continuous waveform from its samples and then samples that waveform again at a different rate. When the new samples are played at the original sampling frequency, the audio clip sounds faster or slower. Unfortunately, the frequencies in the sample are always scaled at the same rate as the speed, transposing its perceived pitch up or down in the process. In other words, slowing down the recording lowers the pitch, speeding it up raises the pitch. This is analogous to speeding up or slowing down an analogue recording, like a phonograph record or tape, creating the Chipmunk effect. Using this method the two effects cannot be separated. A drum track containing no pitched instruments can be moderately sample-rate converted for tempo without adverse effects, but a pitched track cannot.
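To make the coupling concrete, here is a minimal Python/NumPy sketch (not from the article; the function name and the linear interpolator are illustrative choices) that resamples a mono signal x by a speed factor:

    import numpy as np

    def resample_speed(x, speed):
        # Rebuild the waveform by linear interpolation and read it out
        # `speed` times faster. Played back at the original sample rate,
        # tempo and pitch change together: speed = 2.0 halves the duration
        # and raises the pitch by an octave; speed = 0.5 does the opposite.
        positions = np.arange(0, len(x) - 1, speed)
        return np.interp(positions, np.arange(len(x)), x)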

Frequency domain

Phase vocoder

One way of stretching the length of a signal without affecting the pitch is to build a phase vocoder after Flanagan, Golden, and Portnoff.

Basic steps (a minimal code sketch follows the list):

  1. compute the instantaneous frequency/amplitude relationship of the signal using the STFT, which is the discrete Fourier transform of a short, overlapping and smoothly windowed block of samples;
  2. apply some processing to the Fourier transform magnitudes and phases (like resampling the FFT blocks); and
  3. perform an inverse STFT by taking the inverse Fourier transform on each chunk and adding the resulting waveform chunks, also called overlap and add (OLA).[3]
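The following Python/NumPy sketch implements these three steps in their most basic form. It is illustrative only: the window and hop sizes are arbitrary choices, x is assumed to be a mono float signal at least n_fft samples long, and alpha > 1 lengthens the result.

    import numpy as np

    def phase_vocoder_stretch(x, alpha, n_fft=2048, hop=512):
        window = np.hanning(n_fft)
        # Step 1: STFT of short, overlapping, windowed blocks.
        n_frames = 1 + (len(x) - n_fft) // hop
        stft = np.array([np.fft.rfft(window * x[i*hop : i*hop + n_fft])
                         for i in range(n_frames)])
        # Expected per-hop phase advance of each bin's center frequency.
        omega = 2 * np.pi * hop * np.arange(n_fft // 2 + 1) / n_fft
        # Step 2: resample the frame sequence on a new time grid.
        time_steps = np.arange(0, n_frames - 1, 1.0 / alpha)
        out = np.zeros(len(time_steps) * hop + n_fft)
        phase = np.angle(stft[0])
        for k, t in enumerate(time_steps):
            i, frac = int(t), t - int(t)
            mag = (1 - frac) * np.abs(stft[i]) + frac * np.abs(stft[i + 1])
            # Wrapped phase increment = each bin's instantaneous frequency.
            dphi = np.angle(stft[i + 1]) - np.angle(stft[i]) - omega
            dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))
            # Step 3: inverse FFT of each chunk, then overlap-add (OLA).
            out[k*hop : k*hop + n_fft] += window * np.fft.irfft(mag * np.exp(1j * phase))
            phase += omega + dphi
        return out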

The phase vocoder handles sinusoid components well, but early implementations introduced considerable smearing on transient ('beat') waveforms at all non-integer compression/expansion rates, which renders the results phasey and diffuse. Recent improvements allow better quality results at all compression/expansion ratios but a residual smearing effect still remains.

The phase vocoder technique can also be used to perform pitch shifting, chorusing, timbre manipulation, harmonizing, and other unusual modifications, all of which can be changed as a function of time.

Sinusoidal analysis/synthesis system (based on McAulay & Quatieri 1988, p. 161)[4]

Sinusoidal spectral modeling

Another method for time stretching relies on a spectral model of the signal. In this method, peaks are identified in frames using the STFT of the signal, and sinusoidal 'tracks' are created by connecting peaks in adjacent frames. The tracks are then re-synthesized at a new time scale. This method can yield good results on both polyphonic and percussive material, especially when the signal is separated into sub-bands. However, this method is more computationally demanding than other methods.[citation needed]
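As a toy illustration of the analysis side (a sketch under stated assumptions, not a full partial tracker), the following Python/NumPy helpers pick spectral peaks in one STFT frame and connect them to the nearest peaks of the next frame; resynthesis would then run one oscillator per track on a stretched time grid:

    import numpy as np

    def frame_peaks(spectrum, sr, n_fft, top=20):
        # Local maxima of the magnitude spectrum, strongest first.
        mag = np.abs(spectrum)
        idx = [k for k in range(1, len(mag) - 1)
               if mag[k] > mag[k - 1] and mag[k] > mag[k + 1]]
        idx = sorted(idx, key=lambda k: -mag[k])[:top]
        return [(k * sr / n_fft, mag[k]) for k in idx]   # (frequency, amplitude)

    def connect_tracks(peaks_a, peaks_b, max_jump=50.0):
        # Link each peak to the nearest-frequency peak in the next frame.
        tracks = []
        for fa, aa in peaks_a:
            if peaks_b:
                fb, ab = min(peaks_b, key=lambda p: abs(p[0] - fa))
                if abs(fb - fa) < max_jump:
                    tracks.append(((fa, aa), (fb, ab)))
        return tracks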

Modelling a monophonic sound as an observation along a helix of a function with a cylinder domain

Time domain

SOLA

Rabiner and Schafer in 1978 put forth an alternate solution that works in the time domain: attempt to find the period (or equivalently the fundamental frequency) of a given section of the wave using some pitch detection algorithm (commonly the peak of the signal's autocorrelation, or sometimes cepstral processing), and crossfade one period into another.

This is called time-domain harmonic scaling[5] or the synchronized overlap-add method (SOLA) and performs somewhat faster than the phase vocoder on slower machines but fails when the autocorrelation mis-estimates the period of a signal with complicated harmonics (such as orchestral pieces).
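A rough Python/NumPy sketch of this idea follows (parameters are illustrative, and the achieved ratio is only approximately alpha because each join snaps to the estimated period):

    import numpy as np

    def estimate_period(frame, min_lag=32, max_lag=1024):
        # Period = lag of the autocorrelation peak (crude pitch detection).
        ac = np.correlate(frame, frame, mode='full')[len(frame) - 1:]
        return min_lag + int(np.argmax(ac[min_lag:max_lag]))

    def splice(a, b, overlap):
        # Crossfade the tail of `a` into the head of `b`.
        fade = np.linspace(0.0, 1.0, overlap)
        mixed = a[-overlap:] * (1 - fade) + b[:overlap] * fade
        return np.concatenate([a[:-overlap], mixed, b[overlap:]])

    def sola_stretch(x, alpha, frame=2048, overlap=256):
        syn_hop = frame - overlap
        ana_hop = max(1, int(syn_hop / alpha))   # alpha > 1 lengthens
        out = np.asarray(x[:frame], dtype=float)
        pos = ana_hop
        while pos + frame <= len(x):
            seg = np.asarray(x[pos:pos + frame], dtype=float)
            # Search one period for the offset best aligned with the output tail.
            lag = max(range(estimate_period(seg)),
                      key=lambda l: float(np.dot(out[-overlap:], seg[l:l + overlap])))
            out = splice(out, seg[lag:], overlap)
            pos += ana_hop
        return out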

Adobe Audition (formerly Cool Edit Pro) seems to solve this by looking for the period closest to a center period that the user specifies, which should be an integer multiple of the tempo, and between 30 Hz and the lowest bass frequency.

This is much more limited in scope than the phase vocoder based processing, but can be made much less processor intensive, for real-time applications. It provides the most coherent results[citation needed] for single-pitched sounds like voice or musically monophonic instrument recordings.

High-end commercial audio processing packages either combine the two techniques (for example by separating the signal into sinusoid and transient waveforms), or use other techniques based on the wavelet transform, or artificial neural network processing[citation needed], producing the highest-quality time stretching.

Frame-based approach

Frame-based approach of many TSM procedures

In order to preserve an audio signal's pitch when stretching or compressing its duration, many time-scale modification (TSM) procedures follow a frame-based approach.[6] Given an original discrete-time audio signal, this strategy's first step is to split the signal into short analysis frames of fixed length. The analysis frames are spaced by a fixed number of samples, called the analysis hopsize H_a. To achieve the actual time-scale modification, the analysis frames are then temporally relocated to have a synthesis hopsize H_s. This frame relocation results in a modification of the signal's duration by a stretching factor of α = H_s / H_a; for example, H_a = 512 and H_s = 1024 yield α = 2, doubling the duration. However, simply superimposing the unmodified analysis frames typically results in undesired artifacts such as phase discontinuities or amplitude fluctuations. To prevent these kinds of artifacts, the analysis frames are adapted to form synthesis frames prior to the reconstruction of the time-scale-modified output signal.

The strategy of how to derive the synthesis frames from the analysis frames is a key difference among different TSM procedures.

Speed hearing and speed talking

For the specific case of speech, time stretching can be performed using PSOLA.

While one might expect speeding up to reduce comprehension, Herb Friedman says that 'Experiments have shown that the brain works most efficiently if the information rate through the ears—via speech—is the 'average' reading rate, which is about 200–300 wpm (words per minute), yet the average rate of speech is in the neighborhood of 100–150 wpm.'[7]

Speeding up audio is seen as the equivalent of speed reading.[8][9]

Pitch scaling

Pitch shifting (frequency scaling) as provided on the Eventide Harmonizer
Frequency shifting, as provided by the Bode Frequency Shifter, does not preserve frequency ratios or harmony.

These techniques can also be used to transpose an audio sample while holding speed or duration constant. This may be accomplished by time stretching and then resampling back to the original length. Alternatively, the frequency of the sinusoids in a sinusoidal model may be altered directly, and the signal reconstructed at the appropriate time scale.
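Under the assumptions of the phase-vocoder sketch shown earlier (the phase_vocoder_stretch helper is that sketch's name, not an established API; any time-stretcher would do), the stretch-then-resample recipe looks like this in Python/SciPy:

    from scipy.signal import resample

    def pitch_shift(x, semitones):
        ratio = 2.0 ** (semitones / 12.0)            # frequency scaling factor
        stretched = phase_vocoder_stretch(x, ratio)  # longer/shorter, same pitch
        return resample(stretched, len(x))           # original length, scaled pitch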

Transposing can be called frequency scaling or pitch shifting, depending on perspective.

For example, one could move the pitch of every note up by a perfect fifth, keeping the tempo the same. One can view this transposition as 'pitch shifting', 'shifting' each note up 7 keys on a piano keyboard, or adding a fixed amount on the Mel scale, or adding a fixed amount in linear pitch space. One can view the same transposition as 'frequency scaling', 'scaling' (multiplying) the frequency of every note by 3/2.

Musical transposition preserves the ratios of the harmonic frequencies that determine the sound's timbre, unlike the frequency shift performed by amplitude modulation, which adds a fixed frequency offset to the frequency of every note. (In theory one could perform a literal pitch scaling in which the musical pitch space location is scaled [a higher note would be shifted at a greater interval in linear pitch space than a lower note], but that is highly unusual, and not musical[citation needed]).
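A few illustrative numbers make the difference plain: multiplying a harmonic series by 3/2 preserves its 1:2:3 ratios, while adding a fixed offset destroys them.

    harmonics = [220.0, 440.0, 660.0]          # ratios 1 : 2 : 3
    scaled  = [f * 1.5 for f in harmonics]     # [330.0, 660.0, 990.0] -> still 1 : 2 : 3
    shifted = [f + 110.0 for f in harmonics]   # [330.0, 550.0, 770.0] -> 3 : 5 : 7, inharmonic relative to the new lowest tone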

Time domain processing works much better here, as smearing is less noticeable, but scaling vocal samples distorts the formants into a sort of Alvin and the Chipmunks-like effect, which may be desirable or undesirable. A process that preserves the formants and character of a voice involves analyzing the signal with a channel vocoder or LPC vocoder plus any of several pitch detection algorithms, and then resynthesizing it at a different fundamental frequency.

A detailed description of older analog recording techniques for pitch shifting can be found within the Alvin and the Chipmunks entry.

See also

  • Dynamic tonality — the real-time changes of tuning and timbre for new chord progressions, musical temperament modulations, etc.

References

  1. ^ https://web.archive.org/web/20080527184101/http://www.tvtechnology.com/features/audio_notes/f_audionotes.shtml
  2. ^ http://www.atarimagazines.com/creative/v9n7/122_Variable_speech.php
  3. ^ Jont B. Allen (June 1977). 'Short Time Spectral Analysis, Synthesis, and Modification by Discrete Fourier Transform'. IEEE Transactions on Acoustics, Speech, and Signal Processing. ASSP-25 (3): 235–238.
  4. ^ McAulay, R. J.; Quatieri, T. F. (1988). 'Speech Processing Based on a Sinusoidal Model' (PDF). The Lincoln Laboratory Journal. 1 (2): 153–167. Archived from the original (PDF) on 2012-05-21; retrieved 2014-09-07.
  5. ^ David Malah (April 1979). 'Time-domain algorithms for harmonic bandwidth reduction and time scaling of speech signals'. IEEE Transactions on Acoustics, Speech, and Signal Processing. ASSP-27 (2): 121–133.
  6. ^ Jonathan Driedger and Meinard Müller (2016). 'A Review of Time-Scale Modification of Music Signals'. Applied Sciences. 6 (2): 57. doi:10.3390/app6020057.
  7. ^ 'Variable Speech'. Creative Computing. Vol. 9, No. 7 (July 1983): p. 122.
  8. ^ http://www.nevsblog.com/2006/06/23/listen-to-podcasts-in-half-the-time/
  9. ^ https://web.archive.org/web/20060902102443/http://cid.lib.byu.edu/?p=128

External links

  • Time Stretching and Pitch Shifting Overview: a comprehensive overview of current time and pitch modification techniques, by Stephan Bernsee
  • Stephan Bernsee's smbPitchShift C source code: C source code for frequency domain pitch manipulation
  • pitchshift.js from KievII: a JavaScript pitch shifter based on the smbPitchShift code, from the open source KievII library
  • The Phase Vocoder: A Tutorial: a good description of the phase vocoder
  • How to build a pitch shifter: theory, equations, figures and performance of a real-time guitar pitch shifter running on a DSP chip
  • ZTX Time Stretching Library: free and commercial versions of a popular third-party time stretching library for iOS, Linux, Windows and Mac OS X
  • Elastique by zplane: a commercial cross-platform library, mainly used by DJ and DAW manufacturers
  • Voice Synth from Qneo: a specialized synthesizer for creative voice sculpting
  • TSM toolbox: free MATLAB implementations of various time-scale modification procedures
  • Pitch Shifter Audio Tool: an online pitch-shifting audio tool implemented using the SoundTouch algorithm

This tutorial gives a brief overview of the most popular algorithms used for achieving time stretching and pitch shifting in a musical context, along with their advantages and disadvantages. We provide audio examples to demonstrate common artifacts and effects associated with these processes, and provide pointers to papers and other resources on the net.

1. Introduction – Pitch Shifting
As opposed to the process of pitch transposition achieved using a simple sample rate conversion, Pitch Shifting is a way to change the pitch of a signal without changing its length. In practical applications, this is usually achieved by changing the length of a sound using one of the methods described below and then performing a sample rate conversion to change the pitch. There exists a certain confusion in terminology, as Pitch Shifting is often also incorrectly named ‘Frequency Shifting'. A true Frequency Shift (as obtainable by modulating an analytic signal by a complex exponential) will shift the spectrum of a sound, while Pitch Shifting will dilate it, retaining the harmonic relationships of the sound. Frequency Shifting yields a metallic, inharmonic sound which may well be an interesting special effect, but which is a totally inadequate process for changing the pitch of any harmonic sound except a single sine wave.
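As a minimal illustration of the distinction (a sketch, not code from this tutorial; x, shift_hz and sr are assumed names), a true frequency shift can be implemented in Python/SciPy exactly as described, by modulating the analytic signal with a complex exponential:

    import numpy as np
    from scipy.signal import hilbert

    def frequency_shift(x, shift_hz, sr):
        # Analytic signal = x + j * Hilbert(x); the modulation moves every
        # component by the same fixed number of Hz (not by a fixed ratio).
        analytic = hilbert(x)
        t = np.arange(len(x)) / float(sr)
        return np.real(analytic * np.exp(2j * np.pi * shift_hz * t))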

1.1 Audio Examples:

Original Sound
(WAVE, 106k)
Pitch Shifted Sound
(WAVE, 106k)
Frequency Shifted Sound
(WAVE, 106k)

1.2 Time Compression/Expansion
Time Compression/Expansion, also known as 'Time Stretching', is the reciprocal process to Pitch Shifting. It leaves the pitch of the signal intact while changing its speed (tempo). This is useful when you wish to change the speed of a voiceover without affecting the timbre of the voice. There are several fairly good methods for time compression/expansion and pitch shifting, but most of them will not perform well on all kinds of signals and for any desired shift/stretch ratio. Typically, good algorithms allow pitch shifting up to 5 semitones on average, or stretching the length by 130%. When time stretching and pitch shifting single instrument recordings you might even be able to achieve a 200% time stretch, or a one-octave pitch shift, with no audible loss in quality.

2. Techniques Used for Time Compression/Expansion and Pitch Shifting
Currently, there are two different principal time compression/expansion and pitch shifting schemes employed in most of today's applications:

2.1 Phase Vocoder.

This method was introduced by Flanagan and Golden in 1966 and digitally implemented by Portnoff ten years later. It uses a Short Time Fourier Transform (which we will abbreviate as STFT from here on) to convert the audio signal to the complex Fourier representation. Since the STFT returns the frequency domain representation of the signal at a fixed frequency grid, the actual frequencies of the partial bins have to be found by converting the relative phase change between two STFT outputs to actual frequency changes. Note that the term ‘partial' has nothing to do with the signal's harmonics. In fact, an STFT will never readily give you any information about true harmonics unless you match the STFT length to the fundamental frequency of the signal – and even then the frequency domain resolution is quite different from what our ear and auditory system perceive. The timebase of the signal is changed by calculating the frequency changes in the Fourier domain on a different time basis, and then an iSTFT is done to regain the time domain representation of the signal.

Table 1: Fourier Transform Pointers:
Jean Baptiste Joseph Fourier bio
Discrete Time FT Basics
Dave Hales FFT Laboratory (requires Java capable browser)
S.M.Bernsee's DFT à Pied article (with C code)
Chris Bores' Online DSP Courses

Phase vocoder algorithms are used mainly in scientific and educational software products (to show the use and limitations of the Fourier Transform) but have gained in popularity over the past few years due to improvements that made it possible to greatly reduce the artifacts of the 'original' phase vocoder algorithm. The basic phase vocoder suffers from a severe drawback because it introduces a considerable amount of artifacts audible as ‘smearing' and ‘reverberation' (even at low expansion ratios) due to the non-synchronized vertical coherence of the sine and cosine basis functions that are used to change the timebase. Puckette, Laroche and Dolson have shown that the phasiness can be greatly reduced by picking peaks in the Fourier spectrum and keeping the relative phases around the peaks unchanged. Even though this improves the quality considerably it still renders the result somewhat phasey and diffuse when compared to time domain methods. Current research focuses on improving the phase vocoder by applying intra-frame sinusoidal sweep and ramp rate correction (Bristow-Johnson and Bogdanowicz) and multi-resolution phase vocoder concepts (Bonada).

2.1.1 Related topics

There often is a certain confusion between a ‘regular' (channel) vocoder and the phase vocoder. Aside from technical details, the two are used to achieve different effects. The channel vocoder uses two input signals to produce a single output channel, while the phase vocoder has a one-in, one-out signal path. In the channel vocoder as applied to music processing, the modulator input signal is split into different filter bands whose amplitudes modulate the (usually) corresponding filter bands of the carrier signal. More sophisticated (and expensive) approaches also separate voiced and unvoiced components in the modulator (or, for historical reasons, ‘speech') input, i.e. vowels and sibilants, for independent processing. The channel vocoder cannot be successfully applied to the time/pitch scaling problem; in a musical context it is mainly a device for analyzing and imposing formant frequencies from one sound on another. The two are similar in that both use filter banks (the STFT can be seen as a filter bank consisting of steep and slightly overlapping constant-bandwidth filters), but a maximum of about 22 bands is typical for channel vocoders, while a phase vocoder usually employs a minimum of 512 or 1024 filter bands. The term Voice Coder (Vocoder) refers to the original application of the two processes in speech coding for military purposes.
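For contrast, here is a deliberately simplified STFT-based channel vocoder sketch (an illustration added here, not code from the text), in which the modulator's band magnitudes are imposed on the carrier's spectrum frame by frame, using the STFT as the filter bank:

    import numpy as np

    def stft(x, n_fft=1024, hop=256):
        w = np.hanning(n_fft)
        n = 1 + (len(x) - n_fft) // hop
        return np.array([np.fft.rfft(w * x[i*hop : i*hop + n_fft]) for i in range(n)])

    def istft(frames, n_fft=1024, hop=256):
        w = np.hanning(n_fft)
        out = np.zeros(len(frames) * hop + n_fft)
        for i, f in enumerate(frames):
            out[i*hop : i*hop + n_fft] += w * np.fft.irfft(f)
        return out

    def channel_vocoder(carrier, modulator):
        C, M = stft(carrier), stft(modulator)
        n = min(len(C), len(M))
        # Carrier phases, modulator band amplitudes: two inputs, one output.
        return istft(np.abs(M[:n]) * np.exp(1j * np.angle(C[:n])))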

2.1.2 Why Phase?

The term ‘phase' in phase vocoder refers to the fact that the temporal development of a sound is contained in its phase information – while the amplitudes just denote that a component is present in a sound, the phases contain the structural information. The phase relationship between the different bins will reconstruct time-limited events when the time domain representation is resynthesized. The phase difference of each bin between two successive analysis frames is used to determine that bin's frequency deviation from its center frequency, thus providing information about the bin's true frequency (when it does not fall exactly on the STFT's frequency grid) and thus making a reconstruction on a different time basis possible.
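In code, this computation reads roughly as follows (a sketch with illustrative parameters; frame0 and frame1 are assumed to be two successive STFT frames):

    import numpy as np

    def bin_frequencies(frame0, frame1, n_fft=2048, hop=512, sr=44100.0):
        k = np.arange(n_fft // 2 + 1)
        expected = 2 * np.pi * hop * k / n_fft        # phase advance of each bin's center frequency
        dphi = np.angle(frame1) - np.angle(frame0) - expected
        dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))    # wrap the deviation to [-pi, pi]
        return (k / n_fft + dphi / (2 * np.pi * hop)) * sr  # true frequency of each bin, in Hz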

Table 2: Pointers, Phase Vocoder:
The MIT Lab Phase Vocoder
WaveMasher – GPL/Open Source Phase Vocoder by Kenneth Sturgis
Sculptor: A Real Time Phase Vocoder by Nick Bailey
A Phase Vocoder implementation using Matlab
The IRCAM 'Super Phase Vocoder'
S.M.Bernsee's Pitch Shifting Using The Fourier Transform article (with C code)
Table 3: Pointers, sinusoidal modelling (Phase Vocoder-related technique):
SMS sound processing package (incl. executables for several platforms)
Lemur (Mac program along with references and documentation)

2.2 Time Domain Harmonic Scaling (TDHS).

This is based on a method proposed by Rabiner and Schafer in 1978. It depends heavily on a correct estimate of the fundamental frequency of the sound being processed. In one of the numerous possible implementations, the Short Time Autocorrelation of the signal is taken and the fundamental frequency is estimated by picking the maximum value from all autocorrelation lags. Alternatively, one can use the Short Time Average Magnitude Difference function and find the minimum value, which is usually faster on average CISC-based computer systems. The timebase is changed by copying the input to the output in an overlap-and-add manner (therefore it is also sometimes referred to as ‘(P)SOLA' – (pitch) synchronized overlap-add method) while simultaneously incrementing the input pointer by the overlap size minus a multiple of the fundamental period. This results in the input being traversed at a different speed than the original data was recorded at, while aligning to the fundamental period estimated by the above method. This algorithm works well with signals having a prominent fundamental frequency and can be used with all kinds of signals consisting of a single (musically monophonic) source. When it comes to mixed-source (musically polyphonic) signals, this method will produce satisfactory results only if the size of the overlapping segments is increased to include a multiple of cycles, thus averaging the phase error over a longer segment and making it less audible. For Time Domain Harmonic Scaling the basic problem is estimating the pitch period of the signal, especially in cases where the actual fundamental frequency is missing. Numerous pitch estimation algorithms have been proposed and some of them can be found in the references below (a short code sketch of the average magnitude difference variant follows the table):

Table 4: Pointers and References, TDHS/Pitch estimation
‘C Algorithms for Realtime DSP' by Paul M. Embree, Prentice Hall, 1995 (incl. source code diskette)
‘Numerical Recipes in C' by W. Press, S. Teukolsky, W. Vetterling, B. Flannery, Cambridge University Press, 1988/92 (incl. source code examples, click title to read it online)
‘Digital Processing of Speech Signals' by L.R. Rabiner and R.W. Schafer, Prentice Hall, 1978 (no source code, covers TDHS basics)
‘An Edge Detection Method for Time Scale Modification of Acoustic Signals' by Rui Ren, Computer Science Department, Hong Kong University of Science and Technology
‘Dichotic time compression and spatialization' by Barry Arons, MIT Media Laboratory
Other papers related to Time Compression/Expansion by Barry Arons, MIT Media Lab
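As a concrete example of the Average Magnitude Difference alternative mentioned above (a sketch; the lag bounds are illustrative), the period is taken as the lag minimizing the mean absolute difference between the frame and its shifted copy:

    import numpy as np

    def amdf_period(frame, min_lag=32, max_lag=1024):
        frame = np.asarray(frame, dtype=float)
        amdf = [np.mean(np.abs(frame[lag:] - frame[:-lag]))
                for lag in range(min_lag, max_lag)]
        return min_lag + int(np.argmin(amdf))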

2.3 More recent approaches.

Due to the number of objectionable artifacts produced by both of the above methods, there have been a number of more advanced approaches to the problem of time stretching and pitch shifting in recent years. One particular problem of both the TDHS and Phase Vocoder approaches is the high localization of the basis functions (where this term is applicable) in one domain with no localization in the other. The sines and cosines used in the Phase Vocoder have no localization in the time domain, which without further treatment contributes to the inherent signal smearing. The sample snippets used in the TDHS approach can be seen as having no localization in the frequency domain, thus causing multi-pitched signals to produce distortion.

Improving existing techniques: Scientific research currently focuses on improving both time and frequency domain methods by investigating and eliminating the possible causes of the artifacts in both domains. For example, there have been numerous improvements to the phase vocoder that were implemented in commercial products recently due to the availability of fast CPU speeds on desktop computers. Among these is the idea of vertically synchronizing phases across a phase vocoder analysis frame, originally conceived by Miller Puckette in 1995. This requires tracking and identifying individual harmonics by peak-picking and peak-tracking, which in itself poses new problems, but the result is much more agreeable than that of a 'crude' phase vocoder. One commercial product utilizing this improved phase vocoder is Serato's Pitch'n Time, whose algorithm is explained in detail here.

Adaptive basis transform algorithms: Aside from this, several entirely new methods have been devised. A method which was developed by Prosoniq uses an approach of representing the signal in terms of more complex basis functions that have a good localization in both the time and frequency domain (like certain types of wavelets have). The signal is transformed on the basis of the proprietary MCFE (Multiple Component Feature Extraction), for which the details are shrouded in trade secret but some information is available at the MPEX web site.

Wavelet and multiresolution techniques: Zynaptiq's proprietary ZTX technology comes in a free cross-platform C/C++ object library that exploits the good localization of wavelets in both time and frequency to build an algorithm for time and pitch manipulation that uses an arbitrary time-frequency tiling depending on the underlying signal. Additionally, the time and frequency localization parameter of the basis can be user-defined, making the algorithm smoothly scalable to provide either the phase coherence properties of a time domain process or the good frequency resolution of the phase vocoder.

Goofs: It is also worth mentioning that there have been some approaches that are flawed or nonsensical. Table 5a lists the most obvious one. The method proposed by Garas and Sommen, for example, will not work at all in the form proposed; it is curious (yet understandable) that no one has noticed this. The sound files initially provided were flawed, too, and were silently buried when I began asking questions. In the light of recent developments in the area of improving the phase vocoder, these ideas might still prove interesting some day.

Table 5: Pointers, More recent approaches
The free ZTX Cross-Platform Library
The Prosoniq MPEX Time/Pitch manipulation technology (licensing of binary object code)
Scott Levine, Tony Verma, Julius O. Smith III. Alias-Free, Multiresolution Sinusoidal Modeling for Polyphonic, Wideband Audio. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, Mohonk, NY, 1997.
Scott Levine, Julius O. Smith III. A Sines+Transients+Noise Audio Representation for Data Compression and Time/Pitch-Scale Modifications. 105th Audio Engineering Society Convention, San Francisco 1998.
Aaron Master. Peak-adaptive Phase Vocoder. ICASSP-02.
Table 5a: Pointers, nonsensical approaches
Time/Pitch Scaling Using The Constant-Q Phase Vocoder, J. Garas, P. Sommen, Eindhoven University of Technology, The Netherlands

3. Comparison
It is very difficult to objectively rate or compare various time compression/expansion and pitch shifting algorithms with regard to quality due to their nonlinearity and signal dependency. It is equally difficult to establish a solid measure of their overall performance from simple test signals, because most algorithms tend to do very well with test signals due to the signals' simple structure. There have been some proposals by Laroche and Dolson to estimate the 'phase coherence' from a set of variables obtained from analyzing the sound via the STFT. This is a good approach worth further study, but still far from providing the kind of judgement you can get from extensive listening tests, which I believe remain the method of choice for estimating the quality of a time compression/expansion algorithm. It is safe to say that none of the algorithms available today is free from flaws and problems across an arbitrary range of stretch ratios, even though many of them come very close to achieving a good quality. As is to be expected, the phase vocoder-based algorithms have to fight residual smearing, which renders the results less 'punchy' and direct. The time domain methods have to cope with residual distortion, most notably when processing sounds that have critical relationships among their harmonics. And even though I realize that this might be a futile attempt to provide a comprehensive overview, I have produced a small number of excerpt audio examples of the various methods, as well as some screen shots of impulse responses, to show the performance in quality and coherence of each method in comparison.

3.1 Which Method To Use.

Principally, this is dependent on the constraints imposed on the actual task, which may be one of the following:

Speed. If you plan on using the method in a realtime application that has many parallel audio tracks or needs many pitch shifted voices, TDHS is probably the best option unless you already have an STFT representation of the signal at hand. Using different optimization techniques, the performance of this approach can be fine-tuned to run in realtime on any of today's computers.

Material. If you have a prior knowledge about the signal the algorithm is supposed to work well with, you can further choose and optimize your algorithm accordingly (see below).

Quality. If the ultimate goal of your application is to provide the highest possible quality without performance restrictions, you should decide with the following two important factors in mind:

1) TDHS gives better results for small timebase and pitch changes, but will not work well with most polyphonic material.

2) The Phase Vocoder gives smoother results for larger changes and will also work well with polyphonic material, but introduces signal smearing with impulsive signals if this is not dealt with. Even though some methods might suggest that preventing the phasiness can reduce the required CPU power, ultimately reducing it can cost significantly more CPU cycles than the 'regular' phase vocoder.

3.2 Pitch Shifting Considerations

If your goal is to alter the pitch, not the timebase, bear in mind that when scaling the pitch upward, the echoes and repetitious behaviour of TDHS are less obvious, since the pitch change moves adjacent peaks (echoes) closer to each other in time, thus masking them to the ear. The (pre-)smearing behaviour of the Phase Vocoder will be more disturbing in this case, since it occurs before the transient sounds and is easily recognized by the listener.

3.3 Audio Examples:

Example 1:
  • Original Sound (WAVE, 106k)
  • Phase Vocoder 200% Time Stretch (WAVE, 209k); block size: 2048 samples, STFT size: 8192 samples, frame overlap: 1024 samples
  • TDHS 200% Time Stretch (WAVE, 209k); block size: 2048 samples, frame overlap: 1536 samples
  • MCFE 200% Time Stretch (WAVE, 209k); block size: 1806 samples, frame overlap: 903 samples

Example 2:
  • Original Sound 2 (WAVE, 230k)
  • Phase Vocoder 200% Time Stretch (2) (WAVE, 432k); block size: 2048 samples, STFT size: 8192 samples, frame overlap: 1024 samples
  • TDHS 200% Time Stretch (2) (WAVE, 451k); block size: 2048 samples, frame overlap: 1536 samples
  • MCFE 200% Time Stretch (2) (WAVE, 451k); block size: 1806 samples, frame overlap: 903 samples

Impulse Response Diagrams (achieved using the same settings as for the above audio examples, click to view in detail):

(From left to right: Original, Phase Vocoder, TDHS, MCFE)

4. Timbre and Formants
Since timbre (formant) manipulation is actually a pitch shifting related topic, it will also be discussed here. Formants are prominent frequency regions, produced by the resonances in the instrument's body, that very much determine the timbre of a sound. For the human voice, they come from the resonances and cancellations of the vocal tract, contributing to the specific characteristics of a speaker's or singer's voice. If the pitch of a recording is shifted, the formants are moved along with it, producing the well-known ‘Mickey Mouse' effect. This is usually an unwanted side effect, since the formants of a human singing at a higher pitch do not change their position. To compensate for this, there exist formant correction algorithms that restore the position of the formant frequencies after or during the pitch shifting process. They also allow changing the gender of a singer by scaling formants without changing pitch. For each of the above pitch shifting methods there exists a corresponding method for changing the formants to compensate for the side effects of the transposition.

4.1 Phase Vocoder and Formants.

Formant manipulation in the STFT representation can be done by first normalizing out the (pitch-shifted) spectral amplitude envelope and then multiplying in the envelope of a non-pitch-shifted copy of the signal. This removes the new formant information generated by the pitch shifting and superimposes the original formant information, yielding a sound similar to the original voice. This is an amplitude-only operation in the frequency domain and therefore does not involve great additional computational complexity. However, the quality may not be optimal in all cases due to STFT resolution issues.
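A hedged sketch of this amplitude-only operation follows (the cepstral envelope estimate is one common choice, not prescribed by the text; the frame names are assumptions):

    import numpy as np

    def spectral_envelope(mag, n_coef=40):
        # Cepstral smoothing: keep only the low quefrencies of the log spectrum.
        cep = np.fft.irfft(np.log(mag + 1e-12))
        cep[n_coef:len(cep) - n_coef] = 0.0
        return np.exp(np.fft.rfft(cep).real)

    def correct_formants(shifted_frame, original_mag):
        mag = np.abs(shifted_frame)
        flat = mag / (spectral_envelope(mag) + 1e-12)     # normalize out the shifted envelope
        new_mag = flat * spectral_envelope(original_mag)  # impose the original envelope
        return new_mag * np.exp(1j * np.angle(shifted_frame))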

4.2 Time Domain Harmonic Scaling and Formants.

Changing the formants in the time domain is simple; an efficient implementation, however, is tricky. TDHS can in essence be implemented and regarded as granular synthesis using grains one fundamental cycle in length, output at the new fundamental frequency rate. Simply put: if each grain is one cycle in length, and since cycles per second is the definition of fundamental pitch in this case, the output rate of these grains determines the new pitch of the sample. In order to preserve the length of the sample, some grains have to be discarded or repeated in the process. Since no transposition takes place, the formants do not move. On the other hand, applying a sample rate change to the individual grains results in a change of formants without affecting the pitch. Thus, pitch and formants can be moved independently. The obvious disadvantage of the process is its dependency on the fundamental frequency of the signal, making it unsuited for polyphonic material. See also: ‘A Detailed Analysis of a Time-Domain Formant-Corrected Pitch-Shifting Algorithm' by Robert Bristow-Johnson, Journal of the Audio Engineering Society, May 1995. This paper discusses an algorithm previously proposed by Keith Lent in the Computer Music Journal.
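The grain mechanics described above can be sketched as follows (illustrative Python; it assumes a mono NumPy array and a reliable period estimate, and uses two-period Hann-windowed grains, a common variant of the one-cycle grains described). A ratio above 1 raises the pitch while the duration stays fixed:

    import numpy as np

    def grain_pitch_shift(x, period, ratio):
        out = np.zeros(len(x))
        grain = 2 * period                      # two-period, Hann-windowed grains
        win = np.hanning(grain)
        out_hop = max(1, int(period / ratio))   # the new fundamental period
        out_pos = 0
        while out_pos + grain < len(x):
            # Take the grain from (roughly) the same input time, snapped to a
            # period boundary; grains repeat or drop so the length is unchanged.
            in_pos = (out_pos // period) * period
            out[out_pos:out_pos + grain] += win * x[in_pos:in_pos + grain]
            out_pos += out_hop
        return out

Resampling each grain before the overlap-add (for instance with scipy.signal.resample) would move the formants instead of the pitch, which is exactly the independence described above.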

Table 6: Pointers, Formant Manipulation
The DSP Dimension Formant Correction page.
An LPC Approach: ‘Voice Gender Transformation with a Modified Vocoder' (May 1996), Yoon Kim at CCRMA

The following newsgroups can be accessed for more information and help on the time compression/expansion and pitch shifting topic.

Table 7: News Groups

If you're seeking general information on DSP, browse to the DSPguru homepage.




