Patent abstract:
Apparatus and method for modifying an audio signal using harmonic locking. An apparatus for modifying an audio signal comprises a filterbank processor, a fundamental determiner, an overtone determiner, a signal processor and a combiner. The filterbank processor generates a plurality of passband signals based on an audio signal, and the fundamental determiner selects a passband signal from the plurality of passband signals to obtain a fundamental passband signal. Further, the overtone determiner identifies a passband signal from the plurality of passband signals that meets an overtone criterion with respect to the selected fundamental passband signal to obtain an overtone passband signal associated with the selected fundamental passband signal. The signal processor modifies the selected fundamental passband signal based on a predefined modification target. Additionally, the signal processor modifies an identified overtone passband signal associated with the selected fundamental passband signal depending on the modification of the selected fundamental passband signal.
Publication number: BR112012021540B1
Application number: R112012021540-0
Filing date: 2011-02-25
Publication date: 2021-07-27
Inventor: Sascha Disch
Applicant: Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.
Main IPC classification:
Patent description:

DESCRIPTION
Embodiments according to the invention relate to audio processing and particularly to an apparatus and a method for modifying an audio signal.
There is an increasing demand for digital signal processing techniques that address the need for extreme signal manipulations in order to fit pre-recorded audio signals, for example, taken from a database, into a new musical context. In order to do this, high-level semantic signal properties such as pitch, musical key and scale mode need to be adapted. All of these manipulations have in common that they aim to substantially alter the musical properties of the original audio material while preserving the best possible subjective sound quality. In other words, these edits strongly alter the musical content of the audio material, but at the same time they need to preserve the naturalness of the processed audio sample and thus maintain its credibility. Ideally, this requires signal processing methods that are widely applicable to different classes of signals, including polyphonic mixed music content.
Nowadays, many concepts for modifying audio signals are known. Some of these concepts are based on vocoders.
For example, in "S. Disch and B. Edler, "An amplitude- and frequency modulation vocoder for audio signal processing," Proc. of the Int. Conf. on Digital Audio Effects (DAFx), 2008.", "S. Disch and B. Edler, "Multiband perceptual modulation analysis, processing and Synthesis of audio signals," Proc. of the IEEE-ICASSP, 2009." or "S. Disch and B. Edler, "An iterative segmentation algorithm for audio signal spectra depending on estimated local centers of gravity," 12th International Conference on Digital Audio Effects (DAFx-09), 2009.", the modulation vocoder concept (MODVOC) was introduced and its general capability to perform meaningful selective transposition on polyphonic music content was highlighted.
One possible application is changing the key mode of prerecorded PCM music samples (see, for example, "S. Disch and B. Edler, "Multiband perceptual modulation analysis, processing and Synthesis of audio signals," Proc. of the IEEE-ICASSP, 2009."). Moreover, a first commercially available software product that can handle this polyphonic manipulation task exists (Melodyne editor by Celemony). The software implements a technology that is branded and marketed under the term direct note access (DNA). A patent application (EP2099024, P. Neubäcker, "Method for acoustic object-oriented analysis and note object-oriented processing of polyphonic sound recordings," September 2009.) was recently published, which presumably covers and thus discloses the essential functionality of DNA. Regardless of the method used to modify an audio signal, a modified audio signal with high perceptual quality is desired.
It is the object of the present invention to provide an improved concept for modifying an audio signal that allows the perceptual quality of the modified audio signal to be increased.
This object is solved by an apparatus according to claim 1, a method according to claim 14 or a computer program according to claim 15.
An embodiment of the invention provides an apparatus for modifying an audio signal comprising a filterbank processor, a fundamental determiner, an overtone determiner, a signal processor and a combiner. The filterbank processor is configured to generate a plurality of passband signals based on an audio signal. Further, the fundamental determiner is configured to select a passband signal from the plurality of passband signals to obtain a fundamental passband signal. The overtone determiner is configured to identify a passband signal from the plurality of passband signals that meets an overtone criterion with respect to the selected fundamental passband signal to obtain an overtone passband signal associated with the selected fundamental passband signal. In addition, the signal processor is configured to modify the selected fundamental passband signal based on a predefined modification target. Additionally, the signal processor is configured to modify an identified overtone passband signal associated with the selected fundamental passband signal depending on the modification of the selected fundamental passband signal. Furthermore, the combiner is configured to combine the plurality of passband signals to obtain a modified audio signal.
By identifying overtones of fundamental frequencies and modifying the overtones in the same way as the corresponding fundamentals, a differing modification of the fundamentals and their overtones can be avoided, so that the timbre of the modified audio signal is preserved more precisely with respect to the original audio signal. In this way, the perceptual quality of the modified audio signal can be significantly improved. For example, if a selective pitch transposition is desired (for example, changing the key mode of a given musical signal from C major to C minor), the modification of an identified overtone passband signal is correlated with the modification of the fundamental passband signal. In comparison, known methods modify the frequency region of a passband signal that represents overtones differently from the fundamental passband signal. In other words, with the described concept, an identified overtone passband signal is locked to the fundamental passband signal.
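A minimal sketch of the locking rule, assuming each passband signal is summarized by its carrier frequency and that the modification target is a new carrier frequency for the fundamental (for instance one produced by a MIDI note mapping). The function name and arguments are hypothetical and not taken from the source.

```python
def transpose_with_harmonic_locking(fund_carrier, overtone_carriers, target_carrier):
    """Apply the fundamental's transposition factor to its locked overtones.

    Hypothetical helper; all frequencies are carrier frequencies in Hz.
    Returns (new fundamental carrier, new overtone carriers, factor).
    """
    # The modification target fixes the fundamental's new carrier frequency.
    factor = target_carrier / fund_carrier
    # Overtones follow the fundamental by the same factor instead of being
    # mapped on their own, so the frequency ratios (the timbre) are kept.
    new_overtones = [f * factor for f in overtone_carriers]
    return target_carrier, new_overtones, factor
```

For a fundamental at 200 Hz moved to 250 Hz, overtones at 400 Hz and 600 Hz inherit the factor 1.25 and land at 500 Hz and 750 Hz, so their 2:1 and 3:1 ratios to the fundamental remain intact.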
In some embodiments of the invention, an overtone passband signal can be identified by comparing frequencies of the fundamental passband signal and of passband signals of the plurality of passband signals, by comparing an energy content of the fundamental passband signal and of a passband signal of the plurality of passband signals, and/or by evaluating a correlation between a temporal envelope of the fundamental passband signal and a temporal envelope of a passband signal of the plurality of passband signals. In this way, one or more overtone criteria can be defined to minimize erroneous overtone identification.
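The three criteria can be combined as in the following sketch. The thresholds, the direction of the energy test (an overtone assumed not to be stronger than its fundamental) and the use of a Pearson correlation for the envelopes are illustrative assumptions, not values or choices stated in the source.

```python
import numpy as np

def meets_overtone_criterion(f0, f_k, env0, env_k, e0, e_k,
                             freq_tol=0.03, min_corr=0.6, max_energy_ratio=1.0):
    """Hypothetical overtone test combining frequency, energy and envelope checks.

    f0, f_k: carrier frequencies (Hz) of the fundamental and the candidate band
    env0, env_k: temporal envelopes (e.g. AM signals) of equal length
    e0, e_k: energies of the two passband signals
    """
    # 1) Frequency: candidate close to an integer multiple (>= 2) of f0,
    #    with a tolerance relative to the expected harmonic frequency.
    ratio = f_k / f0
    harmonic = round(ratio)
    if harmonic < 2 or abs(ratio - harmonic) > freq_tol * harmonic:
        return False
    # 2) Energy: assume an overtone is not stronger than its fundamental.
    if e_k > max_energy_ratio * e0:
        return False
    # 3) Envelope: temporal envelopes should be strongly positively correlated.
    corr = np.corrcoef(env0, env_k)[0, 1]
    return bool(corr >= min_corr)
```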
Some embodiments in accordance with the invention relate to an iterative determination of fundamental passband signals and identification of overtone passband signals from the plurality of passband signals. Already selected fundamental passband signals and already identified overtone passband signals may be removed from the search space or, in other words, may not be considered when determining a further fundamental passband signal or a further overtone passband signal. In this way, each passband signal of the plurality of passband signals can be selected either as a fundamental passband signal (and can therefore be modified independently of the other fundamental passband signals) or as an overtone passband signal (and can therefore be modified depending on the associated selected fundamental passband signal).
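The iteration can be sketched as below. How the next fundamental is chosen is not specified here; picking the strongest remaining band is an assumption, and the overtone criterion is passed in as a callback. Names are hypothetical.

```python
def assign_fundamentals_and_overtones(bands, is_overtone):
    """Hypothetical iterative grouping of passband signals.

    bands: list of dicts with at least 'freq' (Hz) and 'energy'
    is_overtone(fund, cand): overtone criterion supplied by the caller
    Returns a list of (fundamental_index, [overtone_indices]).
    """
    remaining = set(range(len(bands)))
    groups = []
    while remaining:
        # Assumption: the strongest remaining band becomes the next fundamental.
        f = max(remaining, key=lambda i: bands[i]["energy"])
        remaining.discard(f)
        overtones = sorted(i for i in remaining if is_overtone(bands[f], bands[i]))
        # Selected fundamentals and identified overtones leave the search space.
        remaining.difference_update(overtones)
        groups.append((f, overtones))
    return groups
```

Because each band is removed once it is assigned, every band ends up in exactly one group. In the test data, 300 Hz would also be an overtone of 150 Hz, but it has already been claimed by the 100 Hz fundamental.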
Another embodiment of the invention provides an apparatus for modifying an audio signal comprising an envelope shape determiner, a filterbank processor, a signal processor, a combiner and an envelope shaper. The envelope shape determiner is configured to determine envelope shape coefficients based on a frequency domain audio signal that represents a time domain input audio signal. Further, the filterbank processor is configured to generate a plurality of passband signals in a subband domain based on the frequency domain audio signal. The signal processor is configured to modify a subband domain passband signal of the plurality of subband domain passband signals based on a predefined modification target. Further, the combiner is configured to combine at least a subset of the plurality of subband domain passband signals to obtain a time domain audio signal. Additionally, the envelope shaper is configured to shape an envelope of the time domain audio signal based on the envelope shape coefficients, to shape an envelope of the plurality of subband domain passband signals containing the modified subband domain passband signal based on the envelope shape coefficients, or to shape an envelope of the plurality of subband domain passband signals based on the envelope shape coefficients before a subband domain passband signal is modified by the signal processor, in order to obtain a modified audio signal.
By determining envelope shape coefficients of the frequency domain audio signal before the frequency domain audio signal is separated into a plurality of subband domain passband signals, information about the spectral coherence of the audio signal can be preserved and used to shape the envelope of the time domain audio signal after one or more subband domain passband signals have been modified. In this way, the spectral coherence of the modified audio signal can be preserved more precisely, even if only some (or only one) of the subband domain passband signals are modified, or if the subband domain passband signals are modified differently, either of which can disturb the spectral coherence of the audio signal. As a result, the perceptual quality of the modified audio signal can be significantly improved.
Some embodiments in accordance with the invention relate to a signal processor configured to modify a second subband domain passband signal of the plurality of subband domain passband signals based on a second predefined modification target. The predefined modification target and the second predefined modification target are different. Although the passband signals are modified differently, the spectral coherence of the modified audio signal can be preserved more precisely because the envelope is shaped after the individual modification of the passband signals.
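One way to obtain and apply envelope shape coefficients is sketched below. It assumes (this section does not say so) that the coefficients are linear prediction coefficients computed across the DFT bins of a block, as in temporal noise shaping, where prediction over frequency models the temporal envelope. In this arrangement the spectrum is flattened before it is split into passband signals, and the envelope is re-imposed on the recombined spectrum. All helper names are hypothetical.

```python
import numpy as np

def envelope_shape_coefficients(spectrum, order=8):
    """Hypothetical: LPC coefficients a[0..order] (a[0] = 1) fitted across DFT bins."""
    x = np.asarray(spectrum, dtype=complex)
    n = len(x)
    # Autocorrelation across frequency: r[k] = sum_m x[m + k] * conj(x[m]).
    r = np.array([np.vdot(x[: n - k], x[k:]) for k in range(order + 1)])
    # Hermitian Toeplitz normal equations R a = -r[1:], lightly regularized.
    R = np.array([[r[j - i] if j >= i else np.conj(r[i - j])
                   for i in range(order)] for j in range(order)])
    R += 1e-9 * r[0].real * np.eye(order)
    a = np.linalg.solve(R, -r[1:])
    return np.concatenate(([1.0], a))

def flatten(spectrum, a):
    """FIR filtering across frequency: removes the modeled temporal envelope."""
    x = np.asarray(spectrum, dtype=complex)
    return np.convolve(x, a)[: len(x)]

def shape(residual, a):
    """All-pole filtering across frequency: re-imposes the temporal envelope."""
    y = np.zeros(len(residual), dtype=complex)
    for m in range(len(residual)):
        acc = residual[m]
        for i in range(1, len(a)):
            if m - i >= 0:
                acc -= a[i] * y[m - i]
        y[m] = acc
    return y
```

Without any band modification, shaping the flattened spectrum returns the original spectrum exactly, which is the property the envelope shaper relies on when only some bands are altered in between.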
The embodiments according to the invention will be detailed subsequently with reference to the attached drawings, in which:
Figure 1 is a block diagram of an apparatus for modifying an audio signal;
Figure 2 is a block diagram of an apparatus for modifying an audio signal;
Figure 3 is a flowchart of a method for modifying an audio signal;
Figure 4 is a block diagram of a part of a modulation vocoder using harmonic locking;
Figure 5 is a flowchart of a method for modifying an audio signal;
Figures 6a, 6b, 6c, 6d are block diagrams of apparatuses for modifying an audio signal;
Figure 7 is a block diagram of a filterbank processor;
Figure 8 is a block diagram of an envelope shaper;
Figure 9 is a schematic illustration of a modulation analysis with envelope shaping;
Figure 10 is a schematic illustration of a modulation synthesis with envelope shaping;
Figure 11 is a flowchart of a method for modifying an audio signal;
Figure 12 is a block diagram of an apparatus for modifying an audio signal;
Figure 13 is a schematic illustration of a modulation analysis;
Figure 14 is a schematic illustration of an implementation of a modulation analysis;
Figure 15 is a schematic illustration of a modulation synthesis;
Figure 16 is a schematic illustration of a selective transposition of modulation vocoder components;
Figure 17 is a schematic illustration of a procedure for generating the test set for evaluating the subjective quality of modulation vocoder processing for the selective pitch transposition task;
Figure 18 is a diagram indicating absolute MUSHRA scores and 95% confidence intervals of the listening test addressing selective pitch transposition;
Figure 19 is a diagram indicating difference MUSHRA scores with respect to the modulation vocoder condition and 95% confidence intervals of the listening test addressing selective pitch transposition; and
Figure 20 is a diagram indicating difference MUSHRA scores with respect to the DNA condition and 95% confidence intervals of the listening test addressing selective pitch transposition.
In the following, the same reference numerals are partly used for objects and functional units having the same or similar functional properties, and their description with respect to one figure also applies to the other figures, in order to reduce redundancy in the description of the embodiments.
A selective frequency band modification, also called selective pitch transposition, can be performed, for example, by a vocoder or a modulation vocoder.
A multiband modulation decomposition (see, for example, "S. Disch and B. Edler, "Multiband perceptual modulation analysis, processing and Synthesis of audio signals," Proc. of the IEEE-ICASSP, 2009.") dissects the audio signal into a signal-adaptive set of (analytic) passband signals, each of which is further divided into a sinusoidal carrier and its amplitude modulation (AM) and frequency modulation (FM). The set of passband filters can be computed such that, on the one hand, the full-band spectrum is covered seamlessly and, on the other hand, the filters are aligned with local centers of gravity (COGs), for example. Additionally, human auditory perception can be taken into account by choosing the bandwidths of the filters to match a perceptual scale, e.g., the ERB scale (see, for example, "B. C. J. Moore and B. R. Glasberg, "A revision of Zwicker's loudness model," Acta Acustica, vol. 82, pp. 335-345, 1996.").
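The text does not give the ERB scale formula. A common choice is the Glasberg and Moore (1990) ERB-rate expression, used in the sketch below to show how filter bandwidths grow with frequency. The uniform band spacing in `erb_spaced_edges` is only an illustration of the perceptual scaling; the segmentation described here aligns bands with local COGs instead.

```python
import math

def hz_to_erb_rate(f_hz):
    """ERB-rate (Cam), Glasberg and Moore (1990): 21.4 * log10(4.37 f/1000 + 1)."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)

def erb_rate_to_hz(e):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37

def erb_bandwidth(f_hz):
    """Equivalent rectangular bandwidth in Hz at centre frequency f_hz."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def erb_spaced_edges(f_lo, f_hi, bands_per_erb=1.0):
    """Band edges spaced uniformly on the ERB-rate scale (illustrative only)."""
    e_lo, e_hi = hz_to_erb_rate(f_lo), hz_to_erb_rate(f_hi)
    n = max(1, round((e_hi - e_lo) * bands_per_erb))
    return [erb_rate_to_hz(e_lo + (e_hi - e_lo) * k / n) for k in range(n + 1)]
```

At 1 kHz the ERB is about 133 Hz; band widths in Hz increase monotonically toward high frequencies.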
For example, the local COG corresponds to the mean frequency that is perceived by a listener due to the spectral contributions in that frequency region. Furthermore, bands centered at local COG positions may correspond to the regions of influence used for phase locking in classical phase vocoders (see, for example, "J. Laroche and M. Dolson, "Improved phase vocoder time-scale modification of audio," IEEE Transactions on Speech and Audio Processing, vol. 7, no. 3, pp. 323-332, 1999." or "C. Duxbury, M. Davies, and M. Sandler, "Improved time-scaling of musical audio using phase locking at transients," in 112th AES Convention, 2002."). Both the passband signal envelope representation and the traditional phase locking region of influence preserve the temporal envelope of a passband signal: either intrinsically or, in the latter case, by ensuring local spectral phase coherence during synthesis. With respect to a sinusoidal carrier with a frequency corresponding to the estimated local COG, AM and FM are captured in the amplitude envelope and the heterodyned phase of the analytic passband signals, respectively. A dedicated synthesis method renders the output signal from the carrier frequencies, AM and FM.
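The text describes the local COG only as the perceived mean frequency of a region. The sketch below assumes it can be approximated by a power-weighted spectral centroid over a given frequency range; the helper name and the |X|² weighting are assumptions, not the iterative estimator cited in the text.

```python
import numpy as np

def local_cog(freqs, spectrum, lo, hi):
    """Hypothetical: power-weighted mean frequency of DFT bins in [lo, hi)."""
    sel = (freqs >= lo) & (freqs < hi)
    p = np.abs(spectrum[sel]) ** 2          # |X|^2 weights (assumption)
    return float(np.sum(freqs[sel] * p) / np.sum(p))
```

For two partials at 1000 Hz and 1100 Hz with amplitudes 1 and 0.5, the power weights are 1 and 0.25, so the centroid lies at (1000 + 275) / 1.25 = 1020 Hz, pulled toward the stronger partial.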
A block diagram of a possible implementation 1300 of the signal decomposition into carrier signals and their associated modulation components is depicted in Figure 13. The figure shows the schematic signal flow for extracting one of the multiband components (passband signals). All other components are obtained in a similar way. First, a broadband input signal $x$ is fed into a bandpass filter that has been designed signal-adaptively, producing an output signal $x_b$. Then, the analytic signal is derived by the Hilbert transform, according to Equation (1).

$$\tilde{x}(t) = x_b(t) + j\,\mathcal{H}\{x_b(t)\} \qquad (1)$$

The AM (amplitude modulation signal) is given by the amplitude envelope of $\tilde{x}$,

$$\mathrm{AM}(t) = \left|\tilde{x}(t)\right| \qquad (2)$$

while the FM (frequency modulation signal) is obtained from the phase derivative of the analytic signal heterodyned by a stationary sinusoidal carrier with angular frequency $\omega_c$. The carrier frequency is determined as an estimate of the local COG. Therefore, the FM can be interpreted as the variation of the IF (instantaneous frequency) around the carrier frequency $f_c$.

$$\mathrm{FM}(t) = \frac{1}{2\pi}\,\frac{d}{dt}\,\arg\!\left(\tilde{x}(t)\,e^{-j\omega_c t}\right), \qquad \omega_c = 2\pi f_c \qquad (3)$$

The local COG estimation and the signal-adaptive design of the front-end filterbank are described, for example, in a dedicated publication (see "S. Disch and B. Edler, "An iterative segmentation algorithm for audio signal spectra depending on estimated local centers of gravity," 12th International Conference on Digital Audio Effects (DAFx-09), 2009.").
In practice, in a discrete-time system, the component extraction can be performed jointly for all components, as illustrated in Figure 14. The processing scheme supports real-time computation: the processing of a given time block depends only on parameters from previous blocks. As a result, no look-ahead is required, which keeps the overall processing delay as low as possible. Processing is computed block by block, using, for example, a 75% analysis block overlap and applying a discrete Fourier transform (DFT) to each windowed signal block. The window can be a flat-top window according to Equation (4). This ensures that the centered N/2 samples that are passed on to the subsequent modulation synthesis, which uses 50% overlap, are not affected by the slopes of the analysis window. A higher degree of overlap can be used for improved accuracy at the expense of increased computational complexity.
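Equation (4) did not survive extraction, so the exact window is unknown. The sketch below assumes one plausible flat-top shape (Hann-shaped ramps over the first and last N/4 samples, unity over the centered N/2 samples) and shows block-wise DFT analysis with a hop of N/4, i.e. 75% overlap. Function names are hypothetical.

```python
import numpy as np

def flat_top_window(N):
    """Hypothetical flat-top window: Hann-shaped ramps over the first and last
    N/4 samples and unity over the centered N/2 samples (assumed shape)."""
    assert N % 4 == 0
    q = N // 4
    ramp = 0.5 - 0.5 * np.cos(np.pi * (np.arange(q) + 0.5) / q)
    return np.concatenate([ramp, np.ones(N // 2), ramp[::-1]])

def analysis_blocks(x, N):
    """Yield (start index, DFT of the windowed block) with hop N/4 (75% overlap)."""
    w = flat_top_window(N)
    hop = N // 4
    for start in range(0, len(x) - N + 1, hop):
        yield start, np.fft.fft(w * x[start:start + N])
```

With N = 2**14 at 48 kHz, a block spans about 341 ms and the hop is 4096 samples, about 85 ms; the centered N/2 samples see a window value of exactly 1.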

Given the spectral representation, a set of signal-adaptive spectral passband weighting functions aligned with the local COG positions is then calculated. After applying the passband weighting to the spectrum, the signal is transformed to the time domain and the analytic signal can be derived by the Hilbert transform. These two processing steps can be efficiently combined by computing a one-sided IDFT on each passband signal. Given the discrete-time passband signal, the IF estimation according to Equation (3) is implemented by phase differentiation, as defined in Equation (5), where * denotes the complex conjugate. This expression is used because it avoids phase ambiguities and therefore the need for phase unwrapping.

$$\mathrm{FM}[n] = \frac{f_s}{2\pi}\,\arg\!\left(\tilde{x}_c[n]\,\tilde{x}_c^{*}[n-1]\right), \qquad \tilde{x}_c[n] = \tilde{x}[n]\,e^{-j\omega_c n/f_s} \qquad (5)$$

Here, $\tilde{x}[n]$ is the discrete-time analytic passband signal, $\omega_c$ the carrier angular frequency and $f_s$ the sampling frequency.
The signal is synthesized on an additive basis from all components. Successive blocks are blended by overlap-add (OLA), which is controlled by the component binding mechanism. Component binding ensures a smooth transition at the borders of adjacent blocks, even if the components have been substantially altered by modulation domain processing. The binding only takes into account the previous block, thus making real-time processing possible. It essentially performs a pairwise matching of the components in the current block to their predecessors in the previous block. Additionally, the binding aligns the absolute component phases of the current block with those of the previous block. For components that are not matched across time blocks, a fade-in or fade-out is applied, respectively.
For one component, the processing chain is shown in Figure 15. In detail, first, the FM signal is added to the stationary carrier frequency and the resulting signal is passed to an OLA stage, whose output is subsequently integrated over time. A sinusoidal oscillator is driven by the resulting phase signal. The AM signal is processed by a second OLA stage. Then, the amplitude of the oscillator output is modulated by the AM signal to obtain the component's additive contribution to the output signal. In a final step, the contributions of all components are summed to obtain the output signal y.
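The per-component chain can be sketched for a single block as follows. The OLA stages and the binding are omitted; `phase0` merely stands in for the phase constant that component binding would supply, and discrete integration is approximated by a cumulative sum. Names and the dictionary layout are hypothetical.

```python
import numpy as np

def synthesize_component(am, fm, fc, fs, phase0=0.0):
    """Hypothetical per-component synthesis within one block (OLA omitted)."""
    # Instantaneous frequency = stationary carrier + FM (both in Hz).
    inst_freq = fc + np.asarray(fm, dtype=float)
    # Integrate the frequency (cumulative sum) to get the oscillator phase.
    phase = phase0 + 2 * np.pi * np.cumsum(inst_freq) / fs
    # Sinusoidal oscillator, amplitude-modulated by the AM signal.
    return np.asarray(am, dtype=float) * np.cos(phase)

def synthesize(components, fs):
    """Sum the additive contributions of all components to obtain y."""
    return sum(synthesize_component(c["am"], c["fm"], c["fc"], fs, c.get("phase0", 0.0))
               for c in components)
```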
In other words, Figures 13 and 14 illustrate a modulation analyzer 1300. The modulation analyzer 1300 preferably comprises a bandpass filter 1320a, which provides a passband signal. This signal is fed into an analytic signal converter 1320b. The output of block 1320b is used to calculate AM information and FM information. To calculate the AM information, the magnitude of the analytic signal is calculated by block 1320c. The output of the analytic signal block 1320b is also fed into a multiplier 1320d, which receives, at its other input, an oscillator signal from an oscillator 1320e that is controlled by the actual carrier frequency fc 1310 of the bandpass filter 1320a. Then, the phase of the multiplier output is determined in block 1320f. The instantaneous phase is differentiated in block 1320g in order to finally obtain the FM information. In addition, Figure 14 shows a preprocessor 1410 that generates a DFT spectrum of the audio signal.
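The analyzer chain for one band within one block can be sketched as below, under the assumptions that the passband weighting is a real array that is zero for negative-frequency and DC bins, and that block windowing and OLA are handled elsewhere. The one-sided IDFT yields the analytic signal directly; the phase derivative is taken via the conjugate product, so no phase unwrapping is needed. The function name is hypothetical.

```python
import numpy as np

def am_fm_from_spectrum(X, weights, fc, fs):
    """Hypothetical single-band AM/FM extraction from one block's DFT X."""
    N = len(X)
    # One-sided IDFT of the weighted spectrum (negative bins are zero),
    # scaled by 2 to compensate for the discarded negative half.
    z = 2.0 * np.fft.ifft(weights * X)
    n = np.arange(N)
    # Heterodyne by the estimated carrier frequency (multiplier + oscillator).
    zc = z * np.exp(-2j * np.pi * fc * n / fs)
    am = np.abs(zc)
    # Phase differentiation via zc[n] * conj(zc[n-1]), converted to Hz.
    fm = np.angle(zc[1:] * np.conj(zc[:-1])) * fs / (2 * np.pi)
    return am, fm
```

For a 1010 Hz tone of amplitude 0.8 analyzed with a 1000 Hz carrier, the AM is a constant 0.8 and the FM a constant 10 Hz, i.e. the IF deviation from the carrier.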
A multiband modulation decomposition dissects the audio signal into a signal-adaptive set of (analytic) passband signals, each of which is further divided into a sinusoidal carrier and its amplitude modulation (AM) and frequency modulation (FM). The set of passband filters is computed such that, on the one hand, the full-band spectrum is covered seamlessly and, on the other hand, each filter is aligned with a local COG. Additionally, human auditory perception is taken into account by choosing the bandwidths of the filters to match a perceptual scale, for example, the ERB scale (see "B. C. J. Moore and B. R. Glasberg, "A revision of Zwicker's loudness model," Acta Acustica, vol. 82, pp. 335-345, 1996"). The local COG corresponds to the mean frequency that is perceived by the listener due to the spectral contributions in that frequency region. Furthermore, the bands centered at the local COG positions correspond to the regions of influence used for phase locking in classical phase vocoders (see "J. Laroche and M. Dolson, "Improved phase vocoder time-scale modification of audio," IEEE Transactions on Speech and Audio Processing, vol. 7, no. 3, pp. 323-332, 1999", "C. Duxbury, M. Davies, and M. Sandler, "Improved time-scaling of musical audio using phase locking at transients," in 112th AES Convention, 2002", "A. Röbel, "A new approach to transient processing in the phase vocoder," Proc. of the Int. Conf. on Digital Audio Effects (DAFx), pp. 344-349, 2003", "A. Röbel, "Transient detection and preservation in the phase vocoder," Int. Computer Music Conference (ICMC'03), pp. 247-250, 2003"). Both the passband signal envelope representation and the traditional phase locking region of influence preserve the temporal envelope of a passband signal: either intrinsically or, in the latter case, by ensuring local spectral phase coherence during synthesis.
With respect to a sinusoidal carrier with a frequency corresponding to the estimated local COG, AM and FM are captured in the amplitude envelope and the heterodyned phase of the analytic passband signals, respectively. A dedicated synthesis method renders the output signal from the carrier frequencies, AM and FM.
A block diagram of the signal decomposition into carrier signals and their associated modulation components is depicted in Figure 12. The figure shows the schematic signal flow for extracting one component. All other components are obtained in a similar way. In practice, the extraction is performed jointly for all components on a block-by-block basis, using, for example, a block size of N = 2^14 at a sampling frequency of 48 kHz and 75% analysis overlap (roughly corresponding to a time interval of 340 ms and a hop size of 85 ms), by applying a discrete Fourier transform (DFT) to each windowed signal block. The window can be a "flat-top" window according to Equation (a). This can ensure that the centered N/2 samples that are passed on to the subsequent modulation synthesis are not affected by the slopes of the analysis window. A higher degree of overlap can be used for improved accuracy at the expense of increased computational complexity.

Given the spectral representation, a set of signal-adaptive spectral weighting functions (having a passband characteristic) that is aligned with the local COG positions can then be calculated (by the carrier frequency determiner 1330, in terms of a carrier frequency estimation or a multi-carrier COG frequency estimation). After applying the passband weighting to the spectrum, the signal is transformed to the time domain and the analytic signal is derived by the Hilbert transform. These two processing steps can be efficiently combined by calculating a one-sided IDFT on each passband signal. Subsequently, each analytic signal is heterodyned by its estimated carrier frequency. Finally, the signal is further decomposed into its amplitude envelope and its instantaneous frequency (IF) track, obtained by computing the phase derivative, yielding the desired AM and FM signals (see also "S. Disch and B. Edler, "An amplitude- and frequency modulation vocoder for audio signal processing," Proc. of the Int. Conf. on Digital Audio Effects (DAFx), 2008").
Accordingly, Figure 15 presents a block diagram of a modifying synthesizer 1500 for a parametric representation of an audio signal. For example, an advantageous implementation is based on an overlap-add (OLA) operation in the modulation domain, that is, in the domain before the time domain passband signal is generated. The input signal, which may be a bit stream but may also be a direct connection to an analyzer or modifier, is separated into the AM component 1502, the FM component 1504 and the carrier frequency component 1504. The AM synthesizer preferably comprises an overlap-adder 1510 and, additionally, a component binding controller 1520, which preferably comprises not only block 1510 but also block 1530, which is an overlap-adder within the FM synthesizer. The FM synthesizer further comprises a frequency overlap-adder 1530, an instantaneous frequency integrator 1532, a phase combiner 1534, which again can be implemented as a regular adder, and a phase shifter 1536, which is controllable by the component binding controller 1520 in order to regenerate a constant phase from block to block, so that the phase of a signal from a previous block is continuous with the phase of the current block. Therefore, the phase addition in elements 1534, 1536 corresponds to a regeneration of a constant that was lost during the differentiation in block 1320g in Figure 13 on the analyzer side. From an information loss perspective in the perceptual domain, it should be noted that this is the only loss of information, namely the loss of a constant part by the differentiator 1320g in Figure 13. This loss can be compensated for by adding a constant phase determined by the component binding controller 1520.
Overlap-add (OLA) is applied in the parameter domain rather than to the already synthesized signal in order to avoid beating effects between adjacent time blocks. The OLA is controlled by a component binding mechanism which, guided by spectral proximity (measured on an ERB scale), performs a pairwise matching of the components in the current block to their predecessors in the previous block. Additionally, the binding aligns the absolute component phases of the current block with those of the previous block.
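A minimal sketch of the pairwise matching, assuming a greedy nearest-first assignment on the Glasberg and Moore ERB-rate scale and an illustrative maximum distance of 0.5 ERB; the source states only that matching is driven by ERB proximity, so the strategy, threshold and names are assumptions. Phase alignment is not shown.

```python
import math

def erb_rate(f_hz):
    # Glasberg and Moore ERB-rate scale, used here as the proximity measure.
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)

def bind_components(prev_fc, curr_fc, max_dist=0.5):
    """Hypothetical greedy pairwise matching of carriers between two blocks.

    Returns (pairs, fade_in, fade_out): pairs maps current -> previous index;
    unmatched current components fade in, unmatched previous ones fade out.
    """
    # All candidate pairs, closest (in ERB) first.
    candidates = sorted(
        (abs(erb_rate(c) - erb_rate(p)), i, j)
        for i, c in enumerate(curr_fc) for j, p in enumerate(prev_fc))
    pairs, used_prev = {}, set()
    for dist, i, j in candidates:
        if dist > max_dist:
            break
        if i not in pairs and j not in used_prev:
            pairs[i] = j
            used_prev.add(j)
    fade_in = [i for i in range(len(curr_fc)) if i not in pairs]
    fade_out = [j for j in range(len(prev_fc)) if j not in used_prev]
    return pairs, fade_in, fade_out
```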
In detail, the FM signal is first added to the carrier frequency and the result is passed to the OLA stage, whose output is subsequently integrated. A sinusoidal oscillator 1540 is driven by the resulting phase signal. The AM signal is processed by a second OLA stage. Finally, the amplitude of the oscillator output is modulated (1550) by the resulting AM signal to obtain the component's additive contribution to the output signal 1560.
It must be emphasized that a proper spectral segmentation of the signal within the modulation analysis is of paramount importance for a convincing result of any further modulation parameter processing. Therefore, an example of a suitable segmentation algorithm is described here.
Accordingly, Figure 16 presents an example 1600 of an application for polyphonic key mode changes. The figure shows a selective transposition of the modulation vocoder components. The carrier frequencies are quantized to MIDI notes, which are mapped to appropriate corresponding MIDI notes. The relative FM modulation depth is preserved by multiplying the FM of the mapped components by the ratio of the modified to the original carrier frequency.
Transposing an audio signal while maintaining the original playback speed is a challenging task. Using the proposed system, this is achieved directly by multiplying all carrier components by a constant factor. Since the temporal structure of the input signal is captured solely by the AM signals, it is not affected by the stretching of the carriers' spectral spacing.
An even more demanding effect can be achieved by selective processing. The key mode of a piece of music can be changed, for example, from minor to major or vice versa. To this end, only a subset of carriers corresponding to certain predefined frequency intervals is mapped to suitable new values. To achieve this, the carrier frequencies are quantized (1670) to MIDI pitches, which are subsequently mapped (1672) to appropriate new MIDI pitches (using a-priori knowledge of the mode and key of the musical item to be processed).
Then, the mapped MIDI notes are converted back (1574) in order to obtain the modified carrier frequencies that are used for synthesis. A dedicated MIDI note onset detection/compensation is not required, since the temporal characteristics are predominantly represented by the unmodified AM and are therefore preserved. Arbitrary mapping tables can be defined, allowing conversion to and from other minor flavors (e.g., harmonic minor).
One application in the field of audio effects is the global transposition of an audio signal. The processing required for this audio effect is a simple multiplication of the carriers by a constant transposition factor. By also multiplying the FM by the same factor, it is ensured that the relative FM modulation depth of each component is preserved. Since the temporal structure of the input signal is captured solely by the AM signals, it is not affected by the processing. Global transposition changes the original key of a musical signal to a target key (e.g., from C major to G major) while preserving the original tempo.
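Global transposition as described above reduces to scaling carriers and FM by one factor while leaving the AM untouched. The sketch assumes a simple dictionary per component; the layout and function name are hypothetical.

```python
def global_transpose(components, factor):
    """Multiply every carrier and its FM by the same constant factor.

    components: list of dicts with 'fc' (Hz), 'fm' (list, Hz) and 'am' (list).
    The AM is copied unchanged, so the temporal structure (tempo) is kept,
    and fm/fc (the relative FM modulation depth) stays the same.
    """
    return [{"fc": c["fc"] * factor,
             "fm": [v * factor for v in c["fm"]],
             "am": list(c["am"])} for c in components]
```

Moving from C major up to G major corresponds to seven semitones, i.e. `factor = 2 ** (7 / 12)`, approximately 1.498.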
However, owing to the signal-adaptive nature of the proposed modulation analysis, the modulation vocoder has the potential to go beyond this task. Now, even transposing selected components of polyphonic music becomes feasible, enabling applications that, for example, change the key mode (e.g., from C major to C minor) of a given musical signal (see, for example, "S. Disch and B. Edler, "Multiband perceptual modulation analysis, processing and Synthesis of audio signals," Proc. of the IEEE-ICASSP, 2009."). This is possible because each component carrier corresponds closely to the pitch perceived in its spectral region. If only carriers that relate to certain original pitches are mapped to new target values, the overall musical character, which is determined by the key mode, is manipulated.
The processing required on the MODVOC components is depicted in Figure 16, as mentioned above. Within the MODVOC decomposition domain, the carrier frequencies are quantized to MIDI notes, which are subsequently mapped to appropriate corresponding MIDI notes. For a meaningful reassignment of MIDI pitches and note names, a-priori knowledge of the mode and key of the original musical item may be necessary. The AM of all components is left unaltered, since it does not contain pitch information.
Specifically, the component carrier frequencies $f$, which represent the component pitch, are converted to MIDI pitch values $m$ according to Equation (6), where $f_{\mathrm{std}}$ denotes the standard pitch that corresponds to MIDI pitch 69, the note A4.

$$m(f) = 69 + 12\,\log_2\!\left(\frac{f}{f_{\mathrm{std}}}\right) \qquad (6)$$
Subsequently, the MIDI pitches are quantized to MIDI notes n(f) and, in addition, the pitch offset o(f) of each note is determined. By using a MIDI note mapping table that depends on the key, the original mode and the target mode, these MIDI notes are transformed into suitable target values n'. The table below gives an exemplary MIDI note mapping table for a scale mode transformation from natural C major to C minor; the mapping applies to the notes of all octaves.


Finally, the mapped MIDI notes, including their pitch offsets, are converted back to frequency f' in order to obtain the modified carrier frequencies that are used for synthesis (Equation 7):

$$ f' = f_{\mathrm{std}} \cdot 2^{(n' + o - 69)/12} \qquad (7) $$

Additionally, in order to preserve the relative FM modulation depth, the FM of a mapped component is multiplied by the individual pitch transposition factor, which is obtained as the ratio of the modified to the original carrier frequency. A dedicated MIDI note onset detection and processing is not necessary, since the temporal characteristics are predominantly represented by the unmodified AM and are therefore preserved.
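The inverse conversion of Equation 7 and the FM scaling can be sketched in the same way. The sketch assumes f_std = 440 Hz and represents the FM of a component as a list of instantaneous frequency deviations in Hz; `midi_to_freq` and `transpose_component` are hypothetical names.

```python
F_STD = 440.0  # assumed standard pitch in Hz, corresponding to MIDI pitch 69


def midi_to_freq(n_target: int, offset: float) -> float:
    """Modified carrier frequency f' from target note n' and pitch offset o (Equation 7)."""
    return F_STD * 2.0 ** ((n_target + offset - 69.0) / 12.0)


def transpose_component(f: float, fm: list[float], n_target: int, offset: float):
    """Return (f', scaled FM) for one mapped component.

    The FM deviations (Hz) are multiplied by the transposition factor f'/f so that
    the modulation depth relative to the carrier is preserved. The AM is not touched.
    """
    f_new = midi_to_freq(n_target, offset)
    ratio = f_new / f
    return f_new, [ratio * x for x in fm]
```

Mapping the note A (f = 440 Hz) to Ab (n' = 68) with zero offset yields f' of about 415.3 Hz, and every FM sample is scaled by the same factor f'/f of about 0.944.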
The described modulation vocoder is one way of modifying different frequency ranges (passband signals) of an audio signal in different ways, which has been referred to as selective pitch transposition. The inventive concept allows the perceptual quality of such modified audio signals to be improved. Although some embodiments of the inventive concept are described in connection with a vocoder or a modulation vocoder, the concept can also be used generally to improve the perceptual quality of modified audio signals, irrespective of whether a vocoder is used.
Figure 1 shows a block diagram of an apparatus 100 for modifying an audio signal 102, in accordance with an embodiment of the invention. The apparatus 100 comprises a filterbank processor 110, a fundamental determiner 120, an overtone determiner 130, a signal processor 140 and a combiner 150. The filterbank processor 110 is connected to the fundamental determiner 120, the overtone determiner 130 and the signal processor 140, and the fundamental determiner 120 is connected to the overtone determiner 130 and the signal processor 140. In addition, the overtone determiner 130 is connected to the signal processor 140, and the signal processor 140 is connected to the combiner 150. The filterbank processor 110 generates a plurality of passband signals 112 based on the audio signal 102. The fundamental determiner 120 selects a passband signal 112 from the plurality of passband signals to obtain a fundamental passband signal 122. The overtone determiner 130 identifies a passband signal 112 of the plurality of passband signals that meets an overtone criterion with respect to the selected fundamental passband signal 122 to obtain an overtone passband signal 132 associated with the selected fundamental passband signal 122. The signal processor 140 modifies the selected fundamental passband signal 122 based on a predefined modification target. Additionally, the signal processor 140 modifies the identified overtone passband signal 132 associated with the selected fundamental passband signal 122 depending on the modification of the selected fundamental passband signal 122. The combiner 150 combines the plurality of passband signals, which contains the modified selected fundamental passband signal and the modified identified overtone passband signal, to obtain a modified audio signal 152.
By modifying the fundamental passband signal 122 and the identified overtone passband signal 132 associated with it in the same way, the common behavior of a fundamental and its harmonics can be preserved, even though other passband signals of the plurality of passband signals may be modified in different ways. In this way, the timbre of the original audio signal 102 can be maintained more accurately, so that the perceptual quality of the modified audio signal can be significantly improved. For example, most instruments excite harmonic sounds that consist of a fundamental frequency part and its harmonics. If the fundamental frequency part is to be modified, a correlated modification of the harmonics, according to the described concept, can yield a significantly better perceptual quality of the modified audio signal. Also, the audio signal can be modified in real time, since a-priori information about the entire audio signal (e.g., the entire polyphonic piece of music) may not be needed.
Audio signal 102 may be, for example, a time domain input audio signal or a frequency domain audio signal representing a time domain input audio signal.
The fundamental determiner 120 may provide the selected fundamental passband signal 122 to the signal processor 140 for modification, or it may provide a trigger signal 122 (e.g., the index i ∈ [0 … I−1] of the selected fundamental passband signal, where I is the number of passband signals of the plurality of passband signals) that triggers the signal processor 140 to modify the selected passband signal of the plurality of passband signals according to the predefined modification target. Likewise, the overtone determiner 130 may provide the identified overtone passband signal 132 to the signal processor 140 for modification, or it may provide a trigger signal 132 (e.g., an index indicating the passband signal of the plurality of passband signals that is identified as an overtone passband signal) that triggers the signal processor 140 to modify the identified passband signal of the plurality of passband signals.
The overtone criterion can comprise one or more rules for identifying an overtone of the fundamental. One or more overtone criteria may have to be met for a passband signal of the plurality of passband signals to be identified as an overtone of the selected fundamental passband signal 122.
The predefined modification target may be different for passband signals covering different frequency ranges and may depend on the desired modification of the audio signal 102. For example, the original key mode of an audio signal may be changed to a target key mode. An exemplary mapping from C major to C natural minor was given in the table above. If the frequency range of a passband signal of the plurality of passband signals corresponds to an original note C, the target note is also C, so that this passband signal is not modified (unless it is identified as an overtone passband signal of an associated fundamental passband signal that is modified). In that case, the modification target is to keep this passband signal unmodified. On the other hand, a passband signal of the plurality of passband signals whose frequency range correlates with an original note A can be modified so that the modified passband signal has a frequency range that correlates with the target note Ab (except where the passband signal is identified as an overtone passband signal of a fundamental passband signal that is to be modified according to another modification target). Further, the identified overtone passband signals (passband signals whose frequency range correlates with an overtone of the original note A) can be modified so that the modified overtone passband signal has a frequency range that correlates with an overtone of the target note Ab.
Each passband signal 112 of the plurality of passband signals may comprise a carrier frequency. The carrier frequency can be a characteristic frequency of the frequency range represented or contained by a passband signal, such as an average frequency of the frequency range, an upper limit frequency of the frequency range, a lower limit frequency of the frequency range or a center of gravity of the frequency range of the passband signal. The carrier frequency of a passband signal may differ from the carrier frequency of each of the other passband signals. These carrier frequencies can be used by the overtone determiner 130 to identify overtone passband signals. For example, the overtone determiner 130 can compare the carrier frequency of a passband signal 112 of the plurality of passband signals with the carrier frequency of the selected fundamental passband signal 122. Since an overtone is approximately an integer multiple of the fundamental frequency, an overtone criterion can be met if the carrier frequency of a passband signal 112 is a multiple of the carrier frequency of the selected fundamental passband signal 122 within a predefined carrier frequency tolerance (e.g., 100 Hz, 50 Hz, 20 Hz or less).
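A minimal sketch of the carrier-frequency overtone criterion, assuming the tolerance is an absolute deviation in Hz from the nearest integer multiple, with a harmonic number of at least 2. The function name and the 20 Hz default (one of the example tolerances above) are illustrative assumptions.

```python
def meets_frequency_criterion(f_i: float, f_t: float, tol_hz: float = 20.0) -> bool:
    """True if carrier f_i lies within tol_hz of an integer multiple h * f_t, h >= 2."""
    h = round(f_i / f_t)      # nearest harmonic number
    if h < 2:                 # only multiples greater than 1 can be overtones
        return False
    return abs(f_i - h * f_t) <= tol_hz
```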
Additionally or alternatively, the overtone determiner 130 may compare the energy content of a passband signal 112 of the plurality of passband signals with the energy content of the selected fundamental passband signal 122. In this example, an overtone criterion can be met if the ratio of the energy content of the passband signal 112 to the energy content of the selected fundamental passband signal 122 lies within a predefined energy tolerance range. This overtone criterion takes into account that harmonics generally have lower energy than the fundamental. The predefined energy tolerance range can be, for example, 0.3 to 0.9, 0.5 to 0.8, 0.6 to 0.7 or another range. This energy-based overtone criterion can be combined with the carrier-frequency-based overtone criterion mentioned above.
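The energy-based criterion reduces to a ratio test. In this sketch the tolerance range is passed as `lo` and `hi`, defaulting to the example range 0.3 to 0.9; how the energy content is measured is left open and assumed to be given. The function name is hypothetical.

```python
def meets_energy_criterion(e_i: float, e_t: float, lo: float = 0.3, hi: float = 0.9) -> bool:
    """True if the energy ratio e_i / e_t (candidate / fundamental) lies in [lo, hi]."""
    if e_t <= 0.0:            # a fundamental without energy cannot anchor overtones
        return False
    return lo <= e_i / e_t <= hi
```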
Additionally or alternatively, the overtone determiner 130 can calculate a correlation value indicating the correlation of the temporal envelope of a passband signal 112 of the plurality of passband signals with the temporal envelope of the selected fundamental passband signal 122. In this case, an overtone criterion can be met if the correlation value is greater than a predefined correlation threshold. This overtone criterion takes into account that a fundamental and its harmonics share a very similar temporal envelope. The predefined correlation threshold can be, for example, 0.2, 0.3, 0.4 or more. The described correlation-based overtone criterion can be combined with the carrier-frequency-based and/or the energy-based overtone criterion mentioned above.
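A sketch of the correlation-based criterion, assuming the temporal envelopes are equally long sample sequences and that the correlation value is the normalized zero-lag cross-correlation without mean removal (an assumption; other normalizations are possible). Names and the 0.4 default are illustrative.

```python
import numpy as np


def envelope_correlation(env_t, env_i) -> float:
    """Normalized zero-lag cross-correlation of two temporal envelopes."""
    a = np.asarray(env_t, dtype=float)
    b = np.asarray(env_i, dtype=float)
    denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0


def meets_correlation_criterion(env_t, env_i, threshold: float = 0.4) -> bool:
    """True if the envelopes are correlated above the predefined threshold."""
    return envelope_correlation(env_t, env_i) > threshold
```

Because envelopes are non-negative, this measure lies between 0 and 1, and it is insensitive to a pure level difference between fundamental and overtone.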
The fundamental determiner 120 can select a further passband signal 112 of the plurality of passband signals, disregarding all already selected fundamental passband signals 122 and all already identified overtone passband signals 132. In other words, the fundamental determiner 120 can iteratively select fundamental passband signals 122 from the set of passband signals that have neither been selected as fundamental passband signals nor identified as overtone passband signals 132. This can be continued until every passband signal of the plurality of passband signals has either been selected as a fundamental passband signal or identified as an overtone of a fundamental passband signal. Accordingly, the overtone determiner 130 can identify a passband signal 112 of the plurality of passband signals that meets an overtone criterion with respect to the further selected fundamental passband signal, disregarding all already identified overtone passband signals and all already selected fundamental passband signals 122.
In addition, the signal processor 140 may modify the further selected fundamental passband signal 122 based on a further predefined modification target, independently of all other selected fundamental passband signals. In other words, a different modification target can be defined for each selected fundamental passband signal or for some of them. For example, the modification targets can be defined by the table mentioned above, which indicates a transition from one key mode to another. Since the fundamental passband signals can be modified independently of each other, it is, for example, also possible to modify selectively only the fundamentals and harmonics of a specific instrument in order to change the key mode or the loudness of that instrument.
The passband signal 112 can be selected by the fundamental determiner 120 based on an energy criterion. For example, the passband signal with the highest energy content, or one of the highest energy contents (e.g., higher than that of 70% or more of the other passband signals), can be selected. In this example, an already selected fundamental passband signal can be excluded from further selection by setting an energy content parameter, which indicates the energy content of the selected fundamental passband signal, to zero. For the selection of the passband signal 112, the energy content of each passband signal (indicated, for example, by an energy content parameter determined by the fundamental determiner) can be weighted (e.g., by an A-weighting) to emphasize the selection of perceptually important passband signals.
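The energy-based selection with A-weighting can be sketched as follows. The weighting uses the standard A-weighting curve (as specified, e.g., in IEC 61672-1), evaluated at each carrier frequency and applied to the energy as a power factor; treating a component's energy as concentrated at its carrier frequency is a simplifying assumption. Function names are hypothetical.

```python
import math


def a_weight_db(f: float) -> float:
    """Standard A-weighting gain in dB at frequency f > 0 Hz (about 0 dB at 1 kHz)."""
    f2 = f * f
    r_a = (12194.0 ** 2 * f2 * f2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20.0 * math.log10(r_a) + 2.0


def select_fundamental(carriers, energies):
    """Index of the component with the largest A-weighted energy.

    Returns None once every weighted energy is zero, i.e. when all components
    have been assigned (selected fundamentals have their energy set to zero).
    """
    weighted = [e * 10.0 ** (a_weight_db(f) / 10.0) for f, e in zip(carriers, energies)]
    best = max(range(len(weighted)), key=weighted.__getitem__)
    return best if weighted[best] > 0.0 else None
```

Equal raw energies at 100 Hz and 1 kHz favor the 1 kHz component, since A-weighting attenuates 100 Hz by roughly 19 dB. Setting an already selected component's energy to zero, as described above, removes it from every subsequent selection.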
The signal processor 140 can modify the selected fundamental passband signals 122 and the associated overtone passband signals 132 in a variety of ways. For example, the signal processor 140 can modify the selected fundamental passband signal 122 by multiplying its carrier frequency by a transposition factor (e.g., depending on the key mode change) or by adding a transposition frequency to its carrier frequency. Accordingly, the signal processor 140 can modify the identified overtone passband signal 132 by multiplying its carrier frequency by the same transposition factor (e.g., within a tolerance of 20%, 10%, 5%, 1% or below) or by adding a multiple of the transposition frequency (e.g., within a tolerance of 20%, 10%, 5%, 1% or below) to its carrier frequency. In other words, a key mode change can be performed, for example, by multiplying the fundamental and its associated harmonics by the same transposition factor, or by adding a transposition frequency to the fundamental and a multiple of the transposition frequency to the overtone. In this way, the identified overtone passband signal 132 is modified depending on (in the same way as) the selected fundamental passband signal 122.
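Both transposition variants can be illustrated on carrier frequencies alone. In this sketch, an overtone's harmonic number is estimated by rounding the ratio of its carrier frequency to that of the fundamental, which is an assumption; the function names are hypothetical.

```python
def transpose_multiplicative(f_fund, overtones, factor):
    """Multiply the fundamental and all of its overtones by the same factor."""
    return f_fund * factor, [f * factor for f in overtones]


def transpose_additive(f_fund, overtones, shift_hz):
    """Shift the fundamental by shift_hz and the k-th harmonic by k * shift_hz."""
    shifted = []
    for f in overtones:
        k = round(f / f_fund)          # estimated harmonic number of this overtone
        shifted.append(f + k * shift_hz)
    return f_fund + shift_hz, shifted
```

Either way, the harmonic relation is kept: shifting 220 Hz by 10 Hz moves its second and third harmonics from 440 Hz and 660 Hz to 460 Hz and 690 Hz, i.e., exactly two and three times 230 Hz.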
Figure 2 shows a block diagram of an apparatus 200 for modifying an audio signal 102, in accordance with an embodiment of the invention. The apparatus 200 is similar to the apparatus shown in Figure 1, but further comprises a carrier frequency determiner 260, and its filterbank processor 110 comprises a filterbank 212 and a signal converter 214. The filterbank 212 is connected to the signal converter 214, and the signal converter 214 is connected to the signal processor 140. The optional carrier frequency determiner 260 is connected to the filterbank 212 of the filterbank processor 110 and to the signal processor 140.
The filterbank 212 can generate passband signals based on the audio signal 102, and the signal converter 214 can convert the generated passband signals into a subband domain to obtain the plurality of passband signals that is provided to the fundamental determiner 120, the overtone determiner 130 and the signal processor 140. The signal converter 214 can be implemented, for example, as a single-sided inverse discrete Fourier transform unit, so that each passband signal 112 of the plurality of passband signals can represent an analytic signal. In this subband domain, the fundamental determiner 120 may select one of the subband-domain passband signals of the plurality of passband signals to obtain the fundamental passband signal 122. In addition, the overtone determiner 130 may identify one of the subband-domain passband signals of the plurality of passband signals as an overtone passband signal 132.
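The effect of a single-sided inverse DFT can be shown with NumPy: keeping only the positive-frequency bins of one band (and doubling them) yields a complex analytic signal whose real part is the band-limited signal. This is a sketch assuming a rectangular band of DFT bins; the filterbank 212 and signal converter 214 of the embodiment need not work this way, and `band_analytic_signal` is a hypothetical name.

```python
import numpy as np


def band_analytic_signal(x, k_lo, k_hi):
    """Analytic signal of the band made of DFT bins k_lo .. k_hi - 1.

    Requires 0 < k_lo < k_hi <= len(x) // 2, so DC and Nyquist are excluded.
    Only positive-frequency bins are kept (single-sided spectrum); doubling them
    makes the real part of the result equal to the band-limited part of x.
    """
    X = np.fft.fft(np.asarray(x, dtype=float))
    Y = np.zeros_like(X)
    Y[k_lo:k_hi] = 2.0 * X[k_lo:k_hi]
    return np.fft.ifft(Y)
```

For a cosine that falls inside the band, the magnitude of the result is constant and equal to the cosine's amplitude, which is exactly the property that the AM/FM decomposition relies on.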
Additionally, the carrier frequency determiner 260 can determine a plurality of carrier frequencies based on the audio signal 102, and the filterbank 212 of the filterbank processor 110 can generate the passband signals such that each passband signal covers a frequency range containing a different carrier frequency 262 of the plurality of carrier frequencies, to obtain a passband signal associated with each carrier frequency 262. In other words, the bandwidths and center frequencies of the passband signals generated by the filterbank 212 can be controlled by the carrier frequency determiner 260. This can be done in a number of ways, for example, by calculating centers of gravity (COG) of the audio signal 102, as described above.
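A spectral center of gravity is the power-weighted mean frequency of a spectral region. The sketch below computes one COG per fixed region of spectral bins, which is a simplification assumed here for illustration; the carrier frequency determination referred to above may place and refine the regions differently. Function names are hypothetical.

```python
import numpy as np


def center_of_gravity(freqs, power):
    """Power-weighted mean frequency of one spectral region (Hz)."""
    freqs = np.asarray(freqs, dtype=float)
    power = np.asarray(power, dtype=float)
    total = power.sum()
    # Fall back to the mean frequency of the region if it carries no power.
    return float((freqs * power).sum() / total) if total > 0 else float(freqs.mean())


def carrier_frequencies(freqs, power, edges):
    """One COG per region of bins [edges[j], edges[j + 1])."""
    return [center_of_gravity(freqs[lo:hi], power[lo:hi])
            for lo, hi in zip(edges[:-1], edges[1:])]
```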
As already mentioned, the passband signals 112 can be modified in various ways. For example, the signal processor 140 may generate an amplitude modulation (AM) signal and a frequency modulation (FM) signal for each passband signal 112 of the plurality of passband signals. Since each passband signal represents an analytic signal in the subband domain, the signal processor 140 can generate the amplitude modulation signal and the frequency modulation signal, for example, as described above in connection with the modulation vocoder. Further, the signal processor 140 can modify the amplitude modulation signal or the frequency modulation signal of the selected fundamental passband signal 122 based on the predefined modification target, and can modify the amplitude modulation signal or the frequency modulation signal of the identified overtone passband signal 132 associated with the selected fundamental passband signal 122 depending on the modification of the selected fundamental passband signal 122.
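Given an analytic passband signal, AM and FM signals can be derived from its magnitude and from the derivative of its unwrapped phase. This sketch assumes the FM signal is expressed as the deviation of the instantaneous frequency from the carrier frequency, in Hz; `am_fm` is a hypothetical name.

```python
import numpy as np


def am_fm(z, fs, carrier_hz):
    """AM and FM signals of an analytic passband signal z sampled at fs Hz.

    AM: magnitude of z.
    FM: instantaneous frequency (derivative of the unwrapped phase) minus the
        carrier frequency, in Hz; it is one sample shorter than z.
    """
    z = np.asarray(z)
    am = np.abs(z)
    phase = np.unwrap(np.angle(z))
    inst_freq = np.diff(phase) * fs / (2.0 * np.pi)
    return am, inst_freq - carrier_hz
```

In this representation, transposing a component means changing `carrier_hz` (and scaling the FM deviation), while a loudness change acts on the AM alone.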
The filterbank processor 110, the fundamental determiner 120, the overtone determiner 130, the signal processor 140, the combiner 150 and/or the carrier frequency determiner 260 can be, for example, individual hardware units or part of a digital signal processor, a computer or microcontroller, as well as a computer program or software product configured to run on a digital signal processor, computer or microcontroller.
Some embodiments in accordance with the invention relate to a method 300 for modifying an audio signal. The method 300 may comprise generating 310 a plurality of passband signals based on an audio signal and selecting 320 a passband signal from the plurality of passband signals to obtain a fundamental passband signal. Further, the method 300 may comprise identifying 330 a passband signal of the plurality of passband signals that meets an overtone criterion with respect to the selected fundamental passband signal to obtain an overtone passband signal associated with the selected fundamental passband signal. The selected fundamental passband signal is then modified 340 based on a predefined modification target, and the identified overtone passband signal associated with the selected fundamental passband signal is modified 350 depending on the modification of the selected fundamental passband signal. Finally, the method 300 may comprise combining 360 the plurality of passband signals containing the modified selected fundamental passband signal and the modified identified overtone passband signal to obtain a modified audio signal.
Optionally, the method 300 may comprise additional steps representing optional aspects of the inventive concept described above and below.
Next, the concept described is illustrated in more detail by an example for an implementation using a modulation vocoder, although the proposed concept can also be used more generally for other implementations.
Most instruments excite harmonic sounds that consist of a fundamental frequency part and its harmonics, which are approximately integer multiples of the fundamental frequency. Since musical intervals follow a logarithmic scale, each harmonic overtone resembles a different musical interval relative to the fundamental (and its octaves). The table below lists the correspondence between harmonic numbers and musical intervals for the first seven harmonics, i.e., the musical interval of each harmonic relative to the fundamental and its octaves.
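The interval of each harmonic follows from the logarithmic scale: harmonic k lies 12·log2(k) semitones above the fundamental. The sketch below reduces this to the nearest equal-tempered interval within the octave, together with the number of octaves and the deviation in cents. The interval names are standard music theory; the function name is hypothetical.

```python
import math

# Equal-tempered interval names, indexed by semitones within one octave.
INTERVALS = ["unison", "minor second", "major second", "minor third",
             "major third", "perfect fourth", "tritone", "perfect fifth",
             "minor sixth", "major sixth", "minor seventh", "major seventh"]


def harmonic_interval(k: int) -> tuple[str, int, float]:
    """Interval of harmonic number k relative to the fundamental.

    Returns (interval name within the octave, octaves above the fundamental,
    deviation from the equal-tempered interval in cents).
    """
    semitones = 12.0 * math.log2(k)
    nearest = round(semitones)
    return INTERVALS[nearest % 12], nearest // 12, 100.0 * (semitones - nearest)


for k in range(1, 8):
    name, octaves, cents = harmonic_interval(k)
    print(f"harmonic {k}: {name} + {octaves} octave(s), {cents:+.1f} cents")
```

For k = 1 to 7 this yields unison, octave, fifth, double octave, major third, fifth and minor seventh (each plus octaves), with the seventh harmonic about 31 cents flat of the equal-tempered minor seventh.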

Thus, in the task of selectively transposing polyphonic music content, there is an inherent ambiguity regarding the musical function of a MODVOC component. If the component originates from a fundamental, it has to be transposed according to the desired scale mapping; if it is dominated by a harmonic that can be assigned to a fundamental, it has to be transposed together with that fundamental in order to best preserve the original timbre of the tone. This gives rise to the need to assign each MODVOC component (passband signal) in order to select the most suitable transposition factor.
To achieve this, the simple processing scheme introduced earlier has been extended by a harmonic blocking functionality. Prior to transposition, harmonic blocking examines all MODVOC components to determine whether a component (passband signal) is to be assigned to a fundamental or treated as an independent entity. This can be accomplished by an iterative algorithm, whose flowchart is depicted in Figure 5. The algorithm evaluates 510 the frequency ratios, energy ratios and envelope cross-correlations of a test component t (fundamental passband signal) with respect to all other components (passband signals) indexed by i ∈ [0 … I−1], with I denoting the total number of components (the number of passband signals of the plurality of passband signals). The order in which components become the test component during the iteration is determined by their A-weighted energy 520, so that they are evaluated in order of decreasing energy. The A-weighting (ANSI, "ANSI standard S1.4-1983," 1983; ANSI, "ANSI standard S1.42-2001," 2001) is applied to model the perceptual prominence of each component in terms of its loudness (see, for example, H. Fletcher and W. A. Munson, "Loudness, its definition, measurement and calculation," J. Acoust. Soc. Amer., vol. 5, pp. 82-108, 1933).
The harmonic carrier frequency match, the harmonic carrier frequency mismatch, the component energy and/or the zero-lag normalized amplitude envelope correlation can be examined against thresholds.
Frequency match and mismatch can be defined according to Equation 8, with f_t being the carrier frequency of the test component (the carrier frequency of the selected fundamental passband signal) and f_i being the carrier frequency of the component with index i (a passband signal of the plurality of passband signals). For frequency matching, all integer multiples greater than 1 are possible harmonics. A suitable threshold value (carrier frequency tolerance) for the permissible frequency mismatch of a possible harmonic is, for example, 22 Hz.

It may be required that the A-weighted energy ratio (Equation 9) of the harmonic to the fundamental be below a predefined threshold, reflecting the fact that for the vast majority of instruments the harmonics have lower energy than the fundamental. A suitable threshold value (energy tolerance range) is, for example, a ratio of 0.6.

The normalized zero-lag cross-correlation of the envelope env_t of the test component and the envelope env_i of the component with index i is defined by Equation 10. This measure exploits the fact that a fundamental and its harmonics share a very similar temporal envelope within the block length M. A suitable threshold value (correlation threshold) of 0.4 was determined in informal experiments.

After the examination, all components i that fulfil 570 all threshold conditions are marked 580 as harmonics to be locked to the test component and are subsequently removed from the search. Then the test component is also excluded from further iterations by setting 542 its energy to zero. The algorithm is repeated until all components are assigned, which is indicated by the maximum component energy being zero.
Figure 4 shows the enhanced processing scheme for selective transposition by MODVOC that incorporates harmonic blocking. In contrast to Figure 16, only unblocked components enter the transposition stage, while blocked components are modified in a second stage by the same transposition factor that was applied to their assigned fundamentals.
In other words, Figure 5 shows a flowchart of the described harmonic blocking (method 500 for modifying an audio signal). Components that meet the conditions for being harmonics of a test fundamental (selected fundamental passband signal) are iteratively marked and removed from the search space. For this purpose, each passband signal of the plurality of passband signals comprises a carrier frequency, an energy content and a temporal envelope, or the carrier frequency, the energy content and/or the temporal envelope (parameters 510) are determined for each passband signal of the plurality of passband signals. Further, the energy content (energy content parameter) of each passband signal is A-weighted 520. Then, the fundamental passband signal (test fundamental f_t) with the maximum energy (energy content parameter) is selected 530. Since the energy of all already selected fundamental passband signals is set to zero and all identified overtone passband signals are excluded from the search space, the selected fundamental passband signal may have an energy content parameter equal to zero, in which case the iterative algorithm stops 540. Otherwise, the frequency match (or mismatch), the energy content and/or the cross-correlation of the temporal envelopes of the selected fundamental passband signal and of the remaining passband signals of the plurality of passband signals are compared 560. If one, some or all of the conditions (overtone criteria) are met 570, the respective passband signal is identified 580 as an overtone passband signal, harmonic blocking data can be generated (e.g., by storing the index of the identified passband signal in an overtone list), and the identified overtone passband signal is removed from the search space. The harmonic blocking data can be saved 590 with reference to the associated selected fundamental passband signal.
After all overtone passband signals of the selected fundamental passband signal have been identified, the energy (energy content parameter) of the selected fundamental passband signal is set 592 to zero, and the passband signal with the next highest energy is selected 530 as the next fundamental passband signal.
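The iteration of Figure 5 can be sketched in Python as follows. It is an illustration under stated assumptions, not the reference implementation: the energies are assumed to be A-weighted already; Equations 8 to 10 are replaced by the straightforward forms described in the text (nearest integer multiple with an absolute mismatch in Hz, plain energy ratio, normalized zero-lag envelope correlation); all three conditions must hold; and the thresholds are the example values of 22 Hz, 0.6 and 0.4. The function name is hypothetical.

```python
import numpy as np


def harmonic_blocking(carriers, energies, envelopes,
                      max_mismatch_hz=22.0, max_energy_ratio=0.6, min_corr=0.4):
    """Iteratively lock overtone components to test fundamentals.

    carriers  : carrier frequency of each component (Hz)
    energies  : A-weighted energy of each component (assumed precomputed)
    envelopes : one temporal envelope per component (rows of equal length)
    Returns {fundamental index: [indices of locked overtones]}.
    """
    f = np.asarray(carriers, dtype=float)
    energy = np.array(energies, dtype=float)   # copy, since it is zeroed below
    env = np.asarray(envelopes, dtype=float)
    search = set(range(len(f)))                # components not yet assigned
    locks = {}
    while search:
        t = max(search, key=lambda i: energy[i])      # test fundamental
        if energy[t] <= 0.0:                          # maximum energy is zero: stop
            break
        search.discard(t)
        locks[t] = []
        for i in sorted(search):
            h = round(f[i] / f[t])                    # nearest harmonic number
            freq_ok = h >= 2 and abs(f[i] - h * f[t]) <= max_mismatch_hz
            energy_ok = energy[i] / energy[t] < max_energy_ratio
            denom = np.sqrt(np.dot(env[t], env[t]) * np.dot(env[i], env[i]))
            corr = np.dot(env[t], env[i]) / denom if denom > 0 else 0.0
            if freq_ok and energy_ok and corr > min_corr:
                locks[t].append(i)                    # mark as locked overtone
        search -= set(locks[t])                       # remove overtones from search
        energy[t] = 0.0                               # exclude the test component
    return locks
```

With carriers of 220, 440, 660 and 300 Hz, where the first three share one envelope and have decreasing energies, components 1 and 2 are locked to component 0, while the 300 Hz component becomes a fundamental of its own, giving {0: [1, 2], 3: []}.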
The signal processor can use the harmonic blocking data to modify the passband signals. One possible implementation is shown in Figure 4. In this implementation, for example, the signal processor comprises a MIDI mapper 1600 and an overtone modifier 400. The MIDI mapper 1600 can modify the carrier frequency of each selected fundamental passband signal according to its individual modification target (which may also specify that a fundamental passband signal is not modified). The MIDI mapper 1600 can be implemented, for example, as shown in and described for Figure 16. The overtone modifier 400 can comprise an overtone modification controller 410, an overtone multiplier 420 and an overtone modification provider 430. The overtone modification controller 410 can be connected to the overtone multiplier 420 and the overtone modification provider 430, and the overtone multiplier 420 can be connected to the overtone modification provider 430. The overtone multiplier 420 can multiply the carrier frequency f of an identified overtone passband signal by the same transposition factor (within the tolerance mentioned above) by which the associated fundamental passband signal is multiplied, and can provide the modified carrier frequency f' to the overtone modification provider 430. The overtone modification controller 410 can trigger the overtone modification provider 430 to provide the modified carrier frequency of the identified overtone passband signal if the overtone modifier 400 identifies the carrier frequency as the carrier frequency of an identified overtone passband signal (e.g., based on the harmonic blocking data). Otherwise, the overtone modification provider 430 can provide the output of the MIDI mapper 1600.
Furthermore, Figure 4 shows an implementation of the proposed concept in a vocoder, in which, in addition to the carrier frequency of the passband signal, the corresponding frequency modulation (FM) signal is modified by multiplying it by the ratio of the modified carrier frequency to the carrier frequency before modification. Alternatively or additionally to a frequency change, the loudness of the audio signal can be changed selectively for individual passband signals. For this, the amplitude modulation (AM) signal of a passband signal can be modified.
In other words, Figure 4 shows an improved selective transposition on modulation vocoder components (passband signals) using harmonic blocking (modifying the identified overtone passband signals depending on the modification of the associated fundamental passband signal). Only unblocked carrier frequencies (which then may be the fundamental passband signals) are quantized to MIDI notes, which are mapped to the appropriate corresponding MIDI notes (according to the individual modification target). Blocked components (identified overtone passband signals) can be transposed by multiplying them by the ratio of the modified to the original carrier frequency of the assigned fundamental (associated fundamental passband signal).
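Applying the blocking data in this two-stage manner can be sketched as follows: unlocked components are transposed by a mapping function, and locked overtones inherit the factor of their fundamental. The callable `target_freq` is a hypothetical stand-in for the MIDI mapper 1600, and the returned per-component factors f'/f are what would scale the FM signals; the whole function is an assumed illustration, not the patented implementation.

```python
def apply_transposition(carriers, locks, target_freq):
    """Transpose carriers using harmonic blocking data.

    carriers    : carrier frequency of each component (Hz)
    locks       : {fundamental index: [indices of its locked overtones]}
    target_freq : callable mapping an unlocked carrier to its target frequency
                  (hypothetical stand-in for the MIDI mapper)
    Returns the new carriers and each component's factor f'/f for FM scaling.
    """
    locked = {i for overtones in locks.values() for i in overtones}
    new = list(carriers)
    factors = [1.0] * len(carriers)
    # Stage 1: only unlocked components enter the (MIDI) transposition stage.
    for i, f in enumerate(carriers):
        if i not in locked:
            new[i] = target_freq(f)
            factors[i] = new[i] / f
    # Stage 2: locked overtones reuse the factor of their assigned fundamental.
    for t, overtones in locks.items():
        for i in overtones:
            factors[i] = factors[t]
            new[i] = carriers[i] * factors[t]
    return new, factors
```

Even if the mapping function would leave an overtone's own carrier unchanged, the overtone follows its fundamental, which is the point of harmonic blocking.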
Figure 6a shows a block diagram of an apparatus 600 for modifying an audio signal, in accordance with an embodiment of the invention. The apparatus 600 comprises an envelope shape determiner 610, a filterbank processor 620, a signal processor 630, a combiner 640 and an envelope shaper 650. The envelope shape determiner 610 is connected to the envelope shaper 650, the filterbank processor 620 is connected to the signal processor 630, the signal processor 630 is connected to the combiner 640, and the combiner 640 is connected to the envelope shaper 650. The envelope shape determiner 610 determines envelope shape coefficients 612 based on a frequency domain audio signal 602 representing a time domain input audio signal.
In addition, the filterbank processor 620 generates a plurality of passband signals 622 in a subband domain based on the frequency domain audio signal 602. The signal processor 630 modifies a subband-domain passband signal 622 of the plurality of subband-domain passband signals based on a predefined modification target. Further, the combiner 640 combines at least a subset of the plurality of subband-domain passband signals (e.g., containing the modified subband-domain passband signal) to obtain a time domain audio signal 642. The envelope shaper 650 shapes an envelope of the time domain audio signal 642 based on the envelope shape coefficients 612 to obtain a modulated audio signal 652.
Alternatively, the envelope shaper 650 may be located between the signal processor 630 and the combiner 640 (the signal processor 630 is connected to the envelope shaper 650, and the envelope shaper 650 is connected to the combiner 640) and can shape an envelope of the plurality of subband-domain passband signals containing the modified subband-domain passband signal based on the envelope shape coefficients 612.
By extracting the envelope shape coefficients 612 before the audio signal is processed selectively per passband signal, and by using the envelope shape coefficients 612 to shape the envelope of the audio signal after one or more passband signals have been modified, the spectral coherence of the differently modified passband signals can be preserved more accurately. Furthermore, especially for transient signals, quantization noise that is spread over time can also be shaped by the envelope shaper 650. In this way, the perceptual quality of the modified audio signal can be significantly improved. Also, the audio signal can be modified in real time, since a-priori information about the entire audio signal (e.g., the entire polyphonic piece of music) may not be needed.
Alternatively, the envelope shaper 650 may be located between the filterbank processor 620 and the signal processor 630 (the filterbank processor 620 is connected to the envelope shaper 650, and the envelope shaper 650 is connected to the signal processor 630) and can shape an envelope of the plurality of subband-domain passband signals based on the envelope shape coefficients 612 before a subband-domain passband signal is modified by the signal processor 630, to obtain a modulated audio signal 652.
By extracting the envelope shape coefficients 612 before the audio signal is processed selectively per passband signal, and by using them to shape the envelope of the plurality of passband signals 622 after these have been generated by the filterbank processor 620 in the subband domain, an adaptive filterbank can be implemented, which can increase the local coherence, especially for transient signals (see, for example, J. Herre and J. D. Johnston, "A continuously signal-adaptive filter bank for high-quality perceptual audio coding," IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics, Mohonk, 1997). In that case, it is not the modified signal (or the modified passband signal) that is shaped; instead, the quality of the generated passband signals with respect to transient reproduction is increased before the modification.
The frequency domain audio signal 602 may be provided, for example, by a preprocessor that generates the frequency domain audio signal 602 from a time domain input audio signal (e.g., by a discrete Fourier transform), or it may be provided by a storage unit. The envelope shape coefficients 612 determined by the envelope shape determiner 610 may be, for example, linear prediction coefficients or other coefficients that parameterize the spectrum of the frequency domain audio signal 602.
The signal processor 630 may modify one, some, or all of the subband domain passband signals 622 of the plurality of subband domain passband signals. The predefined modification target may differ, for example, for all or for some of the subband domain passband signals. For example, to change the key mode of the audio signal, the predefined modification targets of the subband domain passband signals can be set as already mentioned in connection with the table above.
The frequency domain audio signal 602 may comprise spectral lines obtained, for example, by a Fourier transform. The difference between the spectral lines of the frequency domain audio signal (which can also be regarded as passband signals) and a passband signal generated by the filterbank processor 620 may be that a spectral line of the frequency domain audio signal 602 represents a narrower bandwidth than the bandwidth represented by a subband domain passband signal 622 generated by the filterbank processor 620. For example, the frequency domain audio signal 602 may represent a frequency spectrum obtained by a discrete Fourier transform, which is divided into the plurality of passband signals by the filterbank processor 620, wherein the number of passband signals (e.g., 16, 20 or more) is significantly smaller than the number of spectral values or spectral lines of the frequency spectrum (e.g., 512 or more spectral values).
The envelope shape determiner 610 can determine the envelope shape coefficients based on a prediction over frequency of the frequency domain audio signal 602, which can be performed, for example, as already mentioned, by determining linear prediction coefficients.
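As a hedged illustration of prediction over frequency, the following sketch derives linear prediction coefficients along the frequency axis of a DFT spectrum with the autocorrelation method, solving the normal equations directly instead of with a Levinson-Durbin recursion. The function name lpc_over_frequency, the predictor order and the use of NumPy are assumptions for illustration and do not represent the implementation described here.

```python
import numpy as np


def lpc_over_frequency(spectrum: np.ndarray, order: int) -> np.ndarray:
    """Hypothetical helper: forward linear prediction along frequency.

    Returns coefficients a_1..a_p such that X[k] is predicted by
    sum_i a_i * X[k - i], chosen to minimize the squared prediction error
    (autocorrelation method; complex DFT spectra are allowed).
    """
    x = np.asarray(spectrum, dtype=complex)
    n = len(x)
    # Autocorrelation across frequency: R[m] = sum_k X[k] * conj(X[k - m]).
    r = np.array([np.sum(x[m:] * np.conj(x[: n - m])) for m in range(order + 1)])
    # Hermitian Toeplitz normal equations: sum_i R[j - i] a_i = R[j], j = 1..p.
    mat = np.empty((order, order), dtype=complex)
    for j in range(order):
        for i in range(order):
            d = j - i
            mat[j, i] = r[d] if d >= 0 else np.conj(r[-d])
    return np.linalg.solve(mat, r[1 : order + 1])


# Usage: an impulse in time yields a spectrum that is a pure complex
# exponential along frequency, which a first-order predictor captures.
N, n0 = 512, 100
impulse = np.zeros(N)
impulse[n0] = 1.0
a = lpc_over_frequency(np.fft.fft(impulse), order=1)
```

For the impulse, R[0] = N and R[1] = (N - 1) * exp(-j 2 pi n0 / N), so the single coefficient points at the impulse position. This shows that prediction over frequency captures temporal structure.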
The filterbank processor 620 may provide the plurality of passband signals, each passband signal 622 representing a specific frequency range of the frequency domain audio signal 602. Alternatively, the filterbank processor 620 may comprise a predictive filter 710, a signal subtractor 720 and a filterbank 730 for obtaining the plurality of passband signals 622 based on a residual audio signal 722, as shown in Figure 7. For this purpose, the predictive filter 710 (e.g., a linear prediction filter) can generate a predicted audio signal 712 based on the frequency domain audio signal 602 and the envelope shape coefficients 612. Further, the signal subtractor 720 can subtract the predicted audio signal 712 from the frequency domain audio signal 602 to obtain a residual audio signal 722. This residual audio signal 722 can be used by the filterbank 730 to generate the plurality of passband signals.
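The predictive filter 710 and the signal subtractor 720 can be sketched together as a filter running along the frequency axis. This is a minimal sketch that assumes a forward predictor with coefficients a_1..a_p and zero spectral values below bin 0. The function name prediction_residual is hypothetical.

```python
import numpy as np


def prediction_residual(spectrum: np.ndarray, coeffs: np.ndarray) -> np.ndarray:
    """Hypothetical sketch of predictive filter 710 plus signal subtractor 720.

    The predicted spectrum is sum_i a_i * X[k - i] (taken as zero for
    k - i < 0); the residual is the spectrum minus this prediction.
    """
    x = np.asarray(spectrum, dtype=complex)
    predicted = np.zeros_like(x)
    for i, a_i in enumerate(coeffs, start=1):
        predicted[i:] += a_i * x[:-i]
    return x - predicted


# Usage: the spectrum of an impulse at n0 is predicted exactly by
# a_1 = exp(-j 2 pi n0 / N), so only the first bin survives in the residual.
N, n0 = 512, 100
impulse = np.zeros(N)
impulse[n0] = 1.0
residual = prediction_residual(np.fft.fft(impulse), np.array([np.exp(-2j * np.pi * n0 / N)]))
```

A well-predicted spectrum leaves a residual with a temporally flat envelope. The filterbank then operates on this residual instead of on the original spectrum.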
In addition, the filterbank processor 620 may comprise an optional signal converter. Such a signal converter (e.g., a one-sided inverse discrete Fourier transformer) can convert the passband signals generated by the filterbank 730 to the subband domain to obtain the plurality of passband signals 622. Alternatively, the signal converter can also be part of the signal processor 630.
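One way to read "one-sided inverse discrete Fourier transform" is the following. Keep only the positive-frequency bins of one passband, zero everything else (including the mirrored negative frequencies), and transform back. The result is a complex, analytic subband signal. This is a sketch under that assumption. The rectangular bin selection and the function name one_sided_band_signal are hypothetical, and a real filterbank would apply smooth spectral weights.

```python
import numpy as np


def one_sided_band_signal(spectrum: np.ndarray, lo: int, hi: int) -> np.ndarray:
    """Hypothetical signal converter: one-sided inverse DFT of bins [lo, hi).

    Only positive-frequency bins of the passband are kept (rectangular
    weighting, an assumption), so the result is a complex analytic
    subband signal whose magnitude and phase can be analyzed further.
    """
    band = np.zeros(len(spectrum), dtype=complex)
    band[lo:hi] = spectrum[lo:hi]
    return np.fft.ifft(band)


N = 512
n = np.arange(N)
x = np.cos(2 * np.pi * 50 * n / N)  # real tone exactly on bin 50
sub = one_sided_band_signal(np.fft.fft(x), 40, 60)
```

Because the negative-frequency image is discarded, the subband signal has a constant magnitude of 0.5 for this tone. Twice its real part reproduces the tone.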
In some embodiments according to the invention, a low-frequency part of the input audio signal can be excluded from possible modification to avoid generating artifacts in the low-frequency part of the modified audio signal. To this end, an apparatus 680 for modifying an audio signal may comprise a high-pass/low-pass filter 660, as shown, for example, in Figure 6b. The high-pass/low-pass filter 660 high-pass filters the time domain input audio signal, or the frequency domain audio signal representing the time domain input audio signal, so that the envelope shape determiner 610 determines the envelope shape coefficients 612 based on the high-pass frequency domain audio signal 602 and the filterbank processor 620 generates the plurality of passband signals 622 in a subband domain based on the high-pass frequency domain audio signal 602. Further, the high-pass/low-pass filter 660 low-pass filters the time domain input audio signal, or the frequency domain audio signal representing it, to obtain a low-pass audio signal 662. Further, the apparatus 680 comprises a full-band signal provider 670 configured to combine the modulated audio signal 652 and the low-pass audio signal 662 to obtain a full-band audio signal. In other words, the high-pass/low-pass filter 660 can separate the time domain input audio signal, or the frequency domain audio signal representing it, into a high-pass audio signal and a low-pass audio signal. The high-pass audio signal, or a frequency domain representation of the high-pass audio signal, can be provided to the envelope shape determiner 610 and to the filterbank processor 620.
Which of the two is provided depends on whether the high-pass/low-pass filter is implemented in the time domain, followed by a signal preprocessor that generates the frequency domain audio signal based on the high-pass audio signal, or whether the high-pass/low-pass filter is implemented in the frequency domain and therefore already receives a frequency domain audio signal representing the time domain input audio signal.
The high-pass/low-pass filter 660 can filter the time domain input audio signal, or the frequency domain audio signal representing it, so that the low-pass audio signal contains frequencies up to a predefined threshold frequency (e.g., 100 Hz or more). Consequently, the high-pass audio signal may comprise frequencies above the predefined threshold frequency. In other words, frequencies higher than the predefined threshold frequency can be attenuated by the high-pass/low-pass filter 660 to provide the low-pass audio signal 662, and frequencies lower than the predefined threshold frequency can be attenuated by the high-pass/low-pass filter 660 to provide the high-pass audio signal.
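A frequency-domain version of the high-pass/low-pass filter 660 and the full-band signal provider 670 can be sketched as a complementary bin split. This sketch assumes an ideal brick-wall split at the threshold frequency (a practical filter may use a transition region). The function name split_low_high and the 100 Hz threshold in the usage example are illustrative assumptions.

```python
import numpy as np


def split_low_high(x: np.ndarray, fs: float, f_cut: float):
    """Hypothetical high-pass/low-pass split in the frequency domain.

    Bins below f_cut form the low-pass signal and the remaining bins the
    high-pass signal; the split is complementary, so adding both parts
    (as a full-band signal provider could do) restores the input.
    """
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    low_mask = freqs < f_cut
    low = np.fft.irfft(np.where(low_mask, spec, 0), n=len(x))
    high = np.fft.irfft(np.where(low_mask, 0, spec), n=len(x))
    return low, high


fs, N = 8000.0, 800  # 10 Hz bin spacing
t = np.arange(N) / fs
x = np.cos(2 * np.pi * 50 * t) + 0.5 * np.cos(2 * np.pi * 1000 * t)
low, high = split_low_high(x, fs, f_cut=100.0)
```

Only the high-pass part would be passed to envelope shape determination and modification. The untouched low-pass part is added back at the end.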
Alternatively, the envelope shaper 650 is located between the signal processor 630 and the combiner 640, as shown in Figure 6c. In that case, the high-pass/low-pass filter 660 provides the low-pass audio signal 662 to the combiner 640. The combiner 640 combines the plurality of subband domain passband signals, containing the modified subband domain passband signal, and the low-pass audio signal 662 to obtain a time domain audio signal 642. In that case, the envelope shaper 650 can determine, based on the envelope shape coefficients 612 (e.g., by the coefficient converter 810), a set of passband envelope shape coefficients for each subband domain passband signal, corresponding to the respective subband domain passband signal (e.g., corresponding to the frequency region contained in the respective subband domain passband signal). Then, for example, each time sample of a subband domain passband signal can be multiplied by a passband envelope shape coefficient of the corresponding set of envelope shape coefficients. For example, in the vocoder implementation shown in Figure 15, the envelope shaper 650 may be located between the multiplier 1550 and the combiner 1560.
Alternatively, the envelope shaper 650 may be located between the signal processor 630 and the filterbank processor 620 (the filterbank processor 620 is connected to the envelope shaper 650, and the envelope shaper 650 is connected to the signal processor 630) and can shape an envelope of the plurality of subband domain passband signals based on the envelope shape coefficients 612 before a subband domain passband signal is modified by the signal processor 630 to obtain a modulated audio signal 652.
In some embodiments according to the invention, a low-frequency portion of the input audio signal can be excluded from envelope shaping to prevent the generation of artifacts in the low-frequency portion of the modified audio signal. For this purpose, an apparatus 680 for modifying an audio signal may comprise a high-pass/low-pass filter 660, as shown, for example, in Figure 6d. The high-pass/low-pass filter 660 high-pass filters the time domain input audio signal, or the frequency domain audio signal representing it. Further, the high-pass/low-pass filter 660 low-pass filters the time domain input audio signal, or the frequency domain audio signal representing it, to obtain a low-pass audio signal 662. The envelope shape determiner 610 determines the envelope shape coefficients 612 based on the high-pass frequency domain audio signal 602 without considering the low-pass audio signal 662. The filterbank processor 620 generates the plurality of passband signals 622 in a subband domain based on the high-pass frequency domain audio signal 602 and the low-pass audio signal 662. If a predictive filter is used, as shown, for example, in Figure 7, only the high-pass frequency domain audio signal 602 is provided to the predictive filter and the signal subtractor to generate a high-pass residual audio signal. The low-pass audio signal 662 can be provided directly to the filterbank to generate subband domain passband signals. The signal processor 630 may modify a subband domain passband signal corresponding either to the high-pass frequency domain audio signal 602 or to the low-pass audio signal 662. Alternatively, the signal processor 630 may modify both a subband domain passband signal corresponding to the high-pass frequency domain audio signal 602 and a subband domain passband signal corresponding to the low-pass audio signal 662.
The combiner 640 can combine only the subband domain passband signals corresponding to the high-pass frequency domain audio signal 602, so that only these signals (and not the subband domain passband signals corresponding to the low-pass audio signal 662) are shaped by the envelope shaper 650.
Further, the apparatus 680 comprises a full-band signal provider 670 configured to combine the modulated audio signal 652 and the subband domain passband signals corresponding to the low-pass audio signal 662 to obtain a full-band audio signal. To do this, the signal processor 630 may provide the subband domain passband signals corresponding to the low-pass audio signal 662 to the full-band signal provider 670.
Alternatively, the envelope shaper 650 is located between the signal processor 630 and the combiner 640. In that case, the signal processor 630 can provide the subband domain passband signals corresponding to the low-pass audio signal 662 to the combiner 640. The combiner 640 combines the plurality of subband domain passband signals (the subband domain passband signals corresponding to the low-pass audio signal 662 and those corresponding to the high-pass frequency domain audio signal 602), containing the modified subband domain passband signal, to obtain a time domain audio signal 642. In that case, the envelope shaper 650 can determine, based on the envelope shape coefficients 612 (e.g., by the coefficient converter 810), a set of passband envelope shape coefficients for each of the subband domain passband signals corresponding to the high-pass frequency domain audio signal 602, corresponding to the respective subband domain passband signal (e.g., corresponding to the frequency region contained in the respective subband domain passband signal). Then, for example, each time sample of a subband domain passband signal can be multiplied by a passband envelope shape coefficient of the corresponding set of envelope shape coefficients. For example, in the vocoder implementation shown in Figure 15, the envelope shaper 650 may be located between the multiplier 1550 and the combiner 1560.
Alternatively, the envelope shaper 650 may be located between the signal processor 630 and the filterbank processor 620 (the filterbank processor 620 is connected to the envelope shaper 650, and the envelope shaper 650 is connected to the signal processor 630) and can shape an envelope of the subband domain passband signals corresponding to the high-pass frequency domain audio signal 602 based on the envelope shape coefficients 612 before a subband domain passband signal is modified by the signal processor 630 to obtain a modulated audio signal 652.
In this way, a low-frequency part of the input audio signal can be exempted from envelope shaping while still being routed to the remaining processing (e.g., modification of a subband domain passband signal). Also, a predictive filter (for example, as shown in Figure 7) may be applied only above the predefined threshold frequency. Alternatively, if the high-pass/low-pass separation is already performed on the analysis side, the envelope of the high-pass signal can be modified in the time domain by the reciprocal of the envelope shape coefficients.
For example, in selective transposition applications, placing the envelope shaper before the processing, as described above, may provide results equivalent to placing it after the processing, since the AM may not be modified.
According to one aspect, the envelope shaper 650 can determine the ratio of the energy content EFDAS of the frequency domain audio signal 602 to the energy content ERAS of the residual audio signal 722. Based on this energy ratio, the envelope shaper 650 can stop shaping the envelope of the time domain audio signal 642 if the energy ratio is less than a predefined energy threshold PET (e.g., 0.1, 0.2, 0.5, 0.8, 1, 2, or even more or less).

In other words, envelope shaping can be switched on or off signal-adaptively depending on the goodness of the prediction. The goodness of the prediction can be measured by the prediction gain, which can be defined as the ratio of the signal energy (frequency domain audio signal) to the prediction error energy (residual audio signal). If envelope shaping of the time domain audio signal 642 is switched off, the modulated audio signal 652 may be the same as the time domain audio signal 642 provided by the combiner 640.
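The signal-adaptive switch can be sketched as a comparison of the prediction gain EFDAS / ERAS against the threshold PET. The function names and the convention that shaping stays enabled when the residual energy is exactly zero are assumptions for illustration.

```python
import numpy as np


def envelope_shaping_enabled(spectrum, residual, threshold: float) -> bool:
    """Hypothetical switch: shape the envelope only if the prediction gain
    EFDAS / ERAS is at least the predefined energy threshold PET."""
    e_fdas = np.sum(np.abs(spectrum) ** 2)  # energy of frequency domain signal
    e_ras = np.sum(np.abs(residual) ** 2)  # energy of prediction residual
    if e_ras == 0.0:
        return True  # perfect prediction: the gain is unbounded (assumption)
    return bool(e_fdas / e_ras >= threshold)


def apply_envelope_if_enabled(time_signal, envelope, enabled: bool):
    # When shaping is switched off, the output equals the combiner output.
    return time_signal * envelope if enabled else time_signal


# Prediction gain 8 / 2 = 4, so shaping stays on for PET = 2.
gain_ok = envelope_shaping_enabled(np.ones(8), 0.5 * np.ones(8), threshold=2.0)
```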
The envelope shaper 650 can be implemented in a variety of ways. An example is shown in Figure 8. The envelope shaper 650 may comprise a coefficient converter 810 and a multiplier 820. The coefficient converter 810 may convert the envelope shape coefficients 612 to the time domain, so that the converted envelope shape coefficients 812 can be multiplied by the time domain audio signal 642 to shape its temporal envelope and obtain the modulated audio signal 652. This multiplication can be performed by the multiplier 820. For example, a time block of the time domain audio signal 642 may contain 512 (or more) time samples, and the coefficient converter 810 may provide 512 (or more) converted envelope shape coefficients 812, so that each time sample is multiplied by one converted envelope shape coefficient 812.
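The coefficient converter 810 and multiplier 820 can be sketched as follows. The sketch assumes that the envelope is the magnitude of the all-pole filter 1/A evaluated on the unit circle at one point per time sample, where A(z) = 1 - sum_i a_i z^-i is the prediction error filter of a predictor along frequency. Interpreting |H(e^{jω})| as this all-pole response, and omitting any gain normalization, are assumptions. The function names are hypothetical.

```python
import numpy as np


def converted_envelope(coeffs: np.ndarray, n_samples: int) -> np.ndarray:
    """Hypothetical coefficient converter 810.

    Evaluates A(z) = 1 - sum_i a_i z^-i at z = exp(-j 2 pi n / N) for each
    time sample n and returns 1 / |A| as an all-pole estimate of the
    temporal amplitude envelope (DFT convention exp(-j 2 pi k n / N)).
    """
    c = np.zeros(n_samples, dtype=complex)
    c[0] = 1.0
    c[1 : len(coeffs) + 1] = -np.asarray(coeffs)
    # N * ifft(c)[n] = sum_i c_i exp(+j 2 pi i n / N) = A(exp(-j 2 pi n / N)).
    a_on_circle = n_samples * np.fft.ifft(c)
    return 1.0 / np.abs(a_on_circle)


def shape_envelope(time_signal: np.ndarray, coeffs: np.ndarray) -> np.ndarray:
    """Hypothetical multiplier 820: one converted coefficient per sample."""
    return time_signal * converted_envelope(coeffs, len(time_signal))


N, n0 = 512, 100
a = np.array([0.9 * np.exp(-2j * np.pi * n0 / N)])  # first-order example
env = converted_envelope(a, N)
shaped = shape_envelope(np.ones(N), a)
```

With this single coefficient, |A| is smallest at sample n0 (1 - 0.9 = 0.1), so the envelope peaks there with value 10. Per-sample multiplication then imposes that peak on a flat signal.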
As already mentioned, the apparatus 600 can modify different subband domain passband signals differently. More generally, this means that the signal processor 630 can modify a second or further subband domain passband signal 622 of the plurality of subband domain passband signals based on a second or further predefined modification target. The aforementioned (first) predefined modification target and the second or further predefined modification target may be different.
In some embodiments, the described concept can be used in connection with vocoders or modulation vocoders. In that case, the signal processor 630 may generate an amplitude modulation (AM) signal and a frequency modulation (FM) signal for each of the subband domain passband signals 622 of the plurality of subband domain passband signals. In addition, the signal processor 630 may modify the AM signal or the FM signal of the subband domain passband signal to be modified based on the predefined modification target.
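A hedged sketch of the AM/FM decomposition of one complex subband signal follows. The AM signal is taken as the magnitude and the FM signal as the phase difference per sample (instantaneous frequency in radians per sample). The FM signal is then scaled as an example modification target. The modulation vocoder's exact AM/FM estimation may differ, and the function names and the octave transposition are illustrative assumptions.

```python
import numpy as np


def am_fm_analysis(subband: np.ndarray):
    """Split a complex (analytic) subband signal into an AM signal
    (magnitude) and an FM signal (instantaneous frequency, rad/sample)."""
    am = np.abs(subband)
    fm = np.diff(np.unwrap(np.angle(subband)))
    return am, fm


def am_fm_synthesis(am: np.ndarray, fm: np.ndarray, phase0: float = 0.0):
    """Resynthesize a complex subband signal by integrating the FM signal."""
    phase = phase0 + np.concatenate(([0.0], np.cumsum(fm)))
    return am * np.exp(1j * phase)


N = 512
w = 2 * np.pi * 50 / N
sub = 0.5 * np.exp(1j * w * np.arange(N))
am, fm = am_fm_analysis(sub)
# Hypothetical modification target: transpose this band up one octave.
shifted = am_fm_synthesis(am, 2.0 * fm)
```

Modifying only the FM signal moves the band in frequency while leaving its amplitude envelope, and hence its AM, unchanged.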
In addition, the apparatus 600 may optionally comprise a carrier frequency determiner, as already described for apparatus 200 and shown in Figure 2. The carrier frequency determiner may determine a plurality of carrier frequencies based on the frequency domain audio signal 602. These carrier frequencies may be used by the filterbank processor 620 (or, in the implementation shown in Figure 7, by the filterbank 730 of the filterbank processor 620) to generate subband domain passband signals such that each subband domain passband signal comprises a frequency range containing a different carrier frequency of the plurality of carrier frequencies, so that a subband domain passband signal is associated with each carrier frequency of the plurality of carrier frequencies. This can be done, for example, by determining the centers of gravity of the frequency domain audio signal, as mentioned above.
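The center-of-gravity idea can be sketched in simplified form. This sketch splits the bins into equal-width segments and takes the power-weighted mean bin of each segment as a carrier location. The fixed segmentation, the midpoint fallback for silent segments, and the function name centers_of_gravity are assumptions. The carrier estimation described earlier in this document may, for example, use overlapping or iteratively refined segments.

```python
import numpy as np


def centers_of_gravity(power_spectrum: np.ndarray, n_carriers: int) -> np.ndarray:
    """Hypothetical carrier frequency determiner.

    Splits the bins into n_carriers contiguous segments of equal width and
    returns the power-weighted mean bin index of each segment.
    """
    p = np.asarray(power_spectrum, dtype=float)
    edges = np.linspace(0, len(p), n_carriers + 1).astype(int)
    centers = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        weights = p[lo:hi]
        bins = np.arange(lo, hi)
        total = weights.sum()
        # Fall back to the segment midpoint if the segment carries no energy.
        centers.append((bins * weights).sum() / total if total > 0 else (lo + hi - 1) / 2)
    return np.array(centers)


p = np.zeros(64)
p[[5, 20, 37, 60]] = 1.0  # one spectral peak per 16-bin segment
carriers = centers_of_gravity(p, 4)  # carriers at bins 5, 20, 37 and 60
```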
The envelope shape determiner 610, the filterbank processor 620, the signal processor 630, the combiner 640 and/or the envelope shaper 650 can be, for example, individual hardware units or part of a digital signal processor, a computer or a microcontroller, or a computer program or software product configured to run on a digital signal processor, computer or microcontroller.
Some embodiments according to the invention relate to an implementation of the described concept in a modulation vocoder. For this example, the concept is described in more detail below. The aspects mentioned can also be used in other implementations or applications.
It was stated earlier that MODVOC processing preserves spectral coherence in the passband region surrounding the carrier locations. However, global broadband spectral coherence is not preserved. For quasi-stationary signals, this may have only a minor impact on the perceptual quality of the synthesized signal. If the signal contains prominent transients, such as drum beats or castanets, preserving global coherence can significantly improve the reproduction quality of these signals.
The preservation of global coherence can be improved by linear prediction in the spectral domain. Such approaches are used in audio coders, for example, in the temporal noise shaping (TNS) tool (see, for example, J. Herre and J. D. Johnston, "Enhancing the performance of perceptual audio coders by using temporal noise shaping (TNS)," 101st AES Convention, Los Angeles, 1996, Preprint 4384) of MPEG-2/4 Advanced Audio Coding (AAC). In J. Herre and J. D. Johnston, "A continuously signal-adaptive filter bank for high-quality perceptual audio coding," IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics, Mohonk, 1997, the combination of a high-frequency-resolution time-frequency transform and spectral prediction is shown to essentially correspond to a signal-adaptive transform.
Figure 9 outlines the integration of the described concept into the MODVOC processing scheme. In the analysis, following the initial DFT of the input signal x, the linear prediction coefficients (LPC) of a forward predictor along frequency having the impulse response h(ω) are derived, for example, by the autocorrelation method, which minimizes the prediction error in a least-squares sense. Subsequently, the filter is applied to the spectral values, and the residual signal is further processed by the MODVOC algorithm. The filter coefficients, which represent the global envelope, are passed on to the synthesis stage. In the synthesis, the global envelope, derived by evaluating the prediction filter on the unit circle, |H(e^{jω})|, is restored by applying it multiplicatively to the sum signal, which yields the output signal y, as illustrated in Figure 10.
In other words, Figures 9 and 10 show an implementation of the described concept in a modulation vocoder. Figure 9 shows the modulation analyzer part, comprising a preprocessor 910 that performs, for example, a discrete Fourier transform of a time domain audio signal to obtain a frequency domain audio signal 602 and provides the frequency domain audio signal 602 to the envelope shape determiner 610, the predictive filter 710 (e.g., an LPC filter h(ω)), the signal subtractor 720 and the carrier frequency determiner 920. The signal subtractor 720 can provide the residual audio signal 722 to the filterbank 730. The carrier frequency determiner 920 can estimate multiple carrier frequencies as spectral centers of gravity and provide these carrier frequencies to the filterbank 730 to control the spectral weights of the passbands. The filterbank 730 can provide the passband signals to a signal converter 930 that performs a one-sided inverse discrete Fourier transform for each passband signal to provide the plurality of subband domain passband signals to the signal processor. The modulation vocoder components have already been described in more detail above. Furthermore, Figure 10 shows the synthesis part of the modulation vocoder. It comprises the combiner 640 and the envelope shaper, which comprises a coefficient converter 810 and a multiplier 820. Further details of the modulation vocoder components and of the envelope shaper have already been explained above.
Figure 11 shows a flowchart of a method 1100 for modifying an audio signal, in accordance with an embodiment of the invention. Method 1100 comprises determining 1110 envelope shape coefficients based on a frequency domain audio signal representing a time domain input audio signal, and generating 1120 a plurality of passband signals in a subband domain based on the frequency domain audio signal. Further, method 1100 comprises modifying 1130 a subband domain passband signal of the plurality of subband domain passband signals based on a predefined modification target. Additionally, at least a subset of the plurality of subband domain passband signals is combined 1140 to obtain a time domain audio signal. Further, method 1100 comprises either shaping 1150 an envelope of the time domain audio signal based on the envelope shape coefficients, shaping 1150 an envelope of the plurality of subband domain passband signals containing the modified subband domain passband signal based on the envelope shape coefficients, or shaping 1150 an envelope of the plurality of subband domain passband signals based on the envelope shape coefficients before a subband domain passband signal is modified by the signal processor, to obtain a modulated audio signal.
Optionally, method 1100 may comprise additional steps that represent aspects of the described concept mentioned above.
Some embodiments in accordance with the invention relate to an apparatus for modifying an audio signal that combines the aspects of the apparatus shown in Figure 1 or 2 with the aspects of the apparatus shown in Figure 6. Accordingly, Figure 12 shows a block diagram of an apparatus 1200, in accordance with an embodiment of the invention.
Starting from the apparatus shown in Figure 1, the apparatus 1200 further comprises an envelope shape determiner 610 and an envelope shaper 650. In this case, the audio signal may be a frequency domain audio signal representing a time domain input audio signal, which can be used by the envelope shape determiner to determine envelope shape coefficients based on the frequency domain audio signal. Further, the plurality of passband signals generated by the filterbank can be generated in a subband domain based on the frequency domain audio signal. After combining the plurality of subband domain passband signals containing the modified selected fundamental passband signal and the modified identified overtone passband signal, the obtained time domain audio signal 152, 642 can be provided to the envelope shaper 650. The envelope shaper 650 can shape an envelope of the time domain audio signal based on the envelope shape coefficients 612 to obtain the modulated audio signal 652.
Conversely, starting from the apparatus shown in Figure 6, the apparatus 1200 further comprises a fundamental determiner 120 and an overtone determiner 130, as described in connection with the apparatus shown in Figure 1. The fundamental determiner 120 may select a subband domain passband signal of the plurality of subband domain passband signals to obtain the fundamental passband signal 122. In addition, the overtone determiner 130 can identify a subband domain passband signal 112 of the plurality of subband domain passband signals that meets an overtone criterion with respect to the selected fundamental passband signal 122 to obtain an overtone passband signal 132 associated with the selected fundamental passband signal 122. The signal processor 140, 630 can modify the selected fundamental passband signal based on a predefined modification target and modify an identified overtone passband signal 132 associated with the selected fundamental passband signal 122 depending on the modification of the selected fundamental passband signal 122, as mentioned above.
In this way, fundamentals and their overtones can be treated equivalently during the modification of the audio signal, and the spectral coherence of the plurality of passband signals can be preserved very accurately by shaping the modified time domain audio signal based on the envelope shape coefficients derived before the passband signals are modified. As a result, the perceptual quality of the modified audio signal can be significantly improved.
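The overtone criterion and the coupled modification described above can be sketched on carrier frequencies alone. This sketch assumes that a passband counts as an overtone if its carrier lies within a relative tolerance of an integer multiple (2, 3, ...) of the fundamental's carrier. The overtone is then transposed by the same factor as the fundamental. The tolerance value, the function names and the example carriers are hypothetical, and the overtone criterion defined earlier in this document may include further conditions.

```python
import numpy as np


def find_overtones(carriers_hz, fundamental_idx: int, tolerance: float = 0.03):
    """Hypothetical overtone criterion: a passband is an overtone of the
    selected fundamental if its carrier frequency lies within `tolerance`
    of an integer multiple (2, 3, ...) of the fundamental carrier."""
    f0 = carriers_hz[fundamental_idx]
    overtones = []
    for idx, f in enumerate(carriers_hz):
        ratio = f / f0
        harmonic = round(ratio)
        if idx != fundamental_idx and harmonic >= 2 and abs(ratio - harmonic) <= tolerance:
            overtones.append(idx)
    return overtones


def lock_to_fundamental(carriers_hz, fundamental_idx: int, factor: float, overtones):
    """Apply the fundamental's transposition factor to its overtones as well."""
    out = np.array(carriers_hz, dtype=float)
    for idx in [fundamental_idx, *overtones]:
        out[idx] *= factor
    return out


carriers = [100.0, 201.0, 300.0, 350.0, 405.0]
ots = find_overtones(carriers, 0)  # [1, 2]
moved = lock_to_fundamental(carriers, 0, 2 ** (-1 / 12), ots)  # down one semitone
```

Coupling the overtones to the fundamental keeps the harmonic relationship intact after transposition. Unrelated bands (350 Hz and 405 Hz here) are left unchanged.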
The apparatus 1200 can additionally implement further aspects of the different implementation examples mentioned above.
Next, the improvement in the perceptual quality of the modified audio signals is demonstrated by the results of listening tests. For these tests, an implementation based on a modulation vocoder (MODVOC) was used, but the results are also generally valid for the proposed concept.
In order to assess the subjective audio quality of the modulation vocoder (MODVOC) for the application of selective pitch transposition and, in addition, the merit of the proposed enhancements to the basic MODVOC principle, a set of exemplary audio files was assembled and processed accordingly. Additionally, the MODVOC technology is compared to commercially available audio software for manipulating polyphonic audio, Celemony's Melodyne editor, which has been on the market since late 2009.
Since the tested processing drastically alters the musical content of a signal, a direct comparison of the original and the processed signal, often an inherent part of standard listening tests, is obviously not appropriate in this case. In order to still measure subjective audio quality meaningfully, a special listening test procedure was applied: the listening test set originates from symbolic MIDI data that is rendered into waveforms using a high-quality MIDI expander. This approach allows a direct comparison of similarly altered audio files within the test and allows the effect of the selective pitch processing to be investigated in isolation. The test set generation procedure is summarized in Figure 17. The original test signals are prepared in the symbolic MIDI data representation (upper left). A second version of these signals is generated by symbolic MIDI processing that emulates the target processing under test applied to the original audio rendered from the waveform (upper right). Subsequently, these signal pairs are rendered into waveform (WAV) files by a high-quality MIDI expander (lower left and right). In the listening test, the rendered waveform of the processed MIDI file and various modulation vocoder (MODVOC)-processed versions of the rendered original MIDI file are compared (lower right). Additionally, the output of MODVOC is compared to the output of the Melodyne editor.
In addition to the MODVOC-processed conditions, the test includes a condition obtained using the Melodyne editor, which is currently the only commercial application that handles this type of audio manipulation and can therefore be regarded as the industry standard. The Melodyne editor initially performs an automatic analysis of the entire audio file. After this initialization phase, Melodyne suggests a decomposition of the audio file, which can be further refined through user interaction. For a fair comparison with the MODVOC processing results, the evaluation is based on the result of this automatic initial analysis since, apart from the a-priori knowledge of the key and the standard pitch, the MODVOC decomposition is also completely automatic.
The listening test setup was based on a standard Multiple Stimulus with Hidden Reference and Anchor (MUSHRA) test, in accordance with Recommendation ITU-R BS.1534 (ITU-R, "Method for the subjective assessment of intermediate sound quality (MUSHRA)," 2001). MUSHRA is a blind listening test. Only one person at a time takes the test. For each item, the test presents all test conditions together with the hidden reference and a hidden low-pass filtered anchor in a time-aligned manner. The hidden reference and the lower anchor are included in order to check the reliability of the listeners. The listener is allowed to switch between conditions at will and to loop arbitrarily selected segments of the item, as suggested in BS.1116-1 (ITU-R, "Methods for the subjective assessment of small impairments in audio systems including multichannel sound systems," 1994-1997), which is applicable to MUSHRA tests as well. There is no limit to the number of repetitions test subjects may listen to before rating the item and moving on to the next test item, which allows a very close comparison and thorough examination of the different conditions. The perceptual quality of the items is rated on a scale ranging from "excellent" (100 points) through "good", "fair" and "poor" to "bad" (0 points). The sequence of test items is randomized and, in addition, the order of the conditions within each item is also randomized.
The eight test items originate from the MUTOPIA project (http://www.mutopiaproject.org/), which provides free sheet music for public use. Suitable excerpts with a maximum duration of approximately 20 seconds were extracted from various pieces of classical music, containing single instruments (e.g., G, E) as well as dense full-orchestra passages (e.g., F). Dominant solo melodies accompanied by other instruments (e.g., C) are also included in the test set. In addition to quasi-stationary, short-term tonal parts, several items contain percussive elements (guitar strumming onsets in C and piano in G), which pose a particular challenge to the transient response of the system under test. The following table lists all items in the set.

The MIDI processing to obtain the transposed original signals was done in Sonar 8, manufactured by Cakewalk. High-quality waveform rendering was performed using Native Instruments' Bandstand sound library, version 1.0.1 R3. MODVOC processing was evaluated in three different combinations of the two enhancement processing steps, harmonic blocking and envelope shaping. For the comparison with the Melodyne editor, version 1.0.11 was used. All conditions are listed in the table below.


Subjective listening tests were conducted in an acoustically isolated laboratory that is designed to allow high-quality listening tests in an environment similar to an “ideal” room. The listeners were equipped with STAX electrostatic headphones driven by an Edirol USB interface connected to an Apple Mac mini. The listening test software, provided by Fraunhofer IIS, was operated in MUSHRA mode and offers a simple GUI to support the listener in performing the test. Listeners can switch between the reference (1) and the different conditions (2-7) during playback. Each listener can individually decide how long to listen to each item and condition. During switching, sound reproduction is muted. In the GUI, vertical bars display the rating assigned to each condition. Experienced listeners who are familiar with audio coding and also have a musical background were chosen in order to obtain an informed judgment about typical signal processing artifacts, such as pre- and post-echoes or smearing of transients, as well as about musical parameters such as spectral pitch, melody and timbre. In addition, listeners were asked to provide their informal observations and impressions.
In total, fifteen listeners contributed to the test result; one listener had to be excluded in post-screening for failing to identify the hidden reference (which that listener rated at 64 points).
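The post-screening step can be expressed as a simple filter over each listener's ratings of the hidden reference. The sketch below is hypothetical: the source only states that one listener who rated the hidden reference at 64 points was excluded, so the data layout, the function name `post_screen` and the thresholds (a score below 90 on more than 15% of the items, similar in spirit to the MUSHRA post-screening rule of ITU-R BS.1534) are assumptions.

```python
def post_screen(hidden_ref_scores, min_score=90.0, max_fraction_below=0.15):
    """Return the IDs of listeners who pass post-screening.

    hidden_ref_scores maps a listener ID to the scores (0-100) that listener
    gave the hidden reference, one score per test item. A listener is
    rejected if more than max_fraction_below of those scores fall below
    min_score. Both thresholds are assumptions, not values from the test.
    """
    kept = []
    for listener, scores in hidden_ref_scores.items():
        if not scores:
            continue  # no ratings at all: nothing to judge, so exclude
        below = sum(1 for s in scores if s < min_score)
        if below / len(scores) <= max_fraction_below:
            kept.append(listener)
    return kept


# Illustrative ratings only (not data from the test).
ratings = {
    "L01": [100, 100, 95, 100, 100, 100, 100, 100],
    "L02": [100, 64, 70, 100, 100, 100, 100, 100],  # 2 of 8 items below 90
}
print(post_screen(ratings))  # ['L01']
```

With eight items, a single low rating (12.5% of the items) would still pass under these assumed thresholds, so the exact rule strongly affects which listeners are excluded.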
Figure 18 summarizes the results of the listening test. The perceptual quality of the items processed by selective pitch transposition ranges from fair to good. The lower anchor was rated between bad and poor, so that the distance between the processed items and the anchor is approximately 40 MUSHRA points.
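The verbal categories used here come from the MUSHRA grading scale, which divides the range 0 to 100 into five equal 20-point intervals (bad, poor, fair, good, excellent). A small helper makes the mapping explicit; the name `mushra_label` is hypothetical, and assigning a score that falls exactly on an interval boundary to the upper category is an assumption of this sketch.

```python
# MUSHRA verbal categories, listed by the lower bound of each 20-point interval.
MUSHRA_CATEGORIES = [(80, "excellent"), (60, "good"), (40, "fair"), (20, "poor"), (0, "bad")]


def mushra_label(score):
    """Map a MUSHRA score in [0, 100] to its verbal quality category."""
    if not 0 <= score <= 100:
        raise ValueError("MUSHRA scores lie between 0 and 100")
    for lower_bound, label in MUSHRA_CATEGORIES:
        if score >= lower_bound:
            return label


# Illustrative values only: an item near 65 and an anchor near 25 are 40 points apart.
print(mushra_label(65), mushra_label(25))  # good poor
```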
The absolute scores quantify the perceptual quality of each item (in each of the test conditions) and thereby rank the quality differences between the items in the test set. They are, however, not suitable for comparing the different conditions within the listening test, since the ratings of these conditions are not independent. For a direct comparison of the conditions that result from the different selective transposition processing schemes, score differences are therefore considered below.
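Because each listener rates all conditions, the comparison is made on per-listener score differences against a baseline condition, and a condition is judged significantly better or worse only when the 95% confidence interval of the mean difference excludes zero. The source does not state how its intervals were computed; the sketch below assumes a paired Student-t interval, and the names `score_difference_ci` and `verdict` are hypothetical.

```python
import math
import statistics


def score_difference_ci(condition_scores, baseline_scores, t_crit):
    """Mean paired score difference and its confidence interval.

    Both lists hold one MUSHRA score per listener for the same item, in the
    same listener order. t_crit is the two-sided Student-t critical value for
    len(scores) - 1 degrees of freedom (about 2.145 for 15 listeners at 95%).
    """
    if len(condition_scores) != len(baseline_scores) or len(condition_scores) < 2:
        raise ValueError("need at least two paired scores")
    diffs = [c - b for c, b in zip(condition_scores, baseline_scores)]
    mean = statistics.mean(diffs)
    half_width = t_crit * statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean, mean - half_width, mean + half_width


def verdict(ci_low, ci_high):
    """A difference is significant only if its interval excludes zero."""
    if ci_low > 0:
        return "significantly better"
    if ci_high < 0:
        return "significantly worse"
    return "inconclusive"


# Illustrative scores of four listeners for one item (not test data);
# 3.182 is the 95% two-sided t value for 3 degrees of freedom.
mean, low, high = score_difference_ci([70, 75, 72, 68], [60, 62, 65, 58], t_crit=3.182)
print(f"{mean:.1f} {verdict(low, high)}")  # 10.0 significantly better
```

Working on differences removes each listener's personal offset from the comparison, which is why the paired interval is narrower than one computed from the absolute scores.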
Figure 19 depicts the score differences of the enhanced MODVOC variants (conditions 4 and 5) relative to simple MODVOC (condition 3). All enhanced MODVOC variants score considerably better than simple MODVOC processing (all score differences lie well above zero). The differences are significant at the 95% confidence level for all items and conditions, except for harmonic locking alone on items A and C.
Figure 20 displays the test results as score differences relative to condition 6 (Melodyne editor). For item C, MODVOC in condition 5 scores significantly better than the Melodyne editor, while condition 4, despite a slightly positive mean, and condition 3 are inconclusive at the 95% confidence level (their confidence intervals include 0). For items B (condition 2), F and G (condition 5), no significant conclusion can be drawn either, but a trend towards better MODVOC performance can also be seen for item C in condition 4 and item F in conditions 4 and 5. In all other cases, MODVOC scores significantly worse than the Melodyne editor.
The score reflects an overall quality judgment comprising aspects such as unnatural sound artifacts (for example, transient degradation by pre- or post-echoes), pitch accuracy, melody correctness and timbre preservation. In order to interpret the results in more detail, listeners were asked to write down their informal observations next to the actual score annotation. From these observations, it can be concluded that timbre preservation and the absence of unnatural sound artifacts were reflected in the overall score to a higher degree than, for example, melody preservation. Furthermore, if a particular melody was unknown to a listener, the test persons apparently were not able to memorize the reference melody at short notice during the test and thus were not sure about the actual melody. This may explain the higher overall rating of the items processed by the Melodyne editor, which showed higher fidelity in timbre preservation, especially for sounds originating from single instruments. However, this comes at the expense of occasional severe melody errors, which presumably result from misclassification. MODVOC is more robust in this regard, as it does not predominantly rely on feature-based classification techniques.
Some embodiments according to the invention relate to an enhanced modulation vocoder for selective pitch transposition. The concept of the modulation vocoder (MODVOC) was introduced, and its general capability to selectively transpose polyphonic music content was highlighted. This opens up possible applications aimed at changing the key mode of pre-recorded PCM music samples. Two enhancement techniques for selective pitch transposition by MODVOC are proposed. The performance of the selective transposition application and the merits of these techniques are assessed using the results of a specially designed listening test methodology that is capable of handling extreme pitch changes relative to the original audio stimulus. The results of this subjective perceptual quality assessment are presented for items that were converted between minor and major key mode by MODVOC and, additionally, by the first commercially available software that is also capable of handling this task.
It is worth noting that while the Melodyne editor initially performs an automatic analysis of the entire audio file before allowing any manipulations, MODVOC operates block-by-block, thus possibly allowing real-time operation.
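Block-by-block operation means that each output block depends only on the samples received so far, with a fixed latency, instead of on an analysis of the whole file. The skeleton below illustrates this with a generic streaming overlap-add loop (periodic Hann window, 50% overlap). It is not MODVOC itself: the class name `StreamingOverlapAdd` and the `process_block` hook, where a per-block manipulation would go, are hypothetical.

```python
import math


class StreamingOverlapAdd:
    """Causal block-wise processor with 50% overlap (hop = block_size // 2).

    Each call to push() consumes `hop` new input samples and returns `hop`
    output samples delayed by one hop, so audio can be processed as it
    arrives instead of after analysing the entire file.
    """

    def __init__(self, block_size, process_block=lambda frame: frame):
        if block_size % 2:
            raise ValueError("block_size must be even")
        self.n = block_size
        self.hop = block_size // 2
        # Periodic Hann window: copies shifted by half a block sum to 1.
        self.window = [0.5 - 0.5 * math.cos(2 * math.pi * i / block_size)
                       for i in range(block_size)]
        self.process_block = process_block  # hypothetical per-block hook
        self.in_buf = [0.0] * block_size    # most recent block_size inputs
        self.out_buf = [0.0] * block_size   # pending overlap-add samples

    def push(self, samples):
        if len(samples) != self.hop:
            raise ValueError("push exactly hop samples at a time")
        # Slide the analysis buffer forward by one hop.
        self.in_buf = self.in_buf[self.hop:] + list(samples)
        frame = [x * w for x, w in zip(self.in_buf, self.window)]
        frame = self.process_block(frame)
        # Overlap-add the processed frame, then emit the completed hop.
        self.out_buf = [o + f for o, f in zip(self.out_buf, frame)]
        out = self.out_buf[:self.hop]
        self.out_buf = self.out_buf[self.hop:] + [0.0] * self.hop
        return out


proc = StreamingOverlapAdd(block_size=8)  # identity processing
print(proc.push([1, 2, 3, 4]))  # first hop is all zeros (latency)
print(proc.push([5, 6, 7, 8]))  # reproduces [1, 2, 3, 4] up to rounding
```

The latency is fixed at one hop regardless of the file length, which is the property that makes real-time or live operation possible in principle.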
Enhancement techniques for the modulation vocoder (MODVOC) for selective pitch transposition have been proposed. From the listening test results obtained for MIDI-rendered test signals, it can be concluded that the perceptual quality of simple MODVOC is greatly enhanced by harmonic locking and envelope shaping. Across all items, an increase of up to 10 MUSHRA points can be expected. The main share of the enhancement comes from harmonic locking.
Furthermore, the comparison of MODVOC with a commercially available software (Melodyne editor) revealed that the overall quality level achievable in selective pitch transposition at that point in time may be located between 'fair' and 'good'. MODVOC is more robust against melody misinterpretation, since it does not essentially depend on classification decisions.
As opposed to the multi-pass analysis performed by the Melodyne editor on the entire audio file prior to manipulation, MODVOC is based only on single-pass block processing, which potentially allows for live or broadcast operation scenarios.
Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block, item or feature of a corresponding apparatus.
The inventive encoded audio signal may be stored on a digital storage medium or it may be transmitted on a transmission medium such as a wireless transmission medium or a wired transmission medium such as the Internet.
Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
Some embodiments, in accordance with the invention, comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, so that one of the methods described herein is carried out.
Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may, for example, be stored on a machine-readable carrier.
Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine-readable carrier.
In other words, an embodiment of the inventive method is therefore a computer program having program code for performing one of the methods described herein, when the computer program runs on a computer.
A further embodiment of the inventive methods is therefore a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein.
A further embodiment of the inventive method is therefore a data stream or signal sequence representing the computer program for carrying out one of the methods described herein. The data stream or a sequence of signals can, for example, be configured to be transferred via a data communication connection, for example, via the Internet.
A further embodiment comprises a processing means, for example a computer or a programmable logic device, configured or adapted to carry out one of the methods described herein.
A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
In some embodiments, a programmable logic device (for example, a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are preferably performed by any hardware apparatus.
The embodiments described above are merely illustrative of the principles of the present invention. It is understood that modifications and variations of the arrangements and details described herein will be apparent to others skilled in the art. It is the intent, therefore, to be limited only by the scope of the appended patent claims and not by the specific details presented by way of description and explanation of the embodiments herein.
Claims:
Claims (12)
[0001]
1. APPARATUS (100, 200) FOR MODIFYING AN AUDIO SIGNAL (102), characterized in that it comprises: a filterbank processor (110) configured to generate a plurality of passband signals (112) based on an audio signal (102); a fundamental determiner (120) configured to select a passband signal (112) from the plurality of passband signals to obtain a fundamental passband signal (122); an overtone determiner (130) configured to identify a passband signal (112) from the plurality of passband signals that meets an overtone criterion with respect to the selected fundamental passband signal (122) to obtain an overtone passband signal (132) associated with the selected fundamental passband signal (122); a signal processor (140) configured to modify the selected fundamental passband signal (122) based on a predefined modification target and configured to modify an identified overtone passband signal (132) associated with the selected fundamental passband signal (122) depending on the modification of the selected fundamental passband signal (122), wherein the signal processor (140) is configured to generate an amplitude modulated (AM) signal and a frequency modulated (FM) signal for each passband signal (112) of the plurality of passband signals, wherein the signal processor (140) is configured to modify the frequency modulated (FM) signal of the selected fundamental passband signal (122) based on the predefined modification target, and wherein the signal processor (140) is configured to modify the frequency modulated (FM) signal of the identified overtone passband signal (132) associated with the selected fundamental passband signal (122) depending on the modification of the selected fundamental passband signal (122); and a combiner (150) configured to combine the modified fundamental passband signal (122), the modified overtone passband signal (132) and the unselected passband signals of the plurality of passband signals to obtain a modified audio signal (152).
[0002]
2. APPARATUS according to claim 1, characterized in that each passband signal (112) of the plurality of passband signals comprises a carrier frequency, wherein the overtone determiner (130) is configured to compare the carrier frequency of a passband signal (112) of the plurality of passband signals with the carrier frequency of the selected fundamental passband signal (122), wherein an overtone criterion is met if the carrier frequency of the passband signal (112) is a multiple of the carrier frequency of the selected fundamental passband signal (122) within a predefined carrier frequency tolerance.
[0003]
3. APPARATUS according to claim 1 or 2, characterized in that the overtone determiner (130) is configured to compare an energy content of a passband signal (112) of the plurality of passband signals with an energy content of the selected fundamental passband signal (122), wherein an overtone criterion is met if a ratio of the energy content of the passband signal (112) to the energy content of the selected fundamental passband signal (122) is within a predefined energy tolerance range.
[0004]
4. APPARATUS according to one of Claims 1 to 3, characterized in that the overtone determiner (130) is configured to calculate a correlation value which indicates a correlation of a temporal envelope of a passband signal (112) of the plurality of passband signals with a temporal envelope of the selected fundamental passband signal (122), wherein an overtone criterion is met if the correlation value is greater than a predefined correlation threshold.
[0005]
5. APPARATUS according to one of Claims 1 to 4, characterized in that the fundamental determiner (120) is configured to select an additional passband signal (112) from the plurality of passband signals, disregarding all already selected fundamental passband signals (122) and all already identified overtone passband signals (132), to obtain an additional fundamental passband signal (122).
[0006]
6. APPARATUS according to claim 5, characterized in that the overtone determiner (130) is configured to identify a passband signal (112) from the plurality of passband signals that meets an overtone criterion with respect to the additional selected fundamental passband signal (122), disregarding all already identified overtone passband signals (132), to obtain an overtone passband signal (132) associated with the additional selected fundamental passband signal (122).
[0007]
7. APPARATUS according to claim 5 or 6, characterized in that the signal processor (140) is configured to modify the additional selected fundamental passband signal (122) based on an additional predefined modification target.
[0008]
8. APPARATUS according to one of Claims 1 to 7, characterized in that the fundamental determiner (120) is configured to select the passband signal (112) based on an energy criterion.
[0009]
9. APPARATUS according to one of Claims 1 to 8, characterized in that the fundamental determiner (120) is configured to determine a weighted energy content of each passband signal (112) of the plurality of passband signals and to select the passband signal (112) comprising the highest weighted energy content to obtain the fundamental passband signal (122).
[0010]
10. APPARATUS according to one of Claims 1 to 9, characterized in that it comprises a carrier frequency determiner (260), wherein the filterbank processor (110) comprises a filterbank (212) and a signal converter (214), wherein the filterbank (212) is configured to generate passband signals based on the audio signal (102), wherein the signal converter (214) is configured to convert the generated passband signals to a subband domain to obtain the plurality of passband signals, wherein the carrier frequency determiner (260) is configured to determine a plurality of carrier frequencies based on the audio signal (102), and wherein the filterbank (212) of the filterbank processor (110) is configured to generate the passband signals such that each passband signal comprises a frequency range containing a different carrier frequency of the plurality of carrier frequencies, to obtain a passband signal associated with each carrier frequency of the plurality of carrier frequencies.
[0011]
11. APPARATUS according to one of Claims 1 to 10, characterized in that it comprises an envelope shape coefficient determiner and an envelope shaper, wherein the envelope shape coefficient determiner is configured to determine envelope shape coefficients based on the audio signal (102), wherein the audio signal (102) is a frequency domain audio signal representing a time domain input audio signal, wherein the filterbank processor (110) is configured to generate the plurality of passband signals in a subband domain based on the frequency domain audio signal, wherein the combiner is configured to combine at least a subset of the plurality of passband signals to obtain the modified audio signal representing a time domain audio signal, and wherein the envelope shaper is configured to shape an envelope of the time domain audio signal based on the envelope shape coefficients, to shape an envelope of the plurality of subband domain passband signals containing the modified subband domain passband signal based on the envelope shape coefficients, or to shape an envelope of the plurality of subband domain passband signals based on the envelope shape coefficients before a subband domain passband signal is modified by the signal processor, to obtain a shaped audio signal.
[0012]
12. METHOD (300) FOR MODIFYING AN AUDIO SIGNAL, comprising: generating (310) a plurality of passband signals based on an audio signal; selecting (320) a passband signal from the plurality of passband signals to obtain a fundamental passband signal; identifying (330) a passband signal from the plurality of passband signals that meets an overtone criterion with respect to the selected fundamental passband signal to obtain an overtone passband signal associated with the selected fundamental passband signal; modifying (340) the selected fundamental passband signal based on a predefined modification target by generating an amplitude modulated (AM) signal and a frequency modulated (FM) signal for each passband signal (112) of the plurality of passband signals and by modifying the frequency modulated (FM) signal of the selected fundamental passband signal (122) based on the predefined modification target; modifying (350) an overtone passband signal associated with the selected fundamental passband signal depending on the modification of the selected fundamental passband signal by modifying the frequency modulated (FM) signal of the identified overtone passband signal (132) associated with the selected fundamental passband signal (122) depending on the modification of the selected fundamental passband signal (122); and combining (360) the modified fundamental passband signal (122), the modified overtone passband signal (132) and the unselected passband signals of the plurality of passband signals to obtain a modified audio signal.
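The core of claims 1, 2, 3, 5, 6 and 9 (select the strongest remaining band as a fundamental, lock the bands lying near integer multiples of its carrier, and move those overtones by the same ratio as their fundamental) can be sketched as follows. This is a deliberately reduced illustration under stated assumptions: each passband signal is reduced to a carrier frequency and an energy value, the FM modification is modelled as scaling the carrier frequency by a ratio, and the names `Band`, `is_overtone` and `harmonic_lock`, the tolerance, the energy-ratio range, the weighting function and the modification target are all hypothetical. The filterbank, the AM signals, the envelope-correlation criterion of claim 4 and the resynthesis by the combiner are omitted.

```python
from dataclasses import dataclass


@dataclass
class Band:
    carrier_hz: float  # carrier frequency of the passband signal
    energy: float      # energy content of the passband signal


def is_overtone(band, fund, rel_tol=0.03, energy_range=(0.0, 1.0)):
    """Overtone criterion modelled on claims 2 and 3 (tolerances assumed)."""
    if fund.carrier_hz <= 0 or fund.energy <= 0:
        return False
    harmonic = round(band.carrier_hz / fund.carrier_hz)
    if harmonic < 2:
        return False  # not near the 2nd or a higher harmonic
    # Claim 2: carrier is a multiple of the fundamental's carrier, within tolerance.
    deviation = abs(band.carrier_hz - harmonic * fund.carrier_hz)
    near_multiple = deviation <= rel_tol * harmonic * fund.carrier_hz
    # Claim 3: energy ratio lies within a predefined range.
    ratio = band.energy / fund.energy
    return near_multiple and energy_range[0] <= ratio <= energy_range[1]


def harmonic_lock(bands, target_ratio, weight=lambda f: 1.0,
                  max_fundamentals=None, rel_tol=0.03):
    """Return the carrier frequencies after transposition with harmonic locking.

    target_ratio(f0) is a hypothetical modification target: the frequency
    ratio applied to a fundamental at f0 (1.0 leaves it unchanged). Every
    overtone locked to that fundamental is scaled by the same ratio.
    Bands never selected or locked keep their original carrier.
    """
    new_carriers = [b.carrier_hz for b in bands]
    remaining = set(range(len(bands)))
    found = 0
    while remaining and (max_fundamentals is None or found < max_fundamentals):
        # Claims 5 and 9: highest weighted energy among bands not yet used.
        f = max(remaining, key=lambda i: weight(bands[i].carrier_hz) * bands[i].energy)
        remaining.discard(f)
        found += 1
        ratio = target_ratio(bands[f].carrier_hz)
        new_carriers[f] *= ratio
        # Claims 1 and 6: overtones follow their fundamental's modification.
        for i in sorted(remaining):
            if is_overtone(bands[i], bands[f], rel_tol=rel_tol):
                new_carriers[i] *= ratio
                remaining.discard(i)
    return new_carriers


bands = [Band(200, 10), Band(400, 5), Band(600, 3),   # note 1 and its overtones
         Band(250, 8), Band(500, 4), Band(750, 2)]    # note 2 and its overtones
# Hypothetical target: lower only the note at 250 Hz by the ratio 0.9.
print(harmonic_lock(bands, lambda f0: 0.9 if abs(f0 - 250) < 1 else 1.0))
# the second note and its overtones move to 225, 450 and 675 Hz
```

With a target that only matches the 250 Hz band and no locking step, the bands at 500 Hz and 750 Hz would stay in place and the harmonic structure of the transposed note would break apart; locking moves them along with their fundamental.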
Similar technologies:
Publication number | Publication date | Title
BR112012021540B1|2021-07-27|APPARATUS AND METHOD FOR MODIFYING AN AUDIO SIGNAL USING HARMONIC LOCK
Serra1990|A system for sound analysis/transformation/synthesis based on a deterministic plus stochastic decomposition
Holzapfel et al.2009|Three dimensions of pitched instrument onset detection
Woodruff et al.2008|Resolving overlapping harmonics for monaural musical sound separation using pitch and common amplitude modulation
Benetos et al.2010|Auditory spectrum-based pitched instrument onset detection
Disch et al.2010|An enhanced modulation vocoder for selective transposition of pitch
Szczerba et al.2005|Pitch detection enhancement employing music prediction
Bartkowiak et al.2012|Hybrid sinusoidal modeling of music with near transparent audio quality
Huber2009|Harmonic Audio Object Processing in Frequency Domain
Lech et al.2008|A system for automatic detection and correction of detuned singing
Triki2010|Harmonize-Decompose Audio Signals with Global Amplitude and Frequency Modulations
Disch et al.2011|Frequency selective pitch transposition of audio signals
Bay et al.2011|Methods for separating harmonic instruments from a monaural mix
Patent family:
Publication number | Publication date
SG183464A1|2012-09-27|
TW201205555A|2012-02-01|
AU2011219778B2|2013-12-05|
ZA201207111B|2013-05-29|
HK1180443A1|2013-10-18|
CN102870153B|2014-11-05|
CA2790651A1|2011-09-01|
TWI456566B|2014-10-11|
JP5592959B2|2014-09-17|
AU2011219780A1|2012-10-18|
US20130216053A1|2013-08-22|
AU2011219780B2|2013-12-05|
JP2013520698A|2013-06-06|
PL2539886T3|2015-01-30|
SG183461A1|2012-09-27|
ES2523800T3|2014-12-01|
HK1180444A1|2013-10-18|
EP2539886B1|2014-08-13|
ES2484718T3|2014-08-12|
BR112012021540A2|2017-07-04|
US20130182862A1|2013-07-18|
ZA201207112B|2013-05-29|
MX2012009787A|2012-09-12|
KR101494062B1|2015-03-03|
WO2011104356A3|2012-06-07|
EP2362376A3|2011-11-02|
EP2539885A1|2013-01-02|
KR20120128140A|2012-11-26|
RU2591732C2|2016-07-20|
TW201142815A|2011-12-01|
AR080320A1|2012-03-28|
EP2539886A2|2013-01-02|
US9264003B2|2016-02-16|
CN102859579B|2014-10-01|
RU2012140725A|2014-04-10|
CA2790650A1|2011-09-01|
EP2362376A2|2011-08-31|
MY161212A|2017-04-14|
AU2011219778A1|2012-10-18|
KR101492702B1|2015-02-11|
WO2011104356A2|2011-09-01|
TWI470618B|2015-01-21|
WO2011104354A1|2011-09-01|
MY154205A|2015-05-15|
JP2013520697A|2013-06-06|
BR112012021540A8|2018-07-03|
CA2790651C|2015-11-24|
PL2539885T3|2014-12-31|
EP2539885B1|2014-07-02|
EP2362375A1|2011-08-31|
US9203367B2|2015-12-01|
CN102859579A|2013-01-02|
JP5655098B2|2015-01-14|
AR080319A1|2012-03-28|
CA2790650C|2015-11-24|
MX2012009776A|2012-09-07|
CN102870153A|2013-01-09|
RU2591733C2|2016-07-20|
RU2012140707A|2014-05-27|
KR20130010118A|2013-01-25|
Cited documents:
Publication number | Application date | Publication date | Applicant | Title

US5251151A|1988-05-27|1993-10-05|Research Foundation Of State Univ. Of N.Y.|Method and apparatus for diagnosing the state of a machine|
JP2990777B2|1990-09-28|1999-12-13|ヤマハ株式会社|Electronic musical instrument effect device|
US5536902A|1993-04-14|1996-07-16|Yamaha Corporation|Method of and apparatus for analyzing and synthesizing a sound by extracting and controlling a sound parameter|
JP2713102B2|1993-05-28|1998-02-16|カシオ計算機株式会社|Sound signal pitch extraction device|
JPH07219597A|1994-01-31|1995-08-18|Matsushita Electric Ind Co Ltd|Pitch converting device|
KR19980013991A|1996-08-06|1998-05-15|김광호|Voice zoom signal emphasis circuit|
SE512719C2|1997-06-10|2000-05-02|Lars Gustaf Liljeryd|A method and apparatus for reducing data flow based on harmonic bandwidth expansion|
ID29029A|1998-10-29|2001-07-26|Smith Paul Reed Guitars Ltd|METHOD TO FIND FUNDAMENTALS QUICKLY|
RU2155387C1|1998-12-10|2000-08-27|Общество с ограниченной ответственностью "Институт ноосферного естествознания"|Musical synthesizer|
SE9903553D0|1999-01-27|1999-10-01|Lars Liljeryd|Enhancing conceptual performance of SBR and related coding methods by adaptive noise addition and noise substitution limiting |
EP1254513A4|1999-11-29|2009-11-04|Syfx|Signal processing system and method|
AUPQ952700A0|2000-08-21|2000-09-14|University Of Melbourne, The|Sound-processing strategy for cochlear implants|
JP4245114B2|2000-12-22|2009-03-25|ローランド株式会社|Tone control device|
US20050190199A1|2001-12-21|2005-09-01|Hartwell Brown|Apparatus and method for identifying and simultaneously displaying images of musical notes in music and producing the music|
JP3862061B2|2001-05-25|2006-12-27|ヤマハ株式会社|Music sound reproducing device, music sound reproducing method, and portable terminal device|
US6825775B2|2001-08-01|2004-11-30|Radiodetection Limited|Method and system for reducing interference|
US20030187663A1|2002-03-28|2003-10-02|Truman Michael Mead|Broadband frequency translation for high frequency regeneration|
JP3797283B2|2002-06-18|2006-07-12|ヤマハ株式会社|Performance sound control method and apparatus|
JP3938015B2|2002-11-19|2007-06-27|ヤマハ株式会社|Audio playback device|
JP3744934B2|2003-06-11|2006-02-15|松下電器産業株式会社|Acoustic section detection method and apparatus|
US7062414B2|2003-07-18|2006-06-13|Metrotech Corporation|Method and apparatus for digital detection of electromagnetic signal strength and signal direction in metallic pipes and cables|
US8023673B2|2004-09-28|2011-09-20|Hearworks Pty. Limited|Pitch perception in an auditory prosthesis|
DE102004021403A1|2004-04-30|2005-11-24|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Information signal processing by modification in the spectral / modulation spectral range representation|
US8204261B2|2004-10-20|2012-06-19|Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V.|Diffuse sound shaping for BCC schemes and the like|
US8036394B1|2005-02-28|2011-10-11|Texas Instruments Incorporated|Audio bandwidth expansion|
CN101138274B|2005-04-15|2011-07-06|杜比国际公司|Envelope shaping of decorrelated signals|
US7872962B1|2005-10-18|2011-01-18|Marvell International Ltd.|System and method for producing weighted signals in a diversity communication system|
KR100958144B1|2005-11-04|2010-05-18|노키아 코포레이션|Audio Compression|
JP2007193156A|2006-01-20|2007-08-02|Yamaha Corp|Electronic musical instrument with tuning device|
US20090299755A1|2006-03-20|2009-12-03|France Telecom|Method for Post-Processing a Signal in an Audio Decoder|
JP4757130B2|2006-07-20|2011-08-24|富士通株式会社|Pitch conversion method and apparatus|
JP4630980B2|2006-09-04|2011-02-09|独立行政法人産業技術総合研究所|Pitch estimation apparatus, pitch estimation method and program|
US8392198B1|2007-04-03|2013-03-05|Arizona Board Of Regents For And On Behalf Of Arizona State University|Split-band speech compression based on loudness estimation|
US8428957B2|2007-08-24|2013-04-23|Qualcomm Incorporated|Spectral noise shaping in audio coding based on spectral dynamics in frequency sub-bands|
JP5228432B2|2007-10-10|2013-07-03|ヤマハ株式会社|Segment search apparatus and program|
US8498667B2|2007-11-21|2013-07-30|Qualcomm Incorporated|System and method for mixing audio with ringtone data|
DE102008013172B4|2008-03-07|2010-07-08|Neubäcker, Peter|Method for sound-object-oriented analysis and notation-oriented processing of polyphonic sound recordings|
BR122012006269A2|2008-03-10|2019-07-30|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|EQUIPMENT AND METHOD FOR HANDLING AN AUDIO SIGN HAVING A TRANSIENT EVENT|
EP2104096B1|2008-03-20|2020-05-06|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Apparatus and method for converting an audio signal into a parameterized representation, apparatus and method for modifying a parameterized representation, apparatus and method for synthesizing a parameterized representation of an audio signal|
JP4983694B2|2008-03-31|2012-07-25|株式会社Jvcケンウッド|Audio playback device|
EP2109328B1|2008-04-09|2014-10-29|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Apparatus for processing an audio signal|
EP2304721B1|2008-06-26|2012-05-09|France Telecom|Spatial synthesis of multichannel audio signals|
MX2011000372A|2008-07-11|2011-05-19|Fraunhofer Ges Forschung|Audio signal synthesizer and audio signal encoder.|
CN101836253B|2008-07-11|2012-06-13|弗劳恩霍夫应用研究促进协会|Apparatus and method for calculating bandwidth extension data using a spectral tilt controlling framing|
US8285385B2|2009-01-20|2012-10-09|Med-El Elektromedizinische Geraete Gmbh|High accuracy tonotopic and periodic coding with enhanced harmonic resolution|
EP2239732A1|2009-04-09|2010-10-13|Fraunhofer-Gesellschaft zur Förderung der Angewandten Forschung e.V.|Apparatus and method for generating a synthesis audio signal and for encoding an audio signal|
US8538042B2|2009-08-11|2013-09-17|Dts Llc|System for increasing perceived loudness of speakers|
US8321215B2|2009-11-23|2012-11-27|Cambridge Silicon Radio Limited|Method and apparatus for improving intelligibility of audible speech represented by a speech signal|
WO2011110500A1|2010-03-09|2011-09-15|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Apparatus and method for processing an input audio signal using cascaded filterbanks|
WO2011141772A1|2010-05-12|2011-11-17|Nokia Corporation|Method and apparatus for processing an audio signal based on an estimated loudness|
CN103262409B|2010-09-10|2016-07-06|Dts(英属维尔京群岛)有限公司|The dynamic compensation of the unbalanced audio signal of frequency spectrum of the sensation for improving|
JP5747562B2|2010-10-28|2015-07-15|ヤマハ株式会社|Sound processor|
JP5758774B2|2011-10-28|2015-08-05|ローランド株式会社|Effect device|
US20050120870A1|1998-05-15|2005-06-09|Ludwig Lester F.|Envelope-controlled dynamic layering of audio signal processing and synthesis for music applications|
US8099476B2|2008-12-31|2012-01-17|Apple Inc.|Updatable real-time or near real-time streaming|
GB201105502D0|2010-04-01|2011-05-18|Apple Inc|Real time or near real time streaming|
US8805963B2|2010-04-01|2014-08-12|Apple Inc.|Real-time or near real-time streaming|
CN102238179B|2010-04-07|2014-12-10|苹果公司|Real-time or near real-time streaming|
US8843586B2|2011-06-03|2014-09-23|Apple Inc.|Playlists for real-time or near real-time streaming|
US8856283B2|2011-06-03|2014-10-07|Apple Inc.|Playlists for real-time or near real-time streaming|
CN102543091B|2011-12-29|2014-12-24|深圳万兴信息科技股份有限公司|System and method for generating simulation sound effect|
US9712127B2|2012-01-11|2017-07-18|Richard Aylward|Intelligent method and apparatus for spectral expansion of an input signal|
BR112015016275B1|2013-01-08|2021-02-02|Dolby International Ab|method for estimating a first sample of a first subband signal in a first subband of an audio signal, method for encoding an audio signal, method for decoding an encoded audio signal, system, audio encoder and decoder audio|
SG11201505911SA|2013-01-29|2015-08-28|Fraunhofer Ges Zur Förderung Der Angewandten Forschung E V|Low-frequency emphasis for lpc-based coding in frequency domain|
CA2961336C|2013-01-29|2021-09-28|Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V.|Audio encoders, audio decoders, systems, methods and computer programs using an increased temporal resolution in temporal proximity of onsets or offsets of fricatives or affricates|
US20150003633A1|2013-03-21|2015-01-01|Max Sound Corporation|Max sound audio program|
US9520140B2|2013-04-10|2016-12-13|Dolby Laboratories Licensing Corporation|Speech dereverberation methods, devices and systems|
CN104282312B|2013-07-01|2018-02-23|华为技术有限公司|Signal coding and coding/decoding method and equipment|
EP2830058A1|2013-07-22|2015-01-28|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Frequency-domain audio coding supporting transform length switching|
US9697843B2|2014-04-30|2017-07-04|Qualcomm Incorporated|High band excitation signal generation|
AU2014204540B1|2014-07-21|2015-08-20|Matthew Brown|Audio Signal Processing Methods and Systems|
US9391649B2|2014-11-17|2016-07-12|Microsoft Technology Licensing, Llc|Envelope shaping in envelope tracking power amplification|
CN105118523A|2015-07-13|2015-12-02|努比亚技术有限公司|Audio processing method and device|
US10262677B2|2015-09-02|2019-04-16|The University Of Rochester|Systems and methods for removing reverberation from audio signals|
JP6705142B2|2015-09-17|2020-06-03|ヤマハ株式会社|Sound quality determination device and program|
US9654181B1|2015-12-14|2017-05-16|Nxp B.V.|Dynamic transmitter signal envelope shaping control for NFC or RFID devices|
CN105750145B|2016-03-26|2018-06-01|上海大学|It comprehensive can show the implementation method of the music fountain of music time domain frequency domain characteristic|
CN108269579B|2018-01-18|2020-11-10|厦门美图之家科技有限公司|Voice data processing method and device, electronic equipment and readable storage medium|
US10950253B2|2018-02-09|2021-03-16|Board Of Regents, The University Of Texas System|Vocal feedback device and method of use|
US11017787B2|2018-02-09|2021-05-25|Board Of Regents, The University Of Texas System|Self-adjusting fundamental frequency accentuation subsystem for natural ear device|
JP2019164107A|2018-03-20|2019-09-26|本田技研工業株式会社|Abnormal sound determination device and determination method|
US11122354B2|2018-05-22|2021-09-14|Staton Techiya, Llc|Hearing sensitivity acquisition methods and devices|
CN109683142B|2018-12-04|2020-06-09|郑州轻工业大学|Method for estimating parameters of triangular linear frequency modulation continuous signals based on differential envelope detection|
JP2022007288A|2020-06-26|2022-01-13|ローランド株式会社|Effect device and effect processing program|
Legal status:
2019-01-08| B06F| Objections, documents and/or translations needed after an examination request according [chapter 6.6 patent gazette]|
2019-09-17| B06U| Preliminary requirement: requests with searches performed by other patent offices: procedure suspended [chapter 6.21 patent gazette]|
2021-05-25| B09A| Decision: intention to grant [chapter 9.1 patent gazette]|
2021-07-27| B16A| Patent or certificate of addition of invention granted [chapter 16.1 patent gazette]|Free format text: TERM OF VALIDITY: 20 (TWENTY) YEARS FROM 25/02/2011, SUBJECT TO THE LEGAL CONDITIONS. PATENT GRANTED IN ACCORDANCE WITH ADI 5.529/DF, WHICH DETERMINES THE CHANGE IN THE GRANT TERM. |
Priority:
Application number | Application date | Title
US30851310P| true| 2010-02-26|2010-02-26|
US61/308,513|2010-02-26|
EP10175282A|EP2362375A1|2010-02-26|2010-09-03|Apparatus and method for modifying an audio signal using harmonic locking|
EP10175282.2|2010-09-03|
PCT/EP2011/052834|WO2011104354A1|2010-02-26|2011-02-25|Apparatus and method for modifying an audio signal using harmonic locking|