Korean patent KR20040029314A: Enhancing source coding systems by adaptive transposition


Abstract:
The present invention relates to a new method for enhancing source coding systems that use high frequency reconstruction (HFR). The invention shows that tonal signals can be classified as either pulse-train-like or non-pulse-train-like. Based on this classification, significant improvements in perceived audio quality can be achieved by adaptive switching of transposers.
Publication number: KR20040029314A
Application number: KR10-2003-7007893
Filing date: 2001-12-19
Publication date: 2004-04-06
Inventors: 크리스토퍼 크졸링 (Kristofer Kjörling); 프레드릭 헨 (Fredrik Henn); 페르 엑스트란트 (Per Ekstrand); 라아스 빌레모 (Lars Villemoes)
Applicant: 코딩 테크놀러지스 에이비 (Coding Technologies AB)
Main IPC class:

Description:

Method for enhancing source coding systems by adaptive transposition {ENHANCING SOURCE CODING SYSTEMS BY ADAPTIVE TRANSPOSITION}
[1] The present invention relates to a new method for improving the performance of source coding systems that use high frequency reconstruction (HFR). The invention shows that tonal signals can be classified as either pulse-train-like or non-pulse-train-like. Based on this classification, significant improvements in perceived audio quality can be obtained by adaptive switching of transposers. The invention shows that the transposers switched between must have fundamentally different characteristics.
[2] In "Source Coding Enhancement Using Spectral-Band Replication" [WO 98/57436], transposition was established as an effective means of generating high frequencies for use in codecs based on High Frequency Reconstruction (HFR). Several transposer implementations were described. However, apart from a brief discussion of transient response improvements, basic transposer characteristics in relation to program material were not elaborated on.
[14] The present invention will be described by way of example with reference to the following figures without limiting the scope of the inventive idea.
[15] FIG. 1A illustrates an input pulse-train signal x(n).
[16] FIG. 1B illustrates the magnitude spectrum |X(f)| of the signal x(n).
[17] FIG. 2A illustrates the impulse response h₀(n) of the FIR filter.
[18] FIG. 2B illustrates the magnitude spectrum |H₀(f)| of the FIR filter.
[19] FIG. 3A illustrates the signal y₀(n) = x(n) * h₀(n).
[20] FIG. 3B illustrates the magnitude spectrum |Y₀(f)| of the signal y₀(n).
[21] FIG. 4A illustrates the decimated impulse response h₁(n) of the FIR filter.
[22] FIG. 4B illustrates the magnitude spectrum |H₁(f)| of the decimated FIR filter.
[23] FIG. 5A illustrates the transposed signal y₁(n).
[24] FIG. 5B illustrates the magnitude spectrum |Y₁(f)| of the signal y₁(n).
[25] FIG. 6 illustrates the magnitude spectrum |Y₂(f)| of the signal x(n) after FD-transposition with a long window.
[26] FIG. 7 illustrates an implementation of the present invention on the decoder side.
[3] The present invention shows that tonal passages, i.e. passages dominated by contributions from pitched instruments, can be characterized as pulse-train-like or non-pulse-train-like. A typical example of the former is the human voice in the case of vowels, or a single pitched instrument such as a trumpet, where the "excitation signal" can be modeled as a pulse train. The latter is the case when several different pitches are combined, so that no single pulse train can be identified. According to the present invention, HFR performance can be significantly improved by distinguishing between these two cases and adapting the transposer properties correspondingly.
[4] If a pulse-train-like passage is detected, the transposer preferably operates on a per-pulse basis. Here, the decoded lowband, which serves as the input signal to the transposer, can be viewed as a series of impulse responses h(n) of lowpass character with cutoff frequency f_c, separated by a period T_p. This corresponds to a Fourier series with fundamental frequency 1/T_p, containing harmonics at all integer multiples of 1/T_p up to the frequency f_c. The purpose of the transposer is to increase the bandwidth of the individual responses h(n) to the desired bandwidth Nf_c, where N is the transposition factor, without altering the period T_p.
[5] Since the pulse period is preserved, the transposed signal still corresponds to a Fourier series with fundamental frequency 1/T_p, now containing all partials up to Nf_c. This method therefore provides a perfect continuation of the truncated Fourier series of the lowband. Some prior art methods satisfy the requirement of preserving the pulse period. Examples are frequency translation and FD-transposition according to [WO 98/57436], where the window is chosen short enough not to contain more than one period, i.e. length(window) < T_p. Neither of those implementations handles material with multiple pitches well, and only FD-transposition provides a perfect continuation of the truncated Fourier series of the lowband.
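As a worked illustration of this continuation, the lowband and the per-pulse transposed signal can both be written as truncated Fourier series. The symbols K, K', a_k and φ_k are notation introduced here for illustration and do not appear in the original text; the amplitudes and phases of the generated partials depend on the transposer and are left unspecified.

```latex
\begin{align}
y_{\text{low}}(t) &= \sum_{k=1}^{K} a_k \cos\!\left(\frac{2\pi k t}{T_p} + \varphi_k\right),
  & K &= \lfloor f_c T_p \rfloor \\
y_{\text{out}}(t) &= \sum_{k=1}^{K'} a'_k \cos\!\left(\frac{2\pi k t}{T_p} + \varphi'_k\right),
  & K' &= \lfloor N f_c T_p \rfloor
\end{align}
```

Both sums share the fundamental 1/T_p; only the upper limit grows, from K to K', which is what continuation of the truncated series means here.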
[6] If a non-pulse-train-like passage is detected, for example when multiple pitches are present, the demands on the transposer shift from preservation of the pulse period to preservation of the integer relationships between lowband harmonics and generated harmonics. This requirement is met by the FD-transposition methods in [WO 98/57436], where the window is chosen long enough to contain many periods T_i of the individual pitches forming the sequence, i.e. length(window) ≫ T_i. Any truncated Fourier series [f_i, 2f_i, 3f_i, ...] in the source frequency range of the transposer is thereby transposed to [Nf_i, 2Nf_i, 3Nf_i, ...], where N is the integer transposition factor. In contrast to the per-pulse operation above, this scheme clearly does not produce a perfect continuation of the lowband Fourier series. This is acceptable for multi-pitch signals, but not ideal for single-pitched pulse-train passages. This transposition mode is therefore preferred only for non-pulse-train-like cases.
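The mapping of each harmonic series to its N-fold multiple can be sketched in a few lines of Python. This is an illustration only, not code from the patent: the function name `transpose_partials` and the example frequencies are hypothetical, and real FD-transposition operates on windowed spectra rather than on lists of partial frequencies.

```python
def transpose_partials(fundamentals, n_factor, f_c, f_s):
    """Map every lowband partial k*f_i (up to f_c) to n_factor*k*f_i.

    Hypothetical helper: models only where the partials land, which is
    the property a long-window FD-transposer is said to preserve.
    """
    generated = {}
    for f_i in fundamentals:
        # Partials of this pitch that survive in the band-limited lowband.
        lowband = [k * f_i for k in range(1, int(f_c // f_i) + 1)]
        # Transposed partials, kept only if they stay below Nyquist.
        generated[f_i] = [n_factor * f for f in lowband if n_factor * f <= f_s / 2]
    return generated


# Two simultaneous pitches (assumed values), transposition factor N = 2.
result = transpose_partials([220.0, 330.0], n_factor=2, f_c=1000.0, f_s=48000.0)
for f_i, partials in result.items():
    print(f_i, partials)
```

Every generated partial remains an integer multiple of its own fundamental, so the integer relationships hold for each pitch separately, even though odd multiples of each fundamental are absent from the generated set.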
[7] According to the invention, the discrimination between pulse-train-like and non-pulse-train-like signals can be performed in the encoder, and a corresponding control signal can be sent to the decoder. Alternatively, the detection can be done in the decoder, which eliminates the need for control signals at the expense of higher decoder complexity. Examples of detection principles are transient detection in the time domain and peak-picking in the frequency domain. The decoder includes means for the necessary transposer adaptation. As an example, a system is described that uses frequency translation for pulse-train-like cases and long-window FD-transposition for non-pulse-train-like cases. The actual switching or cross-fading between the transposers is preferably performed in an envelope-adjusting filterbank.
[8] The present invention includes the following features:
[9] Different methods for high frequency generation are selected adaptively over time, based on whether the processed signal has a pulse-train-like or a non-pulse-train-like character.
[10] The selection is based on analysis by peak-picking in a time/frequency domain representation of the signal.
[11] The different methods for high frequency generation are frequency translation and FD-transposition, or
[12] the different methods for high frequency generation are FD-transposition with different window sizes, or
[13] the different methods for high frequency generation are time-domain pulse-train transposition and FD-transposition.
[27] The embodiments described below are merely illustrative of the principles of the present invention for adaptive transposer switching in HFR systems. It is understood that modifications and variations of the arrangements and details described herein will be apparent to those skilled in the art. It is therefore intended that the invention be limited only by the scope of the appended claims and not by the specific details presented by way of description and explanation of the embodiments herein.
[28] The "ideal transposition" of a single-pitched pulse-train-like signal can be defined by a simple model. Let the original signal be a sum of Dirac pulses δ(n) separated by m samples, i.e. a pulse train:
[29] $$x(n) = \sum_{l=-\infty}^{\infty} \delta(n - lm) \qquad (1)$$
[30] FIG. 1A shows x(n) and FIG. 1B shows the corresponding magnitude spectrum |X(f)|. Clearly, |X(f)| corresponds to a Fourier series with fundamental frequency f_s/m, where f_s is the sampling frequency. Let y₀(n) be a lowpass-filtered version of x(n), where the lowpass FIR filter has an impulse response h₀(n) of length p, with p < m; see FIGS. 2A and 2B for the time-domain and frequency-domain representations, respectively. The cutoff frequency of the filter is f_c. The output signal is then
[31] $$y_0(n) = x(n) * h_0(n) = \sum_{l=-\infty}^{\infty} h_0(n - lm) \qquad (2)$$
[32] that is, a series of impulse responses separated by m samples. FIGS. 3A and 3B show y₀(n) and |Y₀(f)|, respectively. The original Fourier series has effectively been truncated at the frequency f_c. Assume a time-domain transposer that can detect the individual impulse responses h₀(n − lm) and decimates them by a factor of 2, i.e. feeds every second sample to the output. The discarded samples are compensated for by inserting zeros between the shortened responses h₁(n − lm), in order to preserve the length of the signal. The decimated impulse response h₁(n) and the corresponding frequency-domain representation |H₁(f)| are shown in FIGS. 4A and 4B, respectively. Clearly, narrowing the time-domain signal corresponds to widening the frequency-domain signal, in this case by a factor of 2. Finally, the transposed signal y₁(n) and |Y₁(f)| are shown in FIGS. 5A and 5B. The bandwidth of the lowpass-filtered pulse train has been increased while the correct period time, and thus the frequency-domain properties, are retained. The output signal y₁(n) corresponds to a Fourier series with partials up to the frequency 2f_c.
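The per-pulse decimation described here can be sketched in plain Python. This is a minimal illustration under stated assumptions: the helper names, the period m = 16 and the coefficients of `h0` are hypothetical and not taken from the patent, and the pulse positions are taken as known rather than detected.

```python
def pulse_train_response(h, m, length):
    """Place a copy of h every m samples on a zero background."""
    y = [0.0] * length
    for start in range(0, length, m):
        for k, value in enumerate(h):
            if start + k < length:
                y[start + k] += value
    return y


def decimate_by_two(h):
    """Keep every second sample of an impulse response."""
    return h[::2]


m = 16                                             # pulse period in samples
h0 = [1.0, 0.9, 0.7, 0.5, 0.3, 0.2, 0.1, 0.05]     # assumed lowpass-like response, p = 8 < m
h1 = decimate_by_two(h0)                           # shortened response, length 4

y0 = pulse_train_response(h0, m, 64)               # band-limited pulse train
y1 = pulse_train_response(h1, m, 64)               # transposed pulse train

print(h1)
print(len(y0), len(y1))
```

Because `y1` is built on a zero background of the same length, the zeros between the shortened responses are inserted implicitly, and the pulses stay exactly m samples apart.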
[33] The above transposition can be approximated in several ways. One approach is to use a frequency domain transposer (FD-transposer), such as the STFT transposer described in [WO 98/57436], with different window sizes: a short window for pulse-train signals and a long window for all other signals. The short window (length ≤ m in the above example) ensures that the transposer operates on a per-pulse basis and gives the desired pulse transposition outlined above. Another approach to pulse transposition uses single-sideband modulation. This ensures that the period time T_p between the pulses is correct, but the generated partials are not harmonically related to the lowband partials. It should be pointed out that different pulse-train transposition algorithms may perform differently for different program material. Several pulse-train transposers can therefore be used together with suitable detection algorithms in the encoder and/or decoder to ensure optimum performance.
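The trade-off of the single-sideband approach can be illustrated by looking only at where the partials land. This sketch is not from the patent: the function names and the values of `f0`, `f_c` and `shift` are hypothetical, and a constant frequency shift stands in for the modulation.

```python
def translate_partials(f0, f_c, shift):
    """Shift every lowband partial k*f0 (up to f_c) by a constant amount."""
    lowband = [k * f0 for k in range(1, int(f_c // f0) + 1)]
    return [f + shift for f in lowband]


def is_harmonic(freqs, f0, tol=1e-9):
    """True if every frequency is an integer multiple of f0."""
    return all(abs(f / f0 - round(f / f0)) < tol for f in freqs)


f0, f_c = 200.0, 1000.0
shifted = translate_partials(f0, f_c, shift=950.0)   # assumed shift, unrelated to f0
spacings = [b - a for a, b in zip(shifted, shifted[1:])]

print(shifted)
print(spacings)
print(is_harmonic(shifted, f0))
```

The spacing between generated partials stays at f0, so the pulse period is preserved, but the partials no longer fall on multiples of f0 when the shift is not itself a multiple of f0.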
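[34] Applying FD-transposition with a long window to the pulse-train signal used in the above example may give unsatisfactory results, for the following reason. When long windows (of length ≫ m) are used in the FD-transposition method, the following relationship applies:
[35] $$u(n) = \sum_{i=0}^{N-1} e_i(n)\cos\!\left(2\pi f_i n / f_s + \alpha_i\right) \;\rightarrow\; v(n) = \sum_{i=0}^{N-1} e_i(n)\cos\!\left(2\pi M f_i n / f_s + \beta_i\right) \qquad (3)$$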
[36] where u(n) is the input, v(n) is the output, M is the transposition factor, N is the number of sinusoids, and f_i, e_i(n) and α_i are the individual input frequencies, time envelopes and phase constants, respectively;
[37] β_i are arbitrary output phase constants and f_s is the sampling frequency;
[38] and 0 ≤ Mf_i ≤ f_s/2. Applying the relationship of Equation 3 to the input signal x(n) gives an output signal y₂(n) with a magnitude spectrum |Y₂(f)| as shown in FIG. 6. The partials of y₂(n) are harmonically related to the partials of x(n), but the spacing between them has increased by the transposition factor. In other words, the pitch of the signal has been raised by the transposition factor. When the new highband signal is added to the original lowband signal, the two different pitches are clearly distinguishable. For a voice signal, for example, it sounds as if an additional speaker were talking simultaneously at a higher pitch; that is, "ghost" voices appear.
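A small numerical illustration makes the doubled pitch concrete. The values f_s/m = 100 Hz, f_c = 1 kHz and M = 2 are hypothetical and chosen only for readability; they are not taken from the patent.

```latex
\begin{align}
\text{lowband partials of } x(n):\quad & f_k = k\,\frac{f_s}{m} = 100,\,200,\,\dots,\,1000\ \text{Hz} \\
\text{after Equation 3 with } M = 2:\quad & M f_k = 200,\,400,\,\dots,\,2000\ \text{Hz} \\
\text{highband part } (f > f_c):\quad & 1200,\,1400,\,\dots,\,2000\ \text{Hz}
\end{align}
```

The highband partials are spaced 200 Hz apart while the lowband partials are spaced 100 Hz apart, so the combined signal carries two pitches an octave apart.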
[39] However, pulse transposition is not applicable where high quality HFR is required unless the input signal has a single-pitched pulse-train character. It is therefore of great benefit to detect which transposition method gives the best results at a given time, for optimal performance of the HFR system.
[40] To benefit from the different transposer characteristics in the decoder, it is necessary to assess, in the encoder and/or decoder, which transposition method gives the best results at a given time. There are several ways to detect the pulse-train-like character of a signal, and this can be done either in the time domain or in the frequency domain. If a pulse train has period time T_p, the pulses are separated in time by that period, and the frequency components are 1/T_p apart. So if T_p is high, i.e. for a low-pitched pulse train, detection is preferably done in the time domain, since the pulses are relatively far apart and therefore easy to discriminate. If T_p is low, however, this corresponds to a high-pitched pulse train, which is more easily detected in the frequency domain. For time-domain detection, it is desirable to spectrally whiten the signal so that it becomes as pulse-train-like as possible, which makes detection easier. The detection schemes in the time domain and the frequency domain are similar. They are based on peak-picking and statistical analysis of the distances between the picked peaks. Peak-picking in the time domain is done by comparing the energy and peak level of the signal before and after an arbitrary point, thereby searching for transient behavior in the signal. Peak detection in the frequency domain is done on a harmonic product spectrum, which is a good indicator of whether a strong harmonic series is present. The distances between the detected peaks are presented in a histogram, and detection is made by comparing the ratio of pitch-related entries to non-pitch-related entries.
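A much-simplified time-domain detector in this spirit can be sketched as follows. This is an assumption-laden illustration, not the patent's detector: the function names, the fixed magnitude threshold and the score (share of inter-peak distances in the most common histogram bin) are hypothetical simplifications of the energy comparison and histogram ratio described above, and no whitening is applied.

```python
from collections import Counter


def pick_peaks(signal, threshold):
    """Indices of local maxima whose magnitude exceeds `threshold`."""
    peaks = []
    for n in range(1, len(signal) - 1):
        v = abs(signal[n])
        if v > threshold and v >= abs(signal[n - 1]) and v > abs(signal[n + 1]):
            peaks.append(n)
    return peaks


def pulse_train_score(signal, threshold=0.5):
    """Share of inter-peak distances that fall in the most common histogram bin."""
    peaks = pick_peaks(signal, threshold)
    distances = [b - a for a, b in zip(peaks, peaks[1:])]
    if not distances:
        return 0.0
    histogram = Counter(distances)
    return histogram.most_common(1)[0][1] / len(distances)


# Regular pulse train with period 16 samples.
regular = [1.0 if n % 16 == 0 else 0.0 for n in range(128)]

# Irregularly spaced pulses (assumed positions).
positions = {5, 12, 30, 41, 70, 77, 110}
irregular = [1.0 if n in positions else 0.0 for n in range(128)]

print(pulse_train_score(regular))
print(pulse_train_score(irregular))
```

A score near 1 indicates that almost all picked peaks are one period apart, which suggests a pulse-train-like passage; where to place the decision threshold is a tuning choice not specified here.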
[41] The implementation shown in FIG. 7 uses two different types of transposition methods in the same decoder system: an FD-transposer using a long window and a frequency-translating device [PCT/SE01/01150]. The demultiplexer 701 unpacks the bitstream signal and feeds it to an arbitrary baseband decoder 702. The output from the baseband decoder, i.e. a bandwidth-limited audio signal, is fed to an analysis filterbank 703, which splits the audio signal into spectral bands. The audio signal is simultaneously fed to the FD-transposition unit 705. The output from that unit is fed to an additional analysis filterbank 706, which is of the same type as filterbank unit 703. The data from filterbank unit 703 is patched according to the principles of the frequency-translating device 704 and fed to the mixing unit 707 together with the output from the analysis filterbank 706. The mixing unit blends the data according to the control signal transmitted from the encoder or control signals obtained in the decoder. The blended spectral data is subsequently envelope-adjusted in the envelope adjuster 708, using data and control signals sent in the bitstream. The spectrally adjusted signal and the data from the analysis filterbank 703 are fed to the synthesis filterbank unit 709, which produces an envelope-adjusted wideband signal. Finally, the digital wideband signal is converted into an analog output signal by the digital-to-analog converter 710.
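The role of the mixing unit 707 can be sketched as a weighted blend of the two highband candidates. This is a hypothetical illustration: the function name `mix_subbands`, the scalar `weight` control and the real-valued sample lists are assumptions made here, whereas the text only states that the data is blended according to a control signal.

```python
def mix_subbands(translated, fd_transposed, weight):
    """Blend two highband candidates sample by sample.

    weight = 1.0 selects the frequency-translated data (pulse-train-like case),
    weight = 0.0 selects the FD-transposed data (non-pulse-train-like case),
    and values in between give a cross-fade.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if len(translated) != len(fd_transposed):
        raise ValueError("inputs must have the same length")
    return [weight * a + (1.0 - weight) * b
            for a, b in zip(translated, fd_transposed)]


# Assumed subband samples from the two paths (704 and 705/706).
from_translation = [1.0, 2.0]
from_fd_transposition = [3.0, 4.0]

print(mix_subbands(from_translation, from_fd_transposition, 1.0))
print(mix_subbands(from_translation, from_fd_transposition, 0.5))
```

Ramping `weight` over a few frames gives a cross-fade rather than a hard switch, which is one way to read the "switching or cross-fading" mentioned in the text.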

Claims:
Claims (5)
1. A method for improving the performance of an audio source coding system using high frequency reconstruction, characterized by adaptively selecting, over time, different methods for high frequency generation based on whether the processed signal has a pulse-train-like character or a non-pulse-train-like character.
2. The method according to claim 1, wherein said selection is based on analysis by peak-picking in the time and frequency domain representation of said signal.
3. The method of claim 1, wherein the different methods of high frequency generation are frequency translation and FD-transposition.
4. The method of claim 1, wherein the different methods of high frequency generation are performed by an FD-transposer with different window sizes.
5. The method of claim 1, wherein the different methods of high frequency generation are time-domain pulse-train transposition and FD-transposition.

Similar technologies:

Publication number | Publication date | Patent title

US10008213B2|2018-06-26|Spectral translation/folding in the subband domain

US10600427B2|2020-03-24|Harmonic transposition in an audio coding method and system

US10586550B2|2020-03-10|Cross product enhanced harmonic transposition

US9761234B2|2017-09-12|High frequency regeneration of an audio signal with synthetic sinusoid addition

Sinha et al.1993|Low bit rate transparent audio compression using adapted wavelets

Fu et al.1998|Importance of tonal envelope cues in Chinese speech recognition

Nagel et al.2009|A harmonic bandwidth extension method for audio codecs

JP5655098B2|2015-01-14|Apparatus and method for modifying an audio signal using an envelope shape

TWI441162B|2014-06-11|Audio signal synthesizer, audio signal encoder, method for generating synthesis audio signal and data stream, computer readable medium and computer program

ES2247466T3|2006-03-01|Improvement of source coding using spectral band replication.

ES2739667T3|2020-02-03|Device and method to manipulate an audio signal that has a transient event

KR100726960B1|2007-06-14|Method and apparatus for artificial bandwidth expansion in speech processing

CA2580622C|2011-05-10|Method and device for the artificial extension of the bandwidth of speech signals

Itakura1975|Line spectrum representation of linear predictor coefficients of speech signals

CA2718513C|2015-09-22|Apparatus and method for converting an audio signal into a parameterized representation, apparatus and method for modifying a parameterized representation, apparatus and method for synthesizing a parameterized representation of an audio signal

JP2779886B2|1998-07-23|Wideband audio signal restoration method

KR101278546B1|2013-06-24|An apparatus and a method for generating bandwidth extension output data

CN1181467C|2004-12-22|Enhancing perceptual performance of SBR and related HFR coding methods by adaptive noise-floor addition and noise substitution limiting

KR100551862B1|2006-02-13|Enhancing the performance of coding systems that use high frequency reconstruction methods

Makhoul et al.1979|High-frequency regeneration in speech coding systems

AU2009210303B2|2011-11-10|Device and method for a bandwidth extension of an audio signal

JP5192630B2|2013-05-08|Perceptually improved enhancement of coded acoustic signals

EP0993670B1|2002-03-20|Method and apparatus for speech enhancement in a speech communication system

US8229135B2|2012-07-24|Audio enhancement method and system

Serra et al.1990|Spectral modeling synthesis: A sound analysis/synthesis system based on a deterministic plus stochastic decomposition

Patent family:

Publication number | Publication date

CN1223990C|2005-10-19|

SE0004818D0|2000-12-22|

DE60103086D1|2004-06-03|

WO2002052545A1|2002-07-04|

EP1338000A1|2003-08-27|

JP3992619B2|2007-10-17|

AT265731T|2004-05-15|

US20020118845A1|2002-08-29|

DE60103086T2|2005-01-20|

US7260520B2|2007-08-21|

HK1056428A1|2004-10-21|

JP2004517358A|2004-06-10|

CN1481546A|2004-03-10|

KR100566630B1|2006-03-31|

EP1338000B1|2004-04-28|

Legal status:
2000-12-22|Priority to SE0004818A

2000-12-22|Priority to SE0004818-1

2001-12-19|Application filed by 코딩 테크놀러지스 에이비

2001-12-19|Priority to PCT/SE2001/002828

2004-04-06|Publication of KR20040029314A

2006-03-31|Application granted

2006-03-31|Publication of KR100566630B1

Priority:

Application number | Filing date | Patent title

SE0004818A|SE0004818D0|2000-12-22|2000-12-22|Enhancing source coding systems by adaptive transposition|

SE0004818-1|2000-12-22|

PCT/SE2001/002828|WO2002052545A1|2000-12-22|2001-12-19|Enhancing source coding systems by adaptive transposition|
