Multimode audio decoder and multimode audio decoding method to provide a decoded representation of an audio content
Patent abstract:
MULTIMODE AUDIO CODEC AND CELP CODING ADAPTED THERETO. The present invention relates to bit stream elements of subframes which are differentially coded relative to a global gain value, so that a change of the global gain value of the frames results in an adjustment of an output level of the decoded representation of the audio content. At the same time, the differential coding saves bits which would otherwise be spent when introducing a new syntax element into an encoded bit stream. Furthermore, the differential coding allows the global gain adjustment overhead of an encoded bit stream to be reduced, by allowing the time resolution at which the global gain value is defined to be lower than the time resolution at which the aforementioned bit stream element, differentially coded relative to the global gain value, adjusts the gain of the respective subframes. According to another aspect, a global gain control across CELP-coded frames and transform-coded frames is achieved by controlling the codebook excitation gain of the CELP codec along with a level of the transform or inverse transform of the transform-coded frames. In (...).
Publication number: BR112012009490B1
Application number: R112012009490-4
Filing date: 2010-10-19
Publication date: 2020-12-01
Inventors: Ralf Geiger; Guillaume Fuchs; Markus Multrus; Bernhard Grill
Applicant: Fraunhofer-Gesellschaft zur Förderung der Angewandten Forschung E.V.
IPC main classification:
Patent description:
Description [001] The present invention relates to multimode audio coding, such as a unified speech and audio codec or a codec suitable for general audio signals such as music, speech, mixed and other signals, and to a CELP coding scheme adapted thereto. [002] It is favorable to mix several coding modes in order to encode general audio signals representing a mixture of audio signals of different types, such as speech, music, or the like. The individual coding modes may be adapted to particular audio types, and accordingly a multimode audio encoder may take advantage of changing the coding mode over time in correspondence with the changing type of audio content. In other words, the multimode audio encoder may decide, for example, to encode portions of the audio signal having speech content using a coding mode especially dedicated to speech coding, and to use another coding mode (or modes) in order to encode portions of the audio content representing non-speech content such as music. Linear prediction coding modes tend to be more suitable for coding speech content, whereas frequency-domain coding modes tend to outperform linear prediction coding modes as far as the coding of music is concerned. [003] However, using mixed coding modes renders it difficult to globally adjust the gain within an encoded bit stream or, to be more precise, the gain of the decoded representation of the audio content of an encoded bit stream, without actually having to decode the encoded bit stream and then re-encode the gain-adjusted decoded representation again, a detour which would inevitably entail a decrease in quality of the gain-adjusted bit stream due to the quantizations performed when re-encoding the gain-adjusted decoded representation. [004] In AAC, for example, an adjustment of the output level can easily be achieved at the bit stream level by changing the value of the 8-bit "global gain" field. This bit stream element can simply be parsed and edited, without the need for complete decoding and re-encoding. Thus, this process does not introduce any quality degradation and can be undone losslessly. There are applications that actually make use of this option. For example, there is free software called "AAC gain" [AAC gain] that applies exactly the method just described. This software is a derivative of the free "MP3 gain" software, which applies the same technique to MPEG-1/2 Layer 3. [005] In the newly emerging USAC codec, the FD coding mode has inherited the 8-bit global gain from AAC. Thus, if USAC were operated in FD mode only, such as for higher bit rates, the level adjustment functionality would be fully preserved compared to AAC. However, as soon as mode transitions are admitted, this possibility is no longer available. In TCX mode, for example, there is also a bit stream element having the same functionality, likewise called "global gain", which is, however, only 7 bits long. In other words, the number of bits for coding the individual global gain elements of the individual modes has so far been adapted to the respective coding mode in order to achieve a good compromise between, on the one hand, a low bit expenditure for gain control and, on the other hand, avoiding a quality degradation due to a too coarse quantization of the gain adjustability. Obviously, this compromise resulted in a different number of bits when comparing the TCX and the FD mode.
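As an illustration of the level adjustment described in paragraph [004], the following sketch edits the 8-bit global gain field of every frame of an FD (AAC-like) stream. The list-of-dicts container is purely illustrative and not an actual bitstream API, and the step size of 2^0.25 per quantizer index is carried over from the FD gain formula given further below.

```python
import math

STEP_DB = 20 * math.log10(2 ** 0.25)  # one quantizer step = factor 2^0.25 in amplitude, ~1.5 dB

def adjust_level(frames, delta_db):
    """Shift the output level of all frames by roughly delta_db without re-encoding.

    `frames` is assumed to be a list of dicts, each exposing the parsed 8-bit
    'global_gain' field of an FD frame; this container is illustrative only.
    """
    steps = round(delta_db / STEP_DB)                 # integer number of quantizer steps
    for frame in frames:
        g = frame["global_gain"]                      # 8-bit value, 0..255
        frame["global_gain"] = min(255, max(0, g + steps))
    return steps * STEP_DB                            # level change actually applied

# Example: raise the level of a two-frame stream by approximately 6 dB (4 steps).
frames = [{"global_gain": 120}, {"global_gain": 118}]
applied = adjust_level(frames, 6.0)
```

Since only an integer field is rewritten, the adjustment can be undone exactly by applying the opposite number of steps, which is what makes this kind of level control lossless.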
In the ACELP mode of the currently emerging USAC standard, the level can be controlled via a "mean energy" bit stream element, which is 2 bits long. Again, the trade-off between spending more bits on the mean energy for finer adjustability and keeping its bit consumption low obviously resulted in a different number of bits compared to the other coding modes, namely the TCX and FD coding modes. [006] Thus, up to now, globally adjusting the gain of the decoded representation of a multimode-encoded bit stream is cumbersome and tends to reduce quality. Either decoding followed by gain adjustment and re-encoding has to be performed, or the level adjustment has to be carried out heuristically by merely adapting, in the different modes, the respective bit stream elements influencing the gain of the portions of the bit stream coded in the respective coding modes. However, the latter possibility is very likely to introduce artifacts into the gain-adjusted decoded representation. [007] Thus, it is an object of the present invention to provide a multimode audio codec allowing a global gain adjustment without the detour of decoding and re-encoding, at moderate penalties in terms of quality and compression rate, and a CELP codec suitable for being incorporated into multimode audio coding so as to achieve similar properties. [008] This object is achieved by the subject matter of the independent claims attached hereto. [009] According to a first aspect of the present invention, the inventors of the present application realized that the problem encountered when trying to harmonize the global gain adjustment across the mixed coding modes stems from the fact that the mixed coding modes have different frame sizes and are differently decomposed into subframes. According to the first aspect of the present application, this difficulty is overcome by coding a bit stream element of the subframes differentially relative to the global gain value, so that a change of the global gain value of a frame results in an adjustment of the output level of the decoded representation of the audio content. At the same time, the differential coding saves bits which would otherwise be spent when introducing a new syntax element into an encoded bit stream. Furthermore, the differential coding allows the global gain adjustment overhead of an encoded bit stream to be reduced, by allowing the time resolution at which the global gain value is set to be lower than the time resolution at which the aforementioned bit stream element, differentially coded relative to the global gain value, adjusts the gain of the respective subframes.
[0010] Therefore, according to a first aspect of the present application, a multimode audio decoder for providing a decoded representation of an audio content on the basis of an encoded bit stream is configured to decode a global gain value per frame of the encoded bit stream, a first subset of the frames being coded in a first coding mode and a second subset of the frames being coded in a second coding mode, with each frame of the second subset being composed of more than one subframe, to decode, per subframe of at least a subset of the subframes of the second subset of frames, a corresponding bit stream element differentially relative to the global gain value of the respective frame, and to complete the decoding of the bit stream using the global gain value and the corresponding bit stream element in decoding the subframes of the at least a subset of the subframes of the second subset of frames, and using the global gain value in decoding the first subset of frames, wherein the multimode audio decoder is configured such that a change of the global gain value of the frames within the encoded bit stream results in an adjustment of an output level of the decoded representation of the audio content. A multimode audio encoder is, in accordance with this first aspect, configured to encode an audio content into an encoded bit stream by encoding a first subset of frames in a first coding mode and a second subset of frames in a second coding mode, wherein the frames of the second subset are composed of one or more subframes, wherein the multimode audio encoder is configured to determine and encode a global gain value per frame, and to determine and encode, per subframe of at least a subset of the subframes of the second subset, a corresponding bit stream element differentially relative to the global gain value of the respective frame, wherein the multimode audio encoder is configured such that a change of the global gain value of the frames within the encoded bit stream results in an adjustment of an output level of a decoded representation of the audio content at the decoding side. [0011] According to a second aspect of the present application, the inventors of the present application have found that a global gain control across CELP-coded frames and transform-coded frames can be achieved, while maintaining the advantages outlined above, if the codebook excitation gain of the CELP codec is co-controlled along with a level of the transform or inverse transform of the transform-coded frames. Of course, such co-control may be realized via differential coding.
[0012] Therefore, a multimode audio decoder for providing a decoded representation of an audio content on the basis of an encoded bit stream, a first subset of frames of which is CELP-coded and a second subset of frames of which is transform-coded, comprises, according to the second aspect, a CELP decoder configured to decode a current frame of the first subset, the CELP decoder comprising an excitation generator configured to generate a current excitation of the current frame of the first subset by constructing a codebook excitation based on a past excitation and a codebook index of the current frame of the first subset within the encoded bit stream, and by setting a codebook excitation gain based on a global gain value within the encoded bit stream; and a linear prediction synthesis filter configured to filter the current excitation based on linear prediction filter coefficients for the current frame of the first subset within the encoded bit stream; and a transform decoder configured to decode a current frame of the second subset by constructing spectral information for the current frame of the second subset from the encoded bit stream and performing a spectral-to-time-domain transform thereon to obtain a time-domain signal, such that a level of the time-domain signal depends on the global gain value. [0013] Likewise, a multimode audio encoder for encoding an audio content into an encoded bit stream by CELP-coding a first subset of frames of the audio content and transform-coding a second subset of the frames comprises, according to the second aspect, a CELP encoder configured to encode a current frame of the first subset, the CELP encoder comprising a linear prediction analyzer configured to generate linear prediction filter coefficients for the current frame of the first subset and to encode them into the encoded bit stream, and an excitation generator configured to determine a current excitation of the current frame of the first subset which, when filtered by a linear prediction synthesis filter based on the linear prediction filter coefficients within the encoded bit stream, recovers the current frame of the first subset, by constructing a codebook excitation based on a past excitation and a codebook index for the current frame of the first subset; and a transform encoder configured to encode a current frame of the second subset by performing a time-to-spectral-domain transform on a time-domain signal for the current frame of the second subset to obtain spectral information and to encode the spectral information into the encoded bit stream, wherein the multimode audio encoder is configured to encode a global gain value into the encoded bit stream, the global gain value depending on an energy of a version of the audio content of the current frame of the first subset filtered with a linear prediction analysis filter depending on the linear prediction coefficients, or on an energy of the time-domain signal. [0014] In accordance with a third aspect of the present application, the present inventors have found that the variation in loudness of a CELP-coded bit stream in response to changes of the respective global gain value is better adapted to the level-adjustment behavior of transform-coded frames if the global gain value of the CELP coding is computed and applied in the weighted domain of the excitation signal, instead of directly on the excitation signal itself.
In addition, computing and applying the global gain value in the weighted domain of the excitation signal is also advantageous when considering the CELP coding mode on its own, as other gains in CELP, such as the code gain and the LTP gain, are computed in the weighted domain, too. [0015] Therefore, according to the third aspect, a CELP decoder comprises an excitation generator configured to generate a current excitation for a current frame of a bit stream by constructing an adaptive codebook excitation based on a past excitation and an adaptive codebook index for the current frame within the bit stream, constructing an innovation codebook excitation based on an innovation codebook index for the current frame within the bit stream, computing an estimate of an energy of the innovation codebook excitation spectrally weighted by a weighted linear prediction synthesis filter constructed from linear prediction coefficients within the bit stream, setting a gain of the innovation codebook excitation based on a ratio between a gain value within the bit stream and the estimated energy, and combining the adaptive codebook excitation and the innovation codebook excitation to obtain the current excitation; and a linear prediction synthesis filter configured to filter the current excitation based on the linear prediction filter coefficients. [0016] Likewise, a CELP encoder comprises, according to the third aspect, a linear prediction analyzer configured to generate linear prediction filter coefficients for a current frame of an audio content and to encode the linear prediction filter coefficients into a bit stream; an excitation generator configured to determine a current excitation of the current frame as a combination of an adaptive codebook excitation and an innovation codebook excitation which, when filtered by a linear prediction synthesis filter based on the linear prediction filter coefficients, recovers the current frame, by constructing the adaptive codebook excitation defined by a past excitation and an adaptive codebook index for the current frame and encoding the adaptive codebook index into the bit stream, and by constructing the innovation codebook excitation defined by an innovation codebook index for the current frame and encoding the innovation codebook index into the bit stream; and an energy determiner configured to determine an energy of a version of the audio content of the current frame filtered with a linear prediction analysis filter depending on the linear prediction filter coefficients and with a perceptual weighting filter, so as to obtain a gain value, and to encode the gain value into the bit stream, the weighting filter being constructed from the linear prediction filter coefficients. Brief Description of Drawings [0017] Preferred embodiments of the present application are the subject of the dependent claims attached hereto.
Furthermore, preferred embodiments of the present application are described below with respect to the figures, among which: [0018] Figure 1 shows a block diagram of a multimode audio encoder according to an embodiment; [0019] Figure 2 shows a block diagram of the energy computation part of the encoder of figure 1 according to a first alternative; [0020] Figure 3 shows a block diagram of the energy computation part of the encoder of figure 1 according to a second alternative; [0021] Figure 4 shows a multimode audio decoder according to an embodiment, adapted to decode bit streams encoded by the encoder of figure 1; [0022] Figures 5a and 5b show a multimode audio encoder and a multimode audio decoder according to a further embodiment of the present invention; [0023] Figures 6a and 6b show a multimode audio encoder and a multimode audio decoder according to a further embodiment of the present invention; and [0024] Figures 7a and 7b show a CELP encoder and a CELP decoder according to a further embodiment of the present invention. [0025] Figure 1 shows a multimode audio encoder according to an embodiment of the present application. The multimode audio encoder of Figure 1 is suitable for encoding audio signals of mixed type, such as a mixture of speech and music, or the like. In order to obtain an optimum rate/distortion compromise, the multimode audio encoder is configured to switch between different coding modes so as to adapt the coding properties to the current needs of the audio content to be encoded. In particular, according to the embodiment of figure 1, the multimode audio encoder generally uses three different coding modes, namely FD (frequency-domain) coding and LP (linear prediction) coding, the latter in turn being divided into TCX (transform coded excitation) coding and CELP (code-excited linear prediction) coding. In the FD coding mode, the audio content is windowed, spectrally decomposed, and the spectral decomposition is quantized and scaled in accordance with psychoacoustics in order to hide the quantization noise below the masking threshold. In the TCX and CELP coding modes, the audio content is subject to linear prediction analysis in order to obtain linear prediction coefficients, and these linear prediction coefficients are transmitted within the bit stream along with an excitation signal which, when filtered with the corresponding linear prediction synthesis filter using the linear prediction coefficients within the bit stream, yields the decoded representation of the audio content. In the case of TCX, the excitation signal is transform-coded, while in the case of CELP the excitation signal is coded by indexing entries within a codebook or otherwise synthetically constructing a codebook vector of samples to be filtered. In ACELP (algebraic code-excited linear prediction), which is used in accordance with the present embodiment, the excitation is composed of an adaptive codebook excitation and an innovation codebook excitation. As will be outlined in more detail below, in TCX the linear prediction coefficients may also be exploited at the decoder side directly in the frequency domain to shape the quantization noise, by deriving scale factors therefrom. In this case, TCX transforms the original signal and applies the LPC result merely in the frequency domain.
[0026] Despite the different coding modes, the encoder of figure 1 generates the bit stream such that a certain syntax element associated with all frames of the encoded bit stream - with instances being associated with the individual frames or with groups of frames - allows an adaptation of the global gain across all coding modes by, for example, increasing or decreasing these global gain values by the same amount, such as by the same number of quantization steps (which is equivalent to scaling by a factor (or divisor) equal to the logarithmic base raised to the number of steps). [0027] In particular, in accordance with the different coding modes supported, the multimode audio encoder 10 of figure 1 comprises an FD encoder 12 and an LPC (linear prediction coding) encoder 14. The LPC encoder 14, in turn, is composed of a TCX coding part 16, a CELP coding part 18, and a coding mode switch 20. A further coding mode assigner comprised by encoder 10 is quite generally illustrated at 22. The mode assigner is configured to analyze the audio content 24 to be encoded in order to associate consecutive time portions thereof with different coding modes. In particular, in the case of figure 1, the mode assigner 22 assigns different consecutive time portions of the audio content 24 to either the FD coding mode or the LPC coding mode. In the illustrative example of figure 1, for instance, the mode assigner 22 has assigned portion 26 of the audio content 24 to the FD coding mode, while the immediately following portion 28 has been assigned to the LPC coding mode. Depending on the coding mode assigned by the mode assigner 22, the audio content 24 may be subdivided differently into consecutive frames. For example, in the embodiment of figure 1, the audio content 24 within portion 26 is coded in frames 30 of equal length, overlapping each other by, for example, 50%. In other words, the FD encoder 12 is configured to FD-encode portion 26 of the audio content 24 in these units 30. According to the embodiment of figure 1, the LPC encoder 14 is also configured to encode its associated portion 28 of the audio content 24 in units of frames 32, with these frames, however, not necessarily having the same size as frames 30. In the case of figure 1, for example, the size of frames 32 is smaller than the size of frames 30. In particular, according to a specific embodiment, the length of frames 30 is 2048 samples of the audio content 24, while the length of frames 32 is 1024 samples each. It may be possible that, at a border between the LPC coding mode and the FD coding mode, the last frame of one portion overlaps the first frame of the next. However, in the embodiment of figure 1, and as exemplarily shown in figure 1, it is also possible that no frame overlap exists in case of transitions from the FD coding mode to the LPC coding mode, and vice versa. [0028] As indicated in figure 1, the FD encoder 12 receives frames 30 and encodes them by frequency-domain transform coding into respective frames 34 of the encoded bit stream 36. To this end, the FD encoder 12 comprises a windower 38, a transformer 40, a quantization and scaling module 42, and a lossless encoder 44, as well as a psychoacoustic controller 46. In principle, the FD encoder 12 may be implemented in accordance with the AAC standard as far as the following description does not teach a behavior different therefrom.
In particular, the windower 38, the transformer 40, the quantization and scaling module 42 and the lossless encoder 44 are connected serially between an input 48 and an output 50 of the FD encoder 12, and the psychoacoustic controller 46 has an input connected to input 48 and an output connected to a further input of the quantization and scaling module 42. It should be noted that the FD encoder 12 may further comprise modules for additional coding options which are, however, not critical here. [0029] The windower 38 may use different windows for windowing the current frame entering input 48. The windowed frame is subject to a time-to-spectral-domain transform in transformer 40, such as an MDCT or the like. The transformer 40 may use transforms of different lengths in order to transform the windowed frames. [0030] In particular, the windower 38 may support windows the length of which matches the length of frames 30, with the transformer 40 using a transform of the same length in order to produce a number of transform coefficients which may, for example, in the case of an MDCT, correspond to half the number of samples in frame 30. The windower 38 may, however, also be configured to support coding options according to which several short windows, such as eight windows of half the length of frame 30, offset relative to each other in time, are applied to a current frame, with the transformer 40 transforming these windowed versions of the current frame using a transform length in accordance with the windows, thereby yielding eight spectra which sample the audio content of the frame at different time instants within this frame. The windows used by the windower 38 may be symmetric or asymmetric and may have a zero leading end and/or a zero trailing end. In the case of several short windows being applied to a current frame, the non-zero portions of these short windows are offset relative to each other, yet overlap each other. Of course, other coding options for the window and transform lengths of windower 38 and transformer 40 may be used according to alternative embodiments. [0031] The transform coefficients output by transformer 40 are quantized and scaled in module 42. In particular, the psychoacoustic controller 46 analyzes the input signal at input 48 in order to determine a masking threshold, according to which the quantization noise introduced by quantization and scaling is shaped so as to lie below the masking threshold. In particular, the scaling module 42 may operate in scale-factor bands which together cover the spectral domain of transformer 40 and into which the spectral domain is subdivided. Thus, groups of consecutive transform coefficients are assigned to different scale-factor bands. Module 42 determines one scale factor per scale-factor band which, when multiplied by the respective quantized transform coefficient value assigned to the respective scale-factor band, yields the reconstructed version of the transform coefficients output by transformer 40. In addition, module 42 sets a gain value which scales the spectrum uniformly. A reconstructed transform coefficient thus equals the quantized transform coefficient value times the associated scale factor times the gain value of the respective frame.
The transform coefficient values, scale factors and gain value are subject to lossless coding in the lossless encoder 44, such as by entropy coding, e.g. arithmetic or Huffman coding, along with other related syntax elements such as, for example, the window and transform length decisions mentioned above and further syntax elements enabling additional coding options. For further details in this regard, reference is made to the AAC standard concerning such additional coding options. [0032] To be slightly more precise, the quantization and scaling module 42 may be configured to transmit a quantized transform coefficient value per spectral line k which, when scaled, yields the reconstructed transform coefficient at the respective spectral line k, namely when multiplied with gain = 2^(0.25 · (sf − sf_offset)), [0033] where sf is the scale factor of the respective scale-factor band to which the respective quantized transform coefficient belongs, and sf_offset is a constant which may be set, for example, to 100. [0034] Thus, the scale factors are defined in the logarithmic domain. The scale factors may be coded within the bit stream 36 differentially to one another along the spectral axis, i.e. merely the difference between spectrally neighboring scale factors sf may be transmitted within the bit stream. The first scale factor sf may be transmitted within the bit stream coded differentially relative to the aforementioned global gain value. This global gain syntax element is of interest in the following description. [0035] The global gain value may be transmitted within the bit stream in the logarithmic domain. That is, module 42 may be configured to take the first scale factor sf of a current spectrum as the global gain. This sf value may then be transmitted differentially to zero, and the following sf values differentially to their respective predecessors. [0036] Obviously, a change of the global gain changes the energy of the reconstructed transform coefficients and thus translates into a loudness change of the FD-coded portion 26 when carried out uniformly over all frames 30. [0037] In particular, the global gain of the FD frames is transmitted within the bit stream such that the global gain depends logarithmically on a running mean of the reconstructed audio samples, or, vice versa, the running mean of the reconstructed audio time samples depends exponentially on the global gain. [0038] Similar to frames 30, all frames assigned to the LPC coding mode, namely frames 32, enter the LPC encoder 14. Within the LPC encoder 14, switch 20 subdivides each frame 32 into one or more subframes 52. Each of these subframes 52 may be assigned to the TCX coding mode or the CELP coding mode. Subframes 52 assigned to the TCX coding mode are routed to an input 54 of the TCX encoder 16, while the subframes associated with the CELP coding mode are routed by switch 20 to an input 56 of the CELP encoder 18.
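To make the relation between quantized values, scale factors and the global gain concrete, the following sketch dequantizes one scale-factor band following paragraphs [0032] to [0035]. It is a minimal illustration: the non-linear companding of the quantized values used by actual AAC is omitted, and the helper names are not taken from any standard.

```python
import numpy as np

SF_OFFSET = 100  # constant from paragraph [0033]

def decode_scale_factors(global_gain, sf_deltas):
    """First scale factor coded differentially to the global gain, the
    following ones differentially to their predecessor ([0034]-[0035])."""
    sfs, prev = [], global_gain
    for d in sf_deltas:
        prev += d
        sfs.append(prev)
    return sfs

def rescale_band(quantized_coeffs, sf):
    """Scale the quantized coefficients of one scale-factor band ([0032])."""
    gain = 2.0 ** (0.25 * (sf - SF_OFFSET))
    return np.asarray(quantized_coeffs, dtype=float) * gain
```

Note that adding a constant to global_gain shifts every reconstructed band by the same factor 2^(0.25·Δ), which is exactly the lossless level adjustment discussed in paragraph [004].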
[0039] It should be noted that the arrangement of switch 20 between input 58 of the LPC encoder 14 and inputs 54 and 56 of the TCX encoder 16 and the CELP encoder 18, respectively, is shown in figure 1 for illustration purposes only, and that, in fact, the coding decision with respect to the subdivision of frames 32 into subframes 52 and the association of the individual subframes with the respective coding modes among TCX and CELP may be made in an iterative manner between the internal elements of the TCX encoder 16 and the CELP encoder 18 in order to maximize some rate/distortion measure. [0040] In any case, the TCX encoder 16 comprises an excitation generator 60, an LP analyzer 62 and an energy determiner 64, where the LP analyzer 62 and the energy determiner 64 are co-used (and co-owned) by the CELP encoder 18, which further comprises an excitation generator 66 of its own. The respective inputs of the excitation generator 60, the LP analyzer 62 and the energy determiner 64 are connected to input 54 of the TCX encoder 16. Likewise, the respective inputs of the LP analyzer 62, the energy determiner 64 and the excitation generator 66 are connected to input 56 of the CELP encoder 18. The LP analyzer 62 is configured to analyze the audio content within the current frame, i.e. a TCX frame or a CELP frame, in order to determine linear prediction coefficients, and is connected to respective coefficient inputs of the excitation generator 60, the energy determiner 64 and the excitation generator 66 in order to pass the linear prediction coefficients to these elements. As will be described in more detail below, the LP analyzer may operate on a pre-emphasized version of the original audio content, and the respective pre-emphasis filter may be part of the LP analyzer at its input, or may be connected in front of its input. The same applies to the energy determiner 64, as will be described in more detail below. As far as the excitation generator 60 is concerned, however, it may operate on the original signal directly. Respective outputs of the excitation generator 60, the LP analyzer 62, the energy determiner 64 and the excitation generator 66, as well as output 50, are connected to respective inputs of a multiplexer 68 of encoder 10, which is configured to multiplex the received syntax elements into the bit stream 36 at output 70. [0041] As already noted above, the LP analyzer 62 is configured to determine the linear prediction coefficients for the LPC frames 32 entering it. For further details regarding the possible functionality of the LP analyzer 62, reference is made to the ACELP standard. Generally, the LP analyzer 62 may use an autocorrelation or covariance method to determine the LPC coefficients. For example, using the autocorrelation method, the LP analyzer 62 may produce an autocorrelation matrix and solve for the LPC coefficients using a Levinson-Durbin algorithm. As known in the art, the LPC coefficients define a synthesis filter which roughly models the human vocal tract and which, when driven by an excitation signal, essentially models the flow of air through the vocal cords. This synthesis filter is modeled using linear prediction by the LP analyzer 62. The rate at which the shape of the vocal tract changes is limited, and therefore the LP analyzer 62 may use an update rate adapted to this limitation and differing from the frame rate of frames 32 for updating the linear prediction coefficients.
The LP analysis performed by analyzer 62 provides information on certain filters to elements 60, 64 and 66, such as: • the linear prediction synthesis filter H(z); • its inverse, namely the linear prediction analysis filter or whitening filter A(z), with H(z) = 1/A(z); • a perceptual weighting filter, such as W(z) = A(z/λ), where λ is a weighting factor. [0042] The LP analyzer 62 passes information 72 on the LPC coefficients to multiplexer 68 for insertion into the bit stream 36. This information 72 may represent the quantized linear prediction coefficients in an appropriate domain, such as the line spectral pair domain or the like. Even the quantization of the linear prediction coefficients may be performed in this domain. In addition, the LP analyzer 62 may transmit the LPC coefficients, or the information 72 thereon, at a rate lower than the rate at which the LPC coefficients are actually reconstructed at the decoding side. The higher update rate is obtained, for example, by interpolation between the LPC transmission instants. Obviously, the decoder only has access to the quantized LPC coefficients, and therefore the aforementioned filters defined by the correspondingly reconstructed linear predictions are denoted Ĥ(z), Â(z) and Ŵ(z). [0043] As already outlined above, the LP analyzer 62 defines an LP synthesis filter H(z) and Ĥ(z), respectively, which, when applied to the respective excitation, recovers or reconstructs the original audio content, apart from some post-processing which, however, is not considered here for ease of explanation. [0044] The excitation generators 60 and 66 serve to define this excitation and to transmit respective information thereon to the decoding side via multiplexer 68 and bit stream 36, respectively. As far as the excitation generator 60 of the TCX encoder 16 is concerned, it encodes the current excitation by subjecting an appropriate excitation, found for example by some optimization scheme, to a time-to-spectral-domain transform in order to obtain a spectral version of the excitation, this spectral version being forwarded as spectral information 74 to multiplexer 68 for insertion into bit stream 36, with the spectral information being quantized and scaled, for example, analogously to the spectrum on which module 42 of the FD encoder 12 operates. [0045] That is, the spectral information 74 defining the excitation of the TCX encoder 16 for the current subframe 52 may have its transform coefficients quantized and scaled in accordance with a single scale factor which, in turn, is transmitted relative to a syntax element of the LPC frame, also called global gain in the following. As in the case of the global gain of the FD encoder 12, the global gain of the LPC encoder 14 may also be defined in the logarithmic domain. An increase of this value directly translates into an increase in loudness of the decoded representation of the audio content of the respective TCX subframe, as the decoded representation is obtained by processing the scaled transform coefficients within information 74 by linear, gain-preserving operations. These linear operations are the inverse frequency transform and, possibly, the LP synthesis filtering. As will be explained in more detail below, however, the excitation generator 60 is configured to encode the just-mentioned gain of the spectral information 74 into the bit stream at a time resolution higher than the LPC frame unit.
In particular, the excitation generator 60 uses a syntax element called delta global gain in order to encode, differentially to the global gain bit stream element, the gain actually used to scale the excitation spectrum. The delta global gain may also be defined in the logarithmic domain. The differential coding may be realized such that the delta global gain acts as a multiplicative correction of the global gain in the linear domain. [0046] In contrast to excitation generator 60, the excitation generator 66 of the CELP encoder 18 is configured to encode the current excitation of the current subframe using codebook indices. In particular, the excitation generator 66 is configured to determine the current excitation as a combination of an adaptive codebook excitation and an innovation codebook excitation. The excitation generator 66 is configured to construct the adaptive codebook excitation for a current frame such that it is defined by a past excitation, i.e. the excitation used for a previously coded CELP subframe, for example, and an adaptive codebook index for the current frame. The excitation generator 66 encodes the adaptive codebook index 76 into the bit stream by forwarding it to multiplexer 68. In addition, the excitation generator 66 constructs the innovation codebook excitation defined by an innovation codebook index for the current frame and encodes the innovation codebook index 78 into the bit stream by forwarding it to multiplexer 68 for insertion into bit stream 36. In fact, both indices may be integrated into a common syntax element. Together, they allow the decoder to recover the codebook excitation as determined by the excitation generator. In order to guarantee synchrony of the internal states of encoder and decoder, generator 66 not only determines the syntax elements allowing the decoder to recover the current codebook excitation, but also actually updates its own state by generating the same, so as to use the current codebook excitation as a starting point, i.e. as the past excitation, for coding the next CELP frame.
The remaining discrepancy is compensated by the innovation codebook excitation. Again, the excitation generator 66 appropriately sets the innovation codebook index so as to find an optimal innovation codebook excitation which, when combined with (such as added to) the adaptive codebook excitation, yields the current excitation for the current frame (which then serves as the past excitation when constructing the adaptive codebook excitation of the next CELP subframe). In other words, the adaptive codebook search may be performed on a subframe basis and consists of performing a closed-loop pitch search and then computing the adaptive code vector by interpolating the past excitation at the selected fractional pitch lag. In fact, the excitation signal u(n) is defined by the excitation generator 66 as a weighted sum of the adaptive codebook vector v(n) and the innovation codebook vector c(n), namely u(n) = ĝ_p · v(n) + ĝ_c · c(n). [0048] The pitch gain g_p is defined by the adaptive codebook index 76. The innovation codebook gain g_c is determined by the innovation codebook index 78 and by the aforementioned global gain syntax element of the LPC frames as determined by the energy determiner 64, as will be outlined below. [0049] That is, when optimizing the innovation codebook index 78, the excitation generator 66 adopts, and leaves unchanged, the innovation codebook gain g_c, merely optimizing the innovation codebook index so as to determine the pulse positions and signs of the innovation codebook vector, as well as the number of these pulses. [0050] A first method (or alternative) for defining the aforementioned global gain syntax element of the LPC frames by the energy determiner 64 is described with respect to figure 2 below. According to both alternatives described below, the global gain syntax element is determined per LPC frame 32. This syntax element then serves as a reference for the aforementioned delta global gain syntax elements of the TCX subframes belonging to the respective frame 32, as well as for the aforementioned innovation codebook gain g_c, which is determined from the global gain as described below. [0051] As shown in figure 2, the energy determiner 64 may be configured to determine the global gain syntax element 80 and may comprise a linear prediction analysis filter 82 controlled by the LP analyzer 62, an energy computer 84 and a quantization and coding stage 86, as well as a decoding stage 88 for requantization. As shown in figure 2, a pre-emphasizer or pre-emphasis filter 90 may pre-emphasize the original audio content 24 before the latter is further processed within the energy determiner 64 as described below. Although not shown in figure 1, a pre-emphasis filter may also be present in the block diagram of figure 1 directly in front of both the input of the LP analyzer 62 and that of the energy determiner 64. In other words, it may even be co-owned or co-used by both. The pre-emphasis filter 90 may be given by H_emph(z) = 1 − α·z^(−1). [0052] Thus, the pre-emphasis filter may be a high-pass filter. Here, it is a first-order high-pass filter, but more generally it may even be an nth-order high-pass filter. In the present case, it is exemplarily a first-order high-pass filter with α set to 0.68. [0053] The input of the energy determiner 64 of figure 2 is connected to the output of the pre-emphasis filter 90.
Between the input and the output 80 of the energy determiner 64, the LP analysis filter 82, the energy computer 84 and the quantization and coding stage 86 are connected serially in the order mentioned. The decoding stage 88 has its input connected to the output of the quantization and coding stage 86 and outputs the quantized gain as obtained at the decoder. [0054] In particular, applying the linear prediction analysis filter A(z) 82 to the pre-emphasized audio content results in an excitation signal 92. Thus, the excitation 92 equals the pre-emphasized version of the original audio content 24 filtered by the LPC analysis filter A(z), i.e. the original audio content 24 filtered with A(z)·H_emph(z). [0055] Based on this excitation signal 92, the common global gain for the current frame 32 is derived by computing the energy over all 1024 samples of this excitation signal 92 within the current frame 32. [0056] In particular, the energy computer 84 averages the energy of signal 92 per segments of 64 samples in the logarithmic domain. [0057] The gain g is then quantized by the quantization and coding stage 86 with 6 bits in the logarithmic domain based on the mean energy nrg. [0058] This index is then transmitted within the bit stream as syntax element 80, i.e. as the global gain. It is defined in the logarithmic domain; in other words, the quantization step size increases exponentially. The quantized gain is obtained by the decoding stage 88 accordingly. [0059] The quantization used here has the same granularity as the quantization of the global gain of the FD mode, and therefore scaling this index scales the loudness of the LPC frames 32 in the same way as scaling the global gain syntax elements of the FD frames 30, thereby achieving an easy way of controlling the gain of the encoded multimode bit stream 36 without having to take the detour of decoding and re-encoding, while still maintaining quality. [0060] As will be outlined in more detail below with respect to the decoder, owing to the aforementioned state synchrony maintained between encoder and decoder (excitation update), the excitation generator 66 may, in optimizing or after having optimized the codebook indices, a) compute, based on the global gain, a prediction g_c' of the innovation codebook gain, b) multiply the predicted gain g_c' with an innovation codebook correction factor f to yield the actual innovation codebook gain g_c, and c) actually generate the codebook excitation by combining the adaptive codebook excitation and the innovation codebook excitation, weighting the latter with the actual innovation codebook gain g_c. [0061] In particular, in accordance with the present alternative, the quantization and coding stage 86 transmits the index within the bit stream, and the excitation generator 66 accepts the quantized gain g as a predefined, fixed reference when optimizing the innovation codebook excitation. [0062] In particular, the excitation generator 66 optimizes the innovation codebook gain g_c by optimizing only the innovation codebook index, which also defines f, the innovation codebook gain correction factor.
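The first alternative (figure 2) can be summarized by the following sketch. Since the exact averaging and quantization formulas are not reproduced in this text, the log-domain averaging over 64-sample segments and the 6-bit quantizer step below are assumptions chosen merely to match the description; `lpc_analysis_coeffs` stands for the quantized coefficients of A(z).

```python
import numpy as np
from scipy.signal import lfilter

ALPHA = 0.68  # pre-emphasis factor from paragraph [0052]

def global_gain_index_first_alternative(frame, lpc_analysis_coeffs):
    """Derive the 6-bit per-frame global gain index from the LPC residual energy.

    frame: 1024 audio samples of the current LPC frame 32.
    lpc_analysis_coeffs: coefficients [1, a1, ..., ap] of the analysis filter A(z).
    """
    frame = np.asarray(frame, dtype=float)
    # Pre-emphasis H_emph(z) = 1 - ALPHA * z^-1 (block 90).
    pre = lfilter([1.0, -ALPHA], [1.0], frame)
    # Excitation 92 = pre-emphasized signal filtered with A(z) (block 82).
    exc = lfilter(np.asarray(lpc_analysis_coeffs, dtype=float), [1.0], pre)
    # Mean energy per 64-sample segment, averaged in the log domain (block 84);
    # the exact formula is an assumption here.
    seg_energy = np.mean(exc.reshape(-1, 64) ** 2, axis=1) + 1e-12
    nrg = np.mean(np.log2(seg_energy))
    # 6-bit quantization in the log domain (block 86); step size assumed.
    return int(np.clip(round(nrg), 0, 63))
```

The decoder-side requantization (stage 88) simply inverts this mapping, so the excitation generator 66 can use the very same quantized gain as its fixed reference when searching the innovation codebook.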
In particular, the innovation codebook gain correction factor f determines the innovation codebook gain as g_c = f · g_c'. [0063] As will be further described below, the TCX gain is coded by transmitting the delta global gain element coded with 5 bits. [0064] It is decoded accordingly. [0065] Thus, [0066] as far as the gains of the CELP subframes and TCX subframes are concerned, according to the first alternative described with respect to figure 2, the global gain index is coded with 6 bits per frame or superframe 32. This results in the same gain granularity as for the coding of the global gain of the FD mode. In this case, however, the per-frame global gain index is coded with only 6 bits, whereas the global gain in FD mode is sent with 8 bits. Thus, the global gain element is not the same for the LPD (linear prediction domain) and FD modes. However, as the gain granularity is similar, a unified gain control can easily be applied. In particular, the logarithmic domain for coding the global gain in the FD and LPD modes is advantageously based on the same logarithmic base 2. [0067] In order to completely harmonize both global gain elements, it would be simple to extend the coding to 8 bits also as far as the LPD frames are concerned. As far as the CELP subframes are concerned, the global gain index syntax element completely assumes the task of gain control. The aforementioned delta global gain elements within the TCX subframes may be coded with 5 bits differentially to the global gain of the superframe. Compared to the case where the above multimode coding scheme were implemented with plain AAC, ACELP and TCX, the above concept according to the alternative of figure 2 would result in 2 bits less per superframe 32 consisting merely of TCX20 and/or ACELP subframes, and would consume 2 or 4 additional bits per superframe in case the respective superframe comprises a TCX40 or TCX80 subframe, respectively. [0068] In terms of signal processing, the per-frame global gain index represents the mean LPC residual energy over the superframe 32, quantized on a logarithmic scale. In (A)CELP, it is used instead of the "mean energy" element usually used in ACELP to estimate the innovation codebook gain. The new estimate according to the present first alternative according to figure 2 has more amplitude resolution than in the ACELP standard, but also a lower time resolution, as one gain is merely transmitted per superframe instead of per subframe. However, it was found that the residual energy is a weak estimator anyway and is merely used as a coarse indication of the gain range. As a consequence, the time resolution is probably of minor importance. To avoid any problems during transients, the excitation generator 66 may be configured to systematically underestimate the innovation codebook gain and let the gain adjustment (the correction factor) recover the gap. This strategy can counterbalance the lack of time resolution. [0069] In addition, the superframe global gain is also used in TCX as an estimate of the "global gain" element determining the scaling gain, as mentioned above. Because the per-frame global gain index represents the LPC residual energy while the global TCX gain relates to the weighted signal energy, the differential coding of the gain by means of the delta global gain implicitly includes some LPC gain. Nevertheless, the differential gain still shows a smaller spread than the plain "global gain".
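A compact way to see the differential coding of the TCX gain relative to the per-frame (superframe) global gain discussed above is the following relation; the step constant c is an assumption, since the exact 5-bit quantizer law is not reproduced in this text:

$$ g_{\mathrm{TCX}} \;=\; g_{\mathrm{global}} \cdot 2^{\,c\,\Delta} \qquad\Longleftrightarrow\qquad \log_2 g_{\mathrm{TCX}} \;=\; \log_2 g_{\mathrm{global}} \;+\; c\,\Delta $$

Thus a change of the superframe global gain shifts the effective TCX subframe gain by the same amount in the logarithmic domain, while the transmitted delta Δ is left untouched; this is what allows a unified level control of the bit stream without re-encoding.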
[0070] For mono at 12 kbps and 24 kbps, some perceptual tests were performed, focusing mainly on clean speech quality. The quality was found to be very close to that of the current USAC, which differs from the above embodiment in that the normal gain control of the AAC and ACELP/TCX standards is used. However, for certain speech items, the quality tended to be slightly worse. [0071] After having described the embodiment of figure 1 according to the alternative of figure 2, the second alternative is described with respect to figures 1 and 3. According to the second method for the LPD mode, some disadvantages of the first alternative are solved: • The ACELP innovation gain prediction failed for some subframes of frames with high amplitude dynamics. This was mainly due to the energy computation, which was a geometric averaging. Although the average SNR was better than with the original ACELP, the codebook gain adjustment saturated more often. This was presumed to be the main reason for the slight degradation perceived for certain speech items. • Furthermore, the prediction of the ACELP innovation gain was also not optimal. In effect, the gain is optimized in the weighted domain, while the gain prediction was computed in the LPC residual domain. The idea of the following alternative is to carry out the prediction in the weighted domain. • The prediction of the individual TCX global gains was not optimal either, as the transmitted energy was computed on the LPC residual, while TCX computes its gain in the weighted domain. [0072] The main difference to the previous scheme is that the global gain now represents the weighted signal energy instead of the excitation energy. [0073] In terms of bit stream, the modifications compared to the first method are as follows: • A global gain coded with 8 bits, with the same quantizer as in the FD mode. Both the LPD and FD modes now share the same bit stream element. It was found that the global gain in AAC has good reasons to be coded with 8 bits with such a quantizer. 8 bits is definitely a lot for the global gain of the LPD mode, which could be coded with only 6 bits; however, this is the price to pay for unification. • Coding of the individual TCX global gains with a differential coding, using: o 1 bit for TCX1024, fixed-length code; o 4 bits on average for TCX256 and TCX512, variable-length codes (Huffman). [0074] In terms of bit consumption, the second method differs from the first as follows: • For ACELP: same bit consumption as before; • For TCX1024: +2 bits; • For TCX512: +2 bits on average; • For TCX256: same average bit consumption as before. [0075] In terms of quality, the second method differs from the first in that: • TCX audio portions should sound the same, as the overall quantization granularity is kept unchanged. • ACELP audio portions can be expected to be slightly improved, as the prediction was enhanced. Collected statistics show fewer outlier values in the gain adjustment than with the current ACELP. [0076] See, for example, figure 3. Figure 3 shows the energy determiner 64 as comprising a weighting filter W(z) 100, followed by an energy computer 102 and a quantization and coding stage 104, as well as a decoding stage 106. In fact, these elements are arranged relative to one another like elements 82 to 88 in figure 2. [0077] The weighting filter is defined as W(z) = Â(z/λ), [0078] where λ is a perceptual weighting factor that can be set to 0.92.
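The weighting filter used by block 100 may be sketched as follows, assuming the form W(z) = Â(z/λ), i.e. the LPC analysis filter with bandwidth expansion; since the exact formula of paragraph [0077] is not reproduced in this text, this form is an assumption.

```python
import numpy as np
from scipy.signal import lfilter

LAMBDA = 0.92  # perceptual weighting factor from paragraph [0078]

def perceptual_weighting(signal, a_hat):
    """Filter `signal` with the weighting filter W(z), here assumed to be
    W(z) = A_hat(z / LAMBDA), i.e. the analysis filter with its k-th
    coefficient scaled by LAMBDA**k. The pre-emphasis is not part of W(z),
    cf. paragraph [0079].

    a_hat: quantized analysis-filter coefficients [1, a1, ..., ap].
    """
    a_hat = np.asarray(a_hat, dtype=float)
    a_weighted = a_hat * (LAMBDA ** np.arange(len(a_hat)))
    return lfilter(a_weighted, [1.0], np.asarray(signal, dtype=float))
```

Scaling the k-th coefficient by λ^k expands the bandwidth of the formant structure, which is the usual way of turning the analysis filter into a perceptual weighting filter.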
[0079] Thus, according to the second method, the common global gain for the TCX and CELP subframes 52 is derived from an energy computation performed every 1024 samples on the weighted signal, i.e. in units of the LPC frame 32. The weighted signal is computed in the encoder within filter 100 by filtering the original signal 24 with the weighting filter W(z) derived from the LPC coefficients as output by the LP analyzer 62. By the way, the aforementioned pre-emphasis is not part of W(z). It is only used before computing the LPC coefficients, i.e. inside or in front of the LP analyzer 62, and before the ACELP processing, i.e. inside or in front of the excitation generator 66. In a way, the pre-emphasis is already reflected in the coefficients of A(z). [0080] The energy computer 102 then determines the energy. [0081] The quantization and coding stage 104 then quantizes the global gain with 8 bits in the logarithmic domain based on the mean energy nrg. [0082] The quantized global gain is then obtained by the decoding stage 106. [0083] As will be outlined in more detail below with respect to the decoder, owing to the aforementioned state synchrony maintained between encoder and decoder (excitation update), the excitation generator 66 may, in optimizing or after having optimized the codebook indices, a) estimate the energy of the innovation codebook excitation as determined by a first information contained within the - provisionally candidate or finally transmitted - innovation codebook index, namely the aforementioned number, positions and signs of the pulses of the innovation codebook vector, by filtering the respective innovation codebook vector with the LP synthesis filter, weighted by the weighting filter W(z) and the de-emphasis filter, i.e. the inverse of the pre-emphasis filter (filter H2(z), see below), and determining the energy of the result, b) form a ratio between the energy thus derived and an energy E = 20·log(g) determined by the global gain, in order to obtain a predicted gain g', c) multiply the predicted gain g' with the innovation codebook correction factor f to yield the actual innovation codebook gain g_c, and d) actually generate the codebook excitation by combining the adaptive codebook excitation and the innovation codebook excitation, weighting the latter with the actual innovation codebook gain g_c. [0084] In particular, the quantization thus achieved has the same granularity as the quantization of the global gain of the FD mode. Again, the excitation generator 66 may adopt, and treat as constant, the quantized global gain g when optimizing the innovation codebook excitation. In particular, the excitation generator 66 may set the innovation codebook correction factor by finding the optimal innovation codebook index such that the optimal fixed codebook gain results, namely according to: [0085] subject to: [0086] where c_w is the innovation vector c[n] in the weighted domain, obtained by a convolution for n = 0 to 63 according to c_w[n] = Σ_{i=0..n} c[i] · h2[n − i], [0087] where h2 is the impulse response of the weighted synthesis filter, [0088] with γ = 0.92 and α = 0.68, for example. [0089] The TCX gain is coded by transmitting the delta global gain element coded with a variable-length code.
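The second alternative (figure 3) may then be sketched as follows, reusing `perceptual_weighting` from the previous sketch. Both the quantizer law of stage 104 and the exact form of the innovation-gain prediction of paragraph [0083] are elided in this text, so the formulas below are assumptions intended only to illustrate the data flow.

```python
import numpy as np

def global_gain_index_second_alternative(frame, a_hat):
    """8-bit global gain per 1024-sample LPC frame from the weighted-signal
    energy (blocks 100-104 of figure 3); the log-domain quantizer step is
    an assumption."""
    weighted = perceptual_weighting(frame, a_hat)        # filter 100
    nrg = np.log2(np.mean(weighted ** 2) + 1e-12)        # energy computer 102
    return int(np.clip(round(nrg * 2), 0, 255))          # stage 104, step size assumed

def predict_innovation_gain(g_quantized, c_w):
    """Predicted innovation gain g' as a ratio between the level conveyed by the
    dequantized global gain (linear domain) and the RMS of the innovation vector
    filtered into the weighted domain, cf. paragraph [0083]. The exact formula is
    not reproduced in the text; this is an assumed form. The actual gain is then
    g_c = f * g', with f being the transmitted correction factor."""
    rms_innov = np.sqrt(np.mean(np.asarray(c_w, dtype=float) ** 2) + 1e-12)
    return g_quantized / rms_innov
```

Because both the global gain and the innovation-gain prediction now live in the weighted domain, a change of the transmitted global gain moves the CELP subframe level consistently with the TCX and FD portions.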
[0090] If the TCX has a size of 1024, only 1 bit is used for the delta global gain element, while the global gain is recomputed and requantized:
[0091] It is decoded as follows:
[0092] Otherwise, for the other TCX sizes, the delta global gain is coded as follows:
[0093] The TCX gain is then decoded as follows:
[0094] The delta global gain can be encoded directly with 7 bits or using Huffman codes, which can yield 4 bits on average.
[0095] Finally, and in both cases, the final gain is derived:
[0096] In the following, a multimode audio decoder corresponding to the embodiment of figure 1, with respect to the two alternatives described with respect to figures 2 and 3, is described with respect to figure 4.
[0097] The multimode audio decoder of figure 4 is generally indicated with reference sign 120 and comprises a demultiplexer 122, an FD decoder 124, an LPC decoder 126 composed of a TCX decoder 128 and a CELP decoder 130, and an overlap/transition handler 132.
[0098] The demultiplexer 122 comprises an input 134 concurrently forming the input of the multimode audio decoder 120. Bit stream 36 of figure 1 enters input 134. Demultiplexer 122 comprises several outputs connected to decoders 124, 128, and 130, and distributes the syntax elements of which the bit stream is composed to the individual decoding engines. In fact, demultiplexer 122 distributes frames 34 and 35 of bit stream 36 to the respective decoder 124, 128 and 130, respectively.
[0099] Each of decoders 124, 128 and 130 comprises a time domain output connected to a respective input of overlap/transition handler 132. Overlap/transition handler 132 is responsible for performing the respective overlap/transition handling at transitions between consecutive frames. For example, overlap/transition handler 132 can perform an overlap/add procedure between windows of consecutive FD frames. The same applies to TCX subframes. Although not described in detail with respect to figure 1, for example, excitation generator 60 also uses windowing followed by a time-to-spectral domain transformation in order to obtain the transform coefficients representing the excitation, and the windows may overlap each other. When transitioning to/from CELP subframes, overlap/transition handler 132 can take special measures to avoid aliasing. For this purpose, the overlap/transition handler 132 can be controlled by respective syntax elements transmitted through bit stream 36. However, as these transition measures are beyond the focus of the present application, reference is made, for example, to the AMR-WB+ standard for illustrative solutions in this regard.
[00100] The FD decoder 124 comprises a lossless decoder 134, a dequantization and rescaling module 136, and a retransformer 138, which are connected serially between demultiplexer 122 and overlap/transition handler 132 in this order. The lossless decoder 134 retrieves, for example, the scale factors from the bit stream, which are, for example, differentially encoded therein. The dequantization and rescaling module 136 recovers the transform coefficients by, for example, scaling the transform coefficient values of the individual spectral lines with the scale factors of the corresponding scale factor bands to which these transform coefficient values belong.
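A minimal sketch of this per-band rescaling might look as follows. The 4/3-power inverse quantization and the 0.25 log2 step per scale factor unit are AAC-style assumptions and are not reproduced from the text.

```python
import numpy as np

def dequantize_fd_spectrum(quant_lines, scale_factors, band_offsets):
    # Module 136: inverse quantization of the spectral lines, then per-band rescaling
    # with the scale factor of the scale factor band each line belongs to.
    quant_lines = np.asarray(quant_lines, dtype=float)
    spec = np.sign(quant_lines) * np.abs(quant_lines) ** (4.0 / 3.0)
    for b, sf in enumerate(scale_factors):
        start, stop = band_offsets[b], band_offsets[b + 1]
        spec[start:stop] *= 2.0 ** (0.25 * sf)
    return spec
```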
Retransformer 138 performs a spectral-to-time domain transformation on the transform coefficients thus obtained, such as an inverse MDCT, in order to obtain a time domain signal to be forwarded to overlap/transition handler 132. Either the dequantization and rescaling module 136 or the retransformer 138 uses the global gain syntax element transmitted within the bit stream for all FD frames, so that the time domain signal resulting from the retransformation is scaled according to this syntax element (that is, linearly scaled with some exponential function applied to it). In fact, the scaling can be performed before the spectral-to-time domain transformation or subsequent to it.
[00101] The TCX decoder 128 comprises an excitation generator 140, a spectrum former 142, and an LP coefficient converter 144. Excitation generator 140 and spectrum former 142 are connected serially between demultiplexer 122 and a further input of the overlap/transition handler 132, and LP coefficient converter 144 provides a further input of the spectrum former 142 with a spectrum of weighting values obtained from the LPC coefficients transmitted through the bit stream. In particular, the TCX decoder 128 operates on the TCX subframes among subframes 52. Excitation generator 140 treats the incoming information similarly to components 134 and 136 of the FD decoder 124. That is, excitation generator 140 dequantizes and rescales the transform coefficient values transmitted within the bit stream in order to represent the excitation in the spectral domain. The transform coefficients thus obtained are scaled by excitation generator 140 with a value corresponding to a sum of the delta global gain syntax element transmitted for the current TCX subframe 52 and the global gain syntax element transmitted for the current frame 32 to which the current TCX subframe 52 belongs. Thus, excitation generator 140 outputs a spectrum representing the excitation of the current subframe, scaled according to the delta global gain and the global gain. LP coefficient converter 144 converts the LPC coefficients transmitted within the bit stream - by means of, for example, interpolation and differential decoding, or the like - into spectral weighting values, namely one spectral weighting value per transform coefficient of the excitation spectrum output by excitation generator 140. In particular, the LP coefficient converter 144 determines this spectrum of weighting values such that it resembles the transfer function of a linear prediction synthesis filter; in other words, it resembles the transfer function of the LP synthesis filter H(z). Spectrum former 142 spectrally weights the incoming transform coefficients from excitation generator 140 with the spectral weights obtained by the LP coefficient converter 144, in order to obtain spectrally weighted transform coefficients that are then subjected to a spectral-to-time domain transformation in retransformer 146, so that retransformer 146 outputs a reconstructed version, or decoded representation, of the audio content of the current TCX subframe. It is noted, however, that, as mentioned above, post-processing can be performed on the output of retransformer 146 before forwarding the time domain signal to overlap/transition handler 132. In any case, the level of the time domain signal output by retransformer 146 is again controlled by the global gain syntax element of the respective LPC frame 32.
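For illustration, the TCX decoding path of paragraph [00101] can be sketched as follows. The combination of the frame gain and the subframe delta gain as a plain product of their linearized values, and the 'imdct' callable, are assumptions; the actual spectral weighting values come from LP coefficient converter 144 as described above.

```python
import numpy as np

def decode_tcx_subframe(quant_spectrum, g_frame, g_delta, lpc_weights, imdct):
    # Excitation generator 140: dequantized excitation spectrum scaled by the product of
    # the linearized frame global gain and the subframe delta gain.
    spectrum = np.asarray(quant_spectrum, dtype=float) * (g_frame * g_delta)
    # Spectrum former 142 with LP coefficient converter 144: spectral weighting resembling
    # the LP synthesis filter transfer function.
    spectrum = spectrum * np.asarray(lpc_weights, dtype=float)
    # Retransformer 146: spectral-to-time transform (an inverse MDCT in the text).
    return imdct(spectrum)
```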
[00102] The CELP decoder 130 of figure 4 comprises an innovation codebook builder 148, an adaptive codebook builder 150, a gain adapter 152, a combiner 154, and an LP synthesis filter 156. Innovation codebook builder 148, gain adapter 152, combiner 154, and LP synthesis filter 156 are connected serially between demultiplexer 122 and overlap/transition handler 132. Adaptive codebook builder 150 has an input connected to demultiplexer 122 and an output connected to a further input of combiner 154, which in turn may be embodied as an adder as shown in figure 4. A further input of adaptive codebook builder 150 is connected to an output of adder 154 in order to obtain the past excitation. Gain adapter 152 and LP synthesis filter 156 have LPC inputs connected to a respective output of demultiplexer 122.
[00103] After having described the structure of the TCX decoder and the CELP decoder, their functionality is described in more detail below. The description starts with the functionality of the TCX decoder 128 and then proceeds to the functionality of the CELP decoder 130. As already described above, LPC frames 32 are subdivided into one or more subframes 52. Generally, CELP subframes 52 are restricted to have a length of 256 audio samples. TCX subframes 52 can have different lengths. TCX 20 (TCX 256) subframes 52, for example, have a sample length of 256. TCX 40 (TCX 512) subframes 52 have a length of 512 audio samples, and TCX 80 (TCX 1024) subframes have a length of 1024 samples, that is, they span the complete LPC frame 32. TCX 40 subframes can merely be positioned within the two leading quarters of the current LPC frame 32, or the two trailing quarters thereof. In total, there are thus 26 different combinations of subframe types into which an LPC frame 32 can be subdivided.
[00104] Thus, as just mentioned, TCX subframes 52 are of different lengths. Considering the sample lengths just described, namely 256, 512, and 1024, one might think that these TCX subframes do not overlap each other. However, this is not correct as far as the window length and the transform length are concerned, measured in samples and used in order to perform the decomposition of the excitation into a spectrum. The transform length used by windower 38, for example, extends beyond the leading and trailing ends of each current TCX subframe, and the corresponding window used for windowing the excitation likewise extends into regions beyond the leading and trailing ends of the respective current TCX subframe, so as to comprise non-zero portions overlapping the preceding and succeeding subframes of the current subframe, in order to allow for aliasing cancellation as known from FD coding, for example. Thus, excitation generator 140 receives quantized spectral coefficients from the bit stream and reconstructs the excitation spectrum from them. This spectrum is scaled depending on a combination of the delta global gain of the current TCX subframe and the global gain of the current frame 32 to which the current subframe belongs. In particular, the combination may involve a multiplication of both values in the linear domain (corresponding to a sum in the logarithmic domain in which both gain syntax elements are defined). The excitation spectrum is thus scaled according to the global gain syntax elements.
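For illustration, the handling of the overlapping windows just described can be sketched as follows. A plain addition of already windowed segments is a simplification of the actual MDCT time-domain aliasing cancellation performed by overlap/transition handler 132.

```python
import numpy as np

def overlap_add(previous_tail, current_windowed, overlap):
    # Handler 132: add the trailing, windowed portion of the previous subframe to the
    # leading, windowed portion of the current one.
    out = np.array(current_windowed, dtype=float)
    out[:overlap] += np.asarray(previous_tail, dtype=float)[-overlap:]
    return out
```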
Spectrum former 142 then performs an LPC-based frequency domain noise shaping on the resulting spectral coefficients, followed by the inverse MDCT transformation performed by retransformer 146 to obtain the time domain synthesis signal. Overlap/transition handler 132 can perform an overlap-add process between consecutive TCX subframes.
[00105] The CELP decoder 130 acts on the aforementioned CELP subframes which have, as noted above, a length of 256 audio samples each. As noted above, the CELP decoder 130 is configured to build the current excitation as a combination, or addition, of scaled adaptive codebook and innovation codebook vectors. The adaptive codebook builder 150 uses the adaptive codebook index retrieved from the bit stream through demultiplexer 122 to find the integer and fractional parts of the pitch lag. The adaptive codebook builder 150 can then find the initial adaptive codebook excitation vector v'(n) by interpolating the past excitation u(n) at the pitch delay and phase, i.e. fraction, using an FIR interpolation filter. The adaptive codebook excitation is computed for a size of 64 samples. Depending on a syntax element called the adaptive filter index, retrieved from the bit stream, the adaptive codebook builder can decide whether the filtered adaptive codebook is v(n) = v'(n) or v(n) = 0.18 v'(n) + 0.64 v'(n-1) + 0.18 v'(n-2).
[00106] Innovation codebook builder 148 uses the innovation codebook index retrieved from the bit stream to extract the positions and amplitudes, i.e. signs, of the excitation pulses within an algebraic code vector, i.e. the innovation code vector c(n). That is,
[00107] where mi and si are the pulse positions and signs and M is the number of pulses. Once the algebraic code vector c(n) is decoded, a sharpening procedure is performed. First, c(n) is filtered through a pre-emphasis filter defined as follows:
[00108] The pre-emphasis filter has the role of reducing the excitation energy at low frequencies. Of course, the pre-emphasis filter can be defined in another way. Then, a periodicity enhancement can be carried out by the innovation codebook builder 148. This periodicity enhancement can be carried out by means of an adaptive pre-filter with a transfer function defined as:
[00109] where n is the current position within the units of immediately consecutive groups of 64 audio samples, and where T is a rounded version of the integer part T0 and fractional part T0,frac of the pitch lag, as determined by:
[00110] The adaptive pre-filter Fp(z) colors the spectrum by damping inter-harmonic frequencies, which are annoying to the human ear in the case of voiced signals.
[00111] The received innovation and adaptive codebook gain index within the bit stream directly provides the adaptive codebook gain gp and the innovation codebook gain correction factor f. The innovation codebook gain is then computed by multiplying the correction factor f by an estimated innovation codebook gain g'. This is done by the gain adapter 152.
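For illustration, the construction of the innovation code vector described in paragraphs [00106] to [00110] might be sketched as follows. The transfer functions of the pre-emphasis filter and of the adaptive pre-filter Fp(z) are not reproduced in the text, so the filter forms and coefficients used here are stand-in assumptions only.

```python
import numpy as np
from scipy.signal import lfilter

def build_innovation(positions, signs, T, n_sub=64, alpha=0.3, beta=0.85):
    # Innovation codebook builder 148: place the decoded pulses to form c(n).
    c = np.zeros(n_sub)
    for m, s in zip(positions, signs):
        c[m] += s
    # Reduce low-frequency energy with a pre-emphasis filter (assumed 1 - alpha*z^-1).
    c = lfilter([1.0, -alpha], [1.0], c)
    # Periodicity enhancement with an adaptive pre-filter; 1/(1 - beta*z^-T) is a stand-in.
    if 0 < T < n_sub:
        den = np.zeros(T + 1)
        den[0], den[T] = 1.0, -beta
        c = lfilter([1.0], den, c)
    return c
```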
[00112] According to the first alternative mentioned above, the gain adapter 152 performs the following steps:
[00113] First, E, which is transmitted by means of the transmitted global gain and represents the mean excitation energy per superframe 32, serves for deriving an estimated gain G' in dB, that is
[00114] The mean innovation excitation energy in a superframe 32, E, is thus encoded with 6 bits per superframe by means of the global gain, and E is derived from the global gain via its quantized version ĝ by: E = 20·log(ĝ)
[00115] The predicted gain in the linear domain is then derived by the gain adapter 152 by:
[00116] The quantized fixed codebook gain is then computed by the gain adapter 152 as gc = f·g'.
[00117] As described, the innovation codebook excitation is then scaled with gc, while adaptive codebook builder 150 scales the adaptive codebook excitation with gp, and a weighted sum of both codebook excitations is formed in combiner 154.
[00118] According to the second of the alternatives outlined above, the estimated fixed codebook gain is formed by the gain adapter 152 as follows:
[00119] First, the mean innovation energy is found. The mean innovation energy Ei represents the innovation energy in the weighted domain. It is calculated by convolving the decoded innovation with the impulse response h2 of the following weighted synthesis filter:
[00120] The innovation in the weighted domain is then obtained by convolution for n = 0 to 63: cw[n] = c[n] * h2[n]
[00121] The energy is then:
[00122] Then, the estimated gain G' in dB is found by
[00123] where, again, E is transmitted by means of the transmitted global gain and represents the mean excitation energy per superframe 32 in the weighted domain. The mean energy in a superframe 32, E, is thus encoded with 8 bits per superframe by means of the global gain, and E is derived from the global gain via its quantized version ĝ by:
[00124] The predicted gain in the linear domain is then derived by the gain adapter 152 by:
[00125] The quantized fixed codebook gain is then derived by the gain adapter 152 by
[00126] The above description has not gone into detail as far as the determination of the TCX gain, by which the excitation spectrum is scaled, is concerned, according to the two alternatives outlined above. According to the first alternative, the TCX gain is - as already described above - encoded by transmitting the delta global gain element encoded with 5 bits on the encoding side according to:
[00127] It is decoded by the excitation generator 140, for example, as follows:
[00128] where ĝ denotes the quantized version of the global gain according to ĝ = 2^(global gain/4), with the global gain, in turn, being present within the bit stream for the LPC frame 32 to which the current TCX subframe belongs.
[00129] Then, excitation generator 140 scales the excitation spectrum by multiplying each transform coefficient with the gain g obtained by:
[00130] According to the second method presented above, the TCX gain is encoded by transmitting the delta global gain element encoded with a variable length code, for example.
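Before detailing this variable length coding, the fixed codebook gain derivation of paragraphs [00112] to [00116] (first alternative) can be illustrated with a minimal sketch. Since the corresponding equations are not reproduced above, the energy normalisation used here is an assumption.

```python
import numpy as np

def adapt_innovation_gain(g_hat, c, correction_factor):
    # Gain adapter 152, first alternative: the transmitted global gain gives the mean
    # excitation energy E in dB; the predicted gain removes the energy of the decoded
    # innovation so that the correction factor only covers the residual deviation.
    E = 20.0 * np.log10(g_hat + 1e-12)
    Ei = 10.0 * np.log10(np.mean(np.square(np.asarray(c, dtype=float))) + 1e-12)
    g_pred = 10.0 ** ((E - Ei) / 20.0)    # predicted gain g' in the linear domain
    return correction_factor * g_pred     # quantized fixed codebook gain gc = f * g'
```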
If the TCX subframe currently under consideration has a size of 1024, only 1 bit may be used for the delta global gain element, while the global gain is recomputed and requantized on the encoding side according to:
[00131] Excitation generator 140 then derives the TCX gain from the gain index by
[00132] thereby computing
[00133] For the other TCX sizes, however, the delta global gain can be obtained by the excitation generator 140 as follows:
[00134] The TCX gain is then decoded by the excitation generator 140 as follows: thereby computing
[00135] in order to obtain the gain by which excitation generator 140 scales each transform coefficient.
[00136] For example, the delta global gain can be encoded directly with 7 bits or using Huffman codes, which can yield 4 bits on average. Thus, according to the above embodiment, it is possible to encode the audio content using multiple modes. In the above embodiment, three encoding modes have been used, namely FD, TCX and ACELP. Despite the use of three different modes, it is easy to adjust the loudness of the decoded representation of the audio content encoded in bit stream 36. In particular, according to both methods described above, it is merely necessary to equally increase/decrease the global gain syntax elements contained in each of frames 30 and 32, respectively. For example, all these global gain syntax elements can be increased by 2 in order to increase the loudness equally across the different coding modes, or decreased by 2 in order to decrease the loudness equally across the different coding modes.
[00137] After having described an embodiment of the present application, further embodiments are described in the following that are more generic and individually concentrate on individual advantageous aspects of the multimode audio encoder and decoder described above. In other words, the embodiment described above represents a possible implementation for each of the three embodiments outlined subsequently. The above embodiment incorporates all the advantageous aspects to which the merely individual embodiments outlined below refer.
[00138] Each of the subsequently described embodiments focuses on one aspect of the above-explained multimode audio codec which is advantageous irrespective of the specific implementation used in the previous embodiment, that is to say it can be implemented differently than before. The aspects to which the embodiments outlined below belong can be realized individually and do not have to be implemented concurrently as illustratively described with respect to the embodiment outlined above.
[00139] Therefore, when describing the embodiments below, the elements of the respective encoder and decoder embodiments are indicated by the use of new reference signs. Behind these reference signs, however, the reference numbers of elements in figures 1 to 4 are presented in parentheses, these latter elements representing possible implementations of the respective element of the subsequently described figures. In other words, the elements in the figures described below can be implemented as described above with respect to the elements indicated in parentheses behind the respective reference numeral of the element, individually or with respect to all elements of the respective figure described below.
[00140] Figures 5a and 5b show a multimode audio encoder and a multimode audio decoder according to a first embodiment.
The multimode audio encoder of figure 5a, generally indicated at 300, is configured to encode an audio content 302 into an encoded bit stream 304 by encoding a first subset of frames 306 in a first encoding mode 308 and a second subset of frames 310 in a second encoding mode 312, wherein the second subset of frames 310 is respectively composed of one or more subframes 314. The multimode audio encoder 300 is configured to determine and encode a global gain value (global gain) per frame, and to determine and encode, per subframe of at least a subset 316 of the subframes of the second subset, a corresponding bit stream element (delta global gain) differentially to the global gain value 318 of the respective frame, wherein the multimode audio encoder 300 is configured such that a change of the global gain value (global gain) of the frames within the encoded bit stream 304 results in an adjustment of an output level of a decoded representation of the audio content at the decoding side.
[00141] The corresponding multimode audio decoder 320 is shown in figure 5b. Decoder 320 is configured to provide a decoded representation 322 of the audio content 302 based on an encoded bit stream 304. To this end, the multimode audio decoder 320 decodes a global gain value (global gain) per frame 324 and 326 of the encoded bit stream 304, a first subset 324 of the frames being encoded in a first encoding mode and a second subset 326 of the frames being encoded in a second encoding mode, with each frame 326 of the second subset being composed of more than one subframe 328, decodes, per subframe 328 of at least a subset of the subframes 328 of the second subset 326 of frames, a corresponding bit stream element (delta global gain) differentially to the global gain value of the respective frame, and completes the decoding of the bit stream using the global gain value (global gain) and the corresponding bit stream element (delta global gain) in decoding the subframes of the at least one subset of subframes of the second subset 326 of frames, and the global gain value (global gain) in decoding the first subset of frames, wherein the multimode audio decoder 320 is configured such that a change of the global gain value (global gain) of frames 324 and 326 within the encoded bit stream 304 results in an adjustment 330 of an output level 332 of the decoded representation 322 of the audio content.
[00142] As was the case with the embodiment of figures 1 to 4, the first coding mode can be a frequency domain coding mode, while the second coding mode is a linear prediction coding mode. However, the embodiment of figures 5a and 5b is not restricted to this case. Still, linear prediction coding modes tend to require a finer temporal granularity as far as the global gain control is concerned, and therefore using a linear prediction encoding mode for frames 326 and a frequency domain encoding mode for frames 324 is to be preferred over the contrary case, according to which a frequency domain encoding mode would be used for frames 326 and a linear prediction encoding mode for frames 324.
[00143] Furthermore, the embodiment of figures 5a and 5b is not restricted to the case where TCX and ACELP modes exist for coding subframes 314. Furthermore, the embodiment of figures 1 to 4 can, for example, also be implemented according to the embodiment of figures 5a and 5b if a CELP coding mode is absent.
In this case, the differential coding of both elements, namely global gain and delta global gain, would allow accounting for the greater sensitivity of the TCX coding mode to variations in the gain setting while, however, not giving up the advantages provided by a global gain control without the detour of decoding and re-encoding, and without an undue increase in side information. However, the multimode audio decoder 320 can be configured to, in completing the decoding of the encoded bit stream 304, decode the subframes of the at least one subset of the subframes of the second subset 326 of frames using transform coded excitation linear prediction (namely the four subframes of the left-hand frame 326 in figure 5b), and decode a disjoint subset of the subframes of the second subset 326 of frames using CELP. In this regard, the multimode audio decoder 320 can be configured to decode, per frame of the second subset of frames, an additional bit stream element revealing a decomposition of the respective frame into one or more subframes. In the aforementioned embodiment, for example, each LPC frame can have a syntax element contained therein which identifies one of the twenty-six decomposition possibilities, mentioned above, of the LPC frame into TCX and ACELP subframes. However, again, the embodiment of figures 5a and 5b is restricted neither to ACELP nor to the two specific alternatives described above with respect to the definition of the mean energy represented by the global gain syntax elements.
[00145] Similarly to the embodiment of figures 1 to 4, frames 326 can correspond to frames 310; frames 326 can have a sample length of 1024 samples, the at least one subset of the subframes of the second subset of frames for which the delta global gain bit stream element is transmitted can have a varying sample length selected from the group consisting of 256, 512 and 1024 samples, and the disjoint subset of the subframes can have a sample length of 256 samples each. Frames 324 of the first subset can have the same sample length as each other, as described above. The multimode audio decoder 320 can be configured to decode the global gain value with 8 bits and the bit stream element with a variable number of bits, the number depending on a sample length of the respective subframe. Alternatively, the multimode audio decoder can be configured to decode the global gain value with 6 bits and to decode the bit stream elements with 5 bits. It should be noted that there are different possibilities for differentially encoding the delta global gain elements.
[00146] As is the case with the embodiment of figures 1 to 4, the global gain elements can be defined in the logarithmic domain, namely linearly related to the loudness of the audio samples. The same applies to the delta global gain. In order to encode the delta global gain, the multimode audio encoder 300 may subject a ratio between a linear gain of the respective subframe 316, such as the above-mentioned gain_TCX (as the first of the differentially coded factors), and the quantized global gain of the corresponding frame 310, i.e. the linearized (subjected to an exponential function) version of the global gain, to a logarithm, such as the logarithm to base 2, in order to obtain the delta global gain syntax element in the logarithmic domain (see the sketch following this paragraph). As is known in the art, the same result can be obtained by performing a subtraction in the logarithmic domain.
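A minimal sketch of this log-domain relationship, and of the bitstream-level loudness adjustment it enables (compare paragraph [00136]), is given below. The base-2 logarithm follows the text; the quantization of the delta, the field name and the 8-bit index range are assumptions.

```python
import numpy as np

def delta_global_gain(gain_tcx, g_hat_frame):
    # Encoder side: ratio of the subframe gain to the linearized frame global gain,
    # taken to the base-2 logarithm (quantization omitted).
    return np.log2(gain_tcx / g_hat_frame)

def subframe_gain(g_hat_frame, delta):
    # Decoder side: exponentiate and multiply, equivalent to adding in the log domain.
    return g_hat_frame * 2.0 ** delta

def adjust_output_level(frames, offset):
    # Bitstream-level loudness change: the same offset is added to every frame-level
    # global gain index; the differentially coded subframe elements are left untouched.
    for f in frames:
        f["global_gain"] = int(np.clip(f["global_gain"] + offset, 0, 255))
    return frames
```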
Therefore, the multimode audio decoder 320 can be configured to first transfer the delta global gain and global gain syntax elements into the linear domain by an exponential function and then multiply the results in the linear domain, in order to obtain the gain with which the multimode audio decoder has to scale the current subframe, such as the TCX coded excitation, i.e. its transform coefficient spectrum, as described above. As is known in the art, the same result can be obtained by adding both syntax elements in the logarithmic domain before the transition to the linear domain.
[00147] In addition, as described above, the multimode audio codec of figures 5a and 5b can be configured such that the global gain value is encoded with a fixed number of, for example, eight bits and the bit stream element with a variable number of bits, the number depending on a sample length of the respective subframe. Alternatively, the global gain value can be encoded with a fixed number of, for example, six bits and the bit stream element with, for example, five bits.
[00148] Thus, the embodiment of figures 5a and 5b uses the advantages of differentially encoding the subframe gain syntax elements in order to account for the different needs of the mixed encoding modes as far as the time and bit granularity of the gain control is concerned, so as to, on the one hand, avoid unwanted quality degradations and, on the other hand, still achieve the advantages involved with a global gain control, namely avoiding the need to decode and re-encode in order to scale the loudness.
[00149] Next, with respect to figures 6a and 6b, another embodiment of a multimode audio codec and the corresponding encoder and decoder is described. Figure 6a shows a multimode audio encoder 400 configured to encode an audio content 402 into an encoded bit stream 404 by CELP encoding a first subset of frames of the audio content 402, denoted 406 in figure 6a, and transform encoding a second subset of the frames, denoted 408 in figure 6a. The multimode audio encoder 400 comprises a CELP encoder 410 and a transform encoder 412. The CELP encoder 410, in turn, comprises an LP analyzer 414 and an excitation generator 416. The CELP encoder is configured to encode a current frame of the first subset. To this end, the LP analyzer 414 generates linear prediction filter coefficients 418 for the current frame and encodes them into the encoded bit stream 404. The excitation generator 416 determines a current excitation of the current frame of the first subset which, when filtered by a linear prediction synthesis filter based on the linear prediction filter coefficients 418 within the encoded bit stream 404, recovers the current frame of the first subset, the excitation being defined by a past excitation 420 and a codebook index 422 for the current frame of the first subset, and encodes the codebook index 422 into the encoded bit stream 404. Transform encoder 412 is configured to encode a current frame of the second subset 408 by subjecting a time domain signal of the current frame to a time-to-spectral domain transformation to obtain spectral information, and to encode the spectral information 424 into the encoded bit stream 404.
The multimode audio encoder 400 is further configured to encode a global gain value 426 into the encoded bit stream 404, the global gain value 426 depending on an energy of a version of the audio content of the current frame of the first subset 406 filtered with a linear prediction analysis filter depending on the linear prediction coefficients, or on an energy of the time domain signal. In the case of the embodiment of figures 1 to 4, for example, the transform encoder 412 was implemented as a TCX encoder and the time domain signal was the excitation of the respective frame. Likewise, filtering the audio content 402 of the current frame of the first (CELP) subset with the linear prediction analysis filter - or a modified version thereof in the form of the weighting filter A(z/γ) - depending on the linear prediction coefficients 418, results in a representation of the excitation. The global gain value 426 thus depends on the excitation energies of both kinds of frames.
[00150] However, the embodiment of figures 6a and 6b is not restricted to TCX transform coding. It is conceivable that another transform coding scheme, such as AAC, is mixed with the CELP coding of the CELP encoder 410.
[00151] Figure 6b shows the multimode audio decoder corresponding to the encoder of figure 6a. As shown therein, the decoder of figure 6b, generally indicated at 430, is configured to provide a decoded representation 432 of an audio content based on an encoded bit stream 434, a first subset of frames of which is CELP coded (indicated with "1" in figure 6b) and a second subset of frames of which is transform coded (indicated with "2" in figure 6b). Decoder 430 comprises a CELP decoder 436 and a transform decoder 438. The CELP decoder 436 comprises an excitation generator 440 and a linear prediction synthesis filter 442.
[00152] The CELP decoder 436 is configured to decode a current frame of the first subset. To this end, excitation generator 440 generates a current excitation 444 of the current frame by constructing a codebook excitation based on a past excitation 446 and a codebook index 448 of the current frame of the first subset within the encoded bit stream 434, and by setting a gain of the codebook excitation based on a global gain value 450 within the encoded bit stream 434. The linear prediction synthesis filter 442 is configured to filter the current excitation 444 based on linear prediction filter coefficients 452 of the current frame within the encoded bit stream 434. The result of the synthesis filtering represents, or is used to obtain, the decoded representation 432 of the frame corresponding to the current frame within bit stream 434. The transform decoder 438 is configured to decode a current frame of the second subset of frames by constructing spectral information 454 for the current frame of the second subset from the encoded bit stream 434 and performing a spectral-to-time domain transformation on the spectral information to obtain a time domain signal, such that a level of the time domain signal depends on the global gain value 450. As noted above, the spectral information can be the excitation spectrum in case the transform decoder is a TCX decoder, or the original audio content in the case of an FD coding mode.
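For illustration, the frame-wise dispatch performed by decoder 430 can be sketched as follows; both branches derive their output level from the same global gain value 450. The field names and decoder objects used here are illustrative and not taken from the actual bit stream syntax.

```python
def decode_frame(frame, celp_decoder, transform_decoder):
    # Route CELP coded frames ("1" in figure 6b) to the CELP decoder and transform
    # coded frames ("2") to the transform decoder.
    if frame["mode"] == "celp":
        return celp_decoder.decode(frame["codebook_index"], frame["lpc"], frame["global_gain"])
    return transform_decoder.decode(frame["spectral_info"], frame["lpc"], frame["global_gain"])
```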
[00153] The excitation generator 440 can be configured to, in generating the current excitation 444 of the current frame of the first subset, construct an adaptive codebook excitation based on a past excitation and an adaptive codebook index of the current frame of the first subset within the encoded bit stream, construct an innovation codebook excitation based on an innovation codebook index for the current frame of the first subset within the encoded bit stream, set, as the gain of the codebook excitation, a gain of the innovation codebook excitation based on the global gain value within the encoded bit stream, and combine the adaptive codebook excitation and the innovation codebook excitation to obtain the current excitation 444 of the current frame of the first subset. That is, excitation generator 440 can be embodied as described above with respect to figure 4, but it does not necessarily have to be.
[00154] Furthermore, the transform decoder can be configured such that the spectral information relates to a current excitation of the current frame, and the transform decoder 438 can be configured to, in decoding a current frame of the second subset, spectrally shape the current excitation of the current frame of the second subset according to a linear prediction synthesis filter transfer function defined by linear prediction filter coefficients for the current frame of the second subset within the encoded bit stream 434, so that performing the spectral-to-time domain transformation on the spectral information results in the decoded representation 432 of the audio content. In other words, transform decoder 438 can be embodied as a TCX decoder, as described above with respect to figure 4, but this is not mandatory.
[00155] The transform decoder 438 can, in addition, be configured to spectrally shape the spectral information by converting the linear prediction filter coefficients into a linear prediction spectrum and weighting the spectral information of the current excitation with the linear prediction spectrum. This has been described above with respect to converter 144. As also described above, transform decoder 438 can be configured to scale the spectral information with the global gain value 450. Alternatively, transform decoder 438 can be configured to construct the spectral information for the current frame of the second subset by using a transform coefficient spectrum within the encoded bit stream and scale factors within the encoded bit stream for scaling the transform coefficient spectrum at the granularity of scale factor bands, with a scaling of the scale factors based on the global gain value, so as to obtain the decoded representation 432 of the audio content.
[00156] The embodiment of figures 6a and 6b highlights the advantageous aspect of the embodiment of figures 1 to 4 according to which it is the codebook excitation gain by means of which the gain adjustment of the CELP coded part is coupled with the gain adjustability of the transform coded part.
[00157] The next embodiment, described with respect to figures 7a and 7b, concentrates on the CELP codec portions of the above-mentioned embodiments, without requiring the existence of the other coding modes.
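As a bridge to the CELP-focused embodiment of figures 7a and 7b, the codebook combination just described in paragraph [00153] can be sketched as follows. Post-processing such as de-emphasis or a bass post-filter is omitted, and treating the transmitted LPC coefficients as directly usable is a simplification.

```python
import numpy as np
from scipy.signal import lfilter

def celp_synthesis(v_adaptive, c_innovation, g_p, g_c, a_lpc_q):
    # Combiner 154: weighted sum of adaptive and innovation codebook excitations.
    u = g_p * np.asarray(v_adaptive, dtype=float) + g_c * np.asarray(c_innovation, dtype=float)
    # LP synthesis filter 156: filtering with 1/A_hat(z), a_lpc_q = [1, a_1, ..., a_16].
    return lfilter([1.0], a_lpc_q, u)
```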
Furthermore, the CELP coding concept described with respect to figures 7a and 7b focuses on the second alternative described with respect to figures 1 to 4, according to which the gain controllability of the CELP coded data is realized by implementing the gain control in the weighted domain, in order to achieve a gain adjustment of the decoder output with a finer granularity than is possible in conventional CELP. Furthermore, computing the gain in the weighted domain, as mentioned above, can improve the audio quality.
[00158] Again, figure 7a shows the encoder and figure 7b shows the corresponding decoder. The CELP encoder of figure 7a comprises an LP analyzer 502, an excitation generator 504, and an energy determiner 506. The linear prediction analyzer 502 is configured to generate linear prediction filter coefficients 508 for a current frame 510 of an audio content 512 and to encode the linear prediction filter coefficients 508 into a bit stream 514. The excitation generator 504 is configured to determine a current excitation 516 of the current frame 510 as a combination 518 of an adaptive codebook excitation 520 and an innovation codebook excitation 522 which, when filtered by a linear prediction synthesis filter based on the linear prediction filter coefficients 508, recovers the current frame 510, by constructing the adaptive codebook excitation 520 as defined by a past excitation 524 and an adaptive codebook index 526 for the current frame 510 and encoding the adaptive codebook index 526 into bit stream 514, and by constructing an innovation codebook excitation defined by an innovation codebook index 528 for the current frame 510 and encoding the innovation codebook index 528 into bit stream 514.
[00159] The energy determiner 506 is configured to determine an energy of a version of the audio content 512 of the current frame 510 filtered by a weighting filter derived from the linear prediction analysis, in order to obtain a gain value 530, and to encode the gain value 530 into bit stream 514, the weighting filter being constructed from the linear prediction coefficients 508.
[00160] In accordance with the description above, the excitation generator 504 can be configured to, in constructing the adaptive codebook excitation 520 and the innovation codebook excitation 522, minimize a perceptual distortion measure relative to the audio content 512. In addition, the linear prediction analyzer 502 can be configured to determine the linear prediction filter coefficients 508 by linear prediction analysis applied to a windowed and, according to a predetermined pre-emphasis filter, pre-emphasized version of the audio content. The excitation generator 504 can be configured to, in constructing the adaptive codebook excitation and the innovation codebook excitation, minimize a perceptually weighted distortion measure relative to the audio content using the perceptual weighting filter W(z) = A(z/γ), where γ is a perceptual weighting factor and A(z) is 1/H(z), with H(z) being the linear prediction synthesis filter, and where the energy determiner is configured to use the perceptual weighting filter as the weighting filter.
In particular, the minimization can be performed using a perceptually weighted distortion measure relative to the audio content using the perceptually weighted synthesis filter:
[00161] where γ is the perceptual weighting factor, Â(z) is a quantized version of the linear prediction synthesis filter and α is a high frequency emphasis factor, and wherein the energy determiner 506 is configured to use the perceptual weighting filter W(z) = A(z/γ) as the weighting filter.
[00162] In addition, to maintain synchronism between the encoder and the decoder, the excitation generator 504 can be configured to perform an excitation update by a) estimating an energy of the innovation codebook excitation as determined by first information contained within the innovation codebook index (as transmitted within the bit stream), such as the aforementioned number, positions and signs of the pulses of the innovation codebook vector, by filtering the respective innovation codebook vector with H2(z) and determining the energy of the result, b) forming a ratio between the energy thus derived and an energy determined by the global gain, in order to obtain a predicted gain g', c) multiplying the predicted gain g' with the innovation codebook correction factor, which is second information contained within the innovation codebook index, to yield the actual innovation codebook gain, and d) actually generating the codebook excitation - serving as the past excitation for the next frame to be CELP coded - by combining the adaptive codebook excitation and the innovation codebook excitation, weighting the latter with the actual innovation codebook gain.
[00163] Figure 7b shows the corresponding CELP decoder as having an excitation generator 450 and an LP synthesis filter 452. Excitation generator 440 can be configured to generate a current excitation 542 for the current frame 544 by constructing an adaptive codebook excitation 546 based on a past excitation 548 and an adaptive codebook index 550 for the current frame 544 within the bit stream, constructing an innovation codebook excitation 552 based on an innovation codebook index 554 for the current frame 544 within the bit stream, computing an estimate of an energy of the innovation codebook excitation spectrally weighted by a weighted linear prediction synthesis filter H2 constructed from the linear prediction filter coefficients 556 within the bit stream, setting a gain 558 of the innovation codebook excitation 552 based on a ratio between a gain value 560 within the bit stream and the estimated energy, and combining the adaptive codebook excitation and the innovation codebook excitation to obtain the current excitation 542. The linear prediction synthesis filter filters the current excitation 542 based on the linear prediction filter coefficients 556.
[00164] The excitation generator 440 can be configured to, in constructing the adaptive codebook excitation 546, filter the past excitation 548 with a filter depending on the adaptive codebook index. In addition, the excitation generator 440 can be configured to construct the innovation codebook excitation 552 such that the latter comprises a zero vector with a number of non-zero pulses, the number and positions of the non-zero pulses being indicated by the innovation codebook index 554.
The excitation generator 440 can be configured to compute the estimate of the energy of the innovation codebook excitation 552 by filtering the innovation codebook excitation 552 with
[00165] where the linear prediction synthesis filter is configured to filter the current excitation 542 according to 1/Â(z), where W(z) = A(z/γ) with γ being a perceptual weighting factor and Hemph = 1/(1 - αz^-1) with α being a high frequency emphasis factor, and in which the excitation generator 440 is further configured to compute a sum of squares of the samples of the filtered innovation codebook excitation to obtain the energy estimate.
[00166] The excitation generator 540 can be configured to, in combining the adaptive codebook excitation 546 and the innovation codebook excitation 552, form a weighted sum of the adaptive codebook excitation 546, weighted with a weighting factor depending on the adaptive codebook index, and the gain-weighted innovation codebook excitation 552.
[00167] Additional considerations for the LPD mode are outlined in the following list:
• Quality improvements can be achieved by re-training the gain VQ in ACELP to more accurately match the statistics of the new gain adjustment.
• The global gain coding in AAC could be modified by: o encoding it with 6/7 bits instead of 8 bits, as is done in TCX. This may work for the current operating points, but it may be a limitation when the audio input has a resolution greater than 16 bits. o increasing the resolution of the unified global gain to match the TCX quantization (this corresponds to the second approach described above): given the way the scale factors are applied in AAC, it is not necessary to have such an accurate quantization. Moreover, it would imply a series of modifications to the AAC structure and a higher bit consumption for the scale factors.
• The TCX global gains could be quantized before the quantization of the spectral coefficients: this is done this way in AAC and allows the quantization of the spectral coefficients to be the only source of error. This approach seems to be the most elegant way of doing it. However, the TCX global gains currently represent an energy, a quantity which is also useful in ACELP. This energy was used in the aforementioned gain control unification approaches as a bridge between the two coding schemes for the two coding gains.
[00168] The above embodiments are transferable to embodiments where SBR is used. The SBR encoding of the energy envelope can be performed in such a way that the energies of the spectral bands to be replicated are transmitted/encoded relative to, i.e. differentially to, the energy of the base band, that is, the energy of the spectral band to which the above-mentioned codec embodiments are applied.
[00169] In conventional SBR, the energy envelope is coded independently of the energy of the core band. The energy envelope of the extension band is then reconstructed completely independently. In other words, when the core band is level-adjusted, this will not affect the extension band, which will remain unchanged.
[00170] In SBR, two coding schemes can be used to transmit the energies of the different frequency bands. The first scheme consists of differential coding in the time direction: the energies of the different bands are differentially coded relative to the corresponding bands of the previous frame. When using this coding scheme, the current frame energies will automatically be adjusted in case the previous frame energies have already been adjusted.
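For illustration, the first SBR envelope coding scheme just described might be sketched as follows. Energies are assumed to be in dB and the quantization of the deltas is omitted.

```python
def decode_envelope_time_delta(prev_energies, deltas):
    # Time-direction delta coding: each band energy is coded as a difference to the same
    # band of the previous frame, so a level change applied to earlier frames carries over.
    return [e + d for e, d in zip(prev_energies, deltas)]
```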
[00171] The second coding scheme is a delta coding of the energies in the frequency direction. The difference between the energy of the current band and the energy of the previous band in frequency is quantized and transmitted. Only the energy of the first band is coded absolutely. The coding of this first band energy can be modified so that it is done relative to the energy of the core band. In this way, the extension band is automatically level-adjusted when the core band is modified.
[00172] Another method of encoding the SBR energy envelope is to change the quantization step of the first band energy when using delta coding in the frequency direction, in order to obtain the same granularity as for the common global gain element of the core coder. In this way, an adjustment of the overall level could be achieved by modifying both the index of the common global gain of the core coder and the index of the first SBR band energy when delta coding in the frequency direction is used.
[00173] Thus, in other words, an SBR decoder can comprise any of the above decoders as a core decoder for decoding the core-coder part of a bit stream. The SBR decoder may then decode envelope energies for the spectral bands to be replicated from an SBR part of the bit stream, determine an energy of the base band signal and scale the envelope energies according to the energy of the base band signal. In doing so, the spectrally replicated band of the reconstructed representation of the audio content has an energy that inherently scales with the aforementioned global gain syntax elements.
[00174] Thus, according to the above embodiments, the unification of the global gain for USAC can work as follows: there is currently a 7-bit global gain for each TCX frame (length 256, 512 or 1024 samples) and, correspondingly, a 2-bit mean energy value for each ACELP frame (length 256 samples). There is no global value per 1024-sample frame, in contrast to AAC frames. To unify this, a global value per 1024-sample frame, with 8 bits, could be introduced for the TCX/ACELP parts, and the corresponding values per TCX/ACELP frame could be encoded differentially to this global value. Due to this differential encoding, the number of bits for these individual differences can be reduced.
[00175] Although some aspects have been described in the context of an apparatus, it is evident that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Similarly, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus. Some or all of the method steps may be performed by (or using) a hardware apparatus, such as a microprocessor, a programmable computer or an electronic circuit. In some embodiments, some one or more of the most important method steps may be performed by such an apparatus.
[00176] The inventive encoded audio signal can be stored on a digital storage medium or can be transmitted over a transmission medium, such as a wireless transmission medium or a wired transmission medium such as the Internet.
[00177] Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software.
The implementation can be carried out using a digital storage medium, for example a floppy disk, a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a flash memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
[00178] Some embodiments according to the invention comprise a data carrier having electronically readable control signals which are capable of cooperating with a programmable computer system such that one of the methods described herein is performed.
[00179] Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative to perform one of the methods when the computer program product runs on a computer. The program code may, for example, be stored on a machine-readable carrier.
[00180] Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine-readable carrier.
[00181] In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein when the computer program runs on a computer.
[00182] A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein. The data carrier, the digital storage medium or the recorded medium are typically tangible and/or non-transitory.
[00183] A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may, for example, be configured to be transferred via a data communication connection, for example via the Internet.
[00184] A further embodiment comprises a processing means, for example a computer or a programmable logic device, configured for or adapted to perform one of the methods described herein.
[00185] A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
[00186] A further embodiment according to the invention comprises an apparatus or a system configured to transfer (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver may, for example, be a computer, a mobile device, a memory device or the like. The apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.
[00187] In some embodiments, a programmable logic device (for example, a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are preferably performed by any hardware apparatus.
[00188] The above-described embodiments are merely illustrative of the principles of the present invention. It is understood that modifications and variations of the arrangements and details described herein will be apparent to others skilled in the art.
It is therefore intended that the invention be limited only by the scope of the appended patent claims and not by the specific details presented by way of description and explanation of the embodiments herein.
Claims (8) [0001] 1. Multimode audio decoder (120; 320) for providing a decoded representation (322) of an audio content (24; 302) based on an encoded bit stream (36; 304), the multimode audio decoder (120; 320) characterized in that it is configured to decode a global gain value per frame (324, 326) of the encoded bit stream (36; 304), a first subset (324) of the frames being encoded in a first encoding mode and a second subset (326) of the frames being encoded in a second encoding mode, with each frame of the second subset being composed of more than one subframe (328), to decode, per subframe of at least a subset of the subframes (328) of the second subset of frames, a corresponding bit stream element differentially to the global gain value of the respective frame, and to complete the decoding of the encoded bit stream (36; 304) using the global gain value and the corresponding bit stream element in decoding the subframes of the at least one subset of the subframes (328) of the second subset of frames, and the global gain value in decoding the first subset of frames, wherein the multimode audio decoder is configured such that a change of the global gain value of the frames within the encoded bit stream (36; 304) results in an adjustment (330) of an output level (332) of the decoded representation (322) of the audio content (24; 302). [0002] 2. Multimode audio decoder according to claim 1, characterized in that the first encoding mode is a frequency domain encoding mode and the second encoding mode is a linear prediction encoding mode. [0003] 3. Multimode audio decoder according to claim 2, characterized in that the multimode audio decoder is configured to, in completing the decoding of the encoded bit stream (36; 304), decode the subframes of the at least one subset of the subframes (328) of the second subset of frames (310) using transform coded excitation linear prediction decoding, and decode a disjoint subset of the subframes of the second subset of frames using CELP. [0004] 4. Multimode audio decoder according to any one of claims 1 to 3, characterized in that the multimode audio decoder is configured to decode, per frame of the second subset (326) of frames, an additional bit stream element revealing a decomposition of the respective frame into one or more subframes. [0005] 5. Multimode audio decoder according to any one of the preceding claims, characterized in that the frames of the second subset are of equal length, at least a subset of the subframes (328) of the second subset of frames has a varying sample length selected from the group consisting of 256, 512 and 1024 samples, and a disjoint subset of the subframes (328) has a sample length of 256 samples. [0006] 6. Multimode audio decoder according to any one of the preceding claims, characterized in that the multimode audio decoder is configured to decode the global gain value with a fixed number of bits and the bit stream element with a variable number of bits, the number depending on a sample length of the respective subframe. [0007] 7. Multimode audio decoder according to any one of claims 1 to 5, characterized in that the multimode audio decoder is configured to decode the global gain value with a fixed number of bits and to decode the bit stream element with a fixed number of bits. [0008] 8.
Multimode audio decoder for providing a decoded representation (432) of an audio content based on an encoded bit stream (434), a first subset of frames of which is CELP encoded and a second subset of frames of which is transform encoded, the multimode audio decoder characterized by the fact that it comprises: a CELP decoder (436) configured to decode a current frame of the first subset, the CELP decoder comprising: an excitation generator (440) configured to generate a current excitation (444) of the current frame of the first subset by constructing a codebook excitation based on a past excitation (446) and a codebook index (448) of the current frame of the first subset within the encoded bit stream, and by setting a gain of the codebook excitation based on a global gain value (450) within the encoded bit stream (434); and a linear prediction synthesis filter (442) configured to filter the current excitation (444) based on linear prediction filter coefficients (452) for the current frame of the first subset within the encoded bit stream; and a transform decoder (438) configured to decode a current frame of the second subset by (...)
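The claims above describe the decoder-side mechanics in prose; the following Python sketch is a minimal, hedged illustration of the two central ideas: (i) a per-frame global gain index combined with per-subframe differentially coded elements, so that rewriting the single global-gain field rescales the decoded output without re-encoding, and (ii) a CELP-style subframe decoder whose codebook excitation gain is derived from that global gain value before linear prediction synthesis (cf. claims 1 and 8). This is not the patented implementation and does not follow the USAC bit stream syntax; the 1.5 dB step size, all identifier names, the dictionary-based frame representation, and the omission of the adaptive-codebook (past excitation) contribution are simplifying assumptions made purely for this example.

```python
import numpy as np

GAIN_STEP_DB = 1.5  # assumed logarithmic step per gain index (illustrative only)


def subframe_gain(global_gain_idx: int, delta_idx: int) -> float:
    """Combine the per-frame global gain index with the per-subframe
    differentially coded element to obtain a linear gain factor."""
    return 10.0 ** (GAIN_STEP_DB * (global_gain_idx + delta_idx) / 20.0)


def lp_synthesis(excitation: np.ndarray, lpc: np.ndarray, memory: np.ndarray) -> np.ndarray:
    """All-pole 1/A(z) synthesis filter: y[n] = e[n] - sum_k lpc[k-1] * y[n-k].
    `memory` holds at least the last len(lpc) output samples of the preceding frame."""
    order = len(lpc)
    buf = np.concatenate([memory[-order:], np.zeros(len(excitation))])
    for n in range(len(excitation)):
        acc = excitation[n]
        for k in range(1, order + 1):
            acc -= lpc[k - 1] * buf[order + n - k]
        buf[order + n] = acc
    return buf[order:]


def decode_celp_subframe(codebook_vector: np.ndarray,
                         lpc: np.ndarray,
                         synth_memory: np.ndarray,
                         global_gain_idx: int,
                         delta_idx: int) -> np.ndarray:
    """CELP-style subframe decoding: the codebook excitation is scaled by a
    gain derived from the global gain value, then fed through the linear
    prediction synthesis filter (adaptive-codebook contribution omitted)."""
    gain = subframe_gain(global_gain_idx, delta_idx)
    return lp_synthesis(gain * codebook_vector, lpc, synth_memory)


def adjust_output_level(frames: list, offset_idx: int) -> list:
    """Bit-stream-level loudness change: only the per-frame global gain index
    is rewritten; every differentially coded subframe gain follows
    automatically, so no decoding/re-encoding is needed."""
    for frame in frames:
        frame["global_gain_idx"] += offset_idx
    return frames
```

Under these assumptions, calling adjust_output_level(frames, 4) would raise the decoded output level by roughly 6 dB purely at the bit stream level, leaving the quantized excitation and spectral data untouched.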
Patent family:

Publication number | Publication date
US20160260438A1 | 2016-09-08
MY167980A | 2018-10-09
MY164399A | 2017-12-15
BR112012009490A2 | 2016-05-03
AU2010309894B2 | 2014-03-13
ES2453098T3 | 2014-04-04
SG10201406778VA | 2015-01-29
JP6214160B2 | 2017-10-18
CA2862712A1 | 2011-04-28
HK1175293A1 | 2013-06-28
JP6173288B2 | 2017-08-02
US20120253797A1 | 2012-10-04
MX2012004593A | 2012-06-08
JP2015043096A | 2015-03-05
CA2778240C | 2016-09-06
US8744843B2 | 2014-06-03
EP2491555B1 | 2014-03-05
KR101508819B1 | 2015-04-07
CA2862715A1 | 2011-04-28
TW201131554A | 2011-09-16
RU2586841C2 | 2016-06-10
CA2862712C | 2017-10-17
CN104021795A | 2014-09-03
CN102859589B | 2014-07-09
US9495972B2 | 2016-11-15
US9715883B2 | 2017-07-25
CN104021795B | 2017-06-09
EP2491555A1 | 2012-08-29
CN102859589A | 2013-01-02
TWI455114B | 2014-10-01
RU2012118788A | 2013-11-10
AU2010309894A1 | 2012-05-24
CA2778240A1 | 2011-04-28
KR20120082435A | 2012-07-23
JP2013508761A | 2013-03-07
CA2862715C | 2017-10-17
WO2011048094A1 | 2011-04-28
US20140343953A1 | 2014-11-20
ZA201203570B | 2013-05-29
PL2491555T3 | 2014-08-29
Legal status:

2018-03-27 | B15K | Others concerning applications: alteration of classification | IPC: G10L 19/12 (2013.01), G10L 19/03 (2013.01), G10L 1
2019-01-08 | B06F | Objections, documents and/or translations needed after an examination request, according to [chapter 6.6 patent gazette]
2019-09-03 | B06U | Preliminary requirement: requests with searches performed by other patent offices: procedure suspended [chapter 6.21 patent gazette]
2020-03-31 | B06A | Notification to applicant to reply to the report for non-patentability or inadequacy of the application [chapter 6.1 patent gazette]
2020-07-21 | B09A | Decision: intention to grant [chapter 9.1 patent gazette]
2020-12-01 | B16A | Patent or certificate of addition of invention granted | Free format text: TERM OF VALIDITY: 10 (TEN) YEARS COUNTED FROM 2020-12-01, SUBJECT TO THE LEGAL CONDITIONS.
Priority:
Application number | Application date | Patent title
US25344009P | 2009-10-20 | 2009-10-20
US 61/253,440 | 2009-10-20
PCT/EP2010/065718 | WO2011048094A1 | 2009-10-20 | 2010-10-19 | Multi-mode audio codec and CELP coding adapted therefore