Brazilian patent BR112013021855B1: Apparatus and method for determining a measurement for a perceived level of reverberation, audio processor

Abstract:
Apparatus and Method for Determining a Measurement for a Perceived Level of Reverberation, Audio Processor and Method for Processing a Signal. An apparatus for determining a measurement for a perceived level of reverberation in a mix signal consisting of a direct signal component (100) and a reverberation signal component (102) comprises a loudness model processor (104) with a perceptual filter stage for filtering the dry signal component (100), the reverberation signal component (102) or the mix signal, the perceptual filter stage being configured to model an auditory perception mechanism of an entity to obtain a filtered direct signal, a filtered reverberation signal or a filtered mix signal. The apparatus also comprises a loudness estimator to estimate a first loudness measurement using the filtered direct signal and a second loudness measurement using the filtered reverberation signal or the filtered mix signal, where the filtered mix signal is derived from a superposition of the direct signal component and the reverberation signal component. The apparatus further comprises a combiner (110) to combine the first and second loudness measurements (106, 108) to obtain (...).
Publication number: BR112013021855B1
Application number: R112013021855-0
Filing date: 2012-02-24
Publication date: 2021-03-09
Inventors: Christian Uhle; Jouni Paulus; Juergen Herre; Peter Prokein; Oliver Hellmuth
Applicant: Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.
Main IPC class:

Description:

Application field
The present application relates to audio signal processing and, in particular, to audio processing usable in artificial reverberators.
Determining a measurement for a perceived level of reverberation is desired, for example, in applications where an artificial reverberation processor is operated in an automated manner and needs to adapt its parameters to the input signal so that the perceived level of reverberation matches a target value. Note that the term reverberance, while alluding to the same theme, does not appear to have a commonly accepted definition, which makes it difficult to use as a quantitative measurement in a listening test and prediction scenario.
Artificial reverberation processors are often implemented as linear time-invariant systems and operated in a send-return signal path, as depicted in Figure 6, with pre-delay d, reverberation impulse response (RIR) and a scale factor g to control the direct-to-reverberation ratio (DRR). When implemented as parametric reverberation processors, they feature a variety of parameters, for example to control the shape and density of the RIR and the inter-channel coherence (ICC) of the RIRs for multichannel processors in one or more frequency bands.
Figure 6 shows a direct signal x[k] applied at an input 600. This signal is routed to an adder 602, which adds it to a reverberation signal component r[k] output by a weighter 604. The weighter receives, at its first input, the signal output by a reverberation filter 606 and, at its second input, a gain factor g. The reverberation filter 606 may have an optional delay stage 608 connected upstream, but because the reverberation filter 606 includes some delay of its own, the delay of block 608 can be included in the reverberation filter 606, so that the upper branch in Figure 6 may comprise only a single filter that incorporates delay and reverberation, or that incorporates reverberation without any additional delay. A reverberation signal component is output by the filter 606, and this component can be modified by the weighter (multiplier) 604 in response to the gain factor g to obtain the manipulated reverberation signal component r[k], which is then combined with the direct signal component input at 600 to finally obtain the mix signal m[k] at the output of the adder 602. Note that the term "reverberation filter" refers to the common implementations of artificial reverberation (either as convolution, which is equivalent to FIR filtering, or as implementations using recursive structures such as feedback delay networks, networks of allpass filters and feedback comb filters, or other recursive filters), but it designates any general processing that produces a reverberant signal. Such processing may involve non-linear or time-varying processes such as low-frequency modulations of signal amplitudes or delay lengths. In these cases the term "reverberation filter" would not apply in the strict technical sense of a linear time-invariant (LTI) system. In fact, the "reverberation filter" refers to processing that outputs a reverberant signal, possibly including a mechanism for reading a computed or recorded reverberant signal from memory.
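The send-return structure of Figure 6 can be sketched in a few lines of Python. This is only a minimal illustration under the assumption that the reverberation filter is realized as FIR convolution with an impulse response; the function name `send_return_mix` and its parameters are hypothetical and do not come from the source.

```python
import numpy as np

def send_return_mix(x, rir, g, pre_delay=0):
    """Return (m, r): the mix signal m[k] and the scaled reverberation r[k].

    x         : direct (dry) signal x[k]
    rir       : reverberation impulse response (assumed FIR realization)
    g         : gain factor controlling the direct-to-reverberation ratio
    pre_delay : optional delay in samples (delay stage 608)
    """
    x = np.asarray(x, dtype=float)
    # Delay stage 608 folded into the filter: prepend zeros to the RIR.
    h = np.concatenate([np.zeros(pre_delay), np.asarray(rir, dtype=float)])
    # Reverberation filter 606 as convolution, then weighting by g (604).
    r = g * np.convolve(x, h)
    # Adder 602: zero-pad the direct signal to the length of the reverb tail.
    x_padded = np.concatenate([x, np.zeros(len(r) - len(x))])
    return x_padded + r, r
```

Calling `send_return_mix(x, rir, g=0.5, pre_delay=2400)` would, for example, apply a 50 ms pre-delay at 48 kHz. Recursive or time-varying reverberators, which the text also covers, would replace the convolution step.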
These parameters have an impact on the resulting audio signal in terms of perceived level, distance, room size, coloration and sound quality. In addition, the perceived characteristics of the reverberation depend on the temporal and spectral characteristics of the input signal [1]. Focusing on a very important sensation, namely loudness, it can be observed that the loudness of the perceived reverberation is monotonically related to the non-stationarity of the input signal. Intuitively speaking, an audio signal with large variations in its envelope excites the reverberation at high levels and allows it to become audible at lower levels. In a typical scenario where the long-term DRR expressed in decibels is positive, the direct signal can mask the reverberation signal almost completely at time instants where its energy envelope increases. On the other hand, whenever the signal ends, the previously excited reverberation tail becomes apparent in gaps that exceed a minimum duration determined by the slope of post-masking (at most 200 ms) and the integration time of the auditory system (at most 200 ms for moderate levels).
To illustrate this, Figure 4a shows the time signal envelopes of an audio signal and of an artificially generated reverberation signal, and Figure 4b shows the predicted loudness and partial loudness functions computed with a computational loudness model. An RIR with a short pre-delay of 50 ms is used here, omitting the early reflections and synthesizing the late part of the reverberation with exponentially decaying white noise [2]. The input signal was generated from a harmonic broadband signal and an envelope function so that one event with a short decay and a second event with a long decay are perceived. While the long event produces more total reverberation energy, it is not surprising that the short sound is the one perceived as more reverberant. Whereas the decaying slope of the longer event masks the reverberation, the short sound has already disappeared before the reverberation has built up, which opens a gap in which the reverberation is perceived. Note that the definition of masking used here includes both complete and partial masking [3].
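An impulse response of the kind described here (pre-delay followed by exponentially decaying white noise, no early reflections) can be approximated as follows. This is a sketch, not the generator used for Figure 4: only the 50 ms pre-delay comes from the text, while the reverberation time `t60_s`, the tail length and the function name `synth_late_rir` are assumptions chosen for illustration.

```python
import numpy as np

def synth_late_rir(fs=48000, pre_delay_s=0.05, t60_s=1.0, length_s=1.5, seed=0):
    """Synthesize a late-reverberation RIR from decaying white noise.

    t60_s and length_s are illustrative assumptions; the text only
    specifies the pre-delay of 50 ms.
    """
    rng = np.random.default_rng(seed)
    n_delay = int(round(pre_delay_s * fs))
    n_tail = int(round(length_s * fs))
    t = np.arange(n_tail) / fs
    # Amplitude envelope that has dropped by 60 dB after t60_s seconds.
    envelope = 10.0 ** (-3.0 * t / t60_s)
    tail = rng.standard_normal(n_tail) * envelope
    # Pre-delay as leading zeros; early reflections are omitted.
    return np.concatenate([np.zeros(n_delay), tail])
```

Convolving a dry signal with this response and scaling the result by a gain factor g gives a reverberation component of the type plotted as the dashed line in Figure 4a.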
Although such observations have been made many times [4, 5, 6], they are still worth emphasizing because they illustrate qualitatively why models of partial loudness can be applied in the context of this work. In fact, it has been pointed out that the perception of reverberation arises from stream segregation processes in the auditory system [4, 5, 6] and is influenced by the partial masking of the reverberation by the direct sound.
The considerations above motivate the use of loudness models. Related investigations were carried out by Lee et al. and focus on predicting the subjective decay rate of RIRs when listening to them directly [7] and on the effect of the playback level on reverberance [8]. A predictor for reverberance using loudness-based early decay times is proposed in [9]. In contrast to that work, the prediction methods proposed here process the direct signal and the reverberation signal with a computational model of partial loudness (and with simplified versions of it in the search for low-complexity implementations) and thus consider the influence of the (direct) input signal on the sensation. Recently, Tsilfidis and Mourjopoulus [10] investigated the use of a loudness model for the suppression of late reverberation in single-channel recordings. An estimate of the direct signal is computed from the reverberant input signal using a spectral subtraction method, and a reverberation masking index, which controls the reverberation processing, is derived by means of a computational auditory masking model.
It is a feature of multichannel synthesizers and other devices to add reverberation to make the sound better from a perceptual point of view. On the other hand, the generated reverberation is an artificial signal that is barely audible when added at too low a level and leads to an unnatural and unpleasant-sounding final mix when added at too high a level. What makes things even worse is that, as discussed in the context of Figures 4a and 4b, the perceived level of reverberation is strongly signal-dependent, so a given reverberation filter can work very well for one type of signal but can have no audible effect or, even worse, can generate serious audible artifacts for a different type of signal.
An additional problem related to reverberation is that the reverberated signal is intended for the ear of an entity or individual, such as a human being, and the ultimate goal of generating a mix signal having a direct signal component and a reverberation signal component is that the entity perceives this mix signal or "reverberated signal" as sounding good or sounding natural. However, the auditory perception mechanism, that is, the mechanism by which sound is actually perceived by an individual, is strongly non-linear, not only with respect to the bands in which human hearing works, but also with respect to the processing of signals within the bands. In addition, it is known that the human perception of sound is governed not so much by the sound pressure level, which can be calculated, for example, by squaring digital samples, but rather by a sensation of loudness. Furthermore, for mix signals that include a direct component and a reverberation signal component, the sensation of loudness of the reverberation component depends not only on the type of the direct signal component, but also on the level or loudness of the direct signal component.
Thus, there is a need to determine a measurement for a perceived level of reverberation in a signal that consists of a direct signal component and a reverberation signal component, in order to cope with the above problems related to an entity's auditory perception mechanism.
An object of the present invention is, therefore, to provide an apparatus or method for determining a measurement for a perceived level of reverberation, or to provide an audio processor or a method for processing an audio signal with improved characteristics.
This object is achieved by an apparatus for determining a measurement for a perceived level of reverberation in accordance with claim 1, a method for determining a measurement for a perceived level of reverberation in accordance with claim 10, an audio processor in accordance with claim 11, a method for processing an audio signal in accordance with claim 14 or a computer program in accordance with claim 15.
The present invention is based on the finding that the measurement for a perceived level of reverberation in a signal is determined by a loudness model processor comprising a perceptual filter stage that filters a direct signal component, a reverberation signal component or a mix signal component using a perceptual filter to model an entity's auditory perception mechanism. Based on the perceptually filtered signals, a loudness estimator estimates a first loudness measurement using the filtered direct signal and a second loudness measurement using the filtered reverberation signal or the filtered mix signal. Then, a combiner combines the first measurement and the second measurement to obtain a measurement for the perceived level of reverberation. In particular, combining two different loudness measurements, preferably by calculating their difference, provides a quantitative value or a measurement of how strong the sensation of reverberation is compared to the sensation of the direct signal or the mix signal.
To calculate the loudness measurements, absolute loudness measurements can be used, in particular the absolute loudness of the direct signal, the mix signal or the reverberation signal. Alternatively, partial loudness can be calculated, where the first loudness measurement is determined using the direct signal as the stimulus and the reverberation signal as the noise in the loudness model, and the second loudness measurement is calculated using the reverberation signal as the stimulus and the direct signal as the noise. By combining these two measurements in the combiner, a useful measurement for a perceived level of reverberation is obtained. The inventors have observed that such a useful measurement cannot be determined by generating a single loudness measurement alone, for example using the direct signal alone, the mix signal alone or the reverberation signal alone. Instead, owing to the interdependencies in human hearing, combining measurements that are derived differently from these three signals allows the perceived level of reverberation in a signal to be determined or modeled with some degree of accuracy.
Preferably, the loudness model processor performs a time/frequency conversion and takes into account the ear transfer function together with the excitation pattern that actually occurs in human hearing, as modeled by hearing models.
In a preferred embodiment, the measurement for the perceived level of reverberation is forwarded to a predictor that provides the perceived level of reverberation on a useful scale such as the sone scale. This predictor is preferably trained using listening test data, and the predictor parameters for a preferred linear predictor comprise a constant term and a scale factor. The constant term preferably depends on a characteristic of the reverberation filter actually used and, in one embodiment, on the reverberation filter characteristic parameter T60, which can be given for the straightforward, well-known reverberation filters used in artificial reverberators. Even when this characteristic is not known, for example when the reverberation signal component is not separately available but has been separated from the mix signal before processing in the inventive apparatus, an estimate for the constant term can be derived.
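The linear predictor described here, with a constant term and a scale factor, can be illustrated by an ordinary least-squares fit. The sketch below is an assumption about how such a predictor could be trained; the names `fit_linear_predictor` and `predict_level`, and the symbols a0 and a1, are hypothetical and not taken from the source.

```python
import numpy as np

def fit_linear_predictor(measures, ratings):
    """Least-squares fit of ratings ~ a0 + a1 * measures.

    measures : combiner outputs (measurement for the perceived level)
    ratings  : listening test ratings for the same items
    Returns the constant term a0 and the scale factor a1.
    """
    measures = np.asarray(measures, dtype=float)
    ratings = np.asarray(ratings, dtype=float)
    design = np.column_stack([np.ones(len(measures)), measures])
    coeffs, *_ = np.linalg.lstsq(design, ratings, rcond=None)
    return float(coeffs[0]), float(coeffs[1])

def predict_level(measure, a0, a1):
    """Map a measurement to a predicted perceived level of reverberation."""
    return a0 + a1 * measure
```

Because the text states that the constant term preferably depends on the reverberation filter characteristic (for example T60), one option would be to fit a separate a0 per reverberation setting or to add T60 as a second regressor; which of these the inventors use is not specified in this passage.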
In the following, preferred embodiments of the present invention are described with reference to the accompanying drawings, in which:
Figure 1 is a block diagram for an apparatus or method for determining a measurement for a perceived level of reverberation;
Figure 2a is an illustration of a preferred embodiment of the loudness model processor;
Figure 2b illustrates another preferred implementation of the loudness model processor;
Figure 3 illustrates another preferred implementation of the loudness model processor;
Figure 4a illustrates an example of time signal envelopes: the envelope of an audio signal (solid line), of the reverberation signal (dashed line) and of the mixture of both signals (dotted line);
Figure 4b illustrates the corresponding loudness and partial loudness: the total loudness (dotted line), the partial loudness of the direct signal (solid line) and the partial loudness of the reverberation signal (dashed line);
Figures 5a and 5b illustrate information on the experimental data for training the predictor;
Figure 6 illustrates a block diagram of an artificial reverberation processor;
Figure 7 illustrates three tables indicating evaluation metrics for embodiments of the invention;
Figure 8 illustrates an audio signal processor implemented to use the measurement for a perceived level of reverberation for the purpose of artificial reverberation;
Figure 9 illustrates a preferred implementation of the predictor that relies on time-averaged perceived levels of reverberation; and
Figure 10 illustrates the equations from the 1997 publication by Moore, Glasberg and Baer used in a preferred embodiment to calculate the specific loudness.
The perceived level of reverberation depends on both the input audio signal and the impulse response. Embodiments of the invention aim to quantify this observation and to predict the perceived level of late reverberation based on the separate signal paths of the direct and reverberant signals, as they appear in digital audio effects. An approach to the problem is developed and subsequently extended by considering the impact of the reverberation time on the prediction result. This leads to a linear regression model with two input variables that can predict the perceived level with high accuracy, as shown on experimental data derived from listening tests. Variations of this model with different degrees of sophistication and computational complexity are compared with respect to their accuracy.
Applications include control of digital audio effects for automatic mixing of audio signals.
Embodiments of the present invention are not only useful for predicting the perceived level of reverberation in speech and music when the direct signal and the reverberation impulse response (RIR) are separately available. The present invention can also be applied in other embodiments in which only a reverberated signal is available. In that case, however, a direct/ambience or direct/reverberation separator would be included to separate the direct signal component and the reverberation signal component from the mix signal. Such an audio processor would then be useful for changing the direct-to-reverberation ratio of this signal to generate a better-sounding reverberated signal or mix signal.
Figure 1 illustrates an apparatus for determining a measurement for a perceived level of reverberation in a mix signal comprising a direct (dry) signal component 100 and a reverberation signal component 102. The dry signal component 100 and the reverberation signal component 102 are input into a loudness model processor 104. The loudness model processor is configured to receive the direct signal component 100 and the reverberation signal component 102 and comprises a perceptual filter stage 104a and a subsequently connected loudness calculator 104b, as illustrated in Figure 2a. The loudness model processor generates, at its output, a first loudness measurement 106 and a second loudness measurement 108. Both loudness measurements are input into a combiner 110, which combines the first loudness measurement 106 and the second loudness measurement 108 to finally obtain a measurement 112 for the perceived level of reverberation. Depending on the implementation, the measurement 112 for the perceived level can be input into a predictor 114 to predict the perceived level of reverberation based on an average value of at least two measurements for the perceived loudness for different signal frames, as will be discussed in the context of Figure 9. However, the predictor 114 in Figure 1 is optional; it transforms the measurement for the perceived level into a given value range or unit range, such as the sone range, which is useful for giving quantitative values related to loudness. The measurement 112 for the perceived level can also be used without being processed by the predictor 114, for example in the audio processor of Figure 8, which does not necessarily have to depend on a value output by the predictor 114 but can also process the measurement 112 directly, either as it is or preferably in a smoothed form, where smoothing over time is preferred in order to avoid strongly changing level corrections of the reverberated signal or, as discussed later, of the gain factor g illustrated in Figure 6 and Figure 8.
In particular, the perceptual filter stage is configured to filter the direct signal component, the reverberation signal component or the mix signal component, and it is configured to model an auditory perception mechanism of an entity such as a human being to obtain a filtered direct signal, a filtered reverberation signal or a filtered mix signal. Depending on the implementation, the perceptual filter stage can comprise two filters that operate in parallel, or it can comprise a storage and a single filter, since one and the same filter can actually be used to filter each of the three signals, that is, the reverberation signal, the mix signal and the direct signal. In this context, however, it should be noted that although Figure 2a illustrates n filters that model the auditory perception mechanism, two filters will actually be sufficient, or a single filter that filters two signals out of the group comprising the reverberation signal component, the mix signal component and the direct signal component.
The loudness calculator 104b, or loudness estimator, is configured to estimate the first loudness measurement using the filtered direct signal and to estimate the second loudness measurement using the filtered reverberation signal or the filtered mix signal, where the mix signal is derived from a superposition of the direct signal component and the reverberation signal component.
Figure 2c illustrates four preferred ways to calculate the measurement for the perceived level of reverberation. Embodiment 1 relies on partial loudness, where both the direct signal component x and the reverberation signal component r are used in the loudness model processor. To determine the first measurement EST1, the reverberation signal is used as the stimulus and the direct signal is used as the noise. To determine the second loudness measurement EST2, the roles are exchanged: the direct signal component is used as the stimulus and the reverberation signal component is used as the noise. The measurement for the perceived level of reverberation generated by the combiner is then the difference between the first loudness measurement EST1 and the second loudness measurement EST2.
However, other, computationally more efficient embodiments also exist; they are indicated in lines 2, 3 and 4 of Figure 2c. These computationally more efficient measurements rely on calculating the total loudness of three signals: the mix signal m, the direct signal x and the reverberation signal r. Depending on the calculation performed by the combiner, indicated in the last column of Figure 2c, the first loudness measurement EST1 is the total loudness of the mix signal or of the reverberation signal, and the second loudness measurement EST2 is the total loudness of the direct signal component x or of the mix signal component m, where the actual combinations are as shown in Figure 2c.
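The four ways of forming the measurement can be summarized in a small dispatcher. This is a hedged sketch: the loudness functions are passed in as callables because the loudness model itself is described separately, and the pairing of signals for modes 2 to 4 is an assumption consistent with the sentence above, since the table of Figure 2c is not reproduced in this text. The function name `reverb_measure` is hypothetical.

```python
import numpy as np

def reverb_measure(x, r, mode, partial_loudness=None, total_loudness=None):
    """Combine two loudness measurements as EST1 - EST2.

    x, r             : direct and reverberation signal components
    partial_loudness : callable(stimulus=..., noise=...) -> float (mode 1)
    total_loudness   : callable(signal) -> float (modes 2 to 4)
    """
    x = np.asarray(x, dtype=float)
    r = np.asarray(r, dtype=float)
    m = x + r  # mix signal as superposition of both components
    if mode == 1:
        # Embodiment 1: partial loudness with exchanged stimulus/noise roles.
        est1 = partial_loudness(stimulus=r, noise=x)
        est2 = partial_loudness(stimulus=x, noise=r)
    elif mode == 2:
        est1, est2 = total_loudness(m), total_loudness(x)  # assumed pairing
    elif mode == 3:
        est1, est2 = total_loudness(r), total_loudness(x)  # assumed pairing
    elif mode == 4:
        est1, est2 = total_loudness(r), total_loudness(m)  # assumed pairing
    else:
        raise ValueError("mode must be 1, 2, 3 or 4")
    return est1 - est2
```

Passing the loudness model in as a parameter keeps the combiner independent of how loudness is computed, which mirrors the separation between the loudness model processor 104 and the combiner 110 in Figure 1.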
In another embodiment, the loudness model processor 104 operates in the frequency domain, as discussed in more detail with respect to Figure 3. In this situation, the loudness model processor and, in particular, the loudness calculator 104b provide a first measurement and a second measurement for each band. The measurements over all n bands are subsequently added or combined in an adder 104c for the first branch and an adder 104d for the second branch to finally obtain a first measurement for the broadband signal and a second measurement for the broadband signal.
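The per-band summation performed by the adders 104c and 104d amounts to a sum over the band axis. The following minimal sketch assumes the per-band measurements are stored as arrays whose last axis indexes the n bands; the function name `broadband_measures` is hypothetical.

```python
import numpy as np

def broadband_measures(band_measures_1, band_measures_2):
    """Sum per-band loudness measurements into broadband measurements.

    Each input has shape (..., n_bands); the last axis is summed,
    mirroring adder 104c (first branch) and adder 104d (second branch).
    """
    first = np.asarray(band_measures_1, dtype=float).sum(axis=-1)
    second = np.asarray(band_measures_2, dtype=float).sum(axis=-1)
    return first, second
```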
Figure 3 illustrates the preferred embodiment of the loudness model processor that has already been discussed in some respects with reference to Figures 1, 2a, 2b and 2c. In particular, the perceptual filter stage 104a comprises a time/frequency converter 300 for each branch, where, in the embodiment of Figure 3, x[k] indicates the stimulus and n[k] indicates the noise. The time/frequency-converted signal is routed to an ear transfer function block 302 (note that the ear transfer function can alternatively be computed before the time/frequency converter with similar results but a higher computational load), and the output of block 302 is input into a compute-excitation-pattern block 304 followed by a temporal integration block 306. Then, in block 308, the specific loudness is calculated; block 308 corresponds to the loudness calculator block 104b in Figure 2a. Subsequently, an integration over frequency is performed in block 310, which corresponds to the adder already described as 104c and 104d in Figure 2b. Block 310 generates the first measurement for a first set of stimulus and noise and the second measurement for a second set of stimulus and noise. In particular, when Figure 2b is considered, the stimulus for calculating the first measurement is the reverberation signal and the noise is the direct signal, while for calculating the second measurement the roles are exchanged: the stimulus is the direct signal component and the noise is the reverberation signal component. Thus, to generate the two different loudness measurements, the procedure illustrated in Figure 3 is performed twice. However, the calculation differs only in block 308, which operates differently as discussed further in the context of Figure 10, so the steps illustrated by blocks 300 to 306 need to be performed only once, and the result of the temporal integration block 306 can be stored to calculate the first estimated loudness and the second estimated loudness for embodiment 1 in Figure 2c. For the other embodiments 2, 3 and 4 in Figure 2c, block 308 is replaced by an individual "compute total loudness" block for each branch; in these embodiments it makes no difference whether a signal is considered a stimulus or a noise.
In the following, the loudness model illustrated in Figure 3 is discussed in more detail.
The implementation of the loudness model in Figure 3 follows the descriptions in [11, 12], with modifications as detailed later. The training and validation of the prediction use data from listening tests described in [13] and briefly summarized later. The application of the loudness model to predict the perceived level of late reverberation is also described later. The experimental results follow.
This section describes the implementation of a partial loudness model, the listening test data used as ground truth for the computational prediction of the perceived level of reverberation, and a proposed prediction method that is based on the partial loudness model.
The loudness model computes the partial loudness N_{x,n}[k] of a signal x[k] when presented simultaneously with a masking signal n[k].

Although early models dealt with the perception of loudness in steady background noise, some work exists on loudness perception in backgrounds of co-modulated random noise [14], complex environmental sounds [12] and music signals [15]. Figure 4b illustrates the total loudness of the example signal shown in Figure 4a and the partial loudness of its components, calculated with the loudness model used here.
The model used in this work is similar to the models in [11, 12], which were themselves based on earlier research by Fletcher, Munson, Stevens and Zwicker, with some modifications as described below. A block diagram of the loudness model is shown in Figure 3. The input signals are processed in the frequency domain using a short-time Fourier transform (STFT). In [12], six DFTs of different lengths are used to obtain a good match to the frequency resolution and the temporal resolution of the human auditory system at all frequencies. In this work, only one DFT length is used for computational efficiency, with a frame length of 21 ms at a sampling rate of 48 kHz, 50% overlap and a Hann window function. The transfer through the outer and middle ear is simulated with a fixed filter. The excitation function is computed for 40 auditory filter bands spaced on the equivalent rectangular bandwidth (ERB) scale using a level-dependent excitation pattern. In addition to the temporal integration due to the STFT windowing, a recursive integration is implemented with a time constant of 25 ms, which is only active at times when the excitation signal decays.
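The framing parameters and the decay-only recursive integration can be sketched as follows. The frame length, overlap and time constant come from the text; the first-order recursion and its coefficient `exp(-hop / tau)` are assumptions, since the exact form of the integrator is not given here, and the name `decay_only_integration` is hypothetical.

```python
import numpy as np

FS = 48000                     # sampling rate in Hz (from the text)
FRAME_LEN = round(0.021 * FS)  # 21 ms frame length in samples
HOP = FRAME_LEN // 2           # 50% overlap
TAU_S = 0.025                  # 25 ms time constant (from the text)

def decay_only_integration(excitation, hop_s=HOP / FS, tau_s=TAU_S):
    """Smooth an excitation pattern over time only while it decays.

    excitation : array of shape (n_frames, n_bands)
    Rising or constant excitation passes through unchanged; falling
    excitation is smoothed by a first-order recursion (assumed form).
    """
    e = np.asarray(excitation, dtype=float)
    alpha = np.exp(-hop_s / tau_s)
    out = np.empty_like(e)
    out[0] = e[0]
    for t in range(1, len(e)):
        smoothed = alpha * out[t - 1] + (1.0 - alpha) * e[t]
        out[t] = np.where(e[t] < out[t - 1], smoothed, e[t])
    return out
```

With these settings the analysis window could be obtained from `np.hanning(FRAME_LEN)`; an implementation might round the frame to a power of two for the DFT, which the text does not specify.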
The specific partial loudness, that is, the partial loudness evoked in each auditory filter band, is computed from the excitation levels of the signal of interest (the stimulus) and of the interfering noise according to Equations (17)-(20) in [11], illustrated in Figure 10. These equations cover the four cases where the signal is above the hearing threshold in noise or not, and where the excitation of the mix signal is less than 100 dB or not. If no interfering signal is fed into the model, that is, n[k] = 0, the result is equal to the total loudness N_x[k] of the stimulus x[k].
In particular, Figure 10 illustrates equations 17, 18, 19 and 20 of the publication "A Model for the Prediction of Thresholds, Loudness and Partial Loudness", B. C. J. Moore, B. R. Glasberg, T. Baer, J. Audio Eng. Soc., Vol. 45, No. 4, April 1997. This reference describes the case of a signal presented together with a background sound. Although the background can be any type of sound, it is referred to as the "noise" in this reference to distinguish it from the signal whose loudness is to be judged. The presence of the noise reduces the loudness of the signal, an effect called partial masking. The loudness of the signal grows very rapidly when its level is raised from the threshold value to a value 20-30 dB above threshold. In the paper it is assumed that the partial loudness of a signal presented in noise can be calculated by summing the partial specific loudness of the signal across frequency (on an ERB scale). Equations are derived for calculating the partial specific loudness by considering four limiting cases. E_SIG denotes the excitation evoked by the signal and E_NOISE denotes the excitation evoked by the noise. It is assumed that E_SIG > E_THRQ and E_SIG + E_NOISE < 10^10. The total specific loudness N'_TOT is defined as follows:

It is assumed that the listener can partition the specific loudness at a given center frequency between the specific loudness of the signal and that of the noise, but in a way that preserves the total specific loudness.

This assumption is consistent, since in most experiments that measure partial masking the listener first hears the noise alone and then the noise plus the signal. The specific loudness for the noise alone, assuming it is above threshold, is

Thus, if the specific loudness of the signal were derived simply by subtracting the specific loudness of the noise from the total specific loudness, the result would be

In practice, the shape that the specific sound is positioned between the signal and the noise seems to vary depending on the relative excitation of the signal and the noise.
Four situations are considered indicating how the specific sound is assigned at different levels of the signal.
Let ETHRN denote the maximum excitation evoked by a sinusoidal signal when it is at its masked limit in the background noise - When USIN is well below £ THRN, all the specific loudness is attributed to the noise, and the partial specific tone sounds close to zero. Second, when FRUIT is well below KTHRQ, the partial specific sonority approaches the value would have a quiet signal. Third, when the signal is at its masked limit, with ETHRN excitation, it is assumed that the partial specific sound is equal to the value that would occur for a signal at the absolute limit. Finally, when a signal is centered on the narrowband noise it is well above its masked limit, the loudness of the signal approaches its unmasked value. Then, the sonority of the partial specific signal also approaches its unmasked value.
Consider the implications of these various boundary conditions. At the masked threshold, the specific loudness is equal to that of a signal at the threshold in quiet. This specific loudness is less than would be predicted from the above equation, presumably because some of the specific loudness of the signal is assigned to the noise. To obtain the correct specific loudness for the signal, it is assumed that the specific loudness assigned to the noise is increased by the factor B, where
Applying this factor to the second term in the above equation for N'_SIG results in
It is assumed that when the signal is at the masked threshold, its peak excitation E_THRN is equal to K·E_NOISE + E_THRQ, where K is the signal-to-noise ratio at the output of the auditory filter required for threshold at higher masker levels. Recent estimates of K, obtained from masking experiments using notched noise, suggest that K increases markedly at very low frequencies, becoming greater than unity. In the reference, the value of K is estimated as a function of frequency. The value decreases from high levels at low frequencies to constant low levels at higher frequencies. Unfortunately, there are no estimates of K for center frequencies below 100 Hz, so the values for 50 to 100 Hz cannot be taken from measurements. Substituting for E_THRN in the above equation results in:
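The three expressions referred to in the preceding paragraphs (the factor B, the equation obtained by applying B, and the result of substituting E_THRN = K·E_NOISE + E_THRQ) are not reproduced in this text. Under the same assumed notation as reference [11], they can be reconstructed from the stated boundary condition at the masked threshold as follows:

```latex
\begin{align}
B &= \frac{\left[(E_{\mathrm{THRN}} + E_{\mathrm{NOISE}})\,G + A\right]^{\alpha} - (E_{\mathrm{THRQ}}\,G + A)^{\alpha}}
          {(E_{\mathrm{NOISE}}\,G + A)^{\alpha} - A^{\alpha}} \\
N'_{\mathrm{SIG}} &= C\left\{\left[(E_{\mathrm{SIG}} + E_{\mathrm{NOISE}})\,G + A\right]^{\alpha} - A^{\alpha}\right\}
   - C\left\{\left[(E_{\mathrm{THRN}} + E_{\mathrm{NOISE}})\,G + A\right]^{\alpha} - (E_{\mathrm{THRQ}}\,G + A)^{\alpha}\right\} \\
N'_{\mathrm{SIG}} &= C\left\{\left[(E_{\mathrm{SIG}} + E_{\mathrm{NOISE}})\,G + A\right]^{\alpha} - A^{\alpha}\right\}
   - C\left\{\left[(E_{\mathrm{NOISE}}(1 + K) + E_{\mathrm{THRQ}})\,G + A\right]^{\alpha} - (E_{\mathrm{THRQ}}\,G + A)^{\alpha}\right\}
\end{align}
```

Setting E_SIG = E_THRN in the second line gives C[(E_THRQ·G + A)^α − A^α], which is the specific loudness of a signal at the absolute threshold in quiet, as the boundary condition requires.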
When E_SIG = E_THRN, this equation specifies the peak specific loudness for a signal at the absolute threshold in quiet.
When the signal is well above its masked threshold, that is, when E_SIG >> E_THRN, the specific loudness of the signal approaches the value it would have when no background noise is present. This means that the specific loudness assigned to the noise becomes vanishingly small. To accommodate this, the above equation is modified by introducing an extra term that depends on the ratio E_THRN / E_SIG. This term decreases as E_SIG is raised above the value corresponding to the masked threshold. Thus, the above equation becomes equation 17 in Figure 10.
This is the final equation for N'_SIG in the case when E_SIG > E_THRN and E_SIG + E_NOISE ≤ 10^10. The exponent 0.3 in the last term was chosen empirically to give a good fit to data on the loudness of a tone in noise as a function of the signal-to-noise ratio.
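The derivation above can be illustrated with a minimal sketch. It assumes that equation 17 has the form that follows from the text: the subtraction equation with E_THRN replaced by K·E_NOISE + E_THRQ, and the noise term weighted by (E_THRN / E_SIG)^0.3. The constant values and the function names are illustrative assumptions, not values stated in this document.

```python
# Illustrative constants (assumed placeholder values, not taken from this document).
C = 0.047      # scaling constant of the loudness model
ALPHA = 0.2    # compressive exponent
G = 1.0        # low-level gain
A = 4.62       # additive constant
E_THRQ = 2.31  # excitation at the absolute threshold in quiet (linear units)


def specific_loudness_quiet(e_sig):
    """Specific loudness of a signal presented without background noise."""
    return C * ((e_sig * G + A) ** ALPHA - A ** ALPHA)


def partial_specific_loudness(e_sig, e_noise, k):
    """Sketch of equation 17: valid for e_sig >= E_THRN and e_sig + e_noise <= 1e10."""
    e_thrn = k * e_noise + E_THRQ  # peak excitation at the masked threshold
    if e_sig < e_thrn or e_sig + e_noise > 1e10:
        raise ValueError("outside the range covered by equation 17")
    total = C * (((e_sig + e_noise) * G + A) ** ALPHA - A ** ALPHA)
    assigned_to_noise = C * (
        ((e_noise * (1.0 + k) + E_THRQ) * G + A) ** ALPHA
        - (E_THRQ * G + A) ** ALPHA
    )
    # The weight shrinks the noise share as the signal rises above masked threshold.
    return total - assigned_to_noise * (e_thrn / e_sig) ** 0.3
```

With these definitions, the sketch reproduces two of the boundary conditions named in the text: without noise it returns the loudness of the signal in quiet, and at the masked threshold it returns the loudness of a signal at the absolute threshold.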
Subsequently, the situation is considered where E_SIG < E_THRN. In the limiting case where E_SIG is just below E_THRN, the specific loudness would approach the value given by Equation 17 in Figure 10. When E_SIG is reduced to well below E_THRN, the specific loudness would rapidly become very small. This is achieved by Equation 18 in Figure 10. The first term in parentheses determines the rate at which the specific loudness decreases as E_SIG is reduced below E_THRN. This describes the relationship between the specific loudness and the excitation for a signal in quiet when E_SIG < E_THRQ, except that E_THRN has been substituted for E_THRQ in Equation 18. The first term in brackets ensures that the specific loudness approaches the value defined by Equation 17 in Figure 10 as E_SIG approaches E_THRN.
The partial loudness equations described so far apply when E_SIG + E_NOISE < 10^10. Applying the same reasoning as used for the derivation of equation (17) of Figure 10, an equation can be derived for the case E_SIG ≥ E_THRN and E_SIG + E_NOISE > 10^10, as described by equation 19 in Figure 10, where C2 = C / (1.04 × 10^6)^0.5. Similarly, applying the same reasoning as used for the derivation of equation (18) in Figure 10, an equation can be derived for the case where E_SIG < E_THRN and E_SIG + E_NOISE > 10^10, as described by equation 20 in Figure 10.
The following points should be noted. This prior art model is applied to the present invention such that, in a first run, the signal SIG corresponds, for example, to the direct signal as the "stimulus", and the noise corresponds, for example, to the reverberation signal or the mixing signal as the "noise". In the second run, as discussed in the context of the first embodiment in Figure 2c, SIG would then correspond to the reverberation signal as the "stimulus" and the "noise" would correspond to the direct signal. In this way, two loudness measurements are obtained, which are then combined by the combiner, preferably by forming a difference.
To assess the suitability of the described loudness model for the task of predicting the perceived level of late reverberation, a ground-truth corpus generated from listener responses is preferred. For this purpose, the data from an investigation comprising listening tests [13] are used in this work, which is briefly summarized below. Each listening test consisted of several graphical user interface screens that presented mixtures of different direct signals with different conditions of artificial reverberation. The listeners were asked to rate the perceived amount of reverberation on a scale from 0 to 100 points. In addition, two anchor signals were presented at 10 points and at 90 points. The anchor signals were created from the same direct signal with different reverberation conditions.
The direct signals used to create the test items were monophonic recordings of speech, individual instruments and music of different genres, with a length of approximately 4 seconds each. Most items originated from anechoic recordings, but commercial recordings with a small amount of original reverberation were also used.
The RIRs represent late reverberation and were generated using exponentially decaying white noise with frequency-dependent decay rates. The decay rates are chosen so that the reverberation time decreases from low to high frequencies, starting at a reverberation time T60. Early reflections were neglected in this work. The reverberation signal r[k] and the direct signal x[k] were scaled and added so that the ratio of their average loudness measurements according to ITU-R BS.1770 [16] corresponds to a desired DRR and so that all test signal mixtures have equal long-term loudness. All test participants were working in the audio field and had experience with subjective listening tests.
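The generation of the test material can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the decay is a single broadband exponential rather than frequency dependent, the level ratio uses plain mean-square power as a stand-in for the ITU-R BS.1770 loudness measurement, and the convolution of the direct signal with the RIR is omitted. All function names are hypothetical.

```python
import math
import random


def decaying_noise_rir(t60, fs=8000, seed=0):
    """White Gaussian noise with an exponential envelope that falls by 60 dB in t60 seconds."""
    rng = random.Random(seed)
    n = int(round(t60 * fs))
    decay = math.log(1000.0) / (t60 * fs)  # amplitude factor 1e-3 corresponds to -60 dB
    return [rng.gauss(0.0, 1.0) * math.exp(-decay * i) for i in range(n)]


def mean_power(samples):
    """Mean-square power (a stand-in here for a BS.1770 loudness measurement)."""
    return sum(v * v for v in samples) / len(samples)


def reverb_gain_for_drr(direct, reverb, drr_db):
    """Gain for the reverberation signal so that direct/reverb power equals drr_db."""
    ratio = 10.0 ** (drr_db / 10.0)
    return math.sqrt(mean_power(direct) / (ratio * mean_power(reverb)))


def mix(direct, reverb, drr_db):
    """Mixture of the direct signal and the scaled reverberation signal."""
    gain = reverb_gain_for_drr(direct, reverb, drr_db)
    return [d + gain * r for d, r in zip(direct, reverb)]
```

A further normalization of the mixtures to equal long-term loudness, as described in the text, would follow as a separate scaling step and is not shown.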
The ground-truth data used for training and verification/testing of the prediction method were taken from two listening tests and are denoted by A and B, respectively. Data set A consisted of ratings from 14 listeners for 54 signals. The listeners repeated the test once, and the mean rating was obtained from all 28 ratings for each item. The 54 signals were generated by combining 6 different direct signals and 9 stereophonic reverberation conditions, with T60 ∈ {1, 1.6, 2.4} s and DRR ∈ {3, 7.5, 12} dB, and no pre-delay.
The data in B were obtained from the ratings of 14 listeners for 60 signals. The signals were generated using 15 direct signals and 36 reverberation conditions. The reverberation conditions sampled four parameters, namely T60, DRR, pre-delay, and ICC. For each direct signal, 4 RIRs were chosen so that two had no pre-delay and two had a short pre-delay of 50 ms, and two were monophonic and two were stereophonic.
Subsequently, further features of a preferred embodiment of the combiner 110 in Figure 1 are discussed.
The basic input feature for the prediction method is computed from the difference between the partial loudness N_{r,x}[k] of the reverberation signal r[k] (with the direct signal x[k] being the interferer) and the partial loudness N_{x,r}[k] of x[k] (where r[k] is the interferer), according to Equation (2).
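Equation (2) itself is not reproduced in this text. Based on the description above, and with ΔN_{r,x}[k] as the name of the resulting feature, it can be reconstructed as:

```latex
\begin{equation}
\Delta N_{r,x}[k] = N_{r,x}[k] - N_{x,r}[k] \tag{2}
\end{equation}
```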

The rationale behind Equation (2) is that the difference ΔN_{r,x}[k] is a measure of how strong the sensation of the reverberation is compared to the sensation of the direct signal. Taking the difference was also found to make the prediction result approximately invariant with respect to the playback level. The playback level has an impact on the investigated sensation [17, 8], but to a more subtle extent than that reflected by the increase of the partial loudness N_{r,x} with increasing playback level. Typically, music recordings sound more reverberant at moderate to high levels (starting at approximately 75 to 80 dB SPL) than at levels approximately 12 to 20 dB lower. This effect is especially obvious in cases where the DRR is positive, which is valid "for almost all recorded music" [18], but not in all cases for concert music where "listeners are generally well beyond the critical distance" [6].
The decrease of the perceived level of reverberation with decreasing playback level is best explained by the fact that the dynamic range of reverberation is smaller than that of the direct sounds (or, put differently, a time/frequency representation of reverberation is denser, whereas a time/frequency representation of direct sounds is sparser [19]). In such a scenario, the reverberation signal is more likely to fall below the threshold of hearing than the direct sounds.
Although equation (2) describes, as the combination operation, a difference between the two loudness measurements N_{r,x}[k] and N_{x,r}[k], other combinations can be performed as well, such as multiplications, divisions or even additions. In any case, it is sufficient that the two alternatives indicated by the two loudness measurements are combined so that both influence the result. However, experiments have shown that the difference yields the best values from the model, that is, results that agree well with the listening tests, so that the difference is the preferred form of combination.
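The combination step can be sketched as follows. This is a minimal illustration only: the function name and the mode strings are hypothetical, and the two inputs are assumed to be per-frame loudness values that have already been computed by the loudness model.

```python
import operator

# Frame-wise combination rules named in the text; the difference is the preferred one.
_COMBINERS = {
    "difference": operator.sub,
    "sum": operator.add,
    "product": operator.mul,
    "quotient": operator.truediv,
}


def combine_loudness(n_rx, n_xr, mode="difference"):
    """Combine two per-frame loudness sequences into one feature sequence."""
    if len(n_rx) != len(n_xr):
        raise ValueError("both loudness sequences must cover the same frames")
    combine = _COMBINERS[mode]
    return [combine(a, b) for a, b in zip(n_rx, n_xr)]
```

Keeping the rule selectable makes it easy to compare the alternatives against listening-test data, which is how the text arrives at the difference as the preferred choice.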
Subsequently, details of the predictor 114 illustrated in Figure 1 are described, where these details refer to a preferred embodiment.
The prediction methods described below are linear and use a least-squares fit to compute the model coefficients. The simple structure of the predictor is advantageous in situations where the size of the data sets for training and testing the predictor is limited, which could lead to overfitting of the model when using regression methods with more degrees of freedom, for example neural networks. The baseline predictor R̂_b is derived by linear regression according to Equation (3) with the coefficients a_i, with K being the length of the signal in frames.
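Equation (3) is not reproduced in this text. Assuming the coefficient names a_0 and a_1 that appear with the fitted values further below, and the feature ΔN_{r,x}[k] of Equation (2), the baseline predictor can be reconstructed as:

```latex
\begin{equation}
\hat{R}_b = a_0 + a_1 \, \frac{1}{K} \sum_{k=1}^{K} \Delta N_{r,x}[k] \tag{3}
\end{equation}
```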

The model has only one independent variable, namely the mean of ΔN_{r,x}[k]. To track changes and to enable real-time processing, the computation of the mean can be approximated using a leaky integrator. The model parameters derived when using data set A for training are a0 = 48.2 and a1 = 14.0, where a0 is equal to the mean rating over all listeners and items.
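The real-time approximation mentioned above can be sketched as follows. The leak factor and the function names are assumptions chosen for illustration; only the coefficient values a0 = 48.2 and a1 = 14.0 are taken from the text.

```python
def leaky_mean(frames, leak=0.05):
    """Running approximation of the mean by a first-order leaky integrator."""
    if not frames:
        return []
    state = frames[0]  # start from the first frame to avoid a long warm-up
    smoothed = []
    for value in frames:
        state += leak * (value - state)  # move a fraction of the way towards the new frame
        smoothed.append(state)
    return smoothed


def predict_baseline(mean_delta_n, a0=48.2, a1=14.0):
    """Baseline prediction of the perceived level of reverberation on the 0 to 100 scale."""
    return a0 + a1 * mean_delta_n
```

A smaller leak factor averages over more frames and reacts more slowly, which corresponds to the trade-off between accuracy and responsiveness for real-time use.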
Figure 5a depicts the predicted sensations for data set A. It can be seen that the predictions are moderately correlated with the mean listener ratings, with a correlation coefficient of 0.71. Note that the choice of the regression coefficients does not affect this correlation. As shown in the lower plot, for each mixture generated from the same direct signal, the points exhibit a characteristic shape centered close to the diagonal. This shape indicates that although the baseline model R̂_b can predict R to a certain degree, it does not reflect the influence of T60 on the ratings. Visual inspection of the data points suggests a linear dependency on T60. If the value of T60 is known, as is the case when controlling an audio effect, it can easily be incorporated into the linear regression model to derive an enhanced prediction R̂_e.
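The equation of the enhanced model is not reproduced in this text. Assuming that it extends Equation (3) by a term that is linear in T60, with the coefficient a_2 named in the following paragraph, it can be reconstructed as:

```latex
\begin{equation}
\hat{R}_e = a_0 + a_1 \, \frac{1}{K} \sum_{k=1}^{K} \Delta N_{r,x}[k] + a_2 \, T_{60} \tag{4}
\end{equation}
```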

The model parameters derived from data set A are a0 = 48.2, a1 = 12.9 and a2 = 10.2. The results are shown in Figure 5b separately for each of the data sets. The evaluation of the results is described in more detail in the next section.
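How such coefficients can be obtained by a least-squares fit may be illustrated with a small sketch. It solves the normal equations in plain Python; the function name is hypothetical, and the procedure is a generic textbook method rather than the specific implementation used for the reported values.

```python
def fit_linear(features, targets):
    """Least-squares coefficients [a0, a1, ...] for targets ~ a0 + a1*f1 + a2*f2 + ..."""
    rows = [[1.0] + list(f) for f in features]  # prepend the constant regressor
    n = len(rows[0])
    # Normal equations (X^T X) a = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(xtx[r][col]))
        if abs(xtx[pivot][col]) < 1e-12:
            raise ValueError("regressors are linearly dependent")
        xtx[col], xtx[pivot] = xtx[pivot], xtx[col]
        xty[col], xty[pivot] = xty[pivot], xty[col]
        for row in range(col + 1, n):
            factor = xtx[row][col] / xtx[col][col]
            for c in range(col, n):
                xtx[row][c] -= factor * xtx[col][c]
            xty[row] -= factor * xty[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(xtx[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (xty[i] - rest) / xtx[i][i]
    return coeffs
```

Each feature row would hold the mean loudness difference of an item and its T60, and each target the mean listener rating of that item.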
Alternatively, averaging over more or fewer blocks can be performed as long as averaging over at least two blocks takes place, although, due to the theory of the linear equation, the best results may be obtained when averaging is carried out over the whole piece of music up to a certain frame. However, for real-time applications, it is preferred to reduce the number of frames over which averaging is performed, depending on the actual application.
Figure 9 further illustrates that the constant term is defined by a0 and a2·T60. The second term a2·T60 was selected in order to be able to apply this equation not only to a single reverberator, that is, to a situation in which the filter 600 in Figure 6 is not changed. This term, which is of course constant but which depends on the reverberation filter actually used in Figure 6, therefore provides the flexibility to use exactly the same equation for other reverberation filters having other values of T60. As is known in the art, T60 is a parameter that describes a particular reverberation filter and, in particular, means that the reverberation energy has decreased by 60 dB from an initial maximum reverberation energy value. Typically, reverberation curves decay over time, and T60 therefore indicates the period of time in which the reverberation energy generated by a signal excitation has decreased by 60 dB. Similar results in terms of prediction accuracy are obtained by replacing T60 with parameters that represent similar information (about the length of the RIR), for example T30.
Next, the models are evaluated using the correlation coefficient r, the mean absolute error (MAE) and the root mean squared error (RMSE) between the mean listener ratings and the predicted sensation. The experiments are performed as two-fold cross-validation, that is, the predictor is trained with data set A and tested with data set B, and the experiment is repeated with B for training and A for testing. The evaluation metrics obtained from both runs are averaged, separately for training and testing.
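The three evaluation metrics can be computed as in the following sketch. The function name is hypothetical; the formulas are the standard definitions of the Pearson correlation coefficient, MAE and RMSE.

```python
import math


def evaluate(ratings, predictions):
    """Return (correlation r, mean absolute error, root mean squared error)."""
    n = len(ratings)
    mean_r = sum(ratings) / n
    mean_p = sum(predictions) / n
    cov = sum((r - mean_r) * (p - mean_p) for r, p in zip(ratings, predictions))
    var_r = sum((r - mean_r) ** 2 for r in ratings)
    var_p = sum((p - mean_p) ** 2 for p in predictions)
    corr = cov / math.sqrt(var_r * var_p)
    mae = sum(abs(r - p) for r, p in zip(ratings, predictions)) / n
    rmse = math.sqrt(sum((r - p) ** 2 for r, p in zip(ratings, predictions)) / n)
    return corr, mae, rmse
```

For the two-fold cross-validation described above, this function would be called once per fold on the test set, and the resulting metrics averaged over both folds.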
The results are shown in Table 1 for the prediction models R̂_b and R̂_e. The predictor R̂_e yields accurate results with an RMSE of 10.6 points. The mean of the standard deviation of the individual listener ratings per item is given as a measure of the dispersion around the mean (of the ratings of all listeners per item) as σ_A = 13.4 for data set A and σ_B = 13.6 for data set B. The comparison with the RMSE indicates that R̂_e is at least as accurate as the average listener in the listening test.
The prediction accuracy differs slightly between the data sets. For example, for R̂_e both MAE and RMSE are approximately one point below the mean value (as listed in the table) when testing with data set A and one point above the mean when testing with data set B. The fact that the evaluation metrics for training and testing are comparable indicates that overfitting of the predictor was avoided.
To facilitate a computationally efficient implementation of such prediction models, the following experiments investigate how the use of loudness features with lower computational complexity influences the accuracy of the prediction result. The experiments focus on replacing the computation of the partial loudness with estimates of the total loudness and on simplified implementations of the excitation pattern.
Instead of using the partial loudness difference ΔN_{r,x}[k], three differences of total loudness estimates are evaluated, using the loudness of the direct signal N_x[k], the loudness of the reverberation N_r[k], and the loudness of the mixing signal N_m[k], as shown in Equations (5)-(7), respectively.
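Equations (5) to (7) are not reproduced in this text. Based on the descriptions given for them (the loudness increase caused by adding the reverberation to the dry signal, and the loudness of the reverberation signal minus that of the mixing signal or of the direct signal), they can be reconstructed as follows. The subscript notation of the three features is an assumption.

```latex
\begin{align}
\Delta N_{m-x}[k] &= N_m[k] - N_x[k] \tag{5} \\
\Delta N_{r-m}[k] &= N_r[k] - N_m[k] \tag{6} \\
\Delta N_{r-x}[k] &= N_r[k] - N_x[k] \tag{7}
\end{align}
```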

Equation (5) is based on the assumption that the perceived level of the reverberation signal can be expressed as the difference (increase) in overall loudness that is caused by adding the reverberation to the dry signal.
Following a similar rationale as for the partial loudness difference in Equation (2), loudness features using the differences between the total loudness of the reverberation signal and that of the mixing signal or the direct signal, respectively, are defined in Equations (6) and (7). The measurement for predicting the sensation is derived from the loudness of the reverberation signal when listened to separately, with subtractive terms for modeling the partial masking and for normalization with respect to the playback level, derived from the mixing signal or the direct signal, respectively.

Table 2 shows the results obtained with the features based on the total loudness and reveals that in fact two of them, ΔN_{m-x}[k] and ΔN_{r-x}[k], yield predictions with almost the same accuracy as R̂_e. But as shown in Table 2, even ΔN_{r-m}[k] provides useful results.
Finally, in an additional experiment, the influence of the implementation of the spreading function is investigated. This is of particular significance for many application scenarios, since the use of level-dependent excitation patterns requires implementations of high computational complexity. Experiments with processing similar to that described above, but using a loudness model without spreading and a loudness model with a level-invariant spreading function, led to the results shown in Table 2. The influence of the spreading appears to be negligible.
Thus, equations (5), (6) and (7), which indicate embodiments 2, 3 and 4 of Figure 2c, illustrate that even without partial loudnesses, but with total loudnesses for different combinations of signal components or signals, good values or measurements for the perceived level of reverberation in a mixing signal are also obtained.
Subsequently, a preferred embodiment of the inventive determination of a measurement for a perceived level of reverberation is discussed in the context of Figure 8. Figure 8 illustrates an audio processor for generating a reverberated signal from a direct signal component supplied at an input 800. The direct or dry signal component is fed into a reverberator 801, which can be similar to the reverberator 606 in Figure 6. The dry signal component at input 800 is additionally fed into an apparatus 802 for determining the measurement for a perceived loudness, which can be implemented as discussed in the context of Figures 1, 2a, 2c, 3, 9 and 10. The output of the apparatus 802 is the measurement R for a perceived level of reverberation in a mixing signal, which is fed into a controller 803. The controller 803 receives, at a further input, a target value for the measurement of the perceived level of reverberation and calculates, from this target value and the actual value R, a gain value at the output 804.
This gain value is fed into a manipulator 805 that is configured to manipulate, in this embodiment, the reverberation signal component 806 output by the reverberator 801. As illustrated in Figure 8, the apparatus 802 additionally receives the reverberation signal component 806, as discussed in the context of Figure 1 and the other Figures that describe the apparatus for determining a measurement of a perceived loudness. The output of the manipulator 805 is fed into an adder 807, where the output of the manipulator comprises, in the embodiment of Figure 8, the manipulated reverberation component, and the output of the adder 807 is a mixing signal 808 with a perceived reverberation as determined by the target value. The controller 803 can be configured to implement any of the control rules defined in the art for feedback controls, where the target value is a set value, the value R generated by the apparatus is an actual value, and the gain 804 is selected so that the actual value R approaches the target value input into the controller 803. Although Figure 8 illustrates that the reverberation signal is manipulated by the gain in the manipulator 805, which in particular comprises a multiplier or weighter, other implementations can be performed as well. Another implementation, for example, is that not the reverberation signal 806 but the dry signal component is manipulated by the manipulator, as indicated by the optional line 809. In this case, the unmanipulated reverberation signal component as output by the reverberator 801 would be fed into the adder 807, as illustrated by the optional line 810. Of course, even a manipulation of both the dry signal component and the reverberation signal component could be performed to introduce or set a certain measurement of the perceived loudness of the reverberation in the mixing signal 808 output by the adder 807. A further implementation, for example, is that the reverberation time T60 is manipulated.
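The feedback loop formed by the apparatus 802, the controller 803 and the manipulator 805 can be sketched as follows. The integral-type update rule is only one of many possible control rules and is an assumption made for illustration; `measure` is a hypothetical callable standing in for the apparatus 802, which maps a reverberation gain to the measurement R.

```python
def control_gain(measure, target, gain=1.0, step=0.01, iterations=200):
    """Adjust the reverberation gain until the measured level R approaches the target.

    measure: callable mapping a gain to the measurement R (stand-in for apparatus 802).
    target:  desired perceived level of reverberation (set value of controller 803).
    """
    for _ in range(iterations):
        error = target - measure(gain)         # set value minus actual value
        gain = max(0.0, gain + step * error)   # raise the gain while R is too low
    return gain
```

In a real system `measure` would run the loudness model on the dry signal and the scaled reverberation signal; any monotonic relation between gain and R is sufficient for this simple rule to settle, provided the step size is small enough.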
The present invention provides a simple and robust prediction of the perceived level of reverberation and, specifically, of late reverberation in speech and music, using loudness models of varying computational complexity. The prediction modules were trained and evaluated using the subjective data from three listening tests. As a starting point, the use of a partial loudness model led to a prediction model with high accuracy when the T60 of the RIR 606 in Figure 6 is known. This result is also interesting from the perceptual point of view, considering that the partial loudness model was not originally developed with stimuli of direct and reverberant sound, as discussed in the context of Figure 10. Subsequent modifications of the computation of the input features for the prediction method lead to a series of simplified models that have been shown to achieve comparable performance for the data sets at hand. These modifications included the use of total loudness and of simplified spreading-function models. The embodiments of the present invention are also applicable to more diverse RIRs, including early reflections and longer pre-delays. The present invention is also useful for determining and controlling the contribution of other types of additive or reverberant audio effects to the perceived loudness.
Although some aspects have been described in the context of an apparatus, it is evident that these aspects also represent a description of the corresponding method, where a block or a device corresponds to a method step or a feature of a method step. Similarly, the aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus.
Depending on the requirements of certain implementations, the embodiments of the invention can be implemented in hardware or in software. The implementation can be carried out using a digital storage medium, for example a floppy disk, a DVD, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored on it, which cooperate (or are capable of cooperating) with a programmable computer system so that the respective method is carried out.
Some embodiments according to the invention comprise a tangible or non-transitory data carrier with electronically readable control signals, which are capable of cooperating with a programmable computer system in such a way that one of the methods described herein is carried out.
In general, the embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative to perform one of the methods when the computer program product runs on a computer. The program code can, for example, be stored on a machine-readable medium.
Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine-readable medium.
In other words, an embodiment of the method of the invention is, therefore, a computer program with a program code for performing one of the methods described herein when the computer program runs on a computer.
A further embodiment of the method of the invention is, therefore, a data carrier (either a digital storage medium or a computer-readable medium) comprising, recorded on it, the computer program for performing one of the methods described herein.
A further embodiment of the method of the invention is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals can, for example, be configured to be transferred via a data communication connection, for example via the Internet.
A further embodiment comprises a processing means, for example a computer or a programmable logic device, configured or adapted to perform one of the methods described herein.
A further embodiment comprises a computer having installed on it the computer program for performing one of the methods described herein.
In some embodiments, a programmable logic device (for example, a field-programmable gate array) can be used to perform some or all of the functionality of the methods described herein. In some embodiments, a field-programmable gate array can cooperate with a microprocessor in order to perform one of the methods described herein. In general, the methods are preferably performed by any hardware device.
The embodiments described above are merely illustrative of the principles of the present invention. It is understood that modifications and variations of the arrangements and details described herein will be evident to others skilled in the art. It is therefore intended that the invention be limited only by the scope of the appended patent claims and not by the specific details presented for the purpose of describing and explaining the embodiments herein.
Reference List
[1] A. Czyzewski, "A method for artificial reverberation quality testing," J. Audio Eng. Soc., vol. 38, pp. 129-141, 1990.
[2] J.A. Moorer, "About this reverberation business," Computer Music Journal, vol. 3, 1979.
[3] B. Scharf, "Fundamentals of auditory masking," Audiology, vol. 10, pp. 30-40, 1971.
[4] W.G. Gardner and D. Griesinger, "Reverberation level matching experiments," in Proc. of the Sabine Centennial Symposium, Acoust. Soc. of Am., 1994.
[5] D. Griesinger, "How loud is my reverberation," in Proc. of the AES 98th Conv., 1995.
[6] D. Griesinger, "Further investigation into the loudness of running reverberation," in Proc. of the Institute of Acoustics (UK) Conference, 1995.
[7] D. Lee and D. Cabrera, "Effect of listening level and background noise on the subjective decay rate of room impulse responses: Using time varying-loudness to model reverberance," Applied Acoustics, vol. 71, pp. 801-811, 2010.
[8] D. Lee, D. Cabrera, and W.L. Martens, "Equal reverberance matching of music," in Proc. of Acoustics, 2009.
[9] D. Lee, D. Cabrera, and W.L. Martens, "Equal reverberance matching of running musical stimuli having various reverberation times and SPLs," in Proc. of the 20th International Congress on Acoustics, 2010.
[10] A. Tsilfidis and J. Mourjopoulus, "Blind perceptual reverberation modeling," J. Acoust. Soc. Am., vol. 129, pp. 1439-1451, 2011.
[11] B.C.J. Moore, B.R. Glasberg, and T. Baer, "A model for the prediction of thresholds, loudness, and partial loudness," J. Audio Eng. Soc., vol. 45, pp. 224-240, 1997.
[12] B.R. Glasberg and B.C.J. Moore, "Development and evaluation of a model for predicting the audibility of time varying sounds in the presence of the background sounds," J. Audio Eng. Soc., vol. 53, pp. 906-918, 2005.
[13] J. Paulus, C. Uhle, and J. Herre, "Perceived level of late reverberation in speech and music," in Proc. of the AES 130th Conv., 2011.
[14] J.L. Verhey and S.J. Heise, "Einfluss der Zeitstruktur des Hintergrundes auf die Tonhaltigkeit und Lautheit des tonalen Vordergrundes (in German)," in Proc. of DAGA, 2010.
[15] C. Bradter and K. Hobohm, "Loudness calculation for individual acoustical objects within complex temporally variable sounds," in Proc. of the AES 124th Conv., 2008.
[16] International Telecommunication Union, Radiocommunication Assembly, "Algorithms to measure audio program loudness and true-peak audio level," Recommendation ITU-R BS.1770, 2006, Geneva, Switzerland.
[17] S. Hase, A. Takatsu, S. Sato, H. Sakai, and Y. Ando, "Reverberance of an existing hall in relation to both subsequent reverberation time and SPL," J. Sound Vib., vol. 232, pp. 149-155, 2000.
[18] D. Griesinger, "The importance of the direct to reverberant ratio in the perception of distance, localization, clarity, and envelopment," in Proc. of the AES 126th Conv., 2009.
[19] C. Uhle, A. Walther, O. Hellmuth, and J. Herre, "Ambience separation from mono recordings using Non-negative Matrix Factorization," in Proc. of the AES 30th Conf., 2007.

Claims:
Claims (14)
[0001]
1. Apparatus for determining a measurement for a perceived level of reverberation in a mixing signal consisting of a direct signal component (100) and a reverberation signal component (102), characterized by comprising: a loudness model processor (104) comprising a perceptual filter stage for filtering the dry signal component (100), the reverberation signal component (102) or the mixing signal, the perceptual filter stage being configured to model an auditory perception mechanism of an entity to obtain a filtered direct signal, a filtered reverberation signal or a filtered mixing signal; a loudness estimator for estimating a first loudness measurement using the filtered direct signal and for estimating a second loudness measurement using the filtered reverberation signal or the filtered mixing signal, where the filtered mixing signal is derived from a superposition of the direct signal component and the reverberation signal component; and a combiner (110) for combining the first and second loudness measurements (106, 108) to obtain a measurement (112) for the perceived level of reverberation.
[0002]
2. Apparatus according to claim 1, characterized in that the loudness estimator (104b) is configured to estimate the first loudness measurement so that the filtered direct signal is considered as a stimulus and the filtered reverberation signal is considered as a noise, or to estimate the second loudness measurement (108) so that the filtered reverberation signal is considered as a stimulus and the filtered direct signal is considered as a noise.
[0003]
3. Apparatus according to claim 1 or 2, characterized in that the loudness estimator (104b) is configured to calculate the first loudness measurement as a loudness of the filtered direct signal or to calculate the second loudness measurement as a loudness of the filtered reverberation signal or of the filtered mixing signal.
[0004]
4. Apparatus according to any one of the preceding claims, characterized in that the combiner (110) is configured to calculate a difference using the first loudness measurement (106) and the second loudness measurement (108).
[0005]
5. Apparatus according to claim 1, characterized by comprising: a predictor (114) for predicting the perceived level of reverberation based on an average value (904) of at least two measurements for the perceived loudness for different signal frames (k).
[0006]
6. Apparatus according to claim 5, characterized in that the predictor (114) is configured to use, in a prediction (900), a constant term (901, 903), a linear term depending on the average value (904) and a scaling factor (902).
[0007]
7. Apparatus according to claim 5 or 6, characterized in that the constant term (903) depends on a reverberation parameter that describes the reverberation filter (606) used to generate the reverberation signal in an artificial reverberator.
[0008]
8. Apparatus according to any one of the preceding claims, characterized in that the filter stage comprises a time/frequency conversion stage (300), and in that the loudness estimator (104b) is configured to sum (104c, 104d) the results obtained for a plurality of bands to derive the first and second loudness measurements (106, 108) for a broadband mixing signal comprising the direct signal component and the reverberation signal component.
[0009]
9. Apparatus according to any one of the preceding claims, characterized in that the filter stage (104a) comprises: an ear transfer filter (302), an excitation pattern calculator (304) and a temporal integrator (306) for deriving the filtered direct signal, the filtered reverberation signal or the filtered mixing signal.
[0010]
10. Method for determining a measurement for a perceived level of reverberation in a mixing signal consisting of a direct signal component (100) and a reverberation signal component (102), characterized by comprising: filtering (104) the dry signal component (100), the reverberation signal component (102) or the mixing signal, where the filtering is performed using a perceptual filter stage configured to model an auditory perception mechanism of an entity to obtain a filtered direct signal, a filtered reverberation signal or a filtered mixing signal; estimating a first loudness measurement using the filtered direct signal; estimating a second loudness measurement using the filtered reverberation signal or the filtered mixing signal, where the filtered mixing signal is derived from a superposition of the direct signal component and the reverberation signal component; and combining (110) the first and second loudness measurements (106, 108) to obtain a measurement (112) for the perceived level of reverberation.
[0011]
11. Audio processor to generate a reverberated signal (808) from a direct signal component (800), characterized by comprising: a reverberator (801) to reverberate the direct signal component (800) to obtain a reverberated signal component (806); an apparatus for determining a measurement for a perceived level of reverberation in the reverberated signal comprising the direct signal component and the reverberated signal component according to any one of claims 1 to 9; a controller (803) to receive the perceived level (R) generated by the apparatus (802) for determining a measurement of a perceived level of reverberation, and to generate a control signal (804) according to the perceived level and a target value; a manipulator (805) for manipulating the dry signal component (800) or the reverb signal component (806) according to the control value (804); and a combiner (807) to combine the manipulated dry signal component and the manipulated reverb signal component, or to combine the dry signal component and the manipulated reverb signal component, or to combine the manipulated dry signal component and the reverb signal component to obtain the mixing signal (808).
[0012]
Apparatus according to claim 11, characterized in that the manipulator (805) comprises a weighter for weighting the reverberation signal component by a gain value, the gain value being determined by the control signal, or in which the reverberator (801) comprises a variable filter, the filter being variable in response to the control signal (804).
[0013]
Apparatus according to claim 12, characterized in that the reverberator (801) has a fixed filter, in which the manipulator (805) has the weighter to generate the manipulated reverberation signal component, and in which the adder (807) is configured to add the direct signal component and the manipulated reverb signal component to obtain the mixed signal (808).
[0014]
14. Method for processing an audio signal to generate a reverberated signal (808) from a direct signal component (800), characterized by comprising: reverberating (801) the direct signal component (800) to obtain a reverberated signal component (806); a method for determining a measurement for a perceived level of reverberation in the reverberated signal comprising the direct signal component and the reverberated signal component, according to claim 10; receiving the perceived level (R) generated by the method (802) for determining a measurement of a perceived level of reverberation; generating (803) a control signal (804) according to the perceived level and a target value; manipulating (805) the dry signal component (800) or the reverb signal component (806) according to the control value (804); and combining (807) the manipulated dry signal component and the manipulated reverb signal component, or combining the dry signal component and the manipulated reverb signal component, or combining the manipulated dry signal component and the reverb signal component to obtain the mix signal (808).
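
The measurement method of claim 10 and the control loop of claim 14 can be illustrated with a toy sketch in Python. It is not the patented implementation. The perceptual filter stage is reduced to a single broadband one-pole smoothing of the instantaneous power (a crude stand-in for the temporal integrator), and the band splitting of claim 8 is omitted. The compression exponent, the plain difference used as the combiner, the comb-filter reverberator and the multiplicative gain update are all assumptions made for illustration, and every function name is hypothetical.

```python
import math


def reverberate(dry, delay=25, feedback=0.6, wet=0.5):
    """Hypothetical reverberator: a single feedback comb filter (assumption)."""
    out = [0.0] * len(dry)
    for n in range(delay, len(dry)):
        out[n] = wet * dry[n - delay] + feedback * out[n - delay]
    return out


def excitation(signal, alpha=0.1):
    """Stand-in for the perceptual filter stage: one-pole smoothing of the
    instantaneous power. A real stage would model the ear per band."""
    state, out = 0.0, []
    for x in signal:
        state += alpha * (x * x - state)
        out.append(state)
    return out


def loudness(signal, exponent=0.3):
    """Mean of the compressed excitation; the power law is an assumption."""
    exc = excitation(signal)
    return sum(e ** exponent for e in exc) / len(exc)


def perceived_reverb_level(dry, reverb):
    """Combine two loudness measurements into one level estimate."""
    first = loudness(dry)       # first loudness measurement (direct signal)
    second = loudness(reverb)   # second loudness measurement (reverb signal)
    return second - first       # combiner: plain difference (assumption)


def control_reverb_gain(dry, reverb, target, gain=1.0, step=1.0, iterations=200):
    """Adjust a reverb gain (the 'weighter') until the measured level
    approaches the target, then add dry and weighted reverb."""
    for _ in range(iterations):
        weighted = [gain * r for r in reverb]
        level = perceived_reverb_level(dry, weighted)
        # Controller: multiplicative update driven by the level error.
        gain *= math.exp(step * (target - level))
    mix = [d + gain * r for d, r in zip(dry, reverb)]
    return gain, mix
```

In this sketch the controller only scales the reverb signal component, which corresponds to the fixed-filter variant of claim 13; a variable reverberation filter, as allowed by claim 12, would need a different update rule.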

Similar technologies:

Publication number | Publication date | Patent title

BR112013021855B1|2021-03-09|apparatus and method for determining a measurement for a perceived level of reverb, audio processor and method for processing a signal

Postma et al.2016|Perceptive and objective evaluation of calibrated room acoustic simulation auralizations

BR112012022571B1|2020-11-17|METHOD FOR FILTERING A MULTICHANNEL AUDIO SIGNAL, SYSTEM TO IMPROVE THE SPEECH DETERMINED BY A MULTICHANNEL AUDIO INPUT SIGNAL AND COMPUTER-READABLE MEDIUM

Pätynen et al.2014|Concert halls with strong lateral reflections enhance musical dynamics

US10242692B2|2019-03-26|Audio coherence enhancement by controlling time variant weighting factors for decorrelated signals

BRPI0911456A2|2013-05-07|Method and apparatus for maintaining multi-channel audio speech audibility with minimal impact on immersive experience

Lee et al.2012|The effect of loudness on the reverberance of music: Reverberance prediction using loudness models

van Dorp Schuitman2011|Auditory modelling for assessing room acoustics

Eneman et al.2008|Evaluation of signal enhancement algorithms for hearing instruments

Vicente et al.2020|Further validation of a binaural model predicting speech intelligibility against envelope-modulated noises

Kates2017|Modeling the effects of single-microphone noise-suppression

Uhle et al.2011|Predicting the perceived level of late reverberation using computational models of loudness

Guski2015|Influences of external error sources on measurements of room acoustic parameters

Lee et al.2017|Comparison of psychoacoustic-based reverberance parameters

Lee et al.2018|Development of a clarity parameter using a time-varying loudness model

Poblete et al.2016|The Use of Locally Normalized Cepstral Coefficients to Improve Speaker Recognition Accuracy in Highly Reverberant Rooms.

Impulse et al.2019|Implementation Of A Hybrid Reverb Algorithm

Aichinger et al.2009|Investigation of psychoacoustic principles for automatic mixdown algorithms

BR112021005050A2|2021-06-08|device and method for adapting virtual 3d audio to a real room

Natalie2018|Real-Time Binaural Auralization

van Dorp Schuitman et al.2013|Obtaining objective, content-specific room acoustical parameters using auditory modeling

Extra et al.2006|Artificial Reverberation: Comparing algorithms by using monaural analysis tools

Goldberg0|FINDING THE AUDIBILITY OF THE TEMPORAL DECAY RATE OF A LOW FREQUENCY ROOM MODE

Patent family:

Publication number | Publication date

EP2541542A1|2013-01-02|

RU2550528C2|2015-05-10|

BR112013021855A2|2018-09-11|

CN103430574B|2016-05-25|

AR085408A1|2013-10-02|

KR101500254B1|2015-03-06|

EP2681932A1|2014-01-08|

CA2827326C|2016-05-17|

ES2892773T3|2022-02-04|

WO2012116934A1|2012-09-07|

KR20130133016A|2013-12-05|

MX2013009657A|2013-10-28|

TW201251480A|2012-12-16|

RU2013144058A|2015-04-10|

CA2827326A1|2012-09-07|

AU2012222491B2|2015-01-22|

TWI544812B|2016-08-01|

JP2014510474A|2014-04-24|

CN103430574A|2013-12-04|

AU2012222491A1|2013-09-26|

EP2681932B1|2021-07-28|

US9672806B2|2017-06-06|

JP5666023B2|2015-02-04|

US20140072126A1|2014-03-13|

Cited references:

Publication number | Filing date | Publication date | Applicant | Patent title

US7644003B2|2001-05-04|2010-01-05|Agere Systems Inc.|Cue-based audio coding/decoding|

US7949141B2|2003-11-12|2011-05-24|Dolby Laboratories Licensing Corporation|Processing audio signals with head related transfer function filters and a reverberator|

US7583805B2|2004-02-12|2009-09-01|Agere Systems Inc.|Late reverberation-based synthesis of auditory scenes|

WO2006022248A1|2004-08-25|2006-03-02|Pioneer Corporation|Sound processing apparatus, sound processing method, sound processing program, and recording medium on which sound processing program has been recorded|

KR100619082B1|2005-07-20|2006-09-05|삼성전자주식회사|Method and apparatus for reproducing wide mono sound|

EP1761110A1|2005-09-02|2007-03-07|Ecole Polytechnique Fédérale de Lausanne|Method to generate multi-channel audio signals from stereo signals|

JP4175376B2|2006-03-30|2008-11-05|ヤマハ株式会社|Audio signal processing apparatus, audio signal processing method, and audio signal processing program|

JP4668118B2|2006-04-28|2011-04-13|ヤマハ株式会社|Sound field control device|

US8036767B2|2006-09-20|2011-10-11|Harman International Industries, Incorporated|System for extracting and changing the reverberant content of an audio input signal|

CN101816191B|2007-09-26|2014-09-17|弗劳恩霍夫应用研究促进协会|Apparatus and method for extracting an ambient signal|

EP2154911A1|2008-08-13|2010-02-17|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|An apparatus for determining a spatial output multi-channel audio signal|

BRPI0923174B1|2008-12-19|2020-10-06|Dolby International Ab|METHOD AND REVERBERATOR TO APPLY REVERBERATION TO AN AUDIO INPUT SIGNAL WITH DOWNMIXING OF CHANNELS|

US9055374B2|2009-06-24|2015-06-09|Arizona Board Of Regents For And On Behalf Of Arizona State University|Method and system for determining an auditory pattern of an audio segment|

CN104982042B|2013-04-19|2018-06-08|韩国电子通信研究院|Multi channel audio signal processing unit and method|

EP2840811A1|2013-07-22|2015-02-25|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Method for processing an audio signal; signal processing unit, binaural renderer, audio encoder and audio decoder|

US9319819B2|2013-07-25|2016-04-19|Etri|Binaural rendering method and apparatus for decoding multi channel audio|

US10469969B2|2013-09-17|2019-11-05|Wilus Institute Of Standards And Technology Inc.|Method and apparatus for processing multimedia signals|

EP3062535B1|2013-10-22|2019-07-03|Industry-Academic Cooperation Foundation, Yonsei University|Method and apparatus for processing audio signal|

KR20210094125A|2013-12-23|2021-07-28|주식회사 윌러스표준기술연구소|Method for generating filter for audio signal, and parameterization device for same|

CN107770718B|2014-01-03|2020-01-17|杜比实验室特许公司|Generating binaural audio by using at least one feedback delay network in response to multi-channel audio|

EP3122073A4|2014-03-19|2017-10-18|Wilus Institute of Standards and Technology Inc.|Audio signal processing method and apparatus|

KR101856540B1|2014-04-02|2018-05-11|주식회사 윌러스표준기술연구소|Audio signal processing method and device|

US9407738B2|2014-04-14|2016-08-02|Bose Corporation|Providing isolation from distractions|

EP2980789A1|2014-07-30|2016-02-03|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Apparatus and method for enhancing an audio signal, sound enhancing system|

RU2685999C1|2015-06-17|2019-04-23|Фраунхофер-Гезелльшафт Цур Фердерунг Дер Ангевандтен Форшунг Е.Ф.|Volume control for user interactivity in the audio coding systems|

US9590580B1|2015-09-13|2017-03-07|Guoguang Electric Company Limited|Loudness-based audio-signal compensation|

EP3389183A1|2017-04-13|2018-10-17|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Apparatus for processing an input audio signal and corresponding method|

GB2561595A|2017-04-20|2018-10-24|Nokia Technologies Oy|Ambience generation for spatial audio mixing featuring use of original and extended signal|

US9820073B1|2017-05-10|2017-11-14|Tls Corp.|Extracting a common signal from multiple audio signals|

JP2021129145A|2020-02-10|2021-09-02|ヤマハ株式会社|Volume control device and volume control method|

Legal status:
2018-09-18| B15I| Others concerning applications: loss of priority|Free format text: Loss of the priority US 61/448,444 of 02/03/2011 claimed in PCT/US2012/053193, for failure to submit documentary proof of its assignment, pursuant to the provisions of Law 9,279 of 14/05/1996 (LPI), Art. 16, § 6, item 27 of Normative Act 128/1997, Art. 28 of Resolution INPI-PR 77/2013 and Art. 3 of IN 179 of 21/02/2017, since the applicant named in the PCT application request is different from the one who filed the claimed priority. |

2018-10-23| B12F| Other appeals [chapter 12.6 patent gazette]|

2019-08-06| B06F| Objections, documents and/or translations needed after an examination request according [chapter 6.6 patent gazette]|

2019-10-29| B06U| Preliminary requirement: requests with searches performed by other patent offices: procedure suspended [chapter 6.21 patent gazette]|

2020-12-29| B09A| Decision: intention to grant [chapter 9.1 patent gazette]|

2021-03-09| B16A| Patent or certificate of addition of invention granted [chapter 16.1 patent gazette]|Free format text: Term of validity: 20 (twenty) years counted from 24/02/2012, subject to the legal conditions. |

Priority:

Application number | Filing date | Patent title

US201161448444P|2011-03-02|2011-03-02|

US61/448,444|2011-03-02|

EP11171488.7|2011-06-27|

EP11171488A|EP2541542A1|2011-06-27|2011-06-27|Apparatus and method for determining a measure for a perceived level of reverberation, audio processor and method for processing a signal|

PCT/EP2012/053193|WO2012116934A1|2011-03-02|2012-02-24|Apparatus and method for determining a measure for a perceived level of reverberation, audio processor and method for processing a signal|
