Levesque Nygaard posted an update 3 weeks, 4 days ago
This study examines the use of Gaussian process (GP) regression for sound field reconstruction. GPs enable the reconstruction of a sound field from a limited set of observations by using a covariance function (a kernel) that models the spatial correlation between points in the sound field. Significantly, the approach makes it possible to quantify the uncertainty of the reconstruction in closed form. In this study, the relation between reconstruction based on GPs and classical reconstruction methods based on linear regression is examined from an acoustical perspective. Several kernels are analyzed for their potential in sound field reconstruction, and a hierarchical Bayesian parameterization is introduced, which enables the construction of a plane wave kernel of variable sparsity. The performance of the kernels is numerically studied and compared to classical reconstruction methods based on linear regression. The results demonstrate the benefits of using GPs in sound field analysis. The hierarchical parameterization shows the overall best performance, adequately reconstructing fundamentally different sound fields. The approach appears to be particularly powerful when prior knowledge of the sound field is not available.

Frequency-differencing, or autoproduct processing, techniques have been found to increase the robustness of acoustic array signal processing algorithms to environmental uncertainty. Previous studies have shown that frequency-differencing techniques are able to mitigate problems associated with environmental mismatch in source localization techniques. While this method has demonstrated increased robustness compared to conventional methods, many of the metrics, such as ambiguity surface peak values and dynamic range, are lower than would typically be expected for the observed level of robustness.
These previous studies have suggested that such metrics are reduced by the inherent nonlinearity of the frequency-differencing method. In this study, simulations of simple multi-path environments are used to analyze this nonlinearity, and signal processing techniques are proposed to mitigate its effects. These methods are used to improve source localization metrics, particularly ambiguity surface peak value and dynamic range, in two experimental environments: a small laboratory water tank and a deep ocean (Philippine Sea) environment. The performance of these techniques demonstrates that many source localization metrics can be improved for frequency-differencing methods, which suggests that frequency-differencing methods may be as robust as previous studies have shown.

A suite of methodologies is presented to compute shear wave dispersion in incompressible waveguides encountered in biomedical imaging; plate, tube, and general prismatic waveguides, all immersed in an incompressible fluid, are considered in this effort. The developed approaches are based on semi-analytical finite element methods in the frequency domain, with a specific focus on the complexities associated with the incompressibility of the solid media as well as the simplification facilitated by the incompressibility of the surrounding fluid. The proposed techniques use the traditional idea of selective reduced integration for the solid medium and the more recent idea of perfectly matched discrete layers for the surrounding fluid. Also used is the recently developed complex-length finite element method for platelike structures. Several numerical examples are presented to illustrate the practicality and effectiveness of the developed techniques in computing shear wave dispersion in a variety of waveguides.

Reverberation is essential for the realistic auralisation of enclosed spaces.
However, it can be computationally expensive to render with high fidelity and, in practice, simplified models are typically used to lower costs while preserving perceived quality. Ambisonics-based methods may be employed for this purpose, as they allow a reverberant sound field to be rendered more efficiently by limiting its spatial resolution. The present study explores the perceptual impact of two simplifications of Ambisonics-based binaural reverberation that aim to improve efficiency. First, a “hybrid Ambisonics” approach is proposed in which the direct sound path is generated by convolution with a spatially dense head-related impulse response set, separately from reverberation. Second, the reverberant virtual loudspeaker method (RVL) is presented as a computationally efficient approach to dynamically render binaural reverberation for multiple sources, with the potential limitation of inaccurately simulating the listener’s head rotations. Numerical and perceptual evaluations suggest that the perceived quality of hybrid Ambisonics auralisations of two measured rooms ceased to improve beyond the third order, which is a lower threshold than that found by previous studies in which the direct sound path was not processed separately. Additionally, RVL is shown to produce auralisations with perceived quality comparable to Ambisonics renderings.

Speech plays an important role in human-computer emotional interaction. FaceNet, used in face recognition, has achieved great success owing to its excellent feature extraction. In this study, we adopt the FaceNet model and improve it for speech emotion recognition. To apply this model to our work, speech signals are divided into segments at a given time interval, and the signal segments are transformed into a discrete waveform diagram and a spectrogram. Subsequently, the waveform and spectrogram are separately fed into FaceNet for end-to-end training. Our empirical study shows that pretraining on the spectrogram is effective for FaceNet.
Hence, we pretrain the network on the CASIA dataset and then fine-tune it on the IEMOCAP dataset with waveforms. This derives the maximum transfer learning knowledge from the CASIA dataset, on which accuracy is high, possibly because its signals are clean. Our preliminary experimental results show accuracies of 68.96% and 90% on the emotion benchmark datasets IEMOCAP and CASIA, respectively.
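
The preprocessing step described in the speech emotion abstract (cutting a speech signal into fixed-interval segments and turning each segment into a spectrogram) might look roughly like the sketch below. It is only an illustration under stated assumptions: the function names, the 1-second segment length, the 256-point FFT, the 128-sample hop, and the synthetic test tone are all hypothetical choices that do not come from the abstract, and the FaceNet training itself is not shown.

```python
import numpy as np


def split_into_segments(signal, sample_rate, segment_seconds):
    """Cut a 1-D signal into equal-length, non-overlapping segments.

    Trailing samples that do not fill a whole segment are dropped.
    """
    segment_len = int(round(sample_rate * segment_seconds))
    n_segments = len(signal) // segment_len
    return signal[: n_segments * segment_len].reshape(n_segments, segment_len)


def magnitude_spectrogram(segment, n_fft=256, hop=128):
    """Short-time Fourier magnitude, shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(segment) - n_fft) // hop
    # Slice overlapping frames and apply the window to each one.
    frames = np.stack(
        [segment[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # One-sided FFT per frame; transpose so rows are frequency bins.
    return np.abs(np.fft.rfft(frames, axis=1)).T


# Synthetic stand-in for a speech recording (assumption): 3 s of a 440 Hz tone.
sample_rate = 16000
t = np.arange(3 * sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 440.0 * t)

# Hypothetical 1-second interval; the abstract does not state the value used.
segments = split_into_segments(signal, sample_rate, segment_seconds=1.0)
spectrograms = np.stack([magnitude_spectrogram(s) for s in segments])
```

Each row of `segments` is the raw waveform for one interval and each entry of `spectrograms` is the matching time-frequency image, so the two arrays correspond to the two input types the abstract says are fed to the network separately.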