SpeechQoE: A Novel Personalized QoE Assessment Model for Voice Services via Speech Sensing
Abstract
Quality of Experience (QoE) assessment is a long-standing but still unresolved problem. Existing approaches, especially those for conversational voice services, are restricted to network-centric parameters. Their performance is far from satisfactory because they fail to account for the full range of QoE-related factors. Moreover, they build a one-size-fits-all model that is uniform across individuals and therefore cannot handle user diversity in QoE perception. This paper proposes SpeechQoE, a personalized QoE assessment model that exploits a speaker's speech signals to infer that individual's perceived quality in voice services. SpeechQoE fundamentally addresses the drawbacks of conventional models. Instead of enumerating and incorporating an open-ended set of QoE-related factors, SpeechQoE takes speech signals as input, which inherently carry the rich information needed to assess the speaker's QoE. SpeechQoE employs an efficient few-shot learning framework to adapt the model to a new user quickly. We additionally design a lightweight data synthesis scheme to minimize the data collection overhead required for model adaptation. We further implement a modular integration with a conventional parametric model to avoid issues caused by a purely clean-slate, data-driven approach. Our experiments show that SpeechQoE achieves an accuracy of 91.4% in QoE assessment, outperforming state-of-the-art solutions by a clear margin. As another contribution of this work, we build a dataset that, to our knowledge, is the first source of annotated audio tracks for QoE assessment of conversational calls.