Speech carries information about the identity of the speaker. A speech signal also conveys the language being spoken, the presence and type of any speech pathologies, and the physical and emotional state of the speaker. Humans can often recognize who is speaking when the voice belongs to someone they are acquainted with.
A speech signal is a very complex function of the speaker and the speaker's environment, and it can be captured easily with a standard microphone. In contrast to a physical biometric technology such as fingerprinting, speaker recognition does not rely on fixed, static, physical characteristics; the only information available depends on an act, namely speaking. The state-of-the-art approach to Speaker Identification (SID) is to build a stochastic model of a speaker, based on speaker characteristics extracted from the available training speech.
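To make the idea of a stochastic speaker model concrete, here is a minimal sketch in Python. It is an illustration under simplifying assumptions, not the method of any particular product: each speaker is modeled by a single diagonal-covariance Gaussian, whereas practical systems typically use richer models such as Gaussian mixtures fitted to spectral features. The function names, the speaker names, and the two-dimensional toy feature vectors are all hypothetical.

```python
import math
from statistics import fmean, pvariance


def train_speaker_model(frames):
    """Fit a diagonal Gaussian (per-dimension mean and variance) to feature frames."""
    dims = list(zip(*frames))  # transpose: one tuple per feature dimension
    means = [fmean(d) for d in dims]
    # A small variance floor keeps the likelihood finite for near-constant dimensions.
    variances = [max(pvariance(d, mu), 1e-6) for d, mu in zip(dims, means)]
    return means, variances


def avg_log_likelihood(frames, model):
    """Average per-frame log-likelihood of the frames under one speaker model."""
    means, variances = model
    total = 0.0
    for frame in frames:
        for x, mu, var in zip(frame, means, variances):
            total += -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
    return total / len(frames)


def identify(frames, models):
    """Return the enrolled speaker whose model best explains the frames."""
    return max(models, key=lambda name: avg_log_likelihood(frames, models[name]))


# Hypothetical training data: a few toy feature vectors per enrolled speaker.
training = {
    "alice": [(1.0, 2.0), (1.2, 1.8), (0.9, 2.1), (1.1, 2.2)],
    "bob": [(4.0, -1.0), (4.2, -0.8), (3.9, -1.1), (4.1, -1.2)],
}
models = {name: train_speaker_model(frames) for name, frames in training.items()}

test_utterance = [(1.05, 1.95), (0.95, 2.05)]
print(identify(test_utterance, models))
```

Identification here is closed-set: the utterance is always assigned to one of the enrolled speakers. Rejecting unknown speakers would need an additional threshold on the score, which this sketch leaves out.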
Identify speakers instantly and with high accuracy from the unique characteristics of their voice, and improve the authentication experience by reducing the need for security passwords.
Operate independently of language, accent, text, and channel.
Identify and segment speaker changes either through separate audio channels or, on a single audio channel, through advanced speaker diarization (the separation of an audio stream into homogeneous segments for each speaker).
Index timestamps in parallel with words spoken for fast metadata retrieval of an individual keyword or group of phrases inside audio files.
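The timestamp indexing described in the last item can be pictured as an inverted index over a time-aligned transcript. The sketch below is one possible way to do this, assuming the transcript is available as (word, start, end) tuples with times in seconds; the function names and the sample transcript are hypothetical and do not describe any specific product's API.

```python
from collections import defaultdict


def build_index(transcript):
    """Map each lowercased word to the transcript positions where it occurs."""
    index = defaultdict(list)
    for pos, (word, _start, _end) in enumerate(transcript):
        index[word.lower()].append(pos)
    return index


def find_phrase(phrase, transcript, index):
    """Return (start, end) times in seconds for every occurrence of the phrase."""
    words = phrase.lower().split()
    if not words:
        return []
    hits = []
    # Only positions of the first word are candidates; verify the rest in place.
    for pos in index.get(words[0], []):
        window = transcript[pos:pos + len(words)]
        if [w.lower() for w, _, _ in window] == words:
            hits.append((window[0][1], window[-1][2]))
    return hits


# Hypothetical time-aligned transcript: (word, start_seconds, end_seconds).
transcript = [
    ("please", 0.0, 0.4), ("reset", 0.4, 0.8), ("my", 0.8, 0.9),
    ("password", 0.9, 1.5), ("and", 1.5, 1.7), ("reset", 1.7, 2.1),
    ("the", 2.1, 2.2), ("router", 2.2, 2.7),
]
index = build_index(transcript)

print(find_phrase("reset", transcript, index))
print(find_phrase("reset my password", transcript, index))
```

Looking up the first word in the index and then checking the following words avoids scanning the whole transcript for each query, which is the reason retrieval stays fast on long recordings.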