Speaker based segmentation on broadcast news- on the use of ISI technique
DOI:
https://doi.org/10.36505/ExLing-2006/01/0042/000042
Abstract
In this paper we propose a new segmentation technique called ISI, or “Interlaced Speech Indexing”, developed and implemented for the task of broadcast news indexing. The task consists of identifying a well-defined speaker and the moments of his or her interventions within an audio document, so that the speaker's speech can be accessed rapidly, directly and easily. Our segmentation procedure is based on an interlaced equidistant segmentation (IES) associated with our new ISI algorithm. This approach uses a speaker identification method based on Second Order Statistical Measures (SOSM). As the SOSM, we choose the “µGc” measure, which is based on the covariance matrix. However, experiments showed that this method needs a speech length of at least 2 seconds, which means that the segmentation resolution would be 2 seconds. By combining the SOSM with the new indexing technique (ISI), we demonstrate that the average segmentation error is reduced to only 0.5 second, which is more accurate and better suited to real-time applications. Results indicate that this association provides a high resolution and a high tracking performance: the indexing score (percentage of correctly labelled segments) is 95% on the TIMIT database and 92.4% on the Hub4 Broadcast News 96 database.
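The idea of combining 2-second analysis windows with a finer 0.5-second resolution can be illustrated with a minimal Python sketch. It is not the author's ISI algorithm, whose details the abstract does not give. Everything here is an assumption made for illustration: the function names (`mu_gc`, `interlaced_windows`, `label_slots`), the 0.5-second shift between windows, the majority vote over overlapping windows, and the form of the covariance-only measure, taken here as (1/p)[tr(X⁻¹Y) − log det(X⁻¹Y)] − 1, the form commonly given in the SOSM literature.

```python
from collections import Counter

import numpy as np


def mu_gc(cov_x, cov_y):
    """Covariance-only second-order measure between two segments (assumed form).

    Returns 0 when both covariance matrices are equal and grows as they differ.
    """
    p = cov_x.shape[0]
    # X^-1 Y has the same trace and determinant as Y X^-1.
    gamma = np.linalg.solve(cov_x, cov_y)
    _, logdet = np.linalg.slogdet(gamma)
    return (np.trace(gamma) - logdet) / p - 1.0


def interlaced_windows(total, win=2.0, step=0.5):
    """(start, end) times of windows of length `win`, each shifted by `step`."""
    n = int(np.floor((total - win) / step + 1e-9)) + 1
    return [(i * step, i * step + win) for i in range(max(n, 0))]


def label_slots(windows, window_labels, total, step=0.5):
    """Give each `step`-long slot the majority label of the windows covering it.

    This voting rule is a hypothetical stand-in for the ISI decision logic.
    """
    slots = []
    for k in range(int(round(total / step))):
        mid = (k + 0.5) * step
        votes = [lab for (s, e), lab in zip(windows, window_labels) if s <= mid < e]
        slots.append(Counter(votes).most_common(1)[0][0] if votes else None)
    return slots


# Usage: a 4-second document, with one speaker label per 2-second window.
# In practice each label would come from comparing the window's covariance
# matrix with each reference speaker's matrix via mu_gc and keeping the smallest.
windows = interlaced_windows(4.0)
labels = ["A", "A", "A", "B", "B"]
slots = label_slots(windows, labels, 4.0)
```

Each window is still 2 seconds long, as the identification measure requires, but because consecutive windows overlap, a label can be assigned to every 0.5-second slot.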
License
Copyright (c) 2006 S. Ouamour (Author)

Articles are published under the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.