Comparative Analysis of the Speech Recognition Accuracy of Speech Recognition Open APIs

최승주; 김종배

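Korean abstract

Speech recognition technology converts acoustic signals captured by a sound sensor such as a microphone into words or sentences. Spoken dialogue systems that combine this technology with artificial intelligence are drawing attention as a next-generation interface and are used in a wide range of fields, including smartphones, smart TVs, and cars. Recently, Samsung Electronics released ‘Bixby’, which combines artificial intelligence with speech recognition, and companies such as Google and Naver offer their speech recognition technology as open APIs. This paper selects three representative speech recognition open APIs and compares their features. It also compares the recognition rate of each API in a mobile environment through three experiments. The first experiment tests number recognition, and the second tests Hangul (ga-na-da) recognition. The third tests sentence recognition using representative command sentences from mobile speech recognition programs. These comparative experiments are expected to provide selection criteria for speech recognition open APIs that support Korean and to help in choosing an appropriate API for each situation.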

English abstract

Speech recognition technology converts acoustic signals captured by a sound sensor such as a microphone into words or sentences. Speech conversation systems that combine this technology with artificial intelligence are receiving attention as a next-generation interface and are used in a variety of areas, such as smartphones, smart TVs, and cars. Recently, Samsung released ‘Bixby’, a speech conversation program with artificial intelligence, and many companies, such as Google and Naver, provide speech recognition open APIs. In this paper, we select three representative APIs and compare their features. In addition, we run three experiments in a mobile environment to analyze the APIs’ speech recognition accuracy. The first tests number recognition, the second tests Korean word recognition, and the third tests sentence recognition with mobile command sentences. With these results, we expect developers to be able to select an appropriate speech recognition open API for each situation.

Keywords

Speech recognition; spoken dialogue system; speech understanding; open source; open API
