CLOVA Speech Long-Form Recognition API


Version

Version	Date	Changes
v1.0.0	2020-09-17	Initial release
v1.1.0	2020-11-18	Added boostings and forbiddens
v1.2.0	2021-04-08	Added speaker recognition (diarization)
v1.3.0	2021-05-27	Added English recognition
v1.4.0	2021-07-22	Added simultaneous Korean/English recognition
v1.5.0	2021-11-25	Added asynchronous mode
v1.6.0	2022-02-17	Added Japanese recognition
v1.7.0	2022-06-08	Added domain boosting support
v1.8.0	2022-10-20	Added Traditional and Simplified Chinese recognition
v1.9.0	2022-12-15	Added noiseFiltering support
v2.0.0	2024-03-21	Added subtitle extraction

Request

Method	Request URI
POST	Call the InvokeURL of the API Gateway created for your CLOVA Speech domain. Each domain gets its own unique invoke URL.

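Using the CLOVA Speech API

You can send a recognition request to the CLOVA Speech API in one of three ways.

Recognition request with an Object Storage file URL

: Uses the unique URL of a file stored in Object Storage. (The file to be recognized must already be uploaded to Object Storage.)

Recognition request with an external URL

: Uses the unique URL of a file that is accessible from outside.

Recognition request by uploading a local file

: Uses a path on the local file system.

After a recognition request, the result can be delivered in one of two modes.

sync

With a sync request, you receive the result (JSON) in the response once recognition is complete.

async

With an async request, the recognition result is returned to the callback URL given in the request, to Object Storage (resultToObs), or to both.

Callback URL	resultToObs (Object Storage)	Result
URL provided (O)	True	Result returned to both the callback URL and Object Storage
URL provided (O)	False	Result returned to the callback URL only
No URL (X)	True	Result returned to Object Storage only
No URL (X)	False	Error returned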
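
The delivery rules in the table above can be written as a small lookup. This is a minimal sketch for client-side validation only; the function name `resolve_async_delivery` and the returned labels are hypothetical and not part of the API.

```python
from typing import List, Optional


def resolve_async_delivery(callback: Optional[str], result_to_obs: bool) -> List[str]:
    """Return where an async result would be delivered, following the table above.

    Raises ValueError for the combination the table marks as an error.
    """
    targets = []
    if callback:  # a non-empty callback URL was supplied
        targets.append("callback")
    if result_to_obs:
        targets.append("object_storage")
    if not targets:
        # No callback URL and resultToObs=false: the table says an error is returned.
        raise ValueError("async requests need a callback URL or resultToObs=true")
    return targets
```

Checking this before sending an async request avoids a round trip that the table says would end in an error.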

1. Recognition request with an Object Storage file URL

: Uses the unique URL of a file stored in Object Storage.
(The file to be recognized must already be uploaded to Object Storage.)

POST /recognizer/object-storage

recognize media from object storage

Method	Request URI
POST	${Invoke URL}/recognizer/object-storage

Request headers

Header	Description
Content-Type	`application/json`

Request body

name	desc	type	requirement	value	default
dataKey	Key for accessing the Object Storage path of the file to be recognized	string	required
language	Recognition language	string	required	ko-KR, en-US, enko, ja, zh-cn, zh-tw	ko-KR
completion	Response mode: sync or async	string	optional		async
callback	See the Callback section	string	optional
userdata	json object	object	optional
wordAlignment	Include word alignment in the recognition result	boolean	optional		true
fullText	Include the full recognized text in the result	boolean	optional		true
resultToObs	Save the result to the storage (Object Storage) selected when the domain was created	boolean	optional		false
noiseFiltering	Whether to apply noise filtering	boolean	optional		true
boostings	boosting object array	array	optional
boostings.words	comma separated words	string	optional
useDomainBoostings	use domain boostings	boolean	optional		false
forbiddens	comma separated words	string	optional
diarization	Settings for speaker recognition (diarization)	object	optional
diarization.enable	Whether to enable speaker recognition	boolean	optional		true
format	response format	string	optional	JSON, SRT, SMI	JSON
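
As an illustration of the table above, the following sketch assembles a request body for `/recognizer/object-storage` and rejects values the table does not list. The helper name `build_object_storage_body` is hypothetical; only the field names, allowed values, and defaults are taken from the table.

```python
import json
from typing import Any, Dict, Optional

# Values copied from the request body table above.
SUPPORTED_LANGUAGES = {"ko-KR", "en-US", "enko", "ja", "zh-cn", "zh-tw"}
SUPPORTED_COMPLETIONS = {"sync", "async"}


def build_object_storage_body(data_key: str,
                              language: str = "ko-KR",
                              completion: str = "async",
                              callback: Optional[str] = None,
                              result_to_obs: bool = False) -> Dict[str, Any]:
    """Build the JSON body for an Object Storage recognition request."""
    if not data_key:
        raise ValueError("dataKey is required")
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language}")
    if completion not in SUPPORTED_COMPLETIONS:
        raise ValueError(f"unsupported completion: {completion}")
    body: Dict[str, Any] = {
        "dataKey": data_key,
        "language": language,
        "completion": completion,
        "resultToObs": result_to_obs,
    }
    if callback:  # optional fields are only sent when set
        body["callback"] = callback
    return body


if __name__ == "__main__":
    print(json.dumps(build_object_storage_body("data/sample.wav"), indent=2))
```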

Example (cURL shell)

curl --location --request POST '${Invoke URL}/recognizer/object-storage' \
--header 'X-CLOVASPEECH-API-KEY: ${Secret Key}' \
--header 'Content-Type: application/json' \
--data-raw '{
  "language": "ko-KR",
  "callback": "http://example/callback",
  "userdata": {
    "dataId": "1"
  },
  "boostings": [
  	{
  		"words": "comma separated words"
  	}
  ],
  "forbiddens": "comma separated words",
  "completion":"async",
  "dataKey": "data/sample.wav"
}'

Response: refer to Common Response

2. Recognition request with an external URL

Uses the unique URL of a file that is accessible from outside.

POST /recognizer/url

recognize media from URL

Method	Request URI
POST	${Invoke URL}/recognizer/url

Request headers

Header	Description
Content-Type	`application/json`

Request body

name	desc	type	requirement	value	default
url	the media URL	string	required
language	Recognition language	string	required	ko-KR, en-US, enko, ja, zh-cn, zh-tw	ko-KR
completion	Response mode: sync or async	string	optional		async
callback	See the Callback section	string	optional
userdata	json object	object	optional
wordAlignment	Include word alignment in the recognition result	boolean	optional		true
fullText	Include the full recognized text in the result	boolean	optional		true
resultToObs	Save the result to the storage (Object Storage) selected when the domain was created	boolean	optional		false
noiseFiltering	Whether to apply noise filtering	boolean	optional		true
boostings	boosting object array	array	optional
boostings.words	comma separated words	string	optional
useDomainBoostings	use domain boostings	boolean	optional		false
forbiddens	comma separated words	string	optional
diarization	Settings for speaker recognition (diarization)	object	optional
diarization.enable	Whether to enable speaker recognition	boolean	optional		true
format	response format	string	optional	JSON, SRT, SMI	JSON

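Keyword boosting
- You can include in the API request body a list of keywords whose recognition probability you want to raise.
- See the boostings and boostings.words fields of the request body (params.boostings and params.boostings.words for file upload).
- Up to 1,000 boosting keywords are supported.
- Only Korean and English characters can be boosted.
- One-syllable words such as 네, 응, and no carry a risk of misrecognition, so boosting is not supported for them.
- English recognition results are lowercased by default, but if you boost a keyword written in uppercase, the result is replaced with the uppercase form.
- Boosting is applied regardless of spacing.
  For example, you only need to boost one of "클로바스피치" and "클로바 스피치".
- There is no limit on keyword length, but when the boosting target is a phrase made up of several words, text other than that exact phrase is unlikely to be affected. For example, boosting the keyword "클로바 스피치" affects every sentence that contains "클로바 스피치". By contrast, boosting a long combined keyword such as "클로바 스피치의 미디어 음성인식 기술" is unlikely to affect sentences that contain only "클로바 스피치".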
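
The boosting rules above can be applied on the client before a request is sent. The sketch below is an illustration under stated assumptions: the helper name `build_boostings` is hypothetical, all keywords are packed into a single boosting entry, and the 1,000-keyword limit is treated as a per-request limit, which the text above does not specify. It does not check the one-syllable rule.

```python
from typing import Dict, Iterable, List

MAX_BOOSTING_KEYWORDS = 1000  # limit stated in the list above


def build_boostings(keywords: Iterable[str]) -> List[Dict[str, str]]:
    """Turn a keyword list into the `boostings` array of the request body."""
    seen = set()
    unique = []
    for keyword in keywords:
        cleaned = keyword.strip()
        # Spacing is ignored by boosting, so "클로바스피치" and "클로바 스피치"
        # count as one keyword. Case is kept because uppercase is significant.
        key = "".join(cleaned.split())
        if not key or key in seen:
            continue
        seen.add(key)
        unique.append(cleaned)
    if len(unique) > MAX_BOOSTING_KEYWORDS:
        raise ValueError(f"at most {MAX_BOOSTING_KEYWORDS} boosting keywords are supported")
    if not unique:
        return []
    # `words` is a comma separated string, as in the request body table.
    return [{"words": ", ".join(unique)}]
```

For example, `build_boostings(["클로바스피치", "클로바 스피치", "CLOVA"])` keeps only the first spelling of the duplicated keyword.
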
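Sensitive keyword detection
- You can include in the API request body a list of keywords that you do not want shown in the recognition result.
- See the forbiddens field of the request body (params.forbiddens for file upload).
- There is no limit on the number or length of sensitive keywords.
- Detection applies only on an exact match, including spacing and English letter case.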

Example (cURL shell)

curl --location --request POST '${Invoke URL}/recognizer/url' \
--header 'X-CLOVASPEECH-API-KEY: ${Secret Key}' \
--header 'Content-Type: application/json' \
--data-raw '{
  "language": "ko-KR",
  "callback": "http://example/callback",
  "userdata": {
    "dataId": "1"
  },
  "boostings": [
  {
    "words": "comma separated words"
  }],
  "forbiddens": "comma separated words",
  "completion":"async",
  "url": "https://kr.object.ncloudstorage.com/nest/data/IMG_3866.mp4"
}'

Response: refer to Common Response

3. Recognition request by uploading a local file

Uses a path on the local file system.

POST /recognizer/upload

Upload a media file for recognition

Method	Request URI
POST	${Invoke URL}/recognizer/upload

Request headers

Header	Description
Content-Type	`multipart/form-data`

Request body

name	desc	type	requirement	value	default
media	the media file	file	required
params		object	required
params.language	Recognition language	string	required	ko-KR, en-US, enko, ja, zh-cn, zh-tw	ko-KR
params.completion	sync, async	string	optional		async
params.callback	refer to Callback	string	optional
params.userdata	json object	object	optional
params.wordAlignment	Include word alignment in the recognition result	boolean	optional		true
params.fullText	Include the full recognized text in the result	boolean	optional		true
params.resultToObs	Save the result to the storage (Object Storage) selected when the domain was created	boolean	optional		false
params.noiseFiltering	Whether to apply noise filtering	boolean	optional		true
params.boostings	boosting object array	array	optional
params.boostings.words	comma separated words	string	optional
params.useDomainBoostings	use domain boostings	boolean	optional		false
params.forbiddens	comma separated words	string	optional
params.diarization	Settings for speaker recognition (diarization)	object	optional
params.diarization.enable	Whether to enable speaker recognition	boolean	optional		true
format	response format	string	optional	JSON, SRT, SMI	JSON

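Keyword boosting
- You can include in the API request body a list of keywords whose recognition probability you want to raise.
- See the params.boostings and params.boostings.words fields of the request body.
- Up to 1,000 boosting keywords are supported.
- Only Korean, English, Japanese, and Chinese characters and digits can be boosted.
- English recognition results are lowercased by default, but if you boost a keyword written in uppercase, the result is replaced with the uppercase form.
- Boosting is applied regardless of spacing.
  For example, you only need to boost one of "클로바스피치" and "클로바 스피치".
- There is no limit on keyword length, but when the boosting target is a phrase made up of several words, text other than that exact phrase is unlikely to be affected. For example, boosting the keyword "클로바 스피치" affects every sentence that contains "클로바 스피치". By contrast, boosting a long combined keyword such as "클로바 스피치의 미디어 음성인식 기술" is unlikely to affect sentences that contain only "클로바 스피치".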
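Sensitive keyword detection
- You can include in the API request body a list of keywords that you do not want shown in the recognition result.
- See the params.forbiddens field of the request body.
- There is no limit on the number or length of sensitive keywords.
- Detection applies only on an exact match, including spacing and English letter case.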

Example (cURL shell)

curl --location --request POST '${Invoke URL}/recognizer/upload' \
--header 'X-CLOVASPEECH-API-KEY: ${Secret Key}' \
--form 'media=@/video/sample.wav' \
--form 'params={"language":"ko-KR","completion":"sync","callback":"http://localhost:9010","forbiddens":"comma separated words","boostings":[{"words": "comma separated words"}]};type=application/json'

Response: refer to Common Response

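Response

After a recognition request, the result can be delivered in one of two modes.

sync

With a sync request, you receive the result (JSON) in the response once recognition is complete.

async

With an async request, the recognition result is returned to the callback URL given in the request, to Object Storage (resultToObs), or to both.

Callback URL	resultToObs (Object Storage)	Result
URL provided (O)	True	Result returned to both the callback URL and Object Storage
URL provided (O)	False	Result returned to the callback URL only
No URL (X)	True	Result returned to Object Storage only
No URL (X)	False	Error returned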

Callback

Request headers

Header	Description
Content-Type	application/application-json; charset=utf-8

Method
POST

Body
- Same as Common Response (sync)
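
A callback endpoint only needs to accept a POST request and decode a JSON body with the same shape as the sync response. The following is a minimal sketch using the Python standard library. The names `parse_callback_body`, `CallbackHandler`, and `serve` are hypothetical, port 9010 is borrowed from the upload cURL example, and replying with HTTP 200 is an assumption because the expected reply is not documented here.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Any, Dict


def parse_callback_body(raw: bytes) -> Dict[str, Any]:
    """Decode the UTF-8 JSON body posted to the callback URL and keep a few fields."""
    payload = json.loads(raw.decode("utf-8"))
    return {
        "token": payload.get("token"),
        "result": payload.get("result"),
        "text": payload.get("text", ""),
    }


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        summary = parse_callback_body(self.rfile.read(length))
        print(summary)
        # Assumption: a plain 200 reply is enough to acknowledge the callback.
        self.send_response(200)
        self.end_headers()


def serve(port: int = 9010) -> None:
    """Block and serve callbacks until interrupted."""
    HTTPServer(("", port), CallbackHandler).serve_forever()
```

Calling `serve()` starts the listener; the URL of that listener is what would be passed as `callback` in the request.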

4. Get job status

GET /recognizer/{token}

Get the status of async request

Method	Request URI
GET	${Invoke URL}/recognizer/{token}

Request headers

Header	Description
Content-Type	`application/json`

Path parameters

name	desc	type	requirement	value	default
token	Token returned by the async recognition request	string	required

Example (cURL shell)

curl --location --request GET '${Invoke URL}/recognizer/ceb77af3dae44a6c8c4de3dce519140a' \
--header 'X-CLOVASPEECH-API-KEY: ${Secret Key}'

Response

{
    "token": "ceb77af3dae44a6c8c4de3dce519140a",
    "result": "PROCESSING"
}

result:

WAITING
PROCESSING
FAILED
COMPLETED
TIMEOUT
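
A client that does not use a callback can poll this endpoint until the job reaches a final state. The sketch below shows one way to do that, assuming that `WAITING` and `PROCESSING` are the only non-final values of `result`. The function name `wait_for_job` and its parameters are hypothetical; `fetch_status` stands for any callable that performs the GET request shown above and returns the parsed JSON.

```python
import time
from typing import Callable, Dict

# Assumption: these statuses from the list above mean the job will not change again.
FINAL_STATES = {"COMPLETED", "FAILED", "TIMEOUT"}


def wait_for_job(fetch_status: Callable[[], Dict[str, str]],
                 interval_sec: float = 5.0,
                 max_attempts: int = 120,
                 sleep: Callable[[float], None] = time.sleep) -> Dict[str, str]:
    """Poll the job status until it is final or the attempts run out."""
    for _ in range(max_attempts):
        status = fetch_status()
        if status.get("result") in FINAL_STATES:
            return status
        sleep(interval_sec)  # wait before asking again
    raise TimeoutError("job did not reach a final state in time")
```

Passing the HTTP call in as `fetch_status` keeps the loop independent of any particular HTTP library and makes it easy to test with canned responses.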

Common Response

Response(async)

{
    "token": "a951af6a1015466bae2c926177f26310",
    "result": "SUCCEEDED",
    "message": "Succeeded"
}

Response(sync)

{
    "result": "COMPLETED",
    "message": "Succeeded",
    "token": "a951af6a1015466bae2c926177f26310",
    "version": "ncp_v2_b28559f_78416aa_20210311_",
    "params": {
        "service": "ncp",
        "domain": "general",
        "completion": "sync",
        "callback": "",
        "diarization": {
            "enable": true,
            "speakerCountMin": -1,
            "speakerCountMax": -1
        },
        "boostings": [
            {
                "words": "안녕하세요, 테스트"
            }
        ],
        "forbiddens": "",
        "wordAlignment": true,
        "fullText": true,
        "noiseFiltering": true,
        "resultToObs": false,
        "priority": 0,
        "userdata": {
            "_ncp_DomainCode": "NEST",
            "_ncp_DomainId": 1,
            "_ncp_TaskId": 7218,
            "_ncp_TraceId": "c316e0a367bf49f4b3d819538178ac11"
        }
    },
    "progress": 100,
    "segments": [
        {
            "start": 0,
            "end": 1110,
            "text": "크게 파스.",
            "confidence": 0.2,
            "diarization": {
                "label": "1"
            },
            "speaker": {
                "label": "1",
                "name": "A"
            },
            "words": [
                [
                    160,
                    440,
                    "크게"
                ],
                [
                    520,
                    1080,
                    "파스."
                ]
            ],
            "textEdited": "크게 파스."
        }
    ],
    "text": "크게 파스.",
    "confidence": 0.2,
    "speakers": [
        {
            "label": "1",
            "name": "A"
        }
    ]
}

Body

field	desc	type
`result`	Result code	string
`message`	Result message	string
`token`	Result token	string
`version`	Engine version	string
`params`	Parameters	object
`params: service`	Service code	string
`params: domain`	Domain	string
`params: lang`	Recognition language	string
`params: completion`	Completion mode (sync or async)	string
`params: diarization`	Speaker diarization settings	object
`params: diarization.enable`	Whether diarization is enabled	boolean
`params: diarization.speakerCountMin`	Minimum number of speakers	number
`params: diarization.speakerCountMax`	Maximum number of speakers	number
`params: boostings`	Boosting settings	array
`params: boostings: words`	Boosting keywords	string
`params: forbiddens`	Sensitive keywords	string
`params: fullText`	Whether the full recognized text is output	boolean
`params: noiseFiltering`	Whether noise filtering is applied	boolean
`params: resultToObs`	Whether the result is saved to Object Storage	boolean
`params: segment`	Segment	string
`params: morpheme`	Morpheme	string
`params: userdata`	User data	object
`segments`	Segment information	array
`segments: start`	Segment start time (ms)	number
`segments: end`	Segment end time (ms)	number
`segments: text`	Segment text	string
`segments: textEdited`	Edited text	string
`segments: diarization`	Recognized speaker	object
`segments: diarization.label`	Recognized speaker number	string
`segments: speaker`	Edited speaker	object
`segments: speaker.label`	Edited speaker number	string
`segments: speaker.name`	Edited speaker name	string
`segments: confidence`	Segment confidence (0.0 to 1.0)	number
`segments: words`	Words in the segment	array
`segments: words: [0]`	Word start time (ms)	number
`segments: words: [1]`	Word end time (ms)	number
`segments: words: [2]`	Word text	string
`text`	Full text	string
`confidence`	Overall confidence	number
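
To show how the fields above fit together, the following sketch reads a parsed sync response and extracts speaker-labelled lines and word timings. The helper names `speaker_lines` and `word_timings` are hypothetical; the field names come from the table and the sample response.

```python
from typing import Any, Dict, List, Tuple


def speaker_lines(response: Dict[str, Any]) -> List[str]:
    """Return one 'speaker name: text' line per segment."""
    lines = []
    for segment in response.get("segments", []):
        name = segment.get("speaker", {}).get("name", "?")
        lines.append(f"{name}: {segment['text']}")
    return lines


def word_timings(segment: Dict[str, Any]) -> List[Tuple[int, int, str]]:
    """Return (start_ms, end_ms, text) for each entry of segments: words."""
    return [(start, end, text) for start, end, text in segment.get("words", [])]
```

With the sample sync response, `speaker_lines` yields a single line for speaker `A`, and `word_timings` on the first segment yields two timed words.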

SRT format

1
00:00:00,000 --> 00:00:01,425 
A: 저 얼마 전에

2
00:00:02,533 --> 00:00:11,550 
A: 옥수수를 먹었거든요. 정말 달고 맛있던데요 그런데 나는 그게 동네 이름인 줄 알았어.

3
00:00:11,550 --> 00:00:19,025 
A: 초사이어있을 때 그 초자에다가 달다는 당이었어 몰랐어요. 몰랐어. 나는 초당이 초당두부 이런 건 줄 알았어.

4
00:00:19,025 --> 00:00:26,317 
C: 사카린 생각했죠. 조금. 작가님 초당으로 드셨는데.

5
00:00:26,317 --> 00:00:28,240 
A: 옥수수에요?

6
00:00:28,240 --> 00:00:35,318 
B: 아니 지금 두부 단 맛나는 두부가 어디 있어. 지금 이 도는 이해를 못 해. 상도면 초당 지역 아니야?

7
00:00:35,318 --> 00:00:42,800 
A: 아니. 초당 옥수수는 슈퍼 스위트란 뜻이었어. 초 달다고 아무도 이해 못했어요. 지금.
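
The timestamps in the SRT output use the `HH:MM:SS,mmm` layout, while the JSON response reports times in milliseconds. The sketch below shows how JSON segments could be rendered in the same layout on the client. It is an illustration only: the function names are hypothetical, and it is an assumption that the service builds its SRT output from segment `start`, `end`, `text`, and `speaker.name` in this way.

```python
from typing import Any, Dict, List


def ms_to_srt_timestamp(ms: int) -> str:
    """Convert milliseconds to the HH:MM:SS,mmm layout used by SRT."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def segments_to_srt(segments: List[Dict[str, Any]]) -> str:
    """Render JSON segments as numbered SRT cues, prefixing the speaker name if present."""
    cues = []
    for index, segment in enumerate(segments, start=1):
        name = segment.get("speaker", {}).get("name")
        text = f"{name}: {segment['text']}" if name else segment["text"]
        start = ms_to_srt_timestamp(segment["start"])
        end = ms_to_srt_timestamp(segment["end"])
        cues.append(f"{index}\n{start} --> {end}\n{text}")
    return "\n\n".join(cues) + "\n"
```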

SMI format

<SAMI>
<Body>
  <SYNC Start=0>
    <P>A: 저 얼마 전에
  <SYNC Start=2533>
    <P>A: 옥수수를 먹었거든요. 정말 달고 맛있던데요 그런데 나는 그게 동네 이름인 줄 알았어.
  <SYNC Start=11550>
    <P>A: 초사이어있을 때 그 초자에다가 달다는 당이었어 몰랐어요. 몰랐어. 나는 초당이 초당두부 이런 건 줄 알았어.
  <SYNC Start=19025>
    <P>C: 사카린 생각했죠. 조금. 작가님 초당으로 드셨는데.
  <SYNC Start=26317>
    <P>A: 옥수수에요?
  <SYNC Start=28240>
    <P>B: 아니 지금 두부 단 맛나는 두부가 어디 있어. 지금 이 도는 이해를 못 해. 상도면 초당 지역 아니야?
  <SYNC Start=35318>
    <P>A: 아니. 초당 옥수수는 슈퍼 스위트란 뜻이었어. 초 달다고 아무도 이해 못했어요. 지금.
</Body>
</SAMI>
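
SMI output like the sample above can be read back into start times and text with a regular expression. This is a minimal sketch that assumes the layout shown in the sample, where each `<SYNC Start=...>` is followed by one `<P>` line. The name `parse_smi` is hypothetical, and the pattern is not a general SAMI parser.

```python
import re
from typing import List, Tuple

# Matches "<SYNC Start=123>" followed by "<P>text" up to the next tag.
_SYNC_RE = re.compile(r"<SYNC\s+Start=(\d+)>\s*<P>([^<]*)", re.IGNORECASE)


def parse_smi(smi: str) -> List[Tuple[int, str]]:
    """Return (start_ms, text) pairs from an SMI document laid out like the sample."""
    return [(int(start), text.strip()) for start, text in _SYNC_RE.findall(smi)]
```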

Examples

Java

dependency

<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.12</version>
</dependency>
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpmime</artifactId>
    <version>4.3.1</version>
</dependency>
<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
    <version>2.8.5</version>
</dependency>

ClovaSpeechClient

package org.example.clovaspeech.client;

import java.io.File;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.http.Header;
import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.ContentType;
import org.apache.http.entity.StringEntity;
import org.apache.http.entity.mime.MultipartEntityBuilder;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.message.BasicHeader;
import org.apache.http.util.EntityUtils;

import com.google.gson.Gson;

public class ClovaSpeechClient {

	// Clova Speech secret key
	private static final String SECRET = "";
	// Clova Speech invoke URL
	private static final String INVOKE_URL = "";

	private CloseableHttpClient httpClient = HttpClients.createDefault();
	private Gson gson = new Gson();

	private static final Header[] HEADERS = new Header[] {
		new BasicHeader("Accept", "application/json"),
		new BasicHeader("X-CLOVASPEECH-API-KEY", SECRET),
	};

	public static class Boosting {
		private String words;

		public String getWords() {
			return words;
		}

		public void setWords(String words) {
			this.words = words;
		}
	}

	public static class Diarization {
		private Boolean enable = Boolean.FALSE;
		private Integer speakerCountMin;
		private Integer speakerCountMax;

		public Boolean getEnable() {
			return enable;
		}

		public void setEnable(Boolean enable) {
			this.enable = enable;
		}

		public Integer getSpeakerCountMin() {
			return speakerCountMin;
		}

		public void setSpeakerCountMin(Integer speakerCountMin) {
			this.speakerCountMin = speakerCountMin;
		}

		public Integer getSpeakerCountMax() {
			return speakerCountMax;
		}

		public void setSpeakerCountMax(Integer speakerCountMax) {
			this.speakerCountMax = speakerCountMax;
		}
	}

	public static class NestRequestEntity {
		private String language = "ko-KR";
		//completion optional, sync/async
		private String completion = "sync";
		//optional, used to receive the analyzed results
		private String callback;
		//optional, any data
		private Map<String, Object> userdata;
		private Boolean wordAlignment = Boolean.TRUE;
		private Boolean fullText = Boolean.TRUE;
		//boosting object array
		private List<Boosting> boostings;
		//comma separated words
		private String forbiddens;
		private Diarization diarization;

		public String getLanguage() {
			return language;
		}

		public void setLanguage(String language) {
			this.language = language;
		}

		public String getCompletion() {
			return completion;
		}

		public void setCompletion(String completion) {
			this.completion = completion;
		}

		public String getCallback() {
			return callback;
		}

		public Boolean getWordAlignment() {
			return wordAlignment;
		}

		public void setWordAlignment(Boolean wordAlignment) {
			this.wordAlignment = wordAlignment;
		}

		public Boolean getFullText() {
			return fullText;
		}

		public void setFullText(Boolean fullText) {
			this.fullText = fullText;
		}

		public void setCallback(String callback) {
			this.callback = callback;
		}

		public Map<String, Object> getUserdata() {
			return userdata;
		}

		public void setUserdata(Map<String, Object> userdata) {
			this.userdata = userdata;
		}

		public String getForbiddens() {
			return forbiddens;
		}

		public void setForbiddens(String forbiddens) {
			this.forbiddens = forbiddens;
		}

		public List<Boosting> getBoostings() {
			return boostings;
		}

		public void setBoostings(List<Boosting> boostings) {
			this.boostings = boostings;
		}

		public Diarization getDiarization() {
			return diarization;
		}

		public void setDiarization(Diarization diarization) {
			this.diarization = diarization;
		}
	}

	/**
	 * recognize media using URL
	 * @param url required, the media URL
	 * @param nestRequestEntity optional
	 * @return string
	 */
	public String url(String url, NestRequestEntity nestRequestEntity) {
		HttpPost httpPost = new HttpPost(INVOKE_URL + "/recognizer/url");
		httpPost.setHeaders(HEADERS);
		Map<String, Object> body = new HashMap<>();
		body.put("url", url);
		body.put("language", nestRequestEntity.getLanguage());
		body.put("completion", nestRequestEntity.getCompletion());
		body.put("callback", nestRequestEntity.getCallback());
		body.put("userdata", nestRequestEntity.getUserdata());
		body.put("wordAlignment", nestRequestEntity.getWordAlignment());
		body.put("fullText", nestRequestEntity.getFullText());
		body.put("forbiddens", nestRequestEntity.getForbiddens());
		body.put("boostings", nestRequestEntity.getBoostings());
		body.put("diarization", nestRequestEntity.getDiarization());
		HttpEntity httpEntity = new StringEntity(gson.toJson(body), ContentType.APPLICATION_JSON);
		httpPost.setEntity(httpEntity);
		return execute(httpPost);
	}

	/**
	 * recognize media using Object Storage
	 * @param dataKey required, the Object Storage key
	 * @param nestRequestEntity optional
	 * @return string
	 */
	public String objectStorage(String dataKey, NestRequestEntity nestRequestEntity) {
		HttpPost httpPost = new HttpPost(INVOKE_URL + "/recognizer/object-storage");
		httpPost.setHeaders(HEADERS);
		Map<String, Object> body = new HashMap<>();
		body.put("dataKey", dataKey);
		body.put("language", nestRequestEntity.getLanguage());
		body.put("completion", nestRequestEntity.getCompletion());
		body.put("callback", nestRequestEntity.getCallback());
		body.put("userdata", nestRequestEntity.getUserdata());
		body.put("wordAlignment", nestRequestEntity.getWordAlignment());
		body.put("fullText", nestRequestEntity.getFullText());
		body.put("forbiddens", nestRequestEntity.getForbiddens());
		body.put("boostings", nestRequestEntity.getBoostings());
		body.put("diarization", nestRequestEntity.getDiarization());
		StringEntity httpEntity = new StringEntity(gson.toJson(body), ContentType.APPLICATION_JSON);
		httpPost.setEntity(httpEntity);
		return execute(httpPost);
	}

	/**
	 *
	 * recognize media using a file
	 * @param file required, the media file
	 * @param nestRequestEntity optional
	 * @return string
	 */
	public String upload(File file, NestRequestEntity nestRequestEntity) {
		HttpPost httpPost = new HttpPost(INVOKE_URL + "/recognizer/upload");
		httpPost.setHeaders(HEADERS);
		HttpEntity httpEntity = MultipartEntityBuilder.create()
			.addTextBody("params", gson.toJson(nestRequestEntity), ContentType.APPLICATION_JSON)
			.addBinaryBody("media", file, ContentType.MULTIPART_FORM_DATA, file.getName())
			.build();
		httpPost.setEntity(httpEntity);
		return execute(httpPost);
	}

	private String execute(HttpPost httpPost) {
		try (final CloseableHttpResponse httpResponse = httpClient.execute(httpPost)) {
			final HttpEntity entity = httpResponse.getEntity();
			return EntityUtils.toString(entity, StandardCharsets.UTF_8);
		} catch (Exception e) {
			throw new RuntimeException(e);
		}
	}

	public static void main(String[] args) {
		final ClovaSpeechClient clovaSpeechClient = new ClovaSpeechClient();
		NestRequestEntity requestEntity = new NestRequestEntity();
		final String result =
			clovaSpeechClient.upload(new File("/data/sample.mp4"), requestEntity);
		//final String result = clovaSpeechClient.url("file URL", requestEntity);
		//final String result = clovaSpeechClient.objectStorage("Object Storage key", requestEntity);
		System.out.println(result);
	}
}

Python

import requests
import json


class ClovaSpeechClient:
    # Clova Speech invoke URL
    invoke_url = ''
    # Clova Speech secret key
    secret = ''

    def req_url(self, url, completion, callback=None, userdata=None, forbiddens=None, boostings=None, wordAlignment=True, fullText=True, diarization=None):
        request_body = {
            'url': url,
            'language': 'ko-KR',
            'completion': completion,
            'callback': callback,
            'userdata': userdata,
            'wordAlignment': wordAlignment,
            'fullText': fullText,
            'forbiddens': forbiddens,
            'boostings': boostings,
            'diarization': diarization,
        }
        headers = {
            'Accept': 'application/json;UTF-8',
            'Content-Type': 'application/json;UTF-8',
            'X-CLOVASPEECH-API-KEY': self.secret
        }
        return requests.post(headers=headers,
                             url=self.invoke_url + '/recognizer/url',
                             data=json.dumps(request_body).encode('UTF-8'))

    def req_object_storage(self, data_key, completion, callback=None, userdata=None, forbiddens=None, boostings=None,
                           wordAlignment=True, fullText=True, diarization=None):
        request_body = {
            'dataKey': data_key,
            'language': 'ko-KR',
            'completion': completion,
            'callback': callback,
            'userdata': userdata,
            'wordAlignment': wordAlignment,
            'fullText': fullText,
            'forbiddens': forbiddens,
            'boostings': boostings,
            'diarization': diarization,
        }
        headers = {
            'Accept': 'application/json;UTF-8',
            'Content-Type': 'application/json;UTF-8',
            'X-CLOVASPEECH-API-KEY': self.secret
        }
        return requests.post(headers=headers,
                             url=self.invoke_url + '/recognizer/object-storage',
                             data=json.dumps(request_body).encode('UTF-8'))

    def req_upload(self, file, completion, callback=None, userdata=None, forbiddens=None, boostings=None,
                   wordAlignment=True, fullText=True, diarization=None):
        request_body = {
            'language': 'ko-KR',
            'completion': completion,
            'callback': callback,
            'userdata': userdata,
            'wordAlignment': wordAlignment,
            'fullText': fullText,
            'forbiddens': forbiddens,
            'boostings': boostings,
            'diarization': diarization,
        }
        headers = {
            'Accept': 'application/json;UTF-8',
            'X-CLOVASPEECH-API-KEY': self.secret
        }
        params_json = json.dumps(request_body, ensure_ascii=False).encode('UTF-8')
        print(params_json)
        # Open the media file in a context manager so the handle is closed after the upload.
        with open(file, 'rb') as media:
            files = {
                'media': media,
                'params': (None, params_json, 'application/json')
            }
            response = requests.post(headers=headers, url=self.invoke_url + '/recognizer/upload', files=files)
        return response

if __name__ == '__main__':
    # res = ClovaSpeechClient().req_url(url='http://example.com/media.mp3', completion='sync')
    # res = ClovaSpeechClient().req_object_storage(data_key='data/media.mp3', completion='sync')
    res = ClovaSpeechClient().req_upload(file='/data/media.mp3', completion='sync')
    print(res.text)

PHP

<?php

$secret = '';
$invoke_url = '';

function req_url($url, $completion, $callback, $userdata, $forbiddens, $boostings,
                 $wordAlignment, $fullText, $diarization)
{
    $object = (object)[
        'language' => 'ko-KR',
        'completion' => $completion,
        'callback' => $callback,
        'url' => $url,
        'userdata' => $userdata,
        'forbiddens' => $forbiddens,
        'boostings' => $boostings,
        'wordAlignment' => $wordAlignment,
        'fullText' => $fullText,
        'diarization' => $diarization,
    ];
    return execute('/recognizer/url', json_encode($object), array('Content-Type: application/json'));
}

function req_object_storage($dataKey, $completion, $callback, $userdata, $forbiddens, $boostings,
                            $wordAlignment, $fullText, $diarization)
{
    $object = (object)[
        'language' => 'ko-KR',
        'completion' => $completion,
        'callback' => $callback,
        'dataKey' => $dataKey,
        'userdata' => $userdata,
        'forbiddens' => $forbiddens,
        'boostings' => $boostings,
        'wordAlignment' => $wordAlignment,
        'fullText' => $fullText,
        'diarization' => $diarization,
    ];
    return execute('/recognizer/object-storage', json_encode($object), array('Content-Type: application/json'));
}

function req_upload($filePath, $completion, $callback, $userdata, $forbiddens, $boostings,
                    $wordAlignment, $fullText, $diarization)
{
    $object = (object)[
        'language' => 'ko-KR',
        'completion' => $completion,
        'callback' => $callback,
        'userdata' => $userdata,
        'forbiddens' => $forbiddens,
        'boostings' => $boostings,
        'wordAlignment' => $wordAlignment,
        'fullText' => $fullText,
        'diarization' => $diarization,
    ];
    $fields = array(
        'media' => new CURLFile($filePath),
        'params' => json_encode($object),
    );
    return execute('/recognizer/upload', $fields, null);
}

function execute($uri, $postFields, $customHeaders)
{
    try {
        $ch = curl_init($GLOBALS['invoke_url'] . $uri);
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
        curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'POST');
        curl_setopt($ch, CURLOPT_POSTFIELDS, $postFields);
        curl_setopt($ch, CURLOPT_VERBOSE, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 600);
        $headers = array();
        $headers[] = 'X-CLOVASPEECH-API-KEY: ' . $GLOBALS['secret'];
        if (!is_null($customHeaders)) {
            $headers = array_merge($headers, $customHeaders);
        }
        curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
        $response = curl_exec($ch);
        $err = curl_error($ch);
        curl_close($ch);
        if ($err) {
            echo 'cURL Error #:' . $err;
            return $err;
        }
        return $response;
    } catch (Exception $E) {
        // Exception has no lastResponse property, so report the message instead.
        echo 'Response: ' . $E->getMessage() . "\n";
        return $E->getMessage();
    }
}

//$response = req_url('https://example.com/sample.mp4', 'sync', null, null, null, null, null, null, null);
//$response = req_object_storage('data/sample.mp4', 'sync', null, null, null, null, null, null, null);
$response = req_upload('/data/sample.mp4', 'sync', null, null, null, null, null, null, null);
echo $response;
?>

C#

using System;
using System.IO; // File, FileStream, and Path are used in Upload
using System.Globalization;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text.RegularExpressions;
using System.Threading.Channels;
using System.Threading.Tasks;
using System.Text.Json;
using System.Text.Json.Serialization;
using System.Text;
using System.Diagnostics;

namespace HttpClientStatus
{
    public class ClovaSpeechRequest
    {
        public string language { get; set; }
        public string completion { get; set; }
        // Other fields are omitted, please refer to: https://api.ncloud-docs.com/release-20230525/docs/ai-application-service-clovaspeech-clovaspeech for available fields
    }
    public class Program
    {
        private static readonly string secretKey = "";
        private static readonly string invokeUrl = "";
        public static async Task<string> Upload(ClovaSpeechRequest clovaSpeechRequest, string path)
        {

            using (var client = new HttpClient())
            {
                var multiForm = new MultipartFormDataContent();
                multiForm.Headers.Add("X-CLOVASPEECH-API-KEY", secretKey);
                multiForm.Add(new StringContent(JsonSerializer.Serialize(clovaSpeechRequest)), "params");
                FileStream fs = File.OpenRead(path);
                Console.WriteLine(Path.GetFileName(path));
                multiForm.Add(new StreamContent(fs), "media", Path.GetFileName(path));
                var message = await client.PostAsync(invokeUrl+ "/recognizer/upload", multiForm);
                return await message.Content.ReadAsStringAsync();
            }
        }

        static async Task Main(string[] args)
        {
            var clovaSpeechRequest = new ClovaSpeechRequest
            {
                language = "ko-KR",
                completion = "sync"
            };

            var result = await Upload(clovaSpeechRequest, @"D:\media\video\sample.mp3");
            Console.WriteLine(result);
        }
    }
}

Error codes

Error Response Body:

{
  "result": "FAILED",
  "message": "지원하지 않는 파일 포맷입니다.",
  "token": ""
}

Result	Message
SUCCEEDED	Succeeded
PROCESSING	Processing
ERROR_SERVER_BUSY	Server too busy
ERROR_TOKEN_INVALID	Token does not exist
ERROR_AUDIO_EMPTY	Audio is empty
ERROR_AUDIO_CONVERSION	Audio conversion has been failed
ERROR_PARAMS_FORMAT_INVALID	Params must be JSON format
ERROR_REQUEST_PARAMETER	Invalid request parameters
ERROR_REQUEST_PARAMETER	Speaker detect is off
ERROR_INVALID_SECRET	Invalid secret
ERROR_DATA_NOT_FOUND	Not found
ERROR_DATA_CONFLICT	Data conflict
ERROR_INTERNAL_ERROR	Internal Server Error
ERROR_EXTERNAL_ERROR	Service Unavailable
ERROR_TOO_MANY_JOBS	Too many jobs
ERROR_GATEWAY_TIMEOUT	Gateway timeout
FAILED	Other errors
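
Client code usually needs to tell successful results from errors, and transient errors from permanent ones. The sketch below is one possible way to do that. The names `ClovaSpeechError` and `check_result` are hypothetical, and the choice of which codes are worth retrying is an assumption based only on their messages in the table above, not on documented behavior. `COMPLETED` and `WAITING` come from the job status and sync response sections.

```python
from typing import Any, Dict

# Result values that do not indicate an error.
NON_ERROR_RESULTS = {"SUCCEEDED", "COMPLETED", "PROCESSING", "WAITING"}
# Assumption: these look transient, so a later retry may succeed.
RETRYABLE_RESULTS = {"ERROR_SERVER_BUSY", "ERROR_TOO_MANY_JOBS", "ERROR_GATEWAY_TIMEOUT"}


class ClovaSpeechError(Exception):
    """Raised when a response carries an error result code."""

    def __init__(self, result: str, message: str) -> None:
        super().__init__(f"{result}: {message}")
        self.result = result
        self.message = message
        self.retryable = result in RETRYABLE_RESULTS


def check_result(response: Dict[str, Any]) -> Dict[str, Any]:
    """Return the response unchanged, or raise ClovaSpeechError for an error result."""
    result = response.get("result", "FAILED")
    if result in NON_ERROR_RESULTS:
        return response
    raise ClovaSpeechError(result, response.get("message", ""))
```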
