From 09350783b1907a4bb1b4213ea2095c9ed30bf432 Mon Sep 17 00:00:00 2001
From: Minseo-Jo
Date: Thu, 23 Oct 2025 10:12:35 +0900
Subject: [PATCH] =?UTF-8?q?STT=20=EC=84=9C=EB=B9=84=EC=8A=A4=20=EB=82=B4?=
 =?UTF-8?q?=EB=B6=80=20=EC=8B=9C=ED=80=80=EC=8A=A4=20=ED=86=B5=ED=95=A9=20?=
 =?UTF-8?q?=EB=B0=8F=20=EC=A4=91=EB=B3=B5=20=EC=A0=9C=EA=B1=B0?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- 4개의 중복된 STT 시퀀스를 2개로 통합
- 녹음 시작 및 화자 인식 플로우 통합 (stt-녹음시작및인식.puml)
- 텍스트 변환 플로우 통합 - 실시간/배치 모드 포함 (stt-텍스트변환통합.puml)
- 중복 파일 4개 삭제 (음성녹음시작, 음성텍스트변환, 음성녹음인식, 텍스트변환)
- Azure Speech Service 설정 및 신뢰도 검증 기준 통일

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude
---
 .../sequence/inner/stt-녹음시작및인식.puml | 178 +++++++++++++
 .../sequence/inner/stt-음성녹음시작.puml | 87 -------
 .../sequence/inner/stt-음성녹음인식.puml | 117 ---------
 .../sequence/inner/stt-음성텍스트변환.puml | 115 ---------
 .../sequence/inner/stt-텍스트변환.puml | 145 -----------
 .../sequence/inner/stt-텍스트변환통합.puml | 244 ++++++++++++++++++
 6 files changed, 422 insertions(+), 464 deletions(-)
 create mode 100644 design/backend/sequence/inner/stt-녹음시작및인식.puml
 delete mode 100644 design/backend/sequence/inner/stt-음성녹음시작.puml
 delete mode 100644 design/backend/sequence/inner/stt-음성녹음인식.puml
 delete mode 100644 design/backend/sequence/inner/stt-음성텍스트변환.puml
 delete mode 100644 design/backend/sequence/inner/stt-텍스트변환.puml
 create mode 100644 design/backend/sequence/inner/stt-텍스트변환통합.puml

diff --git a/design/backend/sequence/inner/stt-녹음시작및인식.puml b/design/backend/sequence/inner/stt-녹음시작및인식.puml
new file mode 100644
index 0000000..32f482a
--- /dev/null
+++ b/design/backend/sequence/inner/stt-녹음시작및인식.puml
@@ -0,0 +1,178 @@
+@startuml
+!theme mono
+
+title STT Service - 음성 녹음 시작 및 화자 인식 (통합)
+
+participant "Frontend<>" as Frontend
+participant "API Gateway<>" as Gateway
+participant "RecordingController" as Controller
+participant "RecordingService" as Service
+participant "AudioStreamManager" as StreamManager
+participant "SpeakerIdentifier" as Speaker
+participant "RecordingRepository" as Repository
+participant "AzureSpeechClient" as AzureClient
+database "STT DB" as DB
+database "Azure Blob Storage<>" as BlobStorage
+queue "Azure Event Hubs<>" as EventHub
+
+== 회의 시작 이벤트 수신 및 녹음 준비 ==
+
+EventHub -> Controller: MeetingStarted 이벤트 수신\n(meetingId, sessionId)
+activate Controller
+
+Controller -> Service: prepareRecording(meetingId, sessionId)
+activate Service
+
+Service -> Service: 녹음 세션 검증
+note right
+  - 중복 녹음 방지 체크
+  - meetingId 유효성 검증
+end note
+
+Service -> Repository: createRecording(meetingId, sessionId)
+activate Repository
+
+Repository -> DB: 녹음 세션 생성\n(녹음ID, 회의ID, 세션ID, 상태, 생성일시)
+activate DB
+DB --> Repository: recordingId 반환
+deactivate DB
+
+Repository --> Service: RecordingEntity 반환
+deactivate Repository
+
+== Azure Speech Service 초기화 ==
+
+Service -> AzureClient: initializeRecognizer(recordingId, sessionId)
+activate AzureClient
+
+AzureClient -> AzureClient: 음성 인식기 설정
+note right
+  Azure Speech 설정:
+  - 언어: ko-KR
+  - Format: PCM 16kHz
+  - 샘플레이트: 16kHz
+  - 화자 식별 활성화
+  - 실시간 스트리밍 모드
+  - Continuous recognition
+end note
+
+AzureClient -> BlobStorage: 녹음 파일 저장 경로 생성\n(path: recordings/{meetingId}/{sessionId}.wav)
+activate BlobStorage
+BlobStorage --> AzureClient: 저장 경로 URL 반환
+deactivate BlobStorage
+
+AzureClient --> Service: RecognizerConfig 반환
+deactivate AzureClient
+
+== 녹음 상태 업데이트 ==
+
+Service -> Repository: updateRecordingStatus(recordingId, "RECORDING")
+activate Repository
+
+Repository -> DB: 녹음 상태 업데이트\n(상태='녹음중', 시작일시, 저장경로)
+activate DB
+DB --> Repository: 업데이트 완료
+deactivate DB
+
+Repository --> Service: 업데이트 완료
+deactivate Repository
+
+Service --> Controller: RecordingResponse(recordingId, status, storagePath)
+deactivate Service
+
+Controller --> EventHub: RecordingStarted 이벤트 발행\n(recordingId, meetingId, status)
+
+Controller --> Gateway: 200 OK\n{sessionId, streamUrl}
+deactivate Controller
+
+== 음성 스트리밍 및 화자 식별 처리 ==
+
+Frontend -> Gateway: WebSocket /ws/stt/{sessionId}\n[audio stream]
+activate Gateway
+
+Gateway -> Controller: 음성 데이터 수신
+activate Controller
+
+Controller -> Service: processAudioStream(sessionId, audioData)
+activate Service
+
+Service -> StreamManager: streamAudio(audioData)
+activate StreamManager
+
+StreamManager -> AzureClient: recognizeAsync(audioData)
+activate AzureClient
+
+AzureClient --> StreamManager: partial result\n(text, timestamp)
+deactivate AzureClient
+
+StreamManager --> Service: recognized text
+deactivate StreamManager
+
+== 화자 식별 ==
+
+Service -> Speaker: identifySpeaker(audioFrame)
+activate Speaker
+
+Speaker -> AzureClient: analyzeSpeakerProfile()\n(Speaker Recognition API)
+activate AzureClient
+note right
+  화자 식별:
+  - Voice signature 생성
+  - 기존 프로필과 매칭
+  - 신규 화자 자동 등록
+end note
+
+AzureClient --> Speaker: speakerId
+deactivate AzureClient
+
+Speaker --> Service: speaker info
+deactivate Speaker
+
+== 화자별 세그먼트 저장 ==
+
+Service -> Repository: saveSttSegment(segment)
+activate Repository
+
+Repository -> DB: STT 세그먼트 저장\n(세션ID, 텍스트, 화자ID, 타임스탬프, 신뢰도)
+activate DB
+DB --> Repository: segment saved
+deactivate DB
+
+Repository --> Service: saved
+deactivate Repository
+
+Service -> Repository: updateSpeakerInfo(recordingId, speakerId)
+activate Repository
+
+Repository -> DB: 화자 정보 저장/업데이트\n(녹음ID, 화자ID, 세그먼트수)
+activate DB
+DB --> Repository: 업데이트 완료
+deactivate DB
+
+Repository --> Service: 완료
+deactivate Repository
+
+Service --> Controller: streaming response\n{text, speaker, timestamp, confidence}
+deactivate Service
+
+Controller --> Gateway: WebSocket message
+deactivate Controller
+
+Gateway --> Frontend: 실시간 자막 전송\n{text, speaker, timestamp}
+deactivate Gateway
+
+note over Frontend, EventHub
+처리 시간:
+- DB 녹음 생성: ~100ms
+- Azure 인식기 초기화: ~500ms
+- Blob 경로 생성: ~200ms
+- 화자 식별: ~300ms
+- 실시간 인식 지연: < 1초
+- 총 초기화 시간: ~1.1초
+
+정확도:
+- 화자 식별 정확도: > 90%
+- 음성 인식 정확도: 60-95%
+end note
+
+@enduml
diff --git a/design/backend/sequence/inner/stt-음성녹음시작.puml b/design/backend/sequence/inner/stt-음성녹음시작.puml
deleted file mode 100644
index 5cc2ce4..0000000
--- a/design/backend/sequence/inner/stt-음성녹음시작.puml
+++ /dev/null
@@ -1,87 +0,0 @@
-@startuml
-!theme mono
-
-title STT Service - 음성녹음시작 내부 시퀀스
-
-participant "RecordingController" as Controller
-participant "RecordingService" as Service
-participant "RecordingRepository" as Repository
-participant "AzureSpeechClient" as AzureClient
-database "STT DB" as DB
-database "Azure Blob Storage<>" as BlobStorage
-queue "Azure Event Hubs<>" as EventHub
-
-== MeetingStarted 이벤트 수신 ==
-
-EventHub -> Controller: MeetingStarted 이벤트 수신\n(meetingId, sessionId)
-activate Controller
-
-Controller -> Service: prepareRecording(meetingId, sessionId)
-activate Service
-
-Service -> Service: 녹음 세션 검증
-note right
-  - 중복 녹음 방지 체크
-  - meetingId 유효성 검증
-end note
-
-Service -> Repository: createRecording(meetingId, sessionId)
-activate Repository
-
-Repository -> DB: 녹음 세션 생성\n(녹음ID, 회의ID, 세션ID, 상태, 생성일시)
-activate DB
-DB --> Repository: recordingId 반환
-deactivate DB
-
-Repository --> Service: RecordingEntity 반환
-deactivate Repository
-
-== Azure Speech Service 초기화 ==
-
-Service -> AzureClient: initializeRecognizer(recordingId, sessionId)
-activate AzureClient
-
-AzureClient -> AzureClient: 음성 인식기 설정
-note right
-  - 언어: ko-KR
-  - 샘플레이트: 16kHz
-  - 화자 식별 활성화
-  - 실시간 스트리밍 모드
-end note
-
-AzureClient -> BlobStorage: 녹음 파일 저장 경로 생성\n(path: recordings/{meetingId}/{sessionId}.wav)
-activate BlobStorage
-BlobStorage --> AzureClient: 저장 경로 URL 반환
-deactivate BlobStorage
-
-AzureClient --> Service: RecognizerConfig 반환
-deactivate AzureClient
-
-== 녹음 상태 업데이트 ==
-
-Service -> Repository: updateRecordingStatus(recordingId, "RECORDING")
-activate Repository
-
-Repository -> DB: 녹음 상태 업데이트\n(상태='녹음중', 시작일시, 저장경로)
-activate DB
-DB --> Repository: 업데이트 완료
-deactivate DB
-
-Repository --> Service: 업데이트 완료
-deactivate Repository
-
-Service --> Controller: RecordingResponse(recordingId, status, storagePath)
-deactivate Service
-
-Controller --> EventHub: RecordingStarted 이벤트 발행\n(recordingId, meetingId, status)
-deactivate Controller
-
-note over Controller, EventHub
-처리 시간:
-- DB 녹음 생성: ~100ms
-- Azure 인식기 초기화: ~500ms
-- Blob 경로 생성: ~200ms
-- 총 처리 시간: ~800ms
-end note
-
-@enduml
diff --git a/design/backend/sequence/inner/stt-음성녹음인식.puml b/design/backend/sequence/inner/stt-음성녹음인식.puml
deleted file mode 100644
index 68d5ae8..0000000
--- a/design/backend/sequence/inner/stt-음성녹음인식.puml
+++ /dev/null
@@ -1,117 +0,0 @@
-@startuml
-!theme mono
-
-title 음성녹음 및 화자 식별 내부 시퀀스 (UFR-STT-010)
-
-participant "API Gateway<>" as Gateway
-participant "SttController" as Controller
-participant "SttService" as Service
-participant "AudioStreamManager" as StreamManager
-participant "SpeakerIdentifier" as Speaker
-participant "Azure Speech<>" as Speech
-participant "SttRepository" as Repository
-database "PostgreSQL<>" as DB
-queue "Event Hub<>" as EventHub
-
-Gateway -> Controller: POST /api/v1/stt/start-recording\n{meetingId, userId}
-activate Controller
-
-Controller -> Service: startRecording(meetingId, userId)
-activate Service
-
-Service -> Repository: findMeetingById(meetingId)
-activate Repository
-Repository -> DB: 회의 정보 조회\n(회의ID 기준)
-DB --> Repository: meeting data
-Repository --> Service: Meeting entity
-deactivate Repository
-
-Service -> StreamManager: initializeStream(meetingId)
-activate StreamManager
-StreamManager -> Speech: createRecognizer()\n(Azure Speech API)
-note right
-  Azure Speech 설정:
-  - Language: ko-KR
-  - Format: PCM 16kHz
-  - Continuous recognition
-end note
-Speech --> StreamManager: recognizer instance
-StreamManager --> Service: stream session
-deactivate StreamManager
-
-Service -> Speaker: identifySpeaker(audioFrame)
-activate Speaker
-Speaker -> Speech: analyzeSpeakerProfile()\n(Speaker Recognition API)
-note right
-  화자 식별:
-  - Voice signature 생성
-  - 기존 프로필과 매칭
-  - 신규 화자 자동 등록
-end note
-Speech --> Speaker: speakerId
-Speaker --> Service: speaker info
-deactivate Speaker
-
-Service -> Repository: saveSttSession(session)
-activate Repository
-Repository -> DB: STT 세션 저장\n(회의ID, 상태, 시작일시)
-DB --> Repository: session saved
-Repository --> Service: SttSession entity
-deactivate Repository
-
-Service -> EventHub: publish(SttStartedEvent)
-note right
-  Event:
-  - meetingId
-  - sessionId
-  - startedAt
-end note
-
-Service --> Controller: RecordingStartResponse\n{sessionId, status}
-deactivate Service
-
-Controller --> Gateway: 200 OK\n{sessionId, streamUrl}
-deactivate Controller
-
-== 음성 스트리밍 처리 ==
-
-Gateway -> Controller: WebSocket /ws/stt/{sessionId}\n[audio stream]
-activate Controller
-
-Controller -> Service: processAudioStream(sessionId, audioData)
-activate Service
-
-Service -> StreamManager: streamAudio(audioData)
-activate StreamManager
-
-StreamManager -> Speech: recognizeAsync(audioData)
-Speech --> StreamManager: partial result
-note right
-  실시간 인식:
-  - Partial text
-  - Confidence score
-  - Timestamp
-end note
-
-StreamManager --> Service: recognized text
-deactivate StreamManager
-
-Service -> Speaker: updateSpeakerMapping(text, timestamp)
-activate Speaker
-Speaker --> Service: speaker segment
-deactivate Speaker
-
-Service -> Repository: saveSttSegment(segment)
-activate Repository
-Repository -> DB: STT 세그먼트 저장\n(세션ID, 텍스트, 화자ID, 타임스탬프)
-DB --> Repository: segment saved
-Repository --> Service: saved
-deactivate Repository
-
-Service --> Controller: streaming response
-deactivate Service
-
-Controller --> Gateway: WebSocket message\n{text, speaker, timestamp}
-deactivate Controller
-
-@enduml
diff --git a/design/backend/sequence/inner/stt-음성텍스트변환.puml b/design/backend/sequence/inner/stt-음성텍스트변환.puml
deleted file mode 100644
index a377d46..0000000
--- a/design/backend/sequence/inner/stt-음성텍스트변환.puml
+++ /dev/null
@@ -1,115 +0,0 @@
-@startuml
-!theme mono
-
-title STT Service - 음성텍스트변환 내부 시퀀스
-
-participant "Frontend<>" as Frontend
-participant "TranscriptController" as Controller
-participant "TranscriptService" as Service
-participant "RecordingRepository" as RecordingRepo
-participant "TranscriptRepository" as TranscriptRepo
-participant "AzureSpeechClient" as AzureClient
-database "STT DB" as DB
-database "Azure Blob Storage<>" as BlobStorage
-queue "Azure Event Hubs<>" as EventHub
-
-== 음성 데이터 스트리밍 수신 (5초 간격 배치) ==
-
-Frontend -> Controller: POST /api/transcripts/stream\n(audioData, recordingId, timestamp)
-activate Controller
-
-Controller -> Service: processAudioStream(audioData, recordingId)
-activate Service
-
-== 음성 인식 처리 ==
-
-Service -> AzureClient: recognizeAudio(audioData)
-activate AzureClient
-
-AzureClient -> AzureClient: 음성 인식 수행
-note right
-  - 실시간 STT 처리
-  - 화자 식별 (Speaker Diarization)
-  - 타임스탬프 자동 기록
-  - 신뢰도 점수 계산
-end note
-
-AzureClient -> BlobStorage: 음성 파일 저장\n(chunk 단위 저장)
-activate BlobStorage
-BlobStorage --> AzureClient: 저장 완료
-deactivate BlobStorage
-
-AzureClient --> Service: RecognitionResult\n(text, speakerId, confidence, timestamp)
-deactivate AzureClient
-
-== 정확도 검증 및 처리 ==
-
-Service -> Service: 정확도 점수 검증
-note right
-  confidence >= 60%: 정상 처리
-  confidence < 60%: 경고 플래그 설정
-end note
-
-== 변환 결과 저장 ==
-
-Service -> TranscriptRepo: createTranscript(recordingId, text, metadata)
-activate TranscriptRepo
-
-TranscriptRepo -> DB: 변환 결과 저장\n(텍스트ID, 녹음ID, 화자ID, 텍스트, 신뢰도, 타임스탬프, 경고플래그)
-activate DB
-DB --> TranscriptRepo: transcriptId 반환
-deactivate DB
-
-TranscriptRepo --> Service: TranscriptEntity 반환
-deactivate TranscriptRepo
-
-== 화자 정보 업데이트 ==
-
-Service -> RecordingRepo: updateSpeakerInfo(recordingId, speakerId)
-activate RecordingRepo
-
-RecordingRepo -> DB: 화자 정보 저장/업데이트\n(녹음ID, 화자ID, 세그먼트수)
-activate DB
-DB --> RecordingRepo: 업데이트 완료
-deactivate DB
-
-RecordingRepo --> Service: 완료
-deactivate RecordingRepo
-
-== 이벤트 발행 ==
-
-Service -> EventHub: TranscriptReady 이벤트 발행
-activate EventHub
-note right of EventHub
-  이벤트 데이터:
-  - transcriptId
-  - recordingId
-  - meetingId
-  - text
-  - speakerId
-  - timestamp
-  - confidence
-end note
-EventHub --> Service: 발행 완료
-deactivate EventHub
-
-Service --> Controller: TranscriptResponse\n(transcriptId, text, confidence, warningFlag)
-deactivate Service
-
-Controller --> Frontend: 200 OK\n(transcriptId, text, speakerId, timestamp, confidence)
-deactivate Controller
-
-note over Frontend, EventHub
-처리 시간:
-- Azure STT 처리: 1-3초
-- DB 저장: ~100ms
-- Event 발행: ~50ms
-- 총 처리 시간: 1-4초
-
-정확도 경고:
-- 60% 미만: 수동 수정 권장
-- 60-80%: 검토 권장
-- 80% 이상: 정상
-end note
-
-@enduml
diff --git a/design/backend/sequence/inner/stt-텍스트변환.puml b/design/backend/sequence/inner/stt-텍스트변환.puml
deleted file mode 100644
index c234277..0000000
--- a/design/backend/sequence/inner/stt-텍스트변환.puml
+++ /dev/null
@@ -1,145 +0,0 @@
-@startuml
-!theme mono
-
-title 음성-텍스트 변환 내부 시퀀스 (UFR-STT-020)
-
-participant "API Gateway<>" as Gateway
-participant "SttController" as Controller
-participant "SttService" as Service
-participant "TranscriptionEngine" as Engine
-participant "Azure Speech<>" as Speech
-participant "SttRepository" as Repository
-database "PostgreSQL<>" as DB
-queue "Event Hub<>" as EventHub
-
-Gateway -> Controller: POST /api/v1/stt/transcribe\n{sessionId, audioFile}
-activate Controller
-
-Controller -> Service: transcribeAudio(sessionId, audioFile)
-activate Service
-
-Service -> Repository: findSessionById(sessionId)
-activate Repository
-Repository -> DB: STT 세션 조회\n(세션ID 기준)
-DB --> Repository: session data
-Repository --> Service: SttSession entity
-deactivate Repository
-
-alt 실시간 변환 모드
-    Service -> Engine: streamingTranscribe(audioFile)
-    activate Engine
-
-    Engine -> Speech: createRecognizer()\nsetContinuousRecognition()
-    note right
-      Azure Speech 설정:
-      - Mode: Continuous
-      - Language: ko-KR
-      - Enable diarization
-      - Profanity filter
-    end note
-
-    Speech --> Engine: recognizer instance
-
-    loop 오디오 청크 처리
-        Engine -> Speech: recognizeOnceAsync(audioChunk)
-        Speech --> Engine: recognition result
-        note right
-          결과 포함:
-          - Text
-          - Confidence
-          - Duration
-          - Speaker ID
-        end note
-
-        Engine -> Engine: validateConfidence(result)
-        note right
-          신뢰도 검증:
-          - Threshold: 0.7
-          - Low confidence 처리
-        end note
-
-        Engine --> Service: transcription segment
-
-        Service -> Repository: saveSttSegment(segment)
-        activate Repository
-        Repository -> DB: STT 세그먼트 저장\n(세션ID, 텍스트, 신뢰도, 타임스탬프)
-        DB --> Repository: saved
-        Repository --> Service: segment saved
-        deactivate Repository
-
-        Service -> EventHub: publish(TranscriptionSegmentEvent)
-        note right
-          Event:
-          - sessionId
-          - segmentId
-          - text
-          - timestamp
-        end note
-    end
-
-    Engine --> Service: streaming complete
-    deactivate Engine
-
-else 배치 변환 모드
-    Service -> Engine: batchTranscribe(audioFile)
-    activate Engine
-
-    Engine -> Speech: batchTranscriptionAsync(audioUrl)
-    note right
-      배치 처리:
-      - 전체 파일 업로드
-      - 백그라운드 처리
-      - Callback URL 제공
-    end note
-
-    Speech --> Engine: transcription job ID
-
-    Engine --> Service: job submitted
-    deactivate Engine
-
-    Service -> Repository: updateSessionStatus(sessionId, "PROCESSING")
-    activate Repository
-    Repository -> DB: 세션 상태 업데이트\n(상태='처리중')
-    DB --> Repository: updated
-    Repository --> Service: updated
-    deactivate Repository
-end
-
-Service -> Repository: aggregateTranscription(sessionId)
-activate Repository
-Repository -> DB: 세그먼트 목록 조회\n(세션ID 기준, 타임스탬프 순 정렬)
-DB --> Repository: segments
-Repository --> Service: ordered segments
-deactivate Repository
-
-Service -> Service: mergeSegments(segments)
-note right
-  세그먼트 병합:
-  - 화자별 그룹화
-  - 시간 순서 정렬
-  - 문장 경계 보정
-end note
-
-Service -> Repository: saveTranscription(fullText)
-activate Repository
-Repository -> DB: 전체 텍스트 저장 및 상태 업데이트\n(전체텍스트, 상태='완료')
-DB --> Repository: saved
-Repository --> Service: updated session
-deactivate Repository
-
-Service -> EventHub: publish(TranscriptionCompletedEvent)
-note right
-  Event:
-  - sessionId
-  - meetingId
-  - fullText
-  - completedAt
-end note
-
-Service --> Controller: TranscriptionResponse\n{sessionId, text, segments}
-deactivate Service
-
-Controller --> Gateway: 200 OK\n{transcription, metadata}
-deactivate Controller
-
-@enduml
diff --git a/design/backend/sequence/inner/stt-텍스트변환통합.puml b/design/backend/sequence/inner/stt-텍스트변환통합.puml
new file mode 100644
index 0000000..9ce6586
--- /dev/null
+++ b/design/backend/sequence/inner/stt-텍스트변환통합.puml
@@ -0,0 +1,244 @@
+@startuml
+!theme mono
+
+title STT Service - 음성-텍스트 변환 (실시간/배치 통합)
+
+participant "Frontend<>" as Frontend
+participant "API Gateway<>" as Gateway
+participant "TranscriptController" as Controller
+participant "TranscriptService" as Service
+participant "TranscriptionEngine" as Engine
+participant "RecordingRepository" as RecordingRepo
+participant "TranscriptRepository" as TranscriptRepo
+participant "AzureSpeechClient" as AzureClient
+database "STT DB" as DB
+database "Azure Blob Storage<>" as BlobStorage
+queue "Azure Event Hubs<>" as EventHub
+
+== 음성 데이터 스트리밍 수신 (실시간 모드) ==
+
+Frontend -> Gateway: POST /api/transcripts/stream\n(audioData, recordingId, timestamp)
+activate Gateway
+
+Gateway -> Controller: 음성 스트림 요청
+activate Controller
+
+Controller -> Service: processAudioStream(audioData, recordingId)
+activate Service
+
+alt 실시간 변환 모드
+    Service -> Engine: streamingTranscribe(audioData)
+    activate Engine
+
+    Engine -> AzureClient: recognizeAsync(audioData)
+    activate AzureClient
+
+    AzureClient -> AzureClient: 실시간 음성 인식 수행
+    note right
+      Azure Speech 설정:
+      - Mode: Continuous
+      - 언어: ko-KR
+      - 화자 식별 활성화
+      - 타임스탬프 자동 기록
+      - 신뢰도 점수 계산
+      - Profanity filter
+    end note
+
+    AzureClient -> BlobStorage: 음성 파일 저장\n(chunk 단위 저장)
+    activate BlobStorage
+    BlobStorage --> AzureClient: 저장 완료
+    deactivate BlobStorage
+
+    AzureClient --> Engine: RecognitionResult\n(text, speakerId, confidence, timestamp, duration)
+    deactivate AzureClient
+
+    == 정확도 검증 및 처리 ==
+
+    Engine -> Engine: validateConfidence(result)
+    note right
+      신뢰도 검증:
+      - Threshold: 0.7 (70%)
+      - confidence >= 80%: 정상 처리
+      - 60% <= confidence < 80%: 검토 권장
+      - confidence < 60%: 경고 플래그 설정
+    end note
+
+    Engine --> Service: transcription segment
+    deactivate Engine
+
+    == 변환 결과 저장 ==
+
+    Service -> TranscriptRepo: createTranscript(recordingId, segment)
+    activate TranscriptRepo
+
+    TranscriptRepo -> DB: 변환 결과 저장\n(텍스트ID, 녹음ID, 화자ID, 텍스트, 신뢰도, 타임스탬프, 경고플래그)
+    activate DB
+    DB --> TranscriptRepo: transcriptId 반환
+    deactivate DB
+
+    TranscriptRepo --> Service: TranscriptEntity 반환
+    deactivate TranscriptRepo
+
+    == 화자 정보 업데이트 ==
+
+    Service -> RecordingRepo: updateSpeakerInfo(recordingId, speakerId)
+    activate RecordingRepo
+
+    RecordingRepo -> DB: 화자 정보 저장/업데이트\n(녹음ID, 화자ID, 세그먼트수)
+    activate DB
+    DB --> RecordingRepo: 업데이트 완료
+    deactivate DB
+
+    RecordingRepo --> Service: 완료
+    deactivate RecordingRepo
+
+    == 이벤트 발행 ==
+
+    Service -> EventHub: TranscriptSegmentReady 이벤트 발행
+    activate EventHub
+    note right of EventHub
+      이벤트 데이터:
+      - transcriptId
+      - recordingId
+      - meetingId
+      - text
+      - speakerId
+      - timestamp
+      - confidence
+    end note
+    EventHub --> Service: 발행 완료
+    deactivate EventHub
+
+    Service --> Controller: TranscriptResponse\n(transcriptId, text, confidence, warningFlag)
+    deactivate Service
+
+    Controller --> Gateway: 200 OK\n(transcriptId, text, speakerId, timestamp, confidence)
+    deactivate Controller
+
+    Gateway --> Frontend: 실시간 자막 응답
+    deactivate Gateway
+
+else 배치 변환 모드
+    Gateway -> Controller: POST /api/v1/stt/transcribe\n{sessionId, audioFile}
+    activate Controller
+
+    Controller -> Service: transcribeAudio(sessionId, audioFile)
+    activate Service
+
+    Service -> RecordingRepo: findSessionById(sessionId)
+    activate RecordingRepo
+    RecordingRepo -> DB: STT 세션 조회\n(세션ID 기준)
+    DB --> RecordingRepo: session data
+    RecordingRepo --> Service: RecordingEntity
+    deactivate RecordingRepo
+
+    Service -> Engine: batchTranscribe(audioFile)
+    activate Engine
+
+    Engine -> AzureClient: batchTranscriptionAsync(audioUrl)
+    activate AzureClient
+    note right
+      배치 처리:
+      - 전체 파일 업로드
+      - 백그라운드 처리
+      - Callback URL 제공
+      - 화자별 그룹화
+      - 문장 경계 보정
+    end note
+
+    AzureClient --> Engine: transcription job ID
+    deactivate AzureClient
+
+    Engine --> Service: job submitted
+    deactivate Engine
+
+    Service -> RecordingRepo: updateSessionStatus(sessionId, "PROCESSING")
+    activate RecordingRepo
+    RecordingRepo -> DB: 세션 상태 업데이트\n(상태='처리중')
+    DB --> RecordingRepo: updated
+    RecordingRepo --> Service: updated
+    deactivate RecordingRepo
+
+    Service --> Controller: 202 Accepted\n{jobId, status}
+    deactivate Service
+
+    Controller --> Gateway: 202 Accepted
+    deactivate Controller
+
+    == 배치 처리 완료 (Callback) ==
+
+    AzureClient -> Controller: POST /api/v1/stt/callback\n{jobId, segments}
+    activate Controller
+
+    Controller -> Service: processBatchResult(jobId, segments)
+    activate Service
+
+    loop 각 세그먼트 처리
+        Service -> TranscriptRepo: createTranscript(recordingId, segment)
+        activate TranscriptRepo
+        TranscriptRepo -> DB: 변환 결과 저장
+        DB --> TranscriptRepo: saved
+        TranscriptRepo --> Service: saved
+        deactivate TranscriptRepo
+    end
+
+    == 전체 텍스트 통합 ==
+
+    Service -> TranscriptRepo: aggregateTranscription(sessionId)
+    activate TranscriptRepo
+    TranscriptRepo -> DB: 세그먼트 목록 조회\n(세션ID 기준, 타임스탬프 순 정렬)
+    DB --> TranscriptRepo: ordered segments
+    TranscriptRepo --> Service: segments
+    deactivate TranscriptRepo
+
+    Service -> Service: mergeSegments(segments)
+    note right
+      세그먼트 병합:
+      - 화자별 그룹화
+      - 시간 순서 정렬
+      - 문장 경계 보정
+    end note
+
+    Service -> RecordingRepo: saveTranscription(fullText)
+    activate RecordingRepo
+    RecordingRepo -> DB: 전체 텍스트 저장 및 상태 업데이트\n(전체텍스트, 상태='완료')
+    DB --> RecordingRepo: saved
+    RecordingRepo --> Service: updated session
+    deactivate RecordingRepo
+
+    Service -> EventHub: TranscriptionCompletedEvent 발행
+    note right
+      Event:
+      - sessionId
+      - meetingId
+      - fullText
+      - completedAt
+    end note
+
+    Service --> Controller: TranscriptionResponse\n{sessionId, text, segments}
+    deactivate Service
+
+    Controller --> Gateway: 200 OK\n{transcription, metadata}
+    deactivate Controller
+end
+
+note over Frontend, EventHub
+**실시간 모드 처리 시간:**
+- Azure STT 처리: 1-3초
+- DB 저장: ~100ms
+- Event 발행: ~50ms
+- 총 처리 시간: 1-4초
+
+**배치 모드 처리 시간:**
+- 파일 업로드: ~1-2초
+- Azure 배치 처리: 5-30초 (파일 크기에 따라)
+- DB 저장: ~500ms
+- 총 처리 시간: 7-33초
+
+**정확도 경고 기준:**
+- < 60%: 수동 수정 권장 (경고 플래그)
+- 60-80%: 검토 권장
+- >= 80%: 정상
+end note
+
+@enduml
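
Reviewer note: the three-tier confidence rule in the unified diagram's closing note (below 60%: manual correction recommended with a warning flag; 60-80%: review recommended; 80% and above: normal) could be sketched as below. This is only an illustration under stated assumptions: the function name `classify_confidence`, the `ConfidenceVerdict` type, and the level labels are hypothetical and do not appear in the patch, and confidence is assumed to arrive as a fraction between 0 and 1. The separate "Threshold: 0.7 (70%)" line in the `validateConfidence` note is not modeled, since the patch does not say how it relates to the three tiers.

```python
from typing import NamedTuple


class ConfidenceVerdict(NamedTuple):
    """Hypothetical result type; not defined anywhere in the patch."""
    level: str
    warning_flag: bool


# Tier boundaries taken from the diagram's closing note (60% and 80%).
REVIEW_THRESHOLD = 0.60
NORMAL_THRESHOLD = 0.80


def classify_confidence(confidence: float) -> ConfidenceVerdict:
    """Map a recognition confidence (assumed 0.0-1.0) to one of three tiers."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be within [0, 1], got {confidence}")
    if confidence >= NORMAL_THRESHOLD:
        # >= 80%: processed normally
        return ConfidenceVerdict("normal", False)
    if confidence >= REVIEW_THRESHOLD:
        # 60% <= confidence < 80%: review recommended, no warning flag
        return ConfidenceVerdict("review_recommended", False)
    # < 60%: manual correction recommended, warning flag set
    return ConfidenceVerdict("manual_correction_recommended", True)
```

For example, `classify_confidence(0.72)` falls in the middle tier and returns a verdict without the warning flag, while `classify_confidence(0.41)` returns one with the flag set.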