Consolidate STT service internal sequences and remove duplicates

- Consolidate 4 duplicated STT sequences into 2
  - Unify the recording start and speaker recognition flow (stt-녹음시작및인식.puml)
  - Unify the text conversion flow, covering real-time and batch modes (stt-텍스트변환통합.puml)
- Delete 4 duplicate files (음성녹음시작, 음성텍스트변환, 음성녹음인식, 텍스트변환)
- Unify Azure Speech Service settings and confidence validation criteria

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-06-13 17:39:09 +00:00 · 2025-10-23 10:12:35 +09:00
parent 6e2baa2386
commit 09350783b1
6 changed files with 422 additions and 464 deletions
@@ -0,0 +1,244 @@
+@startuml
+!theme mono
+
+title STT Service - Speech-to-Text Conversion (Real-Time/Batch Unified)
+
+participant "Frontend<<E>>" as Frontend
+participant "API Gateway<<E>>" as Gateway
+participant "TranscriptController" as Controller
+participant "TranscriptService" as Service
+participant "TranscriptionEngine" as Engine
+participant "RecordingRepository" as RecordingRepo
+participant "TranscriptRepository" as TranscriptRepo
+participant "AzureSpeechClient" as AzureClient
+database "STT DB" as DB
+database "Azure Blob Storage<<E>>" as BlobStorage
+queue "Azure Event Hubs<<E>>" as EventHub
+
+== Receive Streaming Audio Data (Real-Time Mode) ==
+
+Frontend -> Gateway: POST /api/transcripts/stream\n(audioData, recordingId, timestamp)
+activate Gateway
+
+Gateway -> Controller: Audio stream request
+activate Controller
+
+Controller -> Service: processAudioStream(audioData, recordingId)
+activate Service
+
+alt Real-time transcription mode
+    Service -> Engine: streamingTranscribe(audioData)
+    activate Engine
+
+    Engine -> AzureClient: recognizeAsync(audioData)
+    activate AzureClient
+
+    AzureClient -> AzureClient: Perform real-time speech recognition
+    note right
+      Azure Speech settings:
+      - Mode: Continuous
+      - Language: ko-KR
+      - Speaker identification enabled
+      - Automatic timestamp recording
+      - Confidence score calculation
+      - Profanity filter
+    end note
+
+    AzureClient -> BlobStorage: Store audio file\n(stored per chunk)
+    activate BlobStorage
+    BlobStorage --> AzureClient: Stored
+    deactivate BlobStorage
+
+    AzureClient --> Engine: RecognitionResult\n(text, speakerId, confidence, timestamp, duration)
+    deactivate AzureClient
+
+    == Accuracy Validation and Processing ==
+
+    Engine -> Engine: validateConfidence(result)
+    note right
+      Confidence validation:
+      - Threshold: 0.7 (70%)
+      - confidence >= 80%: processed normally
+      - 60% <= confidence < 80%: review recommended
+      - confidence < 60%: warning flag set
+    end note
+
+    Engine --> Service: transcription segment
+    deactivate Engine
+
+    == Save Transcription Result ==
+
+    Service -> TranscriptRepo: createTranscript(recordingId, segment)
+    activate TranscriptRepo
+
+    TranscriptRepo -> DB: Save transcription result\n(transcript ID, recording ID, speaker ID, text, confidence, timestamp, warning flag)
+    activate DB
+    DB --> TranscriptRepo: Return transcriptId
+    deactivate DB
+
+    TranscriptRepo --> Service: Return TranscriptEntity
+    deactivate TranscriptRepo
+
+    == Update Speaker Info ==
+
+    Service -> RecordingRepo: updateSpeakerInfo(recordingId, speakerId)
+    activate RecordingRepo
+
+    RecordingRepo -> DB: Save/update speaker info\n(recording ID, speaker ID, segment count)
+    activate DB
+    DB --> RecordingRepo: Updated
+    deactivate DB
+
+    RecordingRepo --> Service: Done
+    deactivate RecordingRepo
+
+    == Publish Event ==
+
+    Service -> EventHub: Publish TranscriptSegmentReady event
+    activate EventHub
+    note right of EventHub
+      Event data:
+      - transcriptId
+      - recordingId
+      - meetingId
+      - text
+      - speakerId
+      - timestamp
+      - confidence
+    end note
+    EventHub --> Service: Published
+    deactivate EventHub
+
+    Service --> Controller: TranscriptResponse\n(transcriptId, text, confidence, warningFlag)
+    deactivate Service
+
+    Controller --> Gateway: 200 OK\n(transcriptId, text, speakerId, timestamp, confidence)
+    deactivate Controller
+
+    Gateway --> Frontend: Real-time caption response
+    deactivate Gateway
+
+else Batch transcription mode
+    Gateway -> Controller: POST /api/v1/stt/transcribe\n{sessionId, audioFile}
+    activate Controller
+
+    Controller -> Service: transcribeAudio(sessionId, audioFile)
+    activate Service
+
+    Service -> RecordingRepo: findSessionById(sessionId)
+    activate RecordingRepo
+    RecordingRepo -> DB: Look up STT session\n(by session ID)
+    DB --> RecordingRepo: session data
+    RecordingRepo --> Service: RecordingEntity
+    deactivate RecordingRepo
+
+    Service -> Engine: batchTranscribe(audioFile)
+    activate Engine
+
+    Engine -> AzureClient: batchTranscriptionAsync(audioUrl)
+    activate AzureClient
+    note right
+      Batch processing:
+      - Upload the entire file
+      - Background processing
+      - Callback URL provided
+      - Grouping by speaker
+      - Sentence boundary correction
+    end note
+
+    AzureClient --> Engine: transcription job ID
+    deactivate AzureClient
+
+    Engine --> Service: job submitted
+    deactivate Engine
+
+    Service -> RecordingRepo: updateSessionStatus(sessionId, "PROCESSING")
+    activate RecordingRepo
+    RecordingRepo -> DB: Update session status\n(status='processing')
+    DB --> RecordingRepo: updated
+    RecordingRepo --> Service: updated
+    deactivate RecordingRepo
+
+    Service --> Controller: 202 Accepted\n{jobId, status}
+    deactivate Service
+
+    Controller --> Gateway: 202 Accepted
+    deactivate Controller
+
+    == Batch Processing Complete (Callback) ==
+
+    AzureClient -> Controller: POST /api/v1/stt/callback\n{jobId, segments}
+    activate Controller
+
+    Controller -> Service: processBatchResult(jobId, segments)
+    activate Service
+
+    loop Process each segment
+        Service -> TranscriptRepo: createTranscript(recordingId, segment)
+        activate TranscriptRepo
+        TranscriptRepo -> DB: Save transcription result
+        DB --> TranscriptRepo: saved
+        TranscriptRepo --> Service: saved
+        deactivate TranscriptRepo
+    end
+
+    == Merge Full Text ==
+
+    Service -> TranscriptRepo: aggregateTranscription(sessionId)
+    activate TranscriptRepo
+    TranscriptRepo -> DB: Fetch segment list\n(by session ID, ordered by timestamp)
+    DB --> TranscriptRepo: ordered segments
+    TranscriptRepo --> Service: segments
+    deactivate TranscriptRepo
+
+    Service -> Service: mergeSegments(segments)
+    note right
+      Segment merging:
+      - Grouping by speaker
+      - Chronological ordering
+      - Sentence boundary correction
+    end note
+
+    Service -> RecordingRepo: saveTranscription(fullText)
+    activate RecordingRepo
+    RecordingRepo -> DB: Save full text and update status\n(full text, status='completed')
+    DB --> RecordingRepo: saved
+    RecordingRepo --> Service: updated session
+    deactivate RecordingRepo
+
+    Service -> EventHub: Publish TranscriptionCompletedEvent
+    note right
+      Event:
+      - sessionId
+      - meetingId
+      - fullText
+      - completedAt
+    end note
+
+    Service --> Controller: TranscriptionResponse\n{sessionId, text, segments}
+    deactivate Service
+
+    Controller --> Gateway: 200 OK\n{transcription, metadata}
+    deactivate Controller
+end
+
+note over Frontend, EventHub
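+**Real-time mode processing time:**
+- Azure STT processing: 1-3 s
+- DB save: ~100 ms
+- Event publish: ~50 ms
+- Total processing time: 1-4 s
+
+**Batch mode processing time:**
+- File upload: ~1-2 s
+- Azure batch processing: 5-30 s (depending on file size)
+- DB save: ~500 ms
+- Total processing time: 7-33 s
+
+**Accuracy warning criteria:**
+- < 60%: manual correction recommended (warning flag)
+- 60-80%: review recommended
+- >= 80%: normal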
+end note
+
+@enduml
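
The confidence bands in the diagram's notes (normal at 80% and above, review recommended from 60% up to 80%, warning flag below 60%) could be sketched in Python roughly as follows. This is a minimal illustration, not the service's actual `validateConfidence` implementation. The names `ConfidenceLevel`, `ValidatedSegment`, `classify_confidence`, and `validate_confidence` are hypothetical, and confidence is assumed to be a float in [0, 1]. The separate 0.7 threshold listed in the note is left out because the diagram does not say how it interacts with the bands.

```python
from dataclasses import dataclass
from enum import Enum


class ConfidenceLevel(Enum):
    """Hypothetical labels for the three bands named in the diagram."""
    NORMAL = "normal"
    REVIEW = "review_recommended"
    WARNING = "warning"


# Band boundaries taken from the diagram's notes (60% and 80%).
REVIEW_THRESHOLD = 0.6
NORMAL_THRESHOLD = 0.8


@dataclass(frozen=True)
class ValidatedSegment:
    """A transcription segment annotated with its confidence band."""
    text: str
    confidence: float
    level: ConfidenceLevel
    warning_flag: bool


def classify_confidence(confidence: float) -> ConfidenceLevel:
    """Map a confidence score in [0, 1] to one of the three bands."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence}")
    if confidence >= NORMAL_THRESHOLD:
        return ConfidenceLevel.NORMAL
    if confidence >= REVIEW_THRESHOLD:
        return ConfidenceLevel.REVIEW
    return ConfidenceLevel.WARNING


def validate_confidence(text: str, confidence: float) -> ValidatedSegment:
    """Attach the band and the warning flag to a recognized segment."""
    level = classify_confidence(confidence)
    # Only the lowest band sets the warning flag, as in the diagram.
    return ValidatedSegment(text, confidence, level, level is ConfidenceLevel.WARNING)
```

The boundaries are inclusive on the lower side, so a score of exactly 0.8 is treated as normal and exactly 0.6 as review recommended, matching the `>=` and `<` comparisons in the note.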