Consolidate STT service internal sequence diagrams and remove duplicates

- Consolidate 4 duplicated STT sequence diagrams into 2
- Merge the recording-start and speaker-identification flows (stt-녹음시작및인식.puml)
- Merge the text-transcription flows, covering both real-time and batch modes (stt-텍스트변환통합.puml)
- Delete the 4 duplicated files (음성녹음시작, 음성텍스트변환, 음성녹음인식, 텍스트변환)
- Unify the Azure Speech Service settings and confidence-validation criteria

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Minseo-Jo 2025-10-23 10:12:35 +09:00
parent 6e2baa2386
commit 09350783b1
6 changed files with 422 additions and 464 deletions


@@ -0,0 +1,178 @@
@startuml
!theme mono
title STT Service - Recording Start and Speaker Identification (Consolidated)
participant "Frontend<<E>>" as Frontend
participant "API Gateway<<E>>" as Gateway
participant "RecordingController" as Controller
participant "RecordingService" as Service
participant "AudioStreamManager" as StreamManager
participant "SpeakerIdentifier" as Speaker
participant "RecordingRepository" as Repository
participant "AzureSpeechClient" as AzureClient
database "STT DB" as DB
database "Azure Blob Storage<<E>>" as BlobStorage
queue "Azure Event Hubs<<E>>" as EventHub
== Receive Meeting-Start Event and Prepare Recording ==
EventHub -> Controller: MeetingStarted event received\n(meetingId, sessionId)
activate Controller
Controller -> Service: prepareRecording(meetingId, sessionId)
activate Service
Service -> Service: Validate recording session
note right
- Check for duplicate recordings
- Validate meetingId
end note
Service -> Repository: createRecording(meetingId, sessionId)
activate Repository
Repository -> DB: Create recording session\n(recordingId, meetingId, sessionId, status, createdAt)
activate DB
DB --> Repository: Return recordingId
deactivate DB
Repository --> Service: Return RecordingEntity
deactivate Repository
== Initialize Azure Speech Service ==
Service -> AzureClient: initializeRecognizer(recordingId, sessionId)
activate AzureClient
AzureClient -> AzureClient: Configure speech recognizer
note right
Azure Speech settings:
- Language: ko-KR
- Format: PCM 16kHz
- Sample rate: 16kHz
- Speaker identification enabled
- Real-time streaming mode
- Continuous recognition
end note
AzureClient -> BlobStorage: Create recording file path\n(path: recordings/{meetingId}/{sessionId}.wav)
activate BlobStorage
BlobStorage --> AzureClient: Return storage path URL
deactivate BlobStorage
AzureClient --> Service: Return RecognizerConfig
deactivate AzureClient
== Update Recording Status ==
Service -> Repository: updateRecordingStatus(recordingId, "RECORDING")
activate Repository
Repository -> DB: Update recording status\n(status='RECORDING', startedAt, storagePath)
activate DB
DB --> Repository: Update complete
deactivate DB
Repository --> Service: Update complete
deactivate Repository
Service --> Controller: RecordingResponse(recordingId, status, storagePath)
deactivate Service
Controller --> EventHub: Publish RecordingStarted event\n(recordingId, meetingId, status)
Controller --> Gateway: 200 OK\n{sessionId, streamUrl}
deactivate Controller
== Audio Streaming and Speaker Identification ==
Frontend -> Gateway: WebSocket /ws/stt/{sessionId}\n[audio stream]
activate Gateway
Gateway -> Controller: Receive audio data
activate Controller
Controller -> Service: processAudioStream(sessionId, audioData)
activate Service
Service -> StreamManager: streamAudio(audioData)
activate StreamManager
StreamManager -> AzureClient: recognizeAsync(audioData)
activate AzureClient
AzureClient --> StreamManager: partial result\n(text, timestamp)
deactivate AzureClient
StreamManager --> Service: recognized text
deactivate StreamManager
== Speaker Identification ==
Service -> Speaker: identifySpeaker(audioFrame)
activate Speaker
Speaker -> AzureClient: analyzeSpeakerProfile()\n(Speaker Recognition API)
activate AzureClient
note right
Speaker identification:
- Generate voice signature
- Match against existing profiles
- Auto-register new speakers
end note
AzureClient --> Speaker: speakerId
deactivate AzureClient
Speaker --> Service: speaker info
deactivate Speaker
== Save Speaker-Segmented Results ==
Service -> Repository: saveSttSegment(segment)
activate Repository
Repository -> DB: Save STT segment\n(sessionId, text, speakerId, timestamp, confidence)
activate DB
DB --> Repository: segment saved
deactivate DB
Repository --> Service: saved
deactivate Repository
Service -> Repository: updateSpeakerInfo(recordingId, speakerId)
activate Repository
Repository -> DB: Save/update speaker info\n(recordingId, speakerId, segmentCount)
activate DB
DB --> Repository: Update complete
deactivate DB
Repository --> Service: Done
deactivate Repository
Service --> Controller: streaming response\n{text, speaker, timestamp, confidence}
deactivate Service
Controller --> Gateway: WebSocket message
deactivate Controller
Gateway --> Frontend: Send real-time captions\n{text, speaker, timestamp}
deactivate Gateway
note over Frontend, EventHub
Processing time:
- Create recording in DB: ~100ms
- Initialize Azure recognizer: ~500ms
- Create Blob path: ~200ms
- Speaker identification: ~300ms
- Real-time recognition latency: < 1s
- Total initialization time: ~1.1s
Accuracy:
- Speaker identification accuracy: > 90%
- Speech recognition accuracy: 60-95%
end note
@enduml
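The recognizer settings in the note above (ko-KR, PCM 16 kHz, continuous recognition over a live stream) map closely onto the Azure Speech SDK. A minimal Python sketch, assuming the `azure-cognitiveservices-speech` package; the key/region values and handler bodies are placeholders, and speaker diarization would additionally use the SDK's ConversationTranscriber:

```python
# Minimal sketch: Azure Speech recognizer for Korean, 16 kHz PCM,
# continuous recognition over a pushed audio stream.
import azure.cognitiveservices.speech as speechsdk

def create_recognizer(speech_key: str, region: str):
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=region)
    speech_config.speech_recognition_language = "ko-KR"   # Language: ko-KR

    # PCM 16 kHz, 16-bit, mono -- matches the "Format: PCM 16kHz" setting.
    stream_format = speechsdk.audio.AudioStreamFormat(
        samples_per_second=16000, bits_per_sample=16, channels=1)
    push_stream = speechsdk.audio.PushAudioInputStream(stream_format=stream_format)
    audio_config = speechsdk.audio.AudioConfig(stream=push_stream)

    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config)

    # Partial (recognizing) and final (recognized) hypotheses arrive as events.
    recognizer.recognizing.connect(lambda evt: print("partial:", evt.result.text))
    recognizer.recognized.connect(lambda evt: print("final:", evt.result.text))

    recognizer.start_continuous_recognition()   # continuous recognition mode
    return recognizer, push_stream

# Audio chunks received over the WebSocket would then be fed with
# push_stream.write(chunk) and the stream closed with push_stream.close().
```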


@@ -1,87 +0,0 @@
@startuml
!theme mono
title STT Service - Recording Start Internal Sequence
participant "RecordingController" as Controller
participant "RecordingService" as Service
participant "RecordingRepository" as Repository
participant "AzureSpeechClient" as AzureClient
database "STT DB" as DB
database "Azure Blob Storage<<E>>" as BlobStorage
queue "Azure Event Hubs<<E>>" as EventHub
== Receive MeetingStarted Event ==
EventHub -> Controller: MeetingStarted event received\n(meetingId, sessionId)
activate Controller
Controller -> Service: prepareRecording(meetingId, sessionId)
activate Service
Service -> Service: Validate recording session
note right
- Check for duplicate recordings
- Validate meetingId
end note
Service -> Repository: createRecording(meetingId, sessionId)
activate Repository
Repository -> DB: Create recording session\n(recordingId, meetingId, sessionId, status, createdAt)
activate DB
DB --> Repository: Return recordingId
deactivate DB
Repository --> Service: Return RecordingEntity
deactivate Repository
== Initialize Azure Speech Service ==
Service -> AzureClient: initializeRecognizer(recordingId, sessionId)
activate AzureClient
AzureClient -> AzureClient: Configure speech recognizer
note right
- Language: ko-KR
- Sample rate: 16kHz
- Speaker identification enabled
- Real-time streaming mode
end note
AzureClient -> BlobStorage: Create recording file path\n(path: recordings/{meetingId}/{sessionId}.wav)
activate BlobStorage
BlobStorage --> AzureClient: Return storage path URL
deactivate BlobStorage
AzureClient --> Service: Return RecognizerConfig
deactivate AzureClient
== Update Recording Status ==
Service -> Repository: updateRecordingStatus(recordingId, "RECORDING")
activate Repository
Repository -> DB: Update recording status\n(status='RECORDING', startedAt, storagePath)
activate DB
DB --> Repository: Update complete
deactivate DB
Repository --> Service: Update complete
deactivate Repository
Service --> Controller: RecordingResponse(recordingId, status, storagePath)
deactivate Service
Controller --> EventHub: Publish RecordingStarted event\n(recordingId, meetingId, status)
deactivate Controller
note over Controller, EventHub
Processing time:
- Create recording in DB: ~100ms
- Initialize Azure recognizer: ~500ms
- Create Blob path: ~200ms
- Total processing time: ~800ms
end note
@enduml
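The MeetingStarted trigger at the top of this flow is a plain Event Hubs consumption loop. A minimal sketch with the `azure-eventhub` package; the connection string, hub name, event field names, and the `prepare_recording` helper are illustrative assumptions, not values taken from the diagram:

```python
# Minimal sketch: consuming MeetingStarted events from Azure Event Hubs and
# kicking off recording preparation. Checkpointing via a checkpoint store is
# omitted here for brevity.
import json
from azure.eventhub import EventHubConsumerClient

def prepare_recording(meeting_id: str, session_id: str) -> None:
    # Would call the RecordingService's prepareRecording in the actual service.
    print(f"prepare recording for meeting={meeting_id}, session={session_id}")

def on_event(partition_context, event):
    payload = json.loads(event.body_as_str())
    # "eventType", "meetingId", "sessionId" are assumed field names.
    if payload.get("eventType") == "MeetingStarted":
        prepare_recording(payload["meetingId"], payload["sessionId"])

client = EventHubConsumerClient.from_connection_string(
    conn_str="<EVENT_HUBS_CONNECTION_STRING>",
    consumer_group="$Default",
    eventhub_name="meeting-events")

with client:
    # Blocks and dispatches incoming events to on_event.
    client.receive(on_event=on_event, starting_position="-1")
```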


@@ -1,117 +0,0 @@
@startuml
!theme mono
title Voice Recording and Speaker Identification Internal Sequence (UFR-STT-010)
participant "API Gateway<<E>>" as Gateway
participant "SttController" as Controller
participant "SttService" as Service
participant "AudioStreamManager" as StreamManager
participant "SpeakerIdentifier" as Speaker
participant "Azure Speech<<E>>" as Speech
participant "SttRepository" as Repository
database "PostgreSQL<<E>>" as DB
queue "Event Hub<<E>>" as EventHub
Gateway -> Controller: POST /api/v1/stt/start-recording\n{meetingId, userId}
activate Controller
Controller -> Service: startRecording(meetingId, userId)
activate Service
Service -> Repository: findMeetingById(meetingId)
activate Repository
Repository -> DB: Query meeting info\n(by meetingId)
DB --> Repository: meeting data
Repository --> Service: Meeting entity
deactivate Repository
Service -> StreamManager: initializeStream(meetingId)
activate StreamManager
StreamManager -> Speech: createRecognizer()\n(Azure Speech API)
note right
Azure Speech settings:
- Language: ko-KR
- Format: PCM 16kHz
- Continuous recognition
end note
Speech --> StreamManager: recognizer instance
StreamManager --> Service: stream session
deactivate StreamManager
Service -> Speaker: identifySpeaker(audioFrame)
activate Speaker
Speaker -> Speech: analyzeSpeakerProfile()\n(Speaker Recognition API)
note right
Speaker identification:
- Generate voice signature
- Match against existing profiles
- Auto-register new speakers
end note
Speech --> Speaker: speakerId
Speaker --> Service: speaker info
deactivate Speaker
Service -> Repository: saveSttSession(session)
activate Repository
Repository -> DB: Save STT session\n(meetingId, status, startedAt)
DB --> Repository: session saved
Repository --> Service: SttSession entity
deactivate Repository
Service -> EventHub: publish(SttStartedEvent)
note right
Event:
- meetingId
- sessionId
- startedAt
end note
Service --> Controller: RecordingStartResponse\n{sessionId, status}
deactivate Service
Controller --> Gateway: 200 OK\n{sessionId, streamUrl}
deactivate Controller
== Audio Streaming ==
Gateway -> Controller: WebSocket /ws/stt/{sessionId}\n[audio stream]
activate Controller
Controller -> Service: processAudioStream(sessionId, audioData)
activate Service
Service -> StreamManager: streamAudio(audioData)
activate StreamManager
StreamManager -> Speech: recognizeAsync(audioData)
Speech --> StreamManager: partial result
note right
Real-time recognition:
- Partial text
- Confidence score
- Timestamp
end note
StreamManager --> Service: recognized text
deactivate StreamManager
Service -> Speaker: updateSpeakerMapping(text, timestamp)
activate Speaker
Speaker --> Service: speaker segment
deactivate Speaker
Service -> Repository: saveSttSegment(segment)
activate Repository
Repository -> DB: Save STT segment\n(sessionId, text, speakerId, timestamp)
DB --> Repository: segment saved
Repository --> Service: saved
deactivate Repository
Service --> Controller: streaming response
deactivate Service
Controller --> Gateway: WebSocket message\n{text, speaker, timestamp}
deactivate Controller
@enduml
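The streaming branch above returns one `{text, speaker, timestamp}` record per recognized utterance. A small sketch of the persisted segment shape and the WebSocket caption payload built from it; the class and helper names are illustrative:

```python
# Minimal sketch: the per-utterance segment the service persists and the
# caption payload it pushes over the WebSocket. Only the
# {text, speaker, timestamp, confidence} fields come from the diagrams.
from dataclasses import dataclass
import json

@dataclass
class SttSegment:
    session_id: str
    text: str
    speaker_id: str
    timestamp_ms: int      # offset from the start of the recording
    confidence: float      # 0.0 - 1.0 score reported by the recognizer

def to_ws_message(segment: SttSegment) -> str:
    """Build the real-time caption message sent back through the gateway."""
    return json.dumps({
        "text": segment.text,
        "speaker": segment.speaker_id,
        "timestamp": segment.timestamp_ms,
        "confidence": segment.confidence,
    })

# Example usage:
seg = SttSegment("sess-1", "안녕하세요", "speaker-01", 1520, 0.92)
print(to_ws_message(seg))
```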


@@ -1,115 +0,0 @@
@startuml
!theme mono
title STT Service - Speech-to-Text Conversion Internal Sequence
participant "Frontend<<E>>" as Frontend
participant "TranscriptController" as Controller
participant "TranscriptService" as Service
participant "RecordingRepository" as RecordingRepo
participant "TranscriptRepository" as TranscriptRepo
participant "AzureSpeechClient" as AzureClient
database "STT DB" as DB
database "Azure Blob Storage<<E>>" as BlobStorage
queue "Azure Event Hubs<<E>>" as EventHub
== Receive Streamed Audio Data (5-second batches) ==
Frontend -> Controller: POST /api/transcripts/stream\n(audioData, recordingId, timestamp)
activate Controller
Controller -> Service: processAudioStream(audioData, recordingId)
activate Service
== Speech Recognition ==
Service -> AzureClient: recognizeAudio(audioData)
activate AzureClient
AzureClient -> AzureClient: Perform speech recognition
note right
- Real-time STT processing
- Speaker identification (Speaker Diarization)
- Automatic timestamping
- Confidence score calculation
end note
AzureClient -> BlobStorage: Save audio file\n(stored in chunks)
activate BlobStorage
BlobStorage --> AzureClient: Save complete
deactivate BlobStorage
AzureClient --> Service: RecognitionResult\n(text, speakerId, confidence, timestamp)
deactivate AzureClient
== Accuracy Validation ==
Service -> Service: Validate confidence score
note right
confidence >= 60%: process normally
confidence < 60%: set warning flag
end note
== Save Transcription Result ==
Service -> TranscriptRepo: createTranscript(recordingId, text, metadata)
activate TranscriptRepo
TranscriptRepo -> DB: Save transcription result\n(transcriptId, recordingId, speakerId, text, confidence, timestamp, warningFlag)
activate DB
DB --> TranscriptRepo: Return transcriptId
deactivate DB
TranscriptRepo --> Service: Return TranscriptEntity
deactivate TranscriptRepo
== Update Speaker Info ==
Service -> RecordingRepo: updateSpeakerInfo(recordingId, speakerId)
activate RecordingRepo
RecordingRepo -> DB: Save/update speaker info\n(recordingId, speakerId, segmentCount)
activate DB
DB --> RecordingRepo: Update complete
deactivate DB
RecordingRepo --> Service: Done
deactivate RecordingRepo
== Publish Event ==
Service -> EventHub: Publish TranscriptReady event
activate EventHub
note right of EventHub
Event data:
- transcriptId
- recordingId
- meetingId
- text
- speakerId
- timestamp
- confidence
end note
EventHub --> Service: Published
deactivate EventHub
Service --> Controller: TranscriptResponse\n(transcriptId, text, confidence, warningFlag)
deactivate Service
Controller --> Frontend: 200 OK\n(transcriptId, text, speakerId, timestamp, confidence)
deactivate Controller
note over Frontend, EventHub
Processing time:
- Azure STT processing: 1-3s
- DB save: ~100ms
- Event publish: ~50ms
- Total processing time: 1-4s
Accuracy warnings:
- Below 60%: manual correction recommended
- 60-80%: review recommended
- 80% or above: normal
end note
@enduml
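The accuracy bands in the notes above (>= 80% normal, 60-80% review recommended, < 60% warning flag) reduce to a small validation step. A sketch with illustrative enum and function names:

```python
# Minimal sketch of the confidence check described in the notes:
# >= 80% normal, 60-80% review recommended, < 60% warning flag set
# (manual correction recommended).
from enum import Enum

class ConfidenceLevel(Enum):
    OK = "OK"              # >= 0.80: treat as normal
    REVIEW = "REVIEW"      # 0.60 - 0.79: review recommended
    WARNING = "WARNING"    # < 0.60: manual correction recommended

def validate_confidence(confidence: float) -> tuple[ConfidenceLevel, bool]:
    """Return the level plus the warning flag stored with the transcript row."""
    if confidence >= 0.80:
        return ConfidenceLevel.OK, False
    if confidence >= 0.60:
        return ConfidenceLevel.REVIEW, False
    return ConfidenceLevel.WARNING, True   # persisted as the warning flag

# Example: a 0.55 result is flagged for manual correction.
level, warning_flag = validate_confidence(0.55)
print(level.value, warning_flag)   # WARNING True
```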


@@ -1,145 +0,0 @@
@startuml
!theme mono
title Speech-to-Text Conversion Internal Sequence (UFR-STT-020)
participant "API Gateway<<E>>" as Gateway
participant "SttController" as Controller
participant "SttService" as Service
participant "TranscriptionEngine" as Engine
participant "Azure Speech<<E>>" as Speech
participant "SttRepository" as Repository
database "PostgreSQL<<E>>" as DB
queue "Event Hub<<E>>" as EventHub
Gateway -> Controller: POST /api/v1/stt/transcribe\n{sessionId, audioFile}
activate Controller
Controller -> Service: transcribeAudio(sessionId, audioFile)
activate Service
Service -> Repository: findSessionById(sessionId)
activate Repository
Repository -> DB: Query STT session\n(by sessionId)
DB --> Repository: session data
Repository --> Service: SttSession entity
deactivate Repository
alt Real-time transcription mode
Service -> Engine: streamingTranscribe(audioFile)
activate Engine
Engine -> Speech: createRecognizer()\nsetContinuousRecognition()
note right
Azure Speech settings:
- Mode: Continuous
- Language: ko-KR
- Enable diarization
- Profanity filter
end note
Speech --> Engine: recognizer instance
loop Process audio chunks
Engine -> Speech: recognizeOnceAsync(audioChunk)
Speech --> Engine: recognition result
note right
Result includes:
- Text
- Confidence
- Duration
- Speaker ID
end note
Engine -> Engine: validateConfidence(result)
note right
Confidence validation:
- Threshold: 0.7
- Low-confidence handling
end note
Engine --> Service: transcription segment
Service -> Repository: saveSttSegment(segment)
activate Repository
Repository -> DB: Save STT segment\n(sessionId, text, confidence, timestamp)
DB --> Repository: saved
Repository --> Service: segment saved
deactivate Repository
Service -> EventHub: publish(TranscriptionSegmentEvent)
note right
Event:
- sessionId
- segmentId
- text
- timestamp
end note
end
Engine --> Service: streaming complete
deactivate Engine
else Batch transcription mode
Service -> Engine: batchTranscribe(audioFile)
activate Engine
Engine -> Speech: batchTranscriptionAsync(audioUrl)
note right
Batch processing:
- Upload the whole file
- Background processing
- Callback URL provided
end note
Speech --> Engine: transcription job ID
Engine --> Service: job submitted
deactivate Engine
Service -> Repository: updateSessionStatus(sessionId, "PROCESSING")
activate Repository
Repository -> DB: Update session status\n(status='PROCESSING')
DB --> Repository: updated
Repository --> Service: updated
deactivate Repository
end
Service -> Repository: aggregateTranscription(sessionId)
activate Repository
Repository -> DB: Query segment list\n(by sessionId, ordered by timestamp)
DB --> Repository: segments
Repository --> Service: ordered segments
deactivate Repository
Service -> Service: mergeSegments(segments)
note right
Segment merging:
- Group by speaker
- Sort in time order
- Correct sentence boundaries
end note
Service -> Repository: saveTranscription(fullText)
activate Repository
Repository -> DB: Save full text and update status\n(fullText, status='COMPLETED')
DB --> Repository: saved
Repository --> Service: updated session
deactivate Repository
Service -> EventHub: publish(TranscriptionCompletedEvent)
note right
Event:
- sessionId
- meetingId
- fullText
- completedAt
end note
Service --> Controller: TranscriptionResponse\n{sessionId, text, segments}
deactivate Service
Controller --> Gateway: 200 OK\n{transcription, metadata}
deactivate Controller
@enduml
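mergeSegments() above groups segments by speaker and orders them by time before the full text is saved. A minimal sketch of that merge, assuming a simple dict-based segment shape; sentence-boundary correction is left as a placeholder:

```python
# Minimal sketch of mergeSegments(): order segments by timestamp, then fold
# consecutive segments from the same speaker into one line of the transcript.
from typing import Dict, List

def merge_segments(segments: List[Dict]) -> str:
    ordered = sorted(segments, key=lambda s: s["timestamp"])   # time order
    merged: List[Dict] = []
    for seg in ordered:
        if merged and merged[-1]["speaker"] == seg["speaker"]:
            merged[-1]["text"] += " " + seg["text"]            # same speaker: append
        else:
            merged.append({"speaker": seg["speaker"], "text": seg["text"]})
    # Sentence-boundary correction would go here before rendering.
    return "\n".join(f'{m["speaker"]}: {m["text"]}' for m in merged)

print(merge_segments([
    {"speaker": "S1", "timestamp": 0,    "text": "안녕하세요."},
    {"speaker": "S1", "timestamp": 1200, "text": "회의를 시작하겠습니다."},
    {"speaker": "S2", "timestamp": 3500, "text": "네, 좋습니다."},
]))
```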


@@ -0,0 +1,244 @@
@startuml
!theme mono
title STT Service - Speech-to-Text Conversion (Real-time/Batch Consolidated)
participant "Frontend<<E>>" as Frontend
participant "API Gateway<<E>>" as Gateway
participant "TranscriptController" as Controller
participant "TranscriptService" as Service
participant "TranscriptionEngine" as Engine
participant "RecordingRepository" as RecordingRepo
participant "TranscriptRepository" as TranscriptRepo
participant "AzureSpeechClient" as AzureClient
database "STT DB" as DB
database "Azure Blob Storage<<E>>" as BlobStorage
queue "Azure Event Hubs<<E>>" as EventHub
== Receive Streamed Audio Data (Real-time Mode) ==
Frontend -> Gateway: POST /api/transcripts/stream\n(audioData, recordingId, timestamp)
activate Gateway
Gateway -> Controller: Audio stream request
activate Controller
Controller -> Service: processAudioStream(audioData, recordingId)
activate Service
alt Real-time transcription mode
Service -> Engine: streamingTranscribe(audioData)
activate Engine
Engine -> AzureClient: recognizeAsync(audioData)
activate AzureClient
AzureClient -> AzureClient: Perform real-time speech recognition
note right
Azure Speech settings:
- Mode: Continuous
- Language: ko-KR
- Speaker identification enabled
- Automatic timestamping
- Confidence score calculation
- Profanity filter
end note
AzureClient -> BlobStorage: Save audio file\n(stored in chunks)
activate BlobStorage
BlobStorage --> AzureClient: Save complete
deactivate BlobStorage
AzureClient --> Engine: RecognitionResult\n(text, speakerId, confidence, timestamp, duration)
deactivate AzureClient
== Accuracy Validation ==
Engine -> Engine: validateConfidence(result)
note right
Confidence validation:
- Threshold: 0.7 (70%)
- confidence >= 80%: process normally
- 60% <= confidence < 80%: review recommended
- confidence < 60%: set warning flag
end note
Engine --> Service: transcription segment
deactivate Engine
== Save Transcription Result ==
Service -> TranscriptRepo: createTranscript(recordingId, segment)
activate TranscriptRepo
TranscriptRepo -> DB: Save transcription result\n(transcriptId, recordingId, speakerId, text, confidence, timestamp, warningFlag)
activate DB
DB --> TranscriptRepo: Return transcriptId
deactivate DB
TranscriptRepo --> Service: Return TranscriptEntity
deactivate TranscriptRepo
== Update Speaker Info ==
Service -> RecordingRepo: updateSpeakerInfo(recordingId, speakerId)
activate RecordingRepo
RecordingRepo -> DB: Save/update speaker info\n(recordingId, speakerId, segmentCount)
activate DB
DB --> RecordingRepo: Update complete
deactivate DB
RecordingRepo --> Service: Done
deactivate RecordingRepo
== Publish Event ==
Service -> EventHub: Publish TranscriptSegmentReady event
activate EventHub
note right of EventHub
Event data:
- transcriptId
- recordingId
- meetingId
- text
- speakerId
- timestamp
- confidence
end note
EventHub --> Service: Published
deactivate EventHub
Service --> Controller: TranscriptResponse\n(transcriptId, text, confidence, warningFlag)
deactivate Service
Controller --> Gateway: 200 OK\n(transcriptId, text, speakerId, timestamp, confidence)
deactivate Controller
Gateway --> Frontend: Real-time caption response
deactivate Gateway
else Batch transcription mode
Gateway -> Controller: POST /api/v1/stt/transcribe\n{sessionId, audioFile}
activate Controller
Controller -> Service: transcribeAudio(sessionId, audioFile)
activate Service
Service -> RecordingRepo: findSessionById(sessionId)
activate RecordingRepo
RecordingRepo -> DB: Query STT session\n(by sessionId)
DB --> RecordingRepo: session data
RecordingRepo --> Service: RecordingEntity
deactivate RecordingRepo
Service -> Engine: batchTranscribe(audioFile)
activate Engine
Engine -> AzureClient: batchTranscriptionAsync(audioUrl)
activate AzureClient
note right
Batch processing:
- Upload the whole file
- Background processing
- Callback URL provided
- Group by speaker
- Correct sentence boundaries
end note
AzureClient --> Engine: transcription job ID
deactivate AzureClient
Engine --> Service: job submitted
deactivate Engine
Service -> RecordingRepo: updateSessionStatus(sessionId, "PROCESSING")
activate RecordingRepo
RecordingRepo -> DB: Update session status\n(status='PROCESSING')
DB --> RecordingRepo: updated
RecordingRepo --> Service: updated
deactivate RecordingRepo
Service --> Controller: 202 Accepted\n{jobId, status}
deactivate Service
Controller --> Gateway: 202 Accepted
deactivate Controller
== Batch Processing Complete (Callback) ==
AzureClient -> Controller: POST /api/v1/stt/callback\n{jobId, segments}
activate Controller
Controller -> Service: processBatchResult(jobId, segments)
activate Service
loop For each segment
Service -> TranscriptRepo: createTranscript(recordingId, segment)
activate TranscriptRepo
TranscriptRepo -> DB: Save transcription result
DB --> TranscriptRepo: saved
TranscriptRepo --> Service: saved
deactivate TranscriptRepo
end
== Aggregate Full Text ==
Service -> TranscriptRepo: aggregateTranscription(sessionId)
activate TranscriptRepo
TranscriptRepo -> DB: Query segment list\n(by sessionId, ordered by timestamp)
DB --> TranscriptRepo: ordered segments
TranscriptRepo --> Service: segments
deactivate TranscriptRepo
Service -> Service: mergeSegments(segments)
note right
Segment merging:
- Group by speaker
- Sort in time order
- Correct sentence boundaries
end note
Service -> RecordingRepo: saveTranscription(fullText)
activate RecordingRepo
RecordingRepo -> DB: Save full text and update status\n(fullText, status='COMPLETED')
DB --> RecordingRepo: saved
RecordingRepo --> Service: updated session
deactivate RecordingRepo
Service -> EventHub: Publish TranscriptionCompletedEvent
note right
Event:
- sessionId
- meetingId
- fullText
- completedAt
end note
Service --> Controller: TranscriptionResponse\n{sessionId, text, segments}
deactivate Service
Controller --> Gateway: 200 OK\n{transcription, metadata}
deactivate Controller
end
note over Frontend, EventHub
**Real-time mode processing time:**
- Azure STT processing: 1-3s
- DB save: ~100ms
- Event publish: ~50ms
- Total processing time: 1-4s
**Batch mode processing time:**
- File upload: ~1-2s
- Azure batch processing: 5-30s (depending on file size)
- DB save: ~500ms
- Total processing time: 7-33s
**Accuracy warning criteria:**
- < 60%: manual correction recommended (warning flag)
- 60-80%: review recommended
- >= 80%: normal
end note
@enduml
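The batchTranscriptionAsync() call in the batch branch corresponds to submitting a job to the Speech-to-text batch transcription REST API. A hedged sketch against the v3.1 endpoint; the endpoint path and property names should be verified against the API version actually deployed, and the display name is a placeholder:

```python
# Minimal sketch: submitting a batch transcription job for an uploaded
# recording, roughly what batchTranscriptionAsync() stands for in the diagram.
import requests

def submit_batch_transcription(region: str, speech_key: str, audio_blob_url: str) -> str:
    endpoint = f"https://{region}.api.cognitive.microsoft.com/speechtotext/v3.1/transcriptions"
    body = {
        "contentUrls": [audio_blob_url],       # SAS URL of the .wav in Blob Storage
        "locale": "ko-KR",
        "displayName": "stt-batch-transcription",
        "properties": {
            "diarizationEnabled": True,        # speaker separation, as in the diagram
            "profanityFilterMode": "Masked",
        },
    }
    resp = requests.post(
        endpoint,
        json=body,
        headers={"Ocp-Apim-Subscription-Key": speech_key},
    )
    resp.raise_for_status()
    # The job's "self" URL serves as the job reference used for later status
    # polling or for correlating the completion callback.
    return resp.json()["self"]
```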
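Publishing TranscriptSegmentReady to Azure Event Hubs is a short producer call. A minimal sketch with the `azure-eventhub` package; the hub name, connection string, and sample payload values are illustrative, while the payload fields mirror the event-data note in the diagram:

```python
# Minimal sketch: publishing the TranscriptSegmentReady event after a
# segment has been stored.
import json
from azure.eventhub import EventData, EventHubProducerClient

def publish_transcript_segment_ready(payload: dict) -> None:
    producer = EventHubProducerClient.from_connection_string(
        conn_str="<EVENT_HUBS_CONNECTION_STRING>",
        eventhub_name="stt-events")
    with producer:
        batch = producer.create_batch()
        batch.add(EventData(json.dumps(payload)))
        producer.send_batch(batch)

publish_transcript_segment_ready({
    "eventType": "TranscriptSegmentReady",
    "transcriptId": "tr-001",
    "recordingId": "rec-001",
    "meetingId": "mtg-001",
    "text": "안녕하세요.",
    "speakerId": "speaker-01",
    "timestamp": 1520,
    "confidence": 0.92,
})
```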