@startuml
!theme mono

title STT Service - Start Recording and Speaker Identification (Integrated)

participant "Frontend" as Frontend
participant "API Gateway" as Gateway
participant "RecordingController" as Controller
participant "RecordingService" as Service
participant "AudioStreamManager" as StreamManager
participant "SpeakerIdentifier" as Speaker
participant "RecordingRepository" as Repository
participant "AzureSpeechClient" as AzureClient
database "STT DB" as DB
database "Azure Blob Storage" as BlobStorage
queue "Azure Event Hubs" as EventHub

== Receive Meeting-Started Event and Prepare Recording ==

EventHub -> Controller: Receive MeetingStarted event\n(meetingId, sessionId)
activate Controller

Controller -> Service: prepareRecording(meetingId, sessionId)
activate Service

Service -> Service: Validate recording session
note right
  - Check for duplicate recordings
  - Validate meetingId
end note

Service -> Repository: createRecording(meetingId, sessionId)
activate Repository
Repository -> DB: Create recording session\n(recordingId, meetingId, sessionId, status, createdAt)
activate DB
DB --> Repository: Return recordingId
deactivate DB
Repository --> Service: Return RecordingEntity
deactivate Repository

== Initialize Azure Speech Service ==

Service -> AzureClient: initializeRecognizer(recordingId, sessionId)
activate AzureClient
AzureClient -> AzureClient: Configure speech recognizer
note right
  Azure Speech settings:
  - Language: ko-KR
  - Format: PCM 16 kHz
  - Sample rate: 16 kHz
  - Speaker identification enabled
  - Real-time streaming mode
  - Continuous recognition
end note

AzureClient -> BlobStorage: Create recording file path\n(path: recordings/{meetingId}/{sessionId}.wav)
activate BlobStorage
BlobStorage --> AzureClient: Return storage path URL
deactivate BlobStorage
AzureClient --> Service: Return RecognizerConfig
deactivate AzureClient

== Update Recording Status ==

Service -> Repository: updateRecordingStatus(recordingId, "RECORDING")
activate Repository
Repository -> DB: Update recording status\n(status='RECORDING', startedAt, storagePath)
activate DB
DB --> Repository: Update complete
deactivate DB
Repository --> Service: Update complete
deactivate Repository

Service --> Controller: RecordingResponse(recordingId, status, storagePath)
deactivate Service

Controller --> EventHub: Publish RecordingStarted event\n(recordingId, meetingId, status)
Controller --> Gateway: 200 OK\n{sessionId, streamUrl}
deactivate Controller

== Audio Streaming and Speaker Identification ==

Frontend -> Gateway: WebSocket /ws/stt/{sessionId}\n[audio stream]
activate Gateway
Gateway -> Controller: Receive audio data
activate Controller
Controller -> Service: processAudioStream(sessionId, audioData)
activate Service

Service -> StreamManager: streamAudio(audioData)
activate StreamManager
StreamManager -> AzureClient: recognizeAsync(audioData)
activate AzureClient
AzureClient --> StreamManager: partial result\n(text, timestamp)
deactivate AzureClient
StreamManager --> Service: recognized text
deactivate StreamManager

== Speaker Identification ==

Service -> Speaker: identifySpeaker(audioFrame)
activate Speaker
Speaker -> AzureClient: analyzeSpeakerProfile()\n(Speaker Recognition API)
activate AzureClient
note right
  Speaker identification:
  - Generate a voice signature
  - Match against existing profiles
  - Auto-register new speakers
end note
AzureClient --> Speaker: speakerId
deactivate AzureClient
Speaker --> Service: speaker info
deactivate Speaker

== Persist Speaker-Tagged Segments ==

Service -> Repository: saveSttSegment(segment)
activate Repository
Repository -> DB: Save STT segment\n(sessionId, text, speakerId, timestamp, confidence)
activate DB
DB --> Repository: segment saved
deactivate DB
Repository --> Service: saved
deactivate Repository

Service -> Repository: updateSpeakerInfo(recordingId, speakerId)
activate Repository
Repository -> DB: Save/update speaker info\n(recordingId, speakerId, segmentCount)
activate DB
DB --> Repository: Update complete
deactivate DB
Repository --> Service: done
deactivate Repository

Service --> Controller: streaming response\n{text, speaker, timestamp, confidence}
deactivate Service
Controller --> Gateway: WebSocket message
deactivate Controller
Gateway --> Frontend: Send live captions\n{text, speaker, timestamp}
deactivate Gateway

note over Frontend, EventHub
  Latency:
  - Create recording row in DB: ~100 ms
  - Initialize Azure recognizer: ~500 ms
  - Create Blob path: ~200 ms
  - Speaker identification: ~300 ms
  - Real-time recognition delay: < 1 s
  - Total initialization time: ~1.1 s

  Accuracy:
  - Speaker identification accuracy: > 90%
  - Speech recognition accuracy: 60-95%
end note

@enduml
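The "validate recording session" step in prepareRecording (duplicate-recording check plus meetingId validation) can be sketched as follows. This is a minimal in-memory illustration, not the actual service code; the names `RecordingService`, `DuplicateRecordingError`, and the returned recordingId format are hypothetical.

```python
# Hypothetical sketch of the "Validate recording session" step:
# reject a second concurrent recording for the same meetingId.
class DuplicateRecordingError(RuntimeError):
    pass


class RecordingService:
    def __init__(self) -> None:
        # meetingIds with an active recording session
        self._active_meetings: set[str] = set()

    def prepare_recording(self, meeting_id: str, session_id: str) -> str:
        if not meeting_id:
            raise ValueError("meetingId is required")
        if meeting_id in self._active_meetings:
            raise DuplicateRecordingError(
                f"meeting {meeting_id} is already being recorded"
            )
        self._active_meetings.add(meeting_id)
        # The real service would call RecordingRepository.createRecording here
        # and return the recordingId generated by the STT DB.
        return f"rec-{meeting_id}-{session_id}"


svc = RecordingService()
rec_id = svc.prepare_recording("m-1", "s-1")

duplicate_blocked = False
try:
    svc.prepare_recording("m-1", "s-2")  # same meeting, second attempt
except DuplicateRecordingError:
    duplicate_blocked = True
```

In the real flow this check runs before the DB insert, so a rejected duplicate never creates a recording row or an Azure recognizer.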
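The per-segment persistence step (saveSttSegment followed by updateSpeakerInfo) can likewise be sketched with an in-memory stand-in for RecordingRepository and the STT DB. The class and field names (`SttSegment`, `InMemoryRecordingRepository`) are assumptions for illustration; only the fields shown in the diagram (sessionId, text, speakerId, timestamp, confidence, and the per-speaker segment count) are taken from the source.

```python
from dataclasses import dataclass


# One recognized STT segment, with the fields from the diagram.
@dataclass
class SttSegment:
    session_id: str
    text: str
    speaker_id: str
    timestamp_ms: int
    confidence: float


# In-memory stand-in for RecordingRepository / STT DB (illustration only).
class InMemoryRecordingRepository:
    def __init__(self) -> None:
        self.segments: list[SttSegment] = []
        # recordingId -> {speakerId -> segment count}, as kept by updateSpeakerInfo
        self.speaker_counts: dict[str, dict[str, int]] = {}

    def save_stt_segment(self, segment: SttSegment) -> None:
        self.segments.append(segment)

    def update_speaker_info(self, recording_id: str, speaker_id: str) -> None:
        counts = self.speaker_counts.setdefault(recording_id, {})
        counts[speaker_id] = counts.get(speaker_id, 0) + 1


# Usage: one recognized segment arrives, tagged with an identified speaker.
repo = InMemoryRecordingRepository()
seg = SttSegment("sess-1", "recognized text", "speaker-1", 1200, 0.93)
repo.save_stt_segment(seg)
repo.update_speaker_info("rec-1", seg.speaker_id)
```

Keeping the per-speaker segment count alongside the raw segments matches the diagram's two separate repository calls: the segment table stores the transcript, while the speaker table is an aggregate updated on every segment.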