@startuml
!theme mono

title Voice Recording and Speaker Identification Internal Sequence (UFR-STT-010)

participant "API Gateway" as Gateway
participant "SttController" as Controller
participant "SttService" as Service
participant "AudioStreamManager" as StreamManager
participant "SpeakerIdentifier" as Speaker
participant "Azure Speech" as Speech
participant "SttRepository" as Repository
database "PostgreSQL" as DB
queue "Event Hub" as EventHub

Gateway -> Controller: POST /api/v1/stt/start-recording\n{meetingId, userId}
activate Controller

Controller -> Service: startRecording(meetingId, userId)
activate Service

Service -> Repository: findMeetingById(meetingId)
activate Repository
Repository -> DB: Fetch meeting info\n(by meeting ID)
DB --> Repository: meeting data
Repository --> Service: Meeting entity
deactivate Repository

Service -> StreamManager: initializeStream(meetingId)
activate StreamManager
StreamManager -> Speech: createRecognizer()\n(Azure Speech API)
note right
Azure Speech settings:
- Language: ko-KR
- Format: PCM 16kHz
- Continuous recognition
end note
Speech --> StreamManager: recognizer instance
StreamManager --> Service: stream session
deactivate StreamManager

Service -> Speaker: identifySpeaker(audioFrame)
activate Speaker
Speaker -> Speech: analyzeSpeakerProfile()\n(Speaker Recognition API)
note right
Speaker identification:
- Generate voice signature
- Match against existing profiles
- Auto-register new speakers
end note
Speech --> Speaker: speakerId
Speaker --> Service: speaker info
deactivate Speaker

Service -> Repository: saveSttSession(session)
activate Repository
Repository -> DB: Save STT session\n(meeting ID, status, start time)
DB --> Repository: session saved
Repository --> Service: SttSession entity
deactivate Repository

Service -> EventHub: publish(SttStartedEvent)
note right
Event:
- meetingId
- sessionId
- startedAt
end note

Service --> Controller: RecordingStartResponse\n{sessionId, status}
deactivate Service
Controller --> Gateway: 200 OK\n{sessionId, streamUrl}
deactivate Controller

== Audio Stream Processing ==

Gateway -> Controller: WebSocket /ws/stt/{sessionId}\n[audio stream]
activate Controller
Controller -> Service: processAudioStream(sessionId, audioData)
activate Service

Service -> StreamManager: streamAudio(audioData)
activate StreamManager
StreamManager -> Speech: recognizeAsync(audioData)
Speech --> StreamManager: partial result
note right
Real-time recognition:
- Partial text
- Confidence score
- Timestamp
end note
StreamManager --> Service: recognized text
deactivate StreamManager

Service -> Speaker: updateSpeakerMapping(text, timestamp)
activate Speaker
Speaker --> Service: speaker segment
deactivate Speaker

Service -> Repository: saveSttSegment(segment)
activate Repository
Repository -> DB: Save STT segment\n(session ID, text, speaker ID, timestamp)
DB --> Repository: segment saved
Repository --> Service: saved
deactivate Repository

Service --> Controller: streaming response
deactivate Service
Controller --> Gateway: WebSocket message\n{text, speaker, timestamp}
deactivate Controller

@enduml
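' The note above fixes the audio format at PCM 16 kHz, 16-bit. As a companion
' sketch (PlantUML ignores text after @enduml), here is a minimal, hypothetical
' Python illustration of how a streaming client might slice that PCM into
' fixed-duration frames before sending them over the WebSocket. The constants
' follow the diagram's note; the helper names are illustrative, not part of
' the actual SttService.

```python
# Sketch: slicing 16-bit mono PCM at 16 kHz into fixed-duration frames,
# as a streaming client might do before each WebSocket send.
# Helper names are illustrative assumptions, not the service's real API.

SAMPLE_RATE_HZ = 16_000   # matches the "PCM 16kHz" note in the diagram
BYTES_PER_SAMPLE = 2      # 16-bit samples
FRAME_MS = 100            # send 100 ms of audio per message

def frame_size_bytes(frame_ms: int = FRAME_MS) -> int:
    """Bytes in one frame of 16-bit mono PCM at 16 kHz."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * frame_ms // 1000

def split_frames(pcm: bytes, frame_ms: int = FRAME_MS) -> list[bytes]:
    """Split a PCM buffer into frame-sized chunks (last chunk may be short)."""
    size = frame_size_bytes(frame_ms)
    return [pcm[i:i + size] for i in range(0, len(pcm), size)]

# One second of silence -> ten 100 ms frames of 3200 bytes each.
one_second = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
frames = split_frames(one_second)
```

' Fixed-size frames keep recognizer latency predictable: smaller frames lower
' partial-result latency at the cost of more WebSocket messages per second.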