@startuml
!theme mono

title Voice Recording and Speaker Identification Internal Sequence (UFR-STT-010)

participant "API Gateway" as Gateway
participant "SttController" as Controller
participant "SttService" as Service
participant "AudioStreamManager" as StreamManager
participant "SpeakerIdentifier" as Speaker
participant "Azure Speech" as Speech
participant "SttRepository" as Repository
database "PostgreSQL" as DB
queue "Event Hub" as EventHub

Gateway -> Controller: POST /api/v1/stt/start-recording\n{meetingId, userId}
activate Controller

Controller -> Service: startRecording(meetingId, userId)
activate Service

Service -> Repository: findMeetingById(meetingId)
activate Repository
Repository -> DB: SELECT * FROM meetings\nWHERE meeting_id = ?
DB --> Repository: meeting data
Repository --> Service: Meeting entity
deactivate Repository

Service -> StreamManager: initializeStream(meetingId)
activate StreamManager
StreamManager -> Speech: createRecognizer()\n(Azure Speech API)
note right
  Azure Speech settings:
  - Language: ko-KR
  - Format: PCM 16kHz
  - Continuous recognition
end note
Speech --> StreamManager: recognizer instance
StreamManager --> Service: stream session
deactivate StreamManager

Service -> Speaker: identifySpeaker(audioFrame)
activate Speaker
Speaker -> Speech: analyzeSpeakerProfile()\n(Speaker Recognition API)
note right
  Speaker identification:
  - Generate voice signature
  - Match against existing profiles
  - Auto-register new speakers
end note
Speech --> Speaker: speakerId
Speaker --> Service: speaker info
deactivate Speaker

Service -> Repository: saveSttSession(session)
activate Repository
Repository -> DB: INSERT INTO stt_sessions\n(meeting_id, status, started_at)
DB --> Repository: session saved
Repository --> Service: SttSession entity
deactivate Repository

Service -> EventHub: publish(SttStartedEvent)
note right
  Event:
  - meetingId
  - sessionId
  - startedAt
end note

Service --> Controller: RecordingStartResponse\n{sessionId, status}
deactivate Service

Controller --> Gateway: 200 OK\n{sessionId, streamUrl}
deactivate Controller

== Audio Streaming Processing ==

Gateway -> Controller: WebSocket /ws/stt/{sessionId}\n[audio stream]
activate Controller

Controller -> Service: processAudioStream(sessionId, audioData)
activate Service

Service -> StreamManager: streamAudio(audioData)
activate StreamManager
StreamManager -> Speech: recognizeAsync(audioData)
Speech --> StreamManager: partial result
note right
  Real-time recognition:
  - Partial text
  - Confidence score
  - Timestamp
end note
StreamManager --> Service: recognized text
deactivate StreamManager

Service -> Speaker: updateSpeakerMapping(text, timestamp)
activate Speaker
Speaker --> Service: speaker segment
deactivate Speaker

Service -> Repository: saveSttSegment(segment)
activate Repository
Repository -> DB: INSERT INTO stt_segments\n(session_id, text, speaker_id, timestamp)
DB --> Repository: segment saved
Repository --> Service: saved
deactivate Repository

Service --> Controller: streaming response
deactivate Service

Controller --> Gateway: WebSocket message\n{text, speaker, timestamp}
deactivate Controller

@enduml