hgzero/design/backend/sequence/inner/stt-녹음시작및인식.puml

@startuml
!theme mono

title STT Service - Start Audio Recording and Real-Time Recognition

participant "Frontend<<E>>" as Frontend
participant "API Gateway<<E>>" as Gateway
participant "RecordingController" as Controller
participant "RecordingService" as Service
participant "AudioStreamManager" as StreamManager
participant "RecordingRepository" as Repository
participant "AzureSpeechClient" as AzureClient
database "STT DB" as DB
database "Azure Blob Storage<<E>>" as BlobStorage
queue "Azure Event Hubs<<E>>" as EventHub

== Receive Meeting Start Event and Prepare Recording ==

EventHub -> Controller: Receive MeetingStarted event\n(meetingId, sessionId)
activate Controller

Controller -> Service: prepareRecording(meetingId, sessionId)
activate Service

Service -> Service: Validate recording session
note right
  - Check for duplicate recording
  - Validate meetingId
end note
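
' Illustrative sketch only, not part of the source design: one possible way to
' show the failure path of the duplicate-recording check in the note above.
' The error name DuplicateRecordingError is hypothetical.
opt active recording already exists for meetingId (assumed path)
  Service --> Controller: reject with hypothetical DuplicateRecordingError
end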

Service -> Repository: createRecording(meetingId, sessionId)
activate Repository

Repository -> DB: Create recording session\n(recording ID, meeting ID, session ID, status, created at)
activate DB
DB --> Repository: Return recordingId
deactivate DB

Repository --> Service: Return RecordingEntity
deactivate Repository

== Initialize Azure Speech Service ==

Service -> AzureClient: initializeRecognizer(recordingId, meetingId, sessionId)
activate AzureClient

AzureClient -> AzureClient: Configure speech recognizer
note right
  Azure Speech settings:
  - Language: ko-KR
  - Audio format: PCM
  - Sample rate: 16 kHz
  - Real-time streaming mode
  - Continuous recognition
end note

AzureClient -> BlobStorage: Create storage path for the recording file\n(path: recordings/{meetingId}/{sessionId}.wav)
activate BlobStorage
BlobStorage --> AzureClient: Return storage path URL
deactivate BlobStorage

AzureClient --> Service: Return RecognizerConfig
deactivate AzureClient

== Update Recording Status ==

Service -> Repository: updateRecordingStatus(recordingId, "RECORDING")
activate Repository

Repository -> DB: Update recording status\n(status='RECORDING', started at, storage path)
activate DB
DB --> Repository: Update complete
deactivate DB

Repository --> Service: Update complete
deactivate Repository

Service --> Controller: RecordingResponse(recordingId, status, storagePath)
deactivate Service

Controller -> EventHub: Publish RecordingStarted event\n(recordingId, meetingId, status)

Controller --> Gateway: 200 OK\n{sessionId, streamUrl}
deactivate Controller

== Audio Streaming and Speaker Identification ==

Frontend -> Gateway: WebSocket /ws/stt/{sessionId}\n[audio stream]
activate Gateway

Gateway -> Controller: Forward audio data
activate Controller

Controller -> Service: processAudioStream(sessionId, audioData)
activate Service

Service -> StreamManager: streamAudio(audioData)
activate StreamManager

StreamManager -> AzureClient: recognizeAsync(audioData)
activate AzureClient

AzureClient --> StreamManager: partial result\n(text, timestamp, confidence)
deactivate AzureClient

StreamManager --> Service: recognized text
deactivate StreamManager

== Save Segment ==

Service -> Repository: saveSttSegment(segment)
activate Repository

Repository -> DB: Save STT segment\n(session ID, text, timestamp, confidence)
activate DB
DB --> Repository: segment saved
deactivate DB

Repository --> Service: saved
deactivate Repository
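
' Illustrative sketch only: an example of the segment passed to saveSttSegment.
' The fields follow the DB call above (session ID, text, timestamp, confidence);
' the JSON key names and all values are hypothetical.
note over Service, Repository
  Example segment (hypothetical values):
  {"sessionId": "s-001", "text": "안녕하세요", "timestamp": 12.4, "confidence": 0.92}
end note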

Service --> Controller: streaming response\n{text, timestamp, confidence}
deactivate Service

Controller --> Gateway: WebSocket message
deactivate Controller

Gateway --> Frontend: Send real-time captions\n{text, timestamp}
deactivate Gateway

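note over Frontend, EventHub
Processing time:
- Create recording in DB: ~100 ms
- Initialize Azure recognizer: ~500 ms
- Create Blob path: ~200 ms
- Total initialization time: ~0.8 s (sum of the three steps above)
- Real-time recognition latency: < 1 s

Accuracy:
- Speech recognition accuracy: 60-95%
end note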

@enduml