Remove speaker identification and simplify the STT service

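After the prototype review, speaker identification was dropped from the current requirements, so the related code and design documents have been removed and brought up to date.

Changes:

1. Backend code cleanup
   - Deleted the Speaker-related controllers, services, and repositories
   - Deleted the Speaker domain, DTO, and event classes
   - Removed speaker-related logic from the Recording and Transcription services

2. API spec update (stt-service-api.yaml)
   - Removed the speaker identification/management endpoints (/speakers/*)
   - Removed the speakerId and speakerName fields from response schemas
   - Removed all speaker-related schemas (Speaker*)
   - Removed speaker identification content from the API description

3. Design document update
   - STT recording sequence: removed the speaker identification step
   - STT transcription sequence: removed the speaker info update logic and batch mode
   - Simplified to a real-time-only feature

Impact:
- Per-speaker utterance separation is removed
- The service focuses solely on real-time speech-to-text
- Lower system complexity and better performance (initialization time: 1.1 s → 0.8 s)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>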
2026-06-13 17:39:09 +00:00 · 2025-10-24 14:46:39 +09:00
parent e37d20942a
commit 694a84e4f5
29 changed files with 1115 additions and 1872 deletions
@@ -1,14 +1,13 @@
@startuml
 !theme mono

-title STT Service - Start Recording and Speaker Recognition (Integrated)
+title STT Service - Start Recording and Real-Time Recognition

 participant "Frontend<<E>>" as Frontend
 participant "API Gateway<<E>>" as Gateway
 participant "RecordingController" as Controller
 participant "RecordingService" as Service
 participant "AudioStreamManager" as StreamManager
-participant "SpeakerIdentifier" as Speaker
 participant "RecordingRepository" as Repository
 participant "AzureSpeechClient" as AzureClient
 database "STT DB" as DB
@@ -51,7 +50,6 @@ note right
  - Language: ko-KR
  - Format: PCM 16kHz
  - Sample rate: 16kHz
-  - Speaker identification enabled
  - Real-time streaming mode
  - Continuous recognition
 end note
@@ -108,32 +106,12 @@ deactivate AzureClient
 StreamManager --> Service: recognized text
 deactivate StreamManager

-== Speaker identification ==
-
-Service -> Speaker: identifySpeaker(audioFrame)
-activate Speaker
-
-Speaker -> AzureClient: analyzeSpeakerProfile()\n(Speaker Recognition API)
-activate AzureClient
-note right
-  Speaker identification:
-  - Generate voice signature
-  - Match against existing profiles
-  - Auto-register new speakers
-end note
-
-AzureClient --> Speaker: speakerId
-deactivate AzureClient
-
-Speaker --> Service: speaker info
-deactivate Speaker
-
-== Save segments per speaker ==
+== Save segments ==

 Service -> Repository: saveSttSegment(segment)
 activate Repository

-Repository -> DB: Save STT segment\n(session ID, text, speaker ID, timestamp, confidence)
+Repository -> DB: Save STT segment\n(session ID, text, timestamp, confidence)
 activate DB
 DB --> Repository: segment saved
 deactivate DB
@@ -141,24 +119,13 @@ deactivate DB
 Repository --> Service: saved
 deactivate Repository

-Service -> Repository: updateSpeakerInfo(recordingId, speakerId)
-activate Repository
-
-Repository -> DB: Save/update speaker info\n(recording ID, speaker ID, segment count)
-activate DB
-DB --> Repository: update complete
-deactivate DB
-
-Repository --> Service: done
-deactivate Repository
-
-Service --> Controller: streaming response\n{text, speaker, timestamp, confidence}
+Service --> Controller: streaming response\n{text, timestamp, confidence}
 deactivate Service

 Controller --> Gateway: WebSocket message
 deactivate Controller

-Gateway --> Frontend: Send real-time captions\n{text, speaker, timestamp}
+Gateway --> Frontend: Send real-time captions\n{text, timestamp}
 deactivate Gateway

 note over Frontend, EventHub
@@ -166,12 +133,10 @@ note over Frontend, EventHub
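 - DB recording creation: ~100ms
 - Azure recognizer initialization: ~500ms
 - Blob path creation: ~200ms
-- Speaker identification: ~300ms
 - Real-time recognition latency: < 1 s
-- Total initialization time: ~1.1 s
+- Total initialization time: ~0.8 s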

 Accuracy:
-- Speaker identification accuracy: > 90%
 - Speech recognition accuracy: 60-95%
 end note
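
The initialization figures in the note above can be checked with a short calculation. The sketch below is illustrative only: it assumes the listed steps run sequentially so that their approximate targets simply add up, and every name in it (`INIT_BUDGET_MS`, `total_init_ms`, `strip_speaker_fields`) is hypothetical rather than taken from the repository. It also shows the shape of a streaming payload once the speaker fields named in the commit message are dropped.

```python
# Approximate per-step targets from the diagram note, in milliseconds.
# All identifiers here are hypothetical; they do not come from the codebase.
INIT_BUDGET_MS = {
    "db_recording_creation": 100,
    "azure_recognizer_init": 500,
    "blob_path_creation": 200,
    "speaker_identification": 300,  # step removed by this commit
}

# Speaker-related keys mentioned in the commit message and the old diagram.
SPEAKER_FIELDS = {"speaker", "speakerId", "speakerName"}


def total_init_ms(budget, excluded=()):
    """Sum the step targets, skipping any excluded step names.

    Assumes the steps run one after another, so the totals are additive.
    """
    return sum(ms for step, ms in budget.items() if step not in excluded)


def strip_speaker_fields(payload):
    """Return a copy of a streaming payload without speaker-related keys."""
    return {k: v for k, v in payload.items() if k not in SPEAKER_FIELDS}


before_ms = total_init_ms(INIT_BUDGET_MS)
after_ms = total_init_ms(INIT_BUDGET_MS, excluded={"speaker_identification"})

old_payload = {
    "text": "hello",
    "speaker": "S1",
    "timestamp": 1.25,
    "confidence": 0.9,
}
new_payload = strip_speaker_fields(old_payload)
```

Under these assumptions the totals come to 1100 ms before and 800 ms after the change, matching the ~1.1 s and ~0.8 s figures in the note, and `new_payload` keeps only `text`, `timestamp`, and `confidence`.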