Merge branch 'main' of https://github.com/hwanny1128/HGZero

2026-07-29 09:29:10 +00:00 · 2025-10-24 15:02:24 +09:00
parent 734e182287 4f5b0ea776
commit ce2dfab9f9
50 changed files with 2568 additions and 1896 deletions
@@ -7,14 +7,12 @@ info:
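    **Core features:**
    - Start/stop management for voice recording
    - Real-time speech-to-text conversion (streaming)
-    - Batch speech-to-text conversion
-    - Speaker identification and management
    - Azure Speech Service integration

    **Differentiators:**
    - Baseline feature (hygiene factor) offered by most competitors
    - Immediate captions through real-time streaming
-    - Automatic speaker identification (90% accuracy or higher)
+    - **Simplification**: batch processing and speaker identification removed; real-time only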
  version: 1.0.0
  contact:
    name: STT Service Team
@@ -25,7 +23,7 @@ servers:
    description: Production Server
  - url: https://dev-api.example.com/stt/v1
    description: Development Server
-  - url: http://localhost:8083/api/v1
+  - url: http://localhost:8084/api/v1
    description: Local Development Server

 tags:
@@ -33,8 +31,6 @@ tags:
    description: Voice recording management API
  - name: Transcription
    description: Speech-to-text conversion API
-  - name: Speaker
-    description: Speaker identification and management API

 paths:
  /recordings/prepare:
@@ -50,7 +46,7 @@ paths:
        2. Create the recording record in the DB
        3. Initialize the Azure Speech recognizer
        4. Create the Blob Storage path
-        5. Publish the RecordingStarted event
+        5. Publish the RecordingStarted event (Kafka)
      operationId: prepareRecording
      x-user-story: UFR-STT-010
      x-controller: RecordingController
@@ -243,15 +239,14 @@ paths:
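        **Processing flow:**
        1. Receive the audio data stream
        2. Real-time recognition by Azure Speech Service
-        3. Identify the speaker
-        4. Validate confidence (70% threshold)
-        5. Save the segment to the DB
-        6. Publish the TranscriptSegmentReady event
-        7. Send real-time captions over WebSocket
+        3. Validate confidence (70% threshold)
+        4. Save the segment to the DB
+        5. Publish the TranscriptSegmentReady event (Kafka)
+        6. Send real-time captions over WebSocket

        **Performance:**
        - Real-time recognition latency: < 1 s
-        - Processing time: 1-4 s
+        - Processing time: 1-3 s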
      operationId: streamTranscription
      x-user-story: UFR-STT-020
      x-controller: TranscriptController
@@ -277,8 +272,6 @@ paths:
                transcriptId: "TRS-SEG-001"
                recordingId: "REC-20250123-001"
                text: "안녕하세요, 오늘 회의를 시작하겠습니다."
-                speakerId: "SPK-001"
-                speakerName: "김철수"
                timestamp: 1234567890
                duration: 3.5
                confidence: 0.92
@@ -290,94 +283,6 @@ paths:
      security:
        - BearerAuth: []

-  /transcripts/batch:
-    post:
-      tags:
-        - Transcription
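-      summary: Batch speech-to-text conversion
-      description: |
-        Convert an entire audio file as a batch (asynchronous processing)
-
-        **Processing flow:**
-        1. Upload the full audio file
-        2. Create an Azure Batch Transcription job
-        3. Start asynchronous processing
-        4. Return the job ID (202 Accepted)
-        5. Receive the result through a callback on completion
-
-        **Processing time:**
-        - File upload: 1-2 s
-        - Azure batch processing: 5-30 s (depending on file size)
-        - Total processing time: 7-33 s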
-      operationId: batchTranscription
-      x-user-story: UFR-STT-020
-      x-controller: TranscriptController
-      requestBody:
-        required: true
-        content:
-          multipart/form-data:
-            schema:
-              $ref: '#/components/schemas/BatchTranscriptionRequest'
-      responses:
-        '202':
-          description: Batch job accepted
-          content:
-            application/json:
-              schema:
-                $ref: '#/components/schemas/BatchTranscriptionResponse'
-              example:
-                jobId: "JOB-20250123-001"
-                recordingId: "REC-20250123-001"
-                status: "PROCESSING"
-                estimatedCompletionTime: "2025-01-23T10:31:00Z"
-                callbackUrl: "https://api.example.com/stt/v1/transcripts/callback"
-        '400':
-          $ref: '#/components/responses/BadRequest'
-        '500':
-          $ref: '#/components/responses/InternalServerError'
-      security:
-        - BearerAuth: []
-
-  /transcripts/callback:
-    post:
-      tags:
-        - Transcription
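-      summary: Batch transcription completion callback
-      description: |
-        Receive the batch transcription completion callback from Azure Speech Service
-
-        **Processing flow:**
-        1. Receive the batch result
-        2. Save each segment to the DB
-        3. Merge the full text
-        4. Publish the TranscriptionCompleted event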
-      operationId: batchTranscriptionCallback
-      x-user-story: UFR-STT-020
-      x-controller: TranscriptController
-      requestBody:
-        required: true
-        content:
-          application/json:
-            schema:
-              $ref: '#/components/schemas/BatchCallbackRequest'
-      responses:
-        '200':
-          description: Callback processed successfully
-          content:
-            application/json:
-              schema:
-                $ref: '#/components/schemas/TranscriptionCompleteResponse'
-              example:
-                jobId: "JOB-20250123-001"
-                recordingId: "REC-20250123-001"
-                status: "COMPLETED"
-                segmentCount: 120
-                totalDuration: 1800
-                averageConfidence: 0.88
-        '400':
-          $ref: '#/components/responses/BadRequest'
-        '500':
-          $ref: '#/components/responses/InternalServerError'

  /transcripts/{recordingId}:
    get:
@@ -389,7 +294,7 @@ paths:

        **Response data:**
        - Full text
-        - Segment list per speaker
+        - Segment list
        - Timestamp information
        - Confidence score
      operationId: getTranscription
@@ -404,13 +309,6 @@ paths:
          schema:
            type: boolean
            default: false
-        - name: speakerId
-          in: query
-          description: Filter to a specific speaker's utterances only
-          required: false
-          schema:
-            type: string
-            example: "SPK-001"
      responses:
        '200':
          description: Transcript retrieved successfully
@@ -420,16 +318,13 @@ paths:
                $ref: '#/components/schemas/TranscriptionResponse'
              example:
                recordingId: "REC-20250123-001"
-                fullText: "김철수: 안녕하세요...\n이영희: 네, 안녕하세요..."
+                fullText: "안녕하세요, 오늘 회의를 시작하겠습니다..."
                segmentCount: 120
                totalDuration: 1800
                averageConfidence: 0.88
-                speakerCount: 3
                segments:
                  - transcriptId: "TRS-SEG-001"
                    text: "안녕하세요, 오늘 회의를 시작하겠습니다."
-                    speakerId: "SPK-001"
-                    speakerName: "김철수"
                    timestamp: 0
                    duration: 3.5
                    confidence: 0.92
@@ -440,179 +335,6 @@ paths:
      security:
        - BearerAuth: []

-  /speakers/identify:
-    post:
-      tags:
-        - Speaker
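-      summary: Speaker identification
-      description: |
-        Identify the speaker from audio data
-
-        **Processing flow:**
-        1. Generate a voice signature
-        2. Match against existing profiles
-        3. Automatically register new speakers
-        4. Return speaker information
-
-        **Accuracy:**
-        - Speaker identification accuracy: > 90%
-        - Processing time: ~300ms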
-      operationId: identifySpeaker
-      x-user-story: UFR-STT-010
-      x-controller: SpeakerController
-      requestBody:
-        required: true
-        content:
-          application/json:
-            schema:
-              $ref: '#/components/schemas/IdentifySpeakerRequest'
-            example:
-              recordingId: "REC-20250123-001"
-              audioFrame: "base64_encoded_audio_frame"
-              timestamp: 1234567890
-      responses:
-        '200':
-          description: Speaker identified successfully
-          content:
-            application/json:
-              schema:
-                $ref: '#/components/schemas/SpeakerIdentificationResponse'
-              example:
-                speakerId: "SPK-001"
-                speakerName: "김철수"
-                confidence: 0.95
-                isNewSpeaker: false
-                profileId: "PROFILE-12345"
-        '400':
-          $ref: '#/components/responses/BadRequest'
-        '500':
-          $ref: '#/components/responses/InternalServerError'
-      security:
-        - BearerAuth: []
-
-  /speakers/{speakerId}:
-    get:
-      tags:
-        - Speaker
-      summary: Get speaker information
-      description: Get detailed information for a specific speaker
-      operationId: getSpeaker
-      x-user-story: UFR-STT-010
-      x-controller: SpeakerController
-      parameters:
-        - name: speakerId
-          in: path
-          description: Speaker ID
-          required: true
-          schema:
-            type: string
-            example: "SPK-001"
-      responses:
-        '200':
-          description: Speaker information retrieved successfully
-          content:
-            application/json:
-              schema:
-                $ref: '#/components/schemas/SpeakerDetailResponse'
-              example:
-                speakerId: "SPK-001"
-                speakerName: "김철수"
-                profileId: "PROFILE-12345"
-                totalSegments: 45
-                totalDuration: 450
-                averageConfidence: 0.92
-                firstAppeared: "2025-01-23T10:30:15Z"
-                lastAppeared: "2025-01-23T11:00:00Z"
-        '404':
-          $ref: '#/components/responses/NotFound'
-        '500':
-          $ref: '#/components/responses/InternalServerError'
-      security:
-        - BearerAuth: []
-
-    put:
-      tags:
-        - Speaker
-      summary: Update speaker information
-      description: Modify speaker details such as the name
-      operationId: updateSpeaker
-      x-user-story: UFR-STT-010
-      x-controller: SpeakerController
-      parameters:
-        - name: speakerId
-          in: path
-          description: Speaker ID
-          required: true
-          schema:
-            type: string
-            example: "SPK-001"
-      requestBody:
-        required: true
-        content:
-          application/json:
-            schema:
-              $ref: '#/components/schemas/UpdateSpeakerRequest'
-            example:
-              speakerName: "김철수 팀장"
-              userId: "USER-123"
-      responses:
-        '200':
-          description: Speaker information updated successfully
-          content:
-            application/json:
-              schema:
-                $ref: '#/components/schemas/SpeakerDetailResponse'
-        '404':
-          $ref: '#/components/responses/NotFound'
-        '500':
-          $ref: '#/components/responses/InternalServerError'
-      security:
-        - BearerAuth: []
-
-  /recordings/{recordingId}/speakers:
-    get:
-      tags:
-        - Speaker
-      summary: List speakers in a recording
-      description: List all speakers who took part in a specific recording
-      operationId: getRecordingSpeakers
-      x-user-story: UFR-STT-010
-      x-controller: SpeakerController
-      parameters:
-        - $ref: '#/components/parameters/RecordingIdParam'
-      responses:
-        '200':
-          description: Speaker list retrieved successfully
-          content:
-            application/json:
-              schema:
-                $ref: '#/components/schemas/SpeakerListResponse'
-              example:
-                recordingId: "REC-20250123-001"
-                speakerCount: 3
-                speakers:
-                  - speakerId: "SPK-001"
-                    speakerName: "김철수"
-                    segmentCount: 45
-                    totalDuration: 450
-                    speakingRatio: 0.45
-                  - speakerId: "SPK-002"
-                    speakerName: "이영희"
-                    segmentCount: 38
-                    totalDuration: 380
-                    speakingRatio: 0.38
-                  - speakerId: "SPK-003"
-                    speakerName: "박민수"
-                    segmentCount: 17
-                    totalDuration: 170
-                    speakingRatio: 0.17
-        '404':
-          $ref: '#/components/responses/NotFound'
-        '500':
-          $ref: '#/components/responses/InternalServerError'
-      security:
-        - BearerAuth: []
-
 components:
  securitySchemes:
    BearerAuth:
@@ -657,7 +379,7 @@ components:
          example: "ko-KR"
        attendeeCount:
          type: integer
-          description: Number of attendees (used to tune speaker identification)
+          description: Number of attendees
          minimum: 1
          maximum: 50
          example: 5
@@ -809,10 +531,6 @@ components:
          type: integer
          description: Recording duration (seconds)
          example: 300
-        speakerCount:
-          type: integer
-          description: Number of speakers
-          example: 3
        segmentCount:
          type: integer
          description: Number of segments
@@ -866,14 +584,6 @@ components:
          type: string
          description: Transcribed text
          example: "안녕하세요, 오늘 회의를 시작하겠습니다."
-        speakerId:
-          type: string
-          description: Speaker ID
-          example: "SPK-001"
-        speakerName:
-          type: string
-          description: Speaker name
-          example: "김철수"
        timestamp:
          type: integer
          description: Timestamp (ms)
@@ -895,89 +605,6 @@ components:
          description: Low-confidence warning flag (< 60%)
          example: false

-    BatchTranscriptionRequest:
-      type: object
-      required:
-        - recordingId
-        - audioFile
-      properties:
-        recordingId:
-          type: string
-          description: Recording ID
-          example: "REC-20250123-001"
-        audioFile:
-          type: string
-          format: binary
-          description: Audio file (WAV, MP3, etc.)
-        language:
-          type: string
-          description: Speech recognition language
-          default: "ko-KR"
-          example: "ko-KR"
-        callbackUrl:
-          type: string
-          format: uri
-          description: Completion callback URL
-          example: "https://api.example.com/stt/v1/transcripts/callback"
-
-    BatchTranscriptionResponse:
-      type: object
-      properties:
-        jobId:
-          type: string
-          description: Batch job ID
-          example: "JOB-20250123-001"
-        recordingId:
-          type: string
-          description: Recording ID
-          example: "REC-20250123-001"
-        status:
-          type: string
-          description: Job status
-          enum:
-            - QUEUED
-            - PROCESSING
-            - COMPLETED
-            - FAILED
-          example: "PROCESSING"
-        estimatedCompletionTime:
-          type: string
-          format: date-time
-          description: Estimated completion time
-          example: "2025-01-23T10:31:00Z"
-        callbackUrl:
-          type: string
-          format: uri
-          description: Callback URL
-          example: "https://api.example.com/stt/v1/transcripts/callback"
-
-    BatchCallbackRequest:
-      type: object
-      required:
-        - jobId
-        - status
-        - segments
-      properties:
-        jobId:
-          type: string
-          description: Batch job ID
-          example: "JOB-20250123-001"
-        status:
-          type: string
-          description: Job status
-          enum:
-            - COMPLETED
-            - FAILED
-          example: "COMPLETED"
-        segments:
-          type: array
-          description: List of transcribed segments
-          items:
-            $ref: '#/components/schemas/TranscriptionSegment'
-        error:
-          type: string
-          description: Error message (on failure)
-          example: "Audio file format not supported"

    TranscriptionSegment:
      type: object
@@ -986,10 +613,6 @@ components:
          type: string
          description: Transcribed text
          example: "안녕하세요, 오늘 회의를 시작하겠습니다."
-        speakerId:
-          type: string
-          description: Speaker ID
-          example: "SPK-001"
        timestamp:
          type: integer
          description: Start timestamp (ms)
@@ -1061,10 +684,6 @@ components:
          format: float
          description: Average confidence score
          example: 0.88
-        speakerCount:
-          type: integer
-          description: Number of speakers
-          example: 3
        segments:
          type: array
          description: Segment list
@@ -1082,14 +701,6 @@ components:
          type: string
          description: Transcribed text
          example: "안녕하세요, 오늘 회의를 시작하겠습니다."
-        speakerId:
-          type: string
-          description: Speaker ID
-          example: "SPK-001"
-        speakerName:
-          type: string
-          description: Speaker name
-          example: "김철수"
        timestamp:
          type: integer
          description: Timestamp (ms)
@@ -1105,151 +716,6 @@ components:
          description: Confidence score
          example: 0.92

-    IdentifySpeakerRequest:
-      type: object
-      required:
-        - recordingId
-        - audioFrame
-        - timestamp
-      properties:
-        recordingId:
-          type: string
-          description: Recording ID
-          example: "REC-20250123-001"
-        audioFrame:
-          type: string
-          format: byte
-          description: Base64-encoded audio frame
-          example: "UklGRiQAAABXQVZFZm10IBAAAAABA..."
-        timestamp:
-          type: integer
-          description: Timestamp (ms)
-          example: 1234567890
-
-    SpeakerIdentificationResponse:
-      type: object
-      properties:
-        speakerId:
-          type: string
-          description: Speaker ID
-          example: "SPK-001"
-        speakerName:
-          type: string
-          description: Speaker name
-          example: "김철수"
-        confidence:
-          type: number
-          format: float
-          description: Identification confidence (0-1)
-          minimum: 0
-          maximum: 1
-          example: 0.95
-        isNewSpeaker:
-          type: boolean
-          description: Whether this is a new speaker
-          example: false
-        profileId:
-          type: string
-          description: Azure Speaker Profile ID
-          example: "PROFILE-12345"
-
-    SpeakerDetailResponse:
-      type: object
-      properties:
-        speakerId:
-          type: string
-          description: Speaker ID
-          example: "SPK-001"
-        speakerName:
-          type: string
-          description: Speaker name
-          example: "김철수"
-        profileId:
-          type: string
-          description: Azure Speaker Profile ID
-          example: "PROFILE-12345"
-        userId:
-          type: string
-          description: Linked user ID
-          example: "USER-123"
-        totalSegments:
-          type: integer
-          description: Total number of spoken segments
-          example: 45
-        totalDuration:
-          type: integer
-          description: Total speaking time (seconds)
-          example: 450
-        averageConfidence:
-          type: number
-          format: float
-          description: Average identification confidence
-          example: 0.92
-        firstAppeared:
-          type: string
-          format: date-time
-          description: First appearance time
-          example: "2025-01-23T10:30:15Z"
-        lastAppeared:
-          type: string
-          format: date-time
-          description: Most recent appearance time
-          example: "2025-01-23T11:00:00Z"
-
-    UpdateSpeakerRequest:
-      type: object
-      properties:
-        speakerName:
-          type: string
-          description: Speaker name
-          example: "김철수 팀장"
-        userId:
-          type: string
-          description: User ID to link
-          example: "USER-123"
-
-    SpeakerListResponse:
-      type: object
-      properties:
-        recordingId:
-          type: string
-          description: Recording ID
-          example: "REC-20250123-001"
-        speakerCount:
-          type: integer
-          description: Number of speakers
-          example: 3
-        speakers:
-          type: array
-          description: Speaker list
-          items:
-            $ref: '#/components/schemas/SpeakerSummary'
-
-    SpeakerSummary:
-      type: object
-      properties:
-        speakerId:
-          type: string
-          description: Speaker ID
-          example: "SPK-001"
-        speakerName:
-          type: string
-          description: Speaker name
-          example: "김철수"
-        segmentCount:
-          type: integer
-          description: Number of spoken segments
-          example: 45
-        totalDuration:
-          type: integer
-          description: Total speaking time (seconds)
-          example: 450
-        speakingRatio:
-          type: number
-          format: float
-          description: Speaking ratio (0-1)
-          example: 0.45
-
    ErrorResponse:
      type: object
      properties:
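With the speaker fields removed, the `TranscriptionResponse` example in the spec above reduces to a plain aggregate over segments. The sketch below shows one way such a payload might be assembled. It is not taken from the repository: the `Segment` class and `build_transcription_response` are hypothetical names, and the aggregation rules (space-joined `fullText`, summed `totalDuration`, arithmetic-mean `averageConfidence`) are assumptions, since the spec only gives example values.

```python
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class Segment:
    # Field names follow the example payload in the spec; speakerId and
    # speakerName are intentionally absent after this change.
    transcriptId: str
    text: str
    timestamp: int     # start offset in ms
    duration: float    # seconds
    confidence: float  # 0-1


def build_transcription_response(recording_id: str, segments: List[Segment]) -> Dict:
    """Aggregate segments into a TranscriptionResponse-shaped dict (assumed rules)."""
    ordered = sorted(segments, key=lambda s: s.timestamp)
    count = len(ordered)
    average = sum(s.confidence for s in ordered) / count if count else 0.0
    return {
        "recordingId": recording_id,
        # Assumption: the full text is the segment texts joined by spaces.
        "fullText": " ".join(s.text for s in ordered),
        "segmentCount": count,
        # Assumption: total duration is the sum of segment durations.
        "totalDuration": sum(s.duration for s in ordered),
        "averageConfidence": round(average, 2),
        "segments": [asdict(s) for s in ordered],
    }


# Segments may arrive out of order; they are sorted by timestamp first.
example = build_transcription_response(
    "REC-20250123-001",
    [
        Segment("TRS-SEG-002", "second segment", 3500, 2.5, 0.84),
        Segment("TRS-SEG-001", "first segment", 0, 3.5, 0.92),
    ],
)
print(example["fullText"])
```

Because no per-speaker grouping remains, the only ordering key is the timestamp, which keeps the merge step trivial.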
@@ -1,14 +1,13 @@
@startuml
 !theme mono

-title STT Service - Start Voice Recording and Speaker Recognition (Integrated)
+title STT Service - Start Voice Recording and Real-Time Recognition

 participant "Frontend<<E>>" as Frontend
 participant "API Gateway<<E>>" as Gateway
 participant "RecordingController" as Controller
 participant "RecordingService" as Service
 participant "AudioStreamManager" as StreamManager
-participant "SpeakerIdentifier" as Speaker
 participant "RecordingRepository" as Repository
 participant "AzureSpeechClient" as AzureClient
 database "STT DB" as DB
@@ -51,7 +50,6 @@ note right
  - Language: ko-KR
  - Format: PCM 16kHz
  - Sample rate: 16kHz
-  - Speaker identification enabled
  - Real-time streaming mode
  - Continuous recognition
 end note
@@ -108,32 +106,12 @@ deactivate AzureClient
 StreamManager --> Service: recognized text
 deactivate StreamManager

-== Speaker identification ==
-
-Service -> Speaker: identifySpeaker(audioFrame)
-activate Speaker
-
-Speaker -> AzureClient: analyzeSpeakerProfile()\n(Speaker Recognition API)
-activate AzureClient
-note right
-  Speaker identification:
-  - Generate a voice signature
-  - Match against existing profiles
-  - Automatically register new speakers
-end note
-
-AzureClient --> Speaker: speakerId
-deactivate AzureClient
-
-Speaker --> Service: speaker info
-deactivate Speaker
-
-== Save segments per speaker ==
+== Save segments ==

 Service -> Repository: saveSttSegment(segment)
 activate Repository

-Repository -> DB: Save STT segment\n(session ID, text, speaker ID, timestamp, confidence)
+Repository -> DB: Save STT segment\n(session ID, text, timestamp, confidence)
 activate DB
 DB --> Repository: segment saved
 deactivate DB
@@ -141,24 +119,13 @@ deactivate DB
 Repository --> Service: saved
 deactivate Repository

-Service -> Repository: updateSpeakerInfo(recordingId, speakerId)
-activate Repository
-
-Repository -> DB: Save/update speaker info\n(recording ID, speaker ID, segment count)
-activate DB
-DB --> Repository: update complete
-deactivate DB
-
-Repository --> Service: done
-deactivate Repository
-
-Service --> Controller: streaming response\n{text, speaker, timestamp, confidence}
+Service --> Controller: streaming response\n{text, timestamp, confidence}
 deactivate Service

 Controller --> Gateway: WebSocket message
 deactivate Controller

-Gateway --> Frontend: Send real-time captions\n{text, speaker, timestamp}
+Gateway --> Frontend: Send real-time captions\n{text, timestamp}
 deactivate Gateway

 note over Frontend, EventHub
@@ -166,12 +133,10 @@ note over Frontend, EventHub
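 - DB recording creation: ~100ms
 - Azure recognizer initialization: ~500ms
 - Blob path creation: ~200ms
-- Speaker identification: ~300ms
 - Real-time recognition latency: < 1 s
-- Total initialization time: ~1.1 s
+- Total initialization time: ~0.8 s

 Accuracy:
-- Speaker identification accuracy: > 90%
 - Speech recognition accuracy: 60-95%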
 end note
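
The revised initialization total in the note above follows directly from dropping the speaker-identification step. A minimal arithmetic check, using the per-step estimates from the note (the dictionary keys are hypothetical labels, not identifiers from the codebase):

```python
# Per-step initialization estimates in milliseconds, taken from the diagram note.
INIT_STEPS_MS = {
    "db_recording_create": 100,
    "azure_recognizer_init": 500,
    "blob_path_create": 200,
}
# Estimate for the step this commit removes.
REMOVED_SPEAKER_IDENTIFICATION_MS = 300


def total_ms(steps: dict) -> int:
    """Sum the sequential step estimates."""
    return sum(steps.values())


new_total_ms = total_ms(INIT_STEPS_MS)
old_total_ms = new_total_ms + REMOVED_SPEAKER_IDENTIFICATION_MS
print(f"old: {old_total_ms / 1000:.1f} s, new: {new_total_ms / 1000:.1f} s")
```

The sum treats the steps as strictly sequential, which is an assumption; any overlap between them would lower both totals.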

@@ -1,7 +1,7 @@
@startuml
 !theme mono

-title STT Service - Speech-to-Text Conversion (Real-Time/Batch Integrated)
+title STT Service - Speech-to-Text Conversion (Real-Time Only)

 participant "Frontend<<E>>" as Frontend
 participant "API Gateway<<E>>" as Gateway
@@ -15,7 +15,7 @@ database "STT DB" as DB
 database "Azure Blob Storage<<E>>" as BlobStorage
 queue "Azure Event Hubs<<E>>" as EventHub

-== Receive streaming audio data (real-time mode) ==
+== Receive streaming audio data ==

 Frontend -> Gateway: POST /api/transcripts/stream\n(audioData, recordingId, timestamp)
 activate Gateway
@@ -26,9 +26,8 @@ activate Controller
 Controller -> Service: processAudioStream(audioData, recordingId)
 activate Service

-alt Real-time transcription mode
-    Service -> Engine: streamingTranscribe(audioData)
-    activate Engine
+Service -> Engine: streamingTranscribe(audioData)
+activate Engine

    Engine -> AzureClient: recognizeAsync(audioData)
    activate AzureClient
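@@ -38,7 +37,6 @@ alt Real-time transcription mode
      Azure Speech settings:
      - Mode: Continuous
      - Language: ko-KR
-      - Speaker identification enabled
      - Automatic timestamp recording
      - Confidence score calculation
      - Profanity filter
@@ -49,7 +47,7 @@ alt Real-time transcription mode
    BlobStorage --> AzureClient: save complete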
    deactivate BlobStorage

-    AzureClient --> Engine: RecognitionResult\n(text, speakerId, confidence, timestamp, duration)
+    AzureClient --> Engine: RecognitionResult\n(text, confidence, timestamp, duration)
    deactivate AzureClient

    == Accuracy validation and processing ==
@@ -71,7 +69,7 @@ alt Real-time transcription mode
    Service -> TranscriptRepo: createTranscript(recordingId, segment)
    activate TranscriptRepo

-    TranscriptRepo -> DB: Save transcription result\n(transcript ID, recording ID, speaker ID, text, confidence, timestamp, warning flag)
+    TranscriptRepo -> DB: Save transcription result\n(transcript ID, recording ID, text, confidence, timestamp, warning flag)
    activate DB
    DB --> TranscriptRepo: return transcriptId
    deactivate DB
@@ -79,19 +77,6 @@ alt Real-time transcription mode
    TranscriptRepo --> Service: return TranscriptEntity
    deactivate TranscriptRepo

-    == Update speaker info ==
-
-    Service -> RecordingRepo: updateSpeakerInfo(recordingId, speakerId)
-    activate RecordingRepo
-
-    RecordingRepo -> DB: Save/update speaker info\n(recording ID, speaker ID, segment count)
-    activate DB
-    DB --> RecordingRepo: update complete
-    deactivate DB
-
-    RecordingRepo --> Service: done
-    deactivate RecordingRepo
-
    == Publish event ==

    Service -> EventHub: publish TranscriptSegmentReady event
@@ -102,7 +87,6 @@ alt Real-time transcription mode
      - recordingId
      - meetingId
      - text
-      - speakerId
      - timestamp
      - confidence
    end note
@@ -112,128 +96,18 @@ alt Real-time transcription mode
    Service --> Controller: TranscriptResponse\n(transcriptId, text, confidence, warningFlag)
    deactivate Service

-    Controller --> Gateway: 200 OK\n(transcriptId, text, speakerId, timestamp, confidence)
+    Controller --> Gateway: 200 OK\n(transcriptId, text, timestamp, confidence)
    deactivate Controller

    Gateway --> Frontend: real-time caption response
    deactivate Gateway

-else Batch transcription mode
-    Gateway -> Controller: POST /api/v1/stt/transcribe\n{sessionId, audioFile}
-    activate Controller
-
-    Controller -> Service: transcribeAudio(sessionId, audioFile)
-    activate Service
-
-    Service -> RecordingRepo: findSessionById(sessionId)
-    activate RecordingRepo
-    RecordingRepo -> DB: Look up STT session\n(by session ID)
-    DB --> RecordingRepo: session data
-    RecordingRepo --> Service: RecordingEntity
-    deactivate RecordingRepo
-
-    Service -> Engine: batchTranscribe(audioFile)
-    activate Engine
-
-    Engine -> AzureClient: batchTranscriptionAsync(audioUrl)
-    activate AzureClient
-    note right
-      Batch processing:
-      - Full file upload
-      - Background processing
-      - Callback URL provided
-      - Grouping by speaker
-      - Sentence boundary correction
-    end note
-
-    AzureClient --> Engine: transcription job ID
-    deactivate AzureClient
-
-    Engine --> Service: job submitted
-    deactivate Engine
-
-    Service -> RecordingRepo: updateSessionStatus(sessionId, "PROCESSING")
-    activate RecordingRepo
-    RecordingRepo -> DB: Update session status\n(status='processing')
-    DB --> RecordingRepo: updated
-    RecordingRepo --> Service: updated
-    deactivate RecordingRepo
-
-    Service --> Controller: 202 Accepted\n{jobId, status}
-    deactivate Service
-
-    Controller --> Gateway: 202 Accepted
-    deactivate Controller
-
-    == Batch processing complete (Callback) ==
-
-    AzureClient -> Controller: POST /api/v1/stt/callback\n{jobId, segments}
-    activate Controller
-
-    Controller -> Service: processBatchResult(jobId, segments)
-    activate Service
-
-    loop Process each segment
-        Service -> TranscriptRepo: createTranscript(recordingId, segment)
-        activate TranscriptRepo
-        TranscriptRepo -> DB: Save transcription result
-        DB --> TranscriptRepo: saved
-        TranscriptRepo --> Service: saved
-        deactivate TranscriptRepo
-    end
-
-    == Merge full text ==
-
-    Service -> TranscriptRepo: aggregateTranscription(sessionId)
-    activate TranscriptRepo
-    TranscriptRepo -> DB: Fetch segment list\n(by session ID, ordered by timestamp)
-    DB --> TranscriptRepo: ordered segments
-    TranscriptRepo --> Service: segments
-    deactivate TranscriptRepo
-
-    Service -> Service: mergeSegments(segments)
-    note right
-      Segment merge:
-      - Grouping by speaker
-      - Chronological ordering
-      - Sentence boundary correction
-    end note
-
-    Service -> RecordingRepo: saveTranscription(fullText)
-    activate RecordingRepo
-    RecordingRepo -> DB: Save full text and update status\n(full text, status='completed')
-    DB --> RecordingRepo: saved
-    RecordingRepo --> Service: updated session
-    deactivate RecordingRepo
-
-    Service -> EventHub: publish TranscriptionCompletedEvent
-    note right
-      Event:
-      - sessionId
-      - meetingId
-      - fullText
-      - completedAt
-    end note
-
-    Service --> Controller: TranscriptionResponse\n{sessionId, text, segments}
-    deactivate Service
-
-    Controller --> Gateway: 200 OK\n{transcription, metadata}
-    deactivate Controller
-end
-
 note over Frontend, EventHub
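-**Real-time mode processing time:**
+**Processing time:**
 - Azure STT processing: 1-3 s
 - DB save: ~100ms
 - Event publishing: ~50ms
-- Total processing time: 1-4 s
-
-**Batch mode processing time:**
-- File upload: ~1-2 s
-- Azure batch processing: 5-30 s (depending on file size)
-- DB save: ~500ms
-- Total processing time: 7-33 s
+- Total processing time: 1-3 s

 **Accuracy warning criteria:**
 - < 60%: manual correction recommended (warning flag)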
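
The warning criterion above, together with the 70% confidence validation step named in the OpenAPI description of this commit, can be sketched as a small classifier. This is an illustration only: `classify_confidence` and the `passedValidation` key are hypothetical, and the source does not say what happens to a segment that falls below the 70% threshold, so the sketch only reports the result without acting on it. `warningFlag` is the field name used in the diagram's `TranscriptResponse`.

```python
WARNING_THRESHOLD = 0.60     # below this, the diagram recommends manual correction
VALIDATION_THRESHOLD = 0.70  # "70% threshold" from the spec; follow-up action unspecified


def classify_confidence(confidence: float) -> dict:
    """Return the flags a segment would carry for a given confidence score."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be within [0, 1]")
    return {
        # Strictly below 60% sets the warning flag.
        "warningFlag": confidence < WARNING_THRESHOLD,
        # Hypothetical field: whether the 70% validation step passed.
        "passedValidation": confidence >= VALIDATION_THRESHOLD,
    }


for score in (0.92, 0.65, 0.59):
    print(score, classify_confidence(score))
```

Scores between 60% and 70% carry no warning flag yet fail validation, a gap the source leaves undefined.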
@@ -49,7 +49,7 @@
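   - Real-time collaboration: WebSocket-based real-time sync, version control, conflict resolution
   - Template management: meeting-minutes template management
   - Statistics: meeting and Todo statistics
-3. **STT** - Voice recording management, speech-to-text conversion, speaker identification (baseline feature)
+3. **STT** - Voice streaming, real-time speech-to-text conversion (baseline feature)
 4. **AI** - AI-based meeting-minutes automation, Todo extraction, intelligent search (RAG integration)
   - LLM-based automatic meeting-minutes drafting
   - Automatic Todo extraction and assignee identification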
@@ -477,30 +477,29 @@ UFR-MEET-055: [회의록수정] 회의 참석자로서 | 나는, 검증이 완
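 3. STT service (speech recognition and conversion - baseline feature)
 1) Speech recognition and conversion
 UFR-STT-010: [Voice recording and recognition] As a meeting attendee | so that what is said is recorded automatically | I want speech to be recorded and recognized in real time.
-- Scenario: Voice recording and utterance recognition
-  Given a meeting has started | when an attendee begins speaking | the audio is recorded automatically, the speaker is identified, and the utterance is recognized.
+- Scenario: Real-time speech recognition
+  Given a meeting has started | when an attendee begins speaking | the speech is converted to text in real time.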

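-  [Voice recording]
+  [Voice streaming]
  - Capture the audio stream in real time
  - Link to the meeting ID
-  - Store audio data (Azure storage)
+  - **Audio files are not stored** (real-time streaming only)

-  [Utterance recognition]
+  [Speech recognition]
  - Integrate with an AI speech recognition engine (Azure Speech, etc.)
-  - Automatic speaker identification
-    - Match against the attendee list
-    - Analyze voice characteristics
+  - Real-time text conversion
  - Record timestamps
-  - Segment utterances

  [Results]
-  - Voice recording started (recording ID)
-  - Utterance recognized (utterance ID, speaker, timestamp)
+  - Voice streaming started (session ID)
+  - Text converted (segment ID, text, timestamp)
  - Real-time text conversion request (linked to UFR-STT-020)
+  - **Audio files are not stored; only the stream is processed**
+  - **No speaker identification** (plain text conversion only)

  [Performance requirements]
-  - Utterance recognition latency: within 1 second
-  - Speaker identification accuracy: 90% or higher
+  - Speech recognition latency: within 1 second
+  - Transcription accuracy: 85% or higher

  [Notes]
  - STT is a baseline feature that most competitors offer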