# STT (Speech-to-Text) Implementation Plan

## 📋 Document Information
- **Created**: 2025-10-21
- **Last modified**: 2025-10-21
- **Author**: Meeting Minutes Service Development Team
- **Version**: 2.0
- **Reviewers**: 박서연 (AI), 이준호 (Backend), 이동욱 (Backend), 최유진 (Frontend), 홍길동 (Architect), 정도현 (QA)
- **STT engine**: Azure Speech Services (real-time streaming + speaker identification)

---

## 1. Overview

### 1.1 Purpose
Recognize meeting participants' speech in real time, convert it to text, and provide the base data for AI-driven automatic meeting minutes.

### 1.2 Core Requirements
- **Real-time**: text appears on screen within 1 second of an utterance (Azure real-time streaming)
- **Accuracy**: STT confidence score of 90% or higher
- **Speaker identification**: utterances are automatically separated per participant (Azure Speaker Diarization)
- **Reliability**: recorded audio is preserved even during a network failure

### 1.3 Why Azure Speech Services
- ✅ **Real-time streaming**: latency under 1 second meets the requirement
- ✅ **Built-in speaker identification**: Speaker Diarization is included (no separate implementation needed)
- ✅ **Korean optimization**: high accuracy from Microsoft's Korean-specific model
- ✅ **Enterprise reliability**: 99.9% SLA
- ✅ **Azure ecosystem integration**: eases future expansion onto Azure-based infrastructure

### 1.4 Differentiation Strategy
STT itself is a baseline capability (Hygiene Factor), but it feeds the following differentiators:
- Context-based term explanations (RAG)
- Automatic AI meeting minutes
- Automatic Todo extraction

---

## 2. Architecture Design

### 2.1 Overall Structure

```
┌─────────────┐      ┌──────────────┐      ┌───────────────────────┐
│   Client    │─────▶│ STT Gateway  │─────▶│     Azure Speech      │
│  (Browser)  │      │   Service    │      │       Services        │
│             │      │              │      │ (real-time streaming) │
└─────────────┘      └──────────────┘      └───────────────────────┘
       │                    │                          │
       │                    │                  ┌───────▼───────┐
       │                    │                  │    Speaker    │
       │                    │                  │  Diarization  │
       │                    │                  │ (speaker ID)  │
       │                    │                  └───────────────┘
       │                    │                          │
       ▼                    │                          │
┌─────────────┐      ┌──────▼──────┐         ┌─────────▼─────────┐
│  WebSocket  │◀─────│  RabbitMQ   │◀────────│    Claude API     │
│   Server    │      │    Queue    │         │ (post-processing) │
└─────────────┘      └─────────────┘         └───────────────────┘
       │                    │
       │                    ▼
       │             ┌─────────────┐
       └────────────▶│    Redis    │
                     │    Cache    │
                     └─────────────┘
```

### 2.2 Roles by Layer

#### **Client Layer (Frontend)**
- **MediaRecorder API**: captures audio in the browser in real time
- **WebSocket Client**: receives text in real time and keeps the screen in sync
- **Local storage**: temporarily stores audio data during a network failure

#### **STT Gateway Service**
- **Audio stream intake**: receives the real-time audio stream from the client
- **Azure Speech integration**: calls the Azure Speech Services real-time streaming API
- **Speaker identification handling**: receives Azure Speaker Diarization results and matches them to participants
- **Event publishing**: publishes `TextTranscribed` events to RabbitMQ

#### **Azure Speech Services**
- **Real-time streaming STT**: converts speech to text in real time (< 1 second latency)
- **Speaker Diarization**: automatically separates utterances by speaker
- **Language model**: model optimized for Korean
- **Confidence score**: provides a confidence score for each utterance

#### **Message Queue (RabbitMQ)**
- **Asynchronous processing**: delivers STT results to downstream services asynchronously
- **Event routing**: `TextTranscribed` → AI Service, Meeting Service
- **Retry logic**: automatic reprocessing on failure (up to 3 times)

#### **AI Service (Claude API)**
- **Text post-processing**: converts spoken style to written style and corrects grammar
- **Minutes structuring**: organizes content according to the template
- **Todo extraction**: automatically identifies action items

#### **Cache Layer (Redis)**
- **Live utterance cache**: `meeting:{meeting_id}:live_text`
- **Per-section content cache**: `meeting:{meeting_id}:sections:{section_id}`
- **Speaker info cache**: `meeting:{meeting_id}:speakers`

#### **WebSocket Server**
- **Real-time sync**: sends transcription results to all participants immediately
- **Delta transmission**: sends only the changed portion to save bandwidth

---

## 3. Data Structure Design

### 3.1 Azure Speech Streaming Connection Settings
```json
{
  "session_id": "SESSION_001",
  "meeting_id": "MTG_001",
  "config": {
    "language": "ko-KR",
    "sample_rate": 16000,
    "format": "audio/wav",
    "enable_diarization": true,
    "max_speakers": 10,
    "profanity_filter": "masked",
    "enable_dictation": true
  },
  "participants": [
    { "user_id": "USR_001", "name": "김철수", "voice_signature": null },
    { "user_id": "USR_002", "name": "이영희", "voice_signature": null }
  ]
}
```

### 3.2 Real-Time Audio Stream Transmission (WebSocket)
```json
{
  "type": "audio_chunk",
  "session_id": "SESSION_001",
  "audio_data": "base64_encoded_audio",
  "timestamp": "2025-10-21T14:30:15.000Z",
  "sequence": 42
}
```

### 3.3 Azure Speech Real-Time Response (WebSocket)
```json
{
  "type": "recognition_result",
  "session_id": "SESSION_001",
  "result_id": "RESULT_001",
  "recognition_status": "Success",
  "duration": 45000000,
  "offset": 0,
  "text": "회의를 시작하겠습니다. 오늘은 프로젝트 킥오프 회의입니다.",
  "confidence": 0.95,
  "speaker_id": "Speaker_1",
  "lexical": "회의를 시작하겠습니다 오늘은 프로젝트 킥오프 회의입니다",
  "itn": "회의를 시작하겠습니다. 오늘은 프로젝트 킥오프 회의입니다.",
  "display": "회의를 시작하겠습니다. 오늘은 프로젝트 킥오프 회의입니다.",
  "words": [
    { "word": "회의를", "offset": 0, "duration": 4000000, "confidence": 0.96 },
    { "word": "시작하겠습니다", "offset": 4000000, "duration": 11000000, "confidence": 0.94 }
  ],
  "is_final": true,
  "timestamp": "2025-10-21T14:30:16.000Z"
}
```

`offset` and `duration` are expressed in 100-nanosecond ticks, the unit the Azure Speech SDK reports, so a 4.5-second utterance has a `duration` of 45000000.

### 3.4 Speaker Matching Result (internal to the STT Gateway)
```json
{
  "result_id": "RESULT_001",
  "azure_speaker_id": "Speaker_1",
  "matched_user": {
    "user_id": "USR_001",
    "name": "김철수",
    "confidence": 0.88
  },
  "matching_method": "voice_pattern",
  "timestamp": "2025-10-21T14:30:16.000Z"
}
```

### 3.5 Claude API Call Structure

#### **Request (AI Service → Claude API)**
```json
{
  "model": "claude-3-5-sonnet-20241022",
  "max_tokens": 2048,
  "messages": [
    {
      "role": "user",
      "content": "The following is a statement from a meeting. Organize it in meeting-minutes format.\n\nStatement: \"회의를 시작하겠습니다. 오늘은 프로젝트 킥오프 회의입니다.\"\nSpeaker: 김철수\nTime: 2025-10-21 14:30:15\n\nTemplate sections: Agenda, Discussion, Decisions, Todo"
    }
  ],
  "temperature": 0.3,
  "system": "You are an expert at writing meeting minutes. You structure what was said and summarize it clearly and concisely."
}
```

#### **Response (Claude API → AI Service)**
```json
{
  "id": "msg_01XYZ...",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "## Agenda\n- Hold the project kickoff meeting\n\n## Discussion\n- (Generated automatically from the statements)\n\n## Decisions\n- (No decisions yet)\n\n## Todo\n- (No tasks assigned yet)"
    }
  ],
  "model": "claude-3-5-sonnet-20241022",
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 245,
    "output_tokens": 128
  }
}
```

### 3.6 RabbitMQ Event Structure
```json
{
  "event_type": "TextTranscribed",
  "event_id": "EVT_001",
  "timestamp": "2025-10-21T14:30:18.000Z",
  "correlation_id": "CORR_001",
  "payload": {
    "meeting_id": "MTG_001",
    "speaker": { "id": "USR_001", "name": "김철수" },
    "transcription": {
      "text": "회의를 시작하겠습니다. 오늘은 프로젝트 킥오프 회의입니다.",
      "confidence": 0.95,
      "segments": [...]
    },
    "timestamp": "2025-10-21T14:30:15.000Z"
  },
  "metadata": {
    "source": "stt-gateway-service",
    "version": "1.0"
  }
}
```

### 3.7 Redis Cache Structure
```javascript
// 1. Live utterance (TTL: 10 minutes)
Key: "meeting:MTG_001:live_text"
Value: {
  "speaker": "김철수",
  "text": "회의를 시작하겠습니다...",
  "timestamp": "2025-10-21T14:30:15.000Z",
  "is_final": true
}

// 2. Per-section content (TTL: 1 hour after the meeting ends)
Key: "meeting:MTG_001:sections:agenda"
Value: {
  "section_id": "agenda",
  "section_name": "Agenda",
  "content": "Hold the project kickoff meeting\n- Finalize project goals and scope\n- Assign roles and plan the schedule",
  "verified": false,
  "last_updated": "2025-10-21T14:32:00.000Z"
}

// 3. Speaker info (TTL: 1 hour after the meeting ends)
Key: "meeting:MTG_001:speakers"
Value: [
  { "id": "USR_001", "name": "김철수", "role": "Organizer", "speech_count": 15, "speech_duration_ms": 180000 },
  { "id": "USR_002", "name": "이영희", "role": "Participant", "speech_count": 12, "speech_duration_ms": 150000 }
]

// 4. Meeting metadata (TTL: 24 hours after the meeting ends)
Key: "meeting:MTG_001:metadata"
Value: {
  "meeting_id": "MTG_001",
  "title": "Project kickoff meeting",
  "status": "in_progress",
  "start_time": "2025-10-21T14:00:00.000Z",
  "participants": ["USR_001", "USR_002", "USR_003"],
  "total_speech_count": 42,
  "last_activity": "2025-10-21T14:32:00.000Z"
}
```

### 3.8 WebSocket Real-Time Sync Message
```json
{
  "type": "transcription_update",
  "message_id": "WS_MSG_001",
  "timestamp": "2025-10-21T14:30:18.000Z",
  "data": {
    "meeting_id": "MTG_001",
    "speaker": { "id": "USR_001", "name": "김철수" },
    "transcription": {
      "text": "회의를 시작하겠습니다.",
      "is_final": true,
      "confidence": 0.95
    },
    "target_section": "agenda",
    "action": "append"
  }
}
```

---

## 4. Processing Flow (Sequence)

### 4.1 Real-Time Streaming Flow

```
Client           STT Gateway        Azure Speech      RabbitMQ     AI Service     WebSocket Server
  │                   │                   │               │             │                 │
  │─1.WS connect─────▶│                   │               │             │                 │
  │                   │─2.Start session──▶│               │             │                 │
  │                   │◀─3.Session ready──│               │             │                 │
  │                   │                   │               │             │                 │
  │─4.Audio stream───▶│                   │               │             │                 │
  │                   │─5.Forward audio──▶│               │             │                 │
  │                   │                   │               │             │                 │
  │                   │◀─6.Text + speaker─│               │             │                 │
  │                   │                   │               │             │                 │
  │                   │──────7.Publish event─────────────▶│             │                 │
  │                   │                   │               │─8.Consume──▶│                 │
  │                   │                   │               │             │─9.Claude───────▶│
  │                   │                   │               │             │ post-processing │
  │◀─────────────────────────────────────────────────────────────────10.Real-time sync────│
```

**Step-by-step:**
1. **Client**: connects to the STT Gateway over WebSocket
2. **STT Gateway**: starts an Azure Speech Services streaming session
3. **Azure Speech**: responds that the session is ready
4. **Client**: streams captured audio in real time (MediaRecorder in this design; the sample in 5.1 captures PCM through the Web Audio API)
5. **STT Gateway**: forwards the audio stream to Azure Speech
6. **Azure Speech**: real-time transcription + speaker identification (< 1 second latency)
7. **STT Gateway**: publishes a `TextTranscribed` event to RabbitMQ
8. **AI Service**: subscribes to RabbitMQ and receives the event
9. **AI Service**: post-processes the text with the Claude API (structuring, summarization)
10. **WebSocket Server**: syncs the result to all participants in real time

### 4.2 Speaker Identification Flow

```
Azure Speech         STT Gateway         Redis Cache       Participants DB
      │                   │                   │                   │
      │─1.Speaker_1──────▶│                   │                   │
      │ recognition result│                   │                   │
      │                   │─2.Look up mapping▶│                   │
      │                   │◀─3.No mapping─────│                   │
      │                   │                   │                   │
      │                   │────────4.Get participants────────────▶│
      │                   │◀───────5.Participant list─────────────│
      │                   │                   │                   │
      │                   │─6.Match by voice──│                   │
      │                   │ pattern           │                   │
      │                   │                   │                   │
      │                   │─7.Save Speaker_1─▶│                   │
      │                   │ = USR_001         │                   │
```

**Speaker matching strategy:**
1. **First utterance**: match the labels provided by Azure (Speaker_1, Speaker_2, ...) against the participant list
2. **Voice pattern analysis**: estimate from speaking order, speaking frequency, and voice characteristics
3. **Redis caching**: cache the match so later utterances reuse it
4. **Manual correction**: users can assign the speaker manually

---

## 5. Implementation Details

### 5.1 Frontend (React)

#### **Audio Capture and WebSocket Streaming**
```javascript
// Real-time audio streaming to the STT Gateway (backed by Azure Speech)
class AzureSpeechRecorder {
  constructor(meetingId, speakerId) {
    this.meetingId = meetingId;
    this.speakerId = speakerId;
    this.sessionId = null;
    this.sequence = 0;
    this.ws = null;
    this.stream = null;
    this.audioContext = null;
  }

  async start() {
    // Create the session ID once so every later message can reference it
    this.sessionId = `SESSION_${Date.now()}`;

    // Open the WebSocket connection (use wss:// outside local development)
    this.ws = new WebSocket('ws://localhost:3001/api/stt/stream');

    this.ws.onopen = () => {
      // Request a new session
      this.ws.send(JSON.stringify({
        type: 'session_start',
        session_id: this.sessionId,
        meeting_id: this.meetingId,
        config: {
          language: 'ko-KR',
          sample_rate: 16000,
          format: 'audio/wav',
          enable_diarization: true,
          max_speakers: 10
        }
      }));
    };

    this.ws.onmessage = (event) => {
      const message = JSON.parse(event.data);
      if (message.type === 'session_ready') {
        this.startRecording();
      }
    };

    this.ws.onerror = (error) => {
      console.error('WebSocket error:', error);
    };
  }

  async startRecording() {
    this.stream = await navigator.mediaDevices.getUserMedia({
      audio: {
        sampleRate: 16000, // recommended for Azure
        channelCount: 1,
        echoCancellation: true,
        noiseSuppression: true,
        autoGainControl: true
      }
    });

    // Convert to PCM with an AudioContext.
    // Note: createScriptProcessor is deprecated; AudioWorklet is the modern replacement.
    this.audioContext = new AudioContext({ sampleRate: 16000 });
    const source = this.audioContext.createMediaStreamSource(this.stream);
    const processor = this.audioContext.createScriptProcessor(4096, 1, 1);

    processor.onaudioprocess = (e) => {
      if (!this.ws || this.ws.readyState !== WebSocket.OPEN) return;
      const audioData = e.inputBuffer.getChannelData(0);

      // Float32 PCM -> Int16 PCM
      const int16Array = new Int16Array(audioData.length);
      for (let i = 0; i < audioData.length; i++) {
        int16Array[i] = Math.max(-32768, Math.min(32767, audioData[i] * 32768));
      }

      // Base64-encode and send
      this.ws.send(JSON.stringify({
        type: 'audio_chunk',
        session_id: this.sessionId,
        audio_data: this.arrayBufferToBase64(int16Array.buffer),
        timestamp: new Date().toISOString(),
        sequence: this.sequence++
      }));
    };

    source.connect(processor);
    processor.connect(this.audioContext.destination);
  }

  arrayBufferToBase64(buffer) {
    let binary = '';
    const bytes = new Uint8Array(buffer);
    for (let i = 0; i < bytes.byteLength; i++) {
      binary += String.fromCharCode(bytes[i]);
    }
    return btoa(binary);
  }

  stop() {
    if (this.ws) {
      if (this.ws.readyState === WebSocket.OPEN) {
        this.ws.send(JSON.stringify({
          type: 'session_end',
          session_id: this.sessionId
        }));
      }
      this.ws.close();
    }
    if (this.stream) {
      // Release the microphone
      this.stream.getTracks().forEach((track) => track.stop());
    }
    if (this.audioContext) {
      this.audioContext.close();
    }
  }
}
```

#### **WebSocket Real-Time Reception**
```javascript
class TranscriptionWebSocket {
  constructor(meetingId, onTranscription) {
    this.meetingId = meetingId;
    this.onTranscription = onTranscription;
    this.ws = null;
    this.shouldReconnect = true;
  }

  connect() {
    this.shouldReconnect = true;
    this.ws = new WebSocket(`ws://localhost:8080/ws/meetings/${this.meetingId}`);

    this.ws.onmessage = (event) => {
      const message = JSON.parse(event.data);
      if (message.type === 'transcription_update') {
        this.onTranscription(message.data);
      }
    };

    this.ws.onerror = (error) => {
      console.error('WebSocket error:', error);
    };

    // Reconnect after any unexpected close (an error is always followed by a close)
    this.ws.onclose = () => {
      if (this.shouldReconnect) {
        setTimeout(() => this.connect(), 3000);
      }
    };
  }

  disconnect() {
    this.shouldReconnect = false;
    if (this.ws) {
      this.ws.close();
    }
  }
}
```

### 5.2 Backend (Node.js + Azure Speech SDK)

#### **STT Gateway Service (WebSocket Server)**
```javascript
const WebSocket = require('ws');
const sdk = require('microsoft-cognitiveservices-speech-sdk');
const amqp = require('amqplib');
const redis = require('redis');

const wss = new WebSocket.Server({ port: 3001, path: '/api/stt/stream' });
const redisClient = redis.createClient({ url: process.env.REDIS_URL });
// node-redis v4 clients must be connected before use
redisClient.connect().catch((err) => console.error('Redis connection error:', err));

// Azure Speech settings
const AZURE_SPEECH_KEY = process.env.AZURE_SPEECH_KEY;
const AZURE_SPEECH_REGION = process.env.AZURE_SPEECH_REGION; // e.g., 'koreacentral'

// Session store: sessionId -> { transcriber, pushStream, meetingId }
const sessions = new Map();

wss.on('connection', (ws) => {
  console.log('Client connected');
  let sessionId = null;

  // Stop transcription and release the audio stream for this connection
  const endSession = () => {
    const session = sessions.get(sessionId);
    if (!session) return;
    session.pushStream.close();
    session.transcriber.stopTranscribingAsync();
    sessions.delete(sessionId);
  };

  ws.on('message', async (data) => {
    try {
      const message = JSON.parse(data.toString());
      switch (message.type) {
        case 'session_start':
          sessionId = message.session_id;
          await startAzureSpeechSession(ws, sessionId, message.meeting_id, message.config);
          break;
        case 'audio_chunk': {
          // Feed the decoded PCM audio into the session's push stream
          const session = sessions.get(message.session_id);
          if (session) {
            session.pushStream.write(Buffer.from(message.audio_data, 'base64'));
          }
          break;
        }
        case 'session_end':
          endSession();
          break;
      }
    } catch (error) {
      console.error('WebSocket message error:', error);
      ws.send(JSON.stringify({ type: 'error', error: error.message }));
    }
  });

  ws.on('close', () => {
    endSession();
    console.log('Client disconnected');
  });
});

// Start an Azure Speech session
async function startAzureSpeechSession(ws, sessionId, meetingId, config = {}) {
  // Azure Speech SDK configuration
  const speechConfig = sdk.SpeechConfig.fromSubscription(AZURE_SPEECH_KEY, AZURE_SPEECH_REGION);
  speechConfig.speechRecognitionLanguage = config.language || 'ko-KR';
  speechConfig.outputFormat = sdk.OutputFormat.Detailed;
  speechConfig.enableDictation();
  speechConfig.setProfanity(sdk.ProfanityOption.Masked);

  // Audio input (push stream); the SDK default format is 16 kHz, 16-bit, mono PCM
  const pushStream = sdk.AudioInputStream.createPushStream();
  const audioConfig = sdk.AudioConfig.fromStreamInput(pushStream);

  // Conversation Transcriber (includes speaker identification)
  const transcriber = new sdk.ConversationTranscriber(speechConfig, audioConfig);
  sessions.set(sessionId, { transcriber, pushStream, meetingId });

  // Real-time recognition event handler
  transcriber.transcribed = async (s, e) => {
    if (e.result.reason !== sdk.ResultReason.RecognizedSpeech) return;
    try {
      const result = {
        text: e.result.text,
        speaker_id: e.result.speakerId,
        confidence: extractConfidence(e.result),
        offset: e.result.offset,     // 100-ns ticks
        duration: e.result.duration  // 100-ns ticks
      };
      console.log(`[${result.speaker_id}]: ${result.text}`);

      // Speaker matching
      const matchedUser = await matchSpeaker(meetingId, result.speaker_id);

      // Publish the RabbitMQ event
      const now = new Date().toISOString();
      await publishToQueue('text-transcribed', {
        event_type: 'TextTranscribed',
        event_id: `EVT_${Date.now()}`,
        timestamp: now,
        payload: {
          meeting_id: meetingId,
          speaker: {
            id: matchedUser?.user_id || 'Unknown',
            name: matchedUser?.name || result.speaker_id,
            azure_speaker_id: result.speaker_id
          },
          transcription: { text: result.text, confidence: result.confidence },
          timestamp: now
        },
        metadata: { source: 'azure-speech-service', version: '2.0' }
      });

      // Send the result to the client over WebSocket in real time
      ws.send(JSON.stringify({
        type: 'recognition_result',
        session_id: sessionId,
        result_id: `RESULT_${Date.now()}`,
        recognition_status: 'Success',
        text: result.text,
        confidence: result.confidence,
        speaker_id: result.speaker_id,
        matched_user: matchedUser,
        is_final: true,
        timestamp: now
      }));
    } catch (error) {
      console.error('Failed to handle transcription result:', error);
    }
  };

  // Error handler
  transcriber.canceled = (s, e) => {
    console.error(`Recognition canceled: ${e.errorDetails}`);
    ws.send(JSON.stringify({ type: 'error', error: e.errorDetails }));
  };

  // Start recognition
  transcriber.startTranscribingAsync(() => {
    console.log('Azure Speech recognition started');
    ws.send(JSON.stringify({ type: 'session_ready', session_id: sessionId }));
  });
}

// Best-effort confidence lookup. Assumption: the detailed JSON result exposes
// NBest[0].Confidence; returns null when the service does not provide it.
function extractConfidence(result) {
  try {
    const json = result.properties.getProperty(sdk.PropertyId.SpeechServiceResponse_JsonResult);
    const confidence = JSON.parse(json)?.NBest?.[0]?.Confidence;
    return typeof confidence === 'number' ? confidence : null;
  } catch {
    return null;
  }
}

// Speaker matching logic
async function matchSpeaker(meetingId, azureSpeakerId) {
  // Look up an existing match in Redis
  const cacheKey = `meeting:${meetingId}:speaker_mapping:${azureSpeakerId}`;
  const cached = await redisClient.get(cacheKey);
  if (cached) {
    return JSON.parse(cached);
  }

  // New speaker: estimate from the participant list
  // TODO: in practice, match by analyzing utterance patterns, order, etc.
  const participants = await getParticipants(meetingId);

  // Use the trailing number of the label ("Speaker_1"; this also tolerates
  // other label formats such as "Guest-1")
  const index = parseInt((azureSpeakerId || '').match(/(\d+)$/)?.[1], 10);
  if (participants.length === 0 || !(index >= 1)) {
    return null;
  }

  // Simple placeholder strategy: assign in order
  const matchedUser = participants[(index - 1) % participants.length];

  // Cache in Redis
  await redisClient.setEx(cacheKey, 3600, JSON.stringify(matchedUser));
  return matchedUser;
}

// Fetch the participant list
async function getParticipants(meetingId) {
  // TODO: query the real database
  // Temporarily read from Redis
  const data = await redisClient.get(`meeting:${meetingId}:participants`);
  return data ? JSON.parse(data) : [];
}

// RabbitMQ publishing (one shared channel instead of a connection per message)
let channelPromise = null;
function getChannel() {
  if (!channelPromise) {
    channelPromise = amqp.connect(process.env.RABBITMQ_URL)
      .then((connection) => connection.createChannel())
      .catch((error) => { channelPromise = null; throw error; });
  }
  return channelPromise;
}

async function publishToQueue(queueName, message) {
  const channel = await getChannel();
  await channel.assertQueue(queueName, { durable: true });
  channel.sendToQueue(queueName, Buffer.from(JSON.stringify(message)), { persistent: true });
}

console.log('Azure Speech STT Gateway running on port 3001');
```

#### **AI Service (Claude Post-Processing)**
```javascript
const Anthropic = require('@anthropic-ai/sdk');
const amqp = require('amqplib');
const redis = require('redis');

const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
const redisClient = redis.createClient({ url: process.env.REDIS_URL });

// RabbitMQ subscription
async function consumeQueue() {
  const connection = await amqp.connect(process.env.RABBITMQ_URL);
  const channel = await connection.createChannel();
  await channel.assertQueue('text-transcribed', { durable: true });

  channel.consume('text-transcribed', async (msg) => {
    if (!msg) return; // consumer was cancelled by the broker
    try {
      const event = JSON.parse(msg.content.toString());
      await processTranscription(event);
      channel.ack(msg);
    } catch (error) {
      console.error('Failed to process transcription:', error);
      // Reject without requeueing so a bad message cannot loop forever
      channel.nack(msg, false, false);
    }
  });
}

// Post-process the text with Claude
async function processTranscription(event) {
  const { meeting_id, speaker, transcription } = event.payload;

  // Read the existing minutes from Redis
  // Note: KEYS blocks Redis while it scans; prefer SCAN in production
  const sections = await redisClient.keys(`meeting:${meeting_id}:sections:*`);
  const context = sections.length > 0
    ? (await redisClient.get(sections[0])) ?? '(new meeting)'
    : '(new meeting)';

  const prompt = [
    'The following is a statement from a meeting. Organize it in meeting-minutes format.',
    '',
    `Statement: "${transcription.text}"`,
    `Speaker: ${speaker.name}`,
    `Time: ${event.payload.timestamp}`,
    '',
    'Existing minutes:',
    context,
    '',
    'Template sections: Agenda, Discussion, Decisions, Todo'
  ].join('\n');

  // Call the Claude API
  const message = await anthropic.messages.create({
    model: 'claude-3-5-sonnet-20241022',
    max_tokens: 2048,
    temperature: 0.3,
    system: 'You are an expert at writing meeting minutes. You structure what was said and summarize it clearly and concisely.',
    messages: [{ role: 'user', content: prompt }]
  });

  const structuredContent = message.content[0].text;

  // Store the updated content in Redis
  await redisClient.setEx(`meeting:${meeting_id}:sections:discussion`, 3600, structuredContent);

  // Real-time sync over WebSocket
  await broadcastToWebSocket(meeting_id, {
    type: 'transcription_update',
    data: {
      meeting_id,
      speaker,
      transcription: {
        text: structuredContent,
        is_final: true,
        confidence: transcription.confidence
      },
      target_section: 'discussion',
      action: 'append'
    }
  });
}

// WebSocket broadcast
async function broadcastToWebSocket(meetingId, message) {
  // Send the message to the WebSocket server (not yet implemented)
  // In practice: Redis Pub/Sub or integration with a separate WebSocket server
}

// Start the service
(async () => {
  await redisClient.connect();
  await consumeQueue();
  console.log('AI Service started');
})();
```

---

## 6. Error Handling and Recovery Strategy

### 6.1 Error Scenarios

| Scenario | Detection | Response |
|----------|-----------|----------|
| Azure Speech outage | SDK error callback | Automatic reconnect (exponential backoff), save the recording locally |
| Network disconnection | WebSocket connection lost | Automatic reconnect (up to 5 attempts), client-side local storage |
| Low confidence | score < 0.7 | Show a warning to the user, recommend manual correction |
| Speaker identification failure | Speaker_Unknown | Display as "Unassigned speaker", provide a manual assignment interface |
| RabbitMQ outage | Message publish failure | After 3 retries, store temporarily in Redis; manual recovery |
| Azure API quota exceeded | 429 Too Many Requests | Warning alert, recommend pausing the meeting |

### 6.2 Retry Logic

```javascript
async function retryWithExponentialBackoff(fn, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt === maxRetries - 1) throw error;
      const delay = Math.pow(2, attempt) * 1000; // 1s, 2s, 4s, ... (no wait after the final attempt)
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
}
```

---

## 7. Performance and Scalability

### 7.1 Performance Targets

| Metric | Target | Measurement |
|--------|--------|-------------|
| **STT latency** | **< 1 second** | Start of utterance → shown on screen (Azure real-time streaming) |
| WebSocket latency | < 100 ms | Message published → received by the client |
| Claude API response | < 2 seconds | API call → response received |
| Concurrent meetings | 100 | Azure Speech concurrent-session load test |
| Speaker identification accuracy | > 85% | Speaker Diarization accuracy |

### 7.2 Scalability Strategy
- **Horizontal scaling**: distribute the STT Gateway WebSocket server across multiple instances (Load Balancer)
- **Caching**: cache speaker matching information in Redis to avoid repeated processing
- **Queue partitioning**: partition RabbitMQ by meeting ID
- **Azure resource management**: monitor Azure Speech Services quota and scale automatically
- **CDN**: use S3 + CloudFront when storing audio files

---

## 8. Security and Privacy

### 8.1 Security Requirements
- **Transport encryption**: HTTPS/WSS
- **Authentication/authorization**: JWT-based meeting access control
- **Audio data protection**: recordings stored encrypted (AES-256)
- **Personal data handling**: GDPR compliance, audio retention limited to 30 days

### 8.2 Data Lifecycle

```
Recording starts → Real-time processing → Redis caching (10 minutes)
                                                 ↓
                                    PostgreSQL storage (30 days)
                                                 ↓
                          Automatic deletion (30 days after the meeting ends)
```

---

## 9. Monitoring and Logging

### 9.1 Monitoring Metrics
- **STT success rate**: Azure Speech recognition success rate
- **Average confidence score**: tracks transcription quality
- **Processing latency**: time spent in each stage
- **Error rate**: share of API errors and network errors

### 9.2 Logging Strategy

```javascript
// Structured log
{
  "timestamp": "2025-10-21T14:30:18.000Z",
  "level": "INFO",
  "service": "stt-gateway",
  "event": "transcription_success",
  "request_id": "REQ_001",
  "meeting_id": "MTG_001",
  "provider": "azure-speech",
  "confidence": 0.95,
  "processing_time_ms": 850
}
```

---

## 10. Test Strategy

### 10.1 Unit Tests
- Mocking the Azure Speech SDK integration
- Claude API response parsing
- Redis caching logic
- Speaker matching algorithm

### 10.2 Integration Tests
- Full flow: STT Gateway (Azure Speech) → RabbitMQ → AI Service
- Bidirectional real-time WebSocket sync
- Speaker identification and matching accuracy

### 10.3 Performance Tests
- Simulation of 100 concurrent meetings (Azure Speech concurrent sessions)
- Real-time streaming latency measurement (target: < 1 second)
- Azure API quota and throughput tests

### 10.4 Quality Tests
- STT accuracy across a range of audio-quality conditions
- Speaker identification accuracy (target: > 85%)
- Korean dialect and intonation handling

---

## 11. Implementation Schedule

| Phase | Task | Owner | Estimated duration |
|-------|------|-------|--------------------|
| 1 | Frontend audio capture (WebSocket) | 최유진 | 4 days |
| 2 | Azure Speech SDK integration and STT Gateway development | 이준호 | 6 days |
| 3 | Speaker identification and matching logic | 박서연 | 3 days |
| 4 | RabbitMQ setup and event handling | 이동욱 | 3 days |
| 5 | AI Service (Claude integration) | 박서연 | 4 days |
| 6 | Redis caching | 이준호 | 2 days |
| 7 | Bidirectional real-time WebSocket sync | 최유진 | 4 days |
| 8 | Integration testing and speaker identification verification | 정도현 | 6 days |
| 9 | Performance optimization and Azure resource tuning | All | 3 days |

**Total estimated duration**: 35 days (about 5 weeks)

---

## 12. References

### Azure Speech Services
- [Azure Speech SDK Documentation](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/)
- [Conversation Transcription (speaker identification)](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/conversation-transcription)
- [Real-time Speech-to-Text](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/get-started-speech-to-text)
- [Azure Speech Pricing](https://azure.microsoft.com/en-us/pricing/details/cognitive-services/speech-services/)

### Other References
- [Anthropic Claude API](https://docs.anthropic.com/claude/reference/messages_post)
- [RabbitMQ official documentation](https://www.rabbitmq.com/documentation.html)
- [Redis Caching Best Practices](https://redis.io/docs/manual/patterns/)
- [WebSocket API](https://developer.mozilla.org/en-US/docs/Web/API/WebSocket)
- [Web Audio API](https://developer.mozilla.org/en-US/docs/Web/API/Web_Audio_API)

---

## 13. Change History

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0 | 2025-10-21 | Entire development team | Initial draft (Whisper + Google hybrid strategy) |
| 2.0 | 2025-10-21 | Entire development team | **Switched entirely to a single Azure Speech Services strategy**<br>- STT engine: Whisper → Azure Speech Services<br>- Real-time streaming adopted (latency < 1 second)<br>- Speaker Diarization supported by default<br>- Fallback strategy removed (Azure only)<br>- Implementation code fully revised (Frontend/Backend)<br>- Implementation schedule adjusted (4 weeks → 5 weeks) |

---

**Document approval:**
- AI Specialist: 박서연
- Backend Developer: 이준호, 이동욱
- Frontend Developer: 최유진
- Architect: 홍길동
- QA Engineer: 정도현
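
As a supplement to the error scenarios in section 6.1, the low-confidence and unidentified-speaker rows can be sketched as a small annotation step applied before a result is shown to participants. This is only an illustrative sketch: `annotateTranscription`, the `warnings` array, and the `display_speaker` field are hypothetical names that do not appear elsewhere in this design, and the input is assumed to have the shape of the `recognition_result` message with its `matched_user` field.

```javascript
// Hypothetical helper (not part of the original design): applies the
// thresholds from the error-scenario table in section 6.1.
const LOW_CONFIDENCE_THRESHOLD = 0.7; // "score < 0.7" row

function annotateTranscription(result) {
  const warnings = [];

  // Low confidence: flag it so the UI can warn and suggest manual correction
  if (typeof result.confidence === 'number' && result.confidence < LOW_CONFIDENCE_THRESHOLD) {
    warnings.push('low_confidence');
  }

  // Speaker identification failed: show the utterance under an unassigned speaker
  const speakerUnknown = !result.matched_user;
  if (speakerUnknown) {
    warnings.push('speaker_unassigned');
  }

  return {
    ...result,
    display_speaker: speakerUnknown ? 'Unassigned speaker' : result.matched_user.name,
    warnings
  };
}

// Usage: a low-confidence result with no matched participant
const annotated = annotateTranscription({
  text: '회의를 시작하겠습니다.',
  confidence: 0.65,
  matched_user: null
});
console.log(annotated.display_speaker, annotated.warnings);
```

Keeping the policy in one pure function would let the gateway and the frontend share the same thresholds, and it can be unit-tested without Azure, Redis, or RabbitMQ.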