hgzero/design/backend/sequence/inner/ai-전문용어감지.puml

@startuml
!theme mono

title AI Service 내부 시퀀스 - 전문용어감지

participant "TermController" as Controller
participant "TermDetectionService" as Service
participant "LLMClient" as LLM
participant "TermRepository" as Repo
database "Azure OpenAI<<E>>" as OpenAI
database "PostgreSQL<<E>>" as DB

== 회의록 텍스트 실시간 분석 요청 ==

note over Controller
  API 요청 (실시간 또는 배치):
  POST /api/ai/terms/detect
  Body: {
    "meetingId": "{meetingId}",
    "text": "회의록 텍스트"
  }
end note

Controller -> Service: detectTerms(meetingId, text)
activate Service

== 용어 사전 조회 ==

par "조직별 용어 사전"
    Service -> Repo: getOrganizationTerms(organizationId)
    activate Repo

    Repo -> DB: SELECT term, definition, category\nFROM term_dictionary\nWHERE org_id = {organizationId}
    activate DB

    DB --> Repo: 조직 전문용어 목록
    deactivate DB

    Repo --> Service: orgTerms
    deactivate Repo

and "산업별 표준 용어"
    Service -> Repo: getIndustryTerms(industry)
    activate Repo

    Repo -> DB: SELECT term, definition, category\nFROM standard_terms\nWHERE industry = {industry}
    activate DB

    DB --> Repo: 산업 표준용어 목록
    deactivate DB

    Repo --> Service: industryTerms
    deactivate Repo
end

Service -> Service: 용어 사전 병합 및 준비
note right
  용어 사전:
  - 조직별 용어 (우선순위 높음)
  - 산업별 표준 용어
  - 기술 용어
end note

== LLM 기반 전문용어 감지 ==

Service -> Service: 용어 감지 프롬프트 생성
note right
  시스템 프롬프트:
  - 역할: 전문용어 감지 전문가
  - 지시사항:
    * 텍스트에서 전문용어 탐지
    * 용어 사전과 비교
    * 신뢰도 점수 계산
    * 위치 정보 추출

  사용자 프롬프트:
  - 분석 대상 텍스트: {text}
  - 용어 사전: {termDictionary}

  응답 형식:
  {
    "detectedTerms": [
      {
        "term": "용어명",
        "position": {line, offset},
        "confidence": 0.0-1.0,
        "category": "기술|업무|도메인"
      }
    ]
  }
end note

Service -> LLM: detectTechnicalTerms(prompt, text, termDictionary)
activate LLM

LLM -> OpenAI: POST /chat/completions
activate OpenAI
note right
  요청 파라미터:
  - model: gpt-4o
  - temperature: 0.1
  - response_format: json_object
end note

OpenAI -> OpenAI: 텍스트 분석 및 용어 감지
note right
  처리 단계:
  1. 텍스트 토큰화
  2. 용어 사전과 매칭
  3. 문맥 기반 용어 식별
  4. 신뢰도 계산
     - 정확한 매칭: 0.9-1.0
     - 변형 매칭: 0.7-0.9
     - 문맥 기반: 0.7-0.8
  5. 위치 정보 추출
end note

OpenAI --> LLM: 감지된 용어 목록 (JSON)
deactivate OpenAI

LLM --> Service: detectedTerms
deactivate LLM

== 용어 필터링 및 검증 ==

Service -> Service: 신뢰도 기반 필터링
note right
  필터링 기준:
  - 신뢰도 70% 이상만 선택
  - 중복 용어 제거
    (첫 번째 출현만 유지)
  - 카테고리별 분류
end note

loop 각 감지된 용어마다

    Service -> Service: 용어 메타데이터 보강
    note right
      추가 정보:
      - 용어 정의 (사전에서)
      - 카테고리
      - 사용 빈도
      - 관련 문서 참조
    end note

end

== 감지 결과 저장 ==

Service -> Repo: saveDetectedTerms(meetingId, detectedTerms)
activate Repo

loop 각 용어마다

    Repo -> DB: INSERT INTO detected_terms
    activate DB
    note right
      저장 데이터:
      - meeting_id
      - term
      - position (JSON)
      - confidence_score
      - category
      - detected_at
      - status: DETECTED
    end note

    DB --> Repo: termId
    deactivate DB

end

Repo --> Service: 저장 완료
deactivate Repo

== 하이라이트 정보 생성 ==

Service -> Service: 하이라이트 데이터 구성
note right
  프론트엔드 전달 정보:
  - 용어 위치 (줄 번호, 오프셋)
  - 하이라이트 스타일
  - 툴팁 텍스트
  - 신뢰도 표시
end note

== 맥락 기반 설명 트리거 ==

Service -> Service: 용어 설명 생성 트리거
note right
  비동기로 용어 설명 생성 시작
  (UFR-RAG-020 연동)

  각 감지된 용어에 대해:
  - RAG 검색 수행
  - 맥락 기반 설명 생성
end note

== 응답 반환 ==

Service -> Service: 응답 데이터 구성
note right
  응답 데이터:
  - detectedTerms: [
      {
        "term": "용어명",
        "position": {line, offset},
        "confidence": 0.85,
        "category": "기술",
        "highlight": true
      }
    ]
  - totalCount: 감지된 용어 수
  - highlightInfo: 하이라이트 정보
end note

Service --> Controller: 감지 완료 응답
deactivate Service

Controller --> Controller: 200 OK 응답 반환
note right
  프론트엔드 처리:
  - 용어 하이라이트 표시
  - 툴팁 준비
  - 설명 로딩 중 표시
end note

note over Controller, DB
처리 시간:
- 용어 사전 조회: 100-200ms
- LLM 용어 감지: 2-4초
- 필터링 및 검증: 100-200ms
- 저장 처리: 200-300ms
총 처리 시간: 약 3-5초

정책:
- 신뢰도 70% 이상만 자동 감지
- 중복 용어는 첫 번째만 하이라이트
- 맥락 기반 설명은 비동기 생성
end note

@enduml