
Data Models - Smart Transcription

Version: 1.0.0
Date: March 11, 2026
Status: Production


Table of Contents

  1. Database Architecture
  2. EnrichedSegment Model
  3. VoiceprintLibrary Model
  4. ContextualFile Model
  5. Transcript Model
  6. Relations & ERD
  7. Real-World Examples

1. Database Architecture

1.1 Two PostgreSQL Schemas

graph TB
    subgraph "Smart Transcription BFF - Schema: st"
        T[Transcripts]
        E[EnrichedSegments]
        V[VoiceprintLibrary]
        C[ContextualFiles]
        U[Users]
        T --> E
        T --> V
        T --> C
        U --> T
        U --> V
    end
    subgraph "MeetNoo GPU - Schema: meetnoo"
        P[Pipelines]
        S[Segments]
        SP[Speakers]
        VP[Voiceprints]
        P --> S
        P --> SP
        P --> VP
    end
    style T fill:#06b6d4,stroke:#fff,color:#fff
    style E fill:#f97316,stroke:#fff,color:#fff
    style V fill:#a78bfa,stroke:#fff,color:#fff

Separation:
- Schema st.* → BFF (auth, RAG, enrichment)
- Schema meetnoo.* → GPU service (pipeline, transcription)

Communication:
- BFF → read-only access to MeetNoo (queries pipeline results)
- MeetNoo → no access to BFF (zero coupling back)

1.2 BFF Tables (st.*)

Table                    Description                         Rows (prod)
users                    User accounts                       150
transcripts              Transcriptions (status, progress)   1,200
enriched_segments        RAG + LLM enriched segments         45,000
voiceprint_library       Voiceprint library                  600
contextual_files         RAG documents (PDF, DOCX)           4,500
voiceprint_chat_history  Chat conversations                  800

1.3 Legacy → v3 Migration

-- BEFORE (Legacy)
CREATE TABLE segments (
    id VARCHAR PRIMARY KEY,
    transcript_id VARCHAR,
    speaker VARCHAR,              -- "SPEAKER_00"
    start_time FLOAT,
    end_time FLOAT,
    transcription TEXT,
    created_at VARCHAR
);

-- AFTER (Architecture v3)
CREATE TABLE enriched_segments (
    id VARCHAR PRIMARY KEY,
    transcript_id VARCHAR,
    speaker_label VARCHAR,        -- "SPEAKER_00"
    identified_name VARCHAR,      -- "Jean Dupont" (NEW)
    match_source VARCHAR,         -- "voiceprint_audio" | "rag" | "llm" (NEW)

    start_time FLOAT,
    end_time FLOAT,
    transcription TEXT,

    -- NEW: RAG enrichment metadata
    rag_context JSONB,

    -- NEW: Additional metadata
    metadata JSONB,

    -- NEW: Voiceprint linking
    voiceprint_lib_id VARCHAR,

    created_at VARCHAR,
    updated_at VARCHAR
);

Key Changes:
- identified_name field (vs manual identification)
- match_source tracking (voiceprint/rag/llm)
- rag_context JSONB (email, phone, role, company)
- voiceprint_lib_id FK (link to voiceprint library)
- metadata JSONB (extensible custom data)
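The column mapping above can be sketched as a row transform. `migrate_legacy_segment` below is a hypothetical helper (not part of the codebase) showing how a legacy row would acquire the v3 defaults before the enrichment passes fill them in:

```python
from datetime import datetime, timezone

def migrate_legacy_segment(legacy: dict) -> dict:
    """Map one legacy `segments` row to the v3 `enriched_segments` shape.

    New fields start in the unidentified state; the voiceprint / RAG / LLM
    passes populate them later.
    """
    now = str(int(datetime.now(timezone.utc).timestamp()))
    return {
        "id": legacy["id"],
        "transcript_id": legacy["transcript_id"],
        "speaker_label": legacy["speaker"],   # renamed: speaker -> speaker_label
        "identified_name": None,              # NEW: filled by identification
        "match_source": "unknown",            # NEW: no identification source yet
        "start_time": legacy["start_time"],
        "end_time": legacy["end_time"],
        "transcription": legacy["transcription"],
        "rag_context": None,                  # NEW: filled by RAG enrichment
        "metadata": {},                       # NEW: extensible metadata
        "voiceprint_lib_id": None,            # NEW: linked on voiceprint match
        "created_at": legacy["created_at"],
        "updated_at": now,
    }
```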


2. EnrichedSegment Model

2.1 SQLAlchemy Definition

# Imports required by the model definitions below
from sqlalchemy import Column, DateTime, Float, ForeignKey, Integer, String, Text, text
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import relationship

class EnrichedSegment(Base):
    __tablename__ = "enriched_segments"

    # Primary Key
    id = Column(String, primary_key=True, index=True)

    # Foreign Keys
    transcript_id = Column(
        String,
        ForeignKey("transcripts.id", ondelete="CASCADE"),
        nullable=False,
        index=True
    )
    voiceprint_lib_id = Column(
        String,
        ForeignKey("voiceprint_library.id", ondelete="SET NULL"),
        nullable=True,
        index=True
    )

    # Speaker Information
    speaker_label = Column(String, nullable=False)      # "SPEAKER_00"
    identified_name = Column(String, nullable=True)     # "Jean Dupont" or NULL
    match_source = Column(
        String,
        nullable=True,
        comment="voiceprint_audio | rag | llm_inference | unknown"
    )
    confidence_score = Column(Float, nullable=True)     # 0.0-1.0

    # Timing
    start_time = Column(Float, nullable=False)          # Seconds
    end_time = Column(Float, nullable=False)            # Seconds

    # Transcription
    transcription = Column(Text, nullable=False)        # Raw text
    cleaned_transcription = Column(Text, nullable=True) # LLM cleaned

    # RAG Context (JSONB)
    rag_context = Column(
        JSONB,
        nullable=True,
        comment="RAG enrichment: {role, email, phone, company, department, keywords}"
    )

    # Additional Metadata (JSONB)
    # "metadata" is a reserved attribute name on SQLAlchemy declarative
    # classes, so the attribute is renamed and mapped explicitly to the
    # "metadata" column.
    segment_metadata = Column(
        "metadata",
        JSONB,
        nullable=True,
        default=dict,
        server_default=text("'{}'::jsonb"),
        comment="Extensible metadata: {language, sentiment, topics, etc.}"
    )

    # Timestamps
    created_at = Column(String, nullable=False)
    updated_at = Column(String, nullable=False)

    # Relationships
    transcript = relationship("Transcript", back_populates="enriched_segments")
    voiceprint = relationship("VoiceprintLibrary", back_populates="enriched_segments")

2.2 Field Details

Field                  Type         Nullable  Description
id                     String       No        Segment UUID
transcript_id          String (FK)  No        Parent transcript
voiceprint_lib_id      String (FK)  Yes       Linked voiceprint (if matched)
speaker_label          String       No        GPU label: "SPEAKER_00" to "SPEAKER_XX"
identified_name        String       Yes       Real name: "Jean Dupont" (NULL if unidentified)
match_source           String       Yes       Identification source: voiceprint_audio | rag | llm_inference | unknown
confidence_score       Float        Yes       Confidence score (0.0-1.0)
start_time             Float        No        Segment start (seconds from start of audio)
end_time               Float        No        Segment end (seconds)
transcription          Text         No        Raw GPU text (no punctuation)
cleaned_transcription  Text         Yes       LLM-corrected text (with punctuation)
rag_context            JSONB        Yes       RAG context: {role, email, phone, company, department, keywords}
metadata               JSONB        Yes       Extensible metadata: {language, sentiment, topics}
created_at             String       No        Unix timestamp (creation)
updated_at             String       No        Unix timestamp (last update)

2.3 rag_context JSONB Structure

{
  "role": "Senior Diplomat",
  "email": "kwame.mensah@onu.org",
  "phone": "+1-555-0123",
  "company": "Organisation des Nations Unies (ONU)",
  "department": "Relations Internationales",
  "keywords": [
    "évaluation",
    "politiques publiques",
    "stratégie nationale"
  ],
  "glossary_terms": [
    "TF-IDF",
    "RAG"
  ],
  "rag_score": 0.73,
  "rag_source_chunks": [
    "chunk-001",
    "chunk-003"
  ]
}

Use cases:

# Query segments whose RAG context includes an email
segments_with_email = db.query(EnrichedSegment).filter(
    EnrichedSegment.rag_context["email"].astext.isnot(None)
).all()

# Query segments for a specific company
onu_segments = db.query(EnrichedSegment).filter(
    EnrichedSegment.rag_context["company"].astext.like("%ONU%")
).all()

2.4 metadata JSONB Structure

{
  "language": "fr",
  "sentiment": "neutral",
  "topics": [
    "backend development",
    "RAG workflow"
  ],
  "duration_seconds": 12.5,
  "word_count": 45,
  "speaking_rate": 3.6
}
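The derived fields in this structure (duration, word count, speaking rate) follow directly from the segment's text and timing. A minimal sketch, with `compute_segment_metadata` as a hypothetical helper, not project code:

```python
def compute_segment_metadata(transcription: str, start_time: float,
                             end_time: float, language: str = "fr") -> dict:
    """Derive the metadata JSONB fields from a segment's text and timing."""
    duration = round(end_time - start_time, 1)
    words = transcription.split()
    return {
        "language": language,
        "duration_seconds": duration,
        "word_count": len(words),
        # Words per second (e.g. 45 words / 12.5 s = 3.6, as in the example)
        "speaking_rate": round(len(words) / duration, 1) if duration else None,
    }
```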

3. VoiceprintLibrary Model

3.1 SQLAlchemy Definition

class VoiceprintLibrary(Base):
    __tablename__ = "voiceprint_library"

    # Primary Key
    id = Column(String, primary_key=True, index=True)

    # Foreign Keys
    user_id = Column(
        String,
        ForeignKey("users.id", ondelete="CASCADE"),
        nullable=False,
        index=True
    )
    transcript_id = Column(
        String,
        ForeignKey("transcripts.id", ondelete="SET NULL"),
        nullable=True,
        comment="First transcription where speaker appeared"
    )

    # Speaker Information
    speaker_label = Column(
        String,
        nullable=False,
        comment="Original GPU label: SPEAKER_00, SPEAKER_01, etc."
    )
    identified_name = Column(
        String,
        nullable=True,
        index=True,
        comment="Real name: 'Jean Dupont' (NULL if pending)"
    )

    # Voiceprint Embeddings (Dual)
    voiceprint_audio_512d = Column(
        Text,
        nullable=False,
        comment="PyAnnote AI audio embedding (512d JSON array)"
    )
    audio_model = Column(
        String,
        nullable=False,
        default="pyannote-audio",
        comment="Model used for audio voiceprint"
    )

    voiceprint_text_1024d = Column(
        Text,
        nullable=True,
        comment="BGE-M3 text embedding (1024d JSON array)"
    )
    text_model = Column(
        String,
        nullable=True,
        default="BAAI/bge-m3",
        comment="Model used for text voiceprint"
    )

    # Status & Tracking
    status = Column(
        String,
        nullable=False,
        default="pending",
        comment="pending | confirmed | rejected"
    )
    match_source = Column(
        String,
        nullable=True,
        comment="voiceprint_audio | rag | llm_inference | manual | unknown"
    )

    # Contact Metadata
    email = Column(String, nullable=True)
    phone = Column(String, nullable=True)
    company = Column(String, nullable=True)
    role = Column(String, nullable=True)
    department = Column(String, nullable=True)

    # Extensible Metadata (JSONB)
    additional_metadata = Column(
        JSONB,
        nullable=True,
        default=dict,
        server_default=text("'{}'::jsonb")
    )

    # Tracking
    first_seen_at = Column(DateTime, nullable=True)
    last_seen_at = Column(DateTime, nullable=True)
    total_occurrences = Column(Integer, nullable=False, default=1)

    # Timestamps
    created_at = Column(String, nullable=False)
    updated_at = Column(String, nullable=False)

    # Relationships
    user = relationship("User", back_populates="voiceprint_library")
    transcript = relationship("Transcript", back_populates="voiceprints")
    enriched_segments = relationship(
        "EnrichedSegment",
        back_populates="voiceprint"
        # No delete-orphan cascade: the FK is ON DELETE SET NULL, so deleting
        # a voiceprint unlinks its segments instead of deleting them.
    )

3.2 Field Details

Field                  Type         Description
voiceprint_audio_512d  Text (JSON)  PyAnnote audio embedding (512 dimensions)
voiceprint_text_1024d  Text (JSON)  BGE-M3 text embedding (1024 dimensions)
status                 String       pending (auto-saved) | confirmed (identified) | rejected (false match)
match_source           String       Identification source: voiceprint_audio | rag | llm_inference | manual
identified_name        String       Real name (NULL while pending)
first_seen_at          DateTime     First occurrence (first transcription)
last_seen_at           DateTime     Most recent occurrence
total_occurrences      Integer      Number of transcriptions where the speaker appears
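On a re-match, the tracking fields update together. `record_occurrence` below is an illustrative helper (the name and dict-based shape are assumptions, not project code):

```python
from datetime import datetime, timezone

def record_occurrence(voiceprint: dict, seen_at=None) -> dict:
    """Update tracking fields when a stored voiceprint matches again."""
    seen_at = seen_at or datetime.now(timezone.utc)
    # first_seen_at is set once, on the first transcription
    if voiceprint.get("first_seen_at") is None:
        voiceprint["first_seen_at"] = seen_at
    # last_seen_at and total_occurrences advance on every match
    voiceprint["last_seen_at"] = seen_at
    voiceprint["total_occurrences"] = voiceprint.get("total_occurrences", 0) + 1
    return voiceprint
```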

3.3 Dual Embeddings Strategy

# Audio Voiceprint (512d - PyAnnote AI)
voiceprint_audio = [0.123, 0.456, ..., 0.789]  # 512 floats
# Used for: Audio-based matching (biometric)

# Text Voiceprint (1024d - BGE-M3)
voiceprint_text = [0.234, 0.567, ..., 0.890]  # 1024 floats
# Used for: Text-based matching (semantic context)

Matching Strategy:

# Priority 1: Audio match (biometric)
audio_similarity = cosine_similarity(
    new_voiceprint_512d,
    stored.voiceprint_audio_512d
)

if audio_similarity > 0.85:
    match = True

# Priority 2: Text match (semantic fallback)
else:
    text_similarity = cosine_similarity(
        mean_pooled_segments_1024d,
        stored.voiceprint_text_1024d
    )

    if text_similarity > 0.75:
        match = True  # Less strict threshold
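The pseudocode above can be made runnable. This is a simplified, self-contained sketch: plain-Python cosine similarity, and the `voiceprint_text` return tag is illustrative only (the stored `match_source` values are the ones listed in 2.2):

```python
import math

def cosine_similarity(a: list, b: list) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def match_voiceprint(new_audio, stored_audio, new_text=None, stored_text=None,
                     audio_threshold=0.85, text_threshold=0.75):
    """Two-tier match: biometric audio first, semantic text as fallback."""
    # Priority 1: audio match (strict threshold)
    if cosine_similarity(new_audio, stored_audio) > audio_threshold:
        return True, "voiceprint_audio"
    # Priority 2: text match (less strict threshold)
    if new_text is not None and stored_text is not None:
        if cosine_similarity(new_text, stored_text) > text_threshold:
            return True, "voiceprint_text"
    return False, None
```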

3.4 Status Lifecycle

stateDiagram-v2
    [*] --> pending: Auto-save (Priority 1 fail)
    pending --> confirmed: LLM identifies (confidence > 0.75)
    pending --> confirmed: Manual identification (user)
    pending --> rejected: False match (user correction)
    confirmed --> confirmed: Re-match (next transcriptions)
    rejected --> [*]: Delete
    note right of pending
        identified_name = NULL
        match_source = "unknown"
    end note
    note right of confirmed
        identified_name = "Jean Dupont"
        match_source = "llm_inference"
    end note
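The lifecycle's legal transitions can be enforced with a small lookup table. A hypothetical validator sketch (names are assumptions):

```python
# Allowed status transitions, taken from the lifecycle diagram
ALLOWED_TRANSITIONS = {
    "pending": {"confirmed", "rejected"},
    "confirmed": {"confirmed"},  # re-match on later transcriptions
    "rejected": set(),           # terminal: the row is deleted
}

def transition_status(current: str, new: str) -> str:
    """Return the new status, rejecting transitions the diagram forbids."""
    if new not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"Illegal transition: {current} -> {new}")
    return new
```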

4. ContextualFile Model

4.1 SQLAlchemy Definition

class ContextualFile(Base):
    __tablename__ = "contextual_files"

    # Primary Key
    id = Column(String, primary_key=True, index=True)

    # Foreign Keys
    user_id = Column(
        String,
        ForeignKey("users.id", ondelete="CASCADE"),
        nullable=False,
        index=True
    )
    transcript_id = Column(
        String,
        ForeignKey("transcripts.id", ondelete="CASCADE"),
        nullable=False,
        index=True
    )

    # File Information
    filename = Column(String, nullable=False)
    file_type = Column(
        String,
        nullable=False,
        comment="pdf | docx | txt"
    )
    file_size_bytes = Column(Integer, nullable=False)

    # Storage
    s3_key = Column(
        String,
        nullable=False,
        unique=True,
        comment="S3 key: contextual_files/{user_id}/{transcript_id}/{filename}"
    )
    s3_bucket = Column(
        String,
        nullable=False,
        default="meetnoo-storage"
    )

    # Processing Status
    processing_status = Column(
        String,
        nullable=False,
        default="pending",
        comment="pending | processing | indexed | failed"
    )
    indexed_at = Column(DateTime, nullable=True)

    # RAG Metadata
    total_chunks = Column(Integer, nullable=True)
    total_tokens = Column(Integer, nullable=True)

    extraction_metadata = Column(
        JSONB,
        nullable=True,
        comment="Metadata extracted by LLM: participants, project_context, glossary"
    )

    # Error Tracking
    error_message = Column(Text, nullable=True)
    retry_count = Column(Integer, nullable=False, default=0)

    # Timestamps
    created_at = Column(String, nullable=False)
    updated_at = Column(String, nullable=False)

    # Relationships
    user = relationship("User", back_populates="contextual_files")
    transcript = relationship("Transcript", back_populates="contextual_files")

4.2 extraction_metadata JSONB Structure

{
  "participants": [
    {
      "name": "Jean Dupont",
      "role": "Lead Backend Developer",
      "email": "jean.dupont@company.com",
      "phone": "+33 6 12 34 56 78",
      "company": "TechCorp",
      "department": "Engineering",
      "specialties": ["FastAPI", "PostgreSQL", "Docker"]
    }
  ],
  "project_context": {
    "meeting_date": "2026-03-01",
    "project_name": "Smart Transcription v3",
    "key_topics": ["RAG", "Speaker Identification", "LLM Integration"],
    "objectives": [
      "Implement 3-priority speaker identification",
      "Optimize mean pooling accuracy"
    ]
  },
  "glossary": {
    "RAG": "Retrieval-Augmented Generation - AI technique combining search + LLM",
    "BGE-M3": "BAAI General Embedding Model - Multilingual 1024d embeddings",
    "PyAnnote": "Speaker diarization and voiceprint extraction library"
  },
  "extraction_method": "llm_gpt4o_mini",
  "extraction_confidence": 0.92,
  "extraction_time_seconds": 3.5
}

4.3 Processing Lifecycle

stateDiagram-v2
    [*] --> pending: Upload file
    pending --> processing: Start RAG indexing
    processing --> indexed: Success (all chunks in Qdrant)
    processing --> failed: Error (timeout, invalid format)
    failed --> pending: Retry (max 3)
    failed --> [*]: Max retries reached
    indexed --> [*]: Ready for search
    note right of processing
        Extract text →
        LLM metadata extraction →
        Chunk text →
        Generate embeddings →
        Upsert Qdrant
    end note
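The retry branch (failed → pending, max 3) might look like the sketch below; `handle_processing_failure` and the exact retry-count semantics are assumptions, not project code:

```python
MAX_RETRIES = 3

def handle_processing_failure(file: dict, error: str) -> dict:
    """Apply the failure branch of the lifecycle: re-queue up to
    MAX_RETRIES attempts, then mark the file as permanently failed."""
    file["error_message"] = error
    file["retry_count"] = file.get("retry_count", 0) + 1
    if file["retry_count"] < MAX_RETRIES:
        file["processing_status"] = "pending"  # re-queued for indexing
    else:
        file["processing_status"] = "failed"   # max retries reached
    return file
```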

5. Transcript Model

5.1 SQLAlchemy Definition

class Transcript(Base):
    __tablename__ = "transcripts"

    # Primary Key
    id = Column(String, primary_key=True, index=True)

    # Foreign Key
    user_id = Column(
        String,
        ForeignKey("users.id", ondelete="CASCADE"),
        nullable=False,
        index=True
    )

    # File Information
    filename = Column(String, nullable=False)
    audio_duration_seconds = Column(Float, nullable=True)
    audio_format = Column(String, nullable=True, comment="mp3 | wav | m4a")

    # Storage
    s3_key = Column(String, nullable=False, unique=True)
    s3_bucket = Column(String, nullable=False, default="meetnoo-storage")
    s3_url = Column(String, nullable=True)  # Presigned URL (expires 12h)

    # Processing Status
    status = Column(
        String,
        nullable=False,
        default="pending",
        comment="pending | processing | completed | failed"
    )
    progress_percentage = Column(Integer, nullable=False, default=0)
    current_step = Column(String, nullable=True)
    current_step_message = Column(String, nullable=True)

    # MeetNoo GPU Pipeline
    meetnoo_task_id = Column(String, nullable=True, unique=True)
    meetnoo_status = Column(String, nullable=True)

    # Results Summary
    total_speakers = Column(Integer, nullable=True)
    total_segments = Column(Integer, nullable=True)
    identified_speakers = Column(Integer, nullable=True)
    pending_speakers = Column(Integer, nullable=True)

    # Timestamps
    started_at = Column(DateTime, nullable=True)
    completed_at = Column(DateTime, nullable=True)
    created_at = Column(String, nullable=False)
    updated_at = Column(String, nullable=False)

    # Relationships
    user = relationship("User", back_populates="transcripts")
    enriched_segments = relationship(
        "EnrichedSegment",
        back_populates="transcript",
        cascade="all, delete-orphan"
    )
    contextual_files = relationship(
        "ContextualFile",
        back_populates="transcript",
        cascade="all, delete-orphan"
    )
    voiceprints = relationship(
        "VoiceprintLibrary",
        back_populates="transcript"
    )

5.2 Progress Tracking

# Update progress
def update_transcript_progress(
    db: Session,
    transcript_id: str,
    progress: int,
    step: str,
    message: str
):
    transcript = db.get(Transcript, transcript_id)  # Session.get() (SQLAlchemy 1.4+)

    transcript.progress_percentage = progress
    transcript.current_step = step
    transcript.current_step_message = message
    transcript.updated_at = unix_timestamp()

    db.commit()

    # SSE event to frontend
    send_sse_event(
        user_id=transcript.user_id,
        event_type="transcript_progress",
        data={
            "transcript_id": transcript_id,
            "progress": progress,
            "step": step,
            "message": message
        }
    )

6. Relations & ERD

6.1 Entity Relationship Diagram

erDiagram
    USERS ||--o{ TRANSCRIPTS : "owns"
    USERS ||--o{ VOICEPRINT_LIBRARY : "has"
    USERS ||--o{ CONTEXTUAL_FILES : "uploads"
    TRANSCRIPTS ||--o{ ENRICHED_SEGMENTS : "contains"
    TRANSCRIPTS ||--o{ CONTEXTUAL_FILES : "linked_to"
    TRANSCRIPTS ||--o{ VOICEPRINT_LIBRARY : "first_seen_in"
    VOICEPRINT_LIBRARY ||--o{ ENRICHED_SEGMENTS : "identified_by"

    USERS {
        string id PK
        string email
        string name
        string role
        string created_at
    }
    TRANSCRIPTS {
        string id PK
        string user_id FK
        string filename
        float audio_duration_seconds
        string status
        int progress_percentage
        string meetnoo_task_id
        int total_speakers
        int identified_speakers
        string created_at
    }
    ENRICHED_SEGMENTS {
        string id PK
        string transcript_id FK
        string voiceprint_lib_id FK
        string speaker_label
        string identified_name
        string match_source
        float start_time
        float end_time
        text transcription
        jsonb rag_context
        string created_at
    }
    VOICEPRINT_LIBRARY {
        string id PK
        string user_id FK
        string transcript_id FK
        string identified_name
        text voiceprint_audio_512d
        text voiceprint_text_1024d
        string status
        string match_source
        datetime first_seen_at
        datetime last_seen_at
        int total_occurrences
    }
    CONTEXTUAL_FILES {
        string id PK
        string user_id FK
        string transcript_id FK
        string filename
        string file_type
        string s3_key
        string processing_status
        int total_chunks
        jsonb extraction_metadata
        string created_at
    }

6.2 Constraints & Indexes

-- Primary Keys
ALTER TABLE enriched_segments ADD PRIMARY KEY (id);
ALTER TABLE voiceprint_library ADD PRIMARY KEY (id);
ALTER TABLE contextual_files ADD PRIMARY KEY (id);
ALTER TABLE transcripts ADD PRIMARY KEY (id);

-- Foreign Keys with CASCADE
ALTER TABLE enriched_segments
  ADD CONSTRAINT fk_transcript
  FOREIGN KEY (transcript_id) REFERENCES transcripts(id)
  ON DELETE CASCADE;

ALTER TABLE enriched_segments
  ADD CONSTRAINT fk_voiceprint
  FOREIGN KEY (voiceprint_lib_id) REFERENCES voiceprint_library(id)
  ON DELETE SET NULL;

-- Indexes for Performance
CREATE INDEX idx_enriched_segments_transcript ON enriched_segments(transcript_id);
CREATE INDEX idx_enriched_segments_voiceprint ON enriched_segments(voiceprint_lib_id);
CREATE INDEX idx_enriched_segments_speaker_label ON enriched_segments(speaker_label);

CREATE INDEX idx_voiceprint_library_user ON voiceprint_library(user_id);
CREATE INDEX idx_voiceprint_library_status ON voiceprint_library(status);
CREATE INDEX idx_voiceprint_library_name ON voiceprint_library(identified_name);

CREATE INDEX idx_contextual_files_transcript ON contextual_files(transcript_id);
CREATE INDEX idx_contextual_files_user ON contextual_files(user_id);

-- JSONB Indexes (GIN)
CREATE INDEX idx_enriched_segments_rag_context ON enriched_segments USING GIN (rag_context);
CREATE INDEX idx_contextual_files_metadata ON contextual_files USING GIN (extraction_metadata);
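The GIN indexes above accelerate JSONB containment queries (PostgreSQL's `@>` operator). As an illustration of what containment means, `jsonb_contains` approximates the operator's semantics in plain Python (a simplified sketch that ignores some array/scalar edge cases):

```python
def jsonb_contains(doc, query) -> bool:
    """Approximate PostgreSQL JSONB containment (doc @> query):
    every key/value in `query` must appear in `doc`, recursively."""
    if isinstance(query, dict):
        return isinstance(doc, dict) and all(
            k in doc and jsonb_contains(doc[k], v) for k, v in query.items()
        )
    if isinstance(query, list):
        # Each element of the query array must be contained in some doc element
        return isinstance(doc, list) and all(
            any(jsonb_contains(d, q) for d in doc) for q in query
        )
    return doc == query
```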

7. Real-World Examples

7.1 EnrichedSegment - Kwame Mensah (Identified)

{
  "id": "es-001",
  "transcript_id": "trans-abc123",
  "voiceprint_lib_id": "vp-kwame-001",

  "speaker_label": "SPEAKER_00",
  "identified_name": "Kwame Mensah",
  "match_source": "voiceprint_audio",
  "confidence_score": 1.0,

  "start_time": 12.5,
  "end_time": 25.3,

  "transcription": "bonjour je suis kwame mensah senior diplomat a l'onu",
  "cleaned_transcription": "Bonjour, je suis Kwame Mensah, Senior Diplomat à l'ONU.",

  "rag_context": {
    "role": "Senior Diplomat",
    "email": "kwame.mensah@onu.org",
    "phone": "+1-555-0123",
    "company": "Organisation des Nations Unies (ONU)",
    "department": "Relations Internationales",
    "keywords": ["diplomatie", "évaluation", "politiques publiques"],
    "rag_score": 0.73,
    "rag_source_chunks": ["chunk-001", "chunk-003"]
  },

  "metadata": {
    "language": "fr",
    "duration_seconds": 12.8,
    "word_count": 12
  },

  "created_at": "1741852800",
  "updated_at": "1741852800"
}

7.2 EnrichedSegment - Pending Speaker

{
  "id": "es-002",
  "transcript_id": "trans-abc123",
  "voiceprint_lib_id": "vp-pending-002",

  "speaker_label": "SPEAKER_01",
  "identified_name": null,
  "match_source": "unknown",
  "confidence_score": null,

  "start_time": 28.0,
  "end_time": 35.5,

  "transcription": "j'ai une question sur le rag",
  "cleaned_transcription": "J'ai une question sur le RAG.",

  "rag_context": {
    "potential_speakers": [
      "Dr. Marie Dubois",
      "Jean-Marc Petit (dit \"John\")"
    ],
    "keywords": ["RAG", "backend"],
    "glossary_terms": ["RAG", "BGE-M3"],
    "rag_score": 0.51
  },

  "metadata": {
    "language": "fr",
    "duration_seconds": 7.5,
    "word_count": 7
  },

  "created_at": "1741852800",
  "updated_at": "1741852800"
}

7.3 VoiceprintLibrary - Confirmed

{
  "id": "vp-kwame-001",
  "user_id": "user-123",
  "transcript_id": "trans-abc123",

  "speaker_label": "SPEAKER_00",
  "identified_name": "Kwame Mensah",

  "voiceprint_audio_512d": "[0.123, 0.456, ..., 0.789]",  // 512 floats
  "audio_model": "pyannote-audio",

  "voiceprint_text_1024d": "[0.234, 0.567, ..., 0.890]",  // 1024 floats
  "text_model": "BAAI/bge-m3",

  "status": "confirmed",
  "match_source": "llm_inference",

  "email": "kwame.mensah@onu.org",
  "phone": "+1-555-0123",
  "company": "Organisation des Nations Unies (ONU)",
  "role": "Senior Diplomat",
  "department": "Relations Internationales",

  "additional_metadata": {
    "linkedin": "https://linkedin.com/in/kwame-mensah",
    "languages": ["fr", "en", "es"]
  },

  "first_seen_at": "2026-03-11T10:30:00Z",
  "last_seen_at": "2026-03-11T10:30:00Z",
  "total_occurrences": 1,

  "created_at": "1741852800",
  "updated_at": "1741852800"
}

7.4 ContextualFile - CV

{
  "id": "cf-001",
  "user_id": "user-123",
  "transcript_id": "trans-abc123",

  "filename": "CV_Kwame_Mensah.txt",
  "file_type": "txt",
  "file_size_bytes": 2048,

  "s3_key": "contextual_files/user-123/trans-abc123/CV_Kwame_Mensah.txt",
  "s3_bucket": "meetnoo-storage",

  "processing_status": "indexed",
  "indexed_at": "2026-03-11T10:25:00Z",

  "total_chunks": 3,
  "total_tokens": 1200,

  "extraction_metadata": {
    "participants": [
      {
        "name": "Kwame Mensah",
        "role": "Senior Diplomat",
        "email": "kwame.mensah@onu.org",
        "phone": "+1-555-0123",
        "company": "Organisation des Nations Unies (ONU)",
        "specialties": ["Diplomatie", "Évaluation", "Politiques publiques"]
      }
    ],
    "extraction_method": "llm_gpt4o_mini",
    "extraction_confidence": 0.95,
    "extraction_time_seconds": 3.2
  },

  "error_message": null,
  "retry_count": 0,

  "created_at": "1741852500",
  "updated_at": "1741852600"
}

Navigation: ← RAG Enrichment | Error Handling →