LLM Prompting & Post-Processing¶
Version: 1.0.0
Date: March 11, 2026
Status: Production
Table of Contents¶
- LLM Service Architecture
- Operation 1: Clean Transcription
- Operation 2: Identify Speakers
- Redis Streams Pattern
- Prompt Engineering
- Error Handling & Fallbacks
1. LLM Service Architecture¶
1.1 Service Flow¶
1.2 Two Operations¶
| Operation | Input | LLM Model | Output | Latency |
|---|---|---|---|---|
| clean_transcription | Raw segments (all speakers) | Qwen 2.5-3B | Corrected text (punctuation, capitalization) | 10-20s |
| identify_speakers | Pending segments + RAG context | Qwen 2.5-3B | Speaker identifications + confidence | 10-20s |
1.3 LLM Service Endpoints (MeetNoo)¶
API Contract (from API_MEETNOO_SERVICES.md):
# POST http://localhost:8000/llm/submit
{
  "operation": "clean_transcription" | "identify_speakers",
  "payload": {
    "language": "fr",
    "segments": {...},
    "context": {...}
  }
}

# Response
{
  "task_id": "llm-task-abc123",
  "status": "pending"
}

# Redis Stream: llm:reply:{task_id}
{
  "task_id": "llm-task-abc123",
  "status": "completed" | "failed",
  "result": {...},
  "error": null | "error message"
}
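The submit side of this contract can be sketched as a small helper that builds the request body; the commented-out `httpx` call is illustrative and assumes a running MeetNoo LLM service at the URL above:

```python
def build_submit_request(operation: str, payload: dict) -> dict:
    """Build a /llm/submit request body matching the contract above."""
    assert operation in ("clean_transcription", "identify_speakers")
    return {"operation": operation, "payload": payload}

body = build_submit_request("clean_transcription", {
    "language": "fr",
    "segments": {"SPEAKER_00": "bonjour je suis jean dupont"},
    "context": {},
})
# Hypothetical HTTP call (requires a running MeetNoo LLM service):
# import httpx
# resp = httpx.post("http://localhost:8000/llm/submit", json=body)
# task_id = resp.json()["task_id"]
```

The result then arrives asynchronously on the `llm:reply:{task_id}` stream, not in the HTTP response.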
2. Operation 1: Clean Transcription¶
2.1 Purpose¶
Correct the raw Whisper transcription:
- No punctuation → add periods and commas
- All lowercase → capitalize proper nouns and sentence starts
- Mis-transcribed acronyms → "rag" → "RAG"
- Phonetic errors → "meuh tu" → "MeetNoo"
2.2 Prompt Template¶
CLEAN_TRANSCRIPTION_PROMPT = """
Tu es un expert en correction de transcriptions automatiques françaises.
CONTEXTE:
- Transcription automatique d'une réunion professionnelle
- Langue: {language}
- Participants identifiés: {identified_participants}
- Mots-clés du projet: {keywords}
- Glossaire: {glossary}
TRANSCRIPTION BRUTE:
---
{raw_transcription}
---
TÂCHE:
Corriger la transcription en appliquant les règles suivantes:
1. **Ponctuation**
- Ajouter points (.) en fin de phrases
- Ajouter virgules (,) pour séparer propositions
- Ajouter points d'interrogation (?) pour questions
- Ajouter points d'exclamation (!) si approprié
2. **Capitalisation**
- Majuscule au début de chaque phrase
- Majuscules pour noms propres (personnes, entreprises, lieux)
- Majuscules pour acronymes (RAG, API, ONU)
3. **Noms propres**
- Utiliser liste participants pour orthographe exacte
- Exemple: "jean dupont" → "Jean Dupont"
4. **Acronymes et termes techniques**
- Utiliser glossaire si disponible
- Exemple: "rag" → "RAG", "api" → "API"
- Exemple: "meuh tu" → "MeetNoo"
5. **Cohérence temporelle**
- Vérifier conjugaison verbes
- Corriger si nécessaire pour cohérence
RÈGLES CRITIQUES:
- NE PAS modifier le sens de la transcription
- NE PAS ajouter d'informations non présentes
- NE PAS supprimer d'informations
- CONSERVER la structure [Speaker Label]: texte
- CORRIGER uniquement forme, PAS le fond
FORMAT RÉPONSE (JSON strict):
```json
{
  "cleaned_transcription": {
    "SPEAKER_00": "Texte corrigé du speaker 00...",
    "SPEAKER_01": "Texte corrigé du speaker 01...",
    "SPEAKER_02": "Texte corrigé du speaker 02..."
  },
  "corrections_applied": [
    "Added punctuation: 12 periods, 8 commas, 3 question marks",
    "Fixed capitalization: 15 words",
    "Corrected acronyms: RAG (3x), API (2x)",
    "Corrected proper nouns: Jean Dupont, Marie Martin"
  ],
  "errors_corrected": [
    {
      "original": "meuh tu",
      "corrected": "MeetNoo",
      "type": "phonetic_error"
    },
    {
      "original": "rag",
      "corrected": "RAG",
      "type": "acronym"
    }
  ]
}
```
IMPORTANT: Réponds UNIQUEMENT avec le JSON, aucun texte avant ou après.
"""
2.3 BFF Implementation¶
import asyncio
import json
import time
from datetime import datetime
from typing import Any, Dict, List

class LLMPostProcessor:
    def __init__(self, redis_client):
        self.redis = redis_client
        self.timeout = 90  # seconds

    async def clean_transcription(
        self,
        raw_transcription: Dict[str, str],
        participants: List[str],
        keywords: List[str],
        glossary: Dict[str, str],
        language: str = "fr"
    ) -> Dict[str, Any]:
        """
        Clean a transcription via the Qwen 2.5-3B LLM.

        Args:
            raw_transcription: {"SPEAKER_00": "texte brut...", ...}
            participants: ["Jean Dupont", "Marie Martin"]
            keywords: ["RAG", "backend", "API"]
            glossary: {"RAG": "...", "API": "..."}
            language: "fr" | "en"

        Returns:
            {
                "cleaned_transcription": {...},
                "corrections_applied": [...],
                "errors_corrected": [...]
            }
        """
        # Format transcription for prompt
        formatted_transcription = "\n".join([
            f"{speaker}: {text}"
            for speaker, text in raw_transcription.items()
        ])

        # Build prompt
        prompt = CLEAN_TRANSCRIPTION_PROMPT.format(
            language=language,
            identified_participants=", ".join(participants),
            keywords=", ".join(keywords),
            glossary=json.dumps(glossary, ensure_ascii=False, indent=2),
            raw_transcription=formatted_transcription
        )

        # Submit LLM task
        payload = {
            "language": language,
            "prompt": prompt,
            "max_tokens": 4000,
            "temperature": 0.0  # Deterministic
        }
        task_id = await self._submit_llm_task(
            operation="clean_transcription",
            payload=payload
        )
        logger.info(
            f"DEBUG: Clean transcription LLM task submitted - "
            f"Task ID: {task_id}"
        )

        # Poll result
        result = await self._poll_llm_result(task_id, timeout=self.timeout)

        # Parse JSON response
        try:
            cleaned_data = json.loads(result["result"])
            logger.info(
                f"DEBUG: Clean transcription completed - "
                f"Corrections: {len(cleaned_data.get('corrections_applied', []))}"
            )
            return cleaned_data
        except json.JSONDecodeError as e:
            logger.error(f"DEBUG: Failed to parse LLM response - Error: {e}")
            raise ValueError(f"Invalid JSON from LLM: {result['result']}")

    async def _submit_llm_task(
        self,
        operation: str,
        payload: Dict[str, Any]
    ) -> str:
        """Submit task to MeetNoo LLM via Redis."""
        task_id = f"llm-task-{generate_uuid()}"
        task_data = {
            "task_id": task_id,
            "operation": operation,
            "payload": payload,
            "submitted_at": datetime.utcnow().isoformat()
        }
        # Publish to Redis Stream
        await self.redis.xadd(
            "llm:tasks",
            {"data": json.dumps(task_data)}
        )
        return task_id

    async def _poll_llm_result(
        self,
        task_id: str,
        timeout: int = 90
    ) -> Dict[str, Any]:
        """Poll LLM result from Redis."""
        start_time = time.time()
        while time.time() - start_time < timeout:
            # Check stream llm:reply:{task_id}
            stream_key = f"llm:reply:{task_id}"
            messages = await self.redis.xread(
                {stream_key: "0"},
                count=1,
                block=2000  # block up to 2s
            )
            if messages:
                # Fields are str (the Redis client uses decode_responses=True)
                result_data = json.loads(messages[0][1][0][1]["data"])
                if result_data["status"] == "completed":
                    return result_data
                elif result_data["status"] == "failed":
                    raise Exception(f"LLM task failed: {result_data['error']}")
            # Wait before retry
            await asyncio.sleep(2)

        # Timeout
        raise TimeoutError(f"LLM task {task_id} timed out after {timeout}s")
2.4 Worked Example¶
Input (Raw Transcription):
{
  "SPEAKER_00": "bonjour je suis kwame mensah senior diplomat a l onu je travaille sur l evaluation des politiques publiques",
  "SPEAKER_01": "merci kwame j ai une question sur le rag comment ca fonctionne",
  "SPEAKER_02": "le rag c est retrieval augmented generation ca permet d enrichir un modele d ia avec des documents externes"
}
Output (Cleaned):
{
  "cleaned_transcription": {
    "SPEAKER_00": "Bonjour, je suis Kwame Mensah, Senior Diplomat à l'ONU. Je travaille sur l'évaluation des politiques publiques.",
    "SPEAKER_01": "Merci Kwame, j'ai une question sur le RAG : comment ça fonctionne ?",
    "SPEAKER_02": "Le RAG, c'est Retrieval-Augmented Generation. Ça permet d'enrichir un modèle d'IA avec des documents externes."
  },
  "corrections_applied": [
    "Added punctuation: 5 periods, 6 commas, 1 question mark",
    "Fixed capitalization: 10 words (Bonjour, Kwame Mensah, Senior Diplomat, ONU, etc.)",
    "Corrected acronyms: RAG (2x), IA (1x)"
  ],
  "errors_corrected": [
    {
      "original": "l onu",
      "corrected": "l'ONU",
      "type": "acronym + apostrophe"
    },
    {
      "original": "rag",
      "corrected": "RAG",
      "type": "acronym"
    }
  ]
}
3. Operation 2: Identify Speakers¶
3.1 Purpose¶
Identify pending speakers (Priority 3) by analyzing:
- Self-identifications ("Je suis Jean...", "En tant que...")
- RAG context (extracted potential participants)
- Thematic coherence (who talks about what)
- Keywords (matching against RAG specialties)
3.2 Prompt Template¶
IDENTIFY_SPEAKERS_PROMPT = """
Tu es un expert en analyse de transcriptions pour identifier les participants.
CONTEXTE RAG:
- Participants potentiels: {potential_participants}
- Mots-clés projet: {keywords}
- Glossaire termes techniques: {glossary_terms}
SEGMENTS À IDENTIFIER:
---
{unidentified_segments}
---
TÂCHE:
Identifier chaque SPEAKER_XX en analysant:
1. **Auto-identifications explicites**
- "Je suis [Nom]"
- "En tant que [Rôle]"
- "Je travaille chez [Entreprise]"
2. **Matching avec participants RAG**
- Comparer contenu segments avec contexte participants
- Exemple: "backend developer" → Jean Dupont (Lead Backend Developer)
3. **Cohérence thématique**
- Qui parle de quoi
- Exemple: Speaker qui parle de "diplomatie" → Kwame Mensah (Senior Diplomat)
4. **Keywords matching**
- Comparer interventions avec keywords projet
- Exemple: Parle de "RAG" + "embedding" → Probable ML Engineer
RÈGLES D'IDENTIFICATION:
- Confidence > 0.75 → Identifier avec nom réel
- Confidence < 0.75 → Laisser "Intervenant X" (ne pas deviner)
- NE JAMAIS inventer de noms absents du contexte RAG
CALCUL CONFIDENCE:
- Auto-identification explicite: 0.95
- Match contexte RAG + keywords: 0.80-0.90
- Match contexte RAG uniquement: 0.60-0.75
- Match thématique faible: 0.30-0.50
- Aucun indice: 0.10-0.20
FORMAT RÉPONSE (JSON strict):
```json
{
  "speaker_identifications": {
    "SPEAKER_00": {
      "identified_name": "Kwame Mensah",
      "confidence": 0.92,
      "reasoning": "Auto-identification 'je suis Kwame Mensah, Senior Diplomat à l'ONU' + match contexte RAG (Senior Diplomat, ONU)",
      "evidence": [
        "Explicit: 'je suis Kwame Mensah'",
        "Role match: 'Senior Diplomat'",
        "Company match: 'ONU'"
      ]
    },
    "SPEAKER_01": {
      "identified_name": "Intervenant 1",
      "confidence": 0.35,
      "reasoning": "Intervention courte, aucune auto-identification, aucun match RAG fort",
      "evidence": [
        "Thematic: mentioned 'RAG' (weak match avec ML Engineer)",
        "Too little data for confident identification"
      ]
    }
  },
  "summary": {
    "total_speakers": 2,
    "identified": 1,
    "pending": 1,
    "avg_confidence": 0.635
  }
}
```
IMPORTANT: Réponds UNIQUEMENT avec le JSON, aucun texte avant ou après.
"""
3.3 BFF Implementation¶
class LLMPostProcessor:
    async def identify_speakers(
        self,
        unidentified_segments: Dict[str, str],
        potential_participants: List[str],
        keywords: List[str],
        glossary_terms: List[str],
        language: str = "fr"
    ) -> Dict[str, Any]:
        """
        Identify speakers via the Qwen 2.5-3B LLM.

        Args:
            unidentified_segments: {"SPEAKER_00": "transcription...", ...}
            potential_participants: ["Jean Dupont", "Marie Martin"]
            keywords: ["RAG", "backend"]
            glossary_terms: ["RAG", "BGE-M3"]
            language: "fr"

        Returns:
            {
                "speaker_identifications": {...},
                "summary": {...}
            }
        """
        # Format segments for prompt
        formatted_segments = "\n\n".join([
            f"[{speaker}]:\n{text}"
            for speaker, text in unidentified_segments.items()
        ])

        # Build prompt
        prompt = IDENTIFY_SPEAKERS_PROMPT.format(
            potential_participants=", ".join(potential_participants),
            keywords=", ".join(keywords),
            glossary_terms=", ".join(glossary_terms),
            unidentified_segments=formatted_segments
        )

        # Submit LLM task
        payload = {
            "language": language,
            "prompt": prompt,
            "max_tokens": 2000,
            "temperature": 0.0
        }
        task_id = await self._submit_llm_task(
            operation="identify_speakers",
            payload=payload
        )
        logger.info(
            f"DEBUG: Identify speakers LLM task submitted - "
            f"Task ID: {task_id}, Speakers: {len(unidentified_segments)}"
        )

        # Poll result
        result = await self._poll_llm_result(task_id, timeout=self.timeout)

        # Parse JSON response
        try:
            identifications = json.loads(result["result"])
            summary = identifications["summary"]
            logger.info(
                f"DEBUG: Speaker identification completed - "
                f"Identified: {summary['identified']}/{summary['total_speakers']}"
            )
            return identifications
        except json.JSONDecodeError as e:
            logger.error(f"DEBUG: Failed to parse LLM response - Error: {e}")
            raise ValueError(f"Invalid JSON from LLM: {result['result']}")
3.4 Post-Processing: Confirm Voiceprints¶
    async def process_llm_identifications(
        self,
        identifications: Dict[str, Any],
        pending_voiceprints: Dict[str, str],  # {speaker_label: voiceprint_lib_id}
        db: Session
    ):
        """Confirm pending voiceprints when confidence >= 0.75."""
        for speaker_label, data in identifications["speaker_identifications"].items():
            confidence = data["confidence"]
            identified_name = data["identified_name"]

            # Skip "Intervenant X" (low confidence)
            if "Intervenant" in identified_name:
                logger.info(
                    f"DEBUG: Skipping confirmation for {speaker_label} - "
                    f"Low confidence ({confidence:.2f})"
                )
                continue

            # Confirm if confidence >= 0.75
            if confidence >= 0.75:
                voiceprint_lib_id = pending_voiceprints.get(speaker_label)
                if voiceprint_lib_id:
                    # Update voiceprint status
                    voiceprint = db.query(VoiceprintLibrary).filter(
                        VoiceprintLibrary.id == voiceprint_lib_id
                    ).first()
                    if voiceprint:
                        voiceprint.status = "confirmed"
                        voiceprint.identified_name = identified_name
                        voiceprint.match_source = "llm_inference"
                        voiceprint.updated_at = unix_timestamp()
                        db.commit()
                        logger.info(
                            f"DEBUG: Confirmed voiceprint {voiceprint_lib_id} - "
                            f"Name: {identified_name}, Confidence: {confidence:.2f}"
                        )
3.5 Worked Example¶
Input:
{
  "unidentified_segments": {
    "SPEAKER_00": "Bonjour, je suis Kwame Mensah, Senior Diplomat à l'ONU. Je travaille sur l'évaluation des politiques publiques depuis 15 ans.",
    "SPEAKER_01": "Merci. J'ai une question rapide.",
    "SPEAKER_02": "Le backend est développé avec FastAPI, on utilise PostgreSQL pour la base de données."
  },
  "potential_participants": [
    "Kwame Mensah",
    "Dr. Marie Dubois",
    "Jean-Marc Petit (dit \"John\")"
  ],
  "keywords": ["RAG", "backend", "FastAPI", "PostgreSQL"],
  "glossary_terms": ["RAG", "BGE-M3", "Qdrant"]
}
Output:
{
  "speaker_identifications": {
    "SPEAKER_00": {
      "identified_name": "Kwame Mensah",
      "confidence": 0.98,
      "reasoning": "Auto-identification explicite 'je suis Kwame Mensah, Senior Diplomat à l'ONU' + match parfait avec participant RAG",
      "evidence": [
        "Explicit: 'je suis Kwame Mensah'",
        "Role match: 'Senior Diplomat'",
        "Company match: 'ONU'",
        "Context match: 'évaluation des politiques publiques'"
      ]
    },
    "SPEAKER_01": {
      "identified_name": "Intervenant 1",
      "confidence": 0.15,
      "reasoning": "Intervention trop courte (6 mots), aucune information distinctive",
      "evidence": [
        "No auto-identification",
        "No thematic clues",
        "Insufficient data"
      ]
    },
    "SPEAKER_02": {
      "identified_name": "Jean-Marc Petit (dit \"John\")",
      "confidence": 0.82,
      "reasoning": "Parle de backend, FastAPI, PostgreSQL → Match keywords + probable backend developer → Jean-Marc Petit (Lead Backend)",
      "evidence": [
        "Keyword match: 'backend' (3x in participant context)",
        "Keyword match: 'FastAPI' (2x in participant context)",
        "Keyword match: 'PostgreSQL' (1x in participant context)",
        "Thematic coherence: backend development"
      ]
    }
  },
  "summary": {
    "total_speakers": 3,
    "identified": 2,
    "pending": 1,
    "avg_confidence": 0.65
  }
}
Voiceprint Confirmation:
SPEAKER_00 (confidence=0.98) → Confirm voiceprint as "Kwame Mensah" (OK)
SPEAKER_01 (confidence=0.15) → Keep pending as "Intervenant 1" ⏸️
SPEAKER_02 (confidence=0.82) → Confirm voiceprint as "Jean-Marc Petit" (OK)
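The confirmation decision above is a pure function of the LLM output: keep names at or above the 0.75 threshold and skip "Intervenant X" placeholders. A small illustrative helper (the `select_confirmations` name is ours, not the pipeline's):

```python
CONFIRMATION_THRESHOLD = 0.75

def select_confirmations(speaker_identifications: dict) -> dict:
    """Return {speaker_label: identified_name} for speakers passing the threshold."""
    return {
        label: data["identified_name"]
        for label, data in speaker_identifications.items()
        if data["confidence"] >= CONFIRMATION_THRESHOLD
        and not data["identified_name"].startswith("Intervenant")
    }

# Using the summary data from the example above:
identifications = {
    "SPEAKER_00": {"identified_name": "Kwame Mensah", "confidence": 0.98},
    "SPEAKER_01": {"identified_name": "Intervenant 1", "confidence": 0.15},
    "SPEAKER_02": {"identified_name": 'Jean-Marc Petit (dit "John")', "confidence": 0.82},
}
# select_confirmations(identifications) keeps SPEAKER_00 and SPEAKER_02
```

The database side effects (updating VoiceprintLibrary rows) stay in `process_llm_identifications`; keeping the selection logic pure makes the threshold easy to unit-test.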
4. Redis Streams Pattern¶
4.1 Stream Architecture¶
4.2 Stream Keys¶
| Stream | Producer | Consumer | TTL | Purpose |
|---|---|---|---|---|
| llm:tasks | BFF | MeetNoo | 1h | Queue LLM tasks |
| llm:reply:{task_id} | MeetNoo | BFF | 10min | Task results |
4.3 Redis Client Configuration¶
# BFF: src/services/redis_service.py
import redis.asyncio as aioredis
from typing import Any, Dict, List

class RedisService:
    def __init__(self):
        self.client = None

    async def connect(self):
        # from_url() is synchronous in redis-py >= 4.2 (it only builds the client)
        self.client = aioredis.from_url(
            "redis://localhost:6379/0",
            encoding="utf-8",
            decode_responses=True,
            max_connections=50,
            socket_connect_timeout=5,
            socket_keepalive=True
        )
        logger.info("DEBUG: Redis client connected")

    async def xadd(
        self,
        stream: str,
        data: Dict[str, str],
        maxlen: int = 1000
    ) -> str:
        """Add to stream with bounded length."""
        message_id = await self.client.xadd(
            stream,
            data,
            maxlen=maxlen,
            approximate=True
        )
        return message_id

    async def xread(
        self,
        streams: Dict[str, str],
        count: int = 1,
        block: int = 2000
    ) -> List[Any]:
        """Read from streams (blocking up to `block` ms)."""
        messages = await self.client.xread(
            streams,
            count=count,
            block=block
        )
        return messages
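The nested structure `xread()` returns, `[(stream_name, [(message_id, fields), ...]), ...]`, is easy to mis-index, so it is worth making the unpacking explicit. An illustrative helper, assuming `decode_responses=True` so field keys are `str`:

```python
import json

def parse_first_reply(messages: list) -> dict:
    """Decode the JSON 'data' field of the first message returned by xread().

    xread() results are shaped [(stream_name, [(message_id, fields), ...]), ...].
    """
    _stream_name, entries = messages[0]
    _message_id, fields = entries[0]
    return json.loads(fields["data"])

# Shape returned by redis-py with decode_responses=True:
raw = [("llm:reply:llm-task-abc123",
        [("1700000000000-0", {"data": '{"status": "completed"}'})])]
```

Without `decode_responses=True`, both the field key and value come back as `bytes` and the lookup would need `fields[b"data"]` instead.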
5. Prompt Engineering¶
5.1 Best Practices¶
1. Temperature = 0.0 (Deterministic)
2. JSON Output Enforcement
# Enforce strict JSON output
prompt += "\n\nIMPORTANT: Réponds UNIQUEMENT avec le JSON, aucun texte avant ou après."

# Validation
try:
    result = json.loads(llm_response)
except json.JSONDecodeError:
    # Retry with an explicit JSON-only instruction
    pass
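Before retrying the whole LLM call, it is often enough to parse more tolerantly: small models frequently wrap valid JSON in markdown fences or a sentence of preamble. An illustrative sketch (not the production validator):

```python
import json
import re

def extract_json(llm_response: str) -> dict:
    """Parse LLM output as JSON, tolerating fences and surrounding text."""
    # 1. Direct parse (happy path)
    try:
        return json.loads(llm_response)
    except json.JSONDecodeError:
        pass
    # 2. Strip markdown code fences if present
    fenced = re.search(r"```(?:json)?\s*(.*?)```", llm_response, flags=re.DOTALL)
    if fenced:
        return json.loads(fenced.group(1))
    # 3. Fall back to the outermost { ... } span
    start, end = llm_response.find("{"), llm_response.rfind("}")
    if start != -1 and end > start:
        return json.loads(llm_response[start:end + 1])
    raise ValueError(f"No JSON object found in LLM response: {llm_response[:100]}")
```

Only when all three strategies fail does a full retry (with a re-emphasized JSON-only instruction) become necessary.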
3. Few-Shot Examples (Future)
# Add examples to prompt for better accuracy
EXAMPLES = """
EXEMPLE 1:
Input: "bonjour je suis jean dupont"
Output: "Bonjour, je suis Jean Dupont."
EXEMPLE 2:
Input: "le rag c est retrieval augmented generation"
Output: "Le RAG, c'est Retrieval-Augmented Generation."
"""
4. Context Relevance
# Don't overload the context
# MAX:
#   - 10 potential participants
#   - 20 keywords
#   - 10 glossary terms
# If more → filter by RAG relevance score
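These caps can be enforced with a trivial helper, assuming the lists arrive pre-sorted by RAG relevance score (best first) so truncation keeps the most relevant items. The limits and the `truncate_context` name are illustrative:

```python
# Illustrative caps matching the guideline above
MAX_PARTICIPANTS = 10
MAX_KEYWORDS = 20
MAX_GLOSSARY_TERMS = 10

def truncate_context(participants: list, keywords: list, glossary_terms: list) -> tuple:
    """Cap context lists; inputs are assumed sorted by RAG relevance (best first)."""
    return (
        participants[:MAX_PARTICIPANTS],
        keywords[:MAX_KEYWORDS],
        glossary_terms[:MAX_GLOSSARY_TERMS],
    )
```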
5.2 Prompt Templates Storage¶
# src/prompts/llm_prompts.py
class LLMPrompts:
    # Version control
    VERSION = "1.0.0"

    # Clean transcription
    CLEAN_TRANSCRIPTION_V1 = """..."""

    # Identify speakers
    IDENTIFY_SPEAKERS_V1 = """..."""

    @classmethod
    def get_prompt(cls, operation: str, version: str = "v1") -> str:
        """Get prompt by operation and version."""
        prompts = {
            "clean_transcription": {
                "v1": cls.CLEAN_TRANSCRIPTION_V1
            },
            "identify_speakers": {
                "v1": cls.IDENTIFY_SPEAKERS_V1
            }
        }
        return prompts[operation][version]
6. Error Handling & Fallbacks¶
6.1 Timeout Handling¶
class LLMPostProcessor:
    async def safe_llm_call(
        self,
        operation: str,
        payload: Dict[str, Any],
        fallback_func: Optional[Callable] = None
    ) -> Dict[str, Any]:
        """LLM call with timeout and optional fallback."""
        try:
            # Submit task
            task_id = await self._submit_llm_task(operation, payload)
            # Poll with timeout
            result = await self._poll_llm_result(task_id, timeout=90)
            return result
        except TimeoutError:
            logger.error(f"DEBUG: LLM timeout after 90s - Operation: {operation}")
            # Fallback if provided
            if fallback_func:
                logger.info("DEBUG: Using fallback function")
                return await fallback_func(payload)
            # No fallback: re-raise
            raise
6.2 Fallback: Regex Cleaning¶
import re
from typing import Any, Dict

def regex_cleaning_fallback(raw_transcription: Dict[str, str]) -> Dict[str, Any]:
    """
    Fallback cleaning when the LLM times out.

    Simple rules:
    - Add a period at the end if missing
    - Capitalize the first letter of each sentence
    - Uppercase common acronyms (RAG, API, ONU)
    """
    cleaned = {}
    for speaker, text in raw_transcription.items():
        # Add period if missing
        if text and not text.endswith((".", "!", "?")):
            text += "."
        # Capitalize the first letter of each sentence
        text = re.sub(
            r"(^|[.!?]\s+)([a-zà-ÿ])",
            lambda m: m.group(1) + m.group(2).upper(),
            text
        )
        # Uppercase acronyms
        for acronym in ["rag", "api", "onu", "gpu", "cpu"]:
            text = re.sub(
                rf"\b{acronym}\b",
                acronym.upper(),
                text,
                flags=re.IGNORECASE
            )
        cleaned[speaker] = text
    return {
        "cleaned_transcription": cleaned,
        "corrections_applied": ["Basic regex cleaning (LLM fallback)"],
        "errors_corrected": []
    }
6.3 Graceful Degradation: Keep Pending¶
def keep_speakers_pending(
    speaker_labels: List[str]
) -> Dict[str, Any]:
    """
    Fallback when LLM identification times out.
    Keep all speakers as "Intervenant X".
    """
    identifications = {}
    # Number labels from 1 to match the "Intervenant 1" convention used elsewhere
    for i, label in enumerate(speaker_labels, start=1):
        identifications[label] = {
            "identified_name": f"Intervenant {i}",
            "confidence": 0.0,
            "reasoning": "LLM timeout - kept as pending",
            "evidence": []
        }
    return {
        "speaker_identifications": identifications,
        "summary": {
            "total_speakers": len(speaker_labels),
            "identified": 0,
            "pending": len(speaker_labels),
            "avg_confidence": 0.0
        }
    }
Navigation: ← RAG Enrichment | Data Models →