Transcription Pipeline - Complete Workflow¶
Version: 1.0.0
Date: March 11, 2026
Status: Production
Table of Contents¶
- Overview
- Complete Workflow (Diagram)
- Phase 1: Upload & RAG Indexing
- Phase 2: GPU Transcription (MeetNoo)
- Phase 3: BFF Post-Processing
- Phase 4: Finalization
- Progress Management
- Production Logs
1. Overview¶
1.1 Simplified Flow¶
graph TB
    A["1. Upload Audio + Documents"] --> B["2. RAG Indexing (Qdrant)"]
    B --> C["3. GPU Transcription (MeetNoo)<br/>Diarization + Whisper + Voiceprint Extraction"]
    C --> D["4. BFF Post-Processing"]
    D --> D1["Priority 1: Voiceprint Matching"]
    D --> D2["Priority 2: RAG Enrichment"]
    D --> D3["Priority 3: LLM Processing"]
    D1 --> E["5. Enriched Segments Saved"]
    D2 --> E
    D3 --> E
    style A fill:#e0f2fe,stroke:#0284c7,stroke-width:2px
    style B fill:#fef3c7,stroke:#f59e0b,stroke-width:2px
    style C fill:#ffedd5,stroke:#f97316,stroke-width:2px
    style D fill:#e0f2fe,stroke:#0284c7,stroke-width:2px
    style D1 fill:#ddd6fe,stroke:#7c3aed,stroke-width:2px
    style D2 fill:#ddd6fe,stroke:#7c3aed,stroke-width:2px
    style D3 fill:#ddd6fe,stroke:#7c3aed,stroke-width:2px
    style E fill:#d1fae5,stroke:#10b981,stroke-width:2px
Total duration: ~20 minutes for 1 hour of audio
| Phase | Duration | % of Total |
|---|---|---|
| Upload + RAG indexing | 12s | 1% |
| GPU Transcription | 18min | 90% |
| BFF Post-Processing | 50s | 5% |
| Finalization | 30s | 2.5% |
| Total | 20min | 100% |
1.2 Responsibilities¶
graph TB
subgraph BFF["Smart Transcription BFF"]
B1["Upload audio to S3"]
B2["Index contextual files → Qdrant"]
B3["Trigger MeetNoo pipeline"]
B4["Listen Redis Streams (progress)"]
B5["Post-process results:"]
B5A["- Voiceprint matching"]
B5B["- RAG enrichment"]
B5C["- LLM cleaning + identification"]
B6["Save enriched segments to DB"]
end
subgraph GPU["MeetNoo GPU Services"]
G1["Diarization (PyAnnote AI)"]
G2["Transcription (Whisper large-v3)"]
G3["Voiceprint extraction (PyAnnote 512d)"]
G4["LLM inference (Qwen 2.5-3B)"]
G5["Publish events to Redis Streams"]
end
BFF -->|HTTP + Redis Streams| GPU
style BFF fill:#e0f2fe,stroke:#0284c7,stroke-width:3px
style GPU fill:#ffedd5,stroke:#f97316,stroke-width:3px
2. Complete Workflow (Diagram)¶
sequenceDiagram
participant User
participant Frontend
participant BFF as Smart Trans BFF
participant S3
participant Qdrant
participant Redis
participant MeetNoo as MeetNoo GPU
participant PG as PostgreSQL
Note over User,PG: PHASE 1: Upload & RAG Indexing (12s)
User->>Frontend: Upload audio + 4 PDF docs
Frontend->>BFF: POST /api/transcripts/create-with-rag {audio, title, language, contextual_files[]}
BFF->>PG: INSERT transcript (status=pending)
BFF->>S3: Upload audio.mp3
S3-->>BFF: s3://bucket/audio/uuid.mp3
BFF->>Qdrant: Create collection user_{userId}_transcript_{id}
loop For each contextual file
    BFF->>BFF: Extract text (pdfplumber)
    BFF->>BFF: LLM metadata extraction (GPT-4o-mini)
    BFF->>BFF: Semantic chunking (LlamaIndex)
    BFF->>BFF: Generate embeddings (BGE-M3 1024d)
    BFF->>Qdrant: Upsert chunks with metadata
end
BFF->>PG: UPDATE transcript (qdrant_collection_name)
BFF-->>Frontend: 202 Accepted {transcript_id, status:processing}
Frontend-->>User: "Transcription en cours..."
Note over User,PG: PHASE 2: GPU Transcription (18min)
BFF->>MeetNoo: POST /api/v1/pipeline/start {tenant_id, file_url:s3_key}
MeetNoo-->>BFF: 202 {transcription_id}
MeetNoo->>Redis: XADD pipeline:events {stage:preprocess, status:started}
MeetNoo->>S3: Download audio
MeetNoo->>MeetNoo: Convert + chunk audio
MeetNoo->>Redis: XADD {stage:preprocess, status:completed, progress:15}
MeetNoo->>MeetNoo: Diarization (PyAnnote), detect speakers
MeetNoo->>Redis: XADD {stage:diarize, status:completed, progress:30}
MeetNoo->>MeetNoo: Transcription (Whisper), speech-to-text
MeetNoo->>Redis: XADD {stage:transcribe, status:completed, progress:60}
MeetNoo->>MeetNoo: Voiceprint extraction (PyAnnote 512d)
MeetNoo->>Redis: XADD {stage:voiceprint, status:completed, progress:80}
MeetNoo->>MeetNoo: Clustering speakers
MeetNoo->>Redis: XADD {stage:cluster, status:completed, progress:95}
MeetNoo->>PG: Save segments + speakers (meetnoo.* schema)
MeetNoo->>Redis: XADD {stage:pipeline, status:completed, progress:100}
BFF->>Redis: XREADGROUP smart-trans-group pipeline:events
Redis-->>BFF: {txn_id, status:completed}
Note over User,PG: PHASE 3: BFF Post-Processing (50s)
BFF->>MeetNoo: GET /api/v1/pipeline/{id}/result
MeetNoo-->>BFF: {segments[], speakers[], voiceprints[]}
BFF->>BFF: Update progress "Identification des participants..."
loop For each speaker (6 speakers)
    BFF->>BFF: Priority 1: Voiceprint matching (cosine similarity > 0.85?)
    alt Match found (95%)
        BFF->>PG: Fetch voiceprint metadata (email, phone, company)
        BFF->>Qdrant: RAG search (enrichment), filter: all_participants.name
        BFF->>BFF: Merge enrichment metadata
    else No match (5%)
        BFF->>PG: Auto-save pending voiceprint (status=pending)
        BFF->>Qdrant: RAG search (extraction), no filter, get context
        BFF->>BFF: Extract potential speakers for LLM
    end
end
BFF->>BFF: Aggregate RAG context for all pending speakers
BFF->>MeetNoo: POST /api/v1/llm/submit (operation: clean_transcription)
MeetNoo->>Redis: XADD llm:reply:{request_id} {status:completed, result}
BFF->>Redis: XREAD llm:reply:{request_id}
Redis-->>BFF: {clean_transcription}
BFF->>MeetNoo: POST /api/v1/llm/submit (operation: identify_speakers)
MeetNoo->>Redis: XADD llm:reply:{request_id2}
BFF->>Redis: XREAD llm:reply:{request_id2}
Redis-->>BFF: {speaker_identifications}
loop For each identified speaker
    BFF->>PG: UPDATE voiceprint_library (status=confirmed, identified_name)
end
Note over User,PG: PHASE 4: Finalization (30s)
BFF->>PG: INSERT enriched_segments (33 segments) with all metadata
BFF->>PG: UPDATE transcript (status=completed)
BFF-->>Frontend: SSE Event: transcript.completed
Frontend->>BFF: GET /api/transcripts/{id}/segments
BFF-->>Frontend: [33 enriched segments]
Frontend-->>User: "Transcription terminée! 6 speakers identifiés"
3. Phase 1: Upload & RAG Indexing¶
3.1 API Endpoint¶
POST /api/transcripts/create-with-rag
Content-Type: multipart/form-data
Authorization: Bearer {jwt_token}
Form Fields:
- audio_file: File (required) - Audio file (MP3, WAV, M4A, FLAC, OGG, MP4, MKV)
- title: string (optional) - Transcription title (ex: "Réunion Q1 2026")
- language: string (optional, default: "fr") - Language code (fr, en, es, etc.)
- contextual_files[]: File[] (optional) - Contextual files for RAG (PDF, DOCX, TXT)
Example:
audio_file: reunion.mp3 (4.8MB)
title: "Panel Citoyen - Mars 2026"
language: "fr"
contextual_files[]:
- CV_Jean.txt
- CV_Marie.txt
- glossaire.txt
- organigramme.pdf
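For reference, a minimal client-side sketch of how this multipart request could be assembled. The helper name `build_upload_parts` and the placeholder byte payloads are illustrative, not part of the BFF; a real client would read the files from disk before posting.

```python
from pathlib import Path


def build_upload_parts(audio_path, title, language, context_paths):
    """Assemble multipart form parts for POST /api/transcripts/create-with-rag.

    Returns a list of (field_name, (filename, content)) tuples suitable for
    requests.post(..., files=parts); plain text fields use (None, value).
    Repeated 'contextual_files[]' entries carry the RAG documents.
    """
    parts = [
        ("audio_file", (Path(audio_path).name, b"<audio-bytes>")),
        ("title", (None, title)),
        ("language", (None, language)),
    ]
    for p in context_paths:
        parts.append(("contextual_files[]", (Path(p).name, b"<doc-bytes>")))
    return parts


parts = build_upload_parts(
    "reunion.mp3", "Panel Citoyen - Mars 2026", "fr",
    ["CV_Jean.txt", "CV_Marie.txt", "glossaire.txt", "organigramme.pdf"],
)
# e.g. requests.post(url, headers={"Authorization": f"Bearer {jwt}"}, files=parts)
```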
3.2 Detailed Workflow¶
Step 1.1: Audio Validation & Upload
# 1. Validate audio format (case-insensitive extension check)
allowed_formats = ('.mp3', '.wav', '.m4a', '.flac', '.ogg', '.mp4', '.mkv')
if not audio_file.filename.lower().endswith(allowed_formats):
    raise HTTPException(400, "Unsupported audio format")

# 2. Upload to S3
s3_key = f"users/{user_id}/audio/{transcript_id}/{audio_file.filename}"
s3_url = await s3_client.upload_file(audio_file, s3_key)

# 3. Create transcript record
transcript = Transcript(
    id=transcript_id,
    user_id=user_id,
    title="Réunion Panel Citoyen",
    status="processing",
    audio_url=s3_url,
    created_at=unix_timestamp()
)
db.add(transcript)
db.commit()
Step 1.2: Qdrant Collection Creation
# Collection naming: per-user, per-transcript isolation
collection_name = f"user_{user_id}_transcript_{transcript_id}"

# Create with BGE-M3 dimensions
await qdrant_service.create_collection(
    collection_name=collection_name,
    vector_size=1024,
    distance=Distance.COSINE
)

# Save collection reference
transcript.qdrant_collection_name = collection_name
db.commit()
Step 1.3: Contextual File Processing
For each file:
# 1. Extract text
text = await text_extraction_service.extract(file, file_type="pdf")
# Output: "Jean Dupont - Lead Developer\nEmail: jean@company.com\n..."

# 2. LLM metadata extraction (BEFORE chunking)
metadata = await llm_metadata_extractor.extract(text, filename)
# Output: ParticipantMetadata(
#     name="Jean Dupont",
#     role="Lead Developer",
#     email="jean@company.com",
#     company="TechCorp"
# )

# 3. Semantic chunking
chunks = await semantic_chunking_service.chunk(
    text=text,
    metadata=metadata,
    max_chunk_size=2000
)
# Output: [
#     {"text": "Jean Dupont - Lead...", "chunk_index": 0, "metadata": {...}},
#     {"text": "Responsibilities: ...", "chunk_index": 1, "metadata": {...}}
# ]

# 4. Generate embeddings
chunks_with_embeddings = await embedding_service.encode_batch(chunks)
# Adds: "embedding": [0.023, -0.156, ..., 0.012] (1024d)

# 5. Analyze chunks (per-chunk metadata)
for chunk in chunks_with_embeddings:
    chunk["mentioned_participants"] = extract_mentioned(chunk["text"], metadata)
    chunk["chunk_type"] = classify_chunk_type(chunk["text"], metadata)
    chunk["keyword_associations"] = extract_keywords(chunk["text"])

# 6. Upsert to Qdrant
await qdrant_service.upsert_chunks(
    collection_name=collection_name,
    chunks=chunks_with_embeddings
)
Production logs:
INFO: Processing file 1/4: CV_Jean.txt
INFO: Extracted 1017 characters from TXT file (encoding: utf-8)
INFO: Cache HIT for key: llm_metadata:gpt-4o-mini:a23a260101...
INFO: Document-level extraction complete - method=llm, participants=1, confidence=0.95
INFO: Chunked text into 1 semantic chunks
INFO: Intelligent chunking complete - 1 semantic chunks → 1 final chunks
INFO: Generated 1 embeddings in 0.21s (4.8 texts/sec)
INFO: Chunk 1/1 created - Type: person_specific, Mentioned: ['Jean Dupont']
INFO: Upserted 1 points to collection user_00..._transcript_10f2...
INFO: File 'CV_Jean.txt' processed successfully
3.3 Qdrant Payload Structure¶
{
  "id": 0,
  "vector": [0.0234, -0.0156, ..., 0.0012],
  "payload": {
    "text": "Jean Dupont - Lead Backend Developer\nEmail: jean.dupont@techcorp.com...",
    "file_id": "file-uuid",
    "filename": "CV_Jean.txt",
    "file_type": "txt",
    "chunk_index": 0,
    "total_chunks": 1,
    "start_char": 0,
    "end_char": 1017,
    "participants": [
      {
        "name": "Jean Dupont",
        "role": "Lead Backend Developer",
        "email": "jean.dupont@techcorp.com",
        "phone": "+33 6 12 34 56 78",
        "company": "TechCorp SAS",
        "department": "Engineering",
        "specialties": "Python, FastAPI, PostgreSQL",
        "responsibilities": "Backend architecture, API design"
      }
    ],
    "all_participants": ["Jean Dupont"],
    "mentioned_participants": ["Jean Dupont"],
    "project_context": {
      "meeting_date": null,
      "project_name": null,
      "key_topics": ["backend", "architecture", "API"],
      "objectives": null
    },
    "glossary": {
      "RAG": "Retrieval Augmented Generation",
      "FastAPI": "Python web framework for building APIs"
    },
    "keyword_associations": {
      "Jean Dupont": ["backend", "API", "architecture"]
    },
    "chunk_type": "person_specific",
    "has_participant_data": true,
    "confidence": 0.95,
    "source": "contextual_file",
    "extraction_method": "llm_gpt4o_mini",
    "indexed_at": "2026-03-11T10:30:00Z"
  }
}
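Given this payload shape, the Phase 3 "enrichment" path can be sketched as a post-filter over retrieved chunks. The helper below is a hypothetical illustration (the real `rag_service.process_speaker` is not shown in this document); it assumes search results arrive as plain payload dicts like the one above:

```python
def enrich_from_payloads(identified_name, payloads):
    """Merge participant metadata for one identified speaker.

    Keep only chunks whose all_participants list contains the speaker,
    then merge the matching participant entries; the first non-empty
    value wins for each field.
    """
    merged = {}
    for payload in payloads:
        if identified_name not in payload.get("all_participants", []):
            continue  # chunk is about someone else
        for participant in payload.get("participants", []):
            if participant.get("name") != identified_name:
                continue
            for key, value in participant.items():
                if value and key not in merged:
                    merged[key] = value
    return merged
```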
4. Phase 2: GPU Transcription (MeetNoo)¶
4.1 Triggering the Pipeline¶
# BFF calls MeetNoo
response = await http_client.post(
    f"{MEETNOO_SERVICES_URL}/api/v1/pipeline/start",
    headers={"X-Pipeline-Key": PIPELINE_API_KEY},
    json={
        "tenant_id": user_id,
        "file_url": s3_key,  # or full s3:// URL
        "project_id": transcript_id,
        "config": {
            "chunk_duration_seconds": 900,
            "similarity_threshold": 0.75,
            "transcription_model": "large-v3",
            "llm_model": "Qwen/Qwen2.5-3B-Instruct"
        }
    },
    timeout=10
)
meetnoo_transcript_id = response.json()["transcription_id"]

# Save mapping
transcript.meetnoo_transcript_id = meetnoo_transcript_id
db.commit()
4.2 MeetNoo Pipeline Stages¶
| Stage | Description | Duration | Output |
|---|---|---|---|
| preprocess | Download audio, convert, chunk | 30s | WAV chunks in S3 |
| diarize | PyAnnote speaker diarization | 5min | Speaker boundaries |
| transcribe | Whisper large-v3 transcription | 10min | Raw text segments |
| voiceprint | PyAnnote embedding extraction | 2min | 512d voiceprints |
| cluster | Speaker clustering | 30s | Speaker labels consolidated |
| finalize | Save to PostgreSQL (meetnoo.*) | 30s | Segments + speakers + voiceprints |
Total: ~18 minutes
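Assuming the per-stage averages in the table above, a rough ETA helper for the progress UI could look like this (the `STAGE_SECONDS` values are the table's typical durations for 1h of audio, not guarantees):

```python
# Approximate stage durations in seconds, taken from the stage table above
STAGE_SECONDS = {
    "preprocess": 30,
    "diarize": 300,
    "transcribe": 600,
    "voiceprint": 120,
    "cluster": 30,
    "finalize": 30,
}
STAGE_ORDER = list(STAGE_SECONDS)


def eta_seconds(completed_stage):
    """Seconds expected to remain once `completed_stage` has finished."""
    idx = STAGE_ORDER.index(completed_stage)
    return sum(STAGE_SECONDS[s] for s in STAGE_ORDER[idx + 1:])
```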
4.3 Redis Streams Events¶
# BFF consumes events in a background task
async def consume_pipeline_events():
    while True:
        messages = redis_client.xreadgroup(
            groupname="smart-trans-group",
            consumername="consumer-1",
            streams={"pipeline:events": ">"},
            count=10,
            block=5000
        )
        for stream, messages_list in messages:
            for message_id, data in messages_list:
                txn_id = data[b'txn_id'].decode()
                stage = data[b'stage'].decode()
                status = data[b'status'].decode()
                progress = int(data[b'progress'].decode())

                # Map meetnoo_transcript_id to our transcript_id
                transcript = db.query(Transcript).filter(
                    Transcript.meetnoo_transcript_id == txn_id
                ).first()
                if not transcript:
                    continue

                # Update progress in DB
                await update_transcript_progress(
                    db,
                    transcript.id,
                    progress,
                    f"gpu_{stage}",
                    f"GPU Processing: {stage}",
                    status=status
                )

                # Trigger post-processing when completed
                if stage == "pipeline" and status == "completed":
                    await trigger_post_processing(transcript.id)

                # ACK message
                redis_client.xack("pipeline:events", "smart-trans-group", message_id)
Events Sequence:
1. {txn_id, stage:pipeline, status:started, progress:0}
2. {txn_id, stage:preprocess, status:completed, progress:15}
3. {txn_id, stage:diarize, status:completed, progress:30}
4. {txn_id, stage:transcribe, status:completed, progress:60}
5. {txn_id, stage:voiceprint, status:completed, progress:80}
6. {txn_id, stage:cluster, status:completed, progress:95}
7. {txn_id, stage:pipeline, status:completed, progress:100}
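Redis Streams entries are flat string-to-string field maps, so the producer side presumably flattens each event before `XADD`. A hypothetical sketch of that serialization (the helper name is illustrative):

```python
def make_pipeline_event(txn_id, stage, status, progress):
    """Flatten a pipeline event into the str->str field map that
    Redis Streams expects for XADD (no nested dicts allowed)."""
    return {
        "txn_id": str(txn_id),
        "stage": stage,
        "status": status,
        "progress": str(int(progress)),
    }


# Producer side (sketch):
# redis_client.xadd("pipeline:events", make_pipeline_event(txn_id, "diarize", "completed", 30))
```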
4.4 Pipeline Result¶
Response:
{
  "transcription_id": "uuid",
  "status": "completed",
  "duration_ms": 3600000,
  "language": "fr",
  "transcription_time_ms": 1130437,
  "segments": [
    {
      "start": 9.87,
      "end": 15.59,
      "text": "C'est une opération inédite pour évaluer la stratégie...",
      "speaker": "SPEAKER_00"
    },
    {
      "start": 15.67,
      "end": 22.83,
      "text": "France Stratégie a souhaité associer au comité...",
      "speaker": "SPEAKER_00"
    }
  ],
  "speakers": [
    {
      "label": "SPEAKER_00",
      "total_time": 245.5,
      "word_count": 1521,
      "voiceprint_512d": [0.123, 0.456, ..., 0.789]
    },
    {
      "label": "SPEAKER_01",
      "total_time": 180.3,
      "word_count": 1102,
      "voiceprint_512d": [0.234, 0.567, ..., 0.891]
    }
  ]
}
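The post-processing code in Section 5.1 calls `group_by_speaker` on this result; one straightforward implementation over this segment shape:

```python
from collections import defaultdict


def group_by_speaker(segments):
    """Group pipeline segments by their diarization label,
    preserving the original segment order within each group."""
    groups = defaultdict(list)
    for seg in segments:
        groups[seg["speaker"]].append(seg)
    return dict(groups)
```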
5. Phase 3: BFF Post-Processing¶
5.1 Orchestration¶
async def post_process_transcript(transcript_id: str, db: Session):
    """
    Post-processing after GPU transcription.
    Workflow:
    1. Fetch results from MeetNoo
    2. For each speaker:
       - Priority 1: Voiceprint matching
       - Priority 2: RAG enrichment/extraction
       - Priority 3: LLM processing
    3. Save enriched segments
    """
    transcript = db.query(Transcript).get(transcript_id)

    # Fetch MeetNoo results
    meetnoo_results = await fetch_meetnoo_results(transcript.meetnoo_transcript_id)

    # Initialize speaker identification
    voiceprint_results = {}
    rag_context_global = {
        "all_participants": [],
        "keywords": [],
        "glossary_terms": []
    }

    # Group segments by speaker
    segments_by_speaker = group_by_speaker(meetnoo_results["segments"])

    # Process each speaker
    for speaker_label, speaker_segments in segments_by_speaker.items():
        speaker_voiceprint = get_speaker_voiceprint(
            meetnoo_results["speakers"],
            speaker_label
        )

        # PRIORITY 1: Voiceprint Matching
        match_result = await voiceprint_matcher.match_speaker(
            voiceprint_audio_512d=speaker_voiceprint,
            user_id=transcript.user_id,
            db=db
        )
        if match_result:
            # Match found
            voiceprint_results[speaker_label] = match_result
            is_identified = True
            identified_name = match_result["identified_name"]
        else:
            # No match - auto-save pending
            voiceprint_lib_id = await auto_save_pending_voiceprint(
                speaker_label=speaker_label,
                voiceprint_512d=speaker_voiceprint,
                user_id=transcript.user_id,
                transcript_id=transcript_id,
                db=db
            )
            voiceprint_results[speaker_label] = {
                "voiceprint_lib_id": voiceprint_lib_id,
                "match_source": "pending"
            }
            is_identified = False
            identified_name = f"Intervenant {speaker_label.split('_')[1]}"

        # PRIORITY 2: RAG Enrichment/Extraction
        rag_results = await rag_service.process_speaker(
            speaker_segments=speaker_segments,
            collection_name=transcript.qdrant_collection_name,
            is_identified=is_identified,
            identified_name=identified_name if is_identified else None
        )
        if is_identified:
            # Enrich metadata
            voiceprint_results[speaker_label].update(rag_results["enrichment"])
        else:
            # Extract context for LLM
            rag_context_global["all_participants"].extend(
                rag_results["extraction"]["participants"]
            )
            rag_context_global["keywords"].extend(
                rag_results["extraction"]["keywords"]
            )
            rag_context_global["glossary_terms"].extend(
                rag_results["extraction"]["glossary_terms"]
            )

    # PRIORITY 3: LLM Processing
    # 3.1 Clean transcription (ALL segments)
    clean_transcription = await llm_post_processor.clean_transcription(
        raw_transcription={seg["speaker"]: seg["text"] for seg in meetnoo_results["segments"]},
        participants=rag_context_global["all_participants"],
        keywords=rag_context_global["keywords"],
        language="fr"
    )

    # 3.2 Identify speakers (ONLY pending)
    pending_speakers = {
        label: segments_by_speaker[label]
        for label, result in voiceprint_results.items()
        if result.get("match_source") == "pending"
    }
    if pending_speakers:
        speaker_identifications = await llm_post_processor.identify_speakers(
            unidentified_segments={
                label: " ".join(seg["text"] for seg in segments)
                for label, segments in pending_speakers.items()
            },
            potential_participants=rag_context_global["all_participants"],
            keywords=rag_context_global["keywords"],
            glossary_terms=rag_context_global["glossary_terms"]
        )

        # Confirm pending voiceprints
        for speaker_label, identification in speaker_identifications.items():
            if identification["confidence"] > 0.75:
                await confirm_pending_voiceprint(
                    voiceprint_lib_id=voiceprint_results[speaker_label]["voiceprint_lib_id"],
                    identified_name=identification["identified_name"],
                    db=db
                )
                voiceprint_results[speaker_label].update({
                    "identified_name": identification["identified_name"],
                    "match_source": "llm_inference",
                    "match_confidence": identification["confidence"]
                })

    # Save enriched segments
    enriched_segments = []
    for index, segment in enumerate(meetnoo_results["segments"]):
        speaker_label = segment["speaker"]
        speaker_info = voiceprint_results.get(speaker_label, {})
        enriched_segment = EnrichedSegment(
            id=generate_uuid(),
            transcript_id=transcript_id,
            segment_index=index,
            speaker_label=speaker_label,
            identified_name=speaker_info.get("identified_name"),
            role=speaker_info.get("role"),
            voiceprint_lib_id=speaker_info.get("voiceprint_lib_id"),
            text=segment["text"],
            clean_text=clean_transcription.get(speaker_label),
            start_ms=int(segment["start"] * 1000),
            end_ms=int(segment["end"] * 1000),
            confidence=speaker_info.get("match_confidence", 0.0),
            match_source=speaker_info.get("match_source", "pending"),
            context_used=True,
            metadata={
                "role": speaker_info.get("role"),
                "specialties": speaker_info.get("specialties"),
                "source_document": speaker_info.get("source_document")
            },
            created_at=unix_timestamp(),
            updated_at=unix_timestamp()
        )
        enriched_segments.append(enriched_segment)
    db.bulk_save_objects(enriched_segments)

    # Update transcript status
    transcript.status = "completed"
    transcript.updated_at = unix_timestamp()
    db.commit()

    return {
        "status": "completed",
        "segments_count": len(enriched_segments),
        "speakers_identified": len([
            r for r in voiceprint_results.values()
            if r.get("match_source") != "pending"
        ])
    }
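The Priority 1 step compares each 512d PyAnnote voiceprint against the user's library using cosine similarity with a > 0.85 threshold. A minimal sketch of that comparison, with a hypothetical in-memory `library` dict standing in for the real `voiceprint_matcher` database lookup:

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def best_voiceprint_match(probe, library, threshold=0.85):
    """Return (identified_name, similarity) for the closest enrolled
    voiceprint above the threshold, else None.

    `library` maps identified_name -> enrolled 512d vector.
    """
    best = max(
        ((name, cosine_similarity(probe, vec)) for name, vec in library.items()),
        key=lambda item: item[1],
        default=None,
    )
    if best and best[1] > threshold:
        return best
    return None
```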
5.2 Post-Processing Statistics (E2E Test)¶
DEBUG: Post-processing completed - Stats:
Total segments: 33
Total speakers: 6
Voiceprint matching (Priority 1):
- Matched: 1/6 (16.7%)
- Pending: 5/6 (83.3%)
- Match details: Kwame Mensah (similarity=1.000)
RAG enrichment (Priority 2):
- Enriched (identified speakers): 1/6
- Context extracted (pending speakers): 5/6
- Total chunks retrieved: 15 (avg 2.5 per speaker)
- RAG scores range: 0.51-0.73
LLM processing (Priority 3):
- Clean transcription: SUCCESS (33 segments)
- Speaker identification: TIMEOUT (90s, GPU service side)
Final enrichment stats:
- Fully identified: 1/6 (16.7%)
- Pending with RAG context: 5/6 (83.3%)
- All segments have clean_text: TRUE
6. Phase 4: Finalization¶
6.1 Saving Segments¶
# INSERT enriched_segments batch (33 records)
db.bulk_save_objects(enriched_segments)
db.commit()
logger.info(f"Saved {len(enriched_segments)} enriched segments for transcript {transcript_id}")
6.2 Update Transcript Status¶
transcript.status = "completed"
transcript.total_segments = len(enriched_segments)
transcript.total_speakers = len(set([seg.speaker_label for seg in enriched_segments]))
transcript.duration_ms = meetnoo_results["duration_ms"]
transcript.updated_at = unix_timestamp()
db.commit()
6.3 SSE Notification¶
# Emit SSE event to the frontend
await sse_manager.emit(
    user_id=transcript.user_id,
    event="transcript.completed",
    data={
        "transcript_id": transcript_id,
        "status": "completed",
        "total_segments": transcript.total_segments,
        "total_speakers": transcript.total_speakers
    }
)
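On the wire, each SSE notification is an `event:` line, a `data:` line, and a blank-line terminator. A sketch of the frame formatting that `sse_manager` presumably performs before writing to the client's open HTTP response (the helper name is illustrative):

```python
import json


def format_sse(event, data):
    """Render one Server-Sent Events frame: an `event:` line,
    a `data:` line carrying JSON, and the blank-line terminator."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"
```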
7. Progress Management¶
7.1 The transcript_progress Table¶
CREATE TABLE transcript_progress (
    id VARCHAR(36) PRIMARY KEY,
    transcript_id VARCHAR(36) REFERENCES transcripts(uuid) ON DELETE CASCADE,
    percentage INTEGER NOT NULL,  -- 0-100
    step_key VARCHAR(100) NOT NULL,
    step_name VARCHAR(200) NOT NULL,
    substep_message TEXT,
    current_segment INTEGER,
    total_segments INTEGER,
    created_at VARCHAR(50) NOT NULL
);

-- Inline INDEX clauses are MySQL syntax; in PostgreSQL the index is separate
CREATE INDEX idx_transcript_id ON transcript_progress (transcript_id);
7.2 Update Progress Function¶
async def update_transcript_progress(
    db: Session,
    transcript_id: str,
    percentage: int,
    step_key: str,
    step_name: str,
    substep_message: str = None,
    current_segment: int = None,
    total_segments: int = None
):
    """
    Update transcript progress (visible in the frontend progress bar).
    """
    progress = TranscriptProgress(
        id=generate_uuid(),
        transcript_id=transcript_id,
        percentage=percentage,
        step_key=step_key,
        step_name=step_name,
        substep_message=substep_message,
        current_segment=current_segment,
        total_segments=total_segments,
        created_at=unix_timestamp()
    )
    db.add(progress)
    db.commit()

    # Emit SSE (look up the transcript owner for routing)
    transcript = db.query(Transcript).get(transcript_id)
    await sse_manager.emit(
        user_id=transcript.user_id,
        event="transcript.progress",
        data={
            "transcript_id": transcript_id,
            "percentage": percentage,
            "step_name": step_name
        }
    )
7.3 Progress Steps Mapping¶
| Percentage | Step Key | Step Name (French) |
|---|---|---|
| 0% | upload | Téléchargement fichiers |
| 5% | rag_indexing | Indexation documents contextuels |
| 10% | gpu_preprocess | Prétraitement audio (GPU) |
| 20% | gpu_diarize | Détection des intervenants (GPU) |
| 50% | gpu_transcribe | Transcription audio (GPU) |
| 70% | gpu_voiceprint | Extraction empreintes vocales (GPU) |
| 80% | voiceprint_matching | Identification vocale |
| 85% | rag_enrichment | Enrichissement contextuel |
| 90% | llm_cleaning | Correction transcription (IA) |
| 95% | llm_identification | Identification speakers (IA) |
| 98% | finalization | Finalisation |
| 100% | completed | Transcription terminée |
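The GPU pipeline reports its own 0-100 progress (Section 4.3) while occupying roughly the 10%-80% slice of the overall bar above. One plausible way to combine the two scales, assuming a linear mapping (the document does not spell this out):

```python
def gpu_to_overall(gpu_progress):
    """Map MeetNoo's 0-100 pipeline progress onto the 10%-80% slice
    that the GPU phase occupies in the overall progress bar."""
    GPU_START, GPU_END = 10, 80
    return GPU_START + (GPU_END - GPU_START) * gpu_progress // 100
```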
8. Production Logs¶
8.1 Phase 1 Logs (RAG Indexing)¶
INFO: DEBUG: Starting RAG workflow - user: 00..., audio: reunion.mp3, context_files: 4
INFO: DEBUG: Step 1/8 - Validating inputs
INFO: DEBUG: Input validation passed
INFO: DEBUG: Step 2/8 - Creating transcript record
INFO: DEBUG: Transcript created - ID: 10f220dd-e650...
INFO: DEBUG: Step 3/8 - Creating Qdrant collection
INFO: DEBUG: Created Qdrant collection: user_00..._transcript_10f2... (dim=1024)
INFO: DEBUG: Step 4/8 - Processing 4 contextual files
INFO: DEBUG: Processing file 1/4: CV_Jean.txt
INFO: DEBUG: Extracted 1017 characters from TXT file
INFO: DEBUG: Cache HIT for llm_metadata:gpt-4o-mini:a23a26...
INFO: DEBUG: Document-level extraction complete - method=llm, participants=1, confidence=0.95
INFO: DEBUG: Chunked text into 1 semantic chunks
INFO: DEBUG: Generated 1 embeddings in 0.21s (4.8 texts/sec)
INFO: DEBUG: Upserted 1 points to collection
INFO: DEBUG: Processing file 2/4: CV_Marie.txt
...
INFO: DEBUG: Step 5/8 - Uploading audio to S3
INFO: DEBUG: Audio uploaded - S3 key: users/00.../audio/10f2.../reunion.mp3
INFO: DEBUG: Step 6/8 - Triggering MeetNoo pipeline
INFO: DEBUG: MeetNoo pipeline started - meetnoo_id: 3581dde3...
INFO: DEBUG: Step 7/8 - Updating transcript record
INFO: DEBUG: RAG workflow complete - transcript_id: 10f220dd...
8.2 Phase 3 Logs (Post-Processing)¶
INFO: DEBUG: Post-processing transcript 10f220dd...
INFO: DEBUG: Fetching MeetNoo results...
INFO: DEBUG: MeetNoo results: 33 segments, 6 speakers
INFO: DEBUG: [Voiceprint Matching] Processing 6 speakers
INFO: DEBUG: [Voiceprint Matching] SPEAKER_00 - Match FOUND: Kwame Mensah (similarity=1.000)
INFO: DEBUG: [Voiceprint Matching] SPEAKER_01 - NO MATCH - Auto-saved pending (ID: vp-abc123)
INFO: DEBUG: [Voiceprint Matching] SPEAKER_02 - NO MATCH - Auto-saved pending (ID: vp-def456)
...
INFO: DEBUG: [RAG Enrichment] Processing SPEAKER_00 (identified: Kwame Mensah)
INFO: DEBUG: [RAG Enrichment] Mean pooling 8 segments (filtered: 8 valid, 0 too short)
INFO: DEBUG: [RAG Enrichment] Pooled embeddings: final_norm=1.000000
INFO: DEBUG: [QDRANT SEARCH] Found 3 results for Kwame Mensah - top_score=0.73
INFO: DEBUG: [RAG Enrichment] Extracted metadata: email=kwame.mensah@onu.org, phone=+1-555-0123
INFO: DEBUG: [RAG Extraction] Processing SPEAKER_01 (pending)
INFO: DEBUG: [QDRANT SEARCH] Found 3 results - extracting context
INFO: DEBUG: [RAG Extraction] Potential speakers: ['Dr. Marie Dubois', 'Jean-Marc Petit']
INFO: DEBUG: [LLM Cleaning] Submitting request...
INFO: DEBUG: [LLM Cleaning] Completed - corrections=15
INFO: DEBUG: [LLM Identification] Submitting request...
INFO: DEBUG: [LLM Identification] TIMEOUT after 90s
INFO: DEBUG: Saving 33 enriched segments...
INFO: DEBUG: Post-processing completed