Amazon Polly

Amazon Polly is a Text-to-Speech (TTS) service that converts written text into lifelike speech using deep learning.
With Polly, you can create applications that actually speak to users—for example, an audiobook generator, a voice-enabled chatbot, or accessibility tools for visually impaired users.

Lexicons (Custom Pronunciation Dictionary)
- You can define how certain words should be pronounced.
- Example: AWS → Amazon Web Services, W3C → World Wide Web Consortium.
- Exam Tip: If a question asks how to control the way Polly pronounces abbreviations → the answer is Lexicons.
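
Polly lexicons are written in the W3C Pronunciation Lexicon Specification (PLS) XML format. The sketch below builds a small alias lexicon for the two abbreviations above; the helper names `build_lexicon` and `upload_lexicon` and the lexicon name used in the note are illustrative assumptions, not part of any AWS SDK.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"


def build_lexicon(aliases: dict[str, str], lang: str = "en-US") -> str:
    """Return a PLS document mapping each written form to its spoken alias."""
    lexemes = "\n".join(
        f"  <lexeme>\n"
        f"    <grapheme>{escape(word)}</grapheme>\n"
        f"    <alias>{escape(spoken)}</alias>\n"
        f"  </lexeme>"
        for word, spoken in aliases.items()
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<lexicon version="1.0" xmlns="{PLS_NS}" alphabet="ipa" xml:lang="{lang}">\n'
        f"{lexemes}\n"
        "</lexicon>"
    )


def upload_lexicon(name: str, content: str) -> None:
    """Store the lexicon in Polly (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder works without the SDK

    boto3.client("polly").put_lexicon(Name=name, Content=content)


lexicon_xml = build_lexicon(
    {"AWS": "Amazon Web Services", "W3C": "World Wide Web Consortium"}
)
```

After something like `upload_lexicon("acronyms", lexicon_xml)`, the lexicon is applied by passing `LexiconNames=["acronyms"]` to `synthesize_speech`. Lexicon names are limited to alphanumeric characters.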
SSML (Speech Synthesis Markup Language)
- Markup language to fine-tune speech output: pauses, emphasis, pitch, volume, rate, etc.
- Example:
  <speak>Hello, <break time="1s"/> how are you?</speak>
  → Polly will say “Hello,” pause for 1 second, then continue with “how are you?”
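
As a minimal sketch of how SSML like this might be sent to Polly from Python: the helper names and the default voice `Joanna` are assumptions made for illustration. The key detail is `TextType="ssml"`, without which Polly treats the input as plain text.

```python
from xml.sax.saxutils import escape


def ssml_with_pause(before: str, after: str, pause: str = "1s") -> str:
    """Wrap two text fragments in <speak> with a <break> between them."""
    return f'<speak>{escape(before)} <break time="{pause}"/> {escape(after)}</speak>'


def synthesize(ssml: str, voice_id: str = "Joanna") -> bytes:
    """Return MP3 bytes for the SSML (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the SSML helper works without the SDK

    response = boto3.client("polly").synthesize_speech(
        Text=ssml,
        TextType="ssml",  # tells Polly to interpret the markup
        OutputFormat="mp3",
        VoiceId=voice_id,
    )
    return response["AudioStream"].read()


ssml = ssml_with_pause("Hello,", "how are you?")
```

Escaping the text fragments matters because characters such as `&` and `<` would otherwise make the SSML document invalid.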
Voice Engines
- Standard: Older engine based on concatenative synthesis; voices sound more robotic.
- Neural: More human-like and natural.
- Long-form: Designed for extended audio like podcasts or audiobooks.
- Generative: Latest engine using GenAI, capable of expressive, adaptive voices.
- Exam Tip: Know the difference between Standard and Neural voices.
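
The engine is chosen per request through the `Engine` parameter of `synthesize_speech`. The sketch below only assembles the request arguments; `speech_request` is a hypothetical helper, and the default voice is an assumption (not every voice is available on every engine).

```python
# Engine values accepted by Polly's synthesize_speech call.
ENGINES = ("standard", "neural", "long-form", "generative")


def speech_request(text: str, engine: str = "neural", voice_id: str = "Joanna") -> dict:
    """Build keyword arguments for synthesize_speech with a validated engine."""
    if engine not in ENGINES:
        raise ValueError(f"unknown engine {engine!r}; expected one of {ENGINES}")
    return {
        "Text": text,
        "Engine": engine,
        "VoiceId": voice_id,  # assumption: this voice supports the chosen engine
        "OutputFormat": "mp3",
    }


request = speech_request("Welcome back.", engine="neural")
```

With credentials configured, the request could be sent as `boto3.client("polly").synthesize_speech(**request)`. Calling `describe_voices(Engine="neural")` lists the voices that a given engine supports.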
Speech Marks
- Metadata giving the time at which each sentence or word starts in the audio stream, plus its position in the input text; viseme and SSML marks are also available.
- Useful for lip-syncing or highlighting words in real-time transcripts.
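
Speech marks are returned as line-delimited JSON when `synthesize_speech` is called with `OutputFormat="json"` and a `SpeechMarkTypes` list. The sketch below parses such a stream into word timings for highlighting; the sample values are made up for illustration and the helper names are assumptions.

```python
import json


def parse_speech_marks(raw: str) -> list[dict]:
    """Parse Polly's line-delimited JSON speech marks into dicts."""
    return [json.loads(line) for line in raw.splitlines() if line.strip()]


def word_timings(marks: list[dict]) -> list[tuple[int, str]]:
    """Return (start time in ms, word) pairs for the word-type marks."""
    return [(m["time"], m["value"]) for m in marks if m["type"] == "word"]


def fetch_speech_marks(text: str, voice_id: str = "Joanna") -> str:
    """Request sentence and word marks (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the parser works without the SDK

    response = boto3.client("polly").synthesize_speech(
        Text=text,
        OutputFormat="json",  # speech marks instead of audio
        SpeechMarkTypes=["sentence", "word"],
        VoiceId=voice_id,
    )
    return response["AudioStream"].read().decode("utf-8")


# Illustrative sample in the shape Polly returns (values are invented).
sample = (
    '{"time":0,"type":"sentence","start":0,"end":11,"value":"Hello world"}\n'
    '{"time":6,"type":"word","start":0,"end":5,"value":"Hello"}\n'
    '{"time":370,"type":"word","start":6,"end":11,"value":"world"}\n'
)
timings = word_timings(parse_speech_marks(sample))
```

A player can highlight each word when playback reaches its `time` offset; the audio itself comes from a separate request with an audio output format.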

Amazon Rekognition

Amazon Rekognition analyzes images and videos with machine learning.
It can identify objects, text, people, and activities, and it supports facial recognition and verification.

Labeling – Automatically detect and categorize objects and scenes (e.g., “car,” “dog,” “mountain”).
Text Detection – Extract text from images (e.g., license plates, signs).
Face Detection & Analysis – Estimate gender, age range, emotions (e.g., happy, calm), and facial attributes (e.g., smiling, eyes open).
Face Search & Verification – Match against a database of known faces (e.g., for access control).
Celebrity Recognition – Identify famous people.
Pathing / Tracking – Track the paths people take through a video (e.g., following players during a sports game).
PPE Detection – Detect personal protective equipment like helmets, gloves, and masks.
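
As a rough sketch of the labeling feature, the code below calls `detect_labels` on an image stored in S3 and keeps only confident labels. The bucket and key arguments, the helper names, and the sample response values are assumptions for illustration.

```python
def detect_labels(bucket: str, key: str, max_labels: int = 10) -> dict:
    """Run label detection on an S3 image (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the filter below works without the SDK

    return boto3.client("rekognition").detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxLabels=max_labels,
    )


def label_names(response: dict, min_confidence: float = 80.0) -> list[str]:
    """Return label names whose confidence meets the threshold."""
    return [
        label["Name"]
        for label in response.get("Labels", [])
        if label["Confidence"] >= min_confidence
    ]


# Invented response in the shape detect_labels returns.
sample_response = {
    "Labels": [
        {"Name": "Car", "Confidence": 98.2},
        {"Name": "Dog", "Confidence": 91.5},
        {"Name": "Mountain", "Confidence": 62.0},
    ]
}
confident = label_names(sample_response)
```

The other image features follow the same request pattern with different operations, for example `detect_text`, `detect_faces`, `recognize_celebrities`, and `detect_protective_equipment`.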

Custom Labels
- Train Rekognition to detect your own objects or logos.
- Example: The NFL uses Rekognition to automatically find its logo in social media photos.
- Only a few hundred training images are needed.
- Training images are stored in Amazon S3; Rekognition then trains a custom model from them.
- Exam Tip: If you see “identify your company logo in images” → the answer is Rekognition Custom Labels.
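
Once a Custom Labels model has been trained and started, inference goes through `detect_custom_labels`, which takes the model's version ARN. In this sketch the ARN, bucket, key, and label name are placeholders supplied by the caller, and `best_confidence` is a hypothetical helper.

```python
from typing import Optional


def detect_logo(model_arn: str, bucket: str, key: str) -> dict:
    """Query a running Custom Labels model (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helper below works without the SDK

    return boto3.client("rekognition").detect_custom_labels(
        ProjectVersionArn=model_arn,  # ARN of the trained model version
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MinConfidence=70,
    )


def best_confidence(response: dict, label_name: str) -> Optional[float]:
    """Return the highest confidence for label_name, or None if it is absent."""
    scores = [
        item["Confidence"]
        for item in response.get("CustomLabels", [])
        if item["Name"] == label_name
    ]
    return max(scores) if scores else None


# Invented response in the shape detect_custom_labels returns.
sample_response = {
    "CustomLabels": [
        {"Name": "company-logo", "Confidence": 88.4},
        {"Name": "company-logo", "Confidence": 93.1},
    ]
}
logo_score = best_confidence(sample_response, "company-logo")
```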
Content Moderation
- Automatically detect inappropriate or unsafe content (e.g., for social media platforms, ad campaigns, broadcasting).
- Can reduce the share of content that needs human review to about 1–5% of the total volume.
- Integrated with Amazon Augmented AI (A2I) so humans can review edge cases.
- Supports Custom Moderation Adapters → you can supply your own labeled datasets to improve accuracy.
- Exam Tip: If a question asks about automatically filtering harmful content while still allowing human review when needed → the answer involves Rekognition Content Moderation + A2I.
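
The moderation-plus-review pattern can be sketched as follows. `detect_moderation_labels` accepts an optional `HumanLoopConfig` that points at an A2I flow definition; in A2I the conditions that trigger a human loop are configured on that flow definition. The local `route` function and its thresholds are assumptions that only illustrate the idea of sending edge cases to reviewers.

```python
def moderate(bucket: str, key: str, loop_name: str, flow_arn: str) -> dict:
    """Moderate an S3 image with an A2I human loop attached
    (requires boto3, AWS credentials, and an existing flow definition)."""
    import boto3  # imported lazily so route() works without the SDK

    return boto3.client("rekognition").detect_moderation_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MinConfidence=50,
        HumanLoopConfig={
            "HumanLoopName": loop_name,
            "FlowDefinitionArn": flow_arn,
        },
    )


def route(response: dict, reject_at: float = 90.0, review_at: float = 50.0) -> str:
    """Illustrative local policy: reject clear cases, review uncertain ones."""
    top = max(
        (label["Confidence"] for label in response.get("ModerationLabels", [])),
        default=0.0,
    )
    if top >= reject_at:
        return "reject"
    if top >= review_at:
        return "human_review"
    return "approve"


# Invented responses in the shape detect_moderation_labels returns.
clear_case = {"ModerationLabels": [{"Name": "Violence", "Confidence": 97.0}]}
edge_case = {"ModerationLabels": [{"Name": "Violence", "Confidence": 63.0}]}
clean_case = {"ModerationLabels": []}
decisions = [route(clear_case), route(edge_case), route(clean_case)]
```

Only the middle band reaches human reviewers, which is how the review workload stays small while uncertain content still gets a second look.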

Face Liveness Detection: Ensures the detected face is real (not a photo or video spoof).
Image Properties: Extract dominant colors and quality measures (sharpness, brightness, contrast) for the whole image, the foreground, and the background.
Integration with Other AWS Services:
- Works well with Amazon S3 (for image storage).
- Asynchronous video analysis jobs publish their completion status to an Amazon SNS topic, which can feed an SQS queue for event handling.
- Human-in-the-loop moderation integrates with Amazon A2I.

Polly vs Transcribe → Polly = TTS, Transcribe = STT.
Polly Key Features → Lexicons, SSML, Neural/Generative Voices, Speech Marks.
Rekognition Key Features → Labeling, Text Detection, Face Analysis, Celebrity Recognition, PPE Detection.
Rekognition Advanced → Custom Labels, Content Moderation (+ A2I integration).
Remember: Rekognition = image/video analysis, Polly = text-to-speech.