Amazon Polly

What It Does

  • Amazon Polly is a Text-to-Speech (TTS) service that converts written text into lifelike speech using deep learning.
  • With Polly, you can build applications that speak to users, such as an audiobook generator, a voice-enabled chatbot, or accessibility tools for visually impaired users.

Key Features (Exam Focus)

  1. Lexicons (Custom Pronunciation Dictionary)
    • You can define how certain words should be pronounced.
    • Example: AWS → Amazon Web Services, W3C → World Wide Web Consortium.
    • Exam Tip: If a question asks how to control the way Polly pronounces abbreviations → the answer is Lexicons.
  2. SSML (Speech Synthesis Markup Language)
    • Markup language to fine-tune speech output: pauses, emphasis, pitch, volume, rate, etc.

    • Example:

      <speak>Hello, <break time="1s"/> how are you?</speak>

      → Polly will say “Hello,” pause for 1 second, then continue with “how are you?”

  3. Voice Engines
    • Standard: The original engine; voices sound more robotic than those of the newer engines.
    • Neural: More human-like and natural.
    • Long-form: Designed for extended audio like podcasts or audiobooks.
    • Generative: Latest engine using GenAI, capable of expressive, adaptive voices.
    • Exam Tip: Know the difference between Standard vs Neural voices.
  4. Speech Marks
    • Metadata describing where each word, sentence, or viseme (mouth shape) starts and ends in the audio stream.
    • Useful for lip-syncing or highlighting words in real-time transcripts.
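
The Polly features above can be sketched with the Python standard library alone. This is a minimal illustration, not a definitive implementation: the helper names (build_lexicon, build_ssml, parse_speech_marks) are hypothetical, and the boto3 calls in the comments assume an installed SDK, valid credentials, and a lexicon name and voice chosen only for the example.

```python
import json
from xml.sax.saxutils import escape, quoteattr

# Namespace of the W3C Pronunciation Lexicon Specification (PLS) that Polly lexicons use.
PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"


def build_lexicon(aliases, lang="en-US"):
    """Return a PLS document that expands each abbreviation to its spoken alias."""
    lexemes = "".join(
        "  <lexeme>\n"
        f"    <grapheme>{escape(short)}</grapheme>\n"
        f"    <alias>{escape(full)}</alias>\n"
        "  </lexeme>\n"
        for short, full in aliases.items()
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<lexicon version="1.0" xmlns="{PLS_NS}" alphabet="ipa" xml:lang={quoteattr(lang)}>\n'
        f"{lexemes}"
        "</lexicon>\n"
    )


def build_ssml(before, after, pause="1s"):
    """Return SSML that speaks `before`, pauses, then speaks `after`."""
    return f"<speak>{escape(before)} <break time={quoteattr(pause)}/> {escape(after)}</speak>"


def parse_speech_marks(raw):
    """Parse Polly speech marks, which arrive as one JSON object per line."""
    return [json.loads(line) for line in raw.splitlines() if line.strip()]


# Assumed usage with boto3 (not executed here; names are placeholders):
#   polly = boto3.client("polly")
#   polly.put_lexicon(Name="acronyms", Content=build_lexicon({"W3C": "World Wide Web Consortium"}))
#   polly.synthesize_speech(Text=build_ssml("Hello,", "how are you?"), TextType="ssml",
#                           VoiceId="Joanna", Engine="neural", OutputFormat="mp3",
#                           LexiconNames=["acronyms"])
#   For speech marks, request OutputFormat="json" with SpeechMarkTypes=["word", "sentence"].
```

Keeping the lexicon and SSML builders separate mirrors how Polly treats them: a lexicon is uploaded once and reused across requests, while SSML is supplied with each synthesis call.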

Important Comparison

  • Amazon Polly = Text → Speech
  • Amazon Transcribe = Speech → Text

Amazon Rekognition

What It Does

  • Amazon Rekognition analyzes images and videos with machine learning.
  • It can identify objects, text, people, and activities, and it supports facial recognition and verification.

Core Use Cases (High Exam Relevance)

  1. Labeling – Automatically detect and categorize objects and scenes (e.g., “car,” “dog,” “mountain”).
  2. Text Detection – Extract text from images (e.g., license plates, signs).
  3. Face Detection & Analysis – Estimate attributes such as gender, age range, emotions, and facial details (e.g., smiling, eyes open).
  4. Face Search & Verification – Match against a database of known faces (e.g., for access control).
  5. Celebrity Recognition – Identify famous people.
  6. Pathing / Tracking – Track people's movement through video (e.g., following players in a sports game).
  7. PPE Detection – Detect personal protective equipment like helmets, gloves, and masks.
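
Most of these use cases return a list of findings, each with a confidence score from 0 to 100, so a common first step is filtering by confidence. The sketch below assumes the response shape of the DetectLabels operation (a "Labels" list whose items carry "Name" and "Confidence"); the sample response and the label_names helper are hypothetical and stand in for a real API call.

```python
def label_names(response, min_confidence=90.0):
    """Return label names at or above the threshold, highest confidence first."""
    labels = [
        label for label in response.get("Labels", [])
        if label["Confidence"] >= min_confidence
    ]
    labels.sort(key=lambda label: label["Confidence"], reverse=True)
    return [label["Name"] for label in labels]


# Hypothetical response, shaped like the output of
#   boto3.client("rekognition").detect_labels(
#       Image={"S3Object": {"Bucket": "my-bucket", "Name": "photo.jpg"}}, MaxLabels=10)
SAMPLE_RESPONSE = {
    "Labels": [
        {"Name": "Dog", "Confidence": 91.5},
        {"Name": "Mountain", "Confidence": 62.0},
        {"Name": "Car", "Confidence": 98.2},
    ]
}

confident = label_names(SAMPLE_RESPONSE)
```

The same filtering can also be pushed to the service by passing MinConfidence in the request, which avoids returning low-confidence labels in the first place.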

Advanced Features

  1. Custom Labels

    • Train Rekognition to detect your own objects or logos.
    • Example: The NFL uses Rekognition to automatically find its logo in social media photos.
    • Only a few hundred training images are needed.
    • Images are stored in Amazon S3, then Rekognition trains a custom model.

    Exam Tip:
    If you see “identify your company logo in images” → answer is Rekognition Custom Labels.

  2. Content Moderation

    • Automatically detect inappropriate or unsafe content (e.g., for social media platforms, ad campaigns, broadcasting).
    • Can reduce the share of content that needs human review to about 1–5% of the total.
    • Integrated with Amazon Augmented AI (A2I) so humans can review edge cases.
    • Supports Custom Moderation Adapters → you can supply your own labeled datasets to improve accuracy.

    Exam Tip:
    If a question asks about automatically filtering harmful content while still allowing human review when needed → the answer involves Rekognition Content Moderation + A2I.
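
The "automatic filtering plus human review" pattern can be sketched as a simple confidence-based router. This is an illustration under stated assumptions: the response shape follows the DetectModerationLabels operation (a "ModerationLabels" list with "Name" and "Confidence"), while the route helper and both thresholds are hypothetical values chosen for the example. In practice, the A2I integration is configured by passing a HumanLoopConfig to the API call instead of hand-written routing.

```python
def route(response, block_at=90.0, review_at=50.0):
    """Decide what to do with content based on its strongest moderation label.

    Returns "block" for high-confidence findings, "human_review" for
    borderline ones, and "allow" when nothing meets the review threshold.
    """
    top = max(
        (label["Confidence"] for label in response.get("ModerationLabels", [])),
        default=0.0,
    )
    if top >= block_at:
        return "block"
    if top >= review_at:
        return "human_review"
    return "allow"


# Hypothetical responses standing in for
#   boto3.client("rekognition").detect_moderation_labels(Image=...)
clear_violation = {"ModerationLabels": [{"Name": "Violence", "Confidence": 97.0}]}
edge_case = {"ModerationLabels": [{"Name": "Violence", "Confidence": 65.0}]}
clean = {"ModerationLabels": []}

decisions = [route(clear_violation), route(edge_case), route(clean)]
```

Only the middle band reaches a person, which is how automated moderation keeps the human review share small.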


Extra Details That Might Show Up on Exams

  • Face Liveness Detection: Ensures the detected face is real (not a photo or video spoof).
  • Image Properties: Extract dominant colors and quality measures (sharpness, brightness, contrast) for the foreground and background.
  • Integration with Other AWS Services:
    • Works well with Amazon S3 (for image storage).
    • Results can be sent to Amazon SNS/SQS for event handling.
    • Human-in-the-loop moderation integrates with Amazon A2I.
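
The S3 and SNS integration shows up most clearly in asynchronous video analysis: the video is read from S3 and job completion is published to an SNS topic. The sketch below only builds the request parameters for the StartLabelDetection operation; the helper name, bucket, key, and ARNs are hypothetical placeholders, and the call itself is left in a comment because it needs credentials and real resources.

```python
def label_detection_request(bucket, key, sns_topic_arn, role_arn, min_confidence=80.0):
    """Build keyword arguments for an asynchronous video label detection job."""
    return {
        # The video is read directly from S3.
        "Video": {"S3Object": {"Bucket": bucket, "Name": key}},
        "MinConfidence": min_confidence,
        # Rekognition publishes the job's completion status to this SNS topic,
        # using an IAM role that allows it to publish there.
        "NotificationChannel": {"SNSTopicArn": sns_topic_arn, "RoleArn": role_arn},
    }


# Placeholder values for illustration only.
request = label_detection_request(
    bucket="my-video-bucket",
    key="uploads/game.mp4",
    sns_topic_arn="arn:aws:sns:us-east-1:123456789012:AmazonRekognition-jobs",
    role_arn="arn:aws:iam::123456789012:role/rekognition-sns-publisher",
)

# Assumed usage with boto3 (not executed here):
#   job = boto3.client("rekognition").start_label_detection(**request)
#   After the SNS notification arrives, fetch results with
#   get_label_detection(JobId=job["JobId"]).
```

An SQS queue subscribed to the topic is a common way to consume these notifications reliably.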

Quick Exam Summary

  • Polly vs Transcribe → Polly = TTS, Transcribe = STT.
  • Polly Key Features → Lexicons, SSML, Neural/Generative Voices, Speech Marks.
  • Rekognition Key Features → Labeling, Text Detection, Face Analysis, Celebrity Recognition, PPE Detection.
  • Rekognition Advanced → Custom Labels, Content Moderation (+ A2I integration).
  • Remember: Rekognition = image/video analysis, Polly = text-to-speech.