Amazon SageMaker Data Tools and Model Evaluation

SageMaker Data Wrangler

SageMaker Data Wrangler is a tool designed to make data preparation easier before building machine learning (ML) models.

With Data Wrangler, you can:
- Prepare tabular and image data for ML
- Perform data preparation, transformation, and feature engineering
- Use a single interface for data selection, cleansing, exploration, visualization, and processing
- Run SQL queries directly
- Use the Data Quality tool to check for missing or inconsistent values
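
To make the idea of a data quality check concrete, here is a minimal sketch in plain Python. It is not Data Wrangler's implementation or API; the sample rows, the column names, and the `quality_report` helper are all hypothetical and only illustrate what "missing or inconsistent values" can mean in a tabular dataset.

```python
from collections import Counter

# Hypothetical sample rows; None marks a missing value.
rows = [
    {"song_id": "s1", "rating": 4, "country": "US"},
    {"song_id": "s2", "rating": None, "country": "US"},
    {"song_id": "s3", "rating": 5, "country": "DE"},
    {"song_id": "s4", "rating": "five", "country": None},
]


def quality_report(rows, numeric_columns):
    """Count missing values per column and non-numeric values in numeric columns."""
    missing = Counter()
    inconsistent = Counter()
    for row in rows:
        for column, value in row.items():
            if value is None:
                missing[column] += 1
            elif column in numeric_columns and not isinstance(value, (int, float)):
                # A numeric column holding text (e.g. "five") is flagged as inconsistent.
                inconsistent[column] += 1
    return dict(missing), dict(inconsistent)


missing, inconsistent = quality_report(rows, numeric_columns={"rating"})
print("missing:", missing)
print("inconsistent:", inconsistent)
```

In this toy data, `rating` and `country` each have one missing value, and `rating` has one inconsistent value.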

Key Features

Import Data: Load from sources like Amazon S3.

Preview Data: Inspect column names, types, and values.

Visualize Data: Build charts to better understand the dataset.

Transform Data: Apply functions, drop or add columns.

Quick Model: Run a quick test to check model performance.

Export Data Flow: Save transformations for reuse in pipelines.
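
The "Transform Data" and "Export Data Flow" ideas can be sketched as an ordered list of steps that is saved once and reapplied to new data. This is an illustration in plain Python under assumed names (`add_column`, `drop_column`, `run_flow`), not the format Data Wrangler actually exports.

```python
def drop_column(name):
    """Return a step that removes one column from a row."""
    def step(row):
        return {key: value for key, value in row.items() if key != name}
    return step


def add_column(name, fn):
    """Return a step that adds a column computed from the existing row."""
    def step(row):
        return {**row, name: fn(row)}
    return step


# The "flow" is just the ordered list of transformation steps.
flow = [
    add_column("minutes", lambda row: row["seconds"] / 60),
    drop_column("seconds"),
]


def run_flow(flow, rows):
    """Apply every step, in order, to every row."""
    output = []
    for row in rows:
        for step in flow:
            row = step(row)
        output.append(row)
    return output


result = run_flow(flow, [{"song_id": "s1", "seconds": 180}])
print(result)
```

Because the flow is a plain list of steps, the same list can be run again on a fresh batch of rows, which is the point of exporting a flow for reuse.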

Exam Tip: If you see a question about data preparation and feature engineering in SageMaker, think of Data Wrangler.

What are ML Features?

Features are the inputs to ML models during training and inference.

Example: For a music dataset, features might include:
- Song ratings
- Listening duration
- Listener demographics

High-quality, reusable features are critical. They improve consistency across teams and projects within a company.
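
As a rough illustration of the music example, the sketch below turns raw listening events into a small feature dictionary for one listener. The event fields, the profile data, and the `build_features` helper are hypothetical.

```python
from statistics import mean

# Hypothetical raw data: one row per listening event.
events = [
    {"listener": "u1", "rating": 4, "seconds": 200},
    {"listener": "u1", "rating": 5, "seconds": 100},
    {"listener": "u2", "rating": 2, "seconds": 50},
]
profiles = {"u1": {"age": 27}, "u2": {"age": 41}}


def build_features(listener, events, profiles):
    """Aggregate raw events into model inputs for a single listener."""
    own = [event for event in events if event["listener"] == listener]
    return {
        "avg_rating": mean(event["rating"] for event in own),            # song ratings
        "total_listening_seconds": sum(event["seconds"] for event in own),  # listening duration
        "age": profiles[listener]["age"],                                 # listener demographics
    }


features_u1 = build_features("u1", events, profiles)
print(features_u1)
```

Defining the aggregation once, in one function, is what makes a feature reusable: every team that calls it gets the same definition of `avg_rating`.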

SageMaker Feature Store

The Feature Store helps manage and reuse features.

- Ingest features from multiple sources.
- Define transformations to convert raw data into usable features.
- Publish features directly from Data Wrangler into Feature Store.
- Features are searchable and shareable within SageMaker Studio.
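
The ingest, lookup, and search ideas above can be pictured with a toy in-memory store. This is only a conceptual sketch: the `ToyFeatureStore` class and its methods are invented for illustration and are not the SageMaker Feature Store API.

```python
class ToyFeatureStore:
    """A toy, in-memory stand-in for a central feature store."""

    def __init__(self):
        # group name -> record id -> feature dictionary
        self._groups = {}

    def ingest(self, group, record_id, features):
        """Store (or overwrite) the features for one record in a group."""
        self._groups.setdefault(group, {})[record_id] = dict(features)

    def get(self, group, record_id):
        """Fetch the stored features for one record."""
        return self._groups[group][record_id]

    def search(self, feature_name):
        """Return the sorted names of groups that contain this feature."""
        return sorted(
            group
            for group, records in self._groups.items()
            if any(feature_name in features for features in records.values())
        )


store = ToyFeatureStore()
store.ingest("listeners", "u1", {"avg_rating": 4.5, "age": 27})
store.ingest("songs", "s1", {"avg_rating": 4.0, "tempo": 120})

print(store.get("listeners", "u1"))
print(store.search("avg_rating"))
print(store.search("tempo"))
```

The design point is the single shared store: different teams ingest into it and look features up by name instead of rebuilding them.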

Exam Tip: Feature Store = centralized place to manage, discover, and reuse ML features.

------------------------------------------------------------------------

SageMaker Clarify

SageMaker Clarify is about trust and fairness in ML models. It helps with:

Model Evaluation: Compare the performance of two models (e.g., Model A vs Model B).
- Can evaluate human factors such as friendliness or humor in a foundation model.
- Use AWS-managed human reviewers or your own employees.
- Use built-in datasets or bring your own.
- Includes built-in metrics and algorithms.
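
A minimal sketch of what "compare Model A vs Model B" can look like with an automatic metric. Accuracy is used here only as a familiar example; the labels and predictions are made up, and this is not Clarify's evaluation code.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical evaluation dataset and model outputs.
labels = [1, 0, 1, 1, 0]
model_a_predictions = [1, 0, 0, 1, 0]  # 4 of 5 correct
model_b_predictions = [1, 1, 0, 1, 0]  # 3 of 5 correct

scores = {
    "Model A": accuracy(model_a_predictions, labels),
    "Model B": accuracy(model_b_predictions, labels),
}
better = max(scores, key=scores.get)
print(scores, "->", better)
```

Both models are scored on the same dataset with the same metric, which is what makes the comparison fair.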

Model Explainability: Understand why a model made its predictions.
- Example: “Why was this loan rejected?”
- Helps debug deployed models and build trust.
- Exam Tip: Look for keywords like explain predictions or increase transparency → Clarify.
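
To give a feel for the loan example, the sketch below explains a prediction from a simple linear score by attributing the difference from a baseline applicant to each feature. The weights, baseline, and applicant values are invented, and this additive attribution for a linear model is a simplified stand-in, not the algorithm Clarify runs.

```python
# Hypothetical linear credit score: higher is better.
weights = {"income": 0.5, "debt": -0.8, "years_employed": 0.3}
baseline = {"income": 4.0, "debt": 2.0, "years_employed": 5.0}   # a "typical" applicant
applicant = {"income": 3.0, "debt": 4.0, "years_employed": 1.0}  # the rejected applicant


def score(x):
    """Linear score: weighted sum of the feature values."""
    return sum(weights[name] * x[name] for name in weights)


def contributions(x, baseline):
    """How much each feature moves the score away from the baseline score."""
    return {name: weights[name] * (x[name] - baseline[name]) for name in weights}


contrib = contributions(applicant, baseline)
main_reason = min(contrib, key=contrib.get)  # most negative contribution

for name, value in sorted(contrib.items(), key=lambda item: item[1]):
    print(f"{name}: {value:+.2f}")
print("largest negative factor:", main_reason)
```

The contributions add up to the gap between the applicant's score and the baseline score, so each number can be read as "how much this feature pushed the decision".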

Bias Detection: Identify and measure bias in data or models using statistical metrics.
- Example: If your dataset heavily favors one group, Clarify can flag it.
- Types of Bias:
  - Sampling Bias: Data doesn’t fairly represent the population.
  - Measurement Bias: Errors in how data is measured.
  - Observer Bias: Human judgment skews results.
  - Confirmation Bias: Favoring information that supports preconceptions.
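
Two simple statistical metrics show what "measuring bias" can mean in practice: class imbalance (how unevenly two groups are represented) and the difference in proportions of positive labels between them. The sketch below computes both on made-up data; the function names are hypothetical and this is not Clarify's own code.

```python
# Hypothetical dataset of (group, label) pairs; label 1 = approved.
records = (
    [("A", 1)] * 6 + [("A", 0)] * 2 +  # group A: 8 records, 6 approved
    [("B", 1)] * 1 + [("B", 0)] * 1    # group B: 2 records, 1 approved
)


def class_imbalance(records, group_a, group_d):
    """(n_a - n_d) / (n_a + n_d): 0 means balanced, near +1 or -1 means one group dominates."""
    n_a = sum(1 for group, _ in records if group == group_a)
    n_d = sum(1 for group, _ in records if group == group_d)
    return (n_a - n_d) / (n_a + n_d)


def positive_rate(records, group):
    """Share of records in the group that carry a positive label."""
    labels = [label for g, label in records if g == group]
    return sum(labels) / len(labels)


def label_proportion_difference(records, group_a, group_d):
    """Difference in positive-label rates between the two groups."""
    return positive_rate(records, group_a) - positive_rate(records, group_d)


ci = class_imbalance(records, "A", "B")
dpl = label_proportion_difference(records, "A", "B")
print("class imbalance:", ci)
print("difference in positive-label proportions:", dpl)
```

Here group A makes up most of the data and is also approved more often, which is the kind of pattern a bias report would flag for review.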

Exam Tip: If the question mentions detecting bias or explaining ML predictions, the answer is usually SageMaker Clarify.

SageMaker Ground Truth

Ground Truth focuses on data labeling and human feedback.

Supports RLHF (Reinforcement Learning from Human Feedback).
Use cases:
- Model review and evaluation
- Aligning models to human preferences
- Creating labeled datasets (e.g., tagging images)

How it Works

Humans review and provide feedback, which is added to the model’s “reward” function.
Feedback improves model accuracy and aligns it with desired behavior.
Reviewers can be:
- Amazon Mechanical Turk workers
- Your employees
- Third-party vendors
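
As a rough picture of how human feedback becomes a reward signal, the sketch below turns pairwise reviewer preferences into a per-response score. Real RLHF pipelines typically train a reward model on such comparisons; the win-rate calculation, the response names, and the `win_rates` helper here are simplified, hypothetical stand-ins.

```python
from collections import defaultdict

# Hypothetical reviewer feedback: (preferred response, rejected response).
comparisons = [
    ("resp_a", "resp_b"),
    ("resp_a", "resp_c"),
    ("resp_b", "resp_c"),
    ("resp_a", "resp_b"),
]


def win_rates(comparisons):
    """Score each response by the fraction of its comparisons that it won."""
    wins = defaultdict(int)
    total = defaultdict(int)
    for preferred, rejected in comparisons:
        wins[preferred] += 1
        total[preferred] += 1
        total[rejected] += 1
    return {response: wins[response] / total[response] for response in total}


rewards = win_rates(comparisons)
best = max(rewards, key=rewards.get)
print(rewards)
print("most preferred:", best)
```

Responses that reviewers consistently prefer end up with higher scores, which is the signal used to steer the model toward the desired behavior.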

Exam Tip: If the exam mentions data labeling or RLHF, think Ground Truth.
