🚗 Used Car Price Prediction in Virginia
Predicting the price of used cars with AI and data analysis
“Don’t guess the price — let the data tell you.”
📎 Full Analysis:
👉 View Jupyter Notebook on GitHub
📌 One-Line Summary
This project predicts the prices of used cars in Virginia using a dataset of over 46,000 listings.
By analyzing details like year, mileage, brand, and fuel type, the AI model can estimate a realistic market price.
1️⃣ How It Was Built
1. Data Collection
- Collected real used-car listing data from the web
- Stored it in a MySQL database hosted on AWS
- Accessed it with Python and served it through a Flask API
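The storage-and-access step can be sketched as below. This is a minimal illustration, not the project's code: an in-memory SQLite database stands in for the AWS-hosted MySQL instance, and the `listings` table, its columns, and the sample rows are all hypothetical. With a MySQL driver such as PyMySQL or mysql-connector-python, the same parameterized query would use `%s` placeholders instead of `?`.

```python
import sqlite3

# Stand-in schema (hypothetical): the real project used MySQL on AWS.
SCHEMA = """CREATE TABLE listings (
    year INTEGER, mileage REAL, brand TEXT, fuel_type TEXT, price REAL)"""


def make_demo_db(rows):
    """Create an in-memory SQLite database filled with the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO listings VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def load_listings(conn, min_year):
    """Fetch listings from min_year onward using a parameterized query."""
    cur = conn.execute(
        "SELECT year, mileage, brand, fuel_type, price FROM listings "
        "WHERE year >= ? ORDER BY year",
        (min_year,),  # values are bound by the driver, never pasted into the SQL
    )
    return cur.fetchall()


if __name__ == "__main__":
    demo = make_demo_db([
        (2015, 90000, "Ford", "Gasoline", 14000),  # made-up rows
        (2011, 130000, "Toyota", "Gasoline", 8500),
        (2019, 30000, "Tesla", "Electric", 38000),
    ])
    for row in load_listings(demo, 2014):
        print(row)
```

Binding values as query parameters, rather than formatting them into the SQL string, keeps user input (for example, from a Flask request) from being interpreted as SQL.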
2. Data Preparation
- Filled in missing values (e.g., unknown mileage)
- Removed unrealistic values (e.g., mileage over 1 million km)
- Encoded categorical text fields (brand, fuel type) as numbers
- Applied a log transformation to reduce skew in the data
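One way the four preparation steps could look in plain Python is sketched below. It assumes each listing is a dict with hypothetical `year`, `mileage`, `brand`, `fuel_type`, and `price` keys, fills missing mileage with the median of plausible values, uses simple label encoding for the categorical fields, and log-transforms only the price. The write-up does not say which imputation method, encoding, or log-transformed columns were actually used, so treat all three as assumptions.

```python
import math
from statistics import median

MAX_MILEAGE = 1_000_000  # cutoff quoted in the text


def prepare(rows):
    """Impute, filter, encode, and log-transform a list of listing dicts.

    Assumptions: rows carry year, mileage (None if unknown), brand,
    fuel_type and price keys; only price is log-transformed.
    """
    # Fill gaps with the median of known, plausible mileages (an assumption).
    known = [r["mileage"] for r in rows
             if r["mileage"] is not None and r["mileage"] <= MAX_MILEAGE]
    fill_value = median(known)

    cleaned = []
    for r in rows:
        mileage = fill_value if r["mileage"] is None else r["mileage"]
        if mileage > MAX_MILEAGE:
            continue  # drop unrealistic listings
        cleaned.append({**r, "mileage": mileage})

    # Label encoding: each distinct category string maps to a stable integer code.
    encoders = {
        col: {cat: i for i, cat in enumerate(sorted({r[col] for r in cleaned}))}
        for col in ("brand", "fuel_type")
    }

    features = [
        [r["year"], r["mileage"],
         encoders["brand"][r["brand"]], encoders["fuel_type"][r["fuel_type"]]]
        for r in cleaned
    ]
    # log1p compresses the long right tail of prices; math.expm1 inverts it.
    targets = [math.log1p(r["price"]) for r in cleaned]
    return features, targets, encoders
```

Label encoding suits tree-based models such as XGBoost; for linear models, one-hot encoding is usually the safer choice because integer codes imply an ordering between brands that doesn't exist.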
3. Data Analysis (EDA)
- Visualized the relationship between price and year/mileage
- Found year and mileage to be the most influential features
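A quick numeric companion to those plots is to rank features by the strength of their linear correlation with price. The sketch below assumes each feature is given as a list of numbers aligned with the price list; it is an illustration, not necessarily how the project reached its conclusion. Mileage would typically correlate negatively with price, so the ranking uses the absolute value of the coefficient.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def rank_by_correlation(columns, target):
    """Sort feature names by the absolute strength of their linear link to the target."""
    scores = {name: pearson(values, target) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)
```

Pearson's r only captures linear relationships, so it can understate features whose effect on price is non-linear, which tree-based models can still exploit.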
4. AI Model Training
Tested several machine learning models:
- Linear Regression
- Decision Tree
- Random Forest
- Support Vector Regression (SVR)
- XGBoost (winner)
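Comparing candidates fairly means fitting each one on the same training split and scoring it on the same held-out rows. The harness below sketches that protocol with two deliberately simple stand-in models (a mean-price baseline and a one-feature least-squares line) rather than the five models listed above. The project's split ratio and seed are not stated, so `test_fraction=0.2` and `seed=0` are assumptions; scikit-learn's `train_test_split` in `sklearn.model_selection` does the same job for real estimators.

```python
import math
import random


def train_test_split(X, y, test_fraction=0.2, seed=0):
    """Shuffle row indices reproducibly and hold out a fraction for testing."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_fraction))
    train, test = idx[:cut], idx[cut:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def fit_mean_baseline(X, y):
    """Predict the training-set mean price for every car."""
    mean = sum(y) / len(y)
    return lambda row: mean


def fit_year_only_ols(X, y):
    """Ordinary least squares on the first feature (assumed to be year)."""
    xs = [row[0] for row in X]
    mx, my = sum(xs) / len(xs), sum(y) / len(y)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (t - my) for x, t in zip(xs, y)) / sxx
    intercept = my - slope * mx
    return lambda row: intercept + slope * row[0]


def compare_models(candidates, X, y):
    """Fit each candidate on the training split and score RMSE on the held-out split."""
    X_tr, y_tr, X_te, y_te = train_test_split(X, y)
    scores = {}
    for name, fit in candidates.items():
        predict = fit(X_tr, y_tr)
        scores[name] = rmse(y_te, [predict(row) for row in X_te])
    return sorted(scores.items(), key=lambda kv: kv[1])  # lowest RMSE first


if __name__ == "__main__":
    # Synthetic rows: [year, mileage], with price rising linearly in year.
    X = [[2005 + i, 150_000 - 7_000 * i] for i in range(15)]
    y = [4_000 + 1_500 * i for i in range(15)]
    candidates = {"mean_baseline": fit_mean_baseline, "year_ols": fit_year_only_ols}
    for name, score in compare_models(candidates, X, y):
        print(f"{name}: RMSE = {score:,.0f}")
```

Each `fit_*` function returns a prediction function, so any model that can be wrapped in that shape can be dropped into `candidates`.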
📊 Best Model: XGBoost
- R² score: 0.89
- Root mean squared error (RMSE): $5,474
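For reference, the two reported metrics follow their standard definitions, where $y_i$ is the actual price of the $i$-th evaluated listing, $\hat{y}_i$ its predicted price, and $\bar{y}$ the mean actual price:

$$
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
$$

RMSE is in the same units as the price, so $5,474 is a dollar-scale error size that weights large misses more heavily than a plain average would. An R² of 0.89 means the model explains about 89% of the variance in price in the data it was evaluated on, which is why it should not be read as an accuracy percentage.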
5. Results
| Model | R² Score | RMSE ($) |
|---|---|---|
| Linear Regression | 0.58 | 14,085 |
| Decision Tree | 0.73 | 9,022 |
| Random Forest | 0.84 | 7,021 |
| SVR | 0.11 | 14,328 |
| XGBoost | 0.89 | 5,474 |
6. Real-World Test
- 2016 Honda Odyssey → Predicted price: $18,738
- The prediction closely matched actual market data.
2️⃣ Real-World Use
- Saved the final model as a Pickle file
- Deployed it behind a Flask API for real-time predictions
- Built a simplified four-feature version (Year, Mileage, Brand, Model) for web app integration
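The save, load, and serve flow might look like the sketch below. `ToyPriceModel` is a hypothetical stand-in with made-up coefficients (the real project pickled its trained model), and the JSON field names are assumptions based on the four features listed above. `handle_predict` is written as a plain function so it runs without Flask; a Flask route could pass it the raw request body and return its result.

```python
import json
import pickle

# Hypothetical JSON keys for the simplified four-feature model.
REQUIRED_FIELDS = ("year", "mileage", "brand", "model")


class ToyPriceModel:
    """Stand-in for the trained model: made-up coefficients, ignores brand and model."""

    def __init__(self, intercept, per_year, per_mile):
        self.intercept = intercept
        self.per_year = per_year
        self.per_mile = per_mile

    def predict(self, features):
        year, mileage, _brand, _model = features
        return self.intercept + self.per_year * (year - 2000) + self.per_mile * mileage


def save_model(model, path):
    """Serialize the model to a Pickle file."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Load a pickled model; only do this for files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


def handle_predict(model, body):
    """Validate a JSON request body and return a JSON response string."""
    payload = json.loads(body)
    missing = [k for k in REQUIRED_FIELDS if k not in payload]
    if missing:
        return json.dumps({"error": "missing fields: " + ", ".join(missing)})
    price = model.predict([payload[k] for k in REQUIRED_FIELDS])
    return json.dumps({"predicted_price": round(price, 2)})
```

Because `pickle.load` can execute arbitrary code, the API should only ever load model files it produced itself.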
🛠 Technologies Used
| Step | Technology |
|---|---|
| Data Storage | AWS MySQL |
| Model Development | Python, scikit-learn, XGBoost |
| Deployment | Flask API, Pickle |
| Environment | AWS EC2 (Ubuntu) |
💡 Key Learnings
- A log transformation improves accuracy on skewed data
- Tree-based models handle mixed numeric and categorical features effectively
- Even with only four features, accurate real-time predictions are possible
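To see the effect behind the first learning, the sketch below measures sample skewness before and after a log transform. The three prices are illustrative, not from the dataset: they grow by a constant factor, so their logarithms are evenly spaced and the skew essentially disappears. It shows the reduction in skew, not the accuracy gain itself.

```python
import math


def skewness(values):
    """Sample skewness (Fisher-Pearson moment coefficient, no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


if __name__ == "__main__":
    prices = [1000, 10000, 100000]  # illustrative, right-skewed values
    print("raw skew:", skewness(prices))
    print("log skew:", skewness([math.log(p) for p in prices]))
```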
🔗 GitHub Repository
All articles on this blog are licensed under CC BY-NC-SA 4.0 unless otherwise stated.