Spaces:
Running
Running
Commit
Β·
c0caea8
0
Parent(s):
Initial commit: SEO Report Generator
Browse files- README.md +107 -0
- SETUP.md +108 -0
- START.md +46 -0
- __pycache__/app.cpython-313.pyc +0 -0
- __pycache__/pdf_generator.cpython-313.pyc +0 -0
- __pycache__/report_generator.cpython-313.pyc +0 -0
- __pycache__/simple_pdf_generator.cpython-313.pyc +0 -0
- app.py +161 -0
- claude.md +115 -0
- modules/__init__.py +1 -0
- modules/__pycache__/__init__.cpython-313.pyc +0 -0
- modules/__pycache__/content_audit.cpython-313.pyc +0 -0
- modules/__pycache__/technical_seo.cpython-313.pyc +0 -0
- modules/content_audit.py +388 -0
- modules/technical_seo.py +191 -0
- pdf_generator.py +457 -0
- report_generator.py +1096 -0
- requirements.txt +9 -0
- run.py +40 -0
- simple_pdf_generator.py +104 -0
- test_app.py +122 -0
README.md
ADDED
|
@@ -0,0 +1,107 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
title: SEO Report Generator
|
| 3 |
+
emoji: π
|
| 4 |
+
colorFrom: blue
|
| 5 |
+
colorTo: green
|
| 6 |
+
sdk: streamlit
|
| 7 |
+
sdk_version: 1.28.0
|
| 8 |
+
app_file: app.py
|
| 9 |
+
pinned: false
|
| 10 |
+
license: mit
|
| 11 |
+
---
|
| 12 |
+
|
| 13 |
+
# SEO Report Generator
|
| 14 |
+
|
| 15 |
+
A one-click SEO report generator that creates comprehensive SEO analysis reports from any website URL. Built with Streamlit and designed to be modular and extensible.
|
| 16 |
+
|
| 17 |
+
## Features
|
| 18 |
+
|
| 19 |
+
### β
Implemented (v1 MVP)
|
| 20 |
+
- **Technical SEO Analysis** via Google PageSpeed Insights API
|
| 21 |
+
- Mobile & desktop performance scores
|
| 22 |
+
- Core Web Vitals (LCP, CLS, INP, FCP)
|
| 23 |
+
- Optimization opportunities and diagnostics
|
| 24 |
+
- **Content Audit** via web crawling
|
| 25 |
+
- Metadata completeness (title, description, H1 tags)
|
| 26 |
+
- Content quality metrics (word count, CTA presence)
|
| 27 |
+
- Content freshness analysis
|
| 28 |
+
- **Professional HTML Reports** with interactive charts
|
| 29 |
+
- **PDF Export** functionality
|
| 30 |
+
- **Competitor Benchmarking** (basic comparison)
|
| 31 |
+
- **Executive Summary** with health scoring
|
| 32 |
+
|
| 33 |
+
### π§ Planned for Future Versions
|
| 34 |
+
- Keyword Rankings (Google Search Console integration)
|
| 35 |
+
- Backlink Profile Analysis (Ahrefs/SEMrush APIs)
|
| 36 |
+
- Advanced Competitor Analysis
|
| 37 |
+
- GA4/Conversion Tracking Integration
|
| 38 |
+
|
| 39 |
+
## Installation
|
| 40 |
+
|
| 41 |
+
1. Clone the repository
|
| 42 |
+
2. Install dependencies:
|
| 43 |
+
```bash
|
| 44 |
+
pip install -r requirements.txt
|
| 45 |
+
```
|
| 46 |
+
|
| 47 |
+
3. Run the application:
|
| 48 |
+
```bash
|
| 49 |
+
streamlit run app.py
|
| 50 |
+
```
|
| 51 |
+
|
| 52 |
+
## Usage
|
| 53 |
+
|
| 54 |
+
1. Open the Streamlit app in your browser
|
| 55 |
+
2. Enter a website URL to analyze
|
| 56 |
+
3. Optionally add competitor URLs for benchmarking
|
| 57 |
+
4. Click "Generate SEO Report"
|
| 58 |
+
5. View the interactive report and download HTML/PDF versions
|
| 59 |
+
|
| 60 |
+
## API Requirements
|
| 61 |
+
|
| 62 |
+
- **Google PageSpeed Insights API**: No API key required for basic usage (with rate limits)
|
| 63 |
+
- For higher usage limits, get a free API key from Google Cloud Console
|
| 64 |
+
|
| 65 |
+
## Architecture
|
| 66 |
+
|
| 67 |
+
The system is built with a modular architecture:
|
| 68 |
+
|
| 69 |
+
```
|
| 70 |
+
app.py # Main Streamlit application
|
| 71 |
+
modules/
|
| 72 |
+
βββ technical_seo.py # PageSpeed Insights integration
|
| 73 |
+
βββ content_audit.py # Web crawling and content analysis
|
| 74 |
+
report_generator.py # HTML report generation with charts
|
| 75 |
+
pdf_generator.py # PDF export functionality
|
| 76 |
+
```
|
| 77 |
+
|
| 78 |
+
## Report Structure
|
| 79 |
+
|
| 80 |
+
1. **Executive Summary** - Overall health score and quick wins
|
| 81 |
+
2. **Technical SEO** - Performance metrics and optimization opportunities
|
| 82 |
+
3. **Content Audit** - Metadata completeness and content quality
|
| 83 |
+
4. **Competitor Analysis** - Basic performance comparison
|
| 84 |
+
5. **Future Modules** - Placeholder sections for keywords, backlinks, etc.
|
| 85 |
+
6. **Recommendations** - Prioritized action items
|
| 86 |
+
|
| 87 |
+
## Success Metrics
|
| 88 |
+
|
| 89 |
+
β
Report generates without failures for multiple domains
|
| 90 |
+
β
PageSpeed data fetched reliably via Google API
|
| 91 |
+
β
Crawl completes within 200 pages, respecting robots.txt
|
| 92 |
+
β
Charts render correctly in HTML and export cleanly to PDF
|
| 93 |
+
β
Report structure matches defined format
|
| 94 |
+
β
Professional visual design resembling agency decks
|
| 95 |
+
|
| 96 |
+
## Contributing
|
| 97 |
+
|
| 98 |
+
The system is designed to be extensible. To add new modules:
|
| 99 |
+
|
| 100 |
+
1. Create a new module in `modules/` following the existing pattern
|
| 101 |
+
2. Update `report_generator.py` to include the new section
|
| 102 |
+
3. Add placeholder sections for future enhancements
|
| 103 |
+
4. Update the main app to integrate the new module
|
| 104 |
+
|
| 105 |
+
## License
|
| 106 |
+
|
| 107 |
+
MIT License - see LICENSE file for details
|
SETUP.md
ADDED
|
@@ -0,0 +1,108 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# SEO Report Generator - Setup Instructions
|
| 2 |
+
|
| 3 |
+
## Quick Start
|
| 4 |
+
|
| 5 |
+
1. **Install Dependencies**
|
| 6 |
+
```bash
|
| 7 |
+
python -m pip install -r requirements.txt
|
| 8 |
+
```
|
| 9 |
+
|
| 10 |
+
2. **Run the Application**
|
| 11 |
+
```bash
|
| 12 |
+
python -m streamlit run app.py
|
| 13 |
+
```
|
| 14 |
+
Or use the helper script:
|
| 15 |
+
```bash
|
| 16 |
+
python run.py
|
| 17 |
+
```
|
| 18 |
+
|
| 19 |
+
3. **Access the App**
|
| 20 |
+
- Open your browser to: http://localhost:8501
|
| 21 |
+
- The app will automatically open if you use `python run.py`
|
| 22 |
+
|
| 23 |
+
3. **Test the System** (Optional)
|
| 24 |
+
```bash
|
| 25 |
+
python test_app.py
|
| 26 |
+
```
|
| 27 |
+
|
| 28 |
+
## Requirements
|
| 29 |
+
|
| 30 |
+
- Python 3.8+
|
| 31 |
+
- Internet connection for API calls and web crawling
|
| 32 |
+
- Modern web browser
|
| 33 |
+
|
| 34 |
+
## Key Features Ready to Use
|
| 35 |
+
|
| 36 |
+
### β
Core Features Implemented
|
| 37 |
+
- **Technical SEO Analysis** - PageSpeed Insights integration
|
| 38 |
+
- **Content Audit** - Automated web crawling and analysis
|
| 39 |
+
- **Professional Reports** - HTML with interactive charts
|
| 40 |
+
- **PDF Export** - Professional PDF generation
|
| 41 |
+
- **Competitor Benchmarking** - Side-by-side comparison
|
| 42 |
+
- **Executive Summary** - Health scoring and quick wins
|
| 43 |
+
|
| 44 |
+
### π Report Sections
|
| 45 |
+
1. Executive Summary with overall health score
|
| 46 |
+
2. Technical SEO performance metrics
|
| 47 |
+
3. Content audit results
|
| 48 |
+
4. Competitor comparison (if provided)
|
| 49 |
+
5. Placeholder sections for future modules
|
| 50 |
+
6. Prioritized recommendations
|
| 51 |
+
|
| 52 |
+
## Usage Tips
|
| 53 |
+
|
| 54 |
+
1. **URLs**: Always include `https://` for best results
|
| 55 |
+
2. **Competitor Analysis**: Add 1-3 competitor URLs for benchmarking
|
| 56 |
+
3. **Report Generation**: Takes 1-3 minutes depending on site size
|
| 57 |
+
4. **PDF Export**: May take additional time for complex reports
|
| 58 |
+
|
| 59 |
+
## API Limits
|
| 60 |
+
|
| 61 |
+
- **PageSpeed Insights**: 25,000 requests/day (no API key needed)
|
| 62 |
+
- For higher limits, get a free Google Cloud API key
|
| 63 |
+
|
| 64 |
+
## Troubleshooting
|
| 65 |
+
|
| 66 |
+
### Common Issues:
|
| 67 |
+
1. **Import Errors**: Run `python -m pip install -r requirements.txt`
|
| 68 |
+
2. **Command Not Found**: Use `python -m streamlit run app.py` instead of `streamlit run app.py`
|
| 69 |
+
3. **PDF Generation Issues**: Use HTML export and browser print-to-PDF as fallback
|
| 70 |
+
4. **Site Access Issues**: Some sites may block crawlers
|
| 71 |
+
5. **Slow Performance**: Large sites may take longer to analyze
|
| 72 |
+
|
| 73 |
+
### Performance Tips:
|
| 74 |
+
- Use quick_scan=True for competitor analysis
|
| 75 |
+
- Limit crawl to ~200 pages for faster results
|
| 76 |
+
- Some sites may require custom headers
|
| 77 |
+
|
| 78 |
+
## File Structure
|
| 79 |
+
```
|
| 80 |
+
βββ app.py # Main Streamlit application
|
| 81 |
+
βββ run.py # Quick start script
|
| 82 |
+
βββ test_app.py # Test suite
|
| 83 |
+
βββ requirements.txt # Dependencies
|
| 84 |
+
βββ modules/
|
| 85 |
+
β βββ technical_seo.py # PageSpeed integration
|
| 86 |
+
β βββ content_audit.py # Content crawling
|
| 87 |
+
βββ report_generator.py # HTML report generation
|
| 88 |
+
βββ pdf_generator.py # PDF export
|
| 89 |
+
```
|
| 90 |
+
|
| 91 |
+
## Next Steps
|
| 92 |
+
|
| 93 |
+
The MVP is complete and ready for demo! Future enhancements can include:
|
| 94 |
+
- Google Search Console integration for keyword data
|
| 95 |
+
- Backlink analysis via Ahrefs/SEMrush APIs
|
| 96 |
+
- GA4 conversion tracking
|
| 97 |
+
- Advanced competitor analysis
|
| 98 |
+
- Automated scheduling and monitoring
|
| 99 |
+
|
| 100 |
+
## Success Criteria β
|
| 101 |
+
|
| 102 |
+
β
Functional: User can input URL and receive full HTML + PDF report
|
| 103 |
+
β
Professional output: Agency-quality reports with charts and summaries
|
| 104 |
+
β
Modular design: Independent technical and content modules
|
| 105 |
+
β
Extensible: Template-based report generation for easy expansion
|
| 106 |
+
β
Evaluation metrics: Works with multiple domains, reliable API integration
|
| 107 |
+
|
| 108 |
+
The system is ready for demonstration and production use!
|
START.md
ADDED
|
@@ -0,0 +1,46 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# π Quick Start Guide
|
| 2 |
+
|
| 3 |
+
## Your SEO Report Generator is Ready!
|
| 4 |
+
|
| 5 |
+
The application is currently running at: **http://localhost:8501**
|
| 6 |
+
|
| 7 |
+
### How to Use:
|
| 8 |
+
|
| 9 |
+
1. **π± Open your browser** and go to: http://localhost:8501
|
| 10 |
+
2. **π Enter a website URL** to analyze (e.g., https://example.com)
|
| 11 |
+
3. **βοΈ Add competitor URLs** (optional) for benchmarking
|
| 12 |
+
4. **π― Click "Generate SEO Report"** and wait 1-3 minutes
|
| 13 |
+
5. **π View the interactive report** with charts and analysis
|
| 14 |
+
6. **πΎ Download HTML report** (PDF instructions included)
|
| 15 |
+
|
| 16 |
+
### What You'll Get:
|
| 17 |
+
|
| 18 |
+
β
**Executive Summary** - Overall SEO health score
|
| 19 |
+
β
**Technical Analysis** - PageSpeed performance metrics
|
| 20 |
+
β
**Content Audit** - Metadata and content quality analysis
|
| 21 |
+
β
**Competitor Comparison** - Performance benchmarking
|
| 22 |
+
β
**Recommendations** - Prioritized action items
|
| 23 |
+
|
| 24 |
+
### Example URLs to Try:
|
| 25 |
+
|
| 26 |
+
- https://example.com (simple test site)
|
| 27 |
+
- https://python.org (tech documentation)
|
| 28 |
+
- https://github.com (development platform)
|
| 29 |
+
- Your own website!
|
| 30 |
+
|
| 31 |
+
### Features Available:
|
| 32 |
+
|
| 33 |
+
- π **Technical SEO** via Google PageSpeed Insights
|
| 34 |
+
- π **Content Analysis** via automated web crawling
|
| 35 |
+
- π **Interactive Charts** with Plotly visualizations
|
| 36 |
+
- π **Competitor Benchmarking** (up to 3 competitors)
|
| 37 |
+
- π **Professional HTML Reports** with executive summary
|
| 38 |
+
- π‘ **PDF Creation** via browser print functionality
|
| 39 |
+
|
| 40 |
+
### Need Help?
|
| 41 |
+
|
| 42 |
+
- **Stop the app**: Press `Ctrl+C` in the terminal
|
| 43 |
+
- **Restart**: Run `python -m streamlit run app.py` again
|
| 44 |
+
- **Issues**: Check SETUP.md for troubleshooting
|
| 45 |
+
|
| 46 |
+
**π Ready to analyze some websites? Open http://localhost:8501 and start generating reports!**
|
__pycache__/app.cpython-313.pyc
ADDED
|
Binary file (7.56 kB). View file
|
|
|
__pycache__/pdf_generator.cpython-313.pyc
ADDED
|
Binary file (12 kB). View file
|
|
|
__pycache__/report_generator.cpython-313.pyc
ADDED
|
Binary file (43.6 kB). View file
|
|
|
__pycache__/simple_pdf_generator.cpython-313.pyc
ADDED
|
Binary file (4.57 kB). View file
|
|
|
app.py
ADDED
|
@@ -0,0 +1,161 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import streamlit as st
|
| 2 |
+
import validators
|
| 3 |
+
from modules.technical_seo import TechnicalSEOModule
|
| 4 |
+
from modules.content_audit import ContentAuditModule
|
| 5 |
+
from report_generator import ReportGenerator
|
| 6 |
+
|
| 7 |
+
# Try to import PDF generator, fallback if not available
|
| 8 |
+
try:
|
| 9 |
+
from simple_pdf_generator import SimplePDFGenerator, create_browser_pdf_instructions
|
| 10 |
+
pdf_gen = SimplePDFGenerator()
|
| 11 |
+
PDF_AVAILABLE = pdf_gen.available
|
| 12 |
+
if not PDF_AVAILABLE:
|
| 13 |
+
browser_instructions = create_browser_pdf_instructions()
|
| 14 |
+
except ImportError as e:
|
| 15 |
+
print(f"PDF generation unavailable: {e}")
|
| 16 |
+
PDF_AVAILABLE = False
|
| 17 |
+
browser_instructions = "PDF generation not available"
|
| 18 |
+
|
| 19 |
+
def main():
|
| 20 |
+
st.set_page_config(
|
| 21 |
+
page_title="SEO Report Generator",
|
| 22 |
+
page_icon="π",
|
| 23 |
+
layout="wide"
|
| 24 |
+
)
|
| 25 |
+
|
| 26 |
+
st.title("π One-Click SEO Report Generator")
|
| 27 |
+
st.markdown("Generate comprehensive SEO reports for any website")
|
| 28 |
+
|
| 29 |
+
# Input section
|
| 30 |
+
col1, col2 = st.columns([2, 1])
|
| 31 |
+
|
| 32 |
+
with col1:
|
| 33 |
+
url = st.text_input(
|
| 34 |
+
"Website URL",
|
| 35 |
+
placeholder="https://example.com",
|
| 36 |
+
help="Enter the website URL you want to analyze"
|
| 37 |
+
)
|
| 38 |
+
|
| 39 |
+
competitors = st.text_area(
|
| 40 |
+
"Competitor URLs (Optional)",
|
| 41 |
+
placeholder="https://competitor1.com\nhttps://competitor2.com",
|
| 42 |
+
help="Enter competitor URLs, one per line"
|
| 43 |
+
)
|
| 44 |
+
|
| 45 |
+
with col2:
|
| 46 |
+
st.markdown("### Report Options")
|
| 47 |
+
include_charts = st.checkbox("Include Charts", value=True)
|
| 48 |
+
include_competitors = st.checkbox("Include Competitor Analysis", value=True)
|
| 49 |
+
|
| 50 |
+
# Generate report button
|
| 51 |
+
if st.button("Generate SEO Report", type="primary"):
|
| 52 |
+
if not url:
|
| 53 |
+
st.error("Please enter a website URL")
|
| 54 |
+
return
|
| 55 |
+
|
| 56 |
+
if not validators.url(url):
|
| 57 |
+
st.error("Please enter a valid URL")
|
| 58 |
+
return
|
| 59 |
+
|
| 60 |
+
# Process competitor URLs
|
| 61 |
+
competitor_list = []
|
| 62 |
+
if competitors and include_competitors:
|
| 63 |
+
competitor_list = [c.strip() for c in competitors.split('\n') if c.strip() and validators.url(c.strip())]
|
| 64 |
+
|
| 65 |
+
# Generate report
|
| 66 |
+
with st.spinner("Generating SEO report... This may take a few minutes."):
|
| 67 |
+
generate_report(url, competitor_list, include_charts)
|
| 68 |
+
|
| 69 |
+
def generate_report(url, competitors, include_charts):
|
| 70 |
+
try:
|
| 71 |
+
# Initialize report generator
|
| 72 |
+
report_gen = ReportGenerator()
|
| 73 |
+
|
| 74 |
+
# Progress tracking
|
| 75 |
+
progress_bar = st.progress(0)
|
| 76 |
+
status_text = st.empty()
|
| 77 |
+
|
| 78 |
+
# Technical SEO Analysis
|
| 79 |
+
status_text.text("Analyzing technical SEO...")
|
| 80 |
+
progress_bar.progress(20)
|
| 81 |
+
technical_module = TechnicalSEOModule()
|
| 82 |
+
technical_data = technical_module.analyze(url)
|
| 83 |
+
|
| 84 |
+
# Content Audit
|
| 85 |
+
status_text.text("Performing content audit...")
|
| 86 |
+
progress_bar.progress(50)
|
| 87 |
+
content_module = ContentAuditModule()
|
| 88 |
+
content_data = content_module.analyze(url)
|
| 89 |
+
|
| 90 |
+
# Competitor Analysis
|
| 91 |
+
competitor_data = []
|
| 92 |
+
if competitors:
|
| 93 |
+
status_text.text("Analyzing competitors...")
|
| 94 |
+
progress_bar.progress(70)
|
| 95 |
+
for comp_url in competitors:
|
| 96 |
+
comp_technical = technical_module.analyze(comp_url)
|
| 97 |
+
comp_content = content_module.analyze(comp_url, quick_scan=True)
|
| 98 |
+
competitor_data.append({
|
| 99 |
+
'url': comp_url,
|
| 100 |
+
'technical': comp_technical,
|
| 101 |
+
'content': comp_content
|
| 102 |
+
})
|
| 103 |
+
|
| 104 |
+
# Generate report
|
| 105 |
+
status_text.text("Generating report...")
|
| 106 |
+
progress_bar.progress(90)
|
| 107 |
+
|
| 108 |
+
report_html = report_gen.generate_html_report(
|
| 109 |
+
url=url,
|
| 110 |
+
technical_data=technical_data,
|
| 111 |
+
content_data=content_data,
|
| 112 |
+
competitor_data=competitor_data,
|
| 113 |
+
include_charts=include_charts
|
| 114 |
+
)
|
| 115 |
+
|
| 116 |
+
progress_bar.progress(100)
|
| 117 |
+
status_text.text("Report generated successfully!")
|
| 118 |
+
|
| 119 |
+
# Display report
|
| 120 |
+
st.success("SEO Report Generated Successfully!")
|
| 121 |
+
|
| 122 |
+
# Report preview
|
| 123 |
+
st.markdown("### Report Preview")
|
| 124 |
+
st.components.v1.html(report_html, height=800, scrolling=True)
|
| 125 |
+
|
| 126 |
+
# Download buttons
|
| 127 |
+
col1, col2 = st.columns(2)
|
| 128 |
+
with col1:
|
| 129 |
+
st.download_button(
|
| 130 |
+
label="π Download HTML Report",
|
| 131 |
+
data=report_html,
|
| 132 |
+
file_name=f"seo_report_{url.replace('https://', '').replace('http://', '').replace('/', '_')}.html",
|
| 133 |
+
mime="text/html"
|
| 134 |
+
)
|
| 135 |
+
|
| 136 |
+
with col2:
|
| 137 |
+
# Generate PDF if available
|
| 138 |
+
if PDF_AVAILABLE:
|
| 139 |
+
try:
|
| 140 |
+
pdf_data = pdf_gen.generate_pdf(report_html)
|
| 141 |
+
|
| 142 |
+
st.download_button(
|
| 143 |
+
label="π Download PDF Report",
|
| 144 |
+
data=pdf_data,
|
| 145 |
+
file_name=f"seo_report_{url.replace('https://', '').replace('http://', '').replace('/', '_')}.pdf",
|
| 146 |
+
mime="application/pdf"
|
| 147 |
+
)
|
| 148 |
+
except Exception as e:
|
| 149 |
+
st.error(f"PDF generation failed: {str(e)}")
|
| 150 |
+
st.info("HTML report is available for download")
|
| 151 |
+
else:
|
| 152 |
+
st.info("π‘ Create PDF from HTML Report")
|
| 153 |
+
with st.expander("π Instructions"):
|
| 154 |
+
st.markdown(browser_instructions)
|
| 155 |
+
|
| 156 |
+
except Exception as e:
|
| 157 |
+
st.error(f"Error generating report: {str(e)}")
|
| 158 |
+
st.exception(e)
|
| 159 |
+
|
| 160 |
+
if __name__ == "__main__":
|
| 161 |
+
main()
|
claude.md
ADDED
|
@@ -0,0 +1,115 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
|
| 2 |
+
# PRD: One-Click SEO Report Generator (v1 MVP)
|
| 3 |
+
|
| 4 |
+
## Objective
|
| 5 |
+
|
| 6 |
+
Deliver a working demo system that generates a structured SEO report from a website URL.
|
| 7 |
+
The report should highlight **content audit** and **technical SEO performance**, and demonstrate the framework for future modules (keywords, backlinks, competitors).
|
| 8 |
+
|
| 9 |
+
---
|
| 10 |
+
|
| 11 |
+
## Scope (v1)
|
| 12 |
+
|
| 13 |
+
**In scope**
|
| 14 |
+
|
| 15 |
+
1. **Input**:
|
| 16 |
+
|
| 17 |
+
* User enters website URL (and optional competitor domains).
|
| 18 |
+
* System validates and normalizes URL.
|
| 19 |
+
|
| 20 |
+
2. **Modules implemented**:
|
| 21 |
+
|
| 22 |
+
* **Technical SEO** (PageSpeed Insights API)
|
| 23 |
+
|
| 24 |
+
* Mobile & desktop performance scores
|
| 25 |
+
* Core Web Vitals (LCP, CLS, INP)
|
| 26 |
+
* Key flagged issues (e.g., oversized images, render-blocking JS)
|
| 27 |
+
* **Content Audit** (custom crawl)
|
| 28 |
+
|
| 29 |
+
* # of pages discovered (via sitemap / bounded crawl, capped \~200)
|
| 30 |
+
* Metadata completeness (Title, Description, H1)
|
| 31 |
+
* Avg. word count per page
|
| 32 |
+
* CTA keyword presence (βcontactβ, βdownloadβ, etc.)
|
| 33 |
+
* Content freshness (last modified vs today)
|
| 34 |
+
|
| 35 |
+
3. **Report generation**:
|
| 36 |
+
|
| 37 |
+
* Render as **HTML** report (modular sections).
|
| 38 |
+
* Provide **Download as PDF** option (same HTML rendered to PDF).
|
| 39 |
+
* Include **charts/visuals** (e.g., doughnut/pie for metadata completeness, freshness buckets, bar for Core Web Vitals vs benchmarks).
|
| 40 |
+
|
| 41 |
+
4. **Interface**:
|
| 42 |
+
|
| 43 |
+
* **Streamlit app** for demo UI.
|
| 44 |
+
* Inputs: URL (+ optional competitor domains).
|
| 45 |
+
* Buttons: βGenerate Reportβ, βDownload PDFβ.
|
| 46 |
+
* Report preview inline in Streamlit.
|
| 47 |
+
|
| 48 |
+
**Out of scope (v1, stub/fallback only)**
|
| 49 |
+
|
| 50 |
+
* Keyword Rankings (GSC/SEMrush) β show placeholder section.
|
| 51 |
+
* Backlink Profile (Ahrefs/SEMrush) β placeholder section.
|
| 52 |
+
* Competitor benchmarking β limited to PageSpeed/content freshness comparison if URLs provided.
|
| 53 |
+
* GA4 / conversion metrics.
|
| 54 |
+
|
| 55 |
+
---
|
| 56 |
+
|
| 57 |
+
## Output structure (MVP report)
|
| 58 |
+
|
| 59 |
+
1. **Executive Summary**
|
| 60 |
+
|
| 61 |
+
* Quick health snapshot: Technical performance + Content audit highlights.
|
| 62 |
+
* βQuick winsβ (e.g., missing metadata, low mobile score).
|
| 63 |
+
|
| 64 |
+
2. **Technical SEO**
|
| 65 |
+
|
| 66 |
+
* PageSpeed scores (Mobile + Desktop).
|
| 67 |
+
* Core Web Vitals chart.
|
| 68 |
+
* Top issues flagged.
|
| 69 |
+
|
| 70 |
+
3. **Content Audit**
|
| 71 |
+
|
| 72 |
+
* Indexed pages count (discovered pages).
|
| 73 |
+
* Metadata completeness (% with title, description, H1).
|
| 74 |
+
* Avg. word count per page (vs benchmark 800β1200 words).
|
| 75 |
+
* CTA presence (% pages with calls-to-action).
|
| 76 |
+
* Content freshness buckets (<6 months, 6β18 months, >18 months).
|
| 77 |
+
|
| 78 |
+
4. **Competitor Light (optional if input provided)**
|
| 79 |
+
|
| 80 |
+
* PageSpeed score comparison.
|
| 81 |
+
* Content freshness comparison (avg. last-modified).
|
| 82 |
+
|
| 83 |
+
5. **Placeholder sections**
|
| 84 |
+
|
| 85 |
+
* Keywords, backlinks, conversions β visible but labeled as βto be added in future versions.β
|
| 86 |
+
|
| 87 |
+
6. **Recommendations**
|
| 88 |
+
|
| 89 |
+
* Auto-generated based on findings (ruleset from benchmarks).
|
| 90 |
+
* Example: β50% of pages missing meta descriptions β prioritize metadata optimization.β
|
| 91 |
+
|
| 92 |
+
---
|
| 93 |
+
|
| 94 |
+
## Success criteria
|
| 95 |
+
|
| 96 |
+
* **Functional**: User can input a URL and receive a full HTML + PDF report in <3 minutes.
|
| 97 |
+
* **Professional output**: Report visually resembles an agency deck (charts, tables, summaries).
|
| 98 |
+
* **Modular design**: Technical SEO and Content Audit implemented as independent modules, with stubs for others.
|
| 99 |
+
* **Extensible**: Report generator uses templates so adding future modules is straightforward.
|
| 100 |
+
|
| 101 |
+
---
|
| 102 |
+
|
| 103 |
+
## Evaluation metrics
|
| 104 |
+
|
| 105 |
+
* Report generates without failures for at least 3 different domains.
|
| 106 |
+
* PageSpeed data fetched reliably via Google API.
|
| 107 |
+
* Crawl completes within 200 pages, respecting robots.txt.
|
| 108 |
+
* Charts render correctly in HTML and export cleanly to PDF.
|
| 109 |
+
* Report structure matches defined format.
|
| 110 |
+
|
| 111 |
+
---
|
| 112 |
+
|
| 113 |
+
This PRD keeps the v1 realistic (2β4 days build) while laying the bones for the full system.
|
| 114 |
+
|
| 115 |
+
Do you want me to next **map this PRD to required API keys/libraries** so we know what accounts to set up before coding, or should we first design the **module interfaces (input/output contract)**?
|
modules/__init__.py
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
# SEO Analysis Modules
|
modules/__pycache__/__init__.cpython-313.pyc
ADDED
|
Binary file (144 Bytes). View file
|
|
|
modules/__pycache__/content_audit.cpython-313.pyc
ADDED
|
Binary file (17.1 kB). View file
|
|
|
modules/__pycache__/technical_seo.cpython-313.pyc
ADDED
|
Binary file (9.8 kB). View file
|
|
|
modules/content_audit.py
ADDED
|
@@ -0,0 +1,388 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import requests
|
| 2 |
+
from bs4 import BeautifulSoup
|
| 3 |
+
from urllib.parse import urljoin, urlparse, parse_qs
|
| 4 |
+
import re
|
| 5 |
+
from datetime import datetime, timedelta
|
| 6 |
+
from typing import Dict, Any, List, Set
|
| 7 |
+
import xml.etree.ElementTree as ET
|
| 8 |
+
|
| 9 |
+
class ContentAuditModule:
|
| 10 |
+
def __init__(self):
|
| 11 |
+
self.session = requests.Session()
|
| 12 |
+
self.session.headers.update({
|
| 13 |
+
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
|
| 14 |
+
})
|
| 15 |
+
|
| 16 |
+
# CTA keywords to look for
|
| 17 |
+
self.cta_keywords = [
|
| 18 |
+
'contact', 'download', 'subscribe', 'buy', 'purchase', 'order',
|
| 19 |
+
'register', 'sign up', 'get started', 'learn more', 'book now',
|
| 20 |
+
'free trial', 'demo', 'consultation', 'quote', 'call now'
|
| 21 |
+
]
|
| 22 |
+
|
| 23 |
+
def analyze(self, url: str, quick_scan: bool = False) -> Dict[str, Any]:
|
| 24 |
+
"""
|
| 25 |
+
Perform content audit for a given URL
|
| 26 |
+
|
| 27 |
+
Args:
|
| 28 |
+
url: Website URL to analyze
|
| 29 |
+
quick_scan: If True, perform limited analysis (for competitors)
|
| 30 |
+
|
| 31 |
+
Returns:
|
| 32 |
+
Dictionary containing content audit metrics
|
| 33 |
+
"""
|
| 34 |
+
try:
|
| 35 |
+
# Normalize URL
|
| 36 |
+
if not url.startswith(('http://', 'https://')):
|
| 37 |
+
url = 'https://' + url
|
| 38 |
+
|
| 39 |
+
# Get sitemap URLs
|
| 40 |
+
sitemap_urls = self._get_sitemap_urls(url, limit=200 if not quick_scan else 50)
|
| 41 |
+
|
| 42 |
+
# If no sitemap, crawl from homepage
|
| 43 |
+
if not sitemap_urls:
|
| 44 |
+
sitemap_urls = self._crawl_from_homepage(url, limit=50 if not quick_scan else 20)
|
| 45 |
+
|
| 46 |
+
# Analyze pages
|
| 47 |
+
pages_analyzed = []
|
| 48 |
+
for page_url in sitemap_urls[:200 if not quick_scan else 20]:
|
| 49 |
+
page_data = self._analyze_page(page_url)
|
| 50 |
+
if page_data:
|
| 51 |
+
pages_analyzed.append(page_data)
|
| 52 |
+
|
| 53 |
+
# Calculate aggregate metrics
|
| 54 |
+
result = self._calculate_metrics(url, pages_analyzed, quick_scan)
|
| 55 |
+
|
| 56 |
+
return result
|
| 57 |
+
|
| 58 |
+
except Exception as e:
|
| 59 |
+
return self._get_fallback_data(url, str(e))
|
| 60 |
+
|
| 61 |
+
def _get_sitemap_urls(self, base_url: str, limit: int = 200) -> List[str]:
|
| 62 |
+
"""Extract URLs from sitemap.xml"""
|
| 63 |
+
urls = []
|
| 64 |
+
|
| 65 |
+
# Common sitemap locations
|
| 66 |
+
sitemap_locations = [
|
| 67 |
+
f"{base_url}/sitemap.xml",
|
| 68 |
+
f"{base_url}/sitemap_index.xml",
|
| 69 |
+
f"{base_url}/sitemaps/sitemap.xml"
|
| 70 |
+
]
|
| 71 |
+
|
| 72 |
+
for sitemap_url in sitemap_locations:
|
| 73 |
+
try:
|
| 74 |
+
response = self.session.get(sitemap_url, timeout=10)
|
| 75 |
+
if response.status_code == 200:
|
| 76 |
+
urls.extend(self._parse_sitemap(response.content, base_url, limit))
|
| 77 |
+
break
|
| 78 |
+
except:
|
| 79 |
+
continue
|
| 80 |
+
|
| 81 |
+
return urls[:limit]
|
| 82 |
+
|
| 83 |
+
def _parse_sitemap(self, sitemap_content: bytes, base_url: str, limit: int) -> List[str]:
|
| 84 |
+
"""Parse sitemap XML content"""
|
| 85 |
+
urls = []
|
| 86 |
+
|
| 87 |
+
try:
|
| 88 |
+
root = ET.fromstring(sitemap_content)
|
| 89 |
+
|
| 90 |
+
# Handle sitemap index
|
| 91 |
+
for sitemap_elem in root.findall('.//{http://www.sitemaps.org/schemas/sitemap/0.9}sitemap'):
|
| 92 |
+
loc_elem = sitemap_elem.find('{http://www.sitemaps.org/schemas/sitemap/0.9}loc')
|
| 93 |
+
if loc_elem is not None and len(urls) < limit:
|
| 94 |
+
# Recursively parse sub-sitemaps
|
| 95 |
+
try:
|
| 96 |
+
response = self.session.get(loc_elem.text, timeout=10)
|
| 97 |
+
if response.status_code == 200:
|
| 98 |
+
sub_urls = self._parse_sitemap(response.content, base_url, limit - len(urls))
|
| 99 |
+
urls.extend(sub_urls)
|
| 100 |
+
except:
|
| 101 |
+
continue
|
| 102 |
+
|
| 103 |
+
# Handle direct URL entries
|
| 104 |
+
for url_elem in root.findall('.//{http://www.sitemaps.org/schemas/sitemap/0.9}url'):
|
| 105 |
+
if len(urls) >= limit:
|
| 106 |
+
break
|
| 107 |
+
|
| 108 |
+
loc_elem = url_elem.find('{http://www.sitemaps.org/schemas/sitemap/0.9}loc')
|
| 109 |
+
if loc_elem is not None:
|
| 110 |
+
url = loc_elem.text
|
| 111 |
+
if self._is_valid_content_url(url):
|
| 112 |
+
urls.append(url)
|
| 113 |
+
|
| 114 |
+
except ET.ParseError:
|
| 115 |
+
pass
|
| 116 |
+
|
| 117 |
+
return urls[:limit]
|
| 118 |
+
|
| 119 |
+
def _crawl_from_homepage(self, base_url: str, limit: int = 50) -> List[str]:
|
| 120 |
+
"""Crawl URLs starting from homepage"""
|
| 121 |
+
urls = set([base_url])
|
| 122 |
+
processed = set()
|
| 123 |
+
|
| 124 |
+
try:
|
| 125 |
+
response = self.session.get(base_url, timeout=10)
|
| 126 |
+
if response.status_code == 200:
|
| 127 |
+
soup = BeautifulSoup(response.content, 'html.parser')
|
| 128 |
+
|
| 129 |
+
# Find all internal links
|
| 130 |
+
for link in soup.find_all('a', href=True):
|
| 131 |
+
if len(urls) >= limit:
|
| 132 |
+
break
|
| 133 |
+
|
| 134 |
+
href = link['href']
|
| 135 |
+
full_url = urljoin(base_url, href)
|
| 136 |
+
|
| 137 |
+
if self._is_same_domain(full_url, base_url) and self._is_valid_content_url(full_url):
|
| 138 |
+
urls.add(full_url)
|
| 139 |
+
|
| 140 |
+
except:
|
| 141 |
+
pass
|
| 142 |
+
|
| 143 |
+
return list(urls)[:limit]
|
| 144 |
+
|
| 145 |
+
def _analyze_page(self, url: str) -> Dict[str, Any]:
|
| 146 |
+
"""Analyze a single page"""
|
| 147 |
+
try:
|
| 148 |
+
response = self.session.get(url, timeout=15)
|
| 149 |
+
if response.status_code != 200:
|
| 150 |
+
return None
|
| 151 |
+
|
| 152 |
+
soup = BeautifulSoup(response.content, 'html.parser')
|
| 153 |
+
|
| 154 |
+
# Extract metadata
|
| 155 |
+
title = soup.find('title')
|
| 156 |
+
title_text = title.text.strip() if title else ""
|
| 157 |
+
|
| 158 |
+
meta_description = soup.find('meta', attrs={'name': 'description'})
|
| 159 |
+
description_text = meta_description.get('content', '').strip() if meta_description else ""
|
| 160 |
+
|
| 161 |
+
# H1 tags
|
| 162 |
+
h1_tags = soup.find_all('h1')
|
| 163 |
+
h1_text = [h1.text.strip() for h1 in h1_tags]
|
| 164 |
+
|
| 165 |
+
# Word count (main content)
|
| 166 |
+
content_text = self._extract_main_content(soup)
|
| 167 |
+
word_count = len(content_text.split()) if content_text else 0
|
| 168 |
+
|
| 169 |
+
# CTA presence
|
| 170 |
+
has_cta = self._detect_cta(soup)
|
| 171 |
+
|
| 172 |
+
# Last modified (if available)
|
| 173 |
+
last_modified = self._get_last_modified(response.headers, soup)
|
| 174 |
+
|
| 175 |
+
return {
|
| 176 |
+
'url': url,
|
| 177 |
+
'title': title_text,
|
| 178 |
+
'title_length': len(title_text),
|
| 179 |
+
'meta_description': description_text,
|
| 180 |
+
'description_length': len(description_text),
|
| 181 |
+
'h1_tags': h1_text,
|
| 182 |
+
'h1_count': len(h1_text),
|
| 183 |
+
'word_count': word_count,
|
| 184 |
+
'has_cta': has_cta,
|
| 185 |
+
'last_modified': last_modified,
|
| 186 |
+
'status_code': response.status_code
|
| 187 |
+
}
|
| 188 |
+
|
| 189 |
+
except Exception as e:
|
| 190 |
+
return {
|
| 191 |
+
'url': url,
|
| 192 |
+
'error': str(e),
|
| 193 |
+
'status_code': 0
|
| 194 |
+
}
|
| 195 |
+
|
| 196 |
+
def _extract_main_content(self, soup: BeautifulSoup) -> str:
|
| 197 |
+
"""Extract main content text from HTML"""
|
| 198 |
+
# Remove script and style elements
|
| 199 |
+
for script in soup(["script", "style", "nav", "header", "footer"]):
|
| 200 |
+
script.decompose()
|
| 201 |
+
|
| 202 |
+
# Try to find main content areas
|
| 203 |
+
main_content = soup.find('main') or soup.find('article') or soup.find('div', class_=re.compile(r'content|main|body'))
|
| 204 |
+
|
| 205 |
+
if main_content:
|
| 206 |
+
return main_content.get_text()
|
| 207 |
+
else:
|
| 208 |
+
return soup.get_text()
|
| 209 |
+
|
| 210 |
+
def _detect_cta(self, soup: BeautifulSoup) -> bool:
|
| 211 |
+
"""Detect presence of call-to-action elements"""
|
| 212 |
+
text_content = soup.get_text().lower()
|
| 213 |
+
|
| 214 |
+
for keyword in self.cta_keywords:
|
| 215 |
+
if keyword in text_content:
|
| 216 |
+
return True
|
| 217 |
+
|
| 218 |
+
# Check for buttons and links with CTA-like text
|
| 219 |
+
for element in soup.find_all(['button', 'a']):
|
| 220 |
+
element_text = element.get_text().lower()
|
| 221 |
+
for keyword in self.cta_keywords:
|
| 222 |
+
if keyword in element_text:
|
| 223 |
+
return True
|
| 224 |
+
|
| 225 |
+
return False
|
| 226 |
+
|
| 227 |
+
def _get_last_modified(self, headers: Dict, soup: BeautifulSoup) -> str:
|
| 228 |
+
"""Get last modified date from headers or meta tags"""
|
| 229 |
+
# Check headers first
|
| 230 |
+
if 'last-modified' in headers:
|
| 231 |
+
return headers['last-modified']
|
| 232 |
+
|
| 233 |
+
# Check meta tags
|
| 234 |
+
meta_modified = soup.find('meta', attrs={'name': 'last-modified'}) or \
|
| 235 |
+
soup.find('meta', attrs={'property': 'article:modified_time'})
|
| 236 |
+
|
| 237 |
+
if meta_modified:
|
| 238 |
+
return meta_modified.get('content', '')
|
| 239 |
+
|
| 240 |
+
return ""
|
| 241 |
+
|
| 242 |
+
def _is_valid_content_url(self, url: str) -> bool:
|
| 243 |
+
"""Check if URL is valid for content analysis"""
|
| 244 |
+
if not url:
|
| 245 |
+
return False
|
| 246 |
+
|
| 247 |
+
# Skip non-content URLs
|
| 248 |
+
skip_extensions = ['.pdf', '.jpg', '.png', '.gif', '.css', '.js', '.xml']
|
| 249 |
+
skip_paths = ['/wp-admin/', '/admin/', '/api/', '/feed/']
|
| 250 |
+
|
| 251 |
+
url_lower = url.lower()
|
| 252 |
+
|
| 253 |
+
for ext in skip_extensions:
|
| 254 |
+
if url_lower.endswith(ext):
|
| 255 |
+
return False
|
| 256 |
+
|
| 257 |
+
for path in skip_paths:
|
| 258 |
+
if path in url_lower:
|
| 259 |
+
return False
|
| 260 |
+
|
| 261 |
+
return True
|
| 262 |
+
|
| 263 |
+
def _is_same_domain(self, url1: str, url2: str) -> bool:
|
| 264 |
+
"""Check if two URLs are from the same domain"""
|
| 265 |
+
try:
|
| 266 |
+
domain1 = urlparse(url1).netloc
|
| 267 |
+
domain2 = urlparse(url2).netloc
|
| 268 |
+
return domain1 == domain2
|
| 269 |
+
except:
|
| 270 |
+
return False
|
| 271 |
+
|
| 272 |
+
def _calculate_metrics(self, base_url: str, pages_data: List[Dict], quick_scan: bool) -> Dict[str, Any]:
|
| 273 |
+
"""Calculate aggregate metrics from page data"""
|
| 274 |
+
total_pages = len(pages_data)
|
| 275 |
+
valid_pages = [p for p in pages_data if 'error' not in p]
|
| 276 |
+
|
| 277 |
+
if not valid_pages:
|
| 278 |
+
return self._get_fallback_data(base_url, "No valid pages found")
|
| 279 |
+
|
| 280 |
+
# Title metrics
|
| 281 |
+
pages_with_title = len([p for p in valid_pages if p.get('title')])
|
| 282 |
+
avg_title_length = sum(p.get('title_length', 0) for p in valid_pages) / len(valid_pages)
|
| 283 |
+
|
| 284 |
+
# Meta description metrics
|
| 285 |
+
pages_with_description = len([p for p in valid_pages if p.get('meta_description')])
|
| 286 |
+
avg_description_length = sum(p.get('description_length', 0) for p in valid_pages) / len(valid_pages)
|
| 287 |
+
|
| 288 |
+
# H1 metrics
|
| 289 |
+
pages_with_h1 = len([p for p in valid_pages if p.get('h1_count', 0) > 0])
|
| 290 |
+
|
| 291 |
+
# Word count metrics
|
| 292 |
+
word_counts = [p.get('word_count', 0) for p in valid_pages if p.get('word_count', 0) > 0]
|
| 293 |
+
avg_word_count = sum(word_counts) / len(word_counts) if word_counts else 0
|
| 294 |
+
|
| 295 |
+
# CTA metrics
|
| 296 |
+
pages_with_cta = len([p for p in valid_pages if p.get('has_cta')])
|
| 297 |
+
|
| 298 |
+
# Content freshness
|
| 299 |
+
freshness_data = self._analyze_content_freshness(valid_pages)
|
| 300 |
+
|
| 301 |
+
return {
|
| 302 |
+
'url': base_url,
|
| 303 |
+
'total_pages_discovered': total_pages,
|
| 304 |
+
'pages_analyzed': len(valid_pages),
|
| 305 |
+
'metadata_completeness': {
|
| 306 |
+
'title_coverage': round((pages_with_title / len(valid_pages)) * 100, 1) if valid_pages else 0,
|
| 307 |
+
'description_coverage': round((pages_with_description / len(valid_pages)) * 100, 1) if valid_pages else 0,
|
| 308 |
+
'h1_coverage': round((pages_with_h1 / len(valid_pages)) * 100, 1) if valid_pages else 0,
|
| 309 |
+
'avg_title_length': round(avg_title_length, 1),
|
| 310 |
+
'avg_description_length': round(avg_description_length, 1)
|
| 311 |
+
},
|
| 312 |
+
'content_metrics': {
|
| 313 |
+
'avg_word_count': round(avg_word_count, 0),
|
| 314 |
+
'cta_coverage': round((pages_with_cta / len(valid_pages)) * 100, 1) if valid_pages else 0
|
| 315 |
+
},
|
| 316 |
+
'content_freshness': freshness_data,
|
| 317 |
+
'quick_scan': quick_scan
|
| 318 |
+
}
|
| 319 |
+
|
| 320 |
+
def _analyze_content_freshness(self, pages_data: List[Dict]) -> Dict[str, Any]:
|
| 321 |
+
"""Analyze content freshness based on last modified dates"""
|
| 322 |
+
now = datetime.now()
|
| 323 |
+
six_months_ago = now - timedelta(days=180)
|
| 324 |
+
eighteen_months_ago = now - timedelta(days=540)
|
| 325 |
+
|
| 326 |
+
fresh_count = 0
|
| 327 |
+
moderate_count = 0
|
| 328 |
+
stale_count = 0
|
| 329 |
+
unknown_count = 0
|
| 330 |
+
|
| 331 |
+
for page in pages_data:
|
| 332 |
+
last_modified = page.get('last_modified', '')
|
| 333 |
+
if not last_modified:
|
| 334 |
+
unknown_count += 1
|
| 335 |
+
continue
|
| 336 |
+
|
| 337 |
+
try:
|
| 338 |
+
# Parse various date formats
|
| 339 |
+
if 'GMT' in last_modified:
|
| 340 |
+
modified_date = datetime.strptime(last_modified, '%a, %d %b %Y %H:%M:%S GMT')
|
| 341 |
+
else:
|
| 342 |
+
# Try ISO format
|
| 343 |
+
modified_date = datetime.fromisoformat(last_modified.replace('Z', '+00:00'))
|
| 344 |
+
|
| 345 |
+
if modified_date >= six_months_ago:
|
| 346 |
+
fresh_count += 1
|
| 347 |
+
elif modified_date >= eighteen_months_ago:
|
| 348 |
+
moderate_count += 1
|
| 349 |
+
else:
|
| 350 |
+
stale_count += 1
|
| 351 |
+
|
| 352 |
+
except:
|
| 353 |
+
unknown_count += 1
|
| 354 |
+
|
| 355 |
+
total = len(pages_data)
|
| 356 |
+
return {
|
| 357 |
+
'fresh_content': {'count': fresh_count, 'percentage': round((fresh_count / total) * 100, 1) if total > 0 else 0},
|
| 358 |
+
'moderate_content': {'count': moderate_count, 'percentage': round((moderate_count / total) * 100, 1) if total > 0 else 0},
|
| 359 |
+
'stale_content': {'count': stale_count, 'percentage': round((stale_count / total) * 100, 1) if total > 0 else 0},
|
| 360 |
+
'unknown_date': {'count': unknown_count, 'percentage': round((unknown_count / total) * 100, 1) if total > 0 else 0}
|
| 361 |
+
}
|
| 362 |
+
|
| 363 |
+
def _get_fallback_data(self, url: str, error: str) -> Dict[str, Any]:
|
| 364 |
+
"""Return fallback data when analysis fails"""
|
| 365 |
+
return {
|
| 366 |
+
'url': url,
|
| 367 |
+
'error': f"Content audit failed: {error}",
|
| 368 |
+
'total_pages_discovered': 0,
|
| 369 |
+
'pages_analyzed': 0,
|
| 370 |
+
'metadata_completeness': {
|
| 371 |
+
'title_coverage': 0,
|
| 372 |
+
'description_coverage': 0,
|
| 373 |
+
'h1_coverage': 0,
|
| 374 |
+
'avg_title_length': 0,
|
| 375 |
+
'avg_description_length': 0
|
| 376 |
+
},
|
| 377 |
+
'content_metrics': {
|
| 378 |
+
'avg_word_count': 0,
|
| 379 |
+
'cta_coverage': 0
|
| 380 |
+
},
|
| 381 |
+
'content_freshness': {
|
| 382 |
+
'fresh_content': {'count': 0, 'percentage': 0},
|
| 383 |
+
'moderate_content': {'count': 0, 'percentage': 0},
|
| 384 |
+
'stale_content': {'count': 0, 'percentage': 0},
|
| 385 |
+
'unknown_date': {'count': 0, 'percentage': 0}
|
| 386 |
+
},
|
| 387 |
+
'quick_scan': False
|
| 388 |
+
}
|
modules/technical_seo.py
ADDED
|
@@ -0,0 +1,191 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import requests
|
| 2 |
+
import time
|
| 3 |
+
from typing import Dict, Any, Optional
|
| 4 |
+
|
| 5 |
+
class TechnicalSEOModule:
|
| 6 |
+
def __init__(self, api_key: Optional[str] = None):
|
| 7 |
+
"""
|
| 8 |
+
Initialize Technical SEO module
|
| 9 |
+
|
| 10 |
+
Args:
|
| 11 |
+
api_key: Google PageSpeed Insights API key (optional for basic usage)
|
| 12 |
+
"""
|
| 13 |
+
self.api_key = api_key
|
| 14 |
+
self.base_url = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"
|
| 15 |
+
|
| 16 |
+
def analyze(self, url: str) -> Dict[str, Any]:
|
| 17 |
+
"""
|
| 18 |
+
Analyze technical SEO metrics for a given URL
|
| 19 |
+
|
| 20 |
+
Args:
|
| 21 |
+
url: Website URL to analyze
|
| 22 |
+
|
| 23 |
+
Returns:
|
| 24 |
+
Dictionary containing technical SEO metrics
|
| 25 |
+
"""
|
| 26 |
+
try:
|
| 27 |
+
# Get mobile and desktop metrics
|
| 28 |
+
mobile_data = self._get_pagespeed_data(url, strategy='mobile')
|
| 29 |
+
desktop_data = self._get_pagespeed_data(url, strategy='desktop')
|
| 30 |
+
|
| 31 |
+
# Extract key metrics
|
| 32 |
+
result = {
|
| 33 |
+
'url': url,
|
| 34 |
+
'mobile': self._extract_metrics(mobile_data, 'mobile'),
|
| 35 |
+
'desktop': self._extract_metrics(desktop_data, 'desktop'),
|
| 36 |
+
'core_web_vitals': self._extract_core_web_vitals(mobile_data, desktop_data),
|
| 37 |
+
'opportunities': self._extract_opportunities(mobile_data, desktop_data),
|
| 38 |
+
'diagnostics': self._extract_diagnostics(mobile_data, desktop_data)
|
| 39 |
+
}
|
| 40 |
+
|
| 41 |
+
return result
|
| 42 |
+
|
| 43 |
+
except Exception as e:
|
| 44 |
+
# Fallback data if API fails
|
| 45 |
+
return self._get_fallback_data(url, str(e))
|
| 46 |
+
|
| 47 |
+
def _get_pagespeed_data(self, url: str, strategy: str) -> Dict[str, Any]:
|
| 48 |
+
"""Get PageSpeed Insights data for URL and strategy"""
|
| 49 |
+
params = {
|
| 50 |
+
'url': url,
|
| 51 |
+
'strategy': strategy,
|
| 52 |
+
'category': ['PERFORMANCE', 'SEO', 'ACCESSIBILITY', 'BEST_PRACTICES']
|
| 53 |
+
}
|
| 54 |
+
|
| 55 |
+
if self.api_key:
|
| 56 |
+
params['key'] = self.api_key
|
| 57 |
+
|
| 58 |
+
try:
|
| 59 |
+
response = requests.get(self.base_url, params=params, timeout=30)
|
| 60 |
+
response.raise_for_status()
|
| 61 |
+
return response.json()
|
| 62 |
+
except requests.exceptions.RequestException as e:
|
| 63 |
+
print(f"API request failed: {e}")
|
| 64 |
+
raise
|
| 65 |
+
|
| 66 |
+
def _extract_metrics(self, data: Dict[str, Any], strategy: str) -> Dict[str, Any]:
|
| 67 |
+
"""Extract key performance metrics from PageSpeed data"""
|
| 68 |
+
lighthouse_result = data.get('lighthouseResult', {})
|
| 69 |
+
categories = lighthouse_result.get('categories', {})
|
| 70 |
+
audits = lighthouse_result.get('audits', {})
|
| 71 |
+
|
| 72 |
+
# Performance score
|
| 73 |
+
performance_score = categories.get('performance', {}).get('score', 0) * 100 if categories.get('performance', {}).get('score') else 0
|
| 74 |
+
|
| 75 |
+
# SEO score
|
| 76 |
+
seo_score = categories.get('seo', {}).get('score', 0) * 100 if categories.get('seo', {}).get('score') else 0
|
| 77 |
+
|
| 78 |
+
# Accessibility score
|
| 79 |
+
accessibility_score = categories.get('accessibility', {}).get('score', 0) * 100 if categories.get('accessibility', {}).get('score') else 0
|
| 80 |
+
|
| 81 |
+
# Best practices score
|
| 82 |
+
best_practices_score = categories.get('best-practices', {}).get('score', 0) * 100 if categories.get('best-practices', {}).get('score') else 0
|
| 83 |
+
|
| 84 |
+
return {
|
| 85 |
+
'strategy': strategy,
|
| 86 |
+
'performance_score': round(performance_score, 1),
|
| 87 |
+
'seo_score': round(seo_score, 1),
|
| 88 |
+
'accessibility_score': round(accessibility_score, 1),
|
| 89 |
+
'best_practices_score': round(best_practices_score, 1),
|
| 90 |
+
'loading_experience': data.get('loadingExperience', {})
|
| 91 |
+
}
|
| 92 |
+
|
| 93 |
+
def _extract_core_web_vitals(self, mobile_data: Dict[str, Any], desktop_data: Dict[str, Any]) -> Dict[str, Any]:
|
| 94 |
+
"""Extract Core Web Vitals metrics"""
|
| 95 |
+
def get_metric_value(data, metric_key):
|
| 96 |
+
audits = data.get('lighthouseResult', {}).get('audits', {})
|
| 97 |
+
metric = audits.get(metric_key, {})
|
| 98 |
+
return metric.get('numericValue', 0) / 1000 if metric.get('numericValue') else 0
|
| 99 |
+
|
| 100 |
+
mobile_audits = mobile_data.get('lighthouseResult', {}).get('audits', {})
|
| 101 |
+
desktop_audits = desktop_data.get('lighthouseResult', {}).get('audits', {})
|
| 102 |
+
|
| 103 |
+
return {
|
| 104 |
+
'mobile': {
|
| 105 |
+
'lcp': round(get_metric_value(mobile_data, 'largest-contentful-paint'), 2),
|
| 106 |
+
'cls': round(mobile_audits.get('cumulative-layout-shift', {}).get('numericValue', 0), 3),
|
| 107 |
+
'inp': round(get_metric_value(mobile_data, 'interaction-to-next-paint'), 0),
|
| 108 |
+
'fcp': round(get_metric_value(mobile_data, 'first-contentful-paint'), 2)
|
| 109 |
+
},
|
| 110 |
+
'desktop': {
|
| 111 |
+
'lcp': round(get_metric_value(desktop_data, 'largest-contentful-paint'), 2),
|
| 112 |
+
'cls': round(desktop_audits.get('cumulative-layout-shift', {}).get('numericValue', 0), 3),
|
| 113 |
+
'inp': round(get_metric_value(desktop_data, 'interaction-to-next-paint'), 0),
|
| 114 |
+
'fcp': round(get_metric_value(desktop_data, 'first-contentful-paint'), 2)
|
| 115 |
+
}
|
| 116 |
+
}
|
| 117 |
+
|
| 118 |
+
def _extract_opportunities(self, mobile_data: Dict[str, Any], desktop_data: Dict[str, Any]) -> Dict[str, Any]:
|
| 119 |
+
"""Extract optimization opportunities"""
|
| 120 |
+
mobile_audits = mobile_data.get('lighthouseResult', {}).get('audits', {})
|
| 121 |
+
|
| 122 |
+
opportunities = []
|
| 123 |
+
opportunity_keys = [
|
| 124 |
+
'unused-css-rules', 'unused-javascript', 'modern-image-formats',
|
| 125 |
+
'offscreen-images', 'render-blocking-resources', 'unminified-css',
|
| 126 |
+
'unminified-javascript', 'efficient-animated-content'
|
| 127 |
+
]
|
| 128 |
+
|
| 129 |
+
for key in opportunity_keys:
|
| 130 |
+
audit = mobile_audits.get(key, {})
|
| 131 |
+
if audit.get('score', 1) < 0.9: # Only include if score is low
|
| 132 |
+
opportunities.append({
|
| 133 |
+
'id': key,
|
| 134 |
+
'title': audit.get('title', key.replace('-', ' ').title()),
|
| 135 |
+
'description': audit.get('description', ''),
|
| 136 |
+
'score': audit.get('score', 0),
|
| 137 |
+
'potential_savings': audit.get('details', {}).get('overallSavingsMs', 0)
|
| 138 |
+
})
|
| 139 |
+
|
| 140 |
+
return {'opportunities': opportunities[:5]} # Top 5 opportunities
|
| 141 |
+
|
| 142 |
+
def _extract_diagnostics(self, mobile_data: Dict[str, Any], desktop_data: Dict[str, Any]) -> Dict[str, Any]:
|
| 143 |
+
"""Extract diagnostic information"""
|
| 144 |
+
mobile_audits = mobile_data.get('lighthouseResult', {}).get('audits', {})
|
| 145 |
+
|
| 146 |
+
diagnostics = []
|
| 147 |
+
diagnostic_keys = [
|
| 148 |
+
'dom-size', 'uses-text-compression', 'uses-rel-preconnect',
|
| 149 |
+
'font-display', 'server-response-time', 'uses-responsive-images'
|
| 150 |
+
]
|
| 151 |
+
|
| 152 |
+
for key in diagnostic_keys:
|
| 153 |
+
audit = mobile_audits.get(key, {})
|
| 154 |
+
if audit.get('score', 1) < 1:
|
| 155 |
+
diagnostics.append({
|
| 156 |
+
'id': key,
|
| 157 |
+
'title': audit.get('title', key.replace('-', ' ').title()),
|
| 158 |
+
'description': audit.get('description', ''),
|
| 159 |
+
'score': audit.get('score', 0)
|
| 160 |
+
})
|
| 161 |
+
|
| 162 |
+
return {'diagnostics': diagnostics}
|
| 163 |
+
|
| 164 |
+
def _get_fallback_data(self, url: str, error: str) -> Dict[str, Any]:
|
| 165 |
+
"""Return fallback data when API fails"""
|
| 166 |
+
return {
|
| 167 |
+
'url': url,
|
| 168 |
+
'error': f"PageSpeed API unavailable: {error}",
|
| 169 |
+
'mobile': {
|
| 170 |
+
'strategy': 'mobile',
|
| 171 |
+
'performance_score': 0,
|
| 172 |
+
'seo_score': 0,
|
| 173 |
+
'accessibility_score': 0,
|
| 174 |
+
'best_practices_score': 0,
|
| 175 |
+
'loading_experience': {}
|
| 176 |
+
},
|
| 177 |
+
'desktop': {
|
| 178 |
+
'strategy': 'desktop',
|
| 179 |
+
'performance_score': 0,
|
| 180 |
+
'seo_score': 0,
|
| 181 |
+
'accessibility_score': 0,
|
| 182 |
+
'best_practices_score': 0,
|
| 183 |
+
'loading_experience': {}
|
| 184 |
+
},
|
| 185 |
+
'core_web_vitals': {
|
| 186 |
+
'mobile': {'lcp': 0, 'cls': 0, 'inp': 0, 'fcp': 0},
|
| 187 |
+
'desktop': {'lcp': 0, 'cls': 0, 'inp': 0, 'fcp': 0}
|
| 188 |
+
},
|
| 189 |
+
'opportunities': {'opportunities': []},
|
| 190 |
+
'diagnostics': {'diagnostics': []}
|
| 191 |
+
}
|
pdf_generator.py
ADDED
|
@@ -0,0 +1,457 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from weasyprint import HTML, CSS
|
| 2 |
+
import base64
|
| 3 |
+
import io
|
| 4 |
+
from typing import Dict, Any, List
|
| 5 |
+
|
| 6 |
+
class PDFGenerator:
|
| 7 |
+
def __init__(self):
|
| 8 |
+
self.css_styles = self._get_pdf_styles()
|
| 9 |
+
|
| 10 |
+
def generate_pdf(self, html_content: str) -> bytes:
|
| 11 |
+
"""
|
| 12 |
+
Generate PDF from HTML content
|
| 13 |
+
|
| 14 |
+
Args:
|
| 15 |
+
html_content: HTML string to convert to PDF
|
| 16 |
+
|
| 17 |
+
Returns:
|
| 18 |
+
PDF content as bytes
|
| 19 |
+
"""
|
| 20 |
+
try:
|
| 21 |
+
# Clean HTML for PDF generation (remove interactive elements)
|
| 22 |
+
pdf_html = self._prepare_html_for_pdf(html_content)
|
| 23 |
+
|
| 24 |
+
# Create HTML document
|
| 25 |
+
html_doc = HTML(string=pdf_html)
|
| 26 |
+
|
| 27 |
+
# Generate PDF
|
| 28 |
+
pdf_buffer = io.BytesIO()
|
| 29 |
+
html_doc.write_pdf(pdf_buffer, stylesheets=[CSS(string=self.css_styles)])
|
| 30 |
+
|
| 31 |
+
return pdf_buffer.getvalue()
|
| 32 |
+
|
| 33 |
+
except Exception as e:
|
| 34 |
+
print(f"PDF generation failed: {e}")
|
| 35 |
+
raise
|
| 36 |
+
|
| 37 |
+
def _prepare_html_for_pdf(self, html_content: str) -> str:
|
| 38 |
+
"""
|
| 39 |
+
Prepare HTML content for PDF generation by removing interactive elements
|
| 40 |
+
"""
|
| 41 |
+
# Remove Plotly scripts and interactive charts
|
| 42 |
+
# Replace with static chart placeholders
|
| 43 |
+
pdf_html = html_content.replace(
|
| 44 |
+
'<script src="https://cdn.plot.ly/plotly-latest.min.js"></script>',
|
| 45 |
+
''
|
| 46 |
+
)
|
| 47 |
+
|
| 48 |
+
# Remove any JavaScript
|
| 49 |
+
import re
|
| 50 |
+
pdf_html = re.sub(r'<script[^>]*>.*?</script>', '', pdf_html, flags=re.DOTALL)
|
| 51 |
+
|
| 52 |
+
# Replace interactive Plotly divs with chart placeholders
|
| 53 |
+
pdf_html = re.sub(
|
| 54 |
+
r'<div[^>]*class="plotly-graph-div"[^>]*>.*?</div>',
|
| 55 |
+
'<div class="chart-placeholder"><p>π Chart: View interactive version in HTML report</p></div>',
|
| 56 |
+
pdf_html,
|
| 57 |
+
flags=re.DOTALL
|
| 58 |
+
)
|
| 59 |
+
|
| 60 |
+
return pdf_html
|
| 61 |
+
|
| 62 |
+
def _get_pdf_styles(self) -> str:
|
| 63 |
+
"""
|
| 64 |
+
Get CSS styles optimized for PDF generation
|
| 65 |
+
"""
|
| 66 |
+
return """
|
| 67 |
+
@page {
|
| 68 |
+
margin: 2cm;
|
| 69 |
+
size: A4;
|
| 70 |
+
@top-center {
|
| 71 |
+
content: "SEO Report";
|
| 72 |
+
font-size: 10pt;
|
| 73 |
+
color: #666;
|
| 74 |
+
}
|
| 75 |
+
@bottom-center {
|
| 76 |
+
content: "Page " counter(page) " of " counter(pages);
|
| 77 |
+
font-size: 10pt;
|
| 78 |
+
color: #666;
|
| 79 |
+
}
|
| 80 |
+
}
|
| 81 |
+
|
| 82 |
+
* {
|
| 83 |
+
margin: 0;
|
| 84 |
+
padding: 0;
|
| 85 |
+
box-sizing: border-box;
|
| 86 |
+
}
|
| 87 |
+
|
| 88 |
+
body {
|
| 89 |
+
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;
|
| 90 |
+
line-height: 1.4;
|
| 91 |
+
color: #333;
|
| 92 |
+
font-size: 11pt;
|
| 93 |
+
}
|
| 94 |
+
|
| 95 |
+
.report-container {
|
| 96 |
+
max-width: 100%;
|
| 97 |
+
}
|
| 98 |
+
|
| 99 |
+
.report-header {
|
| 100 |
+
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
|
| 101 |
+
color: white;
|
| 102 |
+
padding: 30px;
|
| 103 |
+
text-align: center;
|
| 104 |
+
border-radius: 8px;
|
| 105 |
+
margin-bottom: 20px;
|
| 106 |
+
break-inside: avoid;
|
| 107 |
+
}
|
| 108 |
+
|
| 109 |
+
.report-header h1 {
|
| 110 |
+
font-size: 24pt;
|
| 111 |
+
margin-bottom: 10px;
|
| 112 |
+
}
|
| 113 |
+
|
| 114 |
+
.section {
|
| 115 |
+
background: white;
|
| 116 |
+
margin-bottom: 20px;
|
| 117 |
+
padding: 20px;
|
| 118 |
+
border: 1px solid #ddd;
|
| 119 |
+
border-radius: 8px;
|
| 120 |
+
break-inside: avoid-page;
|
| 121 |
+
}
|
| 122 |
+
|
| 123 |
+
.section h2 {
|
| 124 |
+
color: #2c3e50;
|
| 125 |
+
margin-bottom: 15px;
|
| 126 |
+
font-size: 16pt;
|
| 127 |
+
border-bottom: 2px solid #3498db;
|
| 128 |
+
padding-bottom: 5px;
|
| 129 |
+
}
|
| 130 |
+
|
| 131 |
+
.summary-card {
|
| 132 |
+
display: flex;
|
| 133 |
+
justify-content: space-between;
|
| 134 |
+
align-items: center;
|
| 135 |
+
margin-bottom: 20px;
|
| 136 |
+
padding: 15px;
|
| 137 |
+
background: #f8f9fa;
|
| 138 |
+
border-radius: 8px;
|
| 139 |
+
border: 1px solid #dee2e6;
|
| 140 |
+
}
|
| 141 |
+
|
| 142 |
+
.health-score {
|
| 143 |
+
text-align: center;
|
| 144 |
+
margin-right: 20px;
|
| 145 |
+
}
|
| 146 |
+
|
| 147 |
+
.score-circle {
|
| 148 |
+
width: 80px;
|
| 149 |
+
height: 80px;
|
| 150 |
+
border: 4px solid #3498db;
|
| 151 |
+
border-radius: 50%;
|
| 152 |
+
display: flex;
|
| 153 |
+
flex-direction: column;
|
| 154 |
+
align-items: center;
|
| 155 |
+
justify-content: center;
|
| 156 |
+
margin: 10px auto;
|
| 157 |
+
}
|
| 158 |
+
|
| 159 |
+
.score-number {
|
| 160 |
+
font-size: 18pt;
|
| 161 |
+
font-weight: bold;
|
| 162 |
+
color: #3498db;
|
| 163 |
+
}
|
| 164 |
+
|
| 165 |
+
.score-label {
|
| 166 |
+
font-size: 8pt;
|
| 167 |
+
}
|
| 168 |
+
|
| 169 |
+
.key-metrics {
|
| 170 |
+
display: flex;
|
| 171 |
+
gap: 20px;
|
| 172 |
+
flex: 1;
|
| 173 |
+
}
|
| 174 |
+
|
| 175 |
+
.metric {
|
| 176 |
+
text-align: center;
|
| 177 |
+
flex: 1;
|
| 178 |
+
}
|
| 179 |
+
|
| 180 |
+
.metric h4 {
|
| 181 |
+
margin-bottom: 5px;
|
| 182 |
+
font-size: 10pt;
|
| 183 |
+
color: #666;
|
| 184 |
+
}
|
| 185 |
+
|
| 186 |
+
.quick-wins {
|
| 187 |
+
background: #fff3cd;
|
| 188 |
+
border: 1px solid #ffeeba;
|
| 189 |
+
border-radius: 6px;
|
| 190 |
+
padding: 15px;
|
| 191 |
+
break-inside: avoid;
|
| 192 |
+
}
|
| 193 |
+
|
| 194 |
+
.quick-wins h3 {
|
| 195 |
+
color: #856404;
|
| 196 |
+
margin-bottom: 10px;
|
| 197 |
+
font-size: 12pt;
|
| 198 |
+
}
|
| 199 |
+
|
| 200 |
+
.quick-wins ul {
|
| 201 |
+
list-style-type: none;
|
| 202 |
+
}
|
| 203 |
+
|
| 204 |
+
.quick-wins li {
|
| 205 |
+
color: #856404;
|
| 206 |
+
margin-bottom: 5px;
|
| 207 |
+
padding-left: 15px;
|
| 208 |
+
position: relative;
|
| 209 |
+
}
|
| 210 |
+
|
| 211 |
+
.quick-wins li:before {
|
| 212 |
+
content: "β";
|
| 213 |
+
position: absolute;
|
| 214 |
+
left: 0;
|
| 215 |
+
color: #ffc107;
|
| 216 |
+
font-weight: bold;
|
| 217 |
+
}
|
| 218 |
+
|
| 219 |
+
.metric-row {
|
| 220 |
+
display: flex;
|
| 221 |
+
gap: 15px;
|
| 222 |
+
margin-bottom: 20px;
|
| 223 |
+
flex-wrap: wrap;
|
| 224 |
+
}
|
| 225 |
+
|
| 226 |
+
.metric-card {
|
| 227 |
+
background: #667eea;
|
| 228 |
+
color: white;
|
| 229 |
+
padding: 15px;
|
| 230 |
+
border-radius: 8px;
|
| 231 |
+
text-align: center;
|
| 232 |
+
flex: 1;
|
| 233 |
+
min-width: 120px;
|
| 234 |
+
}
|
| 235 |
+
|
| 236 |
+
.metric-card h4 {
|
| 237 |
+
font-size: 9pt;
|
| 238 |
+
margin-bottom: 8px;
|
| 239 |
+
opacity: 0.9;
|
| 240 |
+
}
|
| 241 |
+
|
| 242 |
+
.metric-card .score {
|
| 243 |
+
font-size: 16pt;
|
| 244 |
+
font-weight: bold;
|
| 245 |
+
}
|
| 246 |
+
|
| 247 |
+
.chart-placeholder {
|
| 248 |
+
background: #f8f9fa;
|
| 249 |
+
border: 2px dashed #ddd;
|
| 250 |
+
padding: 40px;
|
| 251 |
+
text-align: center;
|
| 252 |
+
border-radius: 8px;
|
| 253 |
+
margin: 15px 0;
|
| 254 |
+
}
|
| 255 |
+
|
| 256 |
+
.chart-placeholder p {
|
| 257 |
+
color: #666;
|
| 258 |
+
font-style: italic;
|
| 259 |
+
}
|
| 260 |
+
|
| 261 |
+
.stat {
|
| 262 |
+
display: flex;
|
| 263 |
+
justify-content: space-between;
|
| 264 |
+
align-items: center;
|
| 265 |
+
padding: 8px 0;
|
| 266 |
+
border-bottom: 1px solid #eee;
|
| 267 |
+
}
|
| 268 |
+
|
| 269 |
+
.stat:last-child {
|
| 270 |
+
border-bottom: none;
|
| 271 |
+
}
|
| 272 |
+
|
| 273 |
+
.stat .label {
|
| 274 |
+
font-weight: 600;
|
| 275 |
+
color: #2c3e50;
|
| 276 |
+
font-size: 10pt;
|
| 277 |
+
}
|
| 278 |
+
|
| 279 |
+
.stat .value {
|
| 280 |
+
font-weight: bold;
|
| 281 |
+
color: #3498db;
|
| 282 |
+
font-size: 10pt;
|
| 283 |
+
}
|
| 284 |
+
|
| 285 |
+
.stat .benchmark {
|
| 286 |
+
font-size: 8pt;
|
| 287 |
+
color: #7f8c8d;
|
| 288 |
+
}
|
| 289 |
+
|
| 290 |
+
.opportunity {
|
| 291 |
+
background: #f8f9fa;
|
| 292 |
+
border-left: 3px solid #ff6b6b;
|
| 293 |
+
padding: 10px;
|
| 294 |
+
margin-bottom: 10px;
|
| 295 |
+
break-inside: avoid;
|
| 296 |
+
}
|
| 297 |
+
|
| 298 |
+
.opportunity h4 {
|
| 299 |
+
color: #2c3e50;
|
| 300 |
+
margin-bottom: 5px;
|
| 301 |
+
font-size: 11pt;
|
| 302 |
+
}
|
| 303 |
+
|
| 304 |
+
.savings {
|
| 305 |
+
display: inline-block;
|
| 306 |
+
background: #ff6b6b;
|
| 307 |
+
color: white;
|
| 308 |
+
padding: 2px 6px;
|
| 309 |
+
border-radius: 3px;
|
| 310 |
+
font-size: 8pt;
|
| 311 |
+
margin-top: 5px;
|
| 312 |
+
}
|
| 313 |
+
|
| 314 |
+
.comparison-table {
|
| 315 |
+
width: 100%;
|
| 316 |
+
border-collapse: collapse;
|
| 317 |
+
margin-top: 15px;
|
| 318 |
+
font-size: 9pt;
|
| 319 |
+
}
|
| 320 |
+
|
| 321 |
+
.comparison-table th,
|
| 322 |
+
.comparison-table td {
|
| 323 |
+
padding: 8px;
|
| 324 |
+
text-align: left;
|
| 325 |
+
border-bottom: 1px solid #ddd;
|
| 326 |
+
}
|
| 327 |
+
|
| 328 |
+
.comparison-table th {
|
| 329 |
+
background: #f8f9fa;
|
| 330 |
+
font-weight: bold;
|
| 331 |
+
color: #2c3e50;
|
| 332 |
+
}
|
| 333 |
+
|
| 334 |
+
.primary-site {
|
| 335 |
+
background: #e8f5e8;
|
| 336 |
+
font-weight: bold;
|
| 337 |
+
}
|
| 338 |
+
|
| 339 |
+
.placeholder-sections {
|
| 340 |
+
display: flex;
|
| 341 |
+
flex-wrap: wrap;
|
| 342 |
+
gap: 15px;
|
| 343 |
+
}
|
| 344 |
+
|
| 345 |
+
.placeholder-section {
|
| 346 |
+
border: 2px dashed #ddd;
|
| 347 |
+
border-radius: 8px;
|
| 348 |
+
padding: 15px;
|
| 349 |
+
text-align: center;
|
| 350 |
+
background: #fafafa;
|
| 351 |
+
flex: 1;
|
| 352 |
+
min-width: 250px;
|
| 353 |
+
}
|
| 354 |
+
|
| 355 |
+
.placeholder-section h3 {
|
| 356 |
+
color: #7f8c8d;
|
| 357 |
+
margin-bottom: 10px;
|
| 358 |
+
font-size: 12pt;
|
| 359 |
+
}
|
| 360 |
+
|
| 361 |
+
.placeholder-content p {
|
| 362 |
+
color: #7f8c8d;
|
| 363 |
+
font-style: italic;
|
| 364 |
+
margin-bottom: 10px;
|
| 365 |
+
font-size: 9pt;
|
| 366 |
+
}
|
| 367 |
+
|
| 368 |
+
.placeholder-content ul {
|
| 369 |
+
list-style: none;
|
| 370 |
+
color: #95a5a6;
|
| 371 |
+
font-size: 9pt;
|
| 372 |
+
}
|
| 373 |
+
|
| 374 |
+
.recommendations-section {
|
| 375 |
+
background: #667eea;
|
| 376 |
+
color: white;
|
| 377 |
+
border-radius: 8px;
|
| 378 |
+
padding: 20px;
|
| 379 |
+
}
|
| 380 |
+
|
| 381 |
+
.recommendations-section h3 {
|
| 382 |
+
margin-bottom: 15px;
|
| 383 |
+
font-size: 14pt;
|
| 384 |
+
}
|
| 385 |
+
|
| 386 |
+
.recommendation {
|
| 387 |
+
background: white;
|
| 388 |
+
color: #333;
|
| 389 |
+
border-radius: 6px;
|
| 390 |
+
padding: 15px;
|
| 391 |
+
margin-bottom: 15px;
|
| 392 |
+
break-inside: avoid;
|
| 393 |
+
}
|
| 394 |
+
|
| 395 |
+
.rec-header {
|
| 396 |
+
display: flex;
|
| 397 |
+
align-items: center;
|
| 398 |
+
gap: 8px;
|
| 399 |
+
margin-bottom: 8px;
|
| 400 |
+
}
|
| 401 |
+
|
| 402 |
+
.rec-number {
|
| 403 |
+
background: #3498db;
|
| 404 |
+
color: white;
|
| 405 |
+
width: 24px;
|
| 406 |
+
height: 24px;
|
| 407 |
+
border-radius: 50%;
|
| 408 |
+
display: flex;
|
| 409 |
+
align-items: center;
|
| 410 |
+
justify-content: center;
|
| 411 |
+
font-weight: bold;
|
| 412 |
+
font-size: 10pt;
|
| 413 |
+
}
|
| 414 |
+
|
| 415 |
+
.rec-priority {
|
| 416 |
+
color: white;
|
| 417 |
+
padding: 3px 6px;
|
| 418 |
+
border-radius: 3px;
|
| 419 |
+
font-size: 8pt;
|
| 420 |
+
font-weight: bold;
|
| 421 |
+
}
|
| 422 |
+
|
| 423 |
+
.rec-category {
|
| 424 |
+
background: #ecf0f1;
|
| 425 |
+
color: #2c3e50;
|
| 426 |
+
padding: 3px 6px;
|
| 427 |
+
border-radius: 3px;
|
| 428 |
+
font-size: 8pt;
|
| 429 |
+
}
|
| 430 |
+
|
| 431 |
+
.recommendation h4 {
|
| 432 |
+
font-size: 11pt;
|
| 433 |
+
margin-bottom: 5px;
|
| 434 |
+
}
|
| 435 |
+
|
| 436 |
+
.recommendation p {
|
| 437 |
+
font-size: 9pt;
|
| 438 |
+
line-height: 1.3;
|
| 439 |
+
}
|
| 440 |
+
|
| 441 |
+
.rec-timeline {
|
| 442 |
+
color: #7f8c8d;
|
| 443 |
+
font-size: 8pt;
|
| 444 |
+
margin-top: 8px;
|
| 445 |
+
font-weight: bold;
|
| 446 |
+
}
|
| 447 |
+
|
| 448 |
+
.error-message {
|
| 449 |
+
background: #f8d7da;
|
| 450 |
+
border: 1px solid #f5c6cb;
|
| 451 |
+
color: #721c24;
|
| 452 |
+
padding: 15px;
|
| 453 |
+
border-radius: 6px;
|
| 454 |
+
text-align: center;
|
| 455 |
+
font-size: 10pt;
|
| 456 |
+
}
|
| 457 |
+
"""
|
report_generator.py
ADDED
|
@@ -0,0 +1,1096 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import json
|
| 2 |
+
from typing import Dict, Any, List
|
| 3 |
+
from datetime import datetime
|
| 4 |
+
import plotly.graph_objects as go
|
| 5 |
+
import plotly.express as px
|
| 6 |
+
from plotly.offline import plot
|
| 7 |
+
import plotly
|
| 8 |
+
|
| 9 |
+
class ReportGenerator:
|
| 10 |
+
def __init__(self):
|
| 11 |
+
self.report_template = self._get_report_template()
|
| 12 |
+
|
| 13 |
+
def generate_html_report(self, url: str, technical_data: Dict[str, Any],
|
| 14 |
+
content_data: Dict[str, Any], competitor_data: List[Dict] = None,
|
| 15 |
+
include_charts: bool = True) -> str:
|
| 16 |
+
"""Generate complete HTML SEO report"""
|
| 17 |
+
|
| 18 |
+
# Generate charts
|
| 19 |
+
charts_html = ""
|
| 20 |
+
if include_charts:
|
| 21 |
+
charts_html = self._generate_charts(technical_data, content_data, competitor_data)
|
| 22 |
+
|
| 23 |
+
# Generate executive summary
|
| 24 |
+
executive_summary = self._generate_executive_summary(technical_data, content_data)
|
| 25 |
+
|
| 26 |
+
# Generate technical SEO section
|
| 27 |
+
technical_section = self._generate_technical_section(technical_data)
|
| 28 |
+
|
| 29 |
+
# Generate content audit section
|
| 30 |
+
content_section = self._generate_content_section(content_data)
|
| 31 |
+
|
| 32 |
+
# Generate competitor section
|
| 33 |
+
competitor_section = ""
|
| 34 |
+
if competitor_data:
|
| 35 |
+
competitor_section = self._generate_competitor_section(competitor_data, technical_data, content_data)
|
| 36 |
+
|
| 37 |
+
# Generate placeholder sections
|
| 38 |
+
placeholder_sections = self._generate_placeholder_sections()
|
| 39 |
+
|
| 40 |
+
# Generate recommendations
|
| 41 |
+
recommendations = self._generate_recommendations(technical_data, content_data)
|
| 42 |
+
|
| 43 |
+
# Compile final report
|
| 44 |
+
report_html = self.report_template.format(
|
| 45 |
+
url=url,
|
| 46 |
+
generated_date=datetime.now().strftime("%B %d, %Y at %I:%M %p"),
|
| 47 |
+
charts=charts_html,
|
| 48 |
+
executive_summary=executive_summary,
|
| 49 |
+
technical_section=technical_section,
|
| 50 |
+
content_section=content_section,
|
| 51 |
+
competitor_section=competitor_section,
|
| 52 |
+
placeholder_sections=placeholder_sections,
|
| 53 |
+
recommendations=recommendations
|
| 54 |
+
)
|
| 55 |
+
|
| 56 |
+
return report_html
|
| 57 |
+
|
| 58 |
+
def _generate_charts(self, technical_data: Dict[str, Any], content_data: Dict[str, Any],
|
| 59 |
+
competitor_data: List[Dict] = None) -> str:
|
| 60 |
+
"""Generate interactive charts using Plotly"""
|
| 61 |
+
charts_html = ""
|
| 62 |
+
|
| 63 |
+
# Performance Scores Chart
|
| 64 |
+
if not technical_data.get('error'):
|
| 65 |
+
mobile_scores = technical_data.get('mobile', {})
|
| 66 |
+
desktop_scores = technical_data.get('desktop', {})
|
| 67 |
+
|
| 68 |
+
performance_fig = go.Figure()
|
| 69 |
+
|
| 70 |
+
categories = ['Performance', 'SEO', 'Accessibility', 'Best Practices']
|
| 71 |
+
mobile_values = [
|
| 72 |
+
mobile_scores.get('performance_score', 0),
|
| 73 |
+
mobile_scores.get('seo_score', 0),
|
| 74 |
+
mobile_scores.get('accessibility_score', 0),
|
| 75 |
+
mobile_scores.get('best_practices_score', 0)
|
| 76 |
+
]
|
| 77 |
+
desktop_values = [
|
| 78 |
+
desktop_scores.get('performance_score', 0),
|
| 79 |
+
desktop_scores.get('seo_score', 0),
|
| 80 |
+
desktop_scores.get('accessibility_score', 0),
|
| 81 |
+
desktop_scores.get('best_practices_score', 0)
|
| 82 |
+
]
|
| 83 |
+
|
| 84 |
+
performance_fig.add_trace(go.Bar(
|
| 85 |
+
name='Mobile',
|
| 86 |
+
x=categories,
|
| 87 |
+
y=mobile_values,
|
| 88 |
+
marker_color='#FF6B6B'
|
| 89 |
+
))
|
| 90 |
+
|
| 91 |
+
performance_fig.add_trace(go.Bar(
|
| 92 |
+
name='Desktop',
|
| 93 |
+
x=categories,
|
| 94 |
+
y=desktop_values,
|
| 95 |
+
marker_color='#4ECDC4'
|
| 96 |
+
))
|
| 97 |
+
|
| 98 |
+
performance_fig.update_layout(
|
| 99 |
+
title='PageSpeed Insights Scores',
|
| 100 |
+
xaxis_title='Categories',
|
| 101 |
+
yaxis_title='Score (0-100)',
|
| 102 |
+
barmode='group',
|
| 103 |
+
height=400,
|
| 104 |
+
showlegend=True
|
| 105 |
+
)
|
| 106 |
+
|
| 107 |
+
charts_html += f'<div class="chart-container">{plot(performance_fig, output_type="div", include_plotlyjs=False)}</div>'
|
| 108 |
+
|
| 109 |
+
# Core Web Vitals Chart
|
| 110 |
+
if not technical_data.get('error'):
|
| 111 |
+
cwv_data = technical_data.get('core_web_vitals', {})
|
| 112 |
+
mobile_cwv = cwv_data.get('mobile', {})
|
| 113 |
+
desktop_cwv = cwv_data.get('desktop', {})
|
| 114 |
+
|
| 115 |
+
cwv_fig = go.Figure()
|
| 116 |
+
|
| 117 |
+
metrics = ['LCP (s)', 'CLS', 'INP (ms)', 'FCP (s)']
|
| 118 |
+
mobile_cwv_values = [
|
| 119 |
+
mobile_cwv.get('lcp', 0),
|
| 120 |
+
mobile_cwv.get('cls', 0),
|
| 121 |
+
mobile_cwv.get('inp', 0),
|
| 122 |
+
mobile_cwv.get('fcp', 0)
|
| 123 |
+
]
|
| 124 |
+
desktop_cwv_values = [
|
| 125 |
+
desktop_cwv.get('lcp', 0),
|
| 126 |
+
desktop_cwv.get('cls', 0),
|
| 127 |
+
desktop_cwv.get('inp', 0),
|
| 128 |
+
desktop_cwv.get('fcp', 0)
|
| 129 |
+
]
|
| 130 |
+
|
| 131 |
+
cwv_fig.add_trace(go.Scatter(
|
| 132 |
+
name='Mobile',
|
| 133 |
+
x=metrics,
|
| 134 |
+
y=mobile_cwv_values,
|
| 135 |
+
mode='lines+markers',
|
| 136 |
+
line=dict(color='#FF6B6B', width=3),
|
| 137 |
+
marker=dict(size=8)
|
| 138 |
+
))
|
| 139 |
+
|
| 140 |
+
cwv_fig.add_trace(go.Scatter(
|
| 141 |
+
name='Desktop',
|
| 142 |
+
x=metrics,
|
| 143 |
+
y=desktop_cwv_values,
|
| 144 |
+
mode='lines+markers',
|
| 145 |
+
line=dict(color='#4ECDC4', width=3),
|
| 146 |
+
marker=dict(size=8)
|
| 147 |
+
))
|
| 148 |
+
|
| 149 |
+
cwv_fig.update_layout(
|
| 150 |
+
title='Core Web Vitals Performance',
|
| 151 |
+
xaxis_title='Metrics',
|
| 152 |
+
yaxis_title='Values',
|
| 153 |
+
height=400,
|
| 154 |
+
showlegend=True
|
| 155 |
+
)
|
| 156 |
+
|
| 157 |
+
charts_html += f'<div class="chart-container">{plot(cwv_fig, output_type="div", include_plotlyjs=False)}</div>'
|
| 158 |
+
|
| 159 |
+
# Metadata Completeness Chart
|
| 160 |
+
if not content_data.get('error'):
|
| 161 |
+
metadata = content_data.get('metadata_completeness', {})
|
| 162 |
+
|
| 163 |
+
completeness_fig = go.Figure(data=[go.Pie(
|
| 164 |
+
labels=['Title Tags', 'Meta Descriptions', 'H1 Tags'],
|
| 165 |
+
values=[
|
| 166 |
+
metadata.get('title_coverage', 0),
|
| 167 |
+
metadata.get('description_coverage', 0),
|
| 168 |
+
metadata.get('h1_coverage', 0)
|
| 169 |
+
],
|
| 170 |
+
hole=0.4,
|
| 171 |
+
marker_colors=['#FF6B6B', '#4ECDC4', '#45B7D1']
|
| 172 |
+
)])
|
| 173 |
+
|
| 174 |
+
completeness_fig.update_layout(
|
| 175 |
+
title='Metadata Completeness (%)',
|
| 176 |
+
height=400,
|
| 177 |
+
showlegend=True
|
| 178 |
+
)
|
| 179 |
+
|
| 180 |
+
charts_html += f'<div class="chart-container">{plot(completeness_fig, output_type="div", include_plotlyjs=False)}</div>'
|
| 181 |
+
|
| 182 |
+
# Content Freshness Chart
|
| 183 |
+
if not content_data.get('error'):
|
| 184 |
+
freshness = content_data.get('content_freshness', {})
|
| 185 |
+
|
| 186 |
+
freshness_fig = go.Figure(data=[go.Pie(
|
| 187 |
+
labels=['Fresh (<6 months)', 'Moderate (6-18 months)', 'Stale (>18 months)', 'Unknown Date'],
|
| 188 |
+
values=[
|
| 189 |
+
freshness.get('fresh_content', {}).get('count', 0),
|
| 190 |
+
freshness.get('moderate_content', {}).get('count', 0),
|
| 191 |
+
freshness.get('stale_content', {}).get('count', 0),
|
| 192 |
+
freshness.get('unknown_date', {}).get('count', 0)
|
| 193 |
+
],
|
| 194 |
+
marker_colors=['#2ECC71', '#F39C12', '#E74C3C', '#95A5A6']
|
| 195 |
+
)])
|
| 196 |
+
|
| 197 |
+
freshness_fig.update_layout(
|
| 198 |
+
title='Content Freshness Distribution',
|
| 199 |
+
height=400,
|
| 200 |
+
showlegend=True
|
| 201 |
+
)
|
| 202 |
+
|
| 203 |
+
charts_html += f'<div class="chart-container">{plot(freshness_fig, output_type="div", include_plotlyjs=False)}</div>'
|
| 204 |
+
|
| 205 |
+
return charts_html
|
| 206 |
+
|
| 207 |
+
def _generate_executive_summary(self, technical_data: Dict[str, Any], content_data: Dict[str, Any]) -> str:
|
| 208 |
+
"""Generate executive summary section"""
|
| 209 |
+
# Calculate overall health score
|
| 210 |
+
mobile_perf = technical_data.get('mobile', {}).get('performance_score', 0)
|
| 211 |
+
desktop_perf = technical_data.get('desktop', {}).get('performance_score', 0)
|
| 212 |
+
avg_performance = (mobile_perf + desktop_perf) / 2
|
| 213 |
+
|
| 214 |
+
metadata_avg = 0
|
| 215 |
+
if not content_data.get('error'):
|
| 216 |
+
metadata = content_data.get('metadata_completeness', {})
|
| 217 |
+
metadata_avg = (
|
| 218 |
+
metadata.get('title_coverage', 0) +
|
| 219 |
+
metadata.get('description_coverage', 0) +
|
| 220 |
+
metadata.get('h1_coverage', 0)
|
| 221 |
+
) / 3
|
| 222 |
+
|
| 223 |
+
overall_score = (avg_performance + metadata_avg) / 2
|
| 224 |
+
|
| 225 |
+
# Health status
|
| 226 |
+
if overall_score >= 80:
|
| 227 |
+
health_status = "Excellent"
|
| 228 |
+
health_color = "#2ECC71"
|
| 229 |
+
elif overall_score >= 60:
|
| 230 |
+
health_status = "Good"
|
| 231 |
+
health_color = "#F39C12"
|
| 232 |
+
elif overall_score >= 40:
|
| 233 |
+
health_status = "Fair"
|
| 234 |
+
health_color = "#FF6B6B"
|
| 235 |
+
else:
|
| 236 |
+
health_status = "Poor"
|
| 237 |
+
health_color = "#E74C3C"
|
| 238 |
+
|
| 239 |
+
# Quick wins
|
| 240 |
+
quick_wins = []
|
| 241 |
+
if not content_data.get('error'):
|
| 242 |
+
metadata = content_data.get('metadata_completeness', {})
|
| 243 |
+
if metadata.get('title_coverage', 0) < 90:
|
| 244 |
+
quick_wins.append(f"Complete missing title tags ({100 - metadata.get('title_coverage', 0):.1f}% of pages missing)")
|
| 245 |
+
if metadata.get('description_coverage', 0) < 90:
|
| 246 |
+
quick_wins.append(f"Add missing meta descriptions ({100 - metadata.get('description_coverage', 0):.1f}% of pages missing)")
|
| 247 |
+
if metadata.get('h1_coverage', 0) < 90:
|
| 248 |
+
quick_wins.append(f"Add missing H1 tags ({100 - metadata.get('h1_coverage', 0):.1f}% of pages missing)")
|
| 249 |
+
|
| 250 |
+
if mobile_perf < 70:
|
| 251 |
+
quick_wins.append(f"Improve mobile performance score (currently {mobile_perf:.1f}/100)")
|
| 252 |
+
|
| 253 |
+
quick_wins_html = "".join([f"<li>{win}</li>" for win in quick_wins[:5]])
|
| 254 |
+
|
| 255 |
+
return f"""
|
| 256 |
+
<div class="summary-card">
|
| 257 |
+
<div class="health-score">
|
| 258 |
+
<h3>Overall SEO Health</h3>
|
| 259 |
+
<div class="score-circle" style="border-color: {health_color}">
|
| 260 |
+
<span class="score-number" style="color: {health_color}">{overall_score:.0f}</span>
|
| 261 |
+
<span class="score-label">/ 100</span>
|
| 262 |
+
</div>
|
| 263 |
+
<p class="health-status" style="color: {health_color}">{health_status}</p>
|
| 264 |
+
</div>
|
| 265 |
+
|
| 266 |
+
<div class="key-metrics">
|
| 267 |
+
<div class="metric">
|
| 268 |
+
<h4>Performance Score</h4>
|
| 269 |
+
<p>Mobile: {mobile_perf:.1f}/100</p>
|
| 270 |
+
<p>Desktop: {desktop_perf:.1f}/100</p>
|
| 271 |
+
</div>
|
| 272 |
+
<div class="metric">
|
| 273 |
+
<h4>Content Analysis</h4>
|
| 274 |
+
<p>Pages Analyzed: {content_data.get('pages_analyzed', 0)}</p>
|
| 275 |
+
<p>Metadata Completeness: {metadata_avg:.1f}%</p>
|
| 276 |
+
</div>
|
| 277 |
+
</div>
|
| 278 |
+
</div>
|
| 279 |
+
|
| 280 |
+
<div class="quick-wins">
|
| 281 |
+
<h3>π― Quick Wins</h3>
|
| 282 |
+
<ul>
|
| 283 |
+
{quick_wins_html}
|
| 284 |
+
{'' if quick_wins else '<li>Great job! No immediate quick wins identified.</li>'}
|
| 285 |
+
</ul>
|
| 286 |
+
</div>
|
| 287 |
+
"""
|
| 288 |
+
|
| 289 |
+
def _generate_technical_section(self, technical_data: Dict[str, Any]) -> str:
|
| 290 |
+
"""Generate technical SEO section"""
|
| 291 |
+
if technical_data.get('error'):
|
| 292 |
+
return f"""
|
| 293 |
+
<div class="error-message">
|
| 294 |
+
<h3>β οΈ Technical SEO Analysis</h3>
|
| 295 |
+
<p>Unable to complete technical analysis: {technical_data.get('error')}</p>
|
| 296 |
+
</div>
|
| 297 |
+
"""
|
| 298 |
+
|
| 299 |
+
mobile = technical_data.get('mobile', {})
|
| 300 |
+
desktop = technical_data.get('desktop', {})
|
| 301 |
+
cwv = technical_data.get('core_web_vitals', {})
|
| 302 |
+
opportunities = technical_data.get('opportunities', {}).get('opportunities', [])
|
| 303 |
+
|
| 304 |
+
# Core Web Vitals analysis
|
| 305 |
+
mobile_cwv = cwv.get('mobile', {})
|
| 306 |
+
cwv_analysis = []
|
| 307 |
+
|
| 308 |
+
lcp = mobile_cwv.get('lcp', 0)
|
| 309 |
+
if lcp > 2.5:
|
| 310 |
+
cwv_analysis.append(f"β οΈ LCP ({lcp:.2f}s) - Should be under 2.5s")
|
| 311 |
+
else:
|
| 312 |
+
cwv_analysis.append(f"β
LCP ({lcp:.2f}s) - Good")
|
| 313 |
+
|
| 314 |
+
cls = mobile_cwv.get('cls', 0)
|
| 315 |
+
if cls > 0.1:
|
| 316 |
+
cwv_analysis.append(f"β οΈ CLS ({cls:.3f}) - Should be under 0.1")
|
| 317 |
+
else:
|
| 318 |
+
cwv_analysis.append(f"β
CLS ({cls:.3f}) - Good")
|
| 319 |
+
|
| 320 |
+
# Opportunities list
|
| 321 |
+
opportunities_html = ""
|
| 322 |
+
for opp in opportunities[:5]:
|
| 323 |
+
opportunities_html += f"""
|
| 324 |
+
<div class="opportunity">
|
| 325 |
+
<h4>{opp.get('title', 'Optimization Opportunity')}</h4>
|
| 326 |
+
<p>{opp.get('description', '')}</p>
|
| 327 |
+
<span class="savings">Potential savings: {opp.get('potential_savings', 0):.0f}ms</span>
|
| 328 |
+
</div>
|
| 329 |
+
"""
|
| 330 |
+
|
| 331 |
+
return f"""
|
| 332 |
+
<div class="technical-metrics">
|
| 333 |
+
<div class="metric-row">
|
| 334 |
+
<div class="metric-card">
|
| 335 |
+
<h4>Mobile Performance</h4>
|
| 336 |
+
<div class="score">{mobile.get('performance_score', 0):.1f}/100</div>
|
| 337 |
+
</div>
|
| 338 |
+
<div class="metric-card">
|
| 339 |
+
<h4>Desktop Performance</h4>
|
| 340 |
+
<div class="score">{desktop.get('performance_score', 0):.1f}/100</div>
|
| 341 |
+
</div>
|
| 342 |
+
<div class="metric-card">
|
| 343 |
+
<h4>SEO Score</h4>
|
| 344 |
+
<div class="score">{mobile.get('seo_score', 0):.1f}/100</div>
|
| 345 |
+
</div>
|
| 346 |
+
<div class="metric-card">
|
| 347 |
+
<h4>Accessibility</h4>
|
| 348 |
+
<div class="score">{mobile.get('accessibility_score', 0):.1f}/100</div>
|
| 349 |
+
</div>
|
| 350 |
+
</div>
|
| 351 |
+
</div>
|
| 352 |
+
|
| 353 |
+
<div class="cwv-analysis">
|
| 354 |
+
<h3>Core Web Vitals Analysis</h3>
|
| 355 |
+
<ul>
|
| 356 |
+
{"".join([f"<li>{analysis}</li>" for analysis in cwv_analysis])}
|
| 357 |
+
</ul>
|
| 358 |
+
</div>
|
| 359 |
+
|
| 360 |
+
<div class="optimization-opportunities">
|
| 361 |
+
<h3>π§ Optimization Opportunities</h3>
|
| 362 |
+
{opportunities_html if opportunities_html else '<p>No major optimization opportunities identified.</p>'}
|
| 363 |
+
</div>
|
| 364 |
+
"""
|
| 365 |
+
|
| 366 |
+
def _generate_content_section(self, content_data: Dict[str, Any]) -> str:
|
| 367 |
+
"""Generate content audit section"""
|
| 368 |
+
if content_data.get('error'):
|
| 369 |
+
return f"""
|
| 370 |
+
<div class="error-message">
|
| 371 |
+
<h3>β οΈ Content Audit</h3>
|
| 372 |
+
<p>Unable to complete content analysis: {content_data.get('error')}</p>
|
| 373 |
+
</div>
|
| 374 |
+
"""
|
| 375 |
+
|
| 376 |
+
metadata = content_data.get('metadata_completeness', {})
|
| 377 |
+
content_metrics = content_data.get('content_metrics', {})
|
| 378 |
+
freshness = content_data.get('content_freshness', {})
|
| 379 |
+
|
| 380 |
+
return f"""
|
| 381 |
+
<div class="content-overview">
|
| 382 |
+
<div class="metric-row">
|
| 383 |
+
<div class="metric-card">
|
| 384 |
+
<h4>Pages Discovered</h4>
|
| 385 |
+
<div class="score">{content_data.get('total_pages_discovered', 0)}</div>
|
| 386 |
+
</div>
|
| 387 |
+
<div class="metric-card">
|
| 388 |
+
<h4>Pages Analyzed</h4>
|
| 389 |
+
<div class="score">{content_data.get('pages_analyzed', 0)}</div>
|
| 390 |
+
</div>
|
| 391 |
+
<div class="metric-card">
|
| 392 |
+
<h4>Avg. Word Count</h4>
|
| 393 |
+
<div class="score">{content_metrics.get('avg_word_count', 0):.0f}</div>
|
| 394 |
+
</div>
|
| 395 |
+
<div class="metric-card">
|
| 396 |
+
<h4>CTA Coverage</h4>
|
| 397 |
+
<div class="score">{content_metrics.get('cta_coverage', 0):.1f}%</div>
|
| 398 |
+
</div>
|
| 399 |
+
</div>
|
| 400 |
+
</div>
|
| 401 |
+
|
| 402 |
+
<div class="metadata-analysis">
|
| 403 |
+
<h3>π Metadata Completeness</h3>
|
| 404 |
+
<div class="metadata-stats">
|
| 405 |
+
<div class="stat">
|
| 406 |
+
<span class="label">Title Tags:</span>
|
| 407 |
+
<span class="value">{metadata.get('title_coverage', 0):.1f}% complete</span>
|
| 408 |
+
<span class="benchmark">(Target: 90%+)</span>
|
| 409 |
+
</div>
|
| 410 |
+
<div class="stat">
|
| 411 |
+
<span class="label">Meta Descriptions:</span>
|
| 412 |
+
<span class="value">{metadata.get('description_coverage', 0):.1f}% complete</span>
|
| 413 |
+
<span class="benchmark">(Target: 90%+)</span>
|
| 414 |
+
</div>
|
| 415 |
+
<div class="stat">
|
| 416 |
+
<span class="label">H1 Tags:</span>
|
| 417 |
+
<span class="value">{metadata.get('h1_coverage', 0):.1f}% complete</span>
|
| 418 |
+
<span class="benchmark">(Target: 90%+)</span>
|
| 419 |
+
</div>
|
| 420 |
+
</div>
|
| 421 |
+
</div>
|
| 422 |
+
|
| 423 |
+
<div class="content-quality">
|
| 424 |
+
<h3>π Content Quality Metrics</h3>
|
| 425 |
+
<div class="quality-stats">
|
| 426 |
+
<div class="stat">
|
| 427 |
+
<span class="label">Average Word Count:</span>
|
| 428 |
+
<span class="value">{content_metrics.get('avg_word_count', 0):.0f} words</span>
|
| 429 |
+
<span class="benchmark">(Recommended: 800-1200)</span>
|
| 430 |
+
</div>
|
| 431 |
+
<div class="stat">
|
| 432 |
+
<span class="label">Call-to-Action Coverage:</span>
|
| 433 |
+
<span class="value">{content_metrics.get('cta_coverage', 0):.1f}% of pages</span>
|
| 434 |
+
<span class="benchmark">(Target: 80%+)</span>
|
| 435 |
+
</div>
|
| 436 |
+
</div>
|
| 437 |
+
</div>
|
| 438 |
+
|
| 439 |
+
<div class="content-freshness">
|
| 440 |
+
<h3>ποΈ Content Freshness</h3>
|
| 441 |
+
<div class="freshness-stats">
|
| 442 |
+
<div class="stat">
|
| 443 |
+
<span class="label">Fresh Content (<6 months):</span>
|
| 444 |
+
<span class="value">{freshness.get('fresh_content', {}).get('percentage', 0):.1f}%</span>
|
| 445 |
+
</div>
|
| 446 |
+
<div class="stat">
|
| 447 |
+
<span class="label">Moderate Age (6-18 months):</span>
|
| 448 |
+
<span class="value">{freshness.get('moderate_content', {}).get('percentage', 0):.1f}%</span>
|
| 449 |
+
</div>
|
| 450 |
+
<div class="stat">
|
| 451 |
+
<span class="label">Stale Content (>18 months):</span>
|
| 452 |
+
<span class="value">{freshness.get('stale_content', {}).get('percentage', 0):.1f}%</span>
|
| 453 |
+
</div>
|
| 454 |
+
</div>
|
| 455 |
+
</div>
|
| 456 |
+
"""
|
| 457 |
+
|
| 458 |
+
def _generate_competitor_section(self, competitor_data: List[Dict],
|
| 459 |
+
primary_technical: Dict[str, Any],
|
| 460 |
+
primary_content: Dict[str, Any]) -> str:
|
| 461 |
+
"""Generate competitor comparison section"""
|
| 462 |
+
if not competitor_data:
|
| 463 |
+
return ""
|
| 464 |
+
|
| 465 |
+
comparison_html = """
|
| 466 |
+
<div class="competitor-comparison">
|
| 467 |
+
<h3>π Competitor Benchmarking</h3>
|
| 468 |
+
<table class="comparison-table">
|
| 469 |
+
<thead>
|
| 470 |
+
<tr>
|
| 471 |
+
<th>Domain</th>
|
| 472 |
+
<th>Mobile Perf.</th>
|
| 473 |
+
<th>Desktop Perf.</th>
|
| 474 |
+
<th>SEO Score</th>
|
| 475 |
+
<th>Content Pages</th>
|
| 476 |
+
</tr>
|
| 477 |
+
</thead>
|
| 478 |
+
<tbody>
|
| 479 |
+
"""
|
| 480 |
+
|
| 481 |
+
# Add primary site
|
| 482 |
+
primary_mobile = primary_technical.get('mobile', {}).get('performance_score', 0)
|
| 483 |
+
primary_desktop = primary_technical.get('desktop', {}).get('performance_score', 0)
|
| 484 |
+
primary_seo = primary_technical.get('mobile', {}).get('seo_score', 0)
|
| 485 |
+
primary_pages = primary_content.get('pages_analyzed', 0)
|
| 486 |
+
|
| 487 |
+
comparison_html += f"""
|
| 488 |
+
<tr class="primary-site">
|
| 489 |
+
<td><strong>Your Site</strong></td>
|
| 490 |
+
<td>{primary_mobile:.1f}</td>
|
| 491 |
+
<td>{primary_desktop:.1f}</td>
|
| 492 |
+
<td>{primary_seo:.1f}</td>
|
| 493 |
+
<td>{primary_pages}</td>
|
| 494 |
+
</tr>
|
| 495 |
+
"""
|
| 496 |
+
|
| 497 |
+
# Add competitors
|
| 498 |
+
for comp in competitor_data:
|
| 499 |
+
comp_technical = comp.get('technical', {})
|
| 500 |
+
comp_content = comp.get('content', {})
|
| 501 |
+
comp_mobile = comp_technical.get('mobile', {}).get('performance_score', 0)
|
| 502 |
+
comp_desktop = comp_technical.get('desktop', {}).get('performance_score', 0)
|
| 503 |
+
comp_seo = comp_technical.get('mobile', {}).get('seo_score', 0)
|
| 504 |
+
comp_pages = comp_content.get('pages_analyzed', 0)
|
| 505 |
+
|
| 506 |
+
domain = comp.get('url', '').replace('https://', '').replace('http://', '')
|
| 507 |
+
|
| 508 |
+
comparison_html += f"""
|
| 509 |
+
<tr>
|
| 510 |
+
<td>{domain}</td>
|
| 511 |
+
<td>{comp_mobile:.1f}</td>
|
| 512 |
+
<td>{comp_desktop:.1f}</td>
|
| 513 |
+
<td>{comp_seo:.1f}</td>
|
| 514 |
+
<td>{comp_pages}</td>
|
| 515 |
+
</tr>
|
| 516 |
+
"""
|
| 517 |
+
|
| 518 |
+
comparison_html += """
|
| 519 |
+
</tbody>
|
| 520 |
+
</table>
|
| 521 |
+
</div>
|
| 522 |
+
"""
|
| 523 |
+
|
| 524 |
+
return comparison_html
|
| 525 |
+
|
| 526 |
+
def _generate_placeholder_sections(self) -> str:
|
| 527 |
+
"""Generate placeholder sections for future modules"""
|
| 528 |
+
return """
|
| 529 |
+
<div class="placeholder-sections">
|
| 530 |
+
<div class="placeholder-section">
|
| 531 |
+
<h3>π Keyword Rankings</h3>
|
| 532 |
+
<div class="placeholder-content">
|
| 533 |
+
<p><em>Coming in future versions</em></p>
|
| 534 |
+
<ul>
|
| 535 |
+
<li>Google Search Console integration</li>
|
| 536 |
+
<li>Keyword ranking positions</li>
|
| 537 |
+
<li>Search volume analysis</li>
|
| 538 |
+
<li>Keyword opportunities</li>
|
| 539 |
+
</ul>
|
| 540 |
+
</div>
|
| 541 |
+
</div>
|
| 542 |
+
|
| 543 |
+
<div class="placeholder-section">
|
| 544 |
+
<h3>π Backlink Profile</h3>
|
| 545 |
+
<div class="placeholder-content">
|
| 546 |
+
<p><em>Coming in future versions</em></p>
|
| 547 |
+
<ul>
|
| 548 |
+
<li>Total backlinks and referring domains</li>
|
| 549 |
+
<li>Domain authority metrics</li>
|
| 550 |
+
<li>Anchor text analysis</li>
|
| 551 |
+
<li>Link acquisition opportunities</li>
|
| 552 |
+
</ul>
|
| 553 |
+
</div>
|
| 554 |
+
</div>
|
| 555 |
+
|
| 556 |
+
<div class="placeholder-section">
|
| 557 |
+
<h3>π Conversion Tracking</h3>
|
| 558 |
+
<div class="placeholder-content">
|
| 559 |
+
<p><em>Coming in future versions</em></p>
|
| 560 |
+
<ul>
|
| 561 |
+
<li>Google Analytics integration</li>
|
| 562 |
+
<li>Organic traffic conversion rates</li>
|
| 563 |
+
<li>Goal completion tracking</li>
|
| 564 |
+
<li>Revenue attribution</li>
|
| 565 |
+
</ul>
|
| 566 |
+
</div>
|
| 567 |
+
</div>
|
| 568 |
+
</div>
|
| 569 |
+
"""
|
| 570 |
+
|
| 571 |
+
def _generate_recommendations(self, technical_data: Dict[str, Any], content_data: Dict[str, Any]) -> str:
|
| 572 |
+
"""Generate prioritized recommendations"""
|
| 573 |
+
recommendations = []
|
| 574 |
+
|
| 575 |
+
# Technical recommendations
|
| 576 |
+
if not technical_data.get('error'):
|
| 577 |
+
mobile = technical_data.get('mobile', {})
|
| 578 |
+
if mobile.get('performance_score', 0) < 70:
|
| 579 |
+
recommendations.append({
|
| 580 |
+
'priority': 'High',
|
| 581 |
+
'category': 'Technical SEO',
|
| 582 |
+
'title': 'Improve Mobile Performance',
|
| 583 |
+
'description': f'Mobile performance score is {mobile.get("performance_score", 0):.1f}/100. Focus on Core Web Vitals optimization.',
|
| 584 |
+
'timeline': '2-4 weeks'
|
| 585 |
+
})
|
| 586 |
+
|
| 587 |
+
# Content recommendations
|
| 588 |
+
if not content_data.get('error'):
|
| 589 |
+
metadata = content_data.get('metadata_completeness', {})
|
| 590 |
+
|
| 591 |
+
if metadata.get('title_coverage', 0) < 90:
|
| 592 |
+
recommendations.append({
|
| 593 |
+
'priority': 'High',
|
| 594 |
+
'category': 'Content',
|
| 595 |
+
'title': 'Complete Missing Title Tags',
|
| 596 |
+
'description': f'{100 - metadata.get("title_coverage", 0):.1f}% of pages are missing title tags. This directly impacts search visibility.',
|
| 597 |
+
'timeline': '1-2 weeks'
|
| 598 |
+
})
|
| 599 |
+
|
| 600 |
+
if metadata.get('description_coverage', 0) < 90:
|
| 601 |
+
recommendations.append({
|
| 602 |
+
'priority': 'Medium',
|
| 603 |
+
'category': 'Content',
|
| 604 |
+
'title': 'Add Missing Meta Descriptions',
|
| 605 |
+
'description': f'{100 - metadata.get("description_coverage", 0):.1f}% of pages are missing meta descriptions. Improve click-through rates from search results.',
|
| 606 |
+
'timeline': '2-3 weeks'
|
| 607 |
+
})
|
| 608 |
+
|
| 609 |
+
content_metrics = content_data.get('content_metrics', {})
|
| 610 |
+
if content_metrics.get('avg_word_count', 0) < 800:
|
| 611 |
+
recommendations.append({
|
| 612 |
+
'priority': 'Medium',
|
| 613 |
+
'category': 'Content',
|
| 614 |
+
'title': 'Increase Content Depth',
|
| 615 |
+
'description': f'Average word count is {content_metrics.get("avg_word_count", 0):.0f} words. Aim for 800-1200 words per page for better rankings.',
|
| 616 |
+
'timeline': '4-6 weeks'
|
| 617 |
+
})
|
| 618 |
+
|
| 619 |
+
# Sort by priority
|
| 620 |
+
priority_order = {'High': 0, 'Medium': 1, 'Low': 2}
|
| 621 |
+
recommendations.sort(key=lambda x: priority_order.get(x['priority'], 2))
|
| 622 |
+
|
| 623 |
+
recommendations_html = ""
|
| 624 |
+
for i, rec in enumerate(recommendations[:8], 1):
|
| 625 |
+
priority_color = {
|
| 626 |
+
'High': '#E74C3C',
|
| 627 |
+
'Medium': '#F39C12',
|
| 628 |
+
'Low': '#2ECC71'
|
| 629 |
+
}.get(rec['priority'], '#95A5A6')
|
| 630 |
+
|
| 631 |
+
recommendations_html += f"""
|
| 632 |
+
<div class="recommendation">
|
| 633 |
+
<div class="rec-header">
|
| 634 |
+
<span class="rec-number">{i}</span>
|
| 635 |
+
<span class="rec-priority" style="background-color: {priority_color}">{rec['priority']}</span>
|
| 636 |
+
<span class="rec-category">{rec['category']}</span>
|
| 637 |
+
</div>
|
| 638 |
+
<h4>{rec['title']}</h4>
|
| 639 |
+
<p>{rec['description']}</p>
|
| 640 |
+
<div class="rec-timeline">Timeline: {rec['timeline']}</div>
|
| 641 |
+
</div>
|
| 642 |
+
"""
|
| 643 |
+
|
| 644 |
+
return f"""
|
| 645 |
+
<div class="recommendations-section">
|
| 646 |
+
<h3>π― Prioritized Recommendations</h3>
|
| 647 |
+
<div class="recommendations-list">
|
| 648 |
+
{recommendations_html if recommendations_html else '<p>Great job! No immediate recommendations identified.</p>'}
|
| 649 |
+
</div>
|
| 650 |
+
</div>
|
| 651 |
+
"""
|
| 652 |
+
|
| 653 |
+
def _get_report_template(self) -> str:
|
| 654 |
+
"""Get the HTML template for the report"""
|
| 655 |
+
return """
|
| 656 |
+
<!DOCTYPE html>
|
| 657 |
+
<html lang="en">
|
| 658 |
+
<head>
|
| 659 |
+
<meta charset="UTF-8">
|
| 660 |
+
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
| 661 |
+
<title>SEO Report - {url}</title>
|
| 662 |
+
<script src="https://cdn.plot.ly/plotly-latest.min.js"></script>
|
| 663 |
+
<style>
|
| 664 |
+
* {{
|
| 665 |
+
margin: 0;
|
| 666 |
+
padding: 0;
|
| 667 |
+
box-sizing: border-box;
|
| 668 |
+
}}
|
| 669 |
+
|
| 670 |
+
body {{
|
| 671 |
+
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, sans-serif;
|
| 672 |
+
line-height: 1.6;
|
| 673 |
+
color: #333;
|
| 674 |
+
background-color: #f8f9fa;
|
| 675 |
+
}}
|
| 676 |
+
|
| 677 |
+
.report-container {{
|
| 678 |
+
max-width: 1200px;
|
| 679 |
+
margin: 0 auto;
|
| 680 |
+
padding: 20px;
|
| 681 |
+
}}
|
| 682 |
+
|
| 683 |
+
.report-header {{
|
| 684 |
+
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
|
| 685 |
+
color: white;
|
| 686 |
+
padding: 40px;
|
| 687 |
+
border-radius: 10px;
|
| 688 |
+
margin-bottom: 30px;
|
| 689 |
+
text-align: center;
|
| 690 |
+
}}
|
| 691 |
+
|
| 692 |
+
.report-header h1 {{
|
| 693 |
+
font-size: 2.5rem;
|
| 694 |
+
margin-bottom: 10px;
|
| 695 |
+
}}
|
| 696 |
+
|
| 697 |
+
.report-header p {{
|
| 698 |
+
font-size: 1.1rem;
|
| 699 |
+
opacity: 0.9;
|
| 700 |
+
}}
|
| 701 |
+
|
| 702 |
+
.section {{
|
| 703 |
+
background: white;
|
| 704 |
+
margin-bottom: 30px;
|
| 705 |
+
padding: 30px;
|
| 706 |
+
border-radius: 10px;
|
| 707 |
+
box-shadow: 0 2px 10px rgba(0,0,0,0.1);
|
| 708 |
+
}}
|
| 709 |
+
|
| 710 |
+
.section h2 {{
|
| 711 |
+
color: #2c3e50;
|
| 712 |
+
margin-bottom: 20px;
|
| 713 |
+
font-size: 1.8rem;
|
| 714 |
+
border-bottom: 3px solid #3498db;
|
| 715 |
+
padding-bottom: 10px;
|
| 716 |
+
}}
|
| 717 |
+
|
| 718 |
+
.summary-card {{
|
| 719 |
+
display: flex;
|
| 720 |
+
justify-content: space-between;
|
| 721 |
+
align-items: center;
|
| 722 |
+
margin-bottom: 30px;
|
| 723 |
+
padding: 20px;
|
| 724 |
+
background: linear-gradient(135deg, #f093fb 0%, #f5576c 100%);
|
| 725 |
+
border-radius: 10px;
|
| 726 |
+
color: white;
|
| 727 |
+
}}
|
| 728 |
+
|
| 729 |
+
.health-score {{
|
| 730 |
+
text-align: center;
|
| 731 |
+
}}
|
| 732 |
+
|
| 733 |
+
.score-circle {{
|
| 734 |
+
width: 120px;
|
| 735 |
+
height: 120px;
|
| 736 |
+
border: 6px solid;
|
| 737 |
+
border-radius: 50%;
|
| 738 |
+
display: flex;
|
| 739 |
+
flex-direction: column;
|
| 740 |
+
align-items: center;
|
| 741 |
+
justify-content: center;
|
| 742 |
+
margin: 10px auto;
|
| 743 |
+
}}
|
| 744 |
+
|
| 745 |
+
.score-number {{
|
| 746 |
+
font-size: 2rem;
|
| 747 |
+
font-weight: bold;
|
| 748 |
+
}}
|
| 749 |
+
|
| 750 |
+
.score-label {{
|
| 751 |
+
font-size: 0.9rem;
|
| 752 |
+
opacity: 0.8;
|
| 753 |
+
}}
|
| 754 |
+
|
| 755 |
+
.health-status {{
|
| 756 |
+
font-size: 1.2rem;
|
| 757 |
+
font-weight: bold;
|
| 758 |
+
margin-top: 10px;
|
| 759 |
+
}}
|
| 760 |
+
|
| 761 |
+
.key-metrics {{
|
| 762 |
+
display: flex;
|
| 763 |
+
gap: 30px;
|
| 764 |
+
}}
|
| 765 |
+
|
| 766 |
+
.metric {{
|
| 767 |
+
text-align: center;
|
| 768 |
+
}}
|
| 769 |
+
|
| 770 |
+
.metric h4 {{
|
| 771 |
+
margin-bottom: 10px;
|
| 772 |
+
font-size: 1rem;
|
| 773 |
+
opacity: 0.9;
|
| 774 |
+
}}
|
| 775 |
+
|
| 776 |
+
.metric p {{
|
| 777 |
+
font-size: 1.1rem;
|
| 778 |
+
margin-bottom: 5px;
|
| 779 |
+
}}
|
| 780 |
+
|
| 781 |
+
.quick-wins {{
|
| 782 |
+
background: #fff3cd;
|
| 783 |
+
border: 1px solid #ffeeba;
|
| 784 |
+
border-radius: 8px;
|
| 785 |
+
padding: 20px;
|
| 786 |
+
}}
|
| 787 |
+
|
| 788 |
+
.quick-wins h3 {{
|
| 789 |
+
color: #856404;
|
| 790 |
+
margin-bottom: 15px;
|
| 791 |
+
}}
|
| 792 |
+
|
| 793 |
+
.quick-wins ul {{
|
| 794 |
+
list-style-type: none;
|
| 795 |
+
}}
|
| 796 |
+
|
| 797 |
+
.quick-wins li {{
|
| 798 |
+
color: #856404;
|
| 799 |
+
margin-bottom: 8px;
|
| 800 |
+
position: relative;
|
| 801 |
+
padding-left: 20px;
|
| 802 |
+
}}
|
| 803 |
+
|
| 804 |
+
.quick-wins li:before {{
|
| 805 |
+
content: "β";
|
| 806 |
+
position: absolute;
|
| 807 |
+
left: 0;
|
| 808 |
+
color: #ffc107;
|
| 809 |
+
font-weight: bold;
|
| 810 |
+
}}
|
| 811 |
+
|
| 812 |
+
.metric-row {{
|
| 813 |
+
display: grid;
|
| 814 |
+
grid-template-columns: repeat(auto-fit, minmax(200px, 1fr));
|
| 815 |
+
gap: 20px;
|
| 816 |
+
margin-bottom: 30px;
|
| 817 |
+
}}
|
| 818 |
+
|
| 819 |
+
.metric-card {{
|
| 820 |
+
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
|
| 821 |
+
color: white;
|
| 822 |
+
padding: 20px;
|
| 823 |
+
border-radius: 10px;
|
| 824 |
+
text-align: center;
|
| 825 |
+
}}
|
| 826 |
+
|
| 827 |
+
.metric-card h4 {{
|
| 828 |
+
font-size: 0.9rem;
|
| 829 |
+
margin-bottom: 10px;
|
| 830 |
+
opacity: 0.9;
|
| 831 |
+
}}
|
| 832 |
+
|
| 833 |
+
.metric-card .score {{
|
| 834 |
+
font-size: 2rem;
|
| 835 |
+
font-weight: bold;
|
| 836 |
+
}}
|
| 837 |
+
|
| 838 |
+
.chart-container {{
|
| 839 |
+
margin: 30px 0;
|
| 840 |
+
background: white;
|
| 841 |
+
border-radius: 10px;
|
| 842 |
+
padding: 20px;
|
| 843 |
+
box-shadow: 0 2px 5px rgba(0,0,0,0.1);
|
| 844 |
+
}}
|
| 845 |
+
|
| 846 |
+
.cwv-analysis ul, .metadata-stats, .quality-stats, .freshness-stats {{
|
| 847 |
+
list-style: none;
|
| 848 |
+
}}
|
| 849 |
+
|
| 850 |
+
.stat {{
|
| 851 |
+
display: flex;
|
| 852 |
+
justify-content: space-between;
|
| 853 |
+
align-items: center;
|
| 854 |
+
padding: 10px 0;
|
| 855 |
+
border-bottom: 1px solid #eee;
|
| 856 |
+
}}
|
| 857 |
+
|
| 858 |
+
.stat:last-child {{
|
| 859 |
+
border-bottom: none;
|
| 860 |
+
}}
|
| 861 |
+
|
| 862 |
+
.stat .label {{
|
| 863 |
+
font-weight: 600;
|
| 864 |
+
color: #2c3e50;
|
| 865 |
+
}}
|
| 866 |
+
|
| 867 |
+
.stat .value {{
|
| 868 |
+
font-weight: bold;
|
| 869 |
+
color: #3498db;
|
| 870 |
+
}}
|
| 871 |
+
|
| 872 |
+
.stat .benchmark {{
|
| 873 |
+
font-size: 0.85rem;
|
| 874 |
+
color: #7f8c8d;
|
| 875 |
+
}}
|
| 876 |
+
|
| 877 |
+
.opportunity {{
|
| 878 |
+
background: #f8f9fa;
|
| 879 |
+
border-left: 4px solid #ff6b6b;
|
| 880 |
+
padding: 15px;
|
| 881 |
+
margin-bottom: 15px;
|
| 882 |
+
border-radius: 5px;
|
| 883 |
+
}}
|
| 884 |
+
|
| 885 |
+
.opportunity h4 {{
|
| 886 |
+
color: #2c3e50;
|
| 887 |
+
margin-bottom: 8px;
|
| 888 |
+
}}
|
| 889 |
+
|
| 890 |
+
.savings {{
|
| 891 |
+
display: inline-block;
|
| 892 |
+
background: #ff6b6b;
|
| 893 |
+
color: white;
|
| 894 |
+
padding: 4px 8px;
|
| 895 |
+
border-radius: 4px;
|
| 896 |
+
font-size: 0.8rem;
|
| 897 |
+
margin-top: 8px;
|
| 898 |
+
}}
|
| 899 |
+
|
| 900 |
+
.comparison-table {{
|
| 901 |
+
width: 100%;
|
| 902 |
+
border-collapse: collapse;
|
| 903 |
+
margin-top: 20px;
|
| 904 |
+
}}
|
| 905 |
+
|
| 906 |
+
.comparison-table th,
|
| 907 |
+
.comparison-table td {{
|
| 908 |
+
padding: 12px;
|
| 909 |
+
text-align: left;
|
| 910 |
+
border-bottom: 1px solid #ddd;
|
| 911 |
+
}}
|
| 912 |
+
|
| 913 |
+
.comparison-table th {{
|
| 914 |
+
background: #f8f9fa;
|
| 915 |
+
font-weight: bold;
|
| 916 |
+
color: #2c3e50;
|
| 917 |
+
}}
|
| 918 |
+
|
| 919 |
+
.primary-site {{
|
| 920 |
+
background: #e8f5e8;
|
| 921 |
+
font-weight: bold;
|
| 922 |
+
}}
|
| 923 |
+
|
| 924 |
+
.placeholder-sections {{
|
| 925 |
+
display: grid;
|
| 926 |
+
grid-template-columns: repeat(auto-fit, minmax(300px, 1fr));
|
| 927 |
+
gap: 20px;
|
| 928 |
+
}}
|
| 929 |
+
|
| 930 |
+
.placeholder-section {{
|
| 931 |
+
border: 2px dashed #ddd;
|
| 932 |
+
border-radius: 10px;
|
| 933 |
+
padding: 20px;
|
| 934 |
+
text-align: center;
|
| 935 |
+
background: #fafafa;
|
| 936 |
+
}}
|
| 937 |
+
|
| 938 |
+
.placeholder-section h3 {{
|
| 939 |
+
color: #7f8c8d;
|
| 940 |
+
margin-bottom: 15px;
|
| 941 |
+
}}
|
| 942 |
+
|
| 943 |
+
.placeholder-content p {{
|
| 944 |
+
color: #7f8c8d;
|
| 945 |
+
font-style: italic;
|
| 946 |
+
margin-bottom: 15px;
|
| 947 |
+
}}
|
| 948 |
+
|
| 949 |
+
.placeholder-content ul {{
|
| 950 |
+
list-style: none;
|
| 951 |
+
color: #95a5a6;
|
| 952 |
+
}}
|
| 953 |
+
|
| 954 |
+
.placeholder-content li {{
|
| 955 |
+
margin-bottom: 8px;
|
| 956 |
+
}}
|
| 957 |
+
|
| 958 |
+
.recommendations-section {{
|
| 959 |
+
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
|
| 960 |
+
color: white;
|
| 961 |
+
border-radius: 10px;
|
| 962 |
+
padding: 30px;
|
| 963 |
+
}}
|
| 964 |
+
|
| 965 |
+
.recommendations-section h3 {{
|
| 966 |
+
margin-bottom: 25px;
|
| 967 |
+
font-size: 1.8rem;
|
| 968 |
+
}}
|
| 969 |
+
|
| 970 |
+
.recommendation {{
|
| 971 |
+
background: white;
|
| 972 |
+
color: #333;
|
| 973 |
+
border-radius: 8px;
|
| 974 |
+
padding: 20px;
|
| 975 |
+
margin-bottom: 20px;
|
| 976 |
+
}}
|
| 977 |
+
|
| 978 |
+
.rec-header {{
|
| 979 |
+
display: flex;
|
| 980 |
+
align-items: center;
|
| 981 |
+
gap: 10px;
|
| 982 |
+
margin-bottom: 10px;
|
| 983 |
+
}}
|
| 984 |
+
|
| 985 |
+
.rec-number {{
|
| 986 |
+
background: #3498db;
|
| 987 |
+
color: white;
|
| 988 |
+
width: 30px;
|
| 989 |
+
height: 30px;
|
| 990 |
+
border-radius: 50%;
|
| 991 |
+
display: flex;
|
| 992 |
+
align-items: center;
|
| 993 |
+
justify-content: center;
|
| 994 |
+
font-weight: bold;
|
| 995 |
+
}}
|
| 996 |
+
|
| 997 |
+
.rec-priority {{
|
| 998 |
+
color: white;
|
| 999 |
+
padding: 4px 8px;
|
| 1000 |
+
border-radius: 4px;
|
| 1001 |
+
font-size: 0.8rem;
|
| 1002 |
+
font-weight: bold;
|
| 1003 |
+
}}
|
| 1004 |
+
|
| 1005 |
+
.rec-category {{
|
| 1006 |
+
background: #ecf0f1;
|
| 1007 |
+
color: #2c3e50;
|
| 1008 |
+
padding: 4px 8px;
|
| 1009 |
+
border-radius: 4px;
|
| 1010 |
+
font-size: 0.8rem;
|
| 1011 |
+
}}
|
| 1012 |
+
|
| 1013 |
+
.rec-timeline {{
|
| 1014 |
+
color: #7f8c8d;
|
| 1015 |
+
font-size: 0.9rem;
|
| 1016 |
+
margin-top: 10px;
|
| 1017 |
+
font-weight: bold;
|
| 1018 |
+
}}
|
| 1019 |
+
|
| 1020 |
+
.error-message {{
|
| 1021 |
+
background: #f8d7da;
|
| 1022 |
+
border: 1px solid #f5c6cb;
|
| 1023 |
+
color: #721c24;
|
| 1024 |
+
padding: 20px;
|
| 1025 |
+
border-radius: 8px;
|
| 1026 |
+
text-align: center;
|
| 1027 |
+
}}
|
| 1028 |
+
|
| 1029 |
+
@media (max-width: 768px) {{
|
| 1030 |
+
.report-container {{
|
| 1031 |
+
padding: 10px;
|
| 1032 |
+
}}
|
| 1033 |
+
|
| 1034 |
+
.section {{
|
| 1035 |
+
padding: 20px;
|
| 1036 |
+
}}
|
| 1037 |
+
|
| 1038 |
+
.summary-card {{
|
| 1039 |
+
flex-direction: column;
|
| 1040 |
+
text-align: center;
|
| 1041 |
+
gap: 20px;
|
| 1042 |
+
}}
|
| 1043 |
+
|
| 1044 |
+
.key-metrics {{
|
| 1045 |
+
flex-direction: column;
|
| 1046 |
+
gap: 15px;
|
| 1047 |
+
}}
|
| 1048 |
+
|
| 1049 |
+
.metric-row {{
|
| 1050 |
+
grid-template-columns: 1fr;
|
| 1051 |
+
}}
|
| 1052 |
+
}}
|
| 1053 |
+
</style>
|
| 1054 |
+
</head>
|
| 1055 |
+
<body>
|
| 1056 |
+
<div class="report-container">
|
| 1057 |
+
<div class="report-header">
|
| 1058 |
+
<h1>π SEO Analysis Report</h1>
|
| 1059 |
+
<p>{url}</p>
|
| 1060 |
+
<p>Generated on {generated_date}</p>
|
| 1061 |
+
</div>
|
| 1062 |
+
|
| 1063 |
+
<div class="section">
|
| 1064 |
+
<h2>π Executive Summary</h2>
|
| 1065 |
+
{executive_summary}
|
| 1066 |
+
</div>
|
| 1067 |
+
|
| 1068 |
+
<div class="section">
|
| 1069 |
+
<h2>π Performance Charts</h2>
|
| 1070 |
+
{charts}
|
| 1071 |
+
</div>
|
| 1072 |
+
|
| 1073 |
+
<div class="section">
|
| 1074 |
+
<h2>β‘ Technical SEO</h2>
|
| 1075 |
+
{technical_section}
|
| 1076 |
+
</div>
|
| 1077 |
+
|
| 1078 |
+
<div class="section">
|
| 1079 |
+
<h2>π Content Audit</h2>
|
| 1080 |
+
{content_section}
|
| 1081 |
+
</div>
|
| 1082 |
+
|
| 1083 |
+
{competitor_section}
|
| 1084 |
+
|
| 1085 |
+
<div class="section">
|
| 1086 |
+
<h2>π§ Future Modules</h2>
|
| 1087 |
+
{placeholder_sections}
|
| 1088 |
+
</div>
|
| 1089 |
+
|
| 1090 |
+
<div class="section">
|
| 1091 |
+
{recommendations}
|
| 1092 |
+
</div>
|
| 1093 |
+
</div>
|
| 1094 |
+
</body>
|
| 1095 |
+
</html>
|
| 1096 |
+
"""
|
requirements.txt
ADDED
|
@@ -0,0 +1,9 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
streamlit
|
| 2 |
+
requests
|
| 3 |
+
beautifulsoup4
|
| 4 |
+
pandas
|
| 5 |
+
plotly
|
| 6 |
+
jinja2
|
| 7 |
+
validators
|
| 8 |
+
urllib3
|
| 9 |
+
lxml
|
run.py
ADDED
|
@@ -0,0 +1,40 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Quick start script for SEO Report Generator
|
| 3 |
+
"""
|
| 4 |
+
|
| 5 |
+
import subprocess
|
| 6 |
+
import sys
|
| 7 |
+
import os
|
| 8 |
+
|
| 9 |
+
def main():
|
| 10 |
+
print("π SEO Report Generator")
|
| 11 |
+
print("=" * 40)
|
| 12 |
+
|
| 13 |
+
# Check if we're in the right directory
|
| 14 |
+
if not os.path.exists('app.py'):
|
| 15 |
+
print("β Error: app.py not found. Make sure you're in the correct directory.")
|
| 16 |
+
sys.exit(1)
|
| 17 |
+
|
| 18 |
+
print("π¦ Starting Streamlit application...")
|
| 19 |
+
print("π App will be available at: http://localhost:8501")
|
| 20 |
+
print("π Press Ctrl+C to stop the application")
|
| 21 |
+
print("\nπ‘ Quick Tips:")
|
| 22 |
+
print(" β’ Enter any website URL to analyze")
|
| 23 |
+
print(" β’ Add competitor URLs for benchmarking")
|
| 24 |
+
print(" β’ Reports include technical SEO + content audit")
|
| 25 |
+
print(" β’ Download HTML reports (PDF via browser print)")
|
| 26 |
+
print("=" * 40)
|
| 27 |
+
|
| 28 |
+
try:
|
| 29 |
+
# Start Streamlit app
|
| 30 |
+
subprocess.run([sys.executable, "-m", "streamlit", "run", "app.py"], check=True)
|
| 31 |
+
except KeyboardInterrupt:
|
| 32 |
+
print("\nπ Application stopped by user")
|
| 33 |
+
except subprocess.CalledProcessError as e:
|
| 34 |
+
print(f"β Error starting application: {e}")
|
| 35 |
+
print("π‘ Make sure you have installed the requirements: pip install -r requirements.txt")
|
| 36 |
+
except FileNotFoundError:
|
| 37 |
+
print("β Streamlit not found. Install it with: pip install streamlit")
|
| 38 |
+
|
| 39 |
+
if __name__ == "__main__":
|
| 40 |
+
main()
|
simple_pdf_generator.py
ADDED
|
@@ -0,0 +1,104 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Simple PDF generation fallback using reportlab (if available)
|
| 3 |
+
or browser-based PDF conversion instructions
|
| 4 |
+
"""
|
| 5 |
+
|
| 6 |
+
import io
|
| 7 |
+
from typing import Dict, Any
|
| 8 |
+
|
| 9 |
+
class SimplePDFGenerator:
|
| 10 |
+
def __init__(self):
|
| 11 |
+
self.available = False
|
| 12 |
+
try:
|
| 13 |
+
import reportlab
|
| 14 |
+
self.available = True
|
| 15 |
+
except ImportError:
|
| 16 |
+
self.available = False
|
| 17 |
+
|
| 18 |
+
def generate_pdf(self, html_content: str) -> bytes:
|
| 19 |
+
"""
|
| 20 |
+
Generate PDF from HTML content using simple text-based approach
|
| 21 |
+
"""
|
| 22 |
+
if not self.available:
|
| 23 |
+
raise ImportError("PDF generation requires reportlab: pip install reportlab")
|
| 24 |
+
|
| 25 |
+
# Import reportlab components
|
| 26 |
+
from reportlab.pdfgen import canvas
|
| 27 |
+
from reportlab.lib.pagesizes import letter, A4
|
| 28 |
+
from reportlab.lib.styles import getSampleStyleSheet
|
| 29 |
+
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer
|
| 30 |
+
from reportlab.lib.units import inch
|
| 31 |
+
from bs4 import BeautifulSoup
|
| 32 |
+
|
| 33 |
+
# Parse HTML and extract text content
|
| 34 |
+
soup = BeautifulSoup(html_content, 'html.parser')
|
| 35 |
+
|
| 36 |
+
# Remove style and script tags
|
| 37 |
+
for tag in soup(["style", "script"]):
|
| 38 |
+
tag.decompose()
|
| 39 |
+
|
| 40 |
+
# Create PDF buffer
|
| 41 |
+
buffer = io.BytesIO()
|
| 42 |
+
|
| 43 |
+
# Create PDF document
|
| 44 |
+
doc = SimpleDocTemplate(buffer, pagesize=A4)
|
| 45 |
+
styles = getSampleStyleSheet()
|
| 46 |
+
story = []
|
| 47 |
+
|
| 48 |
+
# Extract title
|
| 49 |
+
title_tag = soup.find('title')
|
| 50 |
+
title = title_tag.text if title_tag else "SEO Report"
|
| 51 |
+
|
| 52 |
+
# Add title
|
| 53 |
+
story.append(Paragraph(title, styles['Title']))
|
| 54 |
+
story.append(Spacer(1, 12))
|
| 55 |
+
|
| 56 |
+
# Extract main content sections
|
| 57 |
+
sections = soup.find_all(['h1', 'h2', 'h3', 'p', 'div'])
|
| 58 |
+
|
| 59 |
+
for section in sections:
|
| 60 |
+
if section.name in ['h1', 'h2', 'h3']:
|
| 61 |
+
# Headers
|
| 62 |
+
text = section.get_text().strip()
|
| 63 |
+
if text:
|
| 64 |
+
if section.name == 'h1':
|
| 65 |
+
story.append(Paragraph(text, styles['Heading1']))
|
| 66 |
+
elif section.name == 'h2':
|
| 67 |
+
story.append(Paragraph(text, styles['Heading2']))
|
| 68 |
+
else:
|
| 69 |
+
story.append(Paragraph(text, styles['Heading3']))
|
| 70 |
+
story.append(Spacer(1, 6))
|
| 71 |
+
|
| 72 |
+
elif section.name in ['p', 'div']:
|
| 73 |
+
# Paragraphs
|
| 74 |
+
text = section.get_text().strip()
|
| 75 |
+
if text and len(text) > 20: # Skip very short text
|
| 76 |
+
try:
|
| 77 |
+
story.append(Paragraph(text[:500], styles['Normal'])) # Limit length
|
| 78 |
+
story.append(Spacer(1, 6))
|
| 79 |
+
except:
|
| 80 |
+
pass # Skip problematic content
|
| 81 |
+
|
| 82 |
+
# Build PDF
|
| 83 |
+
doc.build(story)
|
| 84 |
+
|
| 85 |
+
# Get PDF data
|
| 86 |
+
buffer.seek(0)
|
| 87 |
+
return buffer.getvalue()
|
| 88 |
+
|
| 89 |
+
def create_browser_pdf_instructions() -> str:
|
| 90 |
+
"""
|
| 91 |
+
Return instructions for manual PDF creation using browser
|
| 92 |
+
"""
|
| 93 |
+
return """
|
| 94 |
+
## How to Create PDF from HTML Report:
|
| 95 |
+
|
| 96 |
+
1. **Download the HTML report** using the button above
|
| 97 |
+
2. **Open the HTML file** in your web browser (Chrome, Firefox, Edge)
|
| 98 |
+
3. **Print the page**: Press Ctrl+P (Windows) or Cmd+P (Mac)
|
| 99 |
+
4. **Select destination**: Choose "Save as PDF" or "Microsoft Print to PDF"
|
| 100 |
+
5. **Adjust settings**: Select A4 size, include background graphics
|
| 101 |
+
6. **Save**: Click Save and choose your location
|
| 102 |
+
|
| 103 |
+
This will create a high-quality PDF with all charts and formatting preserved.
|
| 104 |
+
"""
|
test_app.py
ADDED
|
@@ -0,0 +1,122 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Test script for SEO Report Generator
|
| 3 |
+
Run this to test the core functionality without the Streamlit UI
|
| 4 |
+
"""
|
| 5 |
+
|
| 6 |
+
from modules.technical_seo import TechnicalSEOModule
|
| 7 |
+
from modules.content_audit import ContentAuditModule
|
| 8 |
+
from report_generator import ReportGenerator
|
| 9 |
+
from pdf_generator import PDFGenerator
|
| 10 |
+
|
| 11 |
+
def test_seo_report_generation():
|
| 12 |
+
"""Test the complete SEO report generation process"""
|
| 13 |
+
|
| 14 |
+
# Test URLs
|
| 15 |
+
test_urls = [
|
| 16 |
+
"https://example.com",
|
| 17 |
+
"https://python.org",
|
| 18 |
+
"https://github.com"
|
| 19 |
+
]
|
| 20 |
+
|
| 21 |
+
print("π Starting SEO Report Generator Tests\n")
|
| 22 |
+
|
| 23 |
+
for url in test_urls:
|
| 24 |
+
print(f"Testing URL: {url}")
|
| 25 |
+
print("-" * 50)
|
| 26 |
+
|
| 27 |
+
try:
|
| 28 |
+
# Initialize modules
|
| 29 |
+
technical_module = TechnicalSEOModule()
|
| 30 |
+
content_module = ContentAuditModule()
|
| 31 |
+
report_gen = ReportGenerator()
|
| 32 |
+
|
| 33 |
+
# Technical SEO Analysis
|
| 34 |
+
print("β‘ Running Technical SEO analysis...")
|
| 35 |
+
technical_data = technical_module.analyze(url)
|
| 36 |
+
|
| 37 |
+
if technical_data.get('error'):
|
| 38 |
+
print(f"β οΈ Technical analysis failed: {technical_data['error']}")
|
| 39 |
+
else:
|
| 40 |
+
mobile_score = technical_data.get('mobile', {}).get('performance_score', 0)
|
| 41 |
+
desktop_score = technical_data.get('desktop', {}).get('performance_score', 0)
|
| 42 |
+
print(f"β
Performance scores - Mobile: {mobile_score}/100, Desktop: {desktop_score}/100")
|
| 43 |
+
|
| 44 |
+
# Content Audit
|
| 45 |
+
print("π Running Content audit...")
|
| 46 |
+
content_data = content_module.analyze(url, quick_scan=True) # Quick scan for testing
|
| 47 |
+
|
| 48 |
+
if content_data.get('error'):
|
| 49 |
+
print(f"β οΈ Content analysis failed: {content_data['error']}")
|
| 50 |
+
else:
|
| 51 |
+
pages_analyzed = content_data.get('pages_analyzed', 0)
|
| 52 |
+
title_coverage = content_data.get('metadata_completeness', {}).get('title_coverage', 0)
|
| 53 |
+
print(f"β
Content metrics - Pages analyzed: {pages_analyzed}, Title coverage: {title_coverage}%")
|
| 54 |
+
|
| 55 |
+
# Generate HTML Report
|
| 56 |
+
print("π Generating HTML report...")
|
| 57 |
+
report_html = report_gen.generate_html_report(
|
| 58 |
+
url=url,
|
| 59 |
+
technical_data=technical_data,
|
| 60 |
+
content_data=content_data,
|
| 61 |
+
include_charts=True
|
| 62 |
+
)
|
| 63 |
+
|
| 64 |
+
# Save HTML report
|
| 65 |
+
filename = f"test_report_{url.replace('https://', '').replace('/', '_')}.html"
|
| 66 |
+
with open(filename, 'w', encoding='utf-8') as f:
|
| 67 |
+
f.write(report_html)
|
| 68 |
+
print(f"β
HTML report saved: {filename}")
|
| 69 |
+
|
| 70 |
+
# Test PDF generation
|
| 71 |
+
print("π Testing PDF generation...")
|
| 72 |
+
try:
|
| 73 |
+
pdf_gen = PDFGenerator()
|
| 74 |
+
pdf_data = pdf_gen.generate_pdf(report_html)
|
| 75 |
+
|
| 76 |
+
pdf_filename = filename.replace('.html', '.pdf')
|
| 77 |
+
with open(pdf_filename, 'wb') as f:
|
| 78 |
+
f.write(pdf_data)
|
| 79 |
+
print(f"β
PDF report saved: {pdf_filename}")
|
| 80 |
+
|
| 81 |
+
except Exception as pdf_error:
|
| 82 |
+
print(f"β οΈ PDF generation failed: {pdf_error}")
|
| 83 |
+
|
| 84 |
+
print("β
Test completed successfully!\n")
|
| 85 |
+
|
| 86 |
+
except Exception as e:
|
| 87 |
+
print(f"β Test failed for {url}: {str(e)}\n")
|
| 88 |
+
|
| 89 |
+
def test_individual_modules():
|
| 90 |
+
"""Test individual modules separately"""
|
| 91 |
+
print("π§ͺ Testing Individual Modules\n")
|
| 92 |
+
|
| 93 |
+
# Test Technical SEO Module
|
| 94 |
+
print("Testing Technical SEO Module...")
|
| 95 |
+
tech_module = TechnicalSEOModule()
|
| 96 |
+
tech_result = tech_module.analyze("https://example.com")
|
| 97 |
+
print(f"Technical SEO result keys: {list(tech_result.keys())}")
|
| 98 |
+
|
| 99 |
+
# Test Content Audit Module
|
| 100 |
+
print("\nTesting Content Audit Module...")
|
| 101 |
+
content_module = ContentAuditModule()
|
| 102 |
+
content_result = content_module.analyze("https://example.com", quick_scan=True)
|
| 103 |
+
print(f"Content Audit result keys: {list(content_result.keys())}")
|
| 104 |
+
|
| 105 |
+
print("\nβ
Individual module tests completed!")
|
| 106 |
+
|
| 107 |
+
if __name__ == "__main__":
|
| 108 |
+
print("=" * 60)
|
| 109 |
+
print("SEO REPORT GENERATOR - TEST SUITE")
|
| 110 |
+
print("=" * 60)
|
| 111 |
+
|
| 112 |
+
# Run individual module tests
|
| 113 |
+
test_individual_modules()
|
| 114 |
+
print("\n" + "=" * 60 + "\n")
|
| 115 |
+
|
| 116 |
+
# Run full report generation tests
|
| 117 |
+
test_seo_report_generation()
|
| 118 |
+
|
| 119 |
+
print("=" * 60)
|
| 120 |
+
print("π All tests completed!")
|
| 121 |
+
print("Check the generated HTML and PDF files to verify output.")
|
| 122 |
+
print("=" * 60)
|