A comprehensive full-stack web analyzer with advanced AI-powered content analysis, built with Express.js, React.js, TypeScript, and Playwright. Extract, analyze, and understand any website with intelligent insights, SEO recommendations, and content quality scoring.
- Playwright-powered scraping: Handles JavaScript-rendered content and dynamic websites
- Comprehensive extraction: Titles, headings (H1-H6), paragraphs, links, images, and metadata
- Intelligent content prioritization: Automatically ranks content by importance
- Robust error handling: Handles timeouts, invalid URLs, and edge cases
- Content summarization: AI-generated comprehensive summaries
- Entity extraction: Identifies people, organizations, locations, and technologies
- Keyword analysis: Extracts keywords with relevance scores (0-100)
- Topic extraction: Automatically identifies key topics and themes
- Sentiment analysis: Determines sentiment with confidence scores
- Content categorization: Classifies content into relevant categories
- Readability scoring: Flesch-Kincaid based readability analysis
- Content quality scoring: Overall content quality assessment (0-100)
- Competitive insights: Strategic recommendations for improvement
- SEO insights: Detailed SEO analysis with actionable recommendations
- Link analysis: Internal vs external links, broken link detection
- Image analysis: Alt text coverage and optimization metrics
- Content metrics: Word count, reading time, heading structure analysis
- SEO scoring: Overall SEO score from 0-100
- Quality scoring: Content quality assessment with specific insights
- Minimal, clean design: Shows only essential data by default
- Expandable sections: Detailed data available on demand
- Dark mode design: Beautiful, modern dark theme with glassmorphism effects
- Responsive layout: Adapts to mobile, tablet, and desktop screen sizes
- Smooth animations: Micro-interactions and transitions for better UX
- Real-time feedback: Loading states, error messages, and progress indicators
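The content-metrics feature above can be illustrated with a small standalone sketch. The real logic lives in `analytics.service.ts`; this version assumes a reading speed of roughly 200 words per minute (the project may use a different constant):

```typescript
// Simplified sketch of the word-count / reading-time metrics.
// Assumes ~200 words per minute; analytics.service.ts may differ.
interface ContentMetrics {
  totalWords: number;
  readingTime: number; // minutes, rounded up
}

function computeContentMetrics(text: string, wordsPerMinute = 200): ContentMetrics {
  // Split on any whitespace run and drop empty tokens.
  const words = text.trim().split(/\s+/).filter(Boolean);
  const totalWords = words.length;
  const readingTime = Math.max(1, Math.ceil(totalWords / wordsPerMinute));
  return { totalWords, readingTime };
}
```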
- Node.js 18+ installed
- npm or yarn package manager
1. Clone the repository

   ```bash
   git clone <repository-url>
   cd ai-web-analyzer
   ```

2. Install backend dependencies

   ```bash
   cd backend
   npm install
   ```

3. Set up environment variables

   ```bash
   cp .env.example .env
   ```

   Edit `.env` and add your Gemini API key:

   ```env
   PORT=3001
   GEMINI_API_KEY=your_gemini_api_key_here
   CORS_ORIGIN=http://localhost:5173
   NODE_ENV=development
   ```

4. Install frontend dependencies

   ```bash
   cd ../frontend
   npm install
   ```

5. Start the backend server (in the `backend` directory):

   ```bash
   npm run dev
   ```

   The backend will run on http://localhost:3001.

6. Start the frontend (in the `frontend` directory):

   ```bash
   npm run dev
   ```

   The frontend will run on http://localhost:5173.

7. Open your browser and navigate to http://localhost:5173.
Analyze a website with comprehensive AI insights.
Request Body:

```json
{
  "url": "https://example.com",
  "options": {
    "waitForSelector": "optional-css-selector",
    "timeout": 30000,
    "includeAIAnalysis": true
  }
}
```
Response:

```json
{
  "success": true,
  "data": {
    "id": "uuid",
    "data": {
      "url": "https://example.com",
      "title": "Page Title",
      "headings": { "h1": [], "h2": [], ... },
      "paragraphs": [{ "text": "...", "summary": "...", "importance": 85 }],
      "links": [{ "text": "...", "href": "...", "isInternal": true }],
      "images": [{ "src": "...", "alt": "..." }],
      "metadata": { ... }
    },
    "aiAnalysis": {
      "contentSummary": "...",
      "keyTopics": [],
      "sentiment": "positive",
      "sentimentConfidence": 85,
      "readabilityScore": 75,
      "seoInsights": { ... },
      "contentCategories": [],
      "entities": {
        "people": [],
        "organizations": [],
        "locations": [],
        "technologies": []
      },
      "keywords": [{ "keyword": "...", "relevance": 95 }],
      "contentQualityScore": 82,
      "contentQualityInsights": [],
      "competitiveInsights": []
    },
    "analytics": {
      "totalWords": 1500,
      "readingTime": 7,
      "linkAnalysis": { ... },
      "imageAnalysis": { ... },
      "headingAnalysis": { ... },
      "seoScore": 85
    }
  }
}
```
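For reference, a call to this endpoint can be sketched in TypeScript. The exact route path is defined in `backend/src/routes/scraper.routes.ts` and is not shown here, so the endpoint in the usage comment is a placeholder, not the real path:

```typescript
// Sketch of building the documented request body for the analyze endpoint.
// The route path is a placeholder; see backend/src/routes/scraper.routes.ts.
interface AnalyzeOptions {
  waitForSelector?: string;
  timeout?: number;
  includeAIAnalysis?: boolean;
}

function buildAnalyzeRequest(
  url: string,
  options: AnalyzeOptions = {}
): { method: string; headers: Record<string, string>; body: string } {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // Defaults mirror the documented request body.
    body: JSON.stringify({
      url,
      options: { timeout: 30000, includeAIAnalysis: true, ...options },
    }),
  };
}

// Usage (placeholder endpoint path):
// await fetch("http://localhost:3001/<endpoint>", buildAnalyzeRequest("https://example.com"));
```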
Retrieve a specific analysis result by ID.
Get analysis history (last 50 results).
Health check endpoint.
```
ai-web-analyzer/
├── backend/
│   ├── src/
│   │   ├── services/
│   │   │   ├── scraper.service.ts       # Playwright scraping logic
│   │   │   ├── ai.service.ts            # Gemini AI integration
│   │   │   └── analytics.service.ts     # Analytics generation
│   │   ├── routes/
│   │   │   └── scraper.routes.ts        # API routes
│   │   ├── middleware/
│   │   │   ├── validation.middleware.ts
│   │   │   └── error.middleware.ts
│   │   ├── types/
│   │   │   └── index.ts                 # TypeScript types
│   │   └── server.ts                    # Express server
│   ├── package.json
│   └── tsconfig.json
└── frontend/
    ├── src/
    │   ├── components/
    │   │   └── ResultsDisplay.tsx       # Results UI component
    │   ├── services/
    │   │   └── api.ts                   # API client
    │   ├── App.tsx                      # Main app component
    │   └── index.css                    # Design system
    ├── package.json
    └── vite.config.ts
```
- Express.js: Web framework
- TypeScript: Type-safe JavaScript
- Playwright: Headless browser automation
- Google Gemini AI: Advanced AI-powered content analysis
- Zod: Schema validation
- Helmet: Security middleware
- CORS: Cross-origin resource sharing
- Express Rate Limit: API rate limiting
- React 19: UI library
- TypeScript: Type-safe JavaScript
- Vite: Build tool and dev server
- Axios: HTTP client
- Lucide React: Icon library
- CSS Variables: Design system
- SEO Analysis: Comprehensive SEO optimization opportunities and recommendations
- Content Research: Extract and analyze content from competitor websites
- Content Quality Assessment: Evaluate content quality with AI-powered insights
- Entity Extraction: Identify key people, organizations, locations, and technologies
- Keyword Research: Extract relevant keywords with importance scores
- Website Audits: Perform comprehensive website audits with actionable insights
- Competitive Analysis: Get strategic recommendations for improvement
- Content Migration: Extract content for migration purposes
- Market Research: Analyze content trends and strategies across multiple sites
- Rate Limiting: Prevents API abuse (100 requests per 15 minutes)
- Helmet.js: Sets security-related HTTP headers
- CORS: Configured for specific origins
- Input Validation: Zod schema validation for all inputs
- Error Handling: Comprehensive error handling and sanitization
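The project enforces the 100-requests-per-15-minutes policy with the express-rate-limit middleware; the sketch below is only a conceptual stand-in showing the fixed-window logic behind that policy, not the actual middleware configuration:

```typescript
// Conceptual fixed-window rate limiter. The project itself uses the
// express-rate-limit middleware; this only illustrates the policy.
const WINDOW_MS = 15 * 60 * 1000; // 15 minutes
const MAX_REQUESTS = 100;

const hits = new Map<string, { count: number; windowStart: number }>();

function allowRequest(clientId: string, now: number = Date.now()): boolean {
  const entry = hits.get(clientId);
  if (!entry || now - entry.windowStart >= WINDOW_MS) {
    // Start a fresh window for this client.
    hits.set(clientId, { count: 1, windowStart: now });
    return true;
  }
  if (entry.count >= MAX_REQUESTS) {
    return false; // over the limit for this window
  }
  entry.count += 1;
  return true;
}
```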
The application handles various edge cases:
- Invalid URLs
- Timeout errors
- Network failures
- JavaScript-heavy websites
- Protected/blocked content
- Missing or malformed data
- API rate limits
- AI analysis failures
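Invalid-URL handling, for example, can be sketched with Node's built-in `URL` parser. This is a simplification; the project actually validates inputs with a Zod schema in `validation.middleware.ts`:

```typescript
// Simplified URL validation (the project uses a Zod schema in
// validation.middleware.ts; this sketch uses the built-in URL parser).
function isValidHttpUrl(input: string): boolean {
  try {
    const parsed = new URL(input);
    // Only allow http(s) targets; reject file:, javascript:, etc.
    return parsed.protocol === "http:" || parsed.protocol === "https:";
  } catch {
    return false; // not parseable as a URL at all
  }
}
```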
| Variable | Description | Default |
|---|---|---|
| `PORT` | Backend server port | `3001` |
| `GEMINI_API_KEY` | Google Gemini API key | Required |
| `CORS_ORIGIN` | Allowed CORS origin | `http://localhost:5173` |
| `NODE_ENV` | Environment mode | `development` |
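A typical way for the backend to consume these variables is sketched below. This is illustrative only; the project's actual config loading may be structured differently. Note that `GEMINI_API_KEY` has no default, so the sketch fails fast at startup when it is missing:

```typescript
// Sketch of reading the environment variables above with their defaults.
// Illustrative only; the project's actual config code may differ.
interface AppConfig {
  port: number;
  geminiApiKey: string;
  corsOrigin: string;
  nodeEnv: string;
}

function loadConfig(env: Record<string, string | undefined> = process.env): AppConfig {
  const geminiApiKey = env.GEMINI_API_KEY;
  if (!geminiApiKey) {
    // GEMINI_API_KEY is required: fail fast at startup.
    throw new Error("GEMINI_API_KEY is required");
  }
  return {
    port: Number(env.PORT ?? 3001),
    geminiApiKey,
    corsOrigin: env.CORS_ORIGIN ?? "http://localhost:5173",
    nodeEnv: env.NODE_ENV ?? "development",
  };
}
```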
Contributions are welcome! Please feel free to submit a Pull Request.
MIT License - feel free to use this project for learning and development.
- Google Gemini AI for powerful content analysis
- Playwright for robust web scraping
- React and Vite for excellent developer experience
Built with ❤️ using TypeScript, React, and Express.js