diff --git a/README.md b/README.md
index 701a09c..902458c 100644
--- a/README.md
+++ b/README.md
@@ -1,1173 +1,36 @@
-# Nordic Privacy AI πŸ›‘οΈ
-
-**AI-Powered GDPR Compliance & Privacy Protection Platform**
-
-A comprehensive solution for AI governance, bias detection, risk assessment, and automated PII cleaning with GDPR compliance. Built for Nordic ecosystems and beyond.
+# Mushroom Empire : ShroomShield
+
+## Problem Statement 4 : AI-Powered GDPR Compliance & Personal Data Protection for Nordic Software Ecosystems
+
+Nordic organizations handle vast amounts of personal data through digital platforms in healthcare, retail, housing, and citizen services. With the EU's GDPR and local data laws, ensuring privacy and compliance has become both critical and complex.
+
+Many companies lack visibility into where sensitive information (like names, health data, and national IDs) is stored across apps, logs, and cloud systems. Manual audits and static policies can't keep pace with the volume and velocity of data.
+
+This creates major risks — from data breaches and consent violations to multimillion-euro GDPR fines and loss of public trust.
+
+Identity systems are also fragmented: while BankID, MitID, and Suomi.fi exist, most software platforms don't intelligently manage user identity or data flow between them.
+
+There's a growing need for AI-driven solutions that can automatically detect, classify, and anonymize PII, identify security gaps, and continuously monitor compliance across digital ecosystems.
+
+Nordic companies need proactive, intelligent privacy protection that aligns with regional trust culture and EU-level data regulations — not reactive, manual compliance.
-
-[![Python](https://img.shields.io/badge/Python-3.8+-blue.svg)](https://www.python.org/)
-[![FastAPI](https://img.shields.io/badge/FastAPI-0.109+-green.svg)](https://fastapi.tiangolo.com/)
-[![Next.js](https://img.shields.io/badge/Next.js-14.2+-black.svg)](https://nextjs.org/)
-[![License](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
+
+## Members
+```
+Ashish Y G
+Dion Joshua Lobo
+Nithin
+Shovin Jeson Dsouza
+```
-
----
-
-## πŸš€ Quick Start
-
-### Prerequisites
-- Python 3.8+
-- Node.js 18+
-- GPU (optional, for faster processing)
-
-### Installation
-
-1. **Clone the repository**
-```powershell
-git clone https://github.com/PlatypusPus/MushroomEmpire.git
-cd MushroomEmpire
-```
-
-2. **Install Python dependencies**
-```powershell
-pip install -r requirements.txt
-python -m spacy download en_core_web_sm
-```
-
-3. **Install frontend dependencies**
-```powershell
-cd frontend
-npm install
-cd ..
-```
-
-### Running the Application
-
-1. **Start the FastAPI backend** (Terminal 1)
-```powershell
-python start_api.py
-```
-Backend runs at: **http://localhost:8000**
-
-2. **Start the Next.js frontend** (Terminal 2)
-```powershell
-cd frontend
-npm run dev
-```
-Frontend runs at: **http://localhost:3000**
-
-3. **Access the application**
-   - Frontend UI: http://localhost:3000
-   - Try It Page: http://localhost:3000/try
-   - API Documentation: http://localhost:8000/docs
-   - Health Check: http://localhost:8000/health
-
----
-
-## πŸ“‹ Features
-
-### 🎯 AI Governance & Bias Detection
-- **Fairness Metrics**: Disparate Impact, Statistical Parity, Equal Opportunity
-- **Demographic Analysis**: Group-wise performance evaluation
-- **Violation Detection**: Automatic flagging with severity levels (HIGH/MEDIUM/LOW)
-- **Model Performance**: Comprehensive ML metrics (accuracy, precision, recall, F1)
-
-### πŸ›‘οΈ Privacy Risk Assessment
-- **Privacy Risks**: PII detection, GDPR compliance scoring, data exposure analysis
-- **Ethical Risks**: Fairness, transparency, accountability evaluation
-- **Compliance Risks**: Regulatory adherence (GDPR, CCPA, AI Act)
-- **Data Quality**: Missing data, class imbalance, outlier detection
-
-### 🧹 Automated Data Cleaning
-- **PII Detection**: Email, phone, SSN, credit cards, IP addresses, and more
-- **GPU Acceleration**: CUDA-enabled for 10x faster processing
-- **GDPR Compliance**: Automatic anonymization with audit trails
-- **Smart Anonymization**: Context-aware masking and pseudonymization
-
-### 🌐 Modern Web Interface
-- **Drag & Drop Upload**: Intuitive CSV file handling
-- **Real-time Processing**: Live feedback and progress tracking
-- **Interactive Dashboards**: Visualize bias metrics, risk scores, and results
-- **Report Downloads**: JSON reports, cleaned CSV, and audit logs
-
----
-
-## πŸ—οΈ Project Structure
-
-```
-MushroomEmpire/
-β”œβ”€β”€ api/                        # FastAPI Backend
-β”‚   β”œβ”€β”€ main.py                 # Application entry point
-β”‚   β”œβ”€β”€ routers/
-β”‚   β”‚   β”œβ”€β”€ analyze.py          # POST /api/analyze - AI Governance
-β”‚   β”‚   └── clean.py            # POST /api/clean - Data Cleaning
-β”‚   └── utils/                  # Helper utilities
-β”‚
-β”œβ”€β”€ ai_governance/              # Core AI Governance Module
-β”‚   β”œβ”€β”€ __init__.py             # AIGovernanceAnalyzer class
-β”‚   β”œβ”€β”€ data_processor.py       # Data preprocessing
-β”‚   β”œβ”€β”€ model_trainer.py        # ML model training
-β”‚   β”œβ”€β”€ bias_analyzer.py        # Bias detection engine
-β”‚   β”œβ”€β”€ risk_analyzer.py        # Risk assessment engine
-β”‚   └── report_generator.py     # JSON report generation
-β”‚
-β”œβ”€β”€ data_cleaning/              # Data Cleaning Module
-β”‚   β”œβ”€β”€ __init__.py             # DataCleaner class
-β”‚   β”œβ”€β”€ cleaner.py              # PII detection & anonymization
-β”‚   └── config.py               # PII patterns & GDPR rules
-β”‚
-β”œβ”€β”€ frontend/                   # Next.js Frontend
-β”‚   β”œβ”€β”€ app/                    # App Router pages
-β”‚   β”‚   β”œβ”€β”€ page.tsx            # Landing page
-β”‚   β”‚   └── try/page.tsx        # Try it page (workflow UI)
-β”‚   β”œβ”€β”€ components/
-β”‚   β”‚   └── try/
-β”‚   β”‚       β”œβ”€β”€ CenterPanel.tsx # File upload & results
-β”‚   β”‚       β”œβ”€β”€ Sidebar.tsx     # Workflow tabs
-β”‚   β”‚       └── ChatbotPanel.tsx # AI assistant
-β”‚   └── lib/
-β”‚       β”œβ”€β”€ api.ts              # TypeScript API client
-β”‚       └── indexeddb.ts        # Browser caching utilities
-β”‚
-β”œβ”€β”€ Datasets/                   # Sample datasets
-β”‚   └── loan_data.csv           # Example: Loan approval dataset
-β”‚
-β”œβ”€β”€ reports/                    # Generated reports (auto-created)
-β”‚   β”œβ”€β”€ governance_report_*.json
-β”‚   β”œβ”€β”€ cleaned_*.csv
-β”‚   └── cleaning_audit_*.json
-β”‚
-β”œβ”€β”€ start_api.py                # Backend startup script
-β”œβ”€β”€ setup.py                    # Package configuration
-β”œβ”€β”€ requirements.txt            # Python dependencies
-└── README.md                   # This file
-```
-
----
-
-## πŸ“‘ API Reference
-
-### Base URL
-```
-http://localhost:8000
-```
-
-### Endpoints
-
-#### **POST /api/analyze**
-Analyze dataset for bias, fairness, and risk assessment.
-
-**Request:**
-```bash
-curl -X POST "http://localhost:8000/api/analyze" \
-  -F "file=@Datasets/loan_data.csv"
-```
-
-**Response:**
-```json
-{
-  "status": "success",
-  "filename": "loan_data.csv",
-  "dataset_info": {
-    "rows": 1000,
-    "columns": 15
-  },
-  "model_performance": {
-    "accuracy": 0.85,
-    "precision": 0.82,
-    "recall": 0.88,
-    "f1_score": 0.85
-  },
-  "bias_metrics": {
-    "overall_bias_score": 0.23,
-    "violations_detected": []
-  },
-  "risk_assessment": {
-    "overall_risk_score": 0.35,
-    "privacy_risks": [],
-    "ethical_risks": []
-  },
-  "recommendations": [
-    "[HIGH] Privacy: Remove PII columns before deployment",
-    "[MEDIUM] Fairness: Monitor demographic parity over time"
-  ],
-  "report_file": "/reports/governance_report_20251107_123456.json"
-}
-```
-
-#### **POST /api/clean**
-Detect and anonymize PII in datasets.
-
-**Request:**
-```bash
-curl -X POST "http://localhost:8000/api/clean" \
-  -F "file=@Datasets/loan_data.csv"
-```
-
-**Response:**
-```json
-{
-  "status": "success",
-  "dataset_info": {
-    "original_rows": 1000,
-    "original_columns": 15,
-    "cleaned_rows": 1000,
-    "cleaned_columns": 13
-  },
-  "summary": {
-    "columns_removed": ["ssn", "email"],
-    "columns_anonymized": ["phone", "address"],
-    "total_cells_affected": 2847
-  },
-  "pii_detections": {
-    "EMAIL": 1000,
-    "PHONE": 987,
-    "SSN": 1000
-  },
-  "gdpr_compliance": [
-    "Article 5(1)(c) - Data minimization",
-    "Article 17 - Right to erasure",
-    "Article 25 - Data protection by design"
-  ],
-  "files": {
-    "cleaned_csv": "/reports/cleaned_20251107_123456.csv",
-    "audit_report": "/reports/cleaning_audit_20251107_123456.json"
-  }
-}
-```
-
-#### **GET /health**
-Health check endpoint with GPU status.
-
-**Response:**
-```json
-{
-  "status": "healthy",
-  "version": "1.0.0",
-  "gpu_available": true
-}
-```
-
-#### **GET /reports/(unknown)**
-Download generated reports and cleaned files.
-
----
-
-# AI Governance Module
-
-A Python package for detecting bias and analyzing risks in machine learning models. Provides comprehensive fairness metrics, privacy risk assessment, and ethical AI evaluation.
-
-## Features
-
-### 🎯 Bias Detection
-- **Fairness Metrics**: Disparate Impact, Statistical Parity Difference, Equal Opportunity Difference
-- **Demographic Analysis**: Group-wise performance evaluation
-- **Violation Detection**: Automatic flagging with severity levels
-
-### πŸ›‘οΈ Risk Assessment
-- **Privacy Risks**: PII detection, GDPR compliance, data exposure analysis
-- **Ethical Risks**: Fairness, transparency, accountability, social impact
-- **Compliance Risks**: Regulatory adherence (GDPR, CCPA, AI Act)
-- **Data Quality**: Missing data, class imbalance, outlier detection
-
-### πŸ€– Machine Learning
-- Generalized classification model (works with any dataset)
-- Auto-detection of feature types and protected attributes
-- Comprehensive performance metrics
-- Feature importance analysis
-
-## Installation
-
-```bash
-pip install -r requirements.txt
-```
-
-Or install as a package:
-
-```bash
-pip install -e .
-```
-
-## Quick Start
-
-```python
-from ai_governance import AIGovernanceAnalyzer
-
-# Initialize analyzer
-analyzer = AIGovernanceAnalyzer()
-
-# Run complete analysis
-report = analyzer.analyze(
-    data_path='your_data.csv',
-    target_column='target',
-    protected_attributes=['gender', 'age', 'race']
-)
-
-# Access results
-print(f"Bias Score: {report['summary']['overall_bias_score']:.3f}")
-print(f"Risk Level: {report['summary']['risk_level']}")
-print(f"Model Accuracy: {report['summary']['model_accuracy']:.3f}")
-
-# Save report
-analyzer.save_report(report, 'governance_report.json')
-```
-
-## Module Structure
-
-```
-ai_governance/
-β”œβ”€β”€ __init__.py            # Main API
-β”œβ”€β”€ data_processor.py      # Data preprocessing
-β”œβ”€β”€ model_trainer.py       # ML model training
-β”œβ”€β”€ bias_analyzer.py       # Bias detection
-β”œβ”€β”€ risk_analyzer.py       # Risk assessment
-└── report_generator.py    # Report generation
-```
-
-## API Reference
-
-### AIGovernanceAnalyzer
-
-Main class for running AI governance analysis.
-
-```python
-analyzer = AIGovernanceAnalyzer()
-
-# Analyze from DataFrame
-report = analyzer.analyze_dataframe(
-    df=dataframe,
-    target_column='target',
-    protected_attributes=['gender', 'age']
-)
-
-# Analyze from file
-report = analyzer.analyze(
-    data_path='data.csv',
-    target_column='target',
-    protected_attributes=['gender', 'age']
-)
-```
-
-### Individual Components
-
-```python
-from ai_governance import (
-    DataProcessor,
-    GeneralizedModelTrainer,
-    BiasAnalyzer,
-    RiskAnalyzer,
-    ReportGenerator
-)
-
-# Process data
-processor = DataProcessor(df)
-processor.target_column = 'target'
-processor.protected_attributes = ['gender', 'age']
-processor.prepare_data()
-
-# Train model
-trainer = GeneralizedModelTrainer(
-    processor.X_train,
-    processor.X_test,
-    processor.y_train,
-    processor.y_test,
-    processor.feature_names
-)
-trainer.train()
-trainer.evaluate()
-
-# Analyze bias
-bias_analyzer = BiasAnalyzer(
-    processor.X_test,
-    processor.y_test,
-    trainer.y_pred,
-    processor.df,
-    processor.protected_attributes,
-    processor.target_column
-)
-bias_results = bias_analyzer.analyze()
-
-# Assess risks
-risk_analyzer = RiskAnalyzer(
-    processor.df,
-    trainer.results,
-    bias_results,
-    processor.protected_attributes,
-    processor.target_column
-)
-risk_results = risk_analyzer.analyze()
-
-# Generate report
-report_gen = ReportGenerator(
-    trainer.results,
-    bias_results,
-    risk_results,
-    processor.df
-)
-report = report_gen.generate_report()
-```
-
-## Report Structure
-
-The module generates comprehensive JSON reports:
-
-```json
-{
-  "metadata": {
-    "report_id": "unique_id",
-    "generated_at": "timestamp",
-    "dataset_info": {}
-  },
-  "summary": {
-    "overall_bias_score": 0.0-1.0,
-    "overall_risk_score": 0.0-1.0,
-    "risk_level": "LOW|MEDIUM|HIGH",
-    "model_accuracy": 0.0-1.0,
-    "fairness_violations_count": 0
-  },
-  "model_performance": {},
-  "bias_analysis": {},
-  "risk_assessment": {},
-  "key_findings": [],
-  "recommendations": []
-}
-```
-
-## Metrics Interpretation
-
-### Bias Score (0-1, lower is better)
-- **0.0 - 0.3**: Low bias βœ…
-- **0.3 - 0.5**: Moderate bias ⚠️
-- **0.5 - 1.0**: High bias ❌
-
-### Risk Score (0-1, lower is better)
-- **0.0 - 0.4**: LOW risk βœ…
-- **0.4 - 0.7**: MEDIUM risk ⚠️
-- **0.7 - 1.0**: HIGH risk ❌
-
-### Fairness Metrics
-- **Disparate Impact**: Fair range 0.8 - 1.25
-- **Statistical Parity**: Fair threshold < 0.1
-- **Equal Opportunity**: Fair threshold < 0.1
-
-## Requirements
-
-- Python 3.8+
-- pandas >= 2.0.0
-- numpy >= 1.24.0
-- scikit-learn >= 1.3.0
-
-See `requirements.txt` for complete list.
-
-## Integration Examples
-
-### FastAPI Backend
-
-```python
-from fastapi import FastAPI, UploadFile
-from ai_governance import AIGovernanceAnalyzer
-
-app = FastAPI()
-analyzer = AIGovernanceAnalyzer()
-
-@app.post("/analyze")
-async def analyze(file: UploadFile, target: str, protected: list):
-    df = pd.read_csv(file.file)
-    report = analyzer.analyze_dataframe(df, target, protected)
-    return report
-```
-
-### Flask Backend
-
-```python
-from flask import Flask, request, jsonify
-from ai_governance import AIGovernanceAnalyzer
-
-app = Flask(__name__)
-analyzer = AIGovernanceAnalyzer()
-
-@app.route('/analyze', methods=['POST'])
-def analyze():
-    file = request.files['file']
-    df = pd.read_csv(file)
-    report = analyzer.analyze_dataframe(
-        df,
-        request.form['target'],
-        request.form.getlist('protected')
-    )
-    return jsonify(report)
-```
-
-## License
-
-MIT License
-
-## Contributing
-
-Contributions welcome! Please open an issue or submit a pull request.
-
-## Citation
-
-If you use this module in your research or project, please cite:
-
-```
-AI Governance Module - Bias Detection and Risk Analysis
-https://github.com/PlatypusPus/MushroomEmpire
-```
-
----
-
-## πŸ”§ Configuration
-
-### Environment Variables
-
-Create `.env` file in `frontend/nordic-privacy-ai/`:
-```env
-NEXT_PUBLIC_API_URL=http://localhost:8000
-```
-
-### CORS Configuration
-
-Edit `api/main.py` to add production domains:
-```python
-origins = [
-    "http://localhost:3000",
-    "https://your-production-domain.com"
-]
-```
-
-### GPU Acceleration
-
-GPU is automatically detected and used if available. To force CPU mode:
-```python
-# In cleaning.py or api endpoints
-DataCleaner(use_gpu=False)
-```
-
----
-
-## πŸ§ͺ Testing
-
-### Test the Backend
-```powershell
-# Test analyze endpoint
-curl -X POST "http://localhost:8000/api/analyze" -F "file=@Datasets/loan_data.csv"
-
-# Test clean endpoint
-curl -X POST "http://localhost:8000/api/clean" -F "file=@Datasets/loan_data.csv"
-
-# Check health
-curl http://localhost:8000/health
-```
-
-### Run Unit Tests
-```powershell
-# Test cleaning module
-python test_cleaning.py
-
-# Run all tests (if pytest configured)
-pytest
-```
-
----
-
-## πŸ“Š Usage Examples
-
-### Python SDK Usage
-
-```python
-from ai_governance import AIGovernanceAnalyzer
-
-# Initialize analyzer
-analyzer = AIGovernanceAnalyzer()
-
-# Analyze dataset
-report = analyzer.analyze(
-    data_path='Datasets/loan_data.csv',
-    target_column='loan_approved',
-    protected_attributes=['gender', 'age', 'race']
-)
-
-# Print results
-print(f"Bias Score: {report['summary']['overall_bias_score']:.3f}")
-print(f"Risk Level: {report['summary']['risk_level']}")
-print(f"Model Accuracy: {report['summary']['model_accuracy']:.3f}")
-
-# Save report
-analyzer.save_report(report, 'my_report.json')
-```
-
-### Data Cleaning Usage
-
-```python
-from cleaning import DataCleaner
-
-# Initialize cleaner with GPU
-cleaner = DataCleaner(use_gpu=True)
-
-# Load and clean data
-df = cleaner.load_data('Datasets/loan_data.csv')
-cleaned_df, audit = cleaner.anonymize_pii(df)
-
-# Save results
-cleaner.save_cleaned_data(cleaned_df, 'cleaned_output.csv')
-cleaner.save_audit_report(audit, 'audit_report.json')
-```
-
-### Frontend Integration
-
-```typescript
-import { analyzeDataset, cleanDataset } from '@/lib/api';
-
-// Analyze uploaded file
-const handleAnalyze = async (file: File) => {
-  const result = await analyzeDataset(file);
-  console.log('Bias Score:', result.bias_metrics.overall_bias_score);
-  console.log('Download:', result.report_file);
-};
-
-// Clean uploaded file
-const handleClean = async (file: File) => {
-  const result = await cleanDataset(file);
-  console.log('Cells anonymized:', result.summary.total_cells_affected);
-  console.log('Download cleaned:', result.files.cleaned_csv);
-};
-```
-
----
-
-## πŸ“ˆ Metrics Interpretation
-
-### Bias Score (0-1, lower is better)
-- **0.0 - 0.3**: βœ… Low bias - Good fairness
-- **0.3 - 0.5**: ⚠️ Moderate bias - Monitoring recommended
-- **0.5 - 1.0**: ❌ High bias - Immediate action required
-
-### Risk Score (0-1, lower is better)
-- **0.0 - 0.4**: βœ… LOW risk
-- **0.4 - 0.7**: ⚠️ MEDIUM risk
-- **0.7 - 1.0**: ❌ HIGH risk
-
-### Fairness Metrics
-- **Disparate Impact**: Fair range 0.8 - 1.25
-- **Statistical Parity**: Fair threshold < 0.1
-- **Equal Opportunity**: Fair threshold < 0.1
-
----
-
-## πŸ› οΈ Technology Stack
-
-### Backend
-- **FastAPI** - Modern Python web framework
-- **scikit-learn** - Machine learning
-- **spaCy** - NLP for PII detection
-- **PyTorch** - GPU acceleration (optional)
-- **pandas** - Data processing
-
-### Frontend
-- **Next.js 14** - React framework with App Router
-- **TypeScript** - Type safety
-- **Tailwind CSS** - Styling
-- **IndexedDB** - Browser storage
-
----
-
-## 🀝 Contributing
-
-Contributions are welcome! Please follow these steps:
-
-1. Fork the repository
-2. Create a feature branch (`git checkout -b feature/AmazingFeature`)
-3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
-4. Push to the branch (`git push origin feature/AmazingFeature`)
-5. Open a Pull Request
-
----
-
-## πŸ“ License
-
-This project is licensed under the MIT License - see the LICENSE file for details.
-
----
-
-## πŸŽ“ Citation
-
-If you use this project in your research or work, please cite:
-
-```bibtex
-@software{nordic_privacy_ai,
-  title  = {Nordic Privacy AI - GDPR Compliance & AI Governance Platform},
-  author = {PlatypusPus},
-  year   = {2025},
-  url    = {https://github.com/PlatypusPus/MushroomEmpire}
-}
-```
-
----
-
-## πŸ“§ Support
-
-- **Issues**: [GitHub Issues](https://github.com/PlatypusPus/MushroomEmpire/issues)
-- **Discussions**: [GitHub Discussions](https://github.com/PlatypusPus/MushroomEmpire/discussions)
-
----
-
-## πŸ™ Acknowledgments
-
-- Built for Nordic ecosystems (BankID, MitID, Suomi.fi)
-- Inspired by GDPR, CCPA, and EU AI Act requirements
-- Developed as a hackathon prototype
-
----
-
-**Made with ❀️ by the Nordic Privacy AI Team**
+
+**Made with ❀️ by Mushroom Empire**