mirror of
https://github.com/PlatypusPus/MushroomEmpire.git
synced 2026-02-07 22:18:59 +00:00
feat:Enhanced the Bias Analyzer
This commit is contained in:
@@ -1,365 +0,0 @@
# Enhanced Bias & Fairness Analysis Guide

## Overview

The Nordic Privacy AI platform now includes a comprehensive, adaptive bias and fairness analysis system that works accurately across **all types of datasets**, including:

- Small datasets (< 100 samples)
- Imbalanced groups
- Multiple protected attributes
- Binary and multi-class targets
- High-cardinality features
- Missing data

## Key Enhancements

### 1. **Adaptive Fairness Thresholds**

The system automatically adjusts fairness thresholds based on dataset characteristics:

- **Sample Size Factor**: Relaxes thresholds for small sample sizes
- **Group Imbalance Factor**: Adjusts for unequal group sizes
- **Dynamic Thresholds**:
  - Disparate Impact: 0.7-0.8 (adapts to data)
  - Statistical Parity: 0.1-0.15 (adapts to data)
  - Equal Opportunity: 0.1-0.15 (adapts to data)

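The two adjustment factors above can be sketched as a small function. This is a hypothetical illustration: the 0.7-0.8 and 0.1-0.15 bands come from this guide, but the interpolation formula and the function name are assumptions, not the analyzer's actual code.

```python
def adaptive_thresholds(total_samples: int, min_group: int, max_group: int) -> dict:
    """Pick fairness thresholds inside the documented adaptive bands."""
    # Relax thresholds when samples are scarce (full strictness at >= 100 samples).
    sample_factor = min(1.0, total_samples / 100)
    # Relax further when group sizes are heavily imbalanced.
    imbalance_factor = min_group / max_group if max_group else 0.0
    relax = (sample_factor + imbalance_factor) / 2
    return {
        "disparate_impact": 0.7 + 0.1 * relax,      # 0.7 (lenient) .. 0.8 (strict)
        "statistical_parity": 0.15 - 0.05 * relax,  # 0.15 .. 0.10
        "equal_opportunity": 0.15 - 0.05 * relax,   # 0.15 .. 0.10
    }

print(adaptive_thresholds(500, 50, 450))
```

With a large but imbalanced dataset (500 samples, 50 vs 450 per group) the disparate-impact threshold lands between the fully strict 0.8 and the fully relaxed 0.7.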
### 2. **Comprehensive Fairness Metrics**

#### Individual Metrics (6 types analyzed)

1. **Disparate Impact Ratio** (4/5ths rule)
   - Measures: min_rate / max_rate across all groups
   - Fair range: 0.8 - 1.25 (or adaptive)
   - Higher weight in overall score

2. **Statistical Parity Difference**
   - Measures: absolute difference in positive rates
   - Fair threshold: < 0.1 (or adaptive)
   - Ensures equal selection rates

3. **Equal Opportunity** (TPR equality)
   - Measures: difference in True Positive Rates
   - Fair threshold: < 0.1 (or adaptive)
   - Ensures equal recall across groups

4. **Equalized Odds** (TPR + FPR equality)
   - Measures: both TPR and FPR differences
   - Fair threshold: < 0.1 (or adaptive)
   - Most comprehensive fairness criterion

5. **Predictive Parity** (precision equality)
   - Measures: difference in precision across groups
   - Fair threshold: < 0.1
   - Ensures positive predictions are equally accurate

6. **Calibration** (FNR equality)
   - Measures: difference in False Negative Rates
   - Fair threshold: < 0.1
   - Ensures balanced error rates

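For concreteness, the first two metrics can be computed directly from per-group positive rates. This is a minimal sketch (the function names are illustrative); the rates reuse the Gender figures from the report example later in this guide.

```python
def disparate_impact(rates: dict) -> float:
    """Ratio of the lowest to the highest group positive rate (4/5ths rule)."""
    return min(rates.values()) / max(rates.values())

def statistical_parity_difference(rates: dict) -> float:
    """Absolute gap between the highest and lowest group positive rates."""
    return max(rates.values()) - min(rates.values())

rates = {"Male": 0.906, "Female": 0.25}
print(f"DI:  {disparate_impact(rates):.3f}")               # 0.25 / 0.906 -> 0.276, unfair (< 0.8)
print(f"SPD: {statistical_parity_difference(rates):.3f}")  # 0.656, unfair (> 0.1)
```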
#### Group-Level Metrics (per demographic group)

- Positive Rate
- Selection Rate
- True Positive Rate (TPR / Recall / Sensitivity)
- False Positive Rate (FPR)
- True Negative Rate (TNR / Specificity)
- False Negative Rate (FNR)
- Precision (PPV)
- F1 Score
- Accuracy
- Sample Size & Distribution

### 3. **Weighted Bias Scoring**

The overall bias score (0-1, higher = more bias) is calculated using:

```text
Overall Score = weighted average of:
  - Disparate Impact    (weight: 1.5 × sample_weight)
  - Statistical Parity  (weight: 1.0 × sample_weight)
  - Equal Opportunity   (weight: 1.0 × sample_weight)
  - Equalized Odds      (weight: 0.8 × sample_weight)
  - Predictive Parity   (weight: 0.7 × sample_weight)
  - Calibration         (weight: 0.7 × sample_weight)
```

Sample weight = min(1.0, total_samples / 100)

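The formula above can be sketched as follows. The weights and the sample_weight definition come from this guide; the per-metric bias inputs and the function name are illustrative. Note that because every weight shares the same sample_weight, it cancels out of a normalized average; the sketch keeps it only to mirror the documented formula.

```python
# Per-metric weights from the guide (before multiplying by sample_weight).
WEIGHTS = {
    "disparate_impact": 1.5,
    "statistical_parity": 1.0,
    "equal_opportunity": 1.0,
    "equalized_odds": 0.8,
    "predictive_parity": 0.7,
    "calibration": 0.7,
}

def overall_bias_score(metric_bias: dict, total_samples: int) -> float:
    """Weighted average of per-metric bias values in [0, 1] (0 = fair)."""
    sample_weight = min(1.0, total_samples / 100)
    weights = {m: w * sample_weight for m, w in WEIGHTS.items() if m in metric_bias}
    total = sum(weights.values())
    return sum(metric_bias[m] * w for m, w in weights.items()) / total

# Illustrative inputs: heavy disparate-impact bias, moderate parity gap.
print(overall_bias_score({"disparate_impact": 0.9, "statistical_parity": 0.5}, 500))
```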
### 4. **Intelligent Violation Detection**

Violations are categorized by severity:

- **CRITICAL**: di_value < 0.5, or deviation > 50%
- **HIGH**: di_value < 0.6, or deviation > 30%
- **MEDIUM**: di_value < 0.7, or deviation > 15%
- **LOW**: minor deviations

Each violation includes:

- Affected groups
- Specific measurements
- Actionable recommendations
- Context-aware severity assessment

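The severity bands above can be expressed as a small classifier. This is an assumed sketch of the listed rules for the disparate-impact case, not the analyzer's implementation; the inputs in the example are illustrative.

```python
def di_severity(di_value: float, deviation_pct: float) -> str:
    """Map a disparate-impact value and a percentage deviation to a severity band."""
    if di_value < 0.5 or deviation_pct > 50:
        return "CRITICAL"
    if di_value < 0.6 or deviation_pct > 30:
        return "HIGH"
    if di_value < 0.7 or deviation_pct > 15:
        return "MEDIUM"
    return "LOW"

print(di_severity(0.276, 65.0))  # -> CRITICAL
```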
### 5. **Robust Data Handling**

#### Missing Values
- Numerical: filled with the median
- Categorical: filled with the mode or 'Unknown'
- Comprehensive logging

#### Data Type Detection
- Binary detection (0/1, Yes/No)
- Small discrete values (< 10 unique)
- High-cardinality warnings (> 50 categories)
- Mixed-type handling

#### Target Encoding
- Automatic categorical → numeric conversion
- Binary value normalization
- Clear encoding maps printed

#### Class Imbalance
- Stratified splitting when appropriate
- Minimum class size validation
- Balanced metrics calculation

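A minimal pandas sketch of the missing-value strategy above (median for numeric columns; mode, falling back to 'Unknown', for categorical ones). The function name is hypothetical, and this is not the analyzer's actual implementation.

```python
import pandas as pd

def fill_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Fill NaNs: numeric columns with the median, others with the mode or 'Unknown'."""
    df = df.copy()
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            df[col] = df[col].fillna(df[col].median())
        else:
            mode = df[col].mode()
            df[col] = df[col].fillna(mode.iloc[0] if not mode.empty else "Unknown")
    return df

df = pd.DataFrame({"age": [30, None, 50], "city": ["Oslo", None, "Oslo"]})
print(fill_missing(df))
```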
### 6. **Enhanced Reporting**

Each analysis includes:

```json
{
  "overall_bias_score": 0.954,
  "fairness_metrics": {
    "Gender": {
      "disparate_impact": {
        "value": 0.276,
        "threshold": 0.8,
        "fair": false,
        "min_group": "Female",
        "max_group": "Male",
        "min_rate": 0.25,
        "max_rate": 0.906
      },
      "statistical_parity_difference": {...},
      "equal_opportunity_difference": {...},
      "equalized_odds": {...},
      "predictive_parity": {...},
      "calibration": {...},
      "attribute_fairness_score": 0.89,
      "group_metrics": {
        "Male": {
          "positive_rate": 0.906,
          "tpr": 0.95,
          "fpr": 0.03,
          "precision": 0.92,
          "f1_score": 0.93,
          "sample_size": 450
        },
        "Female": {...}
      },
      "sample_statistics": {
        "total_samples": 500,
        "min_group_size": 50,
        "max_group_size": 450,
        "imbalance_ratio": 0.11,
        "num_groups": 2
      }
    }
  },
  "fairness_violations": [
    {
      "attribute": "Gender",
      "metric": "Disparate Impact",
      "severity": "CRITICAL",
      "value": 0.276,
      "affected_groups": ["Female", "Male"],
      "message": "...",
      "recommendation": "CRITICAL: Group 'Female' has less than half the approval rate..."
    }
  ]
}
```

## Usage Examples

### Basic Analysis

```python
from ai_governance import AIGovernanceAnalyzer

# Initialize
analyzer = AIGovernanceAnalyzer()

# Analyze with protected attributes
report = analyzer.analyze(
    df=your_dataframe,
    target_column='ApprovalStatus',
    protected_attributes=['Gender', 'Age', 'Race']
)

# Check bias score
print(f"Bias Score: {report['bias_analysis']['overall_bias_score']:.1%}")

# Review violations
for violation in report['bias_analysis']['fairness_violations']:
    print(f"{violation['severity']}: {violation['message']}")
```

### With Presidio (Enhanced PII Detection)

```python
# Enable Presidio for automatic demographic detection
analyzer = AIGovernanceAnalyzer(use_presidio=True)
```

### API Usage

```bash
curl -X POST http://localhost:8000/api/analyze \
  -F "file=@dataset.csv" \
  -F "target_column=Outcome" \
  -F "protected_attributes=Gender,Age"
```

## Interpreting Results

### Overall Bias Score

- **< 0.3**: Low bias - excellent fairness ✅
- **0.3 - 0.5**: Moderate bias - monitoring recommended ⚠️
- **> 0.5**: High bias - action required ❌

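These bands map to labels straightforwardly. A small helper, assuming (as this guide does not specify) that scores of exactly 0.3 and 0.5 fall into the middle band:

```python
def interpret_bias_score(score: float) -> str:
    """Translate an overall bias score in [0, 1] into the guide's interpretation bands."""
    if score < 0.3:
        return "Low bias - excellent fairness"
    if score <= 0.5:
        return "Moderate bias - monitoring recommended"
    return "High bias - action required"

print(interpret_bias_score(0.954))  # the report example above -> high bias
```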
### Disparate Impact

- **0.8 - 1.25**: Fair (4/5ths rule satisfied)
- **< 0.8**: A disadvantaged group exists
- **> 1.25**: An advantaged group exists

### Statistical Parity

- **< 0.1**: Fair (similar positive rates)
- **> 0.1**: Groups receive different treatment

### Recommendations by Severity

#### CRITICAL
- **DO NOT DEPLOY** without remediation
- Investigate systemic bias sources
- Review training data representation
- Implement fairness constraints
- Consider re-collection if necessary

#### HIGH
- Address before deployment
- Use fairness-aware training methods
- Implement threshold optimization
- Regular monitoring required

#### MEDIUM
- Monitor closely
- Consider mitigation strategies
- Regular fairness audits
- Document findings

#### LOW
- Continue monitoring
- Maintain fairness standards
- Periodic reviews

## Best Practices

### 1. Data Collection
- Ensure representative sampling
- Balance protected groups when possible
- Document data sources
- Check for historical bias

### 2. Feature Engineering
- Avoid proxy features for protected attributes
- Check feature correlations with demographics
- Use feature importance analysis
- Consider fairness-aware feature selection

### 3. Model Training
- Use fairness-aware algorithms
- Implement fairness constraints
- Try multiple fairness definitions
- Cross-validate with fairness metrics

### 4. Post-Processing
- Threshold optimization per group
- Calibration techniques
- Reject-option classification
- Regular bias audits

### 5. Monitoring
- Track fairness metrics over time
- Monitor for fairness drift
- Regular re-evaluation
- Document all findings

## Technical Details

### Dependencies

```
numpy>=1.21.0
pandas>=1.3.0
scikit-learn>=1.0.0
presidio-analyzer>=2.2.0  # Optional
spacy>=3.0.0              # Optional, for Presidio
```

### Performance

- Handles datasets from 50 to 1M+ rows
- Adaptive algorithms scale with data size
- Memory-efficient group comparisons
- Parallel metric calculations

### Limitations

- Requires at least 2 groups per protected attribute
- Minimum of 10 samples per group recommended
- Binary classification focus (multi-class supported)
- Assumes an independent test set

## Troubleshooting

### "Insufficient valid groups"
- Check that the protected attribute has at least 2 non-null groups
- Ensure all groups appear in the test set
- Increase the test_size parameter

### "High cardinality warning"
- The feature has > 50 unique values
- Consider grouping categories
- May need feature engineering

### "Sample size too small"
- The system adapts automatically
- Results may be less reliable
- Consider collecting more data

### "Presidio initialization failed"
- Install: `pip install presidio-analyzer spacy`
- Download the model: `python -m spacy download en_core_web_sm`
- Or use `use_presidio=False`

## References

- [Fairness Definitions Explained](https://fairware.cs.umass.edu/papers/Verma.pdf)
- [4/5ths Rule (EEOC)](https://www.eeoc.gov/laws/guidance/questions-and-answers-clarify-and-provide-common-interpretation-uniform-guidelines)
- [Equalized Odds](https://arxiv.org/abs/1610.02413)
- [Fairness Through Awareness](https://arxiv.org/abs/1104.3913)

## Support

For issues or questions:
- Check the logs for detailed diagnostic messages
- Review the sample statistics in the output
- Consult the violation recommendations
- Contact: support@nordicprivacyai.com

@@ -98,13 +98,14 @@ class AIGovernanceAnalyzer:
         )
         bias_results = self.bias_analyzer.analyze()

-        # Step 4: Assess risks
+        # Step 4: Assess risks with Presidio-enhanced detection
         self.risk_analyzer = RiskAnalyzer(
             self.processor.df,
             self.trainer.results,
             bias_results,
             self.processor.protected_attributes,
-            self.processor.target_column
+            self.processor.target_column,
+            use_presidio=False  # Set to True after installing: python -m spacy download en_core_web_sm
         )
         risk_results = self.risk_analyzer.analyze()

@@ -49,26 +49,43 @@ class BiasAnalyzer:
        try:
            print("⏳ Initializing Presidio analyzer (first time only)...")

-            # Check if spaCy model is available
+            # Check if spaCy and model are available
            try:
                import spacy
-                try:
-                    spacy.load("en_core_web_sm")
-                except OSError:
-                    print("⚠️ spaCy model 'en_core_web_sm' not found. Run: python -m spacy download en_core_web_sm")
+
+                # Check if model exists WITHOUT loading it first
+                model_name = "en_core_web_sm"
+                if not spacy.util.is_package(model_name):
+                    print(f"⚠️ spaCy model '{model_name}' not found.")
+                    print(f"   To enable Presidio, install the model with:")
+                    print(f"   python -m spacy download {model_name}")
+                    print("   Continuing without Presidio-enhanced detection...")
                    BiasAnalyzer._presidio_init_failed = True
                    return
+
+                # Model exists, now load it
+                print(f"✓ spaCy model '{model_name}' found, loading...")
+                nlp = spacy.load(model_name)
+
            except ImportError:
                print("⚠️ spaCy not installed. Install with: pip install spacy")
                BiasAnalyzer._presidio_init_failed = True
                return
+            except Exception as e:
+                print(f"⚠️ Error loading spaCy model: {e}")
+                print("   Continuing without Presidio-enhanced detection...")
+                BiasAnalyzer._presidio_init_failed = True
+                return

-            # Create NLP engine
-            provider = NlpEngineProvider()
-            nlp_configuration = {
+            # Create NLP engine configuration (prevent auto-download)
+            from presidio_analyzer.nlp_engine import NlpEngineProvider
+            configuration = {
                "nlp_engine_name": "spacy",
-                "models": [{"lang_code": "en", "model_name": "en_core_web_sm"}]
+                "models": [{"lang_code": "en", "model_name": model_name}],
            }
+
+            provider = NlpEngineProvider(nlp_configuration=configuration)
            nlp_engine = provider.create_engine()

            # Initialize analyzer
File diff suppressed because it is too large

@@ -715,28 +715,305 @@ export function CenterPanel({ tab, onAnalyze }: CenterPanelProps) {
         );
       case "risk-analysis":
         return (
-          <div className="space-y-4">
-            <h2 className="text-xl font-semibold">Risk Analysis</h2>
+          <div className="space-y-6">
+            <h2 className="text-2xl font-bold text-slate-800">🔒 Risk Analysis</h2>
             {analyzeResult ? (
-              <div className="space-y-4">
-                <div className="p-4 bg-white rounded-lg border">
-                  <div className="text-sm text-slate-600">Overall Risk Score</div>
-                  <div className="text-2xl font-bold">{(analyzeResult.risk_assessment.overall_risk_score * 100).toFixed(1)}%</div>
-                </div>
-                {cleanResult && (
-                  <div className="p-4 bg-white rounded-lg border">
-                    <h3 className="font-semibold mb-2">PII Detection Results</h3>
-                    <div className="text-sm space-y-1">
-                      <div>Cells Anonymized: <span className="font-medium">{cleanResult.summary.total_cells_affected}</span></div>
-                      <div>Columns Removed: <span className="font-medium">{cleanResult.summary.columns_removed.length}</span></div>
-                      <div>Columns Anonymized: <span className="font-medium">{cleanResult.summary.columns_anonymized.length}</span></div>
-                    </div>
-                  </div>
-                )}
-              </div>
+              <div className="space-y-6">
+                {/* Overall Risk Score Card */}
+                <div className="relative overflow-hidden rounded-xl border-2 border-slate-200 bg-gradient-to-br from-slate-50 via-white to-slate-50 p-6 shadow-lg">
+                  <div className="absolute top-0 right-0 w-32 h-32 bg-gradient-to-br from-red-500/5 to-orange-500/5 rounded-full blur-3xl"></div>
+                  <div className="relative">
+                    <div className="flex items-center justify-between mb-4">
+                      <div>
+                        <div className="text-sm font-medium text-slate-600 mb-1">Overall Risk Score</div>
+                        <div className="text-5xl font-bold bg-gradient-to-r from-red-600 to-orange-600 bg-clip-text text-transparent">
+                          {(analyzeResult.risk_assessment.overall_risk_score * 100).toFixed(1)}%
+                        </div>
+                      </div>
+                      <div className={`px-4 py-2 rounded-full text-sm font-bold ${
+                        analyzeResult.risk_assessment.risk_level === 'CRITICAL' ? 'bg-red-100 text-red-700' :
+                        analyzeResult.risk_assessment.risk_level === 'HIGH' ? 'bg-orange-100 text-orange-700' :
+                        analyzeResult.risk_assessment.risk_level === 'MEDIUM' ? 'bg-yellow-100 text-yellow-700' :
+                        'bg-green-100 text-green-700'
+                      }`}>
+                        {analyzeResult.risk_assessment.risk_level} RISK
+                      </div>
+                    </div>
+
+                    {analyzeResult.risk_assessment.presidio_enabled && (
+                      <div className="inline-flex items-center gap-2 px-3 py-1.5 bg-blue-50 border border-blue-200 rounded-lg text-xs font-medium text-blue-700">
+                        <span className="w-2 h-2 bg-blue-500 rounded-full animate-pulse"></span>
+                        Presidio-Enhanced Detection
+                      </div>
+                    )}
+                  </div>
+                </div>
+
+                {/* Risk Categories Grid */}
+                <div className="grid grid-cols-2 md:grid-cols-3 gap-4">
+                  {Object.entries(analyzeResult.risk_assessment.risk_categories || {}).map(([category, score]: [string, any]) => {
+                    const riskPct = (score * 100);
+                    const riskLevel = riskPct >= 70 ? 'CRITICAL' : riskPct >= 50 ? 'HIGH' : riskPct >= 30 ? 'MEDIUM' : 'LOW';
+                    const categoryIcons: Record<string, string> = {
+                      privacy: '🔒',
+                      ethical: '⚖️',
+                      compliance: '📋',
+                      security: '🛡️',
+                      operational: '⚙️',
+                      data_quality: '📊'
+                    };
+
+                    return (
+                      <div key={category} className={`relative overflow-hidden rounded-lg border-2 p-4 transition-all hover:shadow-md ${
+                        riskLevel === 'CRITICAL' ? 'border-red-200 bg-gradient-to-br from-red-50 to-white' :
+                        riskLevel === 'HIGH' ? 'border-orange-200 bg-gradient-to-br from-orange-50 to-white' :
+                        riskLevel === 'MEDIUM' ? 'border-yellow-200 bg-gradient-to-br from-yellow-50 to-white' :
+                        'border-green-200 bg-gradient-to-br from-green-50 to-white'
+                      }`}>
+                        <div className="flex items-start justify-between mb-2">
+                          <div className="text-2xl">{categoryIcons[category] || '📌'}</div>
+                          <span className={`text-xs font-bold px-2 py-1 rounded ${
+                            riskLevel === 'CRITICAL' ? 'bg-red-100 text-red-700' :
+                            riskLevel === 'HIGH' ? 'bg-orange-100 text-orange-700' :
+                            riskLevel === 'MEDIUM' ? 'bg-yellow-100 text-yellow-700' :
+                            'bg-green-100 text-green-700'
+                          }`}>
+                            {riskLevel}
+                          </span>
+                        </div>
+                        <div className="text-sm font-semibold text-slate-700 capitalize mb-1">
+                          {category.replace('_', ' ')}
+                        </div>
+                        <div className="text-2xl font-bold text-slate-800">
+                          {riskPct.toFixed(0)}%
+                        </div>
+                        <div className="mt-2 h-1.5 bg-slate-200 rounded-full overflow-hidden">
+                          <div
+                            className={`h-full rounded-full transition-all ${
+                              riskLevel === 'CRITICAL' ? 'bg-gradient-to-r from-red-500 to-red-600' :
+                              riskLevel === 'HIGH' ? 'bg-gradient-to-r from-orange-500 to-orange-600' :
+                              riskLevel === 'MEDIUM' ? 'bg-gradient-to-r from-yellow-500 to-yellow-600' :
+                              'bg-gradient-to-r from-green-500 to-green-600'
+                            }`}
+                            style={{ width: `${Math.min(riskPct, 100)}%` }}
+                          ></div>
+                        </div>
+                      </div>
+                    );
+                  })}
+                </div>
+
+                {/* Privacy Risks - PII Detection */}
+                {analyzeResult.risk_assessment.privacy_risks && (
+                  <div className="bg-white rounded-xl border-2 border-slate-200 p-6 shadow-sm">
+                    <div className="flex items-center gap-2 mb-4">
+                      <span className="text-2xl">🔒</span>
+                      <h3 className="text-lg font-bold text-slate-800">Privacy Risks</h3>
+                      <span className="ml-auto px-3 py-1 bg-slate-100 text-slate-700 rounded-full text-xs font-semibold">
+                        {analyzeResult.risk_assessment.privacy_risks.pii_count} PII Types Detected
+                      </span>
+                    </div>
+
+                    {/* PII Detections */}
+                    {analyzeResult.risk_assessment.privacy_risks.pii_detected &&
+                    analyzeResult.risk_assessment.privacy_risks.pii_detected.length > 0 ? (
+                      <div className="space-y-3">
+                        <div className="grid grid-cols-1 md:grid-cols-2 gap-3">
+                          {analyzeResult.risk_assessment.privacy_risks.pii_detected.slice(0, 6).map((pii: any, idx: number) => (
+                            <div key={idx} className={`p-3 rounded-lg border-2 ${
+                              pii.severity === 'CRITICAL' ? 'bg-red-50 border-red-200' :
+                              pii.severity === 'HIGH' ? 'bg-orange-50 border-orange-200' :
+                              pii.severity === 'MEDIUM' ? 'bg-yellow-50 border-yellow-200' :
+                              'bg-blue-50 border-blue-200'
+                            }`}>
+                              <div className="flex items-center justify-between mb-1">
+                                <span className="text-xs font-bold text-slate-600">
+                                  {pii.column}
+                                </span>
+                                <span className={`text-xs font-bold px-2 py-0.5 rounded ${
+                                  pii.severity === 'CRITICAL' ? 'bg-red-100 text-red-700' :
+                                  pii.severity === 'HIGH' ? 'bg-orange-100 text-orange-700' :
+                                  pii.severity === 'MEDIUM' ? 'bg-yellow-100 text-yellow-700' :
+                                  'bg-blue-100 text-blue-700'
+                                }`}>
+                                  {pii.severity}
+                                </span>
+                              </div>
+                              <div className="text-sm font-semibold text-slate-800">
+                                {pii.type}
+                              </div>
+                              <div className="text-xs text-slate-600 mt-1">
+                                Detected via: {pii.detection_method}
+                                {pii.confidence && ` (${(pii.confidence * 100).toFixed(0)}% confidence)`}
+                              </div>
+                            </div>
+                          ))}
+                        </div>
+
+                        {/* Privacy Metrics */}
+                        <div className="grid grid-cols-2 md:grid-cols-4 gap-3 pt-3 border-t border-slate-200">
+                          <div className="text-center p-3 bg-slate-50 rounded-lg">
+                            <div className="text-xs text-slate-600 mb-1">Re-ID Risk</div>
+                            <div className="text-lg font-bold text-slate-800">
+                              {(analyzeResult.risk_assessment.privacy_risks.reidentification_risk * 100).toFixed(0)}%
+                            </div>
+                          </div>
+                          <div className="text-center p-3 bg-slate-50 rounded-lg">
+                            <div className="text-xs text-slate-600 mb-1">Data Minimization</div>
+                            <div className="text-lg font-bold text-slate-800">
+                              {(analyzeResult.risk_assessment.privacy_risks.data_minimization_score * 100).toFixed(0)}%
+                            </div>
+                          </div>
+                          <div className="text-center p-3 bg-slate-50 rounded-lg">
+                            <div className="text-xs text-slate-600 mb-1">Anonymization</div>
+                            <div className="text-sm font-bold text-slate-800">
+                              {analyzeResult.risk_assessment.privacy_risks.anonymization_level}
+                            </div>
+                          </div>
+                          <div className="text-center p-3 bg-slate-50 rounded-lg">
+                            <div className="text-xs text-slate-600 mb-1">Detection</div>
+                            <div className="text-sm font-bold text-slate-800">
+                              {analyzeResult.risk_assessment.privacy_risks.detection_method}
+                            </div>
+                          </div>
+                        </div>
+                      </div>
+                    ) : (
+                      <div className="text-sm text-slate-600 bg-green-50 border border-green-200 rounded-lg p-3">
+                        ✓ No PII detected in the dataset
+                      </div>
+                    )}
+                  </div>
+                )}
+
+                {/* Violations */}
+                {analyzeResult.risk_assessment.violations &&
+                analyzeResult.risk_assessment.violations.length > 0 && (
+                  <div className="bg-white rounded-xl border-2 border-slate-200 p-6 shadow-sm">
+                    <div className="flex items-center gap-2 mb-4">
+                      <span className="text-2xl">⚠️</span>
+                      <h3 className="text-lg font-bold text-slate-800">Risk Violations</h3>
+                      <span className="ml-auto px-3 py-1 bg-red-100 text-red-700 rounded-full text-xs font-semibold">
+                        {analyzeResult.risk_assessment.violations.length} Issues
+                      </span>
+                    </div>
+
+                    <div className="space-y-3">
+                      {analyzeResult.risk_assessment.violations.map((violation: any, idx: number) => (
+                        <div key={idx} className={`p-4 rounded-lg border-2 ${
+                          violation.severity === 'CRITICAL' ? 'bg-red-50 border-red-200' :
+                          violation.severity === 'HIGH' ? 'bg-orange-50 border-orange-200' :
+                          violation.severity === 'MEDIUM' ? 'bg-yellow-50 border-yellow-200' :
+                          'bg-blue-50 border-blue-200'
+                        }`}>
+                          <div className="flex items-start justify-between gap-3">
+                            <div className="flex-1">
+                              <div className="flex items-center gap-2 mb-1">
+                                <span className={`text-xs font-bold px-2 py-1 rounded ${
+                                  violation.severity === 'CRITICAL' ? 'bg-red-100 text-red-700' :
+                                  violation.severity === 'HIGH' ? 'bg-orange-100 text-orange-700' :
+                                  violation.severity === 'MEDIUM' ? 'bg-yellow-100 text-yellow-700' :
+                                  'bg-blue-100 text-blue-700'
+                                }`}>
+                                  {violation.severity}
+                                </span>
+                                <span className="text-xs font-semibold text-slate-600 uppercase">
+                                  {violation.category}
+                                </span>
+                              </div>
+                              <div className="text-sm font-semibold text-slate-800 mb-1">
+                                {violation.message}
+                              </div>
+                              {violation.details && (
+                                <div className="text-xs text-slate-600">
+                                  {violation.details}
+                                </div>
+                              )}
+                            </div>
+                          </div>
+                        </div>
+                      ))}
+                    </div>
+                  </div>
+                )}
+
+                {/* Key Insights */}
+                {analyzeResult.risk_assessment.insights &&
+                analyzeResult.risk_assessment.insights.length > 0 && (
+                  <div className="bg-gradient-to-br from-blue-50 to-indigo-50 rounded-xl border-2 border-blue-200 p-6 shadow-sm">
+                    <div className="flex items-center gap-2 mb-4">
+                      <span className="text-2xl">💡</span>
+                      <h3 className="text-lg font-bold text-slate-800">Key Insights</h3>
+                    </div>
+
+                    <div className="space-y-2">
+                      {analyzeResult.risk_assessment.insights.map((insight: string, idx: number) => (
+                        <div key={idx} className="flex items-start gap-2 text-sm text-slate-700">
+                          <span className="text-blue-600 mt-0.5">•</span>
+                          <span>{insight}</span>
+                        </div>
+                      ))}
+                    </div>
+                  </div>
+                )}
+
+                {/* Compliance Status */}
+                {analyzeResult.risk_assessment.compliance_risks && (
+                  <div className="bg-white rounded-xl border-2 border-slate-200 p-6 shadow-sm">
+                    <div className="flex items-center gap-2 mb-4">
+                      <span className="text-2xl">📋</span>
+                      <h3 className="text-lg font-bold text-slate-800">Compliance Status</h3>
+                    </div>
+
+                    <div className="grid grid-cols-1 md:grid-cols-2 gap-4">
+                      {Object.entries(analyzeResult.risk_assessment.compliance_risks)
+                        .filter(([key]) => ['gdpr', 'ccpa', 'hipaa', 'ecoa'].includes(key))
+                        .map(([regulation, data]: [string, any]) => {
+                          if (!data || typeof data !== 'object') return null;
+
+                          return (
+                            <div key={regulation} className={`p-4 rounded-lg border-2 ${
+                              data.status === 'COMPLIANT' ? 'bg-green-50 border-green-200' :
+                              data.status === 'PARTIAL' ? 'bg-yellow-50 border-yellow-200' :
+                              data.status === 'NOT_APPLICABLE' ? 'bg-slate-50 border-slate-200' :
+                              'bg-red-50 border-red-200'
+                            }`}>
+                              <div className="flex items-center justify-between mb-2">
+                                <span className="text-sm font-bold text-slate-800 uppercase">
+                                  {regulation}
+                                </span>
+                                <span className={`text-xs font-bold px-2 py-1 rounded ${
+                                  data.status === 'COMPLIANT' ? 'bg-green-100 text-green-700' :
+                                  data.status === 'PARTIAL' ? 'bg-yellow-100 text-yellow-700' :
+                                  data.status === 'NOT_APPLICABLE' ? 'bg-slate-100 text-slate-700' :
+                                  'bg-red-100 text-red-700'
+                                }`}>
+                                  {data.status}
+                                </span>
+                              </div>
+                              {data.score !== undefined && (
+                                <div className="text-xs text-slate-600 mb-2">
+                                  Compliance Score: {(data.score * 100).toFixed(0)}%
+                                </div>
+                              )}
+                              {data.applicable === false && (
+                                <div className="text-xs text-slate-600">
+                                  Not applicable to this dataset
+                                </div>
+                              )}
+                            </div>
+                          );
+                        })}
+                    </div>
+                  </div>
+                )}
+              </div>
             ) : (
-              <p className="text-sm text-slate-600">Upload and analyze a dataset to see risk assessment.</p>
+              <div className="text-center py-12 bg-slate-50 rounded-xl border-2 border-dashed border-slate-300">
+                <span className="text-4xl mb-3 block">🔒</span>
+                <p className="text-slate-600 mb-2">No risk analysis results yet</p>
+                <p className="text-sm text-slate-500">Upload a dataset and click "Analyze" to see comprehensive risk assessment</p>
+              </div>
             )}
          </div>
        );