Technology
AI-Assisted Scoring
How we use artificial intelligence to make evaluations faster, more consistent, and more thorough — while keeping humans in control.
Why AI-Assisted?
Evaluating a femtech product thoroughly requires reviewing privacy policies, terms of service, clinical research, app store listings, company websites, regulatory filings, and more. For a single product, this can mean analyzing dozens of documents across four SAFE dimensions.
AI helps our evaluators work faster and more consistently by analyzing source documents, identifying relevant evidence, and suggesting preliminary scores. But every score is reviewed, adjusted, and approved by a human evaluator before publication.
How It Works
Source Collection
An evaluator enters the product's website, privacy policy, terms of service, app store pages, clinical studies, and other relevant URLs. These become the evidence base for the evaluation.
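As a rough sketch, the evidence base can be thought of as a list of typed source links. The field names and structure below are hypothetical illustrations, not the actual system's schema:

```python
# Hypothetical sketch of an evaluation's evidence base.
# Field names ("kind", "url") are illustrative, not a production schema.
from dataclasses import dataclass

@dataclass
class Source:
    kind: str  # e.g. "privacy_policy", "terms", "clinical_study"
    url: str

evidence_base = [
    Source("website", "https://example.com"),
    Source("privacy_policy", "https://example.com/privacy"),
    Source("terms", "https://example.com/terms"),
    Source("app_store", "https://apps.example.com/listing"),
]
```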
AI Analysis (Per Dimension)
For each of the four SAFE dimensions, the AI reviews all available source documents and:
- ✓ Scores each sub-criterion based on evidence found in the sources
- ✓ Flags missing information that should exist for the product type
- ✓ Identifies specific strengths and concerns with source references
- ✓ Provides a confidence level (high, medium, or low) for its analysis
- ✓ Suggests a preliminary score (0-25) with detailed reasoning
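The five outputs above can be sketched as a single structured record per dimension. This is an illustrative shape only (the field names and example values are hypothetical, not the system's real output format):

```python
# Hypothetical shape of the AI's per-dimension output; field names
# and example values are illustrative, not the real system's format.
from dataclasses import dataclass

@dataclass
class DimensionAnalysis:
    dimension: str              # "S", "A", "F", or "E"
    sub_scores: dict[str, int]  # sub-criterion -> points awarded
    missing_info: list[str]     # expected-but-absent evidence, flagged
    strengths: list[str]        # findings with source references
    concerns: list[str]
    confidence: str             # "high" | "medium" | "low"
    suggested_score: int        # preliminary score, 0-25
    reasoning: str

analysis = DimensionAnalysis(
    dimension="S",
    sub_scores={"data_minimization": 4, "policy_clarity": 3,
                "user_control": 5, "third_party_sharing": 2,
                "security_infrastructure": 3, "track_record": 3},
    missing_info=["No data retention period stated"],
    strengths=["Explicit opt-in for data sharing (privacy policy)"],
    concerns=["Broad third-party analytics clause"],
    confidence="medium",
    suggested_score=20,
    reasoning="Strong consent controls; sharing practices partly unclear.",
)

# The suggested 0-25 score is the sum of the sub-criterion scores.
assert analysis.suggested_score == sum(analysis.sub_scores.values())
```

Structuring the output this way is what lets the human reviewer accept, modify, or override each piece independently in the next step.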
Human Review & Adjustment
The evaluator reviews the AI's suggestions alongside the original sources. They can accept, modify, or override any score. The evaluator adds their own analysis and context that the AI may have missed — nuance, industry knowledge, and expert judgment that AI cannot replicate.
Final Score & Publication
The final scores are calculated using the SAFE v3 methodology: products are classified as Medical or Wellness, and the unified weighted formula (S × 1.40 + A × 1.40 + F × 0.60 + E × 0.60) is applied, followed by minimum dimension threshold checks. Only after human sign-off is a scorecard published.
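The weighted formula can be expressed directly in code. The weights below come from the text; the per-dimension minimum and the function names are hypothetical placeholders, since SAFE v3 defines its own threshold values:

```python
def safe_v3_total(s: float, a: float, f: float, e: float) -> float:
    """Unified SAFE v3 weighted total, each dimension scored 0-25.

    Maximum: (25 * 1.40) * 2 + (25 * 0.60) * 2 = 100.
    """
    return s * 1.40 + a * 1.40 + f * 0.60 + e * 0.60

def passes_thresholds(s, a, f, e, minimum=10):
    # The per-dimension minimum of 10 is a hypothetical placeholder;
    # the actual SAFE v3 thresholds are set by the methodology.
    return all(d >= minimum for d in (s, a, f, e))

total = safe_v3_total(25, 25, 25, 25)  # -> 100.0
```

The threshold check means a product cannot buy its way to a good total with two strong dimensions while failing a third.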
What the AI Reviews
The AI is given dimension-specific guidance on what to look for in each source. For example:
S — Security & Privacy
- Data minimization (5 pts)
- Privacy policy clarity and specificity (5 pts)
- User control and consent (5 pts)
- Third-party sharing practices (4 pts)
- Security infrastructure (3 pts)
- Track record — breaches or regulatory actions (3 pts)
A — Accuracy
- Regulatory status — FDA/CE (4 pts)
- Clinical validation and evidence (8 pts)
- Medical/institutional partnerships (6 pts)
- Independent reviews and user-reported effectiveness (4 pts)
- Transparency about limitations (3 pts)
F — Foundation
- Leadership experience and credibility (6 pts)
- Mission and vision clarity (5 pts)
- Advisory board and clinical consultants (5 pts)
- Thought leadership and public engagement (4 pts)
- Marketing alignment with values (5 pts)
E — Equity
- Accessibility and inclusive design (5 pts)
- Diverse representation (5 pts)
- Economic accessibility (6 pts)
- Community and advocacy engagement (5 pts)
- Designed for actual diverse users (4 pts)
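The rubric above can be encoded as plain data, which makes it easy to check that each dimension's sub-criteria account for its full 25 points. The point values below transcribe the lists above (with labels shortened); the dictionary structure itself is an illustrative sketch, not the production format:

```python
# Transcription of the SAFE rubric above (labels shortened);
# the structure is illustrative, not a production format.
RUBRIC = {
    "S": {"Data minimization": 5, "Privacy policy clarity": 5,
          "User control and consent": 5, "Third-party sharing": 4,
          "Security infrastructure": 3, "Track record": 3},
    "A": {"Regulatory status": 4, "Clinical validation": 8,
          "Medical partnerships": 6, "Independent reviews": 4,
          "Transparency about limitations": 3},
    "F": {"Leadership experience": 6, "Mission clarity": 5,
          "Advisory board": 5, "Thought leadership": 4,
          "Marketing alignment": 5},
    "E": {"Accessibility and inclusive design": 5,
          "Diverse representation": 5, "Economic accessibility": 6,
          "Community engagement": 5, "Designed for diverse users": 4},
}

# Every dimension's sub-criteria sum to the 25 points available.
assert all(sum(crit.values()) == 25 for crit in RUBRIC.values())
```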
Transparency & Safeguards
AI Assistance Is Labeled
Every scorecard that used AI assistance is marked. You can see whether AI was involved and at what confidence level.
Humans Have Final Say
AI suggestions are starting points, not final scores. Human evaluators review every suggestion and make the final determination.
Evidence-Based Only
The AI only analyzes source documents that are provided. It does not make assumptions or use information outside the evidence base.
Missing Information Flagged
When the AI can't find information that should exist (e.g., a privacy policy for a health app), it flags it explicitly rather than guessing.
Why Not Fully Automated?
We deliberately chose an AI-assisted model rather than a fully automated one. Here's why:
- ✓ Context matters. AI can miss nuance: a vague privacy policy might be typical for an early-stage startup but alarming for a well-funded medical device company.
- ✓ Trust requires accountability. When a human signs off on a score, there's clear responsibility. Fully automated scores create an accountability gap.
- ✓ Products affect health. The stakes are too high for unsupervised automation. Women deserve evaluations that have been reviewed by people who understand the space.