
5 Red Flags in an AI Radiology Vendor Proposal (and How to Spot Them)


Fractify Team



Why AI Radiology Vendor Proposals Are Difficult to Evaluate

AI radiology procurement is structurally different from buying traditional medical devices. A CT scanner's specifications are standardised and verifiable. An AI radiology platform's claims—97% accuracy, real-time processing, seamless PACS integration—are asserted in prose and rarely backed by documentation at the proposal stage. A hospital that lacks a structured evaluation framework will sign a contract based on a vendor's best-case scenario rather than clinical evidence.

The Fractify team, which has deployed the platform across multiple clinical environments, has sat on both sides of this process. The following five red flags are drawn from the documentation gaps and contractual ambiguities that most commonly create post-deployment disputes. Each flag comes with a corresponding question that any procurement officer can ask before the proposal advances to legal review.

Expert Insight: The Proposal Is the Vendor's Best Foot Forward

Every vendor puts their best-case data in the proposal. The question is not whether the proposal is accurate—it is whether the data is verifiable. An AI radiology vendor who cannot provide a peer-reviewed citation, a hospital reference letter, or a live de-identified dataset run within 10 business days of a request is not ready for clinical deployment. Fractify's 97.9% brain MRI tumour detection and 97.7% bone fracture detection figures are documented with methodology. Demand the same from every vendor you evaluate.

[Figures: Fractify diagnostic engine workflow; AI-assisted radiology review]

Red Flag 1: Accuracy Claims Without Modality-Specific Breakdown

A proposal that states "our platform achieves 95% accuracy across all imaging modalities" contains almost no useful information. Accuracy varies enormously between modalities, between pathology types within a single modality, and between demographic subgroups. A chest X-ray model may achieve 96% accuracy for pneumonia detection and 78% for subtle pneumothorax; aggregating these into a single figure conceals the weakness.

What to ask: "Please provide per-modality, per-pathology sensitivity and specificity data from a clinical validation study conducted on patient data you did not train on." Any vendor who responds with a further aggregated number or cannot specify a validation dataset is signalling that their published accuracy is marketing data, not clinical data. Fractify publishes separate validated figures: 97.9% for brain MRI tumour detection, 97.7% for bone fracture detection across long bones and spine, and 18+ pathology categories detected in chest X-ray. Request this level of specificity from every vendor.

Red Flag 2: No PACS Integration Documentation Before Contract

The phrase "seamlessly integrates with your existing PACS" appears in nearly every AI radiology proposal. Seamless integration is not a technical specification. It is a marketing statement. Real PACS integration requires: DICOM node configuration documentation, supported DICOM service class users and providers, HL7/FHIR version compatibility, latency benchmarks under expected study volume, and a documented rollback procedure if integration causes PACS disruption.

What to ask: "Can you provide your technical integration specification document, including DICOM conformance statement and a reference contact at a hospital where you have completed PACS integration?" If the vendor responds that integration details are finalized post-contract, the hospital will discover during implementation that "seamless" means "requires a six-month custom integration project." Fractify provides DICOM conformance documentation and integration architecture diagrams before any procurement discussion advances to legal.

Red Flag 3: Clinical Validation on Research Datasets Only

Many AI radiology models were developed and validated on public research datasets such as NIH ChestX-ray14 or MIMIC-CXR. These datasets have known limitations: they contain retrospectively labelled images from specific hospital systems, cover limited demographic ranges, and were not acquired under the equipment protocols used in the procuring hospital. A model validated only on research data may perform substantially worse on the procuring hospital's own DICOM studies.

What to ask: "Was your clinical validation conducted on prospective data from live hospital deployments? Can you provide the institutional ethics approval and study protocol?" A vendor who has only research-dataset validation cannot guarantee performance in your specific clinical environment. Fractify's validation includes live deployment data across multiple clinical sites with results documented separately from research-phase accuracy figures. Ask whether a vendor can run your own de-identified cases through their model as a pre-contract proof of concept.

Red Flag 4: Urgency Scoring and Critical Finding Alerts Not Specified

In emergency radiology, the difference between a good AI platform and a dangerous one is not sensitivity—it is prioritisation. A system that correctly flags Tension Pneumothorax, Aortic Dissection, Intracranial Hemorrhage, and Acute Stroke but does not surface these findings immediately in the radiologist's work queue provides no clinical benefit over a system that processes studies in arrival order.

What to ask: "Does your platform include a validated urgency scoring system? What urgency categories does it use, what pathologies trigger the highest urgency tier, and what is the documented response time from AI flag to radiologist notification?" Fractify operates a 1–5 urgency scoring system where urgency-5 findings—including the four critical conditions above plus all 6 intracranial haemorrhage subtypes—generate immediate alerts with Grad-CAM heatmap overlays and bypass the standard work queue. If a vendor cannot specify their urgency classification methodology, their platform was not designed for emergency clinical deployment.

Red Flag 5: Support SLA Defined by Response Time, Not Resolution Time

A support SLA that guarantees "response within 4 hours" is nearly worthless in a clinical setting. A radiologist whose AI platform stops processing studies at 02:00 during a night shift does not need a response in 4 hours—they need resolution. The distinction between response time and resolution time is the most commonly exploited ambiguity in AI medical software contracts.

What to ask: "What is your contractual resolution time for a Severity 1 issue—defined as complete AI processing failure during clinical hours? What is the compensation mechanism if resolution time is breached? Who is the escalation contact at 02:00?" Fractify's support structure distinguishes clearly between response time and resolution time with explicit Severity 1 protocols and clinical-hours coverage defined in the service agreement before contract signing.

The Vendor Evaluation Matrix: What to Request Before Proposal Advances

| Evaluation Item | What a Ready Vendor Provides | Red Flag Response |
| --- | --- | --- |
| Accuracy data | Per-modality, per-pathology sensitivity/specificity with methodology | "Our platform achieves X% accuracy across modalities" |
| PACS integration | DICOM conformance statement + reference hospital contact | "Integration is finalised after contract signing" |
| Clinical validation | Live deployment data, ethics approval, study protocol | Research dataset citations only (NIH ChestX-ray14, etc.) |
| Urgency scoring | Defined classification tiers, pathology triggers, notification time | No urgency system or unspecified alert criteria |
| Support SLA | Resolution time for Sev-1, compensation clause, 24/7 escalation | Response-time-only SLA, no resolution commitment |
| Brain MRI accuracy | Validated figure with methodology (Fractify: 97.9%) | Aggregated or unspecified brain MRI accuracy |
| Fracture detection | Validated figure by bone type (Fractify: 97.7%) | Generic "skeletal imaging" accuracy claim |

Beyond the Red Flags: What a Transparent Vendor Proposal Looks Like

Validated Accuracy Per Modality

Fractify publishes separate validated accuracy figures: 97.9% brain MRI tumour detection, 97.7% bone fracture detection, and 18+ pathology categories screened in chest X-ray. Each figure comes with the study methodology, not just the number. This is the documentation standard every vendor should meet.

DICOM Conformance Statement

Full DICOM conformance documentation—service class users, service class providers, supported transfer syntaxes, and RBAC integration architecture—is provided by Fractify before any contract discussion reaches legal review. If a vendor cannot supply this document, PACS integration risk is unquantified.

Live Clinical Validation Data

Fractify's clinical performance data is drawn from live hospital deployments, not retrospective research dataset analysis. The validation methodology distinguishes between training data performance and prospective clinical deployment performance—a distinction most research-phase AI systems cannot make.

Defined Urgency Classification

Fractify's 1–5 urgency scoring system with explicit pathology triggers for critical findings—Tension Pneumothorax, Aortic Dissection, Intracranial Hemorrhage including all 6 ICH subtypes, and Acute Stroke—is documented in the platform specification sheet provided at proposal stage.

Developed by Databoost Sdn Bhd, Fractify was engineered specifically for clinical environments where documentation, audit trails, and governance accountability matter as much as algorithmic performance. The platform's integration with PACS via standard DICOM, with clinical workflows via HL7/FHIR, and with compliance systems via RBAC reflects a deployment philosophy that assumes scrutiny rather than avoiding it.

According to WHO guidance on medical device procurement, health facility procurement officers should require clinical performance evidence matched to the specific use context of the procuring institution. A vendor who cannot produce this evidence has not completed the validation process required for clinical deployment.

What accuracy data should a hospital require from an AI radiology vendor?

Require per-modality, per-pathology sensitivity and specificity data from a validation study conducted on patient data not used in training. Aggregate accuracy figures are insufficient. For reference, Fractify provides 97.9% brain MRI tumour detection and 97.7% bone fracture detection as separate validated metrics with study methodology documentation.

How do you verify that an AI radiology vendor's PACS integration claim is real?

Request the vendor's DICOM conformance statement and ask for a reference contact at a hospital where PACS integration is already live and in use. Any vendor who cannot provide both items has not completed a full clinical PACS integration. "Can be integrated" and "has been integrated" are materially different claims in procurement due diligence.

What is the difference between research dataset validation and clinical validation for AI radiology?

Research dataset validation uses retrospectively labelled public or institutional datasets (NIH ChestX-ray14, MIMIC-CXR) under controlled conditions. Clinical validation uses prospective data from live hospital deployments with defined patient populations, equipment protocols, and radiologist reference standards. Only clinical validation predicts performance in your specific environment.

Why does urgency scoring matter in an AI radiology procurement evaluation?

An AI system that detects critical findings but does not prioritise them in the radiologist's work queue provides no clinical benefit over a standard PACS. Urgency scoring—such as Fractify's 1–5 system that flags Tension Pneumothorax, Intracranial Hemorrhage, Aortic Dissection, and Acute Stroke at the highest tier—is the mechanism that converts AI detection into time-critical clinical action.

What should a hospital look for in an AI radiology support SLA?

Require resolution time commitments for Severity 1 incidents (complete processing failure during clinical hours), not just response time. The SLA should specify 24/7 escalation contacts, compensation mechanisms for resolution time breaches, and planned maintenance windows that avoid peak clinical hours. Response-time-only SLAs are insufficient for clinical environments.

Can a hospital run its own data through an AI radiology platform before signing a contract?

Yes—a pre-contract proof-of-concept run using de-identified cases from the procuring hospital is the most reliable validation method. Request that the vendor process 100–200 de-identified studies from your own PACS and provide sensitivity/specificity results against your radiologists' reports. A vendor who refuses this request has something to hide about real-world performance.
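
If the vendor agrees, preparing the de-identified study set is the hospital's side of the work. Below is a minimal sketch assuming the open-source pydicom library; the tag list is an illustrative subset only, and production anonymisation should follow the full DICOM PS3.15 confidentiality profile.

```python
# Hedged sketch: blanking a subset of PHI tags with pydicom before a
# proof-of-concept run. Illustrative only; follow DICOM PS3.15 Annex E
# for a complete de-identification profile in production.
from pathlib import Path
import pydicom

PHI_TAGS = ["PatientName", "PatientID", "PatientBirthDate",
            "ReferringPhysicianName", "InstitutionName", "AccessionNumber"]

def deidentify(src: Path, dst: Path) -> None:
    ds = pydicom.dcmread(src)
    for keyword in PHI_TAGS:
        if keyword in ds:
            ds.data_element(keyword).value = ""  # blank the value, keep the tag
    ds.remove_private_tags()  # private vendor groups often carry PHI
    ds.save_as(dst)

out_dir = Path("poc_deidentified")
out_dir.mkdir(exist_ok=True)
for path in Path("poc_cases").glob("*.dcm"):
    deidentify(path, out_dir / path.name)
```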

What contractual protections should a hospital include in an AI radiology agreement?

Minimum contractual protections include: accuracy performance benchmarks with breach remedies, PACS integration delivery timeline with penalties for delay, data residency and patient anonymisation obligations under HIPAA or GDPR, Severity 1 resolution time SLA with defined compensation, and a 30-day exit clause if clinical performance benchmarks are not met within the first quarter.

How many AI radiology vendors should a hospital evaluate before selecting one?

A minimum of three vendors should complete a full structured evaluation before selection. This provides a comparison baseline for accuracy data, integration documentation quality, and SLA terms. Evaluating fewer vendors creates negotiating disadvantage at contract stage and limits the hospital's ability to identify outlier claims in any single vendor's proposal.

See Fractify working on your own scans — live demo takes 15 minutes.

Request a Free Demo →

Want to see Fractify in your institution?

AI clinical decision support for X-Ray, CT, MRI, and dental imaging. Built for enterprise healthcare by Databoost Sdn Bhd.