Evidence Grading Framework

Microbiome science moves quickly, but not every finding is equally trustworthy. This page describes how we grade evidence and why uncertainty is common in this field.

Evidence hierarchy

Strong — replicated and clinical

Findings supported by multiple independent studies, including randomized or well-controlled human trials where relevant. Effect sizes are consistent enough that expert reviews cite them with confidence. Example: established roles of specific microbes in defined biochemical pathways under controlled conditions.

Moderate — associative

Repeated observational links in human cohorts, or strong animal data with some human corroboration. Direction of association is fairly consistent, but causation is not established. Example: correlations between certain taxa and metabolic markers across population studies.

Weak — exploratory

Single studies, small samples, post-hoc analyses, or novel claims awaiting replication. This tier may include promising mechanistic work that has not been tested in diverse human populations. Treat these findings as hypothesis-generating, not actionable.

Why microbiome research is uncertain

Variability between studies

Different cohorts, diets, geographies, and health statuses
Inconsistent sample collection, storage, and DNA extraction methods
Changing sequencing platforms and bioinformatics pipelines
Publication bias toward positive associations

Compositional data issues

Microbiome data are compositional: sequencing reports each taxon as a fraction of the total, so when one taxon's share increases, the combined share of the others must decrease, even if their absolute abundance has not changed. Standard statistics that ignore this structure can produce misleading associations. Labs also differ in detection limits, so a reported “zero” or “low” value often means below detection, not truly absent.
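
The compositional effect can be illustrated with a small Python sketch. The taxon names and counts are hypothetical, and the centered log-ratio (CLR) transform is shown only as one commonly used log-ratio approach, not necessarily the method used for any report on this site. The sketch assumes all values are positive; zeros (for example, taxa below detection) would need separate handling before taking logarithms.

```python
import math


def relative_abundance(counts):
    """Convert absolute counts to fractions that sum to 1."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def clr(values):
    """Centered log-ratio: log of each value minus the mean log across taxa.

    Assumes every value is strictly positive.
    """
    logs = {taxon: math.log(v) for taxon, v in values.items()}
    mean_log = sum(logs.values()) / len(logs)
    return {taxon: x - mean_log for taxon, x in logs.items()}


# Hypothetical absolute abundances: only taxon_a changes between samples.
before = {"taxon_a": 100, "taxon_b": 200, "taxon_c": 100}
after = {"taxon_a": 700, "taxon_b": 200, "taxon_c": 100}

rel_before = relative_abundance(before)
rel_after = relative_abundance(after)

# taxon_b's absolute count is unchanged, yet its share falls from 0.5 to 0.2.
print(rel_before["taxon_b"], rel_after["taxon_b"])

# The ratio between two unchanged taxa survives the conversion to fractions.
ratio_before = rel_before["taxon_b"] / rel_before["taxon_c"]
ratio_after = rel_after["taxon_b"] / rel_after["taxon_c"]
print(math.isclose(ratio_before, ratio_after))

# CLR values are the same whether computed from counts or from fractions,
# because rescaling every taxon by the same total cancels out.
clr_counts = clr(after)
clr_fractions = clr(rel_after)
print(all(math.isclose(clr_counts[t], clr_fractions[t]) for t in after))
```

An analysis of raw fractions would report taxon_b as "decreased" even though nothing happened to it, while ratios between taxa are unaffected by the change in the total. This is the reason log-ratio methods are often preferred for this kind of data.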

Claim grading system

We use a simple A–D scale for statements on this site. Grades describe evidence quality, not importance to your health.

Grade	Meaning	Typical source
A	Strong evidence	Replicated human or mechanistic studies; clinical support where applicable
B	Moderate evidence	Consistent associations or robust models with limited clinical translation
C	Limited evidence	Early trials, conflicting cohorts, or indirect inference
D	Insufficient for claims	Speculation, single reports, or findings that cannot validly be applied to an individual's test results

How to use these grades

Act on clinical advice and replicated findings before acting on flags in a test report
When a species page shows grade B or C, treat it as context—not instruction
Grade D items are included to flag common over-interpretations
As new studies appear, grades may be updated; uncertainty is expected
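
For readers who want to carry these grades into their own notes or tooling, the scale and the guidance above could be encoded roughly as follows. This is a minimal sketch in Python: the names `Grade`, `GUIDANCE`, and `describe` are hypothetical, and the guidance wording for grade A is an assumption, since the list above does not state it explicitly.

```python
from enum import Enum


class Grade(Enum):
    """Evidence grades, with meanings copied from the grading table."""
    A = "Strong evidence"
    B = "Moderate evidence"
    C = "Limited evidence"
    D = "Insufficient for claims"


# Reader guidance per grade. B, C, and D paraphrase the list above;
# the wording for A is an assumption, not text from this page.
GUIDANCE = {
    Grade.A: "Replicated finding; weigh it alongside clinical advice.",
    Grade.B: "Treat as context, not instruction.",
    Grade.C: "Treat as context, not instruction.",
    Grade.D: "Included to flag a common over-interpretation.",
}


def describe(grade_letter: str) -> str:
    """Return a one-line summary for a grade letter (case-insensitive)."""
    grade = Grade[grade_letter.strip().upper()]  # raises KeyError if unknown
    return f"Grade {grade.name} ({grade.value}): {GUIDANCE[grade]}"


print(describe("c"))
```

Keeping the meaning and the guidance in separate structures makes it easier to revise one without the other when grades are updated.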

Principle: Clear grading builds trust. We label evidence explicitly so readers can separate what is known, what is plausible, and what is not yet supported.