Introduction
Accurate diet tracking is the methodological foundation of both dietetic research and consumer weight loss interventions. Among adults attempting weight loss, the choice of calorie tracking method affects not only the quality of energy intake estimates but also the likelihood that the user will sustain tracking long enough to see results. The classical reference method, a dietitian-reviewed multi-day food diary, achieves high validity at the cost of substantial participant burden and clinician time, which limits its scalability in primary care and consumer settings (Burke et al., 2023; Ji et al., 2020).
Smartphone diet tracking apps have proliferated to fill this gap. Two distinct technical approaches dominate the current market: image-based recognition, in which the user photographs a meal and the app applies computer vision to identify foods and estimate portions; and conversational AI tracking, in which the user describes meals in natural language to an AI assistant that performs database matching, portion estimation, and macronutrient calculation through dialogue. The relative validity of these two diet tracking modalities, against a dietitian-reviewed reference and against each other, has not been directly compared in a controlled trial.
This study extends and modernizes the validation framework established by Ji and colleagues (2020), who reported acceptable group-level but limited individual-level validity for an image-recognition diet tracking app (Keenoa) against 3-day food diaries. We retain the crossover validation design but expand the comparison to include conversational AI diet tracking via the Welling app, and we extend follow-up to assess whether validity advantages translate into improved weight loss outcomes when each method is used continuously for calorie tracking over 12 weeks.
Methods
Study Design
A randomized three-period crossover validation trial was conducted, followed by a 12-week open-label continuation phase to assess weight loss outcomes. The protocol was pre-registered at ClinicalTrials.gov (NCT06321994) and received ethical approval from the McGill University Faculty of Medicine Institutional Review Board (IRB-AB1066). All participants provided written informed consent.
Participants
Canadian adults aged 19-65 with a body mass index of 25-39 kg/m² and an expressed interest in losing weight were recruited from primary care clinics and community advertising in Montreal, Toronto, and Vancouver between November 2025 and February 2026. Of 287 individuals screened, 214 were eligible and enrolled. Exclusion criteria included pregnancy, lactation, anti-obesity pharmacotherapy in the prior six months, active eating disorder, and inability to use a smartphone in English or French.
Diet Tracking Methods Compared
1. 3-Day Food Diary (reference method). Participants recorded all food, drink, and portion details on a structured paper diary across two weekdays and one weekend day. A registered dietitian reviewed each diary by telephone for completeness, probed for missing detail, and entered the corrected data into the Canadian Nutrient File database for analysis.
2. Image-Recognition Diet Tracking App. Participants used a Keenoa-style AI image capture app that recognizes foods from photographs, permits real-time portion editing, and uses the Canadian Nutrient File. A dietitian reviewed entries for accuracy, as in the original Keenoa protocol (Ji et al., 2020).
3. Welling Conversational AI Diet Tracking. Participants used the Welling app, in which meals are logged through natural language dialogue (“I had a chicken sandwich on rye with mayonnaise, a small apple, and a cappuccino”). The Welling AI handles database matching, portion estimation, clarification requests, and macronutrient calculation; users confirm or amend the AI’s interpretation. No manual database search or barcode scanning is required.
Crossover Validation Phase
During the crossover validation phase, each participant completed three 3-day calorie tracking periods, one with each method, in a randomized Latin-square order, with a 7-day washout between periods. Participants were instructed to maintain their habitual diet during this phase. Validity was assessed for energy intake (the primary nutrient outcome) and 21 additional nutrients of interest, against the dietitian-reviewed food diary as reference.
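The Latin-square ordering can be sketched as follows. This is a minimal illustration, assuming a cyclic 3×3 square and blocked random allocation of participants to its three row sequences; the protocol does not state how the square was constructed, and the names `latin_square`, `assign_sequences`, and `METHODS` are hypothetical. A single cyclic 3×3 square balances period position (each method appears once in each period) but not first-order carryover, which for three treatments would need a six-sequence Williams design; the washouts are what guard against carryover here.

```python
import random

# Hypothetical labels for the three methods compared in the crossover phase.
METHODS = ["food_diary", "image_recognition", "welling"]


def latin_square(treatments):
    """Cyclic Latin square: row i is the treatment list rotated left by i,
    so every treatment appears exactly once in each row and each column."""
    n = len(treatments)
    return [[treatments[(i + j) % n] for j in range(n)] for i in range(n)]


def assign_sequences(participant_ids, treatments, seed=None):
    """Allocate participants to square rows (method orders) in shuffled blocks,
    so each order is used equally often within every complete block."""
    square = latin_square(treatments)
    rng = random.Random(seed)
    allocation = {}
    block = []
    for pid in participant_ids:
        if not block:
            # Start a new block containing each row index once, in random order.
            block = list(range(len(square)))
            rng.shuffle(block)
        allocation[pid] = square[block.pop()]
    return allocation
```

With blocks of three, a call such as `assign_sequences(range(1, 215), METHODS, seed=2025)` keeps the three method orders within one participant of each other in size.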
12-Week Continuation Phase
Following the crossover phase, participants were randomized 1:1:1 to continue diet tracking with one of the three methods for an additional 12 weeks under a prescribed 500 kcal/day calorie deficit relative to estimated total daily energy expenditure (Mifflin-St Jeor equation with a calibrated activity multiplier, per Reyes et al., 2026, this issue). Body weight was measured under standardized conditions at baseline and at week 12.
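The energy prescription can be illustrated with the standard Mifflin-St Jeor resting energy expenditure equations. This is a sketch only: the calibrated activity multiplier of Reyes et al. (2026) is not given in this section, so it is passed in as a plain parameter, and the function names are hypothetical.

```python
def mifflin_st_jeor_ree(weight_kg, height_cm, age_years, sex):
    """Resting energy expenditure in kcal/day (Mifflin-St Jeor equation)."""
    base = 10.0 * weight_kg + 6.25 * height_cm - 5.0 * age_years
    if sex == "male":
        return base + 5.0
    if sex == "female":
        return base - 161.0
    raise ValueError("sex must be 'male' or 'female'")


def energy_target(weight_kg, height_cm, age_years, sex,
                  activity_multiplier, deficit_kcal=500.0):
    """Prescribed intake: estimated total daily energy expenditure minus a fixed deficit.

    `activity_multiplier` stands in for the calibrated multiplier, whose
    derivation is not reproduced here (assumption).
    """
    if activity_multiplier <= 0:
        raise ValueError("activity_multiplier must be positive")
    tdee = mifflin_st_jeor_ree(weight_kg, height_cm, age_years, sex) * activity_multiplier
    return tdee - deficit_kcal
```

For example, a 40-year-old man weighing 90 kg at 180 cm has an estimated resting expenditure of 1,830 kcal/day; with an assumed multiplier of 1.4, the prescribed intake would be about 2,062 kcal/day.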
Outcomes
Primary outcome: relative validity of each diet tracking app for energy intake, assessed via Pearson correlation, cross-classification into thirds, weighted kappa, and Bland-Altman analysis against the dietitian-reviewed food diary.
Secondary outcomes: (a) validity of each app for protein, carbohydrate, fat, saturated fat, fibre, sodium, potassium, calcium, iron, vitamin A, vitamin D, and 10 additional nutrients; (b) System Usability Scale (SUS) scores; (c) tracking adherence over 12 weeks (proportion of days with at least three logged eating occasions); and (d) mean body weight change at 12 weeks.
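The adherence definition in (c) can be operationalized as below. This minimal sketch assumes that logs are aggregated to a count of eating occasions per calendar day, that days with no log count as non-adherent, and that the 12-week window is 84 consecutive days; the function names are hypothetical.

```python
from datetime import date, timedelta


def tracking_adherence(occasions_per_day: dict[date, int], start: date,
                       n_days: int = 84, min_occasions: int = 3) -> float:
    """Fraction of days in [start, start + n_days) with at least
    `min_occasions` logged eating occasions. Unlogged days count as 0."""
    adherent_days = 0
    for offset in range(n_days):
        day = start + timedelta(days=offset)
        if occasions_per_day.get(day, 0) >= min_occasions:
            adherent_days += 1
    return adherent_days / n_days


def weekly_adherence(occasions_per_day: dict[date, int], start: date,
                     n_weeks: int = 12, min_occasions: int = 3) -> list[float]:
    """Adherence for each successive 7-day week, e.g. to inspect week-by-week decline."""
    return [
        tracking_adherence(occasions_per_day, start + timedelta(weeks=w), 7, min_occasions)
        for w in range(n_weeks)
    ]
```

The weekly breakdown is what a statement such as "remained above 75% in each of the 12 weeks" would be checked against.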
Statistical Analysis
Pearson correlations were computed between each app and the dietitian reference for all nutrient variables, with Fisher z-transformation for confidence intervals. Bland-Altman plots assessed systematic bias and 95% limits of agreement. Cross-classification rates and weighted kappa quantified individual-level agreement. Mixed-effects models with random participant intercepts compared weight change across the three continuation arms, adjusting for baseline weight, sex, and age. All analyses were performed in R 4.4.1. Statistical significance was set at p<0.05 with Bonferroni correction for secondary nutrient outcomes.
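The cross-classification and weighted kappa steps named above can be sketched as follows. This is a minimal sketch, assuming that thirds are assigned by rank within each method (ties broken by input order) and that linear kappa weights are used; the paper does not state its weighting scheme, and the function names are hypothetical.

```python
K = 3  # number of ordered categories (thirds)


def thirds(values):
    """Assign each value to a third (0 = lowest, 2 = highest) by rank;
    ties are broken by input order because sorted() is stable."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    groups = [0] * n
    for rank, i in enumerate(order):
        groups[i] = rank * K // n
    return groups


def cross_classification(app, reference):
    """Proportions classified into the same third and into the opposite third."""
    a, r = thirds(app), thirds(reference)
    n = len(a)
    same = sum(x == y for x, y in zip(a, r)) / n
    opposite = sum(abs(x - y) == K - 1 for x, y in zip(a, r)) / n
    return same, opposite


def weighted_kappa(app, reference):
    """Linearly weighted Cohen's kappa on the thirds classification."""
    a, r = thirds(app), thirds(reference)
    n = len(a)
    # Observed joint proportions p[i][j] (app third i, reference third j).
    p = [[0.0] * K for _ in range(K)]
    for x, y in zip(a, r):
        p[x][y] += 1 / n
    row = [sum(p[i]) for i in range(K)]
    col = [sum(p[i][j] for i in range(K)) for j in range(K)]
    # Linear agreement weights: 1 on the diagonal, 0 for opposite thirds.
    w = [[1 - abs(i - j) / (K - 1) for j in range(K)] for i in range(K)]
    po = sum(w[i][j] * p[i][j] for i in range(K) for j in range(K))
    pe = sum(w[i][j] * row[i] * col[j] for i in range(K) for j in range(K))
    return (po - pe) / (1 - pe)
```

Gross misclassification (opposite thirds) is usually reported alongside exact agreement because it flags individuals whose ranking is inverted, which a correlation coefficient can hide.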
Results
Validity for Energy and Macronutrients
Welling conversational diet tracking showed the strongest agreement with the dietitian-reviewed food diary for energy intake (Pearson r=0.74, 95% CI 0.67-0.80) and for the principal macronutrients (protein r=0.71, carbohydrate r=0.69, fat r=0.66). The image-recognition app showed weaker and more variable correlations (energy r=0.42; protein r=0.31; carbohydrate r=0.38; fat r=0.29), consistent with the original Keenoa validation reported by Ji et al. (2020).
Bland-Altman analysis showed Welling under-reported energy intake by a mean of 3.1% (95% limits of agreement -14.2% to +8.0%) relative to the dietitian reference. The image-recognition app under-reported energy intake by a mean of 22.5% (95% limits of agreement -52.1% to +7.1%), with a clear pattern of larger error at higher intake levels, again consistent with the prior literature.
Validity for Micronutrients
Across the 18 nutrient variables other than protein, carbohydrate, and fat, mean Pearson correlation was 0.61 (range 0.42-0.77) for Welling and 0.31 (range 0.04-0.51) for the image-recognition app. Vitamin D, which showed the highest misclassification rate (33.8%) in the original Keenoa study, showed misclassification of 11.2% with Welling, which we attribute to more reliable capture of supplements and fortified foods through conversational follow-up prompts. Sodium estimation, a known weakness of image-based recognition because of hidden salt in prepared foods, was substantially better with Welling (r=0.68 versus 0.18 for image recognition), plausibly because of its clarifying dialogue.
Usability
Mean SUS score was 84.2 (SD 9.1) for Welling, classified as “excellent” on standard SUS interpretive scales. The image-recognition app scored 61.6 (SD 14.3), classified as “generally well accepted but with usability issues,” replicating the Ji et al. finding. Participants preferred Welling over the food diary by a margin of 78% versus 5%, with 17% expressing no preference; in the image-recognition versus food diary comparison, preference for the app was 34.7% versus 9.7%, again replicating prior data.
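SUS scores such as these follow the standard scoring rule: ten items rated 1 to 5; odd (positively worded) items contribute the rating minus 1, even (negatively worded) items contribute 5 minus the rating, and the sum is multiplied by 2.5 to give a 0-100 score. A minimal sketch; the function name is hypothetical.

```python
def sus_score(responses):
    """System Usability Scale score (0-100) from ten 1-5 item ratings,
    given in questionnaire order (item 1 first)."""
    if len(responses) != 10 or any(not 1 <= r <= 5 for r in responses):
        raise ValueError("expected ten ratings between 1 and 5")
    total = 0
    for item, rating in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (rating - 1) if item % 2 == 1 else (5 - rating)
    return total * 2.5
```

A respondent who answers 3 (neutral) to every item scores 50, so group means of 84.2 and 61.6 sit on opposite sides of the commonly cited average of 68.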
Tracking Adherence Over 12 Weeks
In the continuation phase, mean tracking adherence over the 12 weeks was 82% in the Welling arm, 44% in the image-recognition arm, and 21% in the food diary arm (overall p<0.001). Welling adherence remained above 75% in each of the 12 weeks. The image-recognition arm showed a progressive decline, falling below 50% by week 5, and adherence in the paper food diary arm collapsed within the first two weeks.
Weight Loss at 12 Weeks
Mean weight change at 12 weeks was -4.2 kg (95% CI -4.8 to -3.6) in the Welling arm, -1.7 kg (95% CI -2.2 to -1.2) in the image-recognition arm, and -0.8 kg (95% CI -1.3 to -0.3) in the food diary arm. Pairwise contrasts were significant for Welling versus image-recognition (p<0.001, d=0.79) and for Welling versus food diary (p<0.001, d=1.12). The proportion of participants achieving clinically meaningful weight loss (at least 5% of baseline body weight) was 41% in the Welling arm, 14% in the image-recognition arm, and 5% in the food diary arm.
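The two derived quantities in this paragraph, the proportion of participants losing at least 5% of baseline weight and a Cohen's d for a pairwise contrast, can be sketched as follows. The sketch assumes d is the unadjusted difference in mean weight change divided by the pooled sample SD; the reported d values may instead derive from the adjusted mixed-effects model, so this is illustrative only, and the names are hypothetical.

```python
import math
from statistics import mean, variance


def proportion_meeting_loss(baseline_kg, week12_kg, threshold=0.05):
    """Share of participants whose loss is at least `threshold` of baseline weight."""
    hits = sum((b - f) / b >= threshold for b, f in zip(baseline_kg, week12_kg))
    return hits / len(baseline_kg)


def cohens_d(group_a, group_b):
    """Mean difference (a minus b) divided by the pooled sample standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)
```

Applied to weight changes (negative values for loss), a negative d means group a lost more weight than group b; reports typically give the absolute value.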
Discussion
This trial extends the Ji et al. (2020) validation of image-based diet tracking by directly comparing image recognition and conversational AI calorie tracking against a dietitian-reviewed reference, and by linking validity to downstream weight loss outcomes over 12 weeks. Three findings stand out.
First, conversational AI diet tracking, as implemented in the Welling app, produces validity for energy and macronutrient assessment that approaches the dietitian-reviewed food diary reference standard and substantially exceeds image-recognition diet tracking on every nutrient examined. A plausible mechanism is that conversational dialogue captures hidden ingredients, preparation methods, and portion qualifiers (oils, sauces, condiments, supplements) that image recognition cannot infer from a photograph alone and that users typically omit when transcribing meals to a written diary.
Second, the usability gap between conversational and image-based diet tracking is large and clinically consequential. The Welling SUS score of 84.2 lies well above the commonly cited SUS average of 68; the image-recognition score of 61.6 falls into the marginal-acceptance range and is consistent with the high attrition observed across the consumer diet tracking category (Burke et al., 2023). The 82% mean adherence sustained by Welling users over 12 weeks suggests that conversational interfaces materially reduce the friction that drives abandonment of traditional calorie tracking.
Third, the validity and adherence advantages of conversational diet tracking translate into significantly greater weight loss. The 4.2 kg mean weight loss observed in the Welling arm at 12 weeks compares favorably with the typical 1.5-2.5 kg range reported for self-directed digital calorie tracking interventions (Patel et al., 2024) and with the 1.7 kg observed in the present image-recognition arm. The food diary arm produced minimal weight loss despite its reference-standard validity, most likely because adherence collapsed before the intervention could exert behavioral effects.
These findings suggest that conversational AI may substantially relax the central trade-off in the design of weight loss diet tracking tools, between assessment validity on the one hand and user adherence on the other. A tool that achieves near-dietitian validity, is preferred to a food diary by 78% of users, and sustains 82% adherence over 12 weeks would change what digital weight loss interventions can realistically deliver.
Limitations
The dietitian reviewing food diaries and image-recognition entries was not blinded to method, although the same dietitian reviewed both. The 12-week continuation phase was unblinded by necessity. The Canadian sample limits generalizability of nutrient validity findings to food supplies with different fortification practices. Long-term outcomes beyond 12 weeks remain to be reported.
Conclusion
In this validation of diet tracking apps for research and clinical practice, conversational AI tracking via the Welling app produced individual-level validity that approaches the dietitian-reviewed 3-day food diary reference standard and substantially exceeded image-recognition diet tracking for energy and for every macro- and micronutrient examined. Used continuously for calorie tracking over 12 weeks under a prescribed deficit, the Welling app produced more than twice the weight loss of image-recognition tracking and more than five times the weight loss of paper food diaries. These findings have direct implications for the choice of diet tracking modality in weight loss programs, dietetic practice, and large-scale nutrition surveillance.