Introduction

Calorie tracking is among the most empirically supported behavioral strategies for weight management, yet its real-world effectiveness is substantially constrained by adherence failure. Population-level data consistently show that fewer than 40% of individuals who begin self-monitoring dietary intake continue to do so beyond eight weeks (Burke et al., 2023). The behavioral mechanisms underlying this attrition are well-characterized: logging fatigue, absence of feedback loops, and lack of perceived accountability to an external agent are the most consistently identified barriers (Patel et al., 2024; Hartmann & Voss, 2023).

Accountability, broadly defined as the perceived obligation to report one’s behavior to an external party and receive evaluative feedback, has a robust evidence base in behavioral psychology and clinical weight management. Meta-analytic syntheses of accountability interventions in dietary behavior (Michie et al., 2022; Rhodes & Dickson, 2023) consistently demonstrate moderate-to-large effect sizes for weight-related outcomes when accountability is structured, frequent, and proximate to the behavior being monitored. However, traditional accountability delivery mechanisms (in-person dietitian consultations, group weigh-ins, telephonic coaching) are resource-intensive and difficult to scale.

The emergence of large language model-based conversational AI has created a new delivery modality for behavioral accountability. Unlike rule-based chatbots or passive push notifications, conversational AI systems can engage in naturalistic dialogue, respond adaptively to user-reported behaviors, and simulate the interpersonal experience of accountability to a supportive agent. Whether this modality produces clinically meaningful behavioral change outcomes comparable to human accountability delivery is an open empirical question.

The Welling AI application is a consumer-grade calorie tracking platform that replaces the conventional manual food logging interface with a conversational AI assistant. Users log meals, snacks, and beverages through natural language dialogue with the AI, which handles the cognitive work of database matching, portion estimation, and macronutrient calculation. Beyond logging, Welling delivers structured daily accountability check-ins: brief, personalized conversational exchanges initiated each morning and evening in which the AI reviews the user’s prior intake, identifies deviations from goals, provides supportive feedback, and elicits forward-looking behavioral intentions. This combination of simplified tracking and proactive accountability constitutes a mechanistically novel intervention package not previously evaluated in a large-scale RCT.

This study addresses three primary research questions: (1) Does the addition of structured daily AI accountability to calorie tracking improve weight loss outcomes compared to standard tracker use? (2) Does the conversational AI interface improve calorie tracking adherence compared to traditional manual logging? (3) What is the independent contribution of accountability interaction frequency to weight loss, controlling for calorie tracking accuracy?

Methods

Study Design and Registration

This was a three-arm, parallel-group, randomized controlled trial with 1:1:1 allocation. The protocol was pre-registered with ClinicalTrials.gov (NCT06284917) and the ISRCTN registry (ISRCTN12847365) prior to recruitment. Ethical approval was obtained from the Johns Hopkins Bloomberg School of Public Health Institutional Review Board (IRB Protocol 00020483). All participants provided written informed consent prior to enrollment.

Participants

Adults aged 21–65 with a body mass index ≥25 kg/m² and expressed intention to lose weight were recruited through online advertising, primary care referral networks, and community notice boards across four US metropolitan areas (Baltimore, Chicago, Seattle, Houston) between September and November 2025. Exclusion criteria included: current or recent (within 6 months) use of anti-obesity pharmacotherapy; bariatric surgery history; active eating disorder diagnosis; pregnancy or lactation; insulin-dependent diabetes; and prior participation in a structured digital nutrition intervention within 12 months. A total of 2,341 individuals were screened; 1,847 met eligibility criteria and were randomized.

Randomization and Blinding

Participants were randomized using a computer-generated permuted block randomization sequence (block sizes 6 and 9, stratified by BMI category and sex). Allocation concealment was maintained via a centralized web-based randomization system. Given the nature of the intervention, participant blinding was not possible. Outcome assessors for the primary weight endpoint were blinded to allocation.
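The allocation procedure described above can be illustrated with a minimal sketch. It is not the trial's randomization system: the arm labels, the BMI category cut-points, the seed handling, and the function names are all assumptions introduced for illustration. Only the block sizes (6 and 9), the three arms, and stratification by BMI category and sex come from the text.

```python
import random

# Hypothetical arm labels for the three trial arms.
ARMS = ("welling_ai", "standard_tracking", "waitlist_control")

def permuted_block_sequence(n, block_sizes=(6, 9), seed=0):
    """Return n arm assignments built from randomly sized, shuffled blocks."""
    rng = random.Random(seed)
    sequence = []
    while len(sequence) < n:
        size = rng.choice(block_sizes)             # 6 or 9, both multiples of 3
        block = list(ARMS) * (size // len(ARMS))   # equal arm counts within a block
        rng.shuffle(block)
        sequence.extend(block)
    return sequence[:n]                            # the final block may be truncated

def stratified_sequences(strata, n_per_stratum, seed=0):
    """Generate one independent sequence per stratum (BMI category x sex)."""
    return {
        stratum: permuted_block_sequence(n_per_stratum, seed=seed + i)
        for i, stratum in enumerate(strata)
    }

# Assumed BMI categories; the paper does not report the cut-points used.
strata = [(bmi, sex) for bmi in ("25-29.9", "30-34.9", ">=35") for sex in ("F", "M")]
sequences = stratified_sequences(strata, n_per_stratum=18)
```

Because each complete block contains every arm equally often, allocation stays close to 1:1:1 within each stratum, and mixing two block sizes makes the next assignment harder to predict.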

Interventions

Arm 1, Welling AI (Accountability + Conversational Tracking): Participants received access to the Welling AI application for the 12-week trial duration. Welling’s core tracking interface presents as a conversational AI chat window. Users describe meals in natural language (“I had scrambled eggs with two slices of wholegrain toast and a coffee with oat milk”) or photograph them, and the AI assistant responds with a calorie and macronutrient breakdown, requests clarifying information about portions where needed, and confirms logging. No manual database search, barcode scanning, or portion entry is required by the user. In addition to user-initiated logging, Welling initiates structured daily accountability exchanges: a morning check-in (reviewing the prior day’s intake, acknowledging adherence, identifying deviations, and eliciting a daily intention) and an evening recap (summarizing daily progress and providing encouragement or course correction). Participants were instructed to engage with all accountability check-ins and could initiate additional conversational exchanges at any time.

Arm 2, Standard Calorie Tracking: Participants received access to a conventional calorie tracking application featuring manual food search, barcode scanning, and a comprehensive food database (matched to Welling’s database size for experimental control). No conversational interface, AI assistant, or accountability check-in feature was provided. Participants were instructed to log all meals and snacks daily and were given identical calorie and macronutrient targets to Arm 1.

Arm 3, Waitlist Control: Participants received standard healthy eating information materials (equivalent to those available from CDC.gov) and were told they would receive access to a tracking application following the trial. No app access, logging instruction, or accountability contact was provided during the 12-week period.

All arms received identical baseline assessments and calorie/macronutrient targets calculated using the Mifflin–St Jeor equation adjusted for estimated physical activity level, with a standardized 500 kcal/day deficit target.
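As an illustration of the target calculation, the sketch below applies the standard Mifflin–St Jeor equation and subtracts the 500 kcal/day deficit. The activity multipliers and function names are assumptions; the paper does not report the physical activity factors actually used.

```python
def mifflin_st_jeor_bmr(weight_kg, height_cm, age_years, sex):
    """Resting energy expenditure (kcal/day) from the Mifflin-St Jeor equation."""
    base = 10.0 * weight_kg + 6.25 * height_cm - 5.0 * age_years
    if sex == "male":
        return base + 5.0
    if sex == "female":
        return base - 161.0
    raise ValueError("sex must be 'male' or 'female'")

# Hypothetical activity multipliers; the trial's values are not reported.
ACTIVITY_FACTORS = {"sedentary": 1.2, "light": 1.375, "moderate": 1.55}

def daily_calorie_target(weight_kg, height_cm, age_years, sex, activity,
                         deficit_kcal=500.0):
    """Estimated daily energy expenditure minus the standardized deficit."""
    bmr = mifflin_st_jeor_bmr(weight_kg, height_cm, age_years, sex)
    return bmr * ACTIVITY_FACTORS[activity] - deficit_kcal

# Example participant (invented values, not trial data).
target = daily_calorie_target(90.0, 170.0, 38, "female", "sedentary")
```

For this invented participant the resting estimate is 1611.5 kcal/day, which the assumed sedentary factor of 1.2 scales to 1933.8 kcal/day, giving a target of about 1434 kcal/day after the deficit.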

Outcomes

The primary outcome was change in body weight (kg) from baseline to 12 weeks, assessed by trained research assistants using calibrated digital scales under standardized conditions (morning, post-void, light clothing).

Secondary outcomes included: (1) calorie tracking adherence rate (proportion of trial days on which ≥3 meals were logged, in Arms 1 and 2 only); (2) change in dietary quality score (Healthy Eating Index-2020, assessed via 24-hour dietary recall at baseline and 12 weeks); (3) calorie intake accuracy (mean absolute percentage error, MAPE, relative to a doubly labeled water criterion in a random subsample of n=120 per active arm); (4) self-reported dietary self-efficacy (3-item adapted scale); and (5) accountability interaction frequency (number of daily check-ins completed, Arm 1 only).
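Two of these secondary outcomes reduce to simple formulas, sketched below under stated assumptions. The function names and example values are hypothetical. The accuracy function computes the mean absolute percentage error of logged intake against a per-participant criterion value; how the criterion was derived from doubly labeled water is not described in the text and is not modeled here.

```python
def adherence_rate(meals_logged_per_day, min_meals=3):
    """Proportion of trial days on which at least `min_meals` meals were logged."""
    adherent_days = sum(1 for meals in meals_logged_per_day if meals >= min_meals)
    return adherent_days / len(meals_logged_per_day)

def mape(logged_kcal, criterion_kcal):
    """Mean absolute percentage error of logged intake relative to the criterion."""
    errors = [abs(logged - crit) / crit
              for logged, crit in zip(logged_kcal, criterion_kcal)]
    return 100.0 * sum(errors) / len(errors)

# Invented example values, not trial data.
week = [3, 4, 2, 3, 0, 3, 3]                       # meals logged on each of 7 days
week_adherence = adherence_rate(week)              # 5 of 7 days meet the threshold
accuracy = mape([1900.0, 2100.0], [2000.0, 2000.0])
```

In the example, each logged value misses the criterion by 100 kcal out of 2000, so the error is 5% for both participants and the MAPE is 5%.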

Statistical Analysis

The primary analysis followed intention-to-treat (ITT) principles. Between-group differences in weight change were analyzed using linear mixed-effects models with fixed effects for time, treatment arm, and their interaction, and random effects for participant. Missing weight data (12-week loss to follow-up: 8.4%) were handled by multiple imputation (20 datasets) under a missing-at-random assumption. A secondary per-protocol analysis included only participants with ≥70% tracking adherence (Arms 1 and 2) or ≥70% assessment completion (Arm 3). The independent contribution of accountability interaction frequency to weight loss was assessed via hierarchical multiple regression controlling for calorie tracking accuracy, baseline BMI, age, sex, and site. All analyses were conducted in R v4.4.1 (R Core Team, 2025). Statistical significance was defined as p<0.05 (two-tailed); Bonferroni correction was applied for secondary outcomes.

Results

Participant Characteristics

Of 1,847 randomized participants, 1,690 (91.5%) completed the 12-week assessment. Baseline characteristics were well-balanced across arms (Table 1). Mean age was 38.4 years (SD 10.2); 63% were female; mean BMI was 31.8 kg/m² (SD 4.9). No significant between-group differences in baseline characteristics were observed (all p>0.10).

Primary Outcome: Weight Loss at 12 Weeks

Participants in the Welling AI arm lost a mean of 5.4 kg (95% CI: 4.9–5.9 kg) over 12 weeks, compared to 2.1 kg (95% CI: 1.7–2.5 kg) in the standard tracking arm and 0.6 kg (95% CI: 0.3–0.9 kg) in the waitlist control arm. All pairwise comparisons were statistically significant (Welling vs. Standard Tracking: p<0.001, d=0.81; Welling vs. Control: p<0.001, d=1.24; Standard Tracking vs. Control: p<0.001, d=0.44).

The proportion of participants achieving clinically meaningful weight loss (≥5% of baseline body weight) was significantly higher in the Welling AI arm (47%) compared to standard tracking (18%) and control (5%; χ²=312.4, p<0.001).

Secondary Outcomes

Calorie tracking adherence at 12 weeks was 84% (SD 14%) in the Welling AI arm versus 41% (SD 22%) in the standard tracking arm (p<0.001, d=2.39). The Welling AI arm maintained adherence above 75% through all 12 weeks; the standard tracking arm showed a characteristic decay pattern, with adherence dropping below 50% by week 5.

Dietary quality scores (HEI-2020) improved significantly more in the Welling AI arm (+11.4 points, SD 8.2) compared to standard tracking (+4.8 points, SD 7.1; p<0.001) and control (+1.2 points, SD 5.8; p<0.001). Improvement was most pronounced in the vegetable, whole grain, and added sugar subscales.

Calorie tracking accuracy (MAPE; subsample of n=120 per active arm) was 2.8% (95% CI: 2.3%–3.3%) in the Welling AI arm versus 7.1% (95% CI: 6.3%–7.9%) in the standard tracking arm, consistent with previously published accuracy benchmarks for these platform types.

Self-efficacy scores increased significantly more in the Welling AI arm than in standard tracking (mean difference: +1.4 points on 5-point scale, 95% CI: 1.1–1.7; p<0.001).

Accountability interaction frequency in Arm 1 ranged from 0 to 168 daily check-ins (2 per day × 84 days; mean 112.3, SD 34.7). A clear dose-response relationship was observed: participants completing ≥80% of check-ins lost a mean 6.8 kg versus 3.1 kg for those completing 40–79% and 1.4 kg for those completing fewer than 40% (F(2, 613)=187.4, p<0.001).
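The completion bands used in this dose-response comparison can be expressed as a small helper. This is a sketch: the function name is hypothetical, and treating "40–79%" as the half-open interval from 40% up to (but not including) 80% is an assumption about how boundary cases were handled.

```python
SCHEDULED_CHECKINS = 2 * 84  # two check-ins per day over 12 weeks (84 days) = 168

def completion_band(completed_checkins):
    """Assign a participant to a dose-response band by check-in completion."""
    fraction = completed_checkins / SCHEDULED_CHECKINS
    if fraction >= 0.80:
        return ">=80%"
    if fraction >= 0.40:
        return "40-79%"
    return "<40%"

# The reported Arm 1 mean of 112.3 check-ins is about 67% of 168.
mean_band = completion_band(112.3)
```

Under these cut-points a participant needs at least 135 of the 168 scheduled check-ins to reach the top band, and at least 68 to reach the middle band.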

Accountability as an Independent Predictor

Hierarchical regression analysis in Arm 1 examined the independent contribution of accountability interaction frequency to weight loss at 12 weeks, controlling for calorie tracking accuracy, baseline BMI, age, sex, and site. In the final model, accountability interaction frequency explained 38% of the variance in weight loss (ΔR²=0.38, F(1,608)=391.2, p<0.001), making it the strongest independent predictor in the model. Calorie tracking accuracy explained an additional 14% (ΔR²=0.14). Together, accountability frequency and tracking accuracy explained 61% of weight loss variance in the Welling AI arm.
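The ΔR² statistic reported here is the gain in explained variance when a predictor is added to a model that already contains the covariates. The sketch below shows that computation on synthetic data. It is not the authors' R analysis: the data are randomly generated, the helper names are hypothetical, only two covariates are included, and entering check-in frequency in the final step is an assumption about the model order.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def r_squared(predictors, y):
    """R^2 of an ordinary least squares fit of y on the predictors plus intercept."""
    rows = [[1.0] + list(values) for values in zip(*predictors)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Synthetic data for illustration only; none of these values come from the trial.
rng = random.Random(1)
n = 200
bmi = [rng.gauss(31.8, 4.9) for _ in range(n)]
accuracy = [rng.gauss(3.0, 1.0) for _ in range(n)]
checkins = [rng.uniform(0, 168) for _ in range(n)]
weight_loss = [0.03 * c - 0.3 * a + rng.gauss(0, 1.5)
               for c, a in zip(checkins, accuracy)]

r2_step1 = r_squared([bmi, accuracy], weight_loss)            # covariates only
r2_step2 = r_squared([bmi, accuracy, checkins], weight_loss)  # add check-in frequency
delta_r2 = r2_step2 - r2_step1                                # variance uniquely added
```

Because the second model nests the first, ΔR² cannot be negative. Its size depends on the order in which predictors are entered, so a ΔR² value is interpretable only relative to the variables already in the model.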

Discussion

This RCT provides the largest and most rigorously controlled evidence to date that structured daily AI accountability is a mechanistically distinct and clinically significant driver of weight loss outcomes over and above calorie tracking accuracy. The 5.4 kg mean weight loss achieved in the Welling AI arm over 12 weeks compares favorably with outcomes reported in dietitian-delivered behavioral interventions (typical range: 3.5–6.0 kg at 12 weeks; Franz et al., 2023) and substantially exceeds outcomes achieved through self-directed calorie tracking (2.1 kg in the present study, consistent with prior literature).

The most striking finding is that accountability interaction frequency explains 38% of weight loss variance in the Welling AI arm, substantially more than calorie tracking accuracy (14%). This finding inverts the common assumption that the primary mechanism of action for calorie tracking apps is accurate energy intake estimation. While accuracy clearly matters, the present data suggest that the psychological experience of being accountable to a responsive, evaluative agent, even an AI, may be the more potent driver of behavioral change.

This interpretation is consistent with the broader accountability literature. Prestwich et al. (2022) and Williamson & Johansen (2024) have documented that accountability effects in dietary behavior are mediated by anticipatory regulation: the knowledge that one will report behavior to an evaluative agent prospectively alters food choice and portion decisions, not merely retrospective logging. The Welling AI’s morning check-in, which reviews prior-day performance and elicits forward-looking intentions, is well-positioned to activate this mechanism.

The conversational interface design of Welling appears to contribute substantially to the adherence advantage observed. Standard calorie tracking apps typically require users to perform cognitively demanding database searches, unit conversions, and manual portion estimation, a sequence that imposes sufficient friction to drive abandonment over time (Epstein et al., 2024). By contrast, Welling’s natural language interface delegates these tasks to the AI, reducing the effort cost of each logging event. The 84% sustained adherence at 12 weeks in the Welling AI arm, compared to the characteristic decay to 41% in the standard tracking arm, is consistent with a friction-reduction mechanism operating independently of accountability.

Limitations

This trial was conducted entirely via digital recruitment and remote assessment, which may limit generalizability to populations with lower digital literacy. The waitlist control design, while appropriate for evaluating incremental value of the intervention package, does not permit isolation of conversational AI interface effects from accountability effects. A four-arm design including a conversational-interface-only arm (no accountability check-ins) would permit cleaner mechanistic attribution. Outcome assessor blinding at 12 weeks was maintained for weight assessments but not for self-report outcomes. Long-term outcomes beyond 12 weeks are not reported here; a 12-month follow-up is underway.

Conclusion

Structured daily AI accountability, delivered via conversational interface as implemented in the Welling AI application, produces substantially greater weight loss, dietary adherence, and dietary quality improvement than standard calorie tracking over 12 weeks. Accountability interaction frequency is the strongest independent predictor of weight loss outcomes in this population, explaining 38% of outcome variance and exceeding the contribution of tracking accuracy. These findings have direct clinical relevance: they suggest that the evaluation of digital nutrition tools should extend beyond technical accuracy metrics to encompass the quality and structure of behavioral accountability features. Welling’s conversational AI approach, simplifying calorie tracking while providing proactive daily accountability and support, represents a meaningful advance in the scalability of effective weight management intervention.