Detecting Oscillating Submission Lags in HBO Therapy Claims: Two Variance‑Based Fraud Detection Metrics for Medicare Program Integrity
Author's Note – The Birth of Submission Lag and Its Two Derivative Metrics
Medicare claims contain approximately 19 different date fields. Most analysts focus on the standard ones—service date, admission date, discharge date. I found myself staring at two in particular: the claim-line from date (the date the service was actually provided) and the submission date (the date the provider sent the claim to Medicare). The difference between them wasn't a standard metric—it didn't exist in any CMS database. So I created it: Submission Lag.
Why did I create it? Because I asked a simple, intuitive question that no one else seemed to ask:
"How long did it take for the provider to ask for their money?"
The time value of money is a fundamental principle of finance. A dollar today is worth more than a dollar tomorrow because it can be invested to earn interest. For healthcare providers, this means there is a clear financial incentive to submit claims as quickly as possible. Delaying submission—even by a few weeks—costs the provider potential interest income.
So when I saw a provider waiting 100 days to submit a claim, my reaction was immediate:
"Hold up! This dude asked for his money 100 days later?!"
That's not just a data point—it's a behavioral signal. No rational business delays payment without a reason. A provider who intentionally submits claims with long, irregular delays is behaving in a way that is economically irrational—unless they are gaining something else from the delay. That "something else" might be avoiding prepayment edits, obscuring billing patterns, or hiding excessive sessions.
Seen through this economic lens, the alternating lag pattern was immediately suspicious. It looked less like a statistical anomaly and more like a deliberate evasion tactic, given away by behavior that makes no financial sense on its own.
At first, I included submission lag in my queries for Part B outpatient claims, such as allergy serum tests. A beneficiary takes a test, learns what they are allergic to, and has no reason to repeat it, so there was no series of claims for the same patient over time. Submission lag was present, but with no sequence to order, there was no pattern to see.
Then I turned to long-term therapies—treatments like Hyperbaric Oxygen Therapy (HBO), where a patient might receive up to 60 sessions over a 365‑day period. Now I had what I needed: a series of claims per patient, ordered by service date.
When I calculated submission lag for each claim in that series, something emerged: an alternating pattern.
Provider A: 45, 55, 45, 55...
Provider B: 0, 100, 0, 100...
The mean lag was identical across providers (50 days)—but the rhythm was completely different. One pattern looked like normal batching. The other looked deliberate—and economically irrational.
That's when I realized: submission lag wasn't just a curiosity—it was a behavioral signal.
But I didn't stop there. I derived two variance-based metrics from it:
Variance of submission lag – to measure the amplitude of oscillation.
Variance of reordering – to measure how far claims are submitted out of service order.
I asked a PhD biostatistician (10x published cancer researcher) what metric to use to capture this oscillation. He said: "Mean."
I used variance. Why? Because I wasn't looking for an average. I was looking for a rhythm—a pattern that didn't make economic sense. Oscillation is about dispersion—how far things swing around the center. That's variance, not mean.
He looked at my work and said:
"I've been curious as to how variance was used in models our co-competitors developed, and you figured it out on your own."
That day, he endorsed me for Statistics and SAS on my LinkedIn profile. He made me earn it—and I respect him deeply for that.
This is intuitive analytics: building your own tools, testing them, recognizing a pattern that defies basic economics, and applying the right statistic—not by default, but by insight.
Definition of Key Term – Submission Lag
Throughout this paper, submission lag is defined as:
Submission Lag = Submission Date – Claim‑Line From Date (in days)
Where:
Claim‑Line From Date = the date the service was actually provided (e.g., date of HBO therapy session).
Submission Date = the date the provider submitted the claim to the payer (e.g., Medicare).
Examples:
Service on Jan 1, submitted on Jan 1 → lag = 0 days.
Service on Jan 1, submitted on Feb 20 → lag = 50 days.
This lag is not inherently suspicious; providers may batch claims weekly or monthly. However, certain patterns of lags can indicate manipulation.
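The two examples above can be reproduced with a minimal SAS sketch. The year 2024 and the names lag_demo, service_date, submission_date, and lag_days are illustrative assumptions, not CMS field names.

```sas
/* Minimal sketch: submission lag for the two worked examples.      */
/* Assumption: dates are stored as numeric SAS dates, so            */
/* subtracting them gives a difference in days.                     */
data lag_demo;
  input service_date :date9. submission_date :date9.;
  format service_date submission_date date9.;
  lag_days = submission_date - service_date;
  datalines;
01JAN2024 01JAN2024
01JAN2024 20FEB2024
;
run;

/* Prints lag_days = 0 for the first claim and 50 for the second */
proc print data=lag_demo noobs;
run;
```

If the date fields arrive as character strings, they would first need converting with an informat (for example via INPUT) before the subtraction is meaningful.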
Executive Summary
The Issue
Medicare limits Hyperbaric Oxygen Therapy (HBO) to 60 sessions per 365 days based on service dates. Some providers manipulate submission dates – not service dates – to evade prepayment edits and hide excessive sessions. They create an alternating pattern of submission lags (e.g., 0 days, then 100 days, then 0, then 100…). Both the mean lag and simple batching rules miss this pattern.
The Solution
Two variance‑based metrics calculated from sorted claim sequences:
1. Variance of submission lag – measures the amplitude of oscillation.
2. Variance of reordering – measures how far claims are submitted out of service order.
Together, they flag providers who are “gaming” the submission timing.
Impact
In a pilot review of 45 HBO providers, two had extreme values on both metrics. Audit confirmed one case of backdating (90+ days) and one case of exceeding the 60‑session limit (85 sessions). Both were referred for recovery.
Personal note – earning the endorsement
For nearly a year, I had asked Dr. Zhenhua Huang (PhD in biostatistics) for a LinkedIn endorsement for SAS and Statistics. He never responded. After I showed him this variance‑based approach (he had himself been trying to work out how others used variance in similar models), he finally gave the endorsement. This paper is dedicated to that principle: earn it.
1. The Scam – “Submission Lag Offsetting”
The rule: No more than 60 HBO sessions in any rolling 365‑day period (by service date).
The cheat:
Deliver 60 sessions legitimately (service dates Jan–Jun).
Submit half on time (lag=0), half with a long lag (e.g., 100 days).
Deliver a second block of sessions later in the year, but submit those with the opposite pattern.
Why?
Many payers run prepayment edits only on claims submitted within 90 days, so a claim held for 100 days falls outside that window. The alternating pattern therefore lets half the claims skip prepayment checks. In addition, when claims are sorted by submission date, the two blocks interleave, which hides the true density of service dates.
The clue: Normal providers have low‑variance lags (e.g., all 45 days). Alternating schemes produce high variance and scrambled submission order.
2. Metrics – Technical Definition
Let a provider have n claims for a given patient (or aggregated at provider level). Sort claims by service date (oldest to newest). Assign service_order = 1,2,…,n.
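Define submission lag for claim i:
$$\text{lag}_i = \text{submission\_date}_i - \text{service\_date}_i \quad \text{(days)}$$
Metric 1 – Variance of lag
$$\text{var\_lag} = \frac{1}{n-1}\sum_{i=1}^{n}\left(\text{lag}_i - \overline{\text{lag}}\right)^2, \qquad \overline{\text{lag}} = \frac{1}{n}\sum_{i=1}^{n}\text{lag}_i$$
Metric 2 – Variance of reordering
Sort claims by submission date (ties broken by service date). Assign submission_order = 1,2,…,n. For each claim, compute the absolute difference:
$$d_i = \left|\,\text{service\_order}_i - \text{submission\_order}_i\,\right|$$
Then calculate:
$$\text{var\_reorder} = \frac{1}{n-1}\sum_{i=1}^{n}\left(d_i - \bar{d}\right)^2, \qquad \bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i$$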
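As a hand-worked illustration (not pilot data), take four claims with hypothetical service days 1, 2, 3, 4 and alternating lags of 0, 100, 0, 100 days. Here d_i is the absolute gap between a claim's service order and its submission order, and both variances use the n−1 sample denominator that the SAS code uses. Under those assumptions the arithmetic runs roughly as follows:

```latex
\begin{aligned}
\text{service day} &: 1,\ 2,\ 3,\ 4 \\
\text{lag (days)} &: 0,\ 100,\ 0,\ 100 \\
\text{submission day} &: 1,\ 102,\ 3,\ 104 \\
\text{service\_order} &: 1,\ 2,\ 3,\ 4 \\
\text{submission\_order} &: 1,\ 3,\ 2,\ 4 \\
d_i &: 0,\ 1,\ 1,\ 0 \\
\overline{\text{lag}} &= 50, \qquad \text{var\_lag} = \tfrac{1}{3}\left(4 \times 50^2\right) \approx 3333.3 \\
\bar{d} &= 0.5, \qquad \text{var\_reorder} = \tfrac{1}{3}\left(4 \times 0.5^2\right) \approx 0.33
\end{aligned}
```

By contrast, a constant 45‑day lag on the same service days leaves submission order equal to service order, so every d_i is 0 and both variances are 0.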
Rule of thumb (based on simulation, n≥15):
Low risk: var_lag below the peer median AND var_reorder ≈ 0
Medium risk: either metric above 75th percentile
High risk: both metrics above 90th percentile (flag for audit)
3. Results from Pilot Data
Using simulated data that mirrored real patterns (n=10 per provider, as in the attached Excel file):
Provider B would be flagged for high var_lag alone. Provider D (random, chaotic submission) would be flagged for both. In real data, high var_reorder without high var_lag might indicate a different issue (e.g., frequent resubmissions). The two‑metric approach reduces false positives.
4. Discussion
Why variance beats mean:
Mean lag is blind to oscillation. Variance captures the amplitude. This is what distinguishes suspicious alternating patterns from normal batch billing.
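A small SAS sketch makes the point concrete using the two lag sequences from the Author's Note (45/55 and 0/100). The dataset name lag_patterns and the choice of 10 claims per provider are illustrative assumptions.

```sas
/* Illustrative only: two alternating lag sequences, 10 claims each */
data lag_patterns;
  length provider_id $1;
  do i = 1 to 10;
    provider_id = 'A'; lag_days = ifn(mod(i, 2) = 1, 45, 55);  output;
    provider_id = 'B'; lag_days = ifn(mod(i, 2) = 1, 0, 100);  output;
  end;
  drop i;
run;

/* Same mean, very different sample variance (denominator n-1) */
proc means data=lag_patterns mean var maxdec=2;
  class provider_id;
  var lag_days;
run;
```

Both providers show a mean of 50 days. The sample variance is about 27.78 for Provider A and about 2777.78 for Provider B, a hundredfold gap that the mean cannot see.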
Why reordering matters:
A provider who batches every 45 days will have zero reordering variance. A provider who alternates will scramble submission order, producing positive reordering variance. The combination is powerful.
Limitations:
Small claim counts (<15) give unstable variances.
Trends (e.g., linearly increasing lags) also increase variance; detrending may be required.
Not diagnostic – flags only indicate need for audit.
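One way the detrending limitation might be handled is to regress lag on service order and take the variance of the residuals. The sketch below assumes SAS/STAT (PROC REG) is licensed and uses a hypothetical provider T whose lag grows by 5 days per claim. The names trend_demo, detrended, and lag_resid are illustrative.

```sas
/* Hypothetical data: one provider with a steadily increasing lag */
data trend_demo;
  provider_id = 'T';
  do service_order = 1 to 15;
    lag_days = 10 + 5 * service_order;
    output;
  end;
run;

/* Fit lag_days = a + b * service_order per provider, keep residuals */
proc reg data=trend_demo noprint;
  by provider_id;
  model lag_days = service_order;
  output out=detrended r=lag_resid;
run;
quit;

/* Compare raw variance with the variance of the detrended residuals */
proc means data=detrended n var maxdec=2;
  class provider_id;
  var lag_days lag_resid;
run;
```

For this purely linear series, the raw variance is 500 while the residual variance is essentially zero, so the trend alone would not trigger a flag. A linear fit is only one possible choice. Other trend shapes would need a different model.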
Extensions:
Add autocorrelation at lag 1 to explicitly test for alternation.
Use peer‑group benchmarking (specialty, region) instead of fixed percentiles.
Integrate into automated monthly monitoring dashboard.
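The autocorrelation extension could be sketched as below. It uses a Pearson correlation between each lag and the previous one as a simple stand-in for the lag‑1 autocorrelation. That simplification is an assumption of this sketch, as are the dataset names and the 16 claims per provider.

```sas
/* Illustrative lags: provider A (45/55) and provider B (0/100) */
data acf_demo;
  length provider_id $1;
  do provider_id = 'A', 'B';
    do service_order = 1 to 16;
      if provider_id = 'A' then lag_days = ifn(mod(service_order, 2) = 1, 45, 55);
      else lag_days = ifn(mod(service_order, 2) = 1, 0, 100);
      output;
    end;
  end;
run;

/* Pair each claim with the previous claim's lag within a provider. */
/* LAG() is called on every row and then blanked at group starts.   */
data acf_pairs;
  set acf_demo;
  by provider_id;
  prev_lag = lag(lag_days);
  if first.provider_id then prev_lag = .;
run;

/* Correlation between current and previous lag, per provider */
proc corr data=acf_pairs noprint outp=acf1;
  by provider_id;
  var lag_days;
  with prev_lag;
run;

proc print data=acf1 noobs;
  where _type_ = 'CORR';
run;
```

Both providers come out at −1, because a strict alternation is perfectly negatively correlated with itself one step back whatever its amplitude. The autocorrelation therefore captures the rhythm while var_lag captures the size of the swing, so the two would need to be read together.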
5. Conclusion
A simple, explainable metric – variance – can uncover a sophisticated submission timing scam that mean‑based statistics miss. The dual‑metric approach (lag variance + reordering variance) is easy to implement in SAS, requires no machine learning, and has already led to real recoveries. For program integrity analysts, it’s a new tool in the toolkit.
Acknowledgments
My esteemed and dear friend, Dr. Zhenhua Huang, who made me earn every bit of praise, and whose honesty and rigor I deeply respect.
Code and methodology are open for reuse. Contact me for collaboration or questions.
SAS Implementation
*** Step 1: Sort by provider and service date;
proc sort data=claims out=step1;
by provider_id service_date submission_date; run;
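*** Step 2: Create service_order and compute submission lag in days;
data step2;
set step1;
by provider_id;
if first.provider_id then service_order = 0;
service_order + 1;
*** Assumes service_date and submission_date are numeric SAS dates;
lag_days = submission_date - service_date; run;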
*** Step 3: Sort by provider and submission date to get submission_order;
proc sort data=step2 out=step3;
by provider_id submission_date service_date; run;
data step4;
set step3;
by provider_id;
if first.provider_id then submission_order = 0;
submission_order + 1; run;
*** Step 4: Sort back into service order for variance calculation;
proc sort data=step4 out=final_aligned;
by provider_id service_order; run;
*** Step 5: Compute metrics using PROC SQL (HAVING, not WHERE, filters on the aggregated claim count);
proc sql;
create table provider_metrics as
select provider_id,
count(*) as claim_count,
mean(lag_days) as mean_lag,
var(lag_days) as var_lag,
var(abs(service_order - submission_order)) as var_reorder
from final_aligned
group by provider_id
having count(*) >= 15; quit;
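*** Step 6: Flag outliers (example: above the 90th percentile on both metrics);
*** One PCTLPRE= prefix per VAR variable, giving P_var_lag_75, P_var_lag_90, P_var_reorder_75, P_var_reorder_90;
proc univariate data=provider_metrics noprint;
var var_lag var_reorder;
output out=pctl pctlpre=P_var_lag_ P_var_reorder_ pctlpts=75 90; run;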
data flagged;
if _n_=1 then set pctl;
set provider_metrics;
flag_lag_high = (var_lag > P_var_lag_90);
flag_reorder_high = (var_reorder > P_var_reorder_90);
flag_audit = (flag_lag_high and flag_reorder_high); run;
Notes on the code:
var() in PROC SQL returns sample variance (denominator n-1).
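Ties in submission date are broken by service_date in the second sort, so claims submitted in the same batch keep their service order. This is equivalent to a SQL ROW_NUMBER() ordered by submission date, then service date.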
Minimum claim count (15) ensures stability.
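To try the steps above end to end, a small synthetic claims dataset can be generated first. Everything in this sketch is an assumption for testing only: the 3‑day session spacing, the year 2024, 20 claims per provider, and the two lag patterns borrowed from the Author's Note.

```sas
/* Synthetic test input for the pipeline: NOT real claims data */
data claims;
  length provider_id $1;
  format service_date submission_date date9.;
  do provider_id = 'A', 'B';
    do k = 1 to 20;
      /* Hypothetical schedule: one session every 3 days */
      service_date = '01JAN2024'd + 3 * (k - 1);
      if provider_id = 'A' then lag = ifn(mod(k, 2) = 1, 45, 55);
      else lag = ifn(mod(k, 2) = 1, 0, 100);
      submission_date = service_date + lag;
      output;
    end;
  end;
  drop k lag;
run;
```

With only two providers the percentile cutoffs in Step 6 carry no meaning. The dataset is useful for checking that var_lag and var_reorder are computed as intended, not for testing the flagging thresholds.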