Just a Random Film @justarandomfilm - Tumblr Blog

Bayesian Linear Regression

Normal Linear Regression

It has been a while since my last post, as I was very busy with thesis stuff, and I’ve just finished the Bayesian course. Today, I want to try something new: Bayesian Linear Regression, a model I just learned about from the online course I enrolled in lol

Data used: the simulated data contains 2000 records and 5 attributes: spending (our target attribute), age, gender (Male, Female), salary, and status (Single, Married)

Let’s start with normal Linear Regression (MLE)

The model yields an adjusted R-squared of 0.55. The overall model is significant (F-statistic p-value < 0.05) and each variable is also significant (t-test p-value < 0.05)

Now let’s look at the diagnostic results

1. Linearity and Heteroskedasticity

The residuals are randomly scattered around 0 on the y-axis, and the Tukey test shows no significant evidence of non-linearity.

2. Autocorrelation

Using the Durbin-Watson test, we can see that there is no evidence that the residuals are correlated with each other
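For reference, the Durbin-Watson statistic is easy to compute by hand. The sketch below is only an illustration in plain Python, not the code used for this post, and both residual lists are made up. A value near 2 suggests no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals given in observation order."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    # Sum of squared differences between consecutive residuals
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    # Sum of squared residuals
    den = sum(e ** 2 for e in residuals)
    return num / den


# Made-up residuals that flip sign every step (negative autocorrelation)
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
# Made-up residuals that drift slowly (positive autocorrelation)
drifting = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5]

print(durbin_watson(alternating))  # well above 2
print(durbin_watson(drifting))     # close to 0
```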

3. Independence

Using VIF to check whether our input attributes are related to each other: since the VIF of each variable is < 5, we can conclude that there is no problematic multicollinearity among them
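As a rough illustration of what VIF measures: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. With only two predictors, R_j^2 is just the squared Pearson correlation, which is the simplified case sketched below. The function names and the age/salary values are made up for illustration; the real model has four predictors and would need a full regression per predictor.

```python
def pearson_r(x, y):
    """Pearson correlation between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two predictors (simplified case)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


# Hypothetical values, not the post's data
age = [25, 32, 41, 50, 58, 63]
salary = [52000, 48000, 75000, 61000, 90000, 70000]
print(vif_two_predictors(age, salary))
```

A VIF of exactly 1 means the two predictors are uncorrelated; the value grows as they become more correlated.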

4. Normality

The Shapiro-Wilk test (W = 0.99; p-value = 0.53) and the histogram of the residuals are both consistent with normally distributed residuals. (I have never seen residuals this perfectly normal before lol)

As our model passes all the assumption checks, we can say that the Linear Regression model is good enough to use for prediction and interpretation

Y_hat = 391.7 + 3.633⋅age − 40.80⋅genderMale + 0.003946⋅salary − 46.92⋅statusSingle

We can say that

Older customers tend to spend more money than younger customers

Female customers spend more money than male customers

The higher the salary, the higher the spending

Married people tend to spend more money than single people

Under the assumption of a linear relationship
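To make the fitted equation concrete, here is a small Python sketch that plugs values into it. The coefficients are copied from the equation above; the function name and the example customer (age 40, female, salary 50000, married) are made up for illustration.

```python
# Coefficients copied from the fitted equation in this post
COEF = {
    "intercept": 391.7,
    "age": 3.633,
    "genderMale": -40.80,
    "salary": 0.003946,
    "statusSingle": -46.92,
}


def predict_spending(age, is_male, salary, is_single):
    """Point prediction of spending from the fitted linear equation."""
    return (COEF["intercept"]
            + COEF["age"] * age
            + COEF["genderMale"] * (1 if is_male else 0)
            + COEF["salary"] * salary
            + COEF["statusSingle"] * (1 if is_single else 0))


# Hypothetical customer: 40 years old, female, salary 50000, married
base = predict_spending(40, False, 50000, False)
# Same customer but male: the prediction shifts by the genderMale coefficient
male = predict_spending(40, True, 50000, False)
print(round(base, 2), round(male - base, 2))
```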

We can also use Bayesian Linear Regression, which produces full posterior distributions for the parameters, allowing credible intervals and direct probability statements.

Bayesian Linear Regression

Let’s start with prior distributions (what we believe the parameter distributions look like)

As I don’t know yet what my parameter distributions look like, I use weakly informative priors instead (normal with high variance). The target y is modeled with a t-distribution, so the model can handle outliers better than it would with a normal distribution for y. We also have to set priors for the precision (inverse variance; we use it instead of the variance because it has a closed form) and for nu, the degrees of freedom of the t-distribution.

Then I set up the initial values for the simulation and run the MCMC with n.chain, which sets the number of chains. The chains are independent of each other, so they can run in parallel like separate workers to cut the running time.

Also, we need to set the burn-in period and the number of iterations for each chain.

In this example, I set the burn-in to 1000 iterations and then run 5000 iterations
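To show what chains, burn-in, and kept iterations mean in practice, here is a toy sketch in plain Python. It is not the model from this post: it uses a hand-written random-walk Metropolis sampler on a one-parameter problem (the mean of some made-up data with a wide normal prior). The burn-in of 1000 and the 5000 kept draws match the settings above, while the three chains, starting points, and step size are assumptions.

```python
import math
import random


def log_posterior(mu, data, prior_sd=100.0, noise_sd=1.0):
    """Log posterior (up to a constant): wide normal prior, normal likelihood."""
    log_prior = -0.5 * (mu / prior_sd) ** 2
    log_lik = sum(-0.5 * ((y - mu) / noise_sd) ** 2 for y in data)
    return log_prior + log_lik


def run_chain(data, start, n_burn, n_keep, step_sd, seed):
    """One random-walk Metropolis chain; burn-in draws are discarded."""
    rng = random.Random(seed)
    mu = start
    current = log_posterior(mu, data)
    kept = []
    for i in range(n_burn + n_keep):
        proposal = mu + rng.gauss(0.0, step_sd)
        candidate = log_posterior(proposal, data)
        # Accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, candidate - current)):
            mu, current = proposal, candidate
        if i >= n_burn:
            kept.append(mu)
    return kept


# Made-up data with true mean 5
data_rng = random.Random(0)
data = [data_rng.gauss(5.0, 1.0) for _ in range(50)]

# Three chains from deliberately spread-out starting points
starts = [-10.0, 0.0, 20.0]
chains = [run_chain(data, s, n_burn=1000, n_keep=5000, step_sd=0.3, seed=k)
          for k, s in enumerate(starts)]
for c in chains:
    print(len(c), round(sum(c) / len(c), 3))
```

Starting the chains far apart is what makes the later convergence checks meaningful: if they all end up in the same place after burn-in, that is a good sign.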

We can check the convergence of the model through convergence diagnostics

The trace plot shows no pattern = converged. If it shows a trend or a non-random pattern = need to run longer

The Gelman-Rubin diagnostic also confirms the convergence of the model (near 1 = converged)
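The idea behind the Gelman-Rubin number can be sketched in a few lines: compare the variance between chains with the variance within chains. The version below is the basic formula in plain Python, without the extra corrections that statistical packages usually apply, and the chains are made-up normal draws rather than output from this post's model.

```python
import random


def gelman_rubin(chains):
    """Basic potential scale reduction factor for equally long chains."""
    m = len(chains)
    n = len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    # Between-chain variance B and average within-chain variance W
    b = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    w = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
            for c, mu in zip(chains, means)) / m
    var_hat = (n - 1) / n * w + b / n
    return (var_hat / w) ** 0.5


def fake_chain(center, seed, n=2000):
    """Made-up 'chain' of independent normal draws around a center."""
    rng = random.Random(seed)
    return [rng.gauss(center, 1.0) for _ in range(n)]


# Three chains exploring the same distribution
mixed = [fake_chain(0.0, s) for s in (1, 2, 3)]
# One chain stuck in a different region
stuck = [fake_chain(0.0, 1), fake_chain(0.0, 2), fake_chain(3.0, 3)]

print(round(gelman_rubin(mixed), 3))  # close to 1
print(round(gelman_rubin(stuck), 3))  # clearly above 1
```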

But upon checking the effective sample size, we might need to run longer, since the effective sizes for b[1], b[2] and b[4] are quite small.
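Effective sample size is small when consecutive draws are highly correlated. The sketch below uses one simple estimator in plain Python (n divided by 1 plus twice the sum of autocorrelations, stopping at the first non-positive lag). Statistical packages use more refined estimators, so treat this as an illustration only; both chains are made up.

```python
import random


def effective_sample_size(chain):
    """Simple ESS estimate: n / (1 + 2 * sum of positive-lag autocorrelations)."""
    n = len(chain)
    mean = sum(chain) / n
    centered = [x - mean for x in chain]
    var = sum(c * c for c in centered) / n
    rho_sum = 0.0
    for lag in range(1, n):
        acov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = acov / var
        if rho <= 0:
            break  # stop at the first non-positive autocorrelation
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


rng = random.Random(7)
n = 2000
# Made-up chain of independent draws
independent = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Made-up "sticky" chain: each draw depends strongly on the previous one
sticky = [0.0]
for _ in range(n - 1):
    sticky.append(0.9 * sticky[-1] + rng.gauss(0.0, 1.0))

print(round(effective_sample_size(independent)))
print(round(effective_sample_size(sticky)))
```

The sticky chain has the same number of draws but carries far less information, which is the situation where running the sampler longer helps.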

The results of the model

The performance metrics suggest LR is slightly better than Bayesian LR

This plot shows the observed vs fitted values with 95% credible intervals

The predictive mean with its 95% credible interval also leads to a similar conclusion

Older customers tend to spend more money than younger customers

Female customers spend more money than male customers

The higher the salary, the higher the spending

Married people tend to spend more money than single people

None of the credible intervals covers 0 = every variable is significant for predicting the amount of money spent
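A 95% credible interval can be read straight off the posterior draws by taking the 2.5th and 97.5th percentiles. The sketch below shows that in plain Python. The draws are made up (a normal centered at -40 with standard deviation 8, loosely imitating a negative coefficient), not the posterior from this post.

```python
import math
import random


def credible_interval(draws, level=0.95):
    """Equal-tailed credible interval taken from sorted posterior draws."""
    ordered = sorted(draws)
    n = len(ordered)
    tail = (1.0 - level) / 2.0
    lower = ordered[int(math.floor(tail * (n - 1)))]
    upper = ordered[int(math.ceil((1.0 - tail) * (n - 1)))]
    return lower, upper


# Made-up posterior draws for one coefficient
rng = random.Random(123)
draws = [rng.gauss(-40.0, 8.0) for _ in range(4000)]

lower, upper = credible_interval(draws)
covers_zero = lower <= 0.0 <= upper
print(round(lower, 2), round(upper, 2), covers_zero)
```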

Here is a cool thing about Bayesian Linear Regression

Suppose we have 2 customers:

customer 1 -> age 30, male, income 75000, married

customer 2 -> age 35, female, income 70000, single

What’s the probability that customer 2 spends more money than customer 1?

We can use Monte-Carlo simulation to find the answer

And here’s the result

We can conclude that there is a 76.36% chance that customer 2 spends more money than customer 1, so as the business manager we might prioritize customer 2, for example by giving them a coupon or suggesting a new product.
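The Monte Carlo step itself is short: take paired draws of each customer's predicted spending and count how often customer 2 comes out higher. The sketch below shows only that counting step in plain Python. The draws are made-up normal samples, not this post's posterior, so the printed probability is not meant to reproduce the 76.36% above.

```python
import random


def prob_second_exceeds_first(first_draws, second_draws):
    """Share of paired draws in which the second value is larger."""
    wins = sum(1 for a, b in zip(first_draws, second_draws) if b > a)
    return wins / len(first_draws)


# Made-up draws of predicted spending for two hypothetical customers
rng = random.Random(42)
customer1 = [rng.gauss(700.0, 30.0) for _ in range(10000)]
customer2 = [rng.gauss(720.0, 30.0) for _ in range(10000)]

p = prob_second_exceeds_first(customer1, customer2)
print(f"P(customer 2 > customer 1) ~ {p:.3f}")
```

With real MCMC output, the two lists would hold the predicted spending computed from the same posterior draw of the parameters, so the pairing matters.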

To sum up, even though normal LR can yield better performance, the interpretability of the Bayesian model is something that we can’t ignore. Also, I believe that if we set more appropriate priors that represent the prior knowledge we (may) have, the Bayesian model will yield better performance than classical LR

Link : https://github.com/Filimize/Film_S_blog/tree/main/2026_05_18_Bayesian_Linear_Regression

#statistics #bayesian statistics #SoundCloud

First Blog and it's all Churn Analysis using survival model😭

We normally use survival analysis in the healthcare field. However, it can also be useful in business contexts, especially churn analysis and Recency-Frequency-Monetary (RFM) analysis.

Survival analysis can be better than normal churn analysis methods (classification) because it accounts for the effect of time: the probability of churning changes over time. However, since it depends on time, the data requires time-related variables, unlike standard classification methods. Therefore, it may require more resources (time and funds) during the data-collection process.

In this article, I would like to demonstrate the churn analysis using a survival analysis framework. The code and interpretation are adapted from the tutorial by Emily C. Zabor (https://www.emilyzabor.com/survival-analysis-in-r.html)

Let’s start

We use synthetic data about subscription churn in the US; it contains 500 rows and 7 columns (sex, plan, country, age, churn, first_signup_date, last_active_date)
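Survival models need a duration and an event flag for each customer. A minimal Python sketch of deriving both from the columns above could look like this. The rows are made up, and both the ISO date format and churn = 1 meaning "churned" are assumptions about the data.

```python
from datetime import date


def days_active(first_signup_date, last_active_date):
    """Number of days between signup and last activity (ISO date strings)."""
    start = date.fromisoformat(first_signup_date)
    end = date.fromisoformat(last_active_date)
    return (end - start).days


# Made-up rows using the column names from the dataset
rows = [
    {"first_signup_date": "2023-01-01", "last_active_date": "2024-01-01", "churn": 1},
    {"first_signup_date": "2023-03-15", "last_active_date": "2023-06-15", "churn": 0},
]

durations = [days_active(r["first_signup_date"], r["last_active_date"]) for r in rows]
events = [r["churn"] for r in rows]  # 1 = churned, 0 = still active (censored)
print(durations, events)
```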

Next, I fit a Kaplan–Meier survival model with a 95% confidence interval (CI) and no covariates to capture the trend of churn over time. The plot shows the probability of not churning falling as time goes by. The 95% CI widens with time, indicating that the estimated probability becomes less certain later on.
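For intuition, the Kaplan–Meier estimate is a running product: at every time where someone churns, the survival probability is multiplied by (1 - churned / still at risk). The plain-Python sketch below shows this without confidence intervals. The durations and churn flags are made up, and it is not the code used for the plot.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival probability)] at each time with a churn event."""
    data = sorted(zip(durations, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        churned = 0
        removed = 0
        # Group everyone whose duration equals the current time
        while i < len(data) and data[i][0] == t:
            churned += data[i][1]
            removed += 1
            i += 1
        if churned > 0:
            surv *= 1.0 - churned / at_risk
            curve.append((t, surv))
        # Churned and censored customers both leave the risk set
        at_risk -= removed
    return curve


# Made-up durations in days; event 1 = churned, 0 = censored
durations = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(durations, events)
for t, s in curve:
    print(t, round(s, 4))
```

Censored customers never lower the curve directly, but they shrink the risk set, which is why each later churn event causes a bigger drop.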

However, in real life, it is not only time that matters; other factors also affect the hazard. Therefore, I used Cox regression to check the effect of the variables:

The likelihood ratio test is significant, so at least one independent variable affects the churn hazard. However, sex = male is not significant (p = 0.570 > alpha = 0.05), so we might drop it

To examine the effect of gender on the churn rate further, I use the log-rank test

The test also confirms that there’s no significant difference in churn rate between genders.
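The log-rank test compares, at every churn time, how many customers churned in one group with how many would be expected if both groups shared the same survival curve. Below is a two-group sketch in plain Python. The data is made up, and the p-value uses the chi-square distribution with 1 degree of freedom via math.erfc.

```python
import math


def logrank_test(dur_a, ev_a, dur_b, ev_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    event_times = sorted({t for t, e in zip(dur_a + dur_b, ev_a + ev_b) if e == 1})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for d in dur_a if d >= t)  # at risk in group A
        n_b = sum(1 for d in dur_b if d >= t)  # at risk in group B
        d_a = sum(1 for d, e in zip(dur_a, ev_a) if d == t and e == 1)
        d_b = sum(1 for d, e in zip(dur_b, ev_b) if d == t and e == 1)
        n, d = n_a + n_b, d_a + d_b
        # Observed minus expected events in group A at this time
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square tail, 1 df
    return chi2, p_value


# Made-up groups: identical survival vs clearly different survival
same = logrank_test([5, 10, 15, 20], [1, 1, 1, 1], [5, 10, 15, 20], [1, 1, 1, 1])
different = logrank_test([1, 2, 3, 4, 5, 6], [1] * 6, [10, 11, 12, 13, 14, 15], [1] * 6)
print(same)
print(different)
```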

Since we only have one variable that is significant, I decided to use Kaplan-Meier curves grouped by plan (x = plan) for better visual comparison.

From the results, the basic plan has the highest churn rate, while the premium plan has the lowest among all 3 plans. It is worth noting that even though the premium plan has the lowest churn rate, it has a wider 95% CI than the others, indicating higher uncertainty. Therefore, the company should try launching promotions to keep premium members, besides trying to persuade customers to go for the premium tier.

Extra

The survival analysis can show the characteristics of each group at a specific point in time. For example, after 1 year of subscribing to the service, premium members have a 14% higher chance of not churning than basic subscribers.

We can also observe the median customer lifetime of each group. For example, people who subscribe to the standard plan use the service for about 409 days before churning.
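Both numbers can be read off a Kaplan–Meier curve: the survival probability at a given day is the last step at or before that day, and the median lifetime is the first time the curve reaches 0.5 or lower. Here is a small Python sketch of both lookups. The curve values are made up for illustration and are not the fitted curves from this post.

```python
def survival_at(curve, t):
    """Survival probability at time t from a step curve [(time, probability)]."""
    surv = 1.0
    for time, prob in curve:
        if time > t:
            break
        surv = prob
    return surv


def median_survival(curve):
    """First time the survival probability drops to 0.5 or below, else None."""
    for time, prob in curve:
        if prob <= 0.5:
            return time
    return None


# Made-up step curve: (day, probability of not having churned yet)
curve = [(60, 0.90), (150, 0.80), (300, 0.65), (420, 0.50), (700, 0.30)]

print(survival_at(curve, 365))   # probability of still subscribing after 1 year
print(median_survival(curve))    # median customer lifetime in days
```

If the curve never reaches 0.5 during the observation window, the median is undefined, which is why the function returns None in that case.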

Full code: https://github.com/Filimize/Film_S_blog/tree/main/2025_12_12_Chuch_and_Survival_Analysis

#statistics #survival analysis #churn analysis #data analytics #data science #SoundCloud
