π₯Ύ

Category
Statistics
Published on
May 1, 2023
# Introduction

When running an experiment, sometimes the randomisation unit is different from the analysis unit. In this case, the assumption of independence between each observation may not hold anymore.

When the independent and identically distributed (i.i.d.) assumption is violated, a standard test on the raw data is no longer valid.

Several options are possible. One is to estimate the true variance with the Delta method, explained in a previous post:

πΌDelta Method for A/B testing

Another option is to perform bootstrapping, which we discuss in this article.
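To see why the i.i.d. violation matters, here is a small simulation (all numbers and distributions are hypothetical, not the experiment data): each user gets their own conversion propensity, so sessions from the same user are correlated, and the empirical variance of the session-level conversion rate ends up well above what a naive i.i.d. binomial formula predicts.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical illustration: users are randomised, but sessions are analysed.
# Each user has their own conversion propensity, so sessions from the same
# user are correlated and the session-level i.i.d. assumption fails.
def simulated_conversion_rate(n_users=500):
    propensity = rng.beta(2, 8, size=n_users)      # per-user conversion rate
    sessions = rng.poisson(30, size=n_users) + 1   # sessions per user
    conversions = rng.binomial(sessions, propensity)
    return conversions.sum() / sessions.sum()

rates = np.array([simulated_conversion_rate() for _ in range(2000)])

# A naive binomial variance pretends every session is independent
p = rates.mean()
naive_var = p * (1 - p) / (500 * 31)               # ~31 expected sessions/user
print(f"empirical variance:    {rates.var():.2e}")
print(f"naive i.i.d. variance: {naive_var:.2e}")
```

The naive formula understates the true variance, which is exactly why a standard test on raw session-level data would be too eager to declare significance.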

# Python implementation

1. As experiment data, we have a DataFrame of users who were randomly assigned to the Control or Target group. We recorded their sessions in the app and the number of sessions that generated a conversion.
2. The table contains one row per `user`, with their `group`, total number of `sessions`, total number of `conversions`, and `conversion_rate` calculated as conversions over sessions:

``````| group   | user_id          | sessions | conversions | conversion_rate |
|:--------|:-----------------|---------:|------------:|----------------:|
| Control | b0cc6b25669f1cfb |      150 |          62 |        0.413333 |
| Target  | 1cc2f0c081cff495 |       20 |          11 |        0.550000 |
| Control | 0dfa929aa7cea87a |       31 |           6 |        0.193548 |
| Target  | 0dfa929aa7cea87a |       39 |           9 |        0.230769 |
| Control | e1916d7a661d210f |        3 |           2 |        0.666667 |``````

We check summary statistics, with the conversion rate of each group:

``````# Summary stats
df_summary = (
    df
    .groupby(['group'])
    .agg({'user_id': 'count', 'sessions': 'sum', 'conversions': 'sum'})
    .assign(conversion_rate=lambda x: x['conversions'] / x['sessions'])
)
df_summary``````
``````| group   |   users |   sessions |   conversions |   conversion_rate |
|:--------|--------:|-----------:|--------------:|------------------:|
| Control |     488 |      37689 |          7662 |            0.2032 |
| Target  |     493 |      45106 |          8134 |            0.1803 |``````
3. Define a function to calculate the difference in conversion rates between groups.
4. Note: another possible option would be to look at the difference in the unweighted average of users' conversion rates between groups.

``````# Function to get the statistic
def calculate_difference(data, numerator, denominator, group):
    conv_rates = (
        data
        .groupby(group)
        .agg({numerator: 'sum', denominator: 'sum'})
        .assign(conv_rate=lambda x: x[numerator] / x[denominator])
        ['conv_rate']
    )
    return conv_rates.iloc[1] - conv_rates.iloc[0]``````
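The unweighted alternative mentioned in step 4 can be sketched as follows; the function name and the demo frame are made up for illustration, and every user counts equally regardless of how many sessions they had.

```python
import pandas as pd

def calculate_unweighted_difference(data, numerator='conversions',
                                    denominator='sessions', group='group'):
    # Average each user's own conversion rate, then compare group means
    user_rates = data[numerator] / data[denominator]
    group_means = user_rates.groupby(data[group]).mean()
    return group_means.iloc[1] - group_means.iloc[0]

# Tiny hypothetical example: two users per group
demo = pd.DataFrame({
    'group': ['Control', 'Control', 'Target', 'Target'],
    'sessions': [10, 100, 10, 100],
    'conversions': [5, 10, 6, 20],
})
print(calculate_unweighted_difference(demo))
```

With this metric, a heavy user with many sessions no longer dominates the group average, which changes both the estimate and its interpretation.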
5. Calculate the difference for the observed data, before any resampling. In this example, the Target group converts 2.29 percentage points below Control (−0.0229), matching the summary stats above.
6. ``````# Actual observed difference
observed_difference = calculate_difference(df, 'conversions', 'sessions', 'group')
observed_difference``````
``-0.0229``
7. Perform bootstrapping by randomly assigning groups n times. At each iteration, the difference in conversion rates between the random groups is returned and appended to an array.
8. ``````import numpy as np

# Randomly reassign all users to two groups, n times
n_bootstrap = 10000
bootstrap_difference = []

for i in range(n_bootstrap):
    df['boot_group'] = 'A'
    df.loc[df.sample(frac=0.5).index, 'boot_group'] = 'B'
    bootstrap_difference.append(
        calculate_difference(df, 'conversions', 'sessions', 'boot_group')
    )

bootstrap_difference = np.array(bootstrap_difference)``````
9. Finally, compute the p-value as the share of bootstrap samples where the difference in conversion rates is at least as extreme as the observed difference. We compare absolute values because we're running a two-tailed test.
10. This is the very definition of a p-value: if the groups were truly interchangeable and we reassigned users at random over and over, how often would we get a difference at least as extreme as the one observed?

``````# Calculate p-value
p_value = (np.abs(bootstrap_difference) >= np.abs(observed_difference)).sum() / n_bootstrap
print("p-value: {:.3f}".format(p_value))``````
``p-value: 0.465``

The result happens to be non-significant, with a p-value well above 0.05.
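As a closing note, the loop in step 7 re-runs a pandas groupby on every iteration, which can get slow for a large number of resamples. The same random-reassignment idea can be sketched with NumPy only; the arrays below are hypothetical stand-ins for the `conversions` and `sessions` columns.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical arrays standing in for the DataFrame columns
conversions = rng.binomial(30, 0.2, size=1000).astype(float)
sessions = np.full(1000, 30.0)

def permuted_difference(conversions, sessions, rng):
    # Shuffle users, split them into two halves, compare the ratio metrics
    idx = rng.permutation(len(sessions))
    half = len(sessions) // 2
    a, b = idx[:half], idx[half:]
    return (conversions[b].sum() / sessions[b].sum()
            - conversions[a].sum() / sessions[a].sum())

diffs = np.array([permuted_difference(conversions, sessions, rng)
                  for _ in range(1000)])
print(f"mean of permuted differences: {diffs.mean():.4f}")
```

Since the reassignment is symmetric, the mean of the resampled differences should hover near zero; the spread of `diffs` is what the observed difference gets compared against.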