Building the future of AB testing at carwow
Our product teams are filled with brilliant people, aligned to a world-class product development strategy. However, we do not always know what is best for our customers. A humbling moment in every product owner’s early career is when an idea they generated, one that they and their team believed in and worked hard to implement, fails an AB test. This is the users of the product letting the product owner know that this implementation of their idea did not work for them. It is an unavoidable and uncomfortable truth of building products for consumers.
At carwow we value consumer feedback and want to ensure we release products and services our customers and partners will love. We run many AB tests, which enable our product teams to be bold, to take risks and to gain critical feedback to learn more about our products and our customers.
The Analytics and Data Science team are responsible for ensuring that our Experimentation process is as efficient and effective as possible. In this post we will talk about our plans to develop a new in-house analytical solution that will ensure we can operate at the pace carwow requires.
AB testing at carwow
We run many AB tests at carwow across our product and marketing teams. To do this we work with a number of third parties who help us implement our tests, whether that be online via client side or feature toggle methodologies, or across our many marketing channels. However, due to our complex business model, all our analytics is performed in house.
Our current in-house AB testing tool was built a number of years ago and has provided great value to our product teams and wider stakeholders. However, as we continue to grow our product and our number of users, we now require a new solution.
There are a number of key criteria we want to build into our new solution. These include:
- Any results must be Trustworthy. Our metrics must be robust and reproducible, and the statistical analysis must be accurate and well presented.
- Any reports must be Usable. We will require our dashboards to be used by all of our product teams and enable them to effortlessly analyse experiments and make clear, informed decisions. We see visualisation and segmentation as vitally important.
- We are a business that is evolving, entering and creating new markets, so the tool must be Extendable. It must be easy to introduce new metrics, segments, experimentation methods, analyses and visualisations.
- We are a business that is growing massively, so it must be Scalable. We may need to collect and analyse data from tens of millions of users, so scalability must be at the heart of our tool.
- It must run Efficiently. We should be able to refresh the data quickly, easily and routinely. We may want to perform deep dive analysis to quickly validate a new hypothesis. It also must store data and compute in the most efficient and effective way to minimise cost.
We have therefore decided to build our new AB test tool with a focus on enabling more experiments, analysing new metrics and utilising best-in-class statistical modules, with results presented to the end user through easily digestible visualisations. Every tool has a name and we are calling ours LEAP (for now).
LEAP (Learning & Experiment Analytics Platform) is being built using a microservice-style approach and will have five key functions.
Meta: Our Experiment Meta Data
This function will house all the experiment information including test name, start dates, end dates, trigger page, hypothesis, key metrics (split by success, feature, guardrail and debug) etc. This will be populated prior to the experiment launching. This will enable our product teams to choose the metrics they care about and want to observe for their test.
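As a rough illustration of what one Meta record might hold, here is a minimal Python sketch. The class name `ExperimentMeta`, its field names and the example values are all assumptions made for illustration; only the list of fields and the four metric roles (success, feature, guardrail, debug) come from the description above.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional

# The four metric roles named in the post.
METRIC_ROLES = ("success", "feature", "guardrail", "debug")


@dataclass
class ExperimentMeta:
    """Hypothetical record describing one experiment before launch."""

    test_name: str
    start_date: date
    end_date: Optional[date]  # may be unknown when the test is registered
    trigger_page: str
    hypothesis: str
    # Maps a metric role to the metric names chosen by the product team.
    metrics: Dict[str, List[str]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        unknown = set(self.metrics) - set(METRIC_ROLES)
        if unknown:
            raise ValueError(f"Unknown metric roles: {sorted(unknown)}")
        if self.end_date is not None and self.end_date < self.start_date:
            raise ValueError("end_date must not be before start_date")

    def all_metrics(self) -> List[str]:
        """Return every selected metric, ordered by role."""
        return [m for role in METRIC_ROLES for m in self.metrics.get(role, [])]


# Illustrative values only; not a real carwow experiment.
example = ExperimentMeta(
    test_name="example_test",
    start_date=date(2021, 1, 4),
    end_date=date(2021, 1, 18),
    trigger_page="/example-page",
    hypothesis="Changing the call to action increases enquiries.",
    metrics={"success": ["conversion_rate"], "guardrail": ["bounce_rate"]},
)
```

Validating the roles at creation time means a typo in a metric role is caught before the experiment launches rather than when the results are first refreshed.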
Metrics Dictionary
A dictionary managed by the Analytics and Data Science team that houses all our metrics and user segmentation levels. This dictionary will include the SQL code required to calculate each metric and segment. New metrics or segments will be easy to add, whenever the need arises.
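One simple way to picture such a dictionary is a mapping from metric or segment name to the SQL expression that calculates it. The sketch below is an assumption about the shape, not the actual LEAP structure: the names `METRICS`, `SEGMENTS` and `register_metric`, the column names and the SQL fragments are all hypothetical.

```python
from typing import Dict

# Hypothetical metric definitions: name -> description and SQL expression.
METRICS: Dict[str, Dict[str, str]] = {
    "users": {
        "description": "Distinct users exposed to the experiment",
        "sql": "COUNT(DISTINCT user_id)",
    },
    "conversion_rate": {
        "description": "Share of exposed users who converted",
        "sql": (
            "COUNT(DISTINCT CASE WHEN converted THEN user_id END) * 1.0"
            " / COUNT(DISTINCT user_id)"
        ),
    },
}

# Hypothetical segment definitions: name -> SQL expression to group by.
SEGMENTS: Dict[str, Dict[str, str]] = {
    "device_type": {"description": "Device category", "sql": "device_type"},
}


def register_metric(name: str, sql: str, description: str = "") -> None:
    """Add a new metric, refusing to overwrite an existing definition."""
    if name in METRICS:
        raise ValueError(f"Metric {name!r} is already defined")
    METRICS[name] = {"description": description, "sql": sql}


# Adding a metric is a single call.
register_metric("sessions", "COUNT(DISTINCT session_id)", "Distinct sessions")
```

Keeping every definition in one place means a metric is calculated the same way in every experiment that uses it.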
Data Collection
The data collection step is where the magic really begins. Utilising the information from the Meta and Metrics Dictionary, we will create Python-generated SQL to capture all the required metrics for each experiment at a daily and cumulative level. This process will run on a schedule each morning, ready for when we start our day. However, we will also enable this step to be triggered manually for certain tests, with the ability to add additional metrics when a deep dive is required.
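To make the idea of Python-generated SQL concrete, here is a minimal sketch that assembles a query from a list of metric names and a dictionary of SQL expressions. Everything specific in it is an assumption for illustration: the function `build_metrics_query`, the table `experiment_events`, the columns `variant`, `event_date` and `test_name`, and the `:test_name` placeholder style, which depends on the database driver in use.

```python
from typing import Dict, List, Tuple


def build_metrics_query(
    test_name: str,
    metric_names: List[str],
    metrics: Dict[str, str],
    level: str = "daily",
) -> Tuple[str, Dict[str, str]]:
    """Return (sql, params) for the requested metrics of one experiment.

    `metrics` maps a metric name to a trusted SQL expression, such as one
    taken from a metrics dictionary. The test name is passed as a bound
    parameter rather than being interpolated into the SQL text.
    """
    if level not in ("daily", "cumulative"):
        raise ValueError("level must be 'daily' or 'cumulative'")
    missing = [m for m in metric_names if m not in metrics]
    if missing:
        raise KeyError(f"Metrics not in dictionary: {missing}")

    # Daily results are grouped by date; cumulative results are not.
    group_cols = ["variant"] + (["event_date"] if level == "daily" else [])
    select_cols = group_cols + [f"{metrics[m]} AS {m}" for m in metric_names]

    sql = (
        "SELECT\n    " + ",\n    ".join(select_cols) + "\n"
        "FROM experiment_events\n"  # hypothetical table name
        "WHERE test_name = :test_name\n"
        "GROUP BY " + ", ".join(group_cols)
    )
    return sql, {"test_name": test_name}


# Hypothetical metric expressions used only for this example.
example_metrics = {
    "users": "COUNT(DISTINCT user_id)",
    "enquiries": "SUM(enquiry_count)",
}
daily_sql, daily_params = build_metrics_query(
    "example_test", ["users", "enquiries"], example_metrics
)
cumulative_sql, _ = build_metrics_query(
    "example_test", ["users"], example_metrics, level="cumulative"
)
```

A scheduled job could call a function like this once per live experiment each morning, while a manual deep dive would call it with a longer list of metric names.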
Statistics Engine
At carwow we have a stats package built in Python that includes a number of different statistical functions. These include standard frequentist stats calcs, Bayesian calcs and bootstrapping. The Bayesian calcs allow us to use pre-experimental data to increase our test sensitivity for certain experiments. The package also includes the ability to perform pseudo experiments. Together, these functions will allow us to run the most appropriate stats calc for each experiment metric and populate our final table, which will be presented in the final step. These stats calcs are also available to the wider business to use, aiding consistency and rigour in all our analysis.
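As an example of the kind of standard frequentist calculation such a package might contain, here is a two-sided two-proportion z-test written with only the Python standard library. It is a textbook sketch and not the carwow package itself; the function name `two_proportion_z_test` and the example numbers are made up for illustration.

```python
import math
from typing import Tuple


def two_proportion_z_test(
    conv_a: int, n_a: int, conv_b: int, n_b: int
) -> Tuple[float, float]:
    """Two-sided z-test for a difference in conversion rate (B minus A).

    Uses the pooled-proportion standard error, which relies on the normal
    approximation and so assumes reasonably large samples in both groups.
    Returns (z statistic, p-value).
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("Both groups need at least one user")
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("Standard error is zero; rates are all 0 or all 1")
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts: 100 of 1,000 users convert in A, 150 of 1,000 in B.
z_stat, p_val = two_proportion_z_test(100, 1000, 150, 1000)
```

A Bayesian or bootstrap calculation could sit behind the same style of interface, which would let the engine choose the most appropriate method for each metric.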
Visualisation
The final step in the process is the visualisation, which will see the greatest level of user interaction. We plan to create two approaches. The first is a dashboard, built using Plotly, which will be accessible to all. It will enable the product teams to make clear, informed decisions based on the experimental data, with best-in-class data visualisations.
We will also enable our analysts and data scientists to deep dive using Jupyter notebooks and provide access to our data visualisation templates in order to maintain consistency and efficiency. This will also provide us with a playground to assess previous experiments, research new statistical models and consider new metrics and segments. We will also use this to advance our understanding of a number of key experiment principles.
It is our hope that building a tool in this way will ensure it delivers value and also makes future evolutions of the tool relatively simple. As a company we are growing rapidly, increasing the number of product teams and increasing the number of products and users on our sites. Therefore, when we build this new infrastructure we must do so with scale in mind.
As we are beginning our journey in building our new tool, we know our thoughts and approach will evolve over time. As we build and utilise LEAP we will write further articles, in the hope that they help others considering or starting this process and also help create a community where we can share best practice and learn from others who are on this journey.
At carwow our Product Analytics and Data Science team are focussed on delivering impactful analysis to enable our product teams to deliver innovative products, identify key opportunities to improve our products and help drive exponential growth. If you would like to join us, please check out our jobs page.
— by Gareth Holder, Head of Product Analytics and Data Science @ carwow