How you create rapid machine learning prototypes (with sample use cases)

by Andre Bappert Aug 8, 2019

10 min. reading time

Businesses, decision makers and machine learning practitioners share a common struggle: How do we identify business problems worth turning into data problems, so we can create data-empowered solutions that result in business value? Read on to get an answer to that question!

This blog post shares insights into the so-called “5-day data sprint”, which you can use to address this challenge. It explains how you run such data sprints to create successful machine learning prototypes. At the end of this post, you will find some inspiration in the form of exciting sample use cases that can be achieved with data science and machine learning practices.

Here is how you create rapid machine learning prototypes by using the 5-day data sprint

The 5-day data sprint

The 5-day data sprint sets out the following key results:

break down business problems into data problems
identify use cases and opportunities worth solving
collect and explore datasets
engineer features
iterate with algorithms and machine learning models

all with the objective of engineering a first functional prototype in just 5 days.

How does the data sprint work?

A data sprint usually has up to eight participants, ideally four subject matter experts (SMEs) with domain expertise and four data scientists. During the sprint, no distractions are allowed, and full focus on the business problem, the opportunity, and the solution is encouraged. Without further ado, let's have a look at the process.

Note: Experience shows that an exploration phase to create a shared understanding of the business context, to align on objectives, and to start preparing data sets greatly increases the success of the data sprint.

The exploration phase

As with any successful endeavor, before you start moving towards the desired destination, you need to understand, prepare, align, and plan the mission.

In other words, you are well advised to run a series of workshops together in order to

... create a shared understanding of the business context and problem,
... explore and collect potential use cases to solve these problems,
... collect resources and materials needed for the sprint,
... align on clear objectives for the sprint and beyond,
... finalize the plan before you head off for battle.

You should end the preparation with a one-page challenge description, describing, aligning, and communicating expectations and desired outcomes for all participants.

The one-page challenge description

The result of the preparatory phase, the one-page challenge description, usually answers the following questions:

What is the business problem you want to solve?
How can this problem be solved using data science and machine learning practices?
What data do you need to collect for a successful sprint and machine learning prototype?
What might the desired outcomes look like?
What algorithms do you want to test?
How do you interpret the model outputs?

Now that you have a shared understanding of the problems you want to tackle, a brief collection of desired outcomes, and clear measures for evaluating the results, you are ready to begin the actual sprint.

Day 1 – Introduce the challenge

You begin the sprint by introducing all participants to the one-page challenge description. You introduce the main business problem, the desired outcomes as well as promising use cases that might solve the challenge.

During the challenge introduction, participants are encouraged to share ideas, insights and potential customers that might benefit from the desired solutions.

At the end of the collaboration session, you set up the project, libraries, and online workspace so that the initial data exploration can begin.

Day 2 – Use domain knowledge for feature engineering

From day one, all participants have a shared understanding of the challenges and the data opportunity at hand. You now apply your extensive domain knowledge to begin engineering features from the data sets. Choose important variables, reduce dimensions and begin to build a data set of relevant features.

The shared understanding of the problem space and the extensive domain knowledge, combined with the data exploration techniques of your experienced data scientists, usually allow you to end the second day with a feature-rich data set, ready to feed your machine learning models.
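To make "choose important variables, reduce dimensions" a bit more tangible, here is a minimal sketch of one possible first pass. It drops near-constant columns and columns that are almost perfectly correlated with a feature you already keep. The function names, the thresholds and the toy machine data are assumptions made for illustration; your domain experts will usually have better criteria.

```python
from statistics import pvariance


def pearson(xs, ys):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (var_x * var_y) ** 0.5


def select_features(columns, min_variance=1e-3, max_correlation=0.95):
    """Return the names of the columns to keep.

    columns maps a feature name to a list of numeric values.
    Both thresholds are illustrative defaults, not recommendations.
    """
    kept = []
    for name, values in columns.items():
        if pvariance(values) < min_variance:
            continue  # near-constant column: drop it
        if any(abs(pearson(values, columns[k])) > max_correlation for k in kept):
            continue  # redundant with a feature we already keep
        kept.append(name)
    return kept


# Hypothetical toy data: four candidate features for five machines.
data = {
    "temperature_c": [61.0, 64.5, 70.2, 66.1, 59.8],
    "temperature_f": [141.8, 148.1, 158.36, 150.98, 139.64],  # same signal in Fahrenheit
    "site_id": [1.0, 1.0, 1.0, 1.0, 1.0],  # constant
    "vibration": [0.2, 0.9, 0.4, 0.1, 0.7],
}
print(select_features(data))  # ['temperature_c', 'vibration']
```

A filter like this only removes obviously useless columns. Deciding which of the remaining features actually matter for the business problem is where the subject matter experts come in.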

Day 3 – Test various algorithms and machine learning models

On day three, you test various algorithms and machine learning models on the engineered data set. Based on the results, you further engineer your data sets and try to reduce their complexity and dimensionality. Even though the saying “the more data the better” holds true, what matters for successful outputs is having more relevant data, not simply more data. It is not uncommon for data sets to have up to 1,000 features at the beginning of the sprint. After day three, you might have reduced the data set to the 100 most powerful features.
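Testing various algorithms can start very simply: fit each candidate on the same training split and score it on the same held-out split, always next to a trivial baseline. The sketch below does this with two deliberately tiny models written in plain Python. The models, the labels and the toy data are assumptions for illustration; in a real sprint you would plug in the library models of your choice.

```python
def majority_baseline(train_x, train_y):
    """Always predict the most frequent training label."""
    most_common = max(set(train_y), key=train_y.count)
    return lambda row: most_common


def nearest_centroid(train_x, train_y):
    """Predict the label whose mean feature vector is closest."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(row):
        def sq_dist(label):
            return sum((a - b) ** 2 for a, b in zip(row, centroids[label]))
        return min(centroids, key=sq_dist)

    return predict


def accuracy(model, xs, ys):
    """Share of rows for which the model predicts the true label."""
    return sum(model(x) == y for x, y in zip(xs, ys)) / len(ys)


def compare(candidates, train, test):
    """Fit every candidate on train and score it on the held-out test split."""
    (train_x, train_y), (test_x, test_y) = train, test
    return {name: accuracy(fit(train_x, train_y), test_x, test_y)
            for name, fit in candidates.items()}


# Hypothetical toy data: two features per machine, label "ok" or "faulty".
train = ([[1.0, 1.2], [0.8, 1.0], [1.1, 0.9], [3.0, 3.2], [3.1, 2.9]],
         ["ok", "ok", "ok", "faulty", "faulty"])
test = ([[0.9, 1.1], [2.9, 3.0], [3.2, 3.1], [1.0, 1.0]],
        ["ok", "faulty", "faulty", "ok"])

scores = compare({"majority_baseline": majority_baseline,
                  "nearest_centroid": nearest_centroid}, train, test)
print(scores)  # {'majority_baseline': 0.5, 'nearest_centroid': 1.0}
```

Keeping the baseline in the comparison makes it obvious whether a more complex model is actually earning its complexity.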

Day 4 – Finalize the ML model and summarize the findings

On day four, you finalize the adjustments to the machine learning model and start summarizing all the findings you made during the data exploration and model testing phases. This summary brings together insights into the model performance, the techniques used, and an evaluation of the results in one document.

Then, you explore further and see how the machine learning prototype can be improved and further integrated into an end-to-end use case.

Day 5 – User testing, presentation and handover of the results

Time for the ultimate test of your machine learning prototype: schedule five neutral users or potential customers to test the prototype you created. These user testing sessions are pretty straightforward. Let each user work with the prototype on a real device, with the goal of actually using the application.

Best practice is to record the session so you can carefully observe verbal and emotional reactions, potential pain points, and delighters, and gain deep insights into the user experience of your prototype. This creates invaluable artefacts for the further development of your machine learning application.

Time for a light lunch, a great moment to digest the results of the data sprint. Almost there!

You finish the week by presenting and discussing the results of the sprint with your company's stakeholders. This is a great opportunity to share patterns you found in the data, present valuable business insights from the user testing sessions with potential customers, and discuss the further exploration of the application with stakeholders and team members.

Usually, this day is full of excitement and ideas on how to further improve and mature the prototype into a fully deployable machine learning solution.

We have created an infographic that helps you keep an eye on the individual phases and days of the 5-day data sprint.

As a next step you plan how to iterate, integrate and deploy the machine learning prototype into a fully mature solution that delivers tangible business value.

Sounds exciting, right? It is!

Get in contact today and get a free consultation session to find out whether or not a data sprint might help your business to accelerate.

Still not sure what to do with machine learning?

Explore these three machine learning prototypes to get inspired about potential machine learning solutions.

1. Colorize old, grey images of your family with DeOldify

The first application, “DeOldify”, colorizes old greyscale images to bring life and color to your old family memories. Use it for fun, for your next Christmas presents, or simply to experience the power of deep learning and generative adversarial networks.

Full credit goes to jantic for open-sourcing his work on DeOldify.

2. Use an object detection model in 5 simple steps

The second application is an object detection model provided through a TensorFlow API. The field of computer vision and object detection has a wide range of potential use cases. It has demonstrated potential for corporations in quality assurance on production lines, security and surveillance, autonomous vehicles, and medical imaging.

Full credit goes to Nicolas Bortolotti and Google.

3. Scrape Twitter data and analyze trending topics

The third application walks you through collecting data from the Twitter API and analyzing its content to understand how people respond to a topic of your choice.

To use this, you need a Twitter developer account.

Full credit goes to Bilal Tahir for open-sourcing his work on towardsdatascience.com.
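To give a rough feel for the analysis step in the third example, the following sketch counts hashtags to find trending topics and scores each tweet with a naive word-list sentiment. This is not the code from the referenced tutorial. The word lists and the example tweets are made up for illustration, and the tweets are plain strings standing in for data you would collect from the API.

```python
import re
from collections import Counter

# Tiny hand-made word lists, purely illustrative. A real project would use
# a proper sentiment lexicon or a trained model.
POSITIVE = {"love", "great", "good", "amazing"}
NEGATIVE = {"hate", "bad", "slow", "broken"}


def hashtags(text):
    """Return all hashtags in a tweet, lowercased."""
    return [tag.lower() for tag in re.findall(r"#\w+", text)]


def sentiment(text):
    """Positive minus negative word count: above 0 positive, below 0 negative."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


# Made-up example tweets standing in for data collected from the API.
tweets = [
    "I love the new release, great work! #MachineLearning",
    "Training is so slow and the demo is broken #machinelearning #fail",
    "Good read on data sprints #DataScience",
]

trending = Counter(tag for t in tweets for tag in hashtags(t))
scores = [sentiment(t) for t in tweets]

print(trending.most_common(1))  # [('#machinelearning', 2)]
print(scores)                   # [2, -2, 1]
```

A word-list score like this ignores negation and sarcasm, which is exactly why the techniques named in the review below, such as sentiment analysis with trained models, are worth exploring.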

If you liked this post, be sure to share it with your colleagues.

Review

This blog post gave you practical insights into a methodology we call the Data Sprint. We derived it from our experience with Google Ventures' famous design sprint and practice it with our partners to create rapid data science prototypes.

Next, we looked at three sample use cases: a GAN that colorizes greyscale images, and an object detection model that identifies humans, animals and other objects in images (this is possible for videos, too). The last, more advanced example covers data mining from Twitter, beginner-level data preprocessing, and natural language processing techniques such as named entity recognition and sentiment analysis to mine and categorize opinions and emotions in tweets on the topic of your choice.

Now it is time for you to get creative.

What game-changing ideas can you generate? What use cases can you see in your business and beyond? Or are you already prototyping? Either way, make sure to bring your business to the forefront by taking advantage of data science and machine learning techniques.

Did you find value in this post?

Be sure to share it with a friend who needs to know about Data Sprints to create rapid machine learning prototypes.