Article by Eswar Nagireddy, Senior Product Manager Data Science, Machine Learning and Advanced Analytics at Exasol
For a long time, data science and machine learning (ML), although pioneering technologies, have been viewed as ‘black boxes’. Access to data has often been confined to a handful of data scientists, who had the skills and understanding to organise, crunch and interpret it for their organisation. This was largely out of necessity, as tools to make data shareable and consumable for non-specialists didn’t widely exist. Now, technologies such as Automated Machine Learning (AutoML) are making data easily interpretable to many.
Explaining AutoML
AutoML frameworks are ready-made, no-code or low-code solutions that take away the complexity involved in building and training ML models.
Training ML models involves a series of time-consuming steps, from data preparation through exploratory data analysis, experimentation and model selection. Beyond that, further infrastructure typically needs to be built around whatever an organisation already has in place. And that’s not all: each model must also be measured against various business KPIs, and its performance metrics need to be monitored throughout the life cycle. Handling and executing all of these steps requires a high level of technical experience and knowledge, and takes a lot of time.
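To give a sense of the manual effort involved, here is a minimal, hypothetical sketch of a hand-built workflow in Python with scikit-learn. The data file and column names are illustrative assumptions, not a real dataset.

```python
# Hand-built ML workflow: each numbered step is manual effort that
# AutoML aims to automate. Purely illustrative; "customers.csv" and
# its columns are assumed example data, not a real dataset.
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

# 1. Data preparation: load, clean, select features
df = pd.read_csv("customers.csv").dropna()
X, y = df[["age", "tenure_months", "monthly_spend"]], df["churned"]

# 2. Hold out a test set for honest evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# 3. Experimentation: search model hyperparameters by hand
pipe = Pipeline([("scale", StandardScaler()),
                 ("model", RandomForestClassifier(random_state=42))])
grid = GridSearchCV(pipe, {"model__n_estimators": [100, 300],
                           "model__max_depth": [5, 10, None]}, cv=5)
grid.fit(X_train, y_train)

# 4. Model selection and evaluation against a business KPI
print("best params:", grid.best_params_)
print("held-out accuracy:", grid.score(X_test, y_test))
```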
AutoML takes the majority of this hassle away from non-technical users, with very little need for human intervention. It democratises data and data processes, enabling people who are not data analysts or scientists to leverage ML without having to understand the exact logic behind the technology.
Users can quickly realise the value of applying ML to their data, with full transparency into the various experiments, their outcomes, and the best model and parameters for a given business case. Most commonly, employees use AutoML to automate their most repetitive and time-consuming tasks and to add value to their role through data-driven decision making.
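By contrast, an AutoML framework compresses those steps into a handful of calls. Below is a minimal sketch using the open-source auto-sklearn library (one AutoML tool among many, not a product the article prescribes), reusing the hypothetical training data from the example above.

```python
# AutoML sketch: the library explores preprocessing, models and
# hyperparameters automatically within a time budget.
# Assumes the X_train/X_test/y_train/y_test splits from the
# hypothetical hand-built example above.
from autosklearn.classification import AutoSklearnClassifier

automl = AutoSklearnClassifier(time_left_for_this_task=300)  # 5-minute budget
automl.fit(X_train, y_train)

print(automl.sprint_statistics())                 # summary of the search
print("held-out accuracy:", automl.score(X_test, y_test))
```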
AutoML in practice
The use case: Let’s imagine a customer services team that handles customer complaints and feedback about the company’s products. The team needs to route these complaints to the right teams in the company to address the issues, while keeping customer satisfaction high.
The challenge: The team works part-time, reviewing and redirecting these complaints manually to the respective departments, and managing the associated conversations too. Product lines are growing, and it’s getting increasingly difficult to serve customers within a reasonable timeframe. Added to this, incoming customer requests arrive through multiple channels, such as company portals, emails, phone calls and tweets, and not every message is a complaint.
A potential solution: Speed up the process of reading inbound customer messages and routing them with automated processes.
Now that we have the basics of the solution, we can ask some questions to determine the desired level of automation for the process:
- Can we get all the messages into a single database?
- How do we determine which department receives a complaint?
- How do we trust the results of this automation?
- If we scale the business to serve millions of customers, what complications might there be?
- Do we have enough data to apply machine learning?
It’s best to start small and split complex decision-making processes into small chunks, so that each part of the puzzle can be analysed and addressed efficiently. For the above use case, the whole problem can be broken into the following stages:
- Data collection strategy
- Data storage and associated technology
- How can we automate the above step?
- What is the scope of automation and is ML required?
- How can ML help us and what kind of ML problem is it?
- How do we model the data and show its value to the business?
- Start with AutoML technology to realise if ML assists in solving the issue
- Engage with the engineering team to build end-to-end infrastructure
- Address and automate smaller chunks
Experimenting with AutoML
Once smaller segments of the challenge have been identified, the repetitive tasks among them can be automated. In our example, we would apply an ML classification model to incoming messages to tag particular text as complaints. Once a complaint has been identified, we can apply a second ML classification model to tag which department the message should go to next. If this produces satisfactory results, we can then make the process standard and scalable; a minimal sketch of the idea follows below.
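As a rough illustration of that two-stage approach, here is a small Python sketch with scikit-learn. The training messages, labels and department names are entirely hypothetical.

```python
# Two-stage routing sketch (hypothetical data and labels):
# stage 1 decides whether a message is a complaint at all,
# stage 2 decides which department a confirmed complaint goes to.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Assumed labelled history: (message, is_complaint, department)
messages = ["my order arrived broken", "love the new colours",
            "billed twice this month", "when is the next release?"]
is_complaint = [1, 0, 1, 0]
departments = ["logistics", None, "billing", None]

# Stage 1: complaint vs. not-a-complaint
stage1 = make_pipeline(TfidfVectorizer(), LogisticRegression())
stage1.fit(messages, is_complaint)

# Stage 2: route complaints, trained only on the complaint subset
complaints = [m for m, c in zip(messages, is_complaint) if c]
routes = [d for d in departments if d is not None]
stage2 = make_pipeline(TfidfVectorizer(), LogisticRegression())
stage2.fit(complaints, routes)

new_msg = ["the parcel never showed up"]
if stage1.predict(new_msg)[0]:
    print("route to:", stage2.predict(new_msg)[0])
```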
The importance of a central data repository
For AutoML to help users quickly reach a state where they can apply ML, they need data that is refined and easily accessible. Extracting raw data and storing it in a centralised analytics database is a crucial first step for any organisation.
A central repository where all of a company’s raw data lives helps enable robust, scalable, flexible and maintainable data pipelines, breaking down data silos and enabling true data democratisation. Only then can multi-department analytics environments, ML workloads, integrations and the virtualisation of data for easy access be realised.
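For instance, once messages from every channel land in one analytics database, assembling a training set becomes a single query. Here is a hypothetical sketch using the pyexasol client; the connection details, schema and table names are assumptions.

```python
# Pull a refined training set straight from the central repository.
# Connection details and table/column names are placeholders.
import pyexasol

conn = pyexasol.connect(dsn="exasol-host:8563", user="analyst",
                        password="secret", schema="CUSTOMER_SERVICE")

# One query replaces per-channel exports and manual merging
df = conn.export_to_pandas("""
    SELECT message_text, channel, department_label
    FROM   INBOUND_MESSAGES
    WHERE  received_at > ADD_DAYS(CURRENT_DATE, -90)
""")
conn.close()
```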
One such integration for running AutoML experiments easily is between AWS SageMaker and Exasol, which brings ML processes a step closer to the data and provides easy, intuitive access to ML capabilities. Models can be developed for an organisation’s common ML use cases, such as identifying at-risk customers to reduce churn rates, forecasting stock levels to anticipate supply requirements from vendors, spotting fluctuations in demand by product, or detecting fraud.
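As a rough sketch of what launching such an experiment can look like from Python, the SageMaker SDK’s AutoML (Autopilot) interface can be pointed at data exported from the database to S3. The IAM role, bucket paths and target column below are placeholders, not a prescribed setup.

```python
# Launch a SageMaker Autopilot (AutoML) job on data exported from the
# central database to S3. The role ARN, bucket and target column are
# placeholder assumptions.
import sagemaker
from sagemaker.automl.automl import AutoML

session = sagemaker.Session()
automl = AutoML(role="arn:aws:iam::123456789012:role/SageMakerRole",
                target_attribute_name="department_label",
                output_path="s3://my-bucket/automl-output/",
                max_candidates=10,
                sagemaker_session=session)

# Autopilot explores preprocessing, algorithms and hyperparameters,
# then ranks candidate models against the target column.
automl.fit(inputs="s3://my-bucket/training/inbound_messages.csv",
           wait=False)
```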
The power of data democratisation
There is a growing need for powerful data functionality to address increasingly complex business demands, and organisations have no time to slow down. Transforming raw data into insights at speed to optimise specific business scenarios has never been more important. There are no limits to what can be achieved once every employee has the power to access data-driven insights and make business-critical decisions day in, day out.