Data 8 October 2025

Get Ready for Cleaner, Fresher Data

with the second edition of Between the Spreadsheets

Who’d have thought so much could change in four years? When I published the first edition of Between the Spreadsheets, GenAI was a mischievous twinkle in Sam Altman’s eye. It’s now everywhere. And this makes it more critical than ever to clean our data, keeping it fresh and tidy. Yet, data cleaning remains an overlooked discipline; it’s rarely covered in academic studies and continues to be neglected in even the largest organisations.

But wait, I hear you cry. GenAI can do our data cleaning for us now, can’t it? Perhaps. But only if it uses clean data in the first place. (After all, garbage in, garbage out.)

Why did I write a second edition of this book?

As organisations adapt to leverage AI technology, many are ignoring a dangerous truth – that those AI models will only deliver high-quality outputs if they’ve been fed rigorously cleaned inputs.

This is just one of the subjects in my new edition of Between the Spreadsheets. And given I’ve spent much of the past few years untangling and cleaning up AI-related messes in data, you can be confident of learning lots.

What is this book about?

The second edition of Between the Spreadsheets reinforces the fundamentals of data cleaning and classification. Read it and you’ll learn about topics ranging from basic data classification and normalisation to my proven COAT framework. There are fresh case studies, even more use cases and… plenty of real-life application examples for you to learn from.

Can I share an example?

Sure! As AI is such a top-of-mind subject, here’s an example of the kind of thing you can expect to learn from the new edition of Between the Spreadsheets.

This example focuses on one of my favourite topics – cleaning data for AI.

Everything stems from clean data. If you’re working with an off-the-shelf GenAI tool, ask yourself questions such as:

Where is this information from?
Based on what we know, is it accurate?
Could the data be biased?
How can we exclude biased data?
Is there a chance of hallucination?

If you’re familiar with the data you’re working with, run a spot check for consistency (this is essential). You’ll need to run a more in-depth fact check for bias and hallucinations.

If you’re working with, or building, an in-house GenAI tool, you’ll have a fantastic opportunity to influence its success by having clean and accurate training data. Don’t just leave it to the IT guys or data people to build it. Get involved as much as you can, no matter what your department. No-one will know your data as well as you do.

The most important thing for you to do is to get your data COAT on and make sure your data is Consistent, Organised, Accurate and Trustworthy. And decide on your standards, before you start normalising, classifying or cleaning your data:

To abbreviate or not?
Specify your units of measurement.
Upper case or lower case?
Language.
Classification or normalisation rules.

These will all have a bearing on your training data and the output. Use the learnings from elsewhere in the book (chapters 2, 4 and 5) and apply them to your training data. Then use your spot-check learnings to check the output.
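To make the idea of deciding your standards first a little more concrete, here is a minimal Python sketch. It is an illustration only, not a method taken from the book: the abbreviation map, the choice of upper case, the choice of kilograms and the function names are all assumptions made for the example.

```python
# Hypothetical standards, agreed before any cleaning starts.
# Assumption: abbreviations are expanded and names are stored in upper case.
ABBREVIATIONS = {"ltd": "Limited", "inc": "Incorporated", "co": "Company"}

# Assumption: weights are always stored in kilograms.
KG_PER_UNIT = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}


def normalise_supplier(name: str) -> str:
    """Trim, drop full stops, expand abbreviations, then upper-case the name."""
    words = name.strip().replace(".", "").split()
    expanded = [ABBREVIATIONS.get(word.lower(), word) for word in words]
    return " ".join(expanded).upper()


def to_kilograms(value: float, unit: str) -> float:
    """Convert a weight to the agreed unit, rejecting units not in the standard."""
    key = unit.strip().lower()
    if key not in KG_PER_UNIT:
        raise ValueError(f"Unknown unit: {unit!r}")
    return value * KG_PER_UNIT[key]


print(normalise_supplier("  acme ltd. "))  # ACME LIMITED
```

Writing the rules down once, in one place, means every record is treated the same way, whichever standard you settle on.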

Although getting your training data correct is essential, spot checking and flagging errors from the output comes a close second; never trust that you’ll get the results you need, just because your training data is correct. Check, check and check again!
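As a rough sketch of what a spot check on output could look like in practice, the Python below pulls a small random sample for a human to review and flags values that break a simple consistency rule. The field name, the upper-case-and-trimmed rule and both function names are assumptions for illustration, not something prescribed by the book.

```python
import random


def spot_check_sample(rows, k=5, seed=0):
    """Pick up to k rows at random for manual review (seeded so it is repeatable)."""
    rng = random.Random(seed)
    return rng.sample(rows, min(k, len(rows)))


def flag_inconsistent(rows, field):
    """Return rows whose field is not already trimmed and upper case.

    Assumption: the agreed standard for this field is trimmed, upper-case text.
    """
    return [row for row in rows if row[field] != row[field].strip().upper()]


# Hypothetical output records to check.
output_rows = [
    {"supplier": "ACME LIMITED"},
    {"supplier": "acme ltd"},
    {"supplier": " ACME LIMITED"},
]

for row in flag_inconsistent(output_rows, "supplier"):
    print("Check this record:", row)
```

An automated rule like this only catches what it has been told to look for, so the random sample still needs a person who knows the data to read through it.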

Who is this book for?

This book has been written for those who work with data, particularly in data management, classification, cleaning, and decision-making. It’s especially useful for:

Data professionals who regularly handle data quality and classification challenges.
Business users such as procurement, finance, marketing and sales, who use spreadsheets to support decision-making.
Educators and students with an interest in data science and business analytics.

That said, whatever your level of data experience, you will learn how to support efficiency, reliability and organisational impact.

Here’s what others have said about the book:

'Between the Spreadsheets is an outstanding guide that brings clarity, practicality, and energy to one of the most overlooked aspects of working with data. It is practical, insightful, and refreshingly engaging. Susan Walsh has a unique ability to demystify the often-overlooked but mission-critical task of data cleaning and shows with remarkable skill why clean, consistent, trust-worthy, and reliable data is the foundation of every successful decision and provides readers with accessible methods and real examples that make the subject come alive.

This second edition not only deepens her original insights but also brings timely perspectives on AI and Generative AI, real-world case studies, and her proven COAT methodology. This book will benefit anyone who wants to understand data quality more deeply, from students and early career professionals to leaders and decision makers. It is practical, insightful, and enjoyable to read, making it an essential resource for today's data driven world. This book will change the way you think about data quality and give you the tools to act on it. Susan makes data approachable, human, and - dare I say - fun. A must-read for anyone serious about unlocking the true value of their data.’ - Alaa Marshan, Senior Lecturer and Researcher, University of Surrey

“With her trademark humour, Susan takes the headache out of data with fresh insights, real-world stories and a toolkit you’ll enjoy using. If you want consistent, trustworthy data without falling asleep over a spreadsheet, this is the guide you’ve been waiting for. This book will leave you better equipped, more confident, and maybe even laughing along the way.” - Caroline Carruthers, Chief Executive, Carruthers and Jackson and author

‘The second edition of Between the Spreadsheets seamlessly expands a world in which author Susan Walsh is showing us not only the uncomfortable truth around data, but also approaches and methods of how to get rid of data in an effective and sustainable way. The new chapter on breaking myths around how GenAI can help with data cleaning is especially timely and enlightening, and the data horror stories are scary but also painfully reflective of data issues in this day and age. Susan's writing style is wonderfully reflective of her fun and approachable personality, and I can only recommend anyone interested in creating and maintaining clean data to read this book!'

Tiankai Feng, author of Humanizing Data Strategy and Humanizing AI Strategy

Where can I get my copy?

UK readers can buy Between the Spreadsheets from Facet and Waterstones.

US readers can head to the American Library Association and Barnes & Noble.

Readers from the rest of the world can buy a copy from Amazon.

Susan Walsh

Founder & MD at The Classification Guru Ltd

Susan Walsh, The Classification Guru, is a globally recognised expert in spend data classification, cleaning, and transformation, with over a decade of experience helping organizations fix their "dirty data." She founded The Classification Guru Ltd in 2017 and has supported over 90 clients worldwide. Susan is an influential thought leader, global speaker, and advocate for clean data, as well as a Key Influencer for the Accounts Payable Association. She authored "Between the Spreadsheets: Classifying and Fixing Dirty Data" and shares her expertise through online courses and her TEDx talk, "Say NO to NO." Developer of the COAT framework (Consistent, Organised, Accurate, Trustworthy), Susan launched Samification, a self-service supplier normalisation tool, in October 2024. She frequently speaks at data and procurement events globally, dedicated to helping businesses achieve clean and effective data. She has attended well over a hundred events in the data and procurement space, whether that be as a speaker, an exhibitor, or as an influencer, and has been active in tackling your dirty data!

Written by

Susan Walsh

Founder & MD at The Classification Guru Ltd · EM360
