Who’d have thought so much could change in four years? When I published the first edition of Between the Spreadsheets, GenAI was a mischievous twinkle in Sam Altman’s eye. It’s now everywhere. And this makes it more critical than ever to clean our data, keeping it fresh and tidy. Yet, data cleaning remains an overlooked discipline; it’s rarely covered in academic studies and continues to be neglected in even the largest organisations.

But wait, I hear you cry. GenAI can do our data cleaning for us now, can’t it? Perhaps. But only if it uses clean data in the first place. (After all, garbage in, garbage out.) 

Why did I write a second edition of this book?

As organisations adapt to leverage AI technology, many are ignoring a dangerous truth – that those AI models will only deliver high-quality outputs if they’ve been fed rigorously cleaned inputs.

This is just one of the subjects in my new edition of Between the Spreadsheets. And given I’ve spent much of the past few years untangling and cleaning up AI-related messes in data, you can be confident of learning lots. 

What is this book about?

The second edition of Between the Spreadsheets reinforces the fundamentals of data cleaning and classification. Read it and you’ll learn topics that range from basic data classification and normalisation to my proven COAT framework. There are fresh case studies, even more use cases and plenty of real-life application examples for you to learn from.

Can I share an example?

Sure! As AI is such a top-of-mind subject, here’s an example of the kind of thing you can expect to learn from the new edition of Between the Spreadsheets.

This example focuses on one of my favourite topics – cleaning data for AI. 

Everything stems from clean data. If you’re working with an off-the-shelf GenAI tool, ask yourself questions such as:

  • Where is this information from?
  • Based on what we know, is it accurate?
  • Could the data be biased?
  • How can we exclude biased data?
  • Is there a chance of hallucination?

If you’re familiar with the data you’re working with, run a spot check for consistency (this is essential). You’ll need to run a more in-depth fact check for bias and hallucinations.

If you’re working with, or building, an in-house GenAI tool, you’ll have a fantastic opportunity to influence its success by having clean and accurate training data. Don’t just leave it to the IT guys or data people to build it. Get involved as much as you can, no matter what your department. No-one will know your data as well as you do.

The most important thing for you to do is to get your data COAT on and make sure your data is Consistent, Organised, Accurate and Trustworthy. And decide on your standards, before you start normalising, classifying or cleaning your data:

  • To abbreviate or not?
  • Specify your units of measurement.
  • Upper case or lower case?
  • Language.
  • Classification or normalisation rules.

These will all have a bearing on your training data and the output. Use the learnings from elsewhere in the book (chapters 2, 4 and 5) and apply them to your training data. Then use your spot-check learnings to check the output.
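
To make the idea of deciding your standards first a little more concrete, here is a minimal Python sketch. It is not taken from the book: the abbreviation list, the choice of upper case, kilograms as the standard unit and the function names are all assumptions made up for illustration, so swap in whatever your own standards say.

```python
import re

# Hypothetical house standards, chosen purely for illustration.
ABBREVIATIONS = {"ltd": "Limited", "inc": "Incorporated", "co": "Company"}
UNIT_TO_KG = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}


def normalise_supplier(name: str) -> str:
    """Apply one abbreviation rule and one casing rule to a supplier name."""
    # Treat full stops and commas as spaces, then split on whitespace.
    words = re.sub(r"[.,]", " ", name).split()
    # Expand any abbreviation the standard lists; leave other words alone.
    expanded = [ABBREVIATIONS.get(word.lower(), word) for word in words]
    # The assumed standard here is upper case throughout.
    return " ".join(word.upper() for word in expanded)


def to_kilograms(value: float, unit: str) -> float:
    """Convert a weight to kilograms, rejecting units the standard omits."""
    key = unit.strip().lower()
    if key not in UNIT_TO_KG:
        raise ValueError(f"Unit not covered by the standard: {unit!r}")
    return value * UNIT_TO_KG[key]
```

With these rules, `normalise_supplier("Acme Ltd.")` and `normalise_supplier("acme  ltd")` both come out as `ACME LIMITED`. That is the point: one agreed rule, applied everywhere, before the data goes anywhere near a model.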

Although getting your training data correct is essential, spot checking and flagging errors from the output comes a close second; never trust that you’ll get the results you need, just because your training data is correct. Check, check and check again!
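
A spot check can be as simple as pulling a random handful of rows for a person to read, plus flagging any value that falls outside the list you agreed on. The sketch below is one possible way to do that, assuming the output arrives as a list of dictionaries; the field names, suppliers and categories are hypothetical.

```python
import random


def spot_check_sample(rows, sample_size=5, seed=None):
    """Pick a random handful of rows for a person to review by eye."""
    rows = list(rows)
    rng = random.Random(seed)  # a fixed seed makes the sample repeatable
    return rng.sample(rows, min(sample_size, len(rows)))


def flag_unexpected(rows, field, allowed_values):
    """Return rows whose value for `field` is not in the agreed list."""
    return [row for row in rows if row.get(field) not in allowed_values]


# Hypothetical output rows and categories, for illustration only.
output_rows = [
    {"supplier": "ACME LIMITED", "category": "Stationery"},
    {"supplier": "GLOBEX LIMITED", "category": "stationary"},
    {"supplier": "INITECH LIMITED", "category": "IT Hardware"},
]
allowed = {"Stationery", "IT Hardware"}

to_review = spot_check_sample(output_rows, sample_size=2, seed=1)
flagged = flag_unexpected(output_rows, "category", allowed)
```

Here `flagged` holds the single row with the misspelt category, and `to_review` holds two rows to read through manually. Neither step replaces a proper fact check; they just make it harder for an obvious error to slip past.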

Who is this book for?

This book has been written for those who work with data, particularly in data management, classification, cleaning, and decision-making. It’s especially useful for:

  • Data professionals who regularly handle data quality and classification challenges.
  • Business users such as procurement, finance, marketing and sales, who use spreadsheets to support decision-making.
  • Educators and students with an interest in data science and business analytics.

That said, whatever your level of data experience, you’ll learn how to support efficiency, reliability and organisational impact.

Here’s what others have said about the book:

‘Between the Spreadsheets is an outstanding guide that brings clarity, practicality, and energy to one of the most overlooked aspects of working with data. It is practical, insightful, and refreshingly engaging. Susan Walsh has a unique ability to demystify the often-overlooked but mission-critical task of data cleaning and shows with remarkable skill why clean, consistent, trustworthy, and reliable data is the foundation of every successful decision and provides readers with accessible methods and real examples that make the subject come alive.

This second edition not only deepens her original insights but also brings timely perspectives on AI and Generative AI, real-world case studies, and her proven COAT methodology. This book will benefit anyone who wants to understand data quality more deeply, from students and early career professionals to leaders and decision makers. It is practical, insightful, and enjoyable to read, making it an essential resource for today’s data-driven world. This book will change the way you think about data quality and give you the tools to act on it. Susan makes data approachable, human, and - dare I say - fun. A must-read for anyone serious about unlocking the true value of their data.’ - Alaa Marshan, Senior Lecturer and Researcher, University of Surrey

“With her trademark humour, Susan takes the headache out of data with fresh insights, real-world stories and a toolkit you’ll enjoy using. If you want consistent, trustworthy data without falling asleep over a spreadsheet, this is the guide you’ve been waiting for. This book will leave you better equipped, more confident, and maybe even laughing along the way.” - Caroline Carruthers, Chief Executive, Carruthers and Jackson and author 

‘The second edition of Between the Spreadsheets seamlessly expands a world in which author Susan Walsh is showing us not only the uncomfortable truth around data, but also approaches and methods of how to get rid of data in an effective and sustainable way. The new chapter on breaking myths around how GenAI can help with data cleaning is especially timely and enlightening, and the data horror stories are scary but also painfully reflective of data issues in this day and age. Susan’s writing style is wonderfully reflective of her fun and approachable personality, and I can only recommend anyone interested in creating and maintaining clean data to read this book!’

Tiankai Feng, author of Humanizing Data Strategy and Humanizing AI Strategy

Where can I get my copy?

UK readers can buy Between the Spreadsheets from Facet and Waterstones.

US readers can head to the American Library Association and Barnes & Noble.

Readers from the rest of the world can buy a copy from Amazon.