Cloudera: How an EDC might remedy your data headache
CIOs have a lot of plates to spin. From actioning digital transformations to delivering unbeatable customer experiences, the pressure is on them to spearhead initiatives that can bolster a company's position in competitive markets. Across the numerous and varied projects happening at once, there is one common denominator underpinning them all: data.
Data is regarded as many things: the new oil, the new gold, and a lifeline for businesses, among other metaphors. However, if there's one thing data is not, it's simple. Despite having it in abundance, businesses know all too well the struggle of putting data to work at the centre of their initiatives.
The usual challenges include the following:
- Data silos. Businesses today are hoarding data across all sorts of disparate systems, leaving them no option but to run analytic workloads independently. In turn, analytics becomes more complicated, and businesses are left unable to see the bigger picture.
- Inconsistent security and governance. Companies with siloed data systems are often unable to apply security and governance consistently, simply because there are too many systems to keep track of. Maintaining security and governance on each of these also guzzles time, resources, and money.
- The ubiquity of data. Data is generated in various locations, whether in on-premises data centres, in public or private cloud, or at the edge. Its dispersed nature necessitates extra steps for management and collation, taking time away from the higher-value tasks that data scientists and engineers would ideally be working on.
- Traditional data analytics tools. Organisations today have outgrown traditional tools, and even some of the more modern, cloud-driven offerings. Commercial software often forces data to exist in proprietary formats and can also reinforce data silos. Furthermore, traditional software vendors don't always rate highly in terms of speed and agility.
The trials and tribulations of data are many, but businesses' desire to become more data- and insights-driven is unwavering. To support that journey, they need an effective cloud strategy that can address the aforementioned challenges, and then some. A relatively new but highly promising way to do so is to leverage an enterprise data cloud (EDC).
In a recent podcast with EM360, Paul Mackay, EMEA Cloud Lead at Cloudera, outlined the four key principles that an EDC is based on. Curious to find out more, we asked Paul to delve deeper into these four tenets and why they are important to an effective cloud strategy.
Flexibility of choice
“The whole point of an EDC is to give you a platform that allows you the flexibility of choice over where you want to put your data, workloads, and analytics, without having to play by all the rules you would associate with a multi-cloud environment,” Paul explains.
An EDC therefore gives businesses the ability to run their data in a place that's suitable for them. In particular, companies can analyse and experiment in any environment, whether that's on-premises data centres, public or private cloud, multiple public clouds, or a hybrid combination of these.
Until now, application placement has been somewhat complicated. Each business will have its own considerations to take into account, but typical deliberations include financial implications and the usual security and governance caveats. Businesses may also need to take data gravity into account so as not to decouple compute and storage between applications and the data services that go alongside them.
These decisions are likely to change over time, so organisations need to be able to revisit them later, which is exactly what an EDC (specifically its hybrid capabilities) allows.
What's more, an EDC tilts the scales so that businesses are in control. In particular, it offers consistent functionality and data management capabilities both on and off premises. Thus, an EDC gives companies the support and flexibility to decide where to put their applications and workloads in a way that's easy and cost-effective.
End-to-end data lifecycles
Another notable EDC capability is end-to-end lifecycle management. In particular, companies can perform various activities all within the same platform and framework. This includes collecting and ingesting data, as well as analysing it and deriving value from it. Once they are ready to do so, organisations can also use an EDC to plan for predictive models driven by machine learning (ML) and artificial intelligence (AI).
For many businesses, harnessing AI is the ultimate end goal. By leveraging AI, ML, and predictive modelling, companies can analyse an even wider range of data and derive more value from it than ever before. With the support and functionality of an EDC, this end goal becomes much more tangible.
Security and governance
Security and governance measures must underpin everything a business does with its data. The consequences of dropping the ball can be severe, costing a company not only in fines but also in reputation.
At the same time, regulations are becoming increasingly stringent all over the world as newer regional data privacy laws are rolled out. In turn, security and governance have become a top concern for organisations, one that heightens every time a data scandal makes the news. However, peace of mind is possible, and it comes in the form of an EDC, which promises consistent security and governance across the data lifecycle.
An EDC can minimise the security risks associated with data silos. In the absence of a common platform, organisations end up building silos when they place datasets in different locations. When the time for an audit comes, trying to prove the lineage of the data, as well as who has touched it and what for, is frankly a hopeless task. Each silo requires a different skillset or toolset to investigate, and the whole exercise becomes, as Paul puts it, “an absolute nightmare.”
An EDC also takes the fear out of audits by putting an operating layer over the datasets in those various locations. In doing so, it provides businesses with a single view of their data, for which they can then set consistent security and governance policies.
“This gives the CEO or CIO the capability to fully understand where their data is, who's touched it, who has access to it, and what it's been used for,” Paul explains. “That plays hugely into not only compliance and auditing, but if you think about rules such as GDPR and not being able to provide that in terms of the fines you can be given or the impact on your business, there are even bigger ramifications.”
Embracing open source
In the podcast with EM360, Paul divided openness into two meanings, outlining that an EDC should “[one,] be based on open-source technology because the rate and pace of innovation of open source is second to none. Two, it should be open and available to integrate into your other systems.”
Starting with the former, the beauty of open source is the constant innovation it enables. With so many communities contributing to it, the product is always evolving for the better. Then of course, with so many eyes on the product, problems are identified and resolved much more quickly.
Regarding the latter, an open platform lets you connect the tools and services that already exist within your environment. Paul describes how open APIs enable you to connect to higher-level management tools or lower-level systems, allowing the platform to integrate fully into existing IT operations.
Cloudera Data Platform
An EDC sounds like a data engineer's dream, but thanks to Cloudera, it can very much become their reality. The Cloudera Data Platform (CDP) is billed as the industry's first EDC, and comes with a host of capabilities that businesses will want to take advantage of.
CDP provides a single platform that allows businesses to deliver data and analytics across private cloud, AWS, and Azure (Cloudera also recently announced a strategic partnership with Google Cloud Platform to bring CDP to mutual customers, with availability expected later in the year).
“We provide a software layer that abstracts you from those locations and all the tools and skills that you need to [deliver data and analytics],” Paul tells us. “If I'm spinning up a data warehouse on my private cloud or in AWS, the process to do so is exactly the same. This is because the software goes underneath and talks to either the AWS compute or the hardware and infrastructure within your data centre to spin up a service.”
Because CDP is a fully automated and orchestrated offering, organisations can skip the hassle of building a data warehouse. “Instantly, you're saving the resources, time, people, and money that you need to deliver a platform.”
Optimised for hybrid and multi-cloud, CDP provides powerful self-service analytics in these environments, while delivering granular security and governance that span the entire lifecycle of the data. With only one set of data shared out within each environment, users can set security and governance policies that define who has access to the data and track where it has been and what it is being used for, keeping cybersecurity threats at bay while ensuring preparedness for an audit.
Paul outlines that users only need to set these policies once, and then whatever functionality they choose to deliver on top of that (ingestion, ML, etc.) will inherit those same settings. What sets CDP apart is that these functionalities can also be delivered at any point in the data lifecycle, with security and governance integrated into them at all times.
Better still, security and governance are not the only burdens that CDP alleviates. Its open nature means that users will not be subjected to the dreaded vendor lock-in either.
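The "set once, inherit everywhere" idea can be pictured with a minimal Python sketch. This is not CDP's API: the names Policy, SharedDataContext, and Service are hypothetical, invented purely for illustration. It assumes only what Paul describes, namely that a policy is defined in one shared place and that every workload built on top is checked against it and leaves an audit record.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Policy:
    """Hypothetical access rule: which roles may read a dataset."""
    dataset: str
    allowed_roles: frozenset


@dataclass
class SharedDataContext:
    """Hypothetical single place where policies and audit records live."""
    policies: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def set_policy(self, policy: Policy) -> None:
        # The policy is defined once, here, and nowhere else.
        self.policies[policy.dataset] = policy

    def check(self, service: str, role: str, dataset: str) -> bool:
        policy = self.policies.get(dataset)
        allowed = policy is not None and role in policy.allowed_roles
        # Every attempt is recorded, allowed or not, to support audits.
        self.audit_log.append((service, role, dataset, allowed))
        return allowed


@dataclass
class Service:
    """Hypothetical workload (ingestion, ML, ...) built on the shared context."""
    name: str
    context: SharedDataContext

    def read(self, role: str, dataset: str) -> str:
        # The service holds no rules of its own; it defers to the context.
        if not self.context.check(self.name, role, dataset):
            raise PermissionError(f"{role} may not read {dataset}")
        return f"{self.name} read {dataset} as {role}"


context = SharedDataContext()
context.set_policy(Policy("customers", frozenset({"analyst"})))

# Both services inherit the same policy without any extra configuration.
ingestion = Service("ingestion", context)
ml = Service("ml", context)
```

Because neither service carries its own rules, changing the policy in the shared context changes behaviour for both at once, and the audit log gives one answer to "who touched what" regardless of which service was used.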
Revenue opportunities with CDP
CDP's capabilities certainly make a very compelling case for it. As Paul put it, it's like the “easy button for data analytics,” and quickly gives businesses the peace of mind they've been seeking. However, while its features speak for themselves, the burning question is always this: what new revenue opportunities does CDP create?
“CDP allows you to monetise and start delivering value faster,” Paul tells us. Because the platform is fully automated and orchestrated, businesses can deploy CDP and get started with it quickly. Data scientists and engineers can put its features and functionality to use straight away, putting businesses on a fast track to reaping value.
Paul also explains that CIOs need to ensure their IT professionals (particularly data analytics and infrastructure teams) are able to focus their efforts on higher-value tasks instead of, for example, building and patching platforms. In doing so, “IT starts to deliver real value back to the business rather than being a cost centre.” With inherent automation and orchestration, CDP takes care of the grunt work so that your IT teams don't have to, acting as a shortcut to value and innovation.
After speaking with Paul about CDP, our immediate thought was that it really is a no-brainer. If you'd like to find out more about CDP, then head over to the Cloudera website. Otherwise, be sure to tune in to the latest episode in EM360's podcast series with Cloudera: Cloud Technology with Jan Kunigk.