Hybrid Data Warehousing: Best Practices for IT Integration

Hybrid data warehousing can provide immense value when managing cross-enterprise data compliance, privacy, and risk by ensuring sensitive data is housed appropriately. However, spreading data across hybrid clouds and warehouses can make data integration considerably more complex.

To build best-in-class analytics that drive a business, analysts must have quick and comprehensive access to enterprise-wide data. If data is spread across multiple domains with varying levels of data governance, accessing, transforming, and using that data for downstream analytics can become a lengthy (or even impossible) process.

Benefits and Challenges of Hybrid Data Storage

Tom Traubitz, senior director for SAP, says one advantage with hybrid systems is that existing on-premises systems can continue uninterrupted without disturbing existing processes and use cases, allowing for a gradual and more orderly transition to cloud-based technologies.

“The main challenge with a hybrid approach is that it fragments the model of the data across several information stores and adds attendant complications that different stores may rely on different access and storage methodologies,” he adds.

The problem is exacerbated when the data in different stores is out of sync (because of replication delays, source-storage delays, and so on), which can lead to inconsistencies when analyzing data across models.
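As a rough illustration of the sync problem, the sketch below flags stores whose most recent sync is older than an agreed tolerance before a cross-store analysis runs. The store names, timestamps, and the 15-minute tolerance are all assumptions made for the example, not details from any vendor's product.

```python
from datetime import datetime, timedelta, timezone


def stale_stores(last_synced, max_lag, now=None):
    """Return the sorted names of stores whose last sync is older than max_lag."""
    now = now or datetime.now(timezone.utc)
    return sorted(name for name, ts in last_synced.items() if now - ts > max_lag)


# Hypothetical sync timestamps; a real system would read these from
# replication metadata or pipeline logs.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
last_synced = {
    "on_prem_warehouse": now - timedelta(minutes=2),
    "cloud_warehouse": now - timedelta(hours=3),
}

# Assumed tolerance of 15 minutes between stores.
lagging = stale_stores(last_synced, timedelta(minutes=15), now=now)
print(lagging)
```

A check like this does not fix replication delays, but it lets an analysis job warn or stop rather than silently combine data from different points in time.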

A more basic issue lies in discovering the right data assets and combining them into a consistent representation of the truth.

“Done incorrectly, a hybrid scenario can greatly complicate data access and demand new skills and technology for the user community,” Traubitz says. “Without a unified governance, the strategy for managing data security can be complicated.”

Brian Platz, CEO of Fluree, says that to meet real-time integration needs across hybrid data warehouses, organizations should look to systems that promote data sharing across domains through universal data vocabularies and data-centric governance.

“Organizations must ensure data is understood in a consistent and standardized way, regardless of its source or location,” he says.

To account for authorization and access, a data-centric governance approach ensures that data is managed, protected, and compliant with regulations throughout its lifecycle, regardless of where it resides.

“A data-centric governance strategy involves building immutable policies into data products that enforce read and write access as it is created, updated, and shared,” Platz explains.
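One way to picture a policy that travels with the data is sketched below. It is a minimal illustration only, not Fluree's implementation or API: the `Policy` and `DataProduct` classes, the role names, and the sample record are all hypothetical. The policy is a frozen dataclass, so it cannot be reassigned through normal attribute access once the data product is created.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical access policy; frozen so fields cannot be reassigned."""
    readers: frozenset
    writers: frozenset


class DataProduct:
    """Hypothetical data product that carries its own access policy."""

    def __init__(self, records, policy):
        self._records = dict(records)
        self._policy = policy

    def read(self, role, key):
        # Every read is checked against the embedded policy.
        if role not in self._policy.readers:
            raise PermissionError(f"role {role!r} may not read this data product")
        return self._records[key]

    def write(self, role, key, value):
        # Every write is checked against the embedded policy.
        if role not in self._policy.writers:
            raise PermissionError(f"role {role!r} may not write this data product")
        self._records[key] = value


policy = Policy(
    readers=frozenset({"analyst", "steward"}),
    writers=frozenset({"steward"}),
)
product = DataProduct({"acct-1": 100}, policy)
```

Because the checks live inside the data product rather than in each warehouse, the same rules apply wherever the object is used. Real systems would enforce this at the storage or query layer rather than in application code.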

Investing in Company Culture

Dima Spivak, COO, products for StreamSets, points out that IT infrastructure strategies are as much about company culture as they are about technology. “Adopting a hybrid data warehouse approach is a great way to build trust and good will between internal IT groups and cloud providers and establishes a solid foundation for future innovation,” he says.

However, with hybrid data warehousing strategies, organizations often must make compromises. “One example we see regularly centers on the software companies use for data integration,” Spivak says. “This traditional tooling for moving and transforming data across platforms would often focus on either legacy, on-premises systems or on more modern cloud-native ones.”

He adds that deliberately picking tools that don’t “take sides” and that support both patterns is a great way to bridge this gap.

“In my opinion, the financial services industry seems particularly well-suited for adopting a hybrid data warehousing approach,” he says. “As an industry, it’s always had to weigh regulatory compliance with technological innovation to stay ahead of the competition and a hybrid strategy aligns really well with this.”

Hybrid data warehousing also affects data security and compliance. “It definitely makes things more complex because the conventions and security abstractions defined for traditional on-premises systems and those in the cloud often differ,” Spivak says.

He suggests companies consider working with consultancies specializing in cloud data migration to help overcome these hurdles.

“It may sound cliche, but successfully adopting a strategy like this requires buy-in from the whole organization,” Spivak says. “Senior leadership needs to understand the rationale and value of a hybrid data warehousing strategy and central IT needs to have the technical skill to develop and drive implementation.”

He adds that even staff within the lines of business need to be aware of what’s happening and how it will affect them. “Without commitment and alignment across the entire company, these efforts can languish or become disruptive,” he cautions.

Building a Proper Data Catalog

Thiago Costa, senior software engineer at Backblaze, says the most important thing to account for (not only when thinking about going hybrid) is having a proper catalog.

“Well-defined and documented models are a proven way to trust the lineage of the data,” he says. “That doesn’t mean you have to solve all the problems. Documenting issues is also part of a catalog.”

He adds that there are many aspects of storing data that need to be considered when moving into hybrid storage, particularly in the case of user data, sensitive data, and personally identifiable information.

“Those aspects to be considered mirror those of general application development, and the answer is the same in both cases: pragmatic wins the race,” he says. “Find the appropriate partners by consulting the right professionals when designing your hybrid data system.”

This will give organizations confidence in how the data is stored and reduce their exposure to possible leaks or loss.

From his perspective, one of the best use cases for a hybrid model is financial. “In financial industries, you must store data for a given number of years — the amount of time varies depending on the geographic location — but it’s likely you won’t be reading 20-year-old data on a daily basis,” Costa says. “This is where a cold storage solution — cloud or not, you decide — would be useful and compliant.”
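The retention scenario Costa describes can be sketched as a simple tiering rule. Everything numeric here is a placeholder: the region names, the retention periods, and the two-year "hot" window are assumptions for illustration and are not drawn from any actual regulation.

```python
from datetime import date

# Hypothetical retention periods in years, keyed by region.
# Actual requirements vary by jurisdiction and must come from compliance teams.
RETENTION_YEARS = {"region_a": 7, "region_b": 10}

# Assumed window during which records are still read regularly.
HOT_WINDOW_YEARS = 2


def storage_tier(record_date, region, today):
    """Classify a record as 'hot', 'cold', or 'eligible_for_deletion' by age."""
    age_years = (today - record_date).days / 365.25
    if age_years > RETENTION_YEARS[region]:
        return "eligible_for_deletion"
    if age_years > HOT_WINDOW_YEARS:
        return "cold"
    return "hot"


today = date(2024, 6, 1)
tier = storage_tier(date(2016, 1, 1), "region_b", today)
print(tier)
```

The same record can land in different tiers depending on the region's retention period, which is why the rule takes the region as an input rather than hard-coding one number.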


  

