Master Data Management Viewpoints
Welcome to the Master Data Management Blogosphere.
I'd like to begin with some thoughts on the current legacy paradigm for Master data management (MDM) and the new capabilities for Virtual Master Data Management.
Let's review what we know, identify the issues, and highlight useful information that can help all of us be more successful.
Michael Zuckerman, Chief Marketing Officer
Staging Data for MDM Using Advanced Data Virtualization
Written by Michael Zuckerman Wednesday, September 28, 2011 09:36 AM
The issue of staging data is a large one: the constant and wholesale movement of files and rows of data back and forth across the enterprise.
This is still almost 100% the domain of legacy ETL tools. Before you can get data into the MDM solution you still need to stage the data and get it into a data mart or warehouse somewhere. Data staging for business intelligence (get it into the domain ... all of it) and for data warehousing (get it into a staging area ... normalize ... find common keys to tie pieces together) is a huge issue everywhere.
We just did a new video on this topic. It addresses staging in the context of data warehousing, but it shows enough of the detailed functionality to make clear that this will also have a significant impact in support of legacy MDM. We don't need to move files, and we don't even need to move an entire record (one row).
Hope this is useful for you: http://www.youtube.com/watch?v=9tJM3RTahy0
The Master Data Model
Written by Michael Zuckerman Monday, February 14, 2011 05:36 PM
The common data model, or the master data model, is the underlying model that every application must transform to (and from) in some way. This is the master data model of record. Exactly how each application transforms its data to and from it depends on the architecture. This is the main point of master data management, and wrapping a reasonable scheme around the planning for the common data model is a huge challenge for any enterprise.
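To make "transform to and from" concrete, here is a minimal sketch in Python. The `MasterCustomer` shape, the two source systems and every field name in the mappings are hypothetical, invented only for illustration; a real model would be far richer and would carry type and quality rules as well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MasterCustomer:
    """Hypothetical common (master) representation of a customer."""
    customer_id: str
    name: str
    street: str
    country: str


# Hypothetical mappings: local field name -> master field name.
CRM_TO_MASTER = {"id": "customer_id", "full_name": "name",
                 "addr1": "street", "ctry": "country"}
ERP_TO_MASTER = {"cust_no": "customer_id", "cust_name": "name",
                 "strasse": "street", "land": "country"}


def to_master(record: dict, mapping: dict) -> MasterCustomer:
    """Transform a local record into the common data model."""
    return MasterCustomer(**{master: record[local]
                             for local, master in mapping.items()})


def from_master(master: MasterCustomer, mapping: dict) -> dict:
    """Transform a master record back into a local representation."""
    return {local: getattr(master, master_field)
            for local, master_field in mapping.items()}


crm_record = {"id": "42", "full_name": "Acme", "addr1": "1 Main St", "ctry": "US"}
master = to_master(crm_record, CRM_TO_MASTER)
erp_record = from_master(master, ERP_TO_MASTER)
print(erp_record)
```

The point of the sketch is that each application needs only one mapping, to the master model, rather than one mapping per peer application.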
Also, different data will need to flow differently through the system. Customer, product and vendor master information do not really flow the same way. In most organizations, customer information flows from the bottom up, from the various operational systems. The injection of elements of “truth” from sources like Acxiom™ or D&B360™ can happen at any level. But mostly the flow is bottom up within various operating entities. Product catalog information usually flows from a single source of truth. Even as it is globalized, much of it is derived from that source: the pricelist for the Japanese market, for example, is a transform of the pricelist and catalog used domestically in the U.S. Vendor information can come from multiple sources, and it really depends on each organization as to how much data they try to assemble centrally.
On a very good day, the master data model is a political football on all fronts. There is tremendous wrangling to work through, especially on a global basis. The challenge is that most organizations want their data representations to define the common or master data model. The reason is obvious – they have to do little or no transform work. Data quality is an issue they think they can tackle locally.
Meanwhile, your vendors want their data representations to define the common or master data model. The words “we’ve selected SAP® as our master data element format” probably strike fear into the hearts of other vendors. When major software companies sit around and strategize about your account, there are two strategic repositories on their target list: (1) the data warehouse for all of your transactional data; and (2) the master data management repository for all of your non-transactional data. They view this as a life and death battle and they won’t give up easily.
That’s part of the reason all the big companies acquired legacy MDM companies. They know that if they can control and hold the key elements of your corporation’s critical master data they have almost bought a seat at the table nestled right between your CFO and your CIO. That’s perhaps part of the reason that one of the major companies is giving away a free copy of their MDM solution. As I will say now and later on, an open source or “free” implementation of master data management still awaits millions more in professional and integration services. There is nothing free about it.
Whoa. I'm late. Save and run. I am off to have dinner with a venture capitalist friend. He wants me to review a new company he might fund. Something else "in the cloud." What's noteworthy to all of you is that investment is back on. Some funds have new dollars coming in. It's happening up and down 101. We'll all have a very good spring. And it is about time - we've all paid our dues. Anyway ... I'm off. Talk to you later this week.
Complexity Part 2 - The Saga Goes On
Written by Michael Zuckerman Sunday, February 13, 2011 03:22 PM
There is another very large area of complexity. The complexity of systems integration is quite large. There is an architectural black hole called system integration services. It is the black hole because an infinite amount of time and money can flow into ETL creation and yet very little can flow out when you want to know what it is doing. How will you look at a given ETL link later and know what was transformed and how? How will you audit this? How does an application act when you have three different ETL links connected to it, one for MDM and two others for various data integrations? Depending on what software is chosen, how will you know the transformations you documented in the metadata will actually be the ones coded? How can you tell? Every link that requires interpretation or implementation again adds error and misalignment.
The mechanics of data movement are always an issue. Do you move data a file at a time (the legacy of ETL) or use metadata to move just the records that were updated? Do you have to use ETL and move files? Is this automated or manually coded? Do you take the schema information and code it all? Do you have to build out complex ETL, EII or basic Data Virtualization links? What comes from your MDM vendor and what integration tools do you need to buy separately? How will you make sense of this when you have dozens of ETL links for your MDM project and need to figure out how to manage and maintain this later?
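The difference between moving a whole file and moving only the updated records can be sketched in a few lines of Python. This is an illustration only, not how any particular product does it; it assumes each source row carries an `updated_at` timestamp, and the row contents and the "watermark" helper names are hypothetical.

```python
from datetime import datetime


def changed_since(rows, watermark):
    """Return only the rows updated after the last successful sync."""
    return [row for row in rows if row["updated_at"] > watermark]


def next_watermark(rows, watermark):
    """Advance the watermark to the newest change that was moved."""
    return max((row["updated_at"] for row in rows), default=watermark)


# Hypothetical source table; a file-at-a-time load would move all three rows.
source_rows = [
    {"id": 1, "name": "Acme", "updated_at": datetime(2011, 2, 1)},
    {"id": 2, "name": "Globex", "updated_at": datetime(2011, 2, 10)},
    {"id": 3, "name": "Initech", "updated_at": datetime(2011, 2, 12)},
]

last_sync = datetime(2011, 2, 5)
delta = changed_since(source_rows, last_sync)
print([row["id"] for row in delta])  # [2, 3]
last_sync = next_watermark(delta, last_sync)
```

A second run with the advanced watermark moves nothing until a source row changes again.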
Existing integrations abound – how will you understand and manage the impact of the new integration and transformation points on the huge array of existing data integration points between the various operational systems, the data warehouse and the data marts in place today? Is the line between non-transactional data and transactional data that clean and crisp? What about existing business process transformations coded into enterprise application integration connections? What about the multitude of existing ETL and EII links? Data is already moving between systems and being aligned today on a point solution basis. You need to take this into consideration on a system by system basis. Connecting multiple ETL links to one application system starts to raise more questions about compliance and data integrity. Do you have visibility into all of this data transformation? How do you manage all of this? What is the impact?
The number and complexity of enterprise-wide integration points have increased by more than an order of magnitude, yet the basic tools such as ETL and EII have not changed much in 10 to 15 years. Remember that extract/transform/load (ETL) and even the reborn enterprise information integration (EII) variants (federated views of integration) are really single use “pipes.” You can have one or two sources on the other end. Think of it this way: you have to connect 20 buildings together, and ETL really requires that you run 19 sets of “pipe” from each building, one to every other building. Every ETL connection is really application specific, just for that implementation.
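The arithmetic behind the building analogy is worth a quick sketch. The function names below are my own, and the "shared hub" row is only a comparison point: it assumes every system connects once to a common layer instead of to each peer.

```python
def pipes_per_system(n: int) -> int:
    """Point-to-point: each system needs a pipe to every other system."""
    return n - 1


def directed_pipes(n: int) -> int:
    """Total one-way pipes when every system feeds every other system."""
    return n * (n - 1)


def connected_pairs(n: int) -> int:
    """Total pairs of systems if one pipe serves both directions."""
    return n * (n - 1) // 2


def hub_connections(n: int) -> int:
    """Connections if each system attaches once to a shared layer."""
    return n


for n in (5, 10, 20):
    print(n, pipes_per_system(n), directed_pipes(n),
          connected_pairs(n), hub_connections(n))
```

For 20 systems that is 19 pipes per system, 380 one-way pipes or 190 pairs in total, against 20 connections to a shared layer. The point-to-point count grows with the square of the number of systems.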
Basic ETL links and the various master data management hub configurations take on complexity of large proportions as connectivity scales. If you grow the number of “pipes”, your master data management implementation becomes unwieldy and perhaps unreliable. Instead of building your own hub and spoke points, you can plug these Virtual Data Manager™ building blocks together, as you need them. It all works automatically to harmonize your data. All of your integrations, transforms and so much more are visible and easily managed.
You also need to understand the specific and detailed impact of every transform, the lineage of these transforms and their current operation, and you need to be able to audit them later. This is an area where advanced data virtualization and Virtual Master Data Management™ will have significant and positive impact. Virtual Master Data Management™ will bring every transform, every linkage, every piece of every data dictionary and semantic content together, visible from top to bottom of the enterprise by authorized data architects and governance teams. This can really be helpful to your organization in a variety of ways.
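What "lineage you can audit later" means can be shown with a small Python sketch. This is a generic illustration under my own assumptions, not a description of any product's internals; the `AuditedRecord` and `LineageEntry` names and the sample transforms are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass
class LineageEntry:
    """One audited transform: what changed, how, and when."""
    field_name: str
    transform: str
    before: object
    after: object
    at: datetime


@dataclass
class AuditedRecord:
    """A record that logs every transform applied to it."""
    data: dict
    lineage: list = field(default_factory=list)

    def apply(self, field_name: str, transform: str, func: Callable) -> None:
        before = self.data[field_name]
        after = func(before)
        self.data[field_name] = after
        self.lineage.append(LineageEntry(
            field_name, transform, before, after, datetime.now(timezone.utc)))


record = AuditedRecord({"name": "  acme corp ", "country": "usa"})
record.apply("name", "strip_whitespace", str.strip)
record.apply("name", "title_case", str.title)
record.apply("country", "upper_case", str.upper)

for entry in record.lineage:
    print(entry.field_name, entry.transform,
          repr(entry.before), "->", repr(entry.after))
```

Because every change is recorded with its before and after values, the question "what was transformed and how?" can be answered from the log rather than by reading the integration code.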
Complexity of integration is also impacted by your commercial software vendors. Vendors change their schemas for their internal data structures all the time. When this happens, how will you handle it? How will you even know about it? Will it impact your architectural decisions?
In the final analysis, all of this complexity, unmanaged, creates risk and increases expense for you. I'm just trying to get you to ask your legacy vendors the right questions. At the end of that road, I think you'll appreciate the potential benefits of Virtual Master Data Management and consider it as a potential solution for your project.
MDM and Complexity
Written by Michael Zuckerman Monday, February 07, 2011 05:23 AM
Complexity is one of the largest areas to review because there are so many areas of impact and so many issues to consider. Complexity starts with the planning of the entire project effort and continues, given the current paradigm, for the years of maintenance and support required to sustain the overall master data management efforts. The architectural considerations, implementation model and functional objectives for the project require tremendous planning and painstaking execution.
Let us just take a 10,000 foot view of some of the areas of complexity.
Complexity of Scope. What is the scope of the project – what area of data management are you trying to align and manage? What are you going to do with the data, exactly, and how will you do it? The data could include product catalog data, customer information or both. This can also be segmented by geography given the situation of the company and the nature of the customer base. Sometimes product catalogs are strictly local, not global. This can also apply to the customer depending on the markets you serve. You can use the system to produce operational alignment, also known as synchronization (data harmonization™) between various systems. The process of implementation should reduce duplicate entries and improve data quality. There is return on investment in all of this. But then what will you do with this physical data? Will you use the data for business intelligence type queries? How complex will your queries be? Will you want to query real-time operational data or older data (daily, weekly, monthly) and to what end?
Complexity of Timeline. What is your timeline? Remember the “Agile” principle. The longer a project runs, the more likely it is to fail, slip or miss its goals. Remember that the current paradigm for master data management cannot be isolated from other systems and software easily. You are depending on other people in other organizations to get various portions done, many of which are quite complex as well. The political considerations and necessary alignment may take 50% or more of your project timeline. The completion of the project is in the eyes of the beholder. The great majority of your company will view the effort as incomplete until all of the operational systems are synchronizing with this master data on a highly frequent near-time or real-time basis.
Complexity of Architecture. What is your chosen system architecture? There are a lot of different architectures in use for the current MDM paradigm. Your architectural choices depend heavily on your model for synchronization – if, when and how improved (scrubbed, cleaned, enhanced, etc.) data is returned to operational systems and with what frequency. Most vendors cannot even agree on the few major approaches. I just did a quick audit of the perspectives that many major vendors publish and they do not align.
Are you going to have one architecture for product master data, another for customers and then something else for all of your other data integration? Are you going to have one repository for master customer data from European operations and another from the U.S.? Or one that works for both? How will that affect architectural decisions? Are you going to have a repository architecture? A registry? Perhaps an analytical, hybrid, transactional or something else? There are a lot of varying proprietary approaches to consider.
Some architectures require direct integrations with all the applications such that they write data directly into an “MDM central hub” of some kind and then share and use this data real-time for operational purposes. Others write to the MDM system for later access by BI applications. Others write and retrieve from the MDM system but do not use it for real-time transaction support. Regulatory issues can come into play for life sciences (healthcare), law enforcement and other areas. So what do you do?
Virtual master data management, a very new architecture, is in many ways a powerful alternative to other architectural frameworks. It allows any team, anywhere, to begin aligning data in operational systems and deciding on the local elements of data that should propagate and be integrated between systems. For example, it allows any data integration anywhere in the organization, even a single point integration, to be easily folded into the enterprise view of master data alignment.
Virtual master data management™ is a networked architecture and can easily scale cost-effectively from the largest enterprise in the world to the very smallest. The building blocks of Virtual MDM™ are components of data integration, data quality, data governance, data compliance and data performance (dashboards and administration). You can use some or all of this to execute locally but roll up governance globally. It all works together. Virtual master data management brings a high degree of automation with it. You can substantially invert the mix between technology and manual process so that automation of software process, not repeated manual processes, drives your implementation model. You reach your return on investment sooner, with less error and far less risk.
Most interesting about the architecture that Virtual MDM™ brings is the extreme flexibility. You can make a decision to align and integrate your master data elements in your division, line of business, or operating entity using virtual master data management and yet integrate easily into most other architectures for legacy MDM if the corporate entity decides to go in that direction. You also stand to hold most of the local return on investment associated with master data alignment and enhanced data quality even if the global project uses legacy MDM technology and fails. Alternatively, if you are managing a global implementation and struggling with these issues, you can decide to use Virtual MDM™ in some of your larger operating entities where legacy integration is troublesome. That will help you get the project back on track.
Data Discovery Goes On
Written by Michael Zuckerman Thursday, February 03, 2011 07:14 AM
Now you have your list of applications, some of the existing transform points and data structures. Data discovery requires that you make sense of all of this and store this in an orderly fashion. So now this is entered into some form of document, or metadata store. The good data is entered with the errors, some already in the data and some introduced during the data entry.
You have also been working on your data dictionary: your internal business terms and how they are applied, hopefully in a consistent way already, across your organization. It would be nice if these terms for customer, product and so forth readily corresponded to the various data elements within your major applications. But of course, they do not. That’s part of the challenge. The data dictionary is yet another application, whether manual, semi-automated or automated, that sits disconnected from the rest. The right goal is to have one place to automatically discover the full structure of data, understand the best semantic fits to line various data elements up, and bring the data dictionary components into this automated system, in just the right places.
The mix of your applications will require compatibility with data structures that could be relational, object oriented, XML or something else. Most of your legacy on-premise applications use relational database structures. Much of the public cloud stores object data. How will you bridge this divide?
Many of your new and commercial cloud vendor applications, like Salesforce.com, use objects. How will you work with these different data structures? Can you see various data structures from this mix side by side so you can identify and align the elements of truth?
Even when you get Discovery right, you need to realize that the underlying data structures continue to evolve. Data integration goes on across the enterprise every day. Point connections are being made and key data elements are migrating across applications and into new forms. So you end up between the proverbial rock and the hard place, first with an unwieldy but improving store of metadata information that you continue to iterate and update manually, and second with an ongoing stream of data integrations and data structure changes that you cannot keep up with that constantly change this information.
The underlying data structures, even when they are documented and handed to you, do not always make it easy. I recall looking at an SAP® data table and noting the field named “strasse.” I had to see the underlying data to know that this held the street “address.” Now that is just a language barrier. Of course, the German programmers will program in German. But 97% of the pharmaceutical companies in the United States have SAP® implementations and they need extreme precision in their data integration. Life sciences compliance absolutely demands it and your industry is likely the same. How many of your in-house applications have cryptic entity field and column names? I think you understand. Right there, in the view of that data source, I needed to rename that entity field/column so that I and other team members could make sense of it later. Or add the elements of a data dictionary right there next to it. You need the tools to be able to do that.
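Renaming a cryptic column in the view, with the dictionary entry sitting next to it, can be sketched like this in Python. Only “strasse” comes from the example above; the other column names, the aliases and the descriptions are hypothetical, and a real tool would store these annotations with the metadata rather than in code.

```python
# Hypothetical annotations: source column -> readable alias plus dictionary text.
ANNOTATIONS = {
    "strasse": {"alias": "street_address",
                "description": "Street line of the customer's address"},
    "plz": {"alias": "postal_code",
            "description": "Postal code of the customer's address"},
}


def annotate(row: dict, annotations: dict) -> dict:
    """Return a view of the row using readable aliases where defined.

    Columns without an annotation keep their original name.
    """
    return {annotations.get(col, {}).get("alias", col): value
            for col, value in row.items()}


def describe(column: str, annotations: dict) -> str:
    """Look up the data dictionary text for a source column."""
    return annotations.get(column, {}).get("description", "(undocumented)")


source_row = {"id": 7, "strasse": "Hauptstr. 1", "plz": "69190"}
print(annotate(source_row, ANNOTATIONS))
print(describe("strasse", ANNOTATIONS))
```

The source system is left untouched; only the view that the team works with carries the readable names.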
Even with the legacy paradigm, there should be some real automation to help with data discovery. Perhaps you use a tool to organize metadata. Perhaps you use another tool for your data dictionary. The tool is automated but is the data flowing in manually or automatically from the source systems?
In the perfect world every drop of effort you put into data discovery should flow into self-sustaining automation. Metadata from all applications, everywhere, should be automatically catalogued. Data dictionary information should be added to these automated repositories and automatically tagged to the metadata. Semantic tools should exist that automatically help you understand the “alignment” between elements of potential master data or master data flow. Architects and data governance teams need to be able to browse the actual data catalogs to confirm semantic content for given structures.
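As a very rough illustration of automated “alignment” suggestions, the Python sketch below compares catalogued column names against data dictionary terms using plain string similarity from the standard library. This is a deliberately crude stand-in for real semantic tooling: the column names, the dictionary terms and the 0.6 threshold are all assumptions, and a suggestion like this still needs an architect to confirm it against the actual data.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lower-case a name and drop separators before comparing."""
    return name.lower().replace("_", "").replace("-", "")


def similarity(a: str, b: str) -> float:
    """String similarity between two names, from 0.0 to 1.0."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def suggest_alignment(columns, terms, threshold=0.6):
    """Suggest the closest dictionary term per column, or None if too weak."""
    suggestions = {}
    for column in columns:
        best = max(terms, key=lambda term: similarity(column, term))
        suggestions[column] = best if similarity(column, best) >= threshold else None
    return suggestions


# Hypothetical catalogued columns and data dictionary terms.
catalogued_columns = ["Cust_Name", "PostCode", "xyz_flag"]
dictionary_terms = ["customer_name", "postal_code", "street"]

for column, term in suggest_alignment(catalogued_columns, dictionary_terms).items():
    print(column, "->", term)
```

Name matching alone would never have paired “strasse” with an English term, which is exactly why browsing the actual data remains part of the job.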
Respectfully submitted, all of this needs to work. I've received many emails with questions about our technology - try these white papers on for size:
6.23.2011 - IdealNet - Analysis of Data Integration Technologies
12.20.2010 - White Paper Thinking of Master Data Management? Don't Do It! Find Out Why?
11.22.2010 - White Paper Advanced Data Virtualization - Function, Capability, Benefits
10.2010 - White Paper 2010 Report - Application and Data Integration for Cloud and On-Premise Applications - Tools and Technologies
10.20.2010 - White Paper Overview of Advanced Data Virtualization - Differentiation Versus Basic Data Virtualization

