Data and System Integration through Big and Master Data Governance

Data Management

The IT industry has been quietly but persistently working on System Integration (SI) in order to create successful new digital business models that emphasize participation and sharing while providing better services.

System Integration basically means improving the mutual visibility of a system and its environment by sharing business rules and data across the company-wide supply chain. What exactly is being integrated, and why is such integration needed?


Data Integration

First of all, let’s think about data integration based on master data management (MDM), which is part of enterprise data management in the age of big data. This is a broad topic, but integration in IT is required on two different levels.


System and data integration in the manufacturing industry (Source, edited: marketplace.cincom.com; www.abcontrols.com; www.seguetech.com; en.wikipedia.org)

Data Integration

Data integration manages data that is shared and referenced among different systems and applications from a single source within a company, and keeps that data coherent and complete. It can be implemented on the basis of data governance, which consists of big data processing and master data management.

System Integration

System integration connects diverse computing platforms, businesses (B2B), and software applications (A2A) logically, physically, and functionally, so that each operates as a part of the whole system. It includes designs based on business process and enterprise application integration (EAI) architecture for better compatibility, as well as communication platforms, separate interface tools, and methods for using standard data formats.

The amount of data handled in the digital world is doubling each year and is expected to reach 40 trillion gigabytes by 2020. That is the amount that would result if every human being on earth somehow used about 5,200 gigabytes per year.
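The per-person figure can be checked with simple arithmetic. The sketch below divides the projected total by an assumed world population of about 7.7 billion; the population figure is an assumption for illustration and does not come from this article.

```python
# Figure quoted in the text: 40 trillion gigabytes by 2020
total_gb = 40e12

# Assumption (not from the article): roughly 7.7 billion people on earth
world_population = 7.7e9

# Average share of the total per person
gb_per_person = total_gb / world_population
print(f"About {gb_per_person:,.0f} GB per person")
```

With this assumption the result lands close to the 5,200 gigabytes mentioned above; a slightly different population estimate shifts it by a few percent.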

Companies and organizations are also expected to use far more data than individuals, because big data, which includes both accumulated structured data and unstructured data, keeps growing.

The future survival strategy of every corporation, manufacturing companies included, will depend on how effectively it manages and operates intangible data/information supply chains, made up of data and related services, which go beyond the product supply chains we are familiar with.


Data/Information supply chain flow chart in SMAC environment (Reference: ibmdatamag.com)

Let’s take a look at the data/information supply chain flow chart. Data gathered from diverse sources is stored either on Hadoop or in a data warehouse, depending on its form and quality. It is then turned into meaningful information through analysis and delivered to support better decisions. In this respect, the data/information supply chain resembles the value-adding stages of a product supply chain.
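The storage step in the flow chart can be pictured as a simple routing rule. The following is a minimal sketch, assuming that structured records go to the data warehouse and everything else goes to Hadoop; the `Record` class, the `structured` flag, and the store names are hypothetical and only stand in for real systems.

```python
from dataclasses import dataclass

@dataclass
class Record:
    source: str        # where the data came from (hypothetical label)
    payload: object    # the data itself
    structured: bool   # True if the payload fits a fixed schema

def choose_store(record: Record) -> str:
    """Pick a target store by the form of the data (assumed rule)."""
    return "data_warehouse" if record.structured else "hadoop"

def route(records):
    """Group incoming records by the store they would be saved to."""
    stores = {"data_warehouse": [], "hadoop": []}
    for record in records:
        stores[choose_store(record)].append(record)
    return stores

incoming = [
    Record("ERP", {"order_id": 1, "qty": 5}, structured=True),
    Record("sensor_log", "raw text line from a machine", structured=False),
    Record("CRM", {"client_id": "C001"}, structured=True),
]
stores = route(incoming)
print({name: len(items) for name, items in stores.items()})
```

A real pipeline would also consider data quality, as the text notes; this sketch only looks at form.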

Data will play a crucial role in every digital-based organization, like blood in a living organism. If data is the blood of a digital organism, impurities have to be removed from it while the oxygen and nutrients the body needs are kept and delivered throughout the body.

Analytics is in charge of this kidney-like function that filters impurities, and data governance such as master data management acts as the heart of this digital organism, which takes care of the blood’s supply and circulation.

Then which vessels deliver blood to each and every part of this organism’s body? That function is performed by connecting and integrating systems and applications.

Data and

The first thing needed for IT integration among manufacturing systems is to sync the structured data currently being managed. Consistent reference information should then be provided to the various parts of the company that need it. This can be done through a ‘single source of truth’.

Many of you must have at some point had to buy something you already owned because your house wasn’t organized and you couldn’t seem to find it anywhere. The purpose of data management is also to avoid this kind of situation.

Business-level systems such as ERP, CRM, PLM, and data warehouses each have their own databases. Enterprises run business applications on top of many large, distributed databases, which is why managing data is not as easy for them as cleaning a house.

Even if the inconsistent data scattered across these systems is transferred through API queries or ETL (Extract, Transform, Load), repeating this process for every data source is not only very inefficient but also cannot ultimately achieve real integration.

There are

We can start by maintaining integrity among the data on products, clients, and providers, which is considered the most important data both inside and outside the system.

This type of data has been defined and managed as ‘master data’ in each system. The problem is that too many data sets claim to be the master, much as we see with competing standards. It is like having bad copies of the same house key scattered around the living room.


Various distributed/separate DBs and difficulties in integration caused by multiple versions of master data (Source: beyondplm.com)

The business systems mentioned above are estimated to hold at least 10-20% overlapping data in most companies. Reports also show that over 80% of enterprises have difficulties caused by diverse, low-quality data sources stuck in separate silos.

These difficulties show up as business inefficiencies such as inaccurate reports and confusion over which data should be the standard, not to mention incorrect data definitions.

Let me share a case as an example concerning this confusion.


An extreme example of data sets being stored and used in different application systems (Reference: SAS)

A single client’s information can be used in diverse business systems, as the chart above shows. Situations may arise, however, where the same client’s information is stored in different data sets, duplicated with different values, or even omitted. Sometimes the same data field holds different values or formats.

This is because separate databases have stored and managed different data sets in their own way at different times. Such problems can be critical to applications like CRM, which handles key client information.

That’s why these distributed databases have recently started to be integrated again into centralized storage such as data warehouses. The purpose of master data management is to get rid of overlapping data and to keep data integrity through a unified management system.
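To make the problem concrete, here is a minimal sketch that scans one client’s records from several systems and reports fields whose values disagree or are missing. The sample records, field names, and the `find_conflicts` helper are all hypothetical and are not taken from any particular product.

```python
from collections import defaultdict

# Hypothetical records for the same client held by three systems
records = [
    {"system": "CRM", "client_id": "C001", "name": "Jane Smith",
     "phone": "555-0100", "email": "jane@example.com"},
    {"system": "ERP", "client_id": "C001", "name": "J. Smith",
     "phone": "555-0100", "email": None},
    {"system": "Billing", "client_id": "C001", "name": "Jane Smith",
     "phone": "555-0199", "email": "jane@example.com"},
]

def find_conflicts(records, key="client_id"):
    """Return fields with disagreeing values and a list of missing fields."""
    values = defaultdict(lambda: defaultdict(set))  # client -> field -> values
    missing = []
    for rec in records:
        for field, value in rec.items():
            if field in (key, "system"):
                continue
            if value is None:
                # Field omitted in this system
                missing.append((rec[key], field, rec["system"]))
            else:
                values[rec[key]][field].add(value)
    conflicts = {
        (client, field): sorted(vals)
        for client, fields in values.items()
        for field, vals in fields.items()
        if len(vals) > 1  # more than one distinct value means a conflict
    }
    return conflicts, missing

conflicts, missing = find_conflicts(records)
for (client, field), vals in conflicts.items():
    print(f"{client}: conflicting {field} values {vals}")
for client, field, system in missing:
    print(f"{client}: {field} missing in {system}")
```

In this sample the name and phone fields disagree across systems and the email is missing in one of them, which is the kind of situation the chart illustrates.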

Companies also run campaigns to make their products better known, involving tens of millions of emails, brochures, coupons, and advertising mail. Suppose about 15% of them never reach their customers or go to the wrong address, while some customers receive several copies of the same advertisement. It’s easy to understand why this happens after a look at the image above.


The purpose of business-level systems was to automate the business and make cooperation between different departments and teams easier. It was mostly about organizing and managing each other’s work online to improve business efficiency, since gathering everyone in a meeting room is not easy once a company gets bigger. If the data isn’t coherent across these systems, however, people may end up gathering anyway to talk about how to deal with the inconsistent data.

For this reason, master data management comprises the processes and tools required to consistently define and manage master data. Unlike transaction or analysis data, master data is attribute data that rarely changes, and it holds information on clients, accounts, products, facilities, metadata, and reference data. A structured master data set provides coherent reference information to multiple applications through product management, data quality management, client data unification, and purchase data merging.
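Client data unification can be sketched as merging several system records into one master record. The example below is a minimal illustration under an assumed rule: for each field, the most recently updated non-empty value wins. The rule, the `build_golden_record` helper, and the sample data are hypothetical; actual MDM products offer configurable matching and merge rules.

```python
from datetime import date

def build_golden_record(records):
    """Merge per-system records into a single master record.

    Assumed rule: records are applied oldest first, so for each field
    the newest non-empty value overwrites older ones.
    """
    golden = {}
    for rec in sorted(records, key=lambda r: r["updated"]):
        for field, value in rec.items():
            if field in ("system", "updated"):
                continue  # bookkeeping fields are not part of the master
            if value not in (None, ""):
                golden[field] = value
    return golden

# Hypothetical source records for one client
sources = [
    {"system": "CRM", "updated": date(2015, 3, 1), "client_id": "C001",
     "name": "Jane Smith", "phone": "555-0100", "email": None},
    {"system": "ERP", "updated": date(2015, 6, 15), "client_id": "C001",
     "name": "Jane Smith", "phone": "555-0199", "email": ""},
    {"system": "Billing", "updated": date(2014, 11, 20), "client_id": "C001",
     "name": "J. Smith", "phone": None, "email": "jane@example.com"},
]

golden = build_golden_record(sources)
print(golden)
```

Here the phone number comes from the newest record, while the email survives from the oldest one because no newer system holds a value for it.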


MDM data flow and structure (Source: www.capgemini.com; searchdatamanagement.techtarget.com)

Master data management also performs data governance by preventing incoherent data from accumulating in multiple databases and by getting rid of unnecessary data. You can see the layered structure of MDM and its data flow in the chart above.

MDM basically plays the roles in the table below, depending on its level of implementation.


IT companies that develop and operate DBMS (IBM InfoSphere, Oracle, SAP, Informatica, Tibco, SAS, etc.) also provide master data management solutions for client and product data. Master data distributed across multiple databases can be integrated through the process explained below.



For a manufacturing company, the coherence of product data is as important as that of client data.

Design, development, production, marketing, and service departments perform various value-adding activities while creating their products. The problem is that the product data they handle during these activities can contain errors, and sometimes it doesn’t sync properly.

This happens when inaccurate data is entered during the process or wrong data is transferred among different applications. Overlapping data is another source of the problem.

PLM, a business application that evolved from PDM, was created relatively recently. Its original purpose was to integrate data from diverse business-level systems (ultimately CAD/PDM versus ERP) into one system so that companies could have a unified, product-centered record. Although the application has shown some positive effects, it ended up subordinate to the existing sales-oriented ERP, which created a conflict.

For example, BOM (Bill of Materials) data is key information within product master data. A BOM describes the structure of a product as a tree, with the finished product at the top and the components needed to assemble it underneath, and the configurations and quantities are recorded and stored in a DB. This information is then provided to the internal systems of related departments, providers, and contractors.

BOMs, however, come in many different types, since each department has its own way of creating and using them. These differing views of the BOM create inevitable obstacles.

In other words, CAD, PDM, and PLM were created to suit the needs of the development and design departments, while ERP serves the staff in production and service. This is why different parties create, manage, and use BOMs even though they are all working on the same product.
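The tree structure of a BOM can be sketched in a few lines. The example below is a minimal illustration with a hypothetical bicycle product; the `BomItem` class and the part names are assumptions for illustration. It multiplies quantities down the tree to get the total number of each lowest-level part needed for one finished product.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class BomItem:
    part_no: str
    quantity: int = 1                              # quantity per parent assembly
    children: list = field(default_factory=list)   # sub-components, if any

def total_requirements(item, multiplier=1, totals=None):
    """Sum the quantities of leaf parts needed for one unit of the top item."""
    if totals is None:
        totals = Counter()
    needed = multiplier * item.quantity
    if not item.children:
        totals[item.part_no] += needed   # leaf part: record its requirement
    for child in item.children:
        total_requirements(child, needed, totals)
    return totals

# Hypothetical product: finished product on top, components underneath
bicycle = BomItem("BICYCLE", 1, [
    BomItem("FRAME", 1),
    BomItem("WHEEL", 2, [
        BomItem("RIM", 1),
        BomItem("SPOKE", 32),
        BomItem("TIRE", 1),
    ]),
])

totals = total_requirements(bicycle)
print(dict(totals))
```

Because each wheel needs 32 spokes and the bicycle needs two wheels, the rollup yields 64 spokes, which is the kind of derived figure that purchasing or production systems rely on.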


Incoherence in BOM data between EBOM and MBOM (Source: http://www.aras.com; http://beyondplm.com)

This is basically how it works: the PLM system generates the configuration BOM (cBOM) and engineering BOM (eBOM) based on CAD, then ERP modifies the manufacturing BOM (mBOM) and service BOM (sBOM). Many enterprises still exchange data changes by emailing spreadsheets to one another.

Clearly, the better way is either to unify the BOMs into a master enterprise BOM through an MDM solution or to sync them for easier management. Now that demand for varied and individualized products is increasing, it seems especially inefficient to keep separate BOM master data for each department. Developing and testing a new product also becomes much harder when the information is difficult to trace.

The issue we face here is data ownership. Most PLM and ERP solution vendors also have MDM solutions. Even if a third party tries to integrate the data with a different MDM solution, it is difficult to decide which data should be the standard.
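Before BOMs can be unified or synced, their differences have to be found. The sketch below compares a flattened eBOM and mBOM, each given as a mapping from part number to quantity. The part numbers and the `diff_boms` helper are hypothetical, and this is only one simple way to surface discrepancies.

```python
def diff_boms(ebom, mbom):
    """Compare two flattened BOMs given as {part_no: quantity} mappings."""
    shared = set(ebom) & set(mbom)
    return {
        # Parts that engineering defined but manufacturing does not list
        "only_in_ebom": sorted(set(ebom) - set(mbom)),
        # Parts that manufacturing added on its own
        "only_in_mbom": sorted(set(mbom) - set(ebom)),
        # Parts present in both with different quantities: (eBOM, mBOM)
        "quantity_mismatch": {
            part: (ebom[part], mbom[part])
            for part in sorted(shared)
            if ebom[part] != mbom[part]
        },
    }

# Hypothetical flattened BOMs for the same product
ebom = {"FRAME": 1, "RIM": 2, "SPOKE": 64, "TIRE": 2}
mbom = {"FRAME": 1, "RIM": 2, "SPOKE": 72, "TIRE": 2, "PACKAGING_BOX": 1}

report = diff_boms(ebom, mbom)
print(report)
```

Not every difference is an error: a manufacturing BOM may legitimately add items such as packaging. A report like this only shows where the two views diverge so that someone can decide which one should be the standard.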


Form of Synchronized BOM (Source: http://plmtwine.com)

Client data can be processed with any available tool, since it is external information anyway. For product data, which involves different standpoints and ownership, technological and semantic integration is not as easy.

An example of this can be found at Airbus, the European aircraft manufacturer. The company draws the schematics of its planes using the CATIA CAD application. The product data used for the schematics is brought in through Enovia from Dassault Systèmes. A PLM application called Windchill from PTC is then used to create the eBOM based on the schematics, and SAP ERP publishes the mBOM for actual production.

This sounds complicated, but a complicated process in itself is not much of a problem here. What’s more problematic is when modifications to the data flow in the reverse direction.

Companies developing product-centered PLM-MES solutions, such as Siemens, Dassault, PTC, and GE Intelligent Platform, chose to share an mBOM that synchronizes completely with the eBOM and other systems through master data management, which transfers ownership of the mBOM to PLM. SAP and Oracle on the ERP side, representing the departments that have been using the data, are completely against this idea and argue for ERP-based MDM. One example of this conflict is the Global Data Synchronization Network (GDSN), a master data management tool for product data from Tibco, created to oppose product-based MDM.

In Korea, apart from the conflicts mentioned, MDM developed by LG CNS has been applied to the LG Display 7 Mega Process along with MES and the supply chain management system. Data Streams, a Korean data management solution company, is also actively providing its MDM to the domestic market.

Today, we looked at how important master data management and the integration of distributed databases and master data are in creating better manufacturing systems. In the next post, we will go over the limits of MDM in the age of big data and some new forms of MDM that bring big data and MDM together.

Written by Seung-yup Lee, Researcher
