Integration That Must Be Attained in the Big Data Era


Last time we took a look at the role of MDM (Master Data Management), which combines distributed databases and master data for better manufacturing systems. Today we will look at the limits of MDM in the big data era and at new MDM methods that combine big data and MDM, and we will examine a few real-world examples.


The Big Data

Last time, we discussed that data ‘quality’ and the ‘relationships’ between data sets are most important for managing the unification and integration of standardized master data in basic MDM.


Management of the increasing amount of data (Source: SAS Data Reference Reorganization)

As shown in the image above, the data that companies must manage is no longer limited to the standardized and semi-structured data saved in existing RDBMS (Relational Database Management Systems). Companies now also receive non-standardized (unstructured) data at irregular intervals from a diverse array of sources (SNS, devices).

In this process, managing data with SQL, ETL[1] and similar tools on top of existing RDBMS and data warehouses requires more complex operations and more diverse processing methods.

New query languages and storage systems are being added to deal with the limits of existing data management systems. These include HDFS (Hadoop Distributed File System), NoSQL and other database formats, graph query languages such as Cypher, in-memory DBs for managing real-time streams, and ETL batch administration on Hadoop.

The 3 basic characteristics (3Vs) of big data (Source: www.tech360ng.com)

Therefore, the quality of the data accumulated in existing data warehouses is not the only thing that matters. Developers are also working to link existing data with the various newly introduced data sources, and solving this issue is expected to have a ripple effect.

For example, suppose there is a certain product. Unstructured data that helps us understand demand for the product, such as SNS posts and consumer evaluations, is linked with the stored transaction data (sales amounts, POS data, VMI data, etc.), and new product demand data is generated as a result.
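The linking step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample rows, the field names (`product_id`, `week`, `units_sold`, `sentiment`) and the function `link_demand_signals` are all hypothetical, and a real system would obtain sentiment scores from a separate text-analysis step.

```python
from collections import defaultdict

# Hypothetical sample data: weekly POS sales and SNS mentions for one product.
pos_sales = [
    {"product_id": "P-100", "week": 1, "units_sold": 120},
    {"product_id": "P-100", "week": 2, "units_sold": 135},
]
sns_mentions = [
    {"product_id": "P-100", "week": 1, "sentiment": 1},
    {"product_id": "P-100", "week": 1, "sentiment": -1},
    {"product_id": "P-100", "week": 2, "sentiment": 1},
    {"product_id": "P-100", "week": 2, "sentiment": 1},
    {"product_id": "P-100", "week": 2, "sentiment": 1},
]


def link_demand_signals(sales, mentions):
    """Join transaction rows with aggregated SNS mentions on the master-data key."""
    # Aggregate the unstructured signal per (product, week).
    agg = defaultdict(lambda: {"mentions": 0, "net_sentiment": 0})
    for m in mentions:
        key = (m["product_id"], m["week"])
        agg[key]["mentions"] += 1
        agg[key]["net_sentiment"] += m["sentiment"]

    # Attach the aggregated signal to each transaction row.
    linked = []
    for row in sales:
        key = (row["product_id"], row["week"])
        signal = agg.get(key, {"mentions": 0, "net_sentiment": 0})
        linked.append({**row, **signal})
    return linked


demand = link_demand_signals(pos_sales, sns_mentions)
for row in demand:
    print(row)
```

The point of the sketch is that the product identifier, which is master data, is the key that lets the two very different data sets be joined at all.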

In the same way, various technologies and solutions are being introduced to make it possible to integrate big data for finding ways to use it in MDM as well.


Compatibility between MDM and big data (Source: www.capgemini.com)

As shown in the graphic above, MDM can serve as a standard for managing types of big data, based on the structure and relationships of defined data entities. Organizing the complex resources of big data in this way also makes it possible to provide different types of big data. Conversely, big data analysis can reveal data and new situations that were previously not understood in MDM.

According to a recent report, approximately 20% of companies in the US are using databases, whether existing RDBMS or the NoSQL systems referenced above, that are optimized for the 3 basic characteristics of big data.[2] This percentage is expected to keep increasing gradually. The possibility of interaction with traditional databases through APIs and similar interfaces is also increasing.

Data that was once distributed across databases according to application is now stored and used across diverse data storage systems such as RDBMS, data marts, data warehouses, HDFS and NoSQL, depending on its format and frequency of use. These changes make the data management supply chain run much more smoothly.

New Tools

Groups such as Forrester Research and Gartner have been paying a lot of attention to a new form of MDM methodology. This method implements something known as ‘data graphing’.

Data graphing is a method that manages both master data and big data more intuitively, analytically and intelligently than the existing MDM methods described above. Existing MDM is based on RDBMS, which begin by creating a relational data structure and therefore also focus on data ‘relationships’. However, the 2-dimensional nature of these relationships imposes some limits.

The standardized 2 dimensional ‘relation matrix’ of data is changing and becoming a more multidimensional relationship structure.


In addition, the IT strategies that businesses are pursuing now require more complex platforms. This applies not only to the data itself: multidimensional relationships are also being formed among many applications, databases, partner supplier platforms and consumer platforms.

Of course, as existing RDBMS still make up a large part of business databases, they offer the advantage of a simplified method of establishing relationships. IoT (Internet of Things), which has become a hot topic as of late, ultimately connects physical objects. But what will the structure of the network be when these objects are connected? It will have a ‘net structure’ like the social network shown in the image to the left. Information will be sent and received within that structure.

Also, the MDM that developed out of managing this type of RDBMS is built on the administration and hierarchical relationships of master data held in centralized databases. In principle, each overlapping, distributed relationship formed on the network could be handled in this way, but it seems that this has not yet been successful.

So, in order to deal with this issue, the NoSQL-based ‘graph DB’ used in social networks is taken as a benchmark. A graph DB applies graph theory to databases: it stores data in a graph structure, matches patterns in network data, and supports intuitive, intelligent semantic queries.


NoSQL types (left) and a basic graph DB format (right) (Source: kvaes.wordpress.com)

As shown on the right of the graphic above, a graph is made up of nodes, the lines linking the nodes, and the properties attached to each of them. In this structure, the RDBMS hierarchical index is replaced by the relationship between 2 adjacent nodes.

Each node can represent any data entity the user wishes to trace, such as a person, business, item, organization, account or application, regardless of its size or type. The properties carry the appropriate information related to the nodes or lines. As shown in the graphic above, this includes a range of properties such as name, type and role.

Also, the lines connected to the nodes can be queried to examine the most important relationships and patterns in detail. For a more detailed example of this process, let’s look at the charts below.
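A minimal Python sketch of this node, line and property model is shown here. The class names (`Node`, `Edge`, `PropertyGraph`) and the sample entities are assumptions made for illustration only; they do not correspond to the API of any particular graph DB product.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A data entity such as a person, business or item."""
    node_id: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A directed, labeled line between two nodes, with its own properties."""
    source: str
    target: str
    label: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    def __init__(self):
        self.nodes = {}
        self.edges = []

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, properties)

    def add_edge(self, source, target, label, **properties):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before linking them")
        self.edges.append(Edge(source, target, label, properties))

    def neighbors(self, node_id, label=None):
        """Follow outgoing lines from a node, optionally filtered by label."""
        return [
            e.target
            for e in self.edges
            if e.source == node_id and (label is None or e.label == label)
        ]


# Hypothetical entities carrying name, type and role properties.
graph = PropertyGraph()
graph.add_node("alice", name="Alice", type="person", role="buyer")
graph.add_node("acme", name="Acme Corp", type="business")
graph.add_edge("alice", "acme", "WORKS_FOR", since=2012)

print(graph.neighbors("alice", label="WORKS_FOR"))
```

Note that the relationship itself is a first-class record with its own properties, rather than a foreign key resolved through an index.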


Simple example of relationships that define the categories and types of master data used in businesses


Complex formation of relationships between physical businesses and IT in a hierarchical structure

Looking at the graphic above, we can visualize the structure of this system. It describes the existing centralized structure. The graphic shows that the relationships between the connected organizations and the master data are difficult to represent using a hierarchical structure. It seems that businesses and IT are distinctly separate.

We can see that there are limitations to using MDM based on a hierarchical structure for sharing information among a diverse set of companies. Various external applications and organizations also introduce non-standardized big data, so additional relationships and other unestablished factors can change over time.

While there are more complex factors involved, the RDBMS are essentially connected to the graph DB through queries and REST APIs, and existing MDM records are linked through the edges created in the data matching process. Each piece of master data is stored in the graph DB, where it is semantically categorized from various angles and in various units, and the RDBMS and the graph interact through established rules.

Graph query languages such as ‘Gremlin’ and ‘Cypher’, which let a query move from one node to the nodes connected to it, play a role in graph queries. Multiple databases can be managed by bundling data together in graph groups or tier groups.
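As a rough illustration of how a matching process can turn relational rows into graph edges, consider the following Python sketch. Everything in it is assumed for the example: the two source systems, the field names, the `SAME_AS` label and the rule of matching on an identical `tax_id`. Real MDM matching rules are usually fuzzier than an exact key comparison.

```python
# Hypothetical customer rows exported from two relational systems.
crm_rows = [
    {"id": "crm-1", "name": "ACME Corp.", "tax_id": "123-45"},
]
erp_rows = [
    {"id": "erp-9", "name": "Acme Corporation", "tax_id": "123-45"},
    {"id": "erp-10", "name": "Globex", "tax_id": "999-00"},
]


def build_match_edges(left, right, key):
    """Return (left_id, 'SAME_AS', right_id) edges for rows that share `key`."""
    # Index the right-hand rows by the matching key.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row["id"])

    # Emit one edge per matched pair of records.
    edges = []
    for row in left:
        for other_id in index.get(row[key], []):
            edges.append((row["id"], "SAME_AS", other_id))
    return edges


edges = build_match_edges(crm_rows, erp_rows, "tax_id")
print(edges)
```

The source rows stay in their original databases; only the match result is stored as an edge, which is what lets the graph and the RDBMS coexist.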
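The kind of pattern matching that graph query languages provide can be imitated on a small scale in plain Python. The sketch below is not Cypher or Gremlin; it is a hypothetical `match` function over hand-written (source, label, target) triples, intended only to show the idea of querying by relationship rather than by table.

```python
# Hypothetical master-data edges stored as (source, label, target) triples.
triples = [
    ("acme", "SUPPLIES", "dram-chip"),
    ("globex", "SUPPLIES", "dram-chip"),
    ("dram-chip", "PART_OF", "laptop"),
    ("acme", "LOCATED_IN", "china"),
]


def match(triples, source=None, label=None, target=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, l, t)
        for (s, l, t) in triples
        if (source is None or s == source)
        and (label is None or l == label)
        and (target is None or t == target)
    ]


# "Who supplies the DRAM chip?" expressed as a relationship pattern.
suppliers = [s for (s, _, _) in match(triples, label="SUPPLIES", target="dram-chip")]
print(suppliers)
```

A dedicated graph DB adds indexing, multi-hop patterns and a declarative syntax on top of this basic idea.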


Graph DB using MDM (Source: Towards NoSQL Graph Based Master Data Management Systems: Building a Generic and Collaborative Solution, 2014)

What we have covered up to this point can be summed up by the image above. Currently, companies such as Pitney Bowes and Reltio have structured their MDM solutions using this method, and larger database companies such as IBM and Oracle are gradually moving in this direction as well.



Social graph for business networks (Source: www.gxsblogs.com)

In September of last year, an incident at a factory in China caused DRAM memory chip prices to skyrocket. Looking at the image above we can see that it is a large-scale supply chain graph of electronics manufacturers, contract manufacturers, subcontractors, OEMs and distributors that require DRAM components. It is not much different from a social network.

In each connected relationship in this type of graph, it is very important for each company to obtain this kind of information as soon as possible, because the unit price of products depends on it. If one company grasps this information before prices skyrocket, it can find another supplier and increase its stock by buying product at the existing unit price ahead of its competitors.

Also, a business included in the network would in principle be able to acquire information directly from other business nodes. But this information is actually used only internally within the other companies, so it is impossible for outsiders to understand the data.

So, this is a good example of a case where a business needs to acquire real-time information about changing suppliers and external environments, connect it with non-standardized information, and decide what actions to take.
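To make the idea concrete, the following Python sketch walks a small, entirely hypothetical supply network outward from the point of an incident and reports how many hops away each affected company is. The company names and the `downstream_impact` function are assumptions for illustration, and the traversal is an ordinary breadth-first search rather than a feature of any specific product.

```python
from collections import deque

# Hypothetical supply network: each key supplies the companies in its list.
supplies = {
    "dram-fab": ["module-maker"],
    "module-maker": ["contract-mfr", "distributor"],
    "contract-mfr": ["oem"],
    "distributor": ["oem"],
    "oem": [],
}


def downstream_impact(network, start):
    """Breadth-first search returning each affected company and its hop distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for customer in network.get(current, []):
            if customer not in distance:
                distance[customer] = distance[current] + 1
                queue.append(customer)
    del distance[start]  # the origin of the incident is not "downstream" of itself
    return distance


impact = downstream_impact(supplies, "dram-fab")
print(impact)
```

A company that can run this kind of query on live relationship data learns that it is exposed before the price change reaches it through ordinary channels.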

Functions have been introduced into platforms such as SAP HANA to make this sort of system possible. In 2014, Gartner paid a lot of attention to a start-up called Elementum, which offers product supply network management IT services.


Smart device platform for management of supply network and data management offered by Elementum (Source: www.forbes.com;www.supplychain247.com)

While the integration of MDM and big data has so far been well short of perfect, Elementum is a company that now offers an application that makes the process easy to manage. The platform makes it possible to manage the supply chain in a social network format using NoSQL databases: a document database such as MongoDB and a graph DB such as Neo4j.

Elementum is receiving support from the world’s second largest manufacturer, Flextronics. With this support, it is able to connect to the platforms of other companies with which it has signed contracts. This system is the basis for expansion as more companies join the network.

Through this platform, users can connect to various types of data sources, distinguish the sources using URIs (Uniform Resource Identifiers), and collect data through diverse query types on various distributed databases. Companies can also acquire streaming data in real time from factories or systems.

This diverse set of data can then be stored in MongoDB and other NoSQL stores, and the data relationships can be mapped to previously defined nodes in the graph DB.
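One simple way to distinguish data sources by URI is to inspect the scheme and host. The Python sketch below uses the standard library's `urllib.parse`; the example URIs, the scheme-to-kind mapping and the function name `describe_source` are assumptions for illustration and say nothing about how Elementum's platform is actually built.

```python
from urllib.parse import urlparse

# Assumed mapping from URI scheme to the kind of source it denotes.
SOURCE_KINDS = {
    "postgresql": "relational database",
    "mongodb": "document store",
    "hdfs": "distributed file system",
    "https": "REST API",
}


def describe_source(uri):
    """Split a data-source URI into the parts needed to route a query."""
    parsed = urlparse(uri)
    return {
        "scheme": parsed.scheme,
        "host": parsed.hostname,
        "path": parsed.path,
        "kind": SOURCE_KINDS.get(parsed.scheme, "unknown"),
    }


# Hypothetical sources a supply network platform might register.
for uri in [
    "mongodb://db.example.com:27017/supply",
    "hdfs://namenode.example.com/data/sensors",
    "https://api.example.com/v1/orders",
]:
    print(describe_source(uri))
```

Each registered source can then be handed to a connector chosen by its `kind`, while the URI itself remains the stable identifier for that source.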

But Elementum is not alone. There are an endless number of fields where similar technology could be implemented. There are already more ‘overlapping graphs’ in the world today than can be counted.

However, not all entities on the network are sharing perfect information. There are small hints in the changes to entities and environments that we must use in the decision-making process to interpret certain trends. It seems that even as time goes by, people will not share information any more easily than they do now.

There are also still few examples of proper modeling for interaction between upper tier and lower tier companies in the supply chain. There are still cases of large-scale companies that have not been able to expand master data for their component supplier list. Even though there are a few companies that possess this type of data, they do not have a lot of data on their suppliers.

The reason for this is that most companies do not reveal information about their customers and suppliers easily. So, there is another restriction to the social network structure.

We have now taken a look at manufacturing information for integrating production and services as well as strategies for integrating systems. As endless information continues to flow in this big data era, it seems it is no longer possible for us to hold onto information for ourselves. Instead, we must mutually share information. It will take us time to figure out how to integrate and implement this information.

Will it be possible to properly integrate all of these elements into this type of system? It will require a combination of open minds, sharing and wisdom.

Written by Seung-yup Lee, Researcher

[1] ETL (Extraction, Transformation and Loading) is a term used when constructing a data warehouse. It refers to the processes of extracting data from source data management systems, transforming it, and loading it into the warehouse.

[2] Source: ‘The Steadily Growing Database Market Is Increasing Enterprises’ Choices,’ Forrester Research, 2013
