Financial information systems consolidate data from many sources and provide analysis tailored to the needs of a diverse range of users.
The explosive growth in data volumes that came with the advent of big data created a need to analyze not only standard RDBMS (Relational Database Management System) data but also atypical data such as log data and audio data.
Open source technologies such as Hadoop have matured, easing earlier concerns about open source, and are now being deployed across entire companies to overcome the limitations of existing information systems (high-priced hardware, software license costs, limited scalability, etc.).
According to the global research group Gartner, big data can be applied to grow marketing and business, improve management and finance, innovate new products and services, sell information, and reduce risk and fraud.
There are many examples of the success of big data in various industries, and we can see them in global finance through the press, seminars and big data vendor promotional material. There are a few examples of big data at work in Korea as well, but on close examination the scope of use is limited, and the data is often incomplete or below the standard required to satisfy company needs.
HIA (Hybrid Information Architecture) is a system architecture that keeps new big data platforms and existing information systems synchronized and balanced when building big data systems.
In 2014, Gartner forecast that logical DW (Data Warehouse) architectures would expand through HIA and that the market would continue to grow.
Architecture systems such as BDA (Big Data Appliance) from Oracle, the Analytics Platform System from Microsoft and the Unified Data Architecture from Teradata all support HIA. HIA-based systems are under construction or development in certain sectors of the finance industry.
Let’s take a look at some big data trends in the world of Korean finance.
In CRM (Customer Relationship Management), VOC (Voice of Customer) systems and fine-grained analysis of customer transaction data are being built with machine learning. Customers are segmented according to the size and scale of banks so that appropriate financial services can be suggested, and all log data is analyzed to build integrated security systems.
In the insurance industry, companies are applying machine learning and statistical methods to improve existing FDS (Fraud Detection Systems). In the infrastructure industry, companies are introducing the Hadoop platform to accumulate, for future use, atypical data that it was not previously possible to collect.
Currently, companies that rely heavily on existing information systems are transitioning from DW and BI to HIA, and they are at the verification stage, running pilots in a Hadoop environment to test functionality, performance and security.
HIA combines DW and Hadoop platforms and includes data sourcing, data collection, data storage and data analysis.
The data flow indicated by the solid lines is high-volume data that is transferred typically at night and the data flow indicated by the dotted lines is query or reference data.
A, B, C, D and E in the diagram denote the following:
- A: Data collection with ETL tools, Sqoop and Flume
- B: The interface between the DW platform and the Hadoop platform
- C: Archiving DW platform data into the Hadoop platform
- D: Data virtualization: a method of presenting data from multiple stores in one virtual place, as if it were in a single database
- E: Data analysis through OLAP, data visualization and reporting tools
If we look closely at each element, structured data from the existing operational systems (operational, term, account, workload, etc.) is collected onto the DW platform with ETL (Extract, Transform, Load) tools. Collecting data onto the Hadoop platform is done with Hadoop ecosystem tools (Sqoop, Flume, etc.). Some ETL tools also offer methods for collecting unstructured data.
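The collection step (A above) can be sketched as a tiny extract-transform-load pipeline in Python. This is an illustrative sketch only: the field names and the in-memory "staging table" stand in for real operational sources and DW targets.

```python
# Minimal ETL sketch: extract rows from an operational source, normalize
# them, and load them into a DW staging area. All names are hypothetical.

def extract(source_rows):
    """Read raw records from an operational source (an in-memory list here)."""
    return list(source_rows)

def transform(rows):
    """Normalize field names and types before loading into the DW platform."""
    return [{"account_id": r["acct"], "balance": float(r["bal"])} for r in rows]

def load(rows, target):
    """Append the cleaned rows to the DW staging table (a list stands in)."""
    target.extend(rows)
    return len(rows)

operational_data = [{"acct": "A-100", "bal": "2500.00"},
                    {"acct": "A-101", "bal": "910.50"}]
dw_staging = []
loaded = load(transform(extract(operational_data)), dw_staging)
```

In a real HIA deployment this role is played by ETL tools on the DW side and by Sqoop or Flume on the Hadoop side, but the extract/transform/load shape is the same.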
The unstructured data needed on the DW platform is either summarized according to analytical criteria before use, or the results of distributed parallel processing in the Hadoop environment are moved to the DW platform for analysis. Conversely, DW platform data may also be required on the Hadoop platform.
Master data such as organizations, customers and products is managed on the DW platform, and it can be combined with detailed, record-level data from the operational systems to manage data on the Hadoop platform.
Also, since data volumes are large, low-use data is archived on a backup or ILM (Information Lifecycle Management) system. In this case, the Hadoop platform can serve as the archiving system for the DW platform.
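The archiving idea (C above) can be illustrated with a sketch that moves low-use rows from the DW store to a Hadoop-side archive. The 90-day retention window and the in-memory stores are assumptions made purely for illustration.

```python
# Illustrative archiving sketch: rows not accessed within a retention window
# move from the DW store to a cheap Hadoop-side archive. Stores and the
# 90-day cutoff are invented for this example.
from datetime import date, timedelta

def archive_low_use(dw_store, hadoop_archive, today, retention_days=90):
    cutoff = today - timedelta(days=retention_days)
    keep, move = [], []
    for row in dw_store:
        (keep if row["last_access"] >= cutoff else move).append(row)
    hadoop_archive.extend(move)   # low-cost, scale-out storage
    dw_store[:] = keep            # DW keeps only recent, hot data
    return len(move)

dw = [{"id": 1, "last_access": date(2015, 1, 5)},
      {"id": 2, "last_access": date(2015, 6, 1)}]
archive = []
moved = archive_low_use(dw, archive, today=date(2015, 6, 30))
```

The same decision rule, applied at much larger scale, is what lets the Hadoop platform take over part of the ILM workload described later in the article.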
Data virtualization makes it possible to query the required data from the DW platform or the Hadoop platform for analysis without physically moving or copying it. However, when large volumes of data must be managed or unstructured data must be processed, additional processing may be required.
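Data virtualization (D above) can be pictured as a thin catalog that routes each query to the platform that already holds the data. The catalog, store and dataset names below are hypothetical.

```python
# Minimal data-virtualization sketch: a virtual catalog maps each dataset to
# the platform that holds it, so queries are served in place with no copying.
STORES = {
    "dw":     {"monthly_sales": [("2015-01", 120), ("2015-02", 135)]},
    "hadoop": {"web_logs":      [("2015-01", 98000), ("2015-02", 104000)]},
}

CATALOG = {"monthly_sales": "dw", "web_logs": "hadoop"}  # the virtual layer

def query(dataset):
    """Resolve the dataset through the catalog and fetch it where it lives."""
    platform = CATALOG[dataset]
    return STORES[platform][dataset]

sales = query("monthly_sales")   # served from the DW platform
logs = query("web_logs")         # served from the Hadoop platform
```

The analyst sees one namespace; whether a table physically lives in the DW or in Hadoop is a routing detail hidden by the virtual layer.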
An analytics platform uses tools such as OLAP (On-Line Analytical Processing), data visualization, EIS (Executive Information System), MIS (Management Information System) and reporting to support data analysis.
Let’s look at how HIA can be implemented in finance through some examples from the manufacturing and telecommunications industries. There are many examples, but let’s discuss two cited by Gartner.
The first example is archiving history data, previously managed on tape, into a Hadoop system. Because it is expensive to store high-volume data on an RDBMS, data is backed up to tape after three months and restored from tape when needed later.
However, it can take up to 10 days to restore and analyze data from tape, which makes timely processing difficult. Once this data has been loaded into a Hadoop system, it can be queried within one day with no loss of quality, making the workflow and analysis far more efficient.
Let’s take a look at an example of batch billing over high-volume data in a Hadoop environment in telecommunications. A CDR (Call Detail Record) is a record generated by voice telephone calls and text messages. CDRs are used to calculate fees based on call times and the number of text messages.
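The billing logic can be sketched roughly as follows. The rates are invented for illustration and do not come from any real tariff.

```python
# Hedged sketch of CDR-based billing: a subscriber's monthly fee is derived
# from total call minutes plus a per-message charge. Rates are hypothetical.
CALL_RATE_PER_MIN = 0.05   # invented rate per call minute
TEXT_RATE = 0.01           # invented rate per text message

def monthly_fee(cdr_records):
    """Aggregate one subscriber's CDRs into a single monthly charge."""
    call_minutes = sum(r["seconds"] for r in cdr_records
                       if r["type"] == "call") / 60
    texts = sum(1 for r in cdr_records if r["type"] == "text")
    return round(call_minutes * CALL_RATE_PER_MIN + texts * TEXT_RATE, 2)

records = [{"type": "call", "seconds": 600},
           {"type": "call", "seconds": 300},
           {"type": "text", "seconds": 0},
           {"type": "text", "seconds": 0}]
fee = monthly_fee(records)
```

The calculation itself is simple; the challenge the article describes is running it over billions of CDRs, which is where Hadoop's distributed processing pays off.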
On existing DW systems, billing is calculated with batch programs. This can take a long time and consumes memory and CPU resources, which affects other operations.
In the batch process, only the required data is processed by Hadoop, which significantly reduces batch times. As the dependent CDR batch operations are no longer run on the existing DW system, DW resources are freed up, reducing batch times and improving server efficiency.
Let’s now look at how HIA can be implemented in the finance industry based on the examples above. Storing data on existing information systems is a burden, and storing it on a conventional RDBMS is expensive. Implementing a Hadoop system greatly reduces costs while still providing a query environment.
All financial systems have an ILM (Information Lifecycle Management) system. Even when ILM software runs on low-cost disk infrastructure, it is still more expensive than a Hadoop system, so some of the ILM workload is being moved to Hadoop instead.
If we look at an overnight batch program, complex, high-volume data processing sits on the critical path and affects the entire batch window.
Batch times can be reduced by transferring the batch process to a Hadoop environment and keeping only the resulting data on the DW system (e.g. customer credit evaluation on a loan system, customer grading in CRM, fraud detection logic in a fraud detection system, etc.).
Also, customer log data (web logs, mobile logs, click streams, etc.), which was difficult to use in the past, can now be added so that all customer data is available to boost marketing and other business operations. Customer purchasing-pattern data can then be analyzed through machine learning, and services can be recommended more accurately.
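As a toy stand-in for the machine-learning models mentioned above, even a simple co-occurrence count over purchase histories can produce a recommendation. The product names and data below are invented for illustration.

```python
# Toy purchase-pattern recommender: suggest the product most often bought
# alongside what the customer already owns. A real system would use far
# richer features (web logs, click streams); everything here is invented.
from collections import Counter

purchases = [
    {"savings", "credit_card"},
    {"savings", "credit_card", "fund"},
    {"savings", "fund"},
    {"savings", "credit_card"},
]

def recommend(owned, history):
    """Score unowned products by how often they co-occur with owned ones."""
    scores = Counter()
    for basket in history:
        if owned & basket:                 # basket shares a product we own
            for product in basket - owned: # count the products we don't
                scores[product] += 1
    return scores.most_common(1)[0][0] if scores else None

suggestion = recommend({"savings"}, purchases)
```

Real recommenders replace the co-occurrence count with trained models, but the pipeline shape (collect behavior data, score candidates, suggest the top one) is the same.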
Some credit card companies have used a pilot system to test the possibility of providing more refined service recommendations. Similar fraud patterns can also be analyzed to improve the efficiency of insurance FDS (Fraud Detection Systems). HIA can be applied in other fields as well.
HIA is expected to influence business diversification, operational cost reduction, accuracy and processing speed.
If we look at the potential effects of HIA: Hadoop, the big data infrastructure, runs on x86 Linux servers, which reduces the burden of expanding a Unix-based system, and since a great deal of OSS (Open Source Software) is available, software license costs can also be reduced.
A Hadoop system uses a scale-out approach. To improve a conventional server's capabilities, memory can be expanded, CPUs can be added or a more powerful CPU can be installed; improving performance in this way is called scale-up. Scale-up is limited by factors such as the maximum memory a server supports or a CPU model being discontinued. Scale-out instead improves a system's capabilities by deploying x86 servers in parallel: when more processing power is required, more x86 servers are simply added.
Hadoop systems are built for parallel processing. When handling large volumes of data, batch times can be reduced through Hadoop's distributed parallel processing. For example, a program that calculates credit scores for every customer on a loan management system, or customer grades in CRM, must process all customer records and triggers a considerable number of dependent jobs. A Hadoop system can complete these processes in much less time.
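The scale-out idea can be illustrated by partitioning the scoring work so that each partition could run on its own x86 node, with only the merged scores returning to the DW side. The scoring rule and data below are invented for illustration.

```python
# Toy scale-out sketch: split customer scoring into partitions that
# independent x86 nodes could process in parallel, then merge the results.
# The scoring formula and customer data are invented.

def score_partition(customers):
    """The work one node would do: score every record in its partition."""
    return {c["id"]: min(1000, c["income"] // 100 + c["years"] * 10)
            for c in customers}

def scatter(customers, nodes):
    """Split the full customer list into one partition per node."""
    return [customers[i::nodes] for i in range(nodes)]

customers = [{"id": i, "income": 40000 + i * 1000, "years": i % 7}
             for i in range(8)]

partitions = scatter(customers, nodes=4)  # adding nodes adds capacity
merged = {}
for part in partitions:                   # each call could run on its own server
    merged.update(score_partition(part))
```

Here the partitions run sequentially for simplicity; in a real Hadoop cluster each one would run on a separate node, which is exactly why adding x86 servers shortens the batch window.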
The purpose of big data analysis is to analyze unstructured data that previously could not be analyzed. With an inexpensive Hadoop infrastructure, unstructured data can be accumulated and analyzed alongside existing DW system data, and data overall can be used more efficiently.
A USD 1M pilot project is currently underway to verify the role and functionality of big data in finance. As big data is applied in real-life scenarios, more and more applications are being discovered.
As Hadoop-related technology sweeps away concerns about open source software, some companies are developing in-memory technologies such as Spark and improving security, resource management and availability. Developers are also contributing to the expanding Hadoop ecosystem with SQL-on-Hadoop engines that allow SQL to be used on a Hadoop system.
Examples of HIA based systems are expected to increase over the next few years. The era of big data is not far away!
Written by Hyundo Kim, LG CNS