Are you familiar with SRA (Smart R Analytics), the data analysis solution from LG CNS? As the name suggests, it is an R-based analytics solution. Today we will take a look at LG CNS SRA.
Where did SRA begin? Before we discuss SRA, let’s find out what the ‘R’ stands for. Anyone with experience in data analysis will have heard of the R language. For those who are interested in data analysis but don’t know where to begin, this post will be helpful.
R was first released as a free software package in 1996 by professors Robert Gentleman and Ross Ihaka of the Department of Statistics at the University of Auckland in New Zealand. R became particularly popular with statisticians because it offered many built-in functions needed for data analysis, such as data organization, analytical calculations and graphical representation of data sets. R is no longer popular only among statisticians, however; people in many fields who perform data analysis now use it. Various survey results show that the R language ranks high in popularity.
What is the main reason analysts (statisticians, engineers, scientists) like R so much? That reason is CRAN (https://cran.r-project.org). CRAN (the Comprehensive R Archive Network) is a network of FTP and web servers that stores and keeps up to date the code and documentation for R packages. Through CRAN, packages for many kinds of analysis can be obtained and used. As of March 2016, over 8,000 packages are registered in CRAN.
From an analyst’s perspective, being able to freely access, modify and use so many packages for various purposes is, of course, appealing. The packages range from general statistical algorithms to specialized algorithms for very specific jobs.
R has remained popular over the last 20 years, but users have a wide range of opinions about it. The most common complaint is that R is quite difficult to learn.
From a user’s perspective, R is inevitably difficult for those accustomed to GUI-based analysis tools such as SAS and SPSS, because the powerful R packages can only be used by writing code. Beginners inevitably struggle in a programming environment. For example, someone learning R for the first time finds it hard to find, install and run packages without a GUI.
Let’s take a look at R in an industrial environment. Suppose that an analyst at a company uses R to understand a problem, find the relevant data, and then analyze that data to resolve the problem by building an analysis model. Most analysis processes must be repeated regularly, and the data must be cut, pasted and altered each time. This work can be simplified and shared among multiple analysts to improve efficiency. In addition, if a certain type of analysis needs to be done regularly, it should be scheduled to run when needed. Of course, all of these jobs can be handled by programming them in the R language, but wouldn’t the costs be significant?
How can we manage the jobs specific to each field? For example, call centers at insurance and credit card companies listen to customers’ opinions and analyze them to develop products. These companies can analyze the content of customer interactions in text form to check whether financial products are being mis-sold, and can use the data to produce statistics on what types of products their customers want. How difficult would it be to carry out this entire analysis process in the R language alone?
LG CNS SRA is an analysis solution that maximizes the strengths of R and minimizes its weaknesses. Because R offers a wide range of statistics and analysis functions and is open source, the most recent analysis algorithms can be used. Analyzing high volumes of data with R alone, however, can be difficult. By improving on issues like this, SRA makes high-volume data analysis possible and offers greater flexibility for connecting with external systems and for expansion. Analysis of uncommon data is also possible.
SRA provides workflow-based management through predefined analysis components for querying and managing (modifying, correcting, deleting) data from various data sources (databases, text files, HDFS). This approach makes the analysis process more intuitive to understand and manage, and easy to use even for beginners. In addition, R scripts that have already been written can be added to an SRA workflow through an SRA R-User-Defined component.
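To make the idea of a component-based workflow more concrete, here is a minimal sketch in Python. It is purely illustrative and is not SRA's actual interface: the names `Component`, `run_workflow`, `drop_missing`, `user_defined` and `add_total` are all hypothetical, and the sample data is made up. It only shows the general pattern of chaining predefined steps and wrapping an analyst's existing code as one more step.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# A "table" here is simply a list of rows, each row a dict of column values.
Row = Dict[str, Any]
Table = List[Row]


@dataclass
class Component:
    """One named step in a workflow: takes a table, returns a table."""
    name: str
    run: Callable[[Table], Table]


def run_workflow(components: List[Component], data: Table) -> Table:
    """Apply each component in order, feeding its output to the next one."""
    for component in components:
        data = component.run(data)
    return data


def drop_missing(column: str) -> Component:
    """Hypothetical predefined component: remove rows where `column` is None."""
    return Component(
        name=f"drop_missing({column})",
        run=lambda rows: [r for r in rows if r.get(column) is not None],
    )


def user_defined(name: str, func: Callable[[Table], Table]) -> Component:
    """Hypothetical wrapper that turns existing analyst code into a component."""
    return Component(name=name, run=func)


def add_total(rows: Table) -> Table:
    """Stands in for a script the analyst had already written."""
    return [{**r, "total": r["price"] * r["quantity"]} for r in rows]


# Made-up sample data standing in for a query result from some data source.
source = [
    {"price": 10, "quantity": 2},
    {"price": None, "quantity": 5},
    {"price": 3, "quantity": 4},
]

workflow = [drop_missing("price"), user_defined("add_total", add_total)]
result = run_workflow(workflow, source)
print(result)
```

Because every step has the same shape (a table in, a table out), steps can be reordered, reused across analyses and shared between analysts without rewriting the surrounding code.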
Other functions necessary in an enterprise environment, such as authorization, scheduling, system resource monitoring and connection to external APIs, are also provided. Because of these advantages, SRA is being deployed in various fields and is receiving attention from customers.
SRA will not stop here. SRA is expected to evolve into an analysis platform that can combine multiple analysis engines and support various R-based analysis tools as well as other languages such as Python and Julia. There are still areas where SRA falls short, and resolving them will take time, but its development continues to make strides.
Written by In-chul Jung, LG CNS