Big Data is a trending technology in this age. Organizations are slowly
realizing the true potential of analysing stored data that has gone unused, and that
by using this analysis they can get ahead of the competition. Among the
types of Big Data Analytics, Real-Time Analytics is certainly the fastest,
as data is analysed within a fraction of a second. Organizations
such as IBM, in collaboration with Kinetica, are researching and developing new
systems that can analyse huge amounts of data in less time.
Big Data is a collection of data sets so large and complex that it is difficult to
analyse with traditional data processing and analysis tools and methods (John 2014).
Organizations are applying big data to obtain accurate market information
about their competition, for share market analysis, in games and maps
that handle data in real time, and much more (Kelly et al. 2014). The purpose of this report is to understand the use
of real-time data analytics in Big Data processing. This report elaborates on
stream data analytics that is used for the analysis of real-time data and the
current research and development in this type of analytics.
The volume of data being collected and analysed is rising exponentially. Organizations
have been collecting vast amounts of data every second, but only recently have
they understood the true value of the data they have collected and
how it can be used to boost their business approach. All of the stored data needs
to be mined effectively so that it can help predict the future (Larose 2014).
Real-time data analytics thus transforms a company’s reactive problem handling
approach to automated real-time learning environments.
Utilization of Real-Time Data Analytics
Real-time data analytics processes data as soon as it enters the system
(Hatley and Pirbhai 2013). The data is processed and feedback is returned to the
user within milliseconds. Real-time interactive analytics supports
queries that are interactive in nature: the information is indexed for quick
access, so responses to such queries are fast. Tools like Apache Drill,
Druid, VoltDB and SAP HANA keep the indexed data in memory to make the
process very fast.
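To make the role of indexing concrete, the following is a minimal Python sketch of an in-memory hash index: rows are indexed as they arrive, so a lookup by key avoids scanning the whole table. It only illustrates the general idea and is not how Apache Drill, Druid, VoltDB or SAP HANA are implemented; the `InMemoryIndex` class and the sample records are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List


class InMemoryIndex:
    """Hypothetical in-memory table with a hash index on a single column."""

    def __init__(self, key: str) -> None:
        self.key = key
        self.rows: List[Dict[str, Any]] = []
        # Maps each key value to the positions of the rows that contain it.
        self.index: Dict[Any, List[int]] = defaultdict(list)

    def insert(self, row: Dict[str, Any]) -> None:
        # Index the row on arrival so it can be queried immediately.
        self.index[row[self.key]].append(len(self.rows))
        self.rows.append(row)

    def lookup(self, value: Any) -> List[Dict[str, Any]]:
        # A dictionary lookup (O(1) on average) replaces a scan over every row.
        return [self.rows[i] for i in self.index.get(value, [])]


table = InMemoryIndex(key="customer_id")
table.insert({"customer_id": 42, "amount": 19.99})
table.insert({"customer_id": 7, "amount": 5.00})
table.insert({"customer_id": 42, "amount": 3.50})
print(table.lookup(42))
```

Keeping everything in memory trades capacity for speed; the index costs extra memory per row but makes each interactive lookup independent of table size.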
Real-time data analytics is also used where the queries are fixed or
static but the answer still needs to be delivered in real time, that is, within
milliseconds. Examples include online game servers, where multiple users
interact with each other in real time. The queries are fixed because the
players are in a repetitive environment.
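A fixed query of this kind can be pictured as a standing query that is registered once and re-answered on every event. The sketch below assumes a hypothetical game-server leaderboard (`StandingLeaderboardQuery`, returning the top-N players by total score); per-player state is updated as each event arrives, and the current answer is returned immediately.

```python
import heapq
from typing import Dict, List, Tuple


class StandingLeaderboardQuery:
    """Hypothetical standing query: 'top-N players by total score'."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.scores: Dict[str, int] = {}

    def on_event(self, player: str, points: int) -> List[Tuple[str, int]]:
        # The query never changes; only the state it reads is updated.
        self.scores[player] = self.scores.get(player, 0) + points
        return self.result()

    def result(self) -> List[Tuple[str, int]]:
        # Select the N highest totals without fully sorting all players.
        return heapq.nlargest(self.n, self.scores.items(), key=lambda kv: kv[1])


board = StandingLeaderboardQuery(n=2)
for player, points in [("alice", 10), ("bob", 15), ("carol", 5), ("alice", 10)]:
    top = board.on_event(player, points)
print(top)
```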
Examples of stream processing systems include Apache Storm and Apache
Samza. Real-time football analytics is an example of stream processing
analytics (Stensland et al. 2014).
Stream processing refers to the way data flows into the system: small quantities of
data are recorded and processed every millisecond for the
duration of the football match. This makes it possible to analyse the match and
track its players at the same time.
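A common way to process such a stream is a sliding time window. The following Python sketch assumes a hypothetical feed of player position samples (timestamp in milliseconds, x and y in metres) and reports the distance each player covered in the last `window_ms` milliseconds. The `SlidingDistance` class and the sample values are illustrative and are not taken from the Bagadus system described by Stensland et al. (2014).

```python
import math
from collections import defaultdict, deque
from typing import Deque, Dict, Tuple

Sample = Tuple[int, float, float]  # (timestamp in ms, x in metres, y in metres)


class SlidingDistance:
    """Hypothetical per-player distance covered within a sliding time window."""

    def __init__(self, window_ms: int) -> None:
        self.window_ms = window_ms
        self.samples: Dict[str, Deque[Sample]] = defaultdict(deque)

    def update(self, player: str, t_ms: int, x: float, y: float) -> float:
        q = self.samples[player]
        q.append((t_ms, x, y))
        # Evict samples that have fallen out of the time window.
        while q and q[0][0] < t_ms - self.window_ms:
            q.popleft()
        # Sum straight-line distances between consecutive samples still in the window.
        pts = list(q)
        return sum(math.dist(a[1:], b[1:]) for a, b in zip(pts, pts[1:]))


tracker = SlidingDistance(window_ms=1000)
for t, x, y in [(0, 0.0, 0.0), (500, 3.0, 4.0), (1000, 6.0, 8.0), (1600, 6.0, 8.0)]:
    print(t, tracker.update("player_7", t, x, y))
```

For brevity the window sum is recomputed on every update; a system handling a sample every millisecond would more likely maintain the running sum incrementally as samples enter and leave the window.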
Applications of Real-Time Data Analytics
The applications of real-time data analytics are growing every day. Other
common applications are:
· Customer Relationship Management (CRM) is one of the primary sectors where this
analysis is used. Real-time data analytics can be used in CRM to provide an
enterprise with ‘up to the last minute’ information about a customer (Khodakarami
and Chan 2014). With good infrastructure, shared information can be analysed
within seconds of a customer interaction.
· Corporate dashboards are another place where real-time analytics can be used,
displaying the most up-to-date information on changes to the business as they
happen rather than only at day’s end.
· A data warehouse is a combination of hardware and software resources
specifically designed to process data (Kimball and Ross 2013). Using a data
warehouse, real-time data analytics can support the analysis and processing of
ad hoc and unpredictable queries.
· Scientific data can also be analysed with this form of analytics. For example,
data on the path, wind field and intensity of a hurricane can be collected and
then used to predict the hurricane’s movement in advance.
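As a deliberately simple illustration of predicting movement from collected track data, the sketch below projects a hurricane’s position by assuming it keeps the velocity observed between its last two fixes. This constant-velocity extrapolation is an assumption made for illustration only: it uses the path alone, ignores the wind field and intensity, and real forecasts rely on far more sophisticated models. The `extrapolate` function and the sample track are hypothetical.

```python
from typing import List, Tuple

Fix = Tuple[float, float, float]  # (hours since first observation, latitude, longitude)


def extrapolate(track: List[Fix], hours_ahead: float) -> Tuple[float, float]:
    """Hypothetical constant-velocity projection from the last two observed fixes."""
    if len(track) < 2:
        raise ValueError("need at least two fixes to estimate velocity")
    (t0, lat0, lon0), (t1, lat1, lon1) = track[-2], track[-1]
    dt = t1 - t0
    # Degrees per hour, assumed constant (ignores wind-field and intensity changes).
    vlat = (lat1 - lat0) / dt
    vlon = (lon1 - lon0) / dt
    return lat1 + vlat * hours_ahead, lon1 + vlon * hours_ahead


track = [(0.0, 25.0, -75.0), (6.0, 25.6, -76.2)]
print(extrapolate(track, hours_ahead=12.0))
```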
Data Storage Infrastructures
There are two main big data storage options: on-premise storage and cloud
storage. The two are compared below.
On-Premise Storage
The Hadoop Distributed File System (HDFS) is primarily used for on-premise big data
storage (Hildmann and Kao 2014).
· The first advantage of HDFS is that data can be stored on heterogeneous
types of storage that combine spinning disks with SSDs (Song, Park and Jeong 2016).
· These storage devices can be either independent or attached.
· The second advantage is that HDFS end-to-end encryption is transparent: data is
encrypted and decrypted without changes to application code, so storing and
retrieving data remains fast.
· The main disadvantage of HDFS is that it works best for storing and
processing data within a single data centre. It has no WAN support, so
data can be neither stored nor accessed globally.
· Data recovery is a major problem if a disaster strikes the data centre.
This disadvantage makes on-premise storage inefficient and prone to data loss.
Cloud Storage
The best options for storing data in the cloud are popular object stores such
as S3, Google Cloud Storage and Azure WASB/ADLS (Jamshidi et al. 2015).
The main advantage of cloud storage is that data can be stored and
analysed from anywhere.
Object stores offer various storage classes that can be used to get the
optimum result in different situations; Amazon S3, for example, offers S3
Standard, S3 Reduced Redundancy Storage, S3 Standard-Infrequent Access and
Glacier. Each of these serves a different purpose, and the storage class should
be chosen depending on the use case.
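Choosing among these classes can be expressed as a simple decision rule. The sketch below is hypothetical: the `choose_storage_class` function and its one-read-per-month threshold are assumptions for illustration, not AWS guidance. The returned strings are the names S3 uses for these classes, for example as the `StorageClass` parameter of boto3’s `put_object`.

```python
def choose_storage_class(reads_per_month: float, reproducible: bool, archive: bool) -> str:
    """Hypothetical rule of thumb mapping an access pattern to an S3 storage class."""
    if archive:
        # Long-term archive: retrieval can take minutes to hours, so only for cold data.
        return "GLACIER"
    if reads_per_month < 1:
        # Assumed threshold: rarely read, but must be available immediately when needed.
        return "STANDARD_IA"
    if reproducible:
        # Data that can be regenerated if lost may tolerate reduced redundancy.
        return "REDUCED_REDUNDANCY"
    # Frequently read data that cannot be regenerated.
    return "STANDARD"


print(choose_storage_class(reads_per_month=500, reproducible=False, archive=False))
```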
One disadvantage of cloud object stores is that objects can be inconsistent.
Users need to take special care to build a very predictable data pipeline when
storing data in a cloud storage system, because operating across multiple data
centres can cause inconsistencies in the stored data.
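One defensive pattern for keeping such a pipeline predictable is to confirm that a newly written object is visible before downstream steps read it, retrying with exponential backoff. The sketch below simulates this with a hypothetical `LaggingStore` that keeps returning stale data for a fixed number of reads after a write; `read_after_write` and `LaggingStore` are illustrative names, not part of any cloud SDK.

```python
import time
from typing import Callable, Optional


class LaggingStore:
    """Hypothetical object store whose reads lag behind writes by `lag_reads` reads."""

    def __init__(self, lag_reads: int) -> None:
        self.lag_reads = lag_reads
        self.visible: Optional[str] = None  # what readers currently see
        self.latest: Optional[str] = None   # what was most recently written
        self.reads_since_put = 0

    def put(self, value: str) -> None:
        self.latest = value
        self.reads_since_put = 0

    def get(self) -> Optional[str]:
        # The new value becomes visible only after `lag_reads` stale reads.
        self.reads_since_put += 1
        if self.reads_since_put > self.lag_reads:
            self.visible = self.latest
        return self.visible


def read_after_write(read: Callable[[], Optional[str]], expected: str,
                     attempts: int = 5, base_delay: float = 0.01) -> str:
    """Poll until `expected` is visible, doubling the wait after each stale read."""
    for attempt in range(attempts):
        value = read()
        if value == expected:
            return value
        time.sleep(base_delay * 2 ** attempt)
    raise TimeoutError(f"object still stale after {attempts} attempts")


store = LaggingStore(lag_reads=2)
store.put("v1")
print(read_after_write(store.get, "v1"))
```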
Current Research and Development in Real-Time Data Analytics
There has been a great deal of research and development in the field of data analytics.
IBM and Kinetica
The most notable development is a GPU-accelerated database that IBM and
Kinetica developed to run on IBM’s OpenPOWER LC servers (Hater et al. 2016).
Kinetica’s in-database analytics runs Artificial Intelligence and Business
Intelligence workloads on a single database platform. Kinetica achieves faster
analytics by using Graphics Processing Units (GPUs), which enables it to handle
huge datasets containing billions of rows in milliseconds. The database is
indexed in memory and offers location-based analytics, natural language
processing, GPU-accelerated database operations, and native Geographic
Information System (GIS) and Internet Protocol (IP) address object support. The
system integrates deeply with leading open source frameworks such as Apache
Spark, Hadoop, Accumulo, H2O and NiFi. Kinetica claims the system is up to a
hundred times faster than traditional or legacy in-memory databases, so advanced
analytics can be computed in less than a second at a fraction of the cost.
Together, IBM’s OpenPOWER LC servers and Kinetica provide solutions for
real-time problems and manage workload efficiency and scalability at the data
centre.
PlanetSense is a real-time streaming platform with spatio-temporal
analytics for collecting geo-spatial information from various open sources of
data (Thakur et al. 2015). The platform consists of four main components:
· GeoData Cloud is a data architecture that serves the purpose of
storing as well as managing different datasets.
· Built-in mechanism for real-time
· A data analytics framework.
· A web interface and Representational State Transfer (RESTful) services
through which the data can be presented and visualized.
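Spatio-temporal analytics of this kind typically starts by restricting geotagged records to a region and a time window and then aggregating them spatially. The Python sketch below shows that idea under stated assumptions: the `GeoRecord` structure, the `in_window` and `grid_counts` functions, and the fixed-degree grid are hypothetical and are not PlanetSense’s implementation.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class GeoRecord:
    """Hypothetical geotagged record from an open data source."""
    lat: float
    lon: float
    ts: int  # Unix timestamp in seconds
    text: str


def in_window(records: Iterable[GeoRecord], bbox: Tuple[float, float, float, float],
              start: int, end: int) -> List[GeoRecord]:
    """Keep records inside bbox (min_lat, min_lon, max_lat, max_lon) and [start, end)."""
    min_lat, min_lon, max_lat, max_lon = bbox
    return [r for r in records
            if min_lat <= r.lat <= max_lat and min_lon <= r.lon <= max_lon
            and start <= r.ts < end]


def grid_counts(records: Iterable[GeoRecord], cell_deg: float) -> Counter:
    """Count records per grid cell; floor division keeps negative coordinates consistent."""
    return Counter((int(r.lat // cell_deg), int(r.lon // cell_deg)) for r in records)


records = [
    GeoRecord(35.90, -84.30, 100, "a"),
    GeoRecord(35.95, -84.25, 150, "b"),
    GeoRecord(40.70, -74.00, 120, "c"),  # outside the bounding box
    GeoRecord(35.92, -84.28, 500, "d"),  # outside the time window
]
hits = in_window(records, bbox=(35.0, -85.0, 37.0, -83.0), start=0, end=200)
print([r.text for r in hits], grid_counts(hits, cell_deg=1.0))
```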
Thus, it can be concluded that big data is a revolutionary technology in the
field of information and data analysis. Real-time analytics serves a wide range
of applications, from games to customer support. The movement of natural
disasters such as hurricanes and typhoons can be predicted in advance,
potentially saving millions of lives. Organizations are spending billions on
research and development in Big Data Analytics to take the lead in the market.
References
Hater, T., Anlauf, B., Baumeister, P., Bühler,
M., Kraus, J. and Pleiter, D., 2016, June. Exploring Energy Efficiency for
GPU-Accelerated POWER Servers. In International Conference on High
Performance Computing (pp. 207-227). Springer International Publishing.
Hatley, D. and Pirbhai, I., 2013. Strategies
for real-time system specification. Addison-Wesley.
Hildmann, T. and Kao, O., 2014, June. Deploying and extending on-premise cloud storage
based on ownCloud. In Distributed Computing Systems Workshops (ICDCSW), 2014
IEEE 34th International Conference on (pp. 76-81). IEEE.
Jamshidi, P., Pahl, C., Chinenyeze, S. and Liu, X., 2015. Cloud migration patterns: a
multi-cloud service architecture perspective. In Service-Oriented
Computing-ICSOC 2014 Workshops (pp. 6-19). Springer, Cham.
John Walker, S., 2014. Big data: A revolution that will transform how we live, work, and think.
Kelly, M.F., Kelly, B.M., Petermeier, N.B.,
Kroeckel, J.G. and Link, J.E., Agincourt Gaming, Llc, 2014. Method for
providing games over a wide area network. U.S. Patent 8,821,258.
Khodakarami, F. and Chan, Y.E., 2014. Exploring the role of customer relationship management
(CRM) systems in customer knowledge creation. Information & Management,
Kimball, R. and Ross, M., 2013. The data warehouse toolkit: The definitive guide to
dimensional modeling. John Wiley & Sons.
Larose, D.T., 2014. Discovering knowledge in
data: an introduction to data mining. John Wiley & Sons.
Song, S.S., Park, S.H. and Jeong, K.H., Samsung
Electronics Co., Ltd., 2016. Solid-state drive. U.S. Patent Application
Stensland, H.K., Gaddam, V.R., Tennøe, M., Helgedagsrud,
E., Næss, M., Alstad, H.K., Mortensen, A., Langseth, R., Ljødal, S., Landsverk,
Ø. and Griwodz, C., 2014. Bagadus: An integrated real-time system for soccer
analytics. ACM Transactions on Multimedia Computing, Communications, and
Applications (TOMM), 10(1s), p.14.
Thakur, G.S., Bhaduri, B.L., Piburn, J.O., Sims, K.M., Stewart, R.N. and Urban, M.L.,
2015, November. PlanetSense: a real-time streaming and spatio-temporal
analytics platform for gathering geo-spatial intelligence from open source data.
In Proceedings of the 23rd SIGSPATIAL International Conference on Advances
in Geographic Information Systems (p. 11). ACM.