Are we
still discussing Big Data, or should we start reviewing and monitoring its different
tools and platforms? And how should an organization determine the right tool to
utilize as part of its data analytics architecture?
In today’s competitive business environment,
Internet of Things devices and information services increasingly produce large
amounts of data in disparate structures. Many open source and commercial tools
continue to pop up to deal with the different characteristics of Big Data. As a
result, there is an abundance of tools and platforms that analyze Big Data or act
as building blocks for such analysis. Just by reviewing open source tools, we have come
across 300 of them. That is the number we arrived at even after applying strict filters
such as legitimacy of the source, license type, and last commit activity. We are
not talking about Big Data anymore; we are talking about Big Tools.
To extract value from Big Data, an
organization should determine the right tool to utilize as part of its data
analytics architecture. The right tool depends on the characteristics of
the data to be analyzed and the domain the organization operates in.
The organization must also train its IT workforce to gain the technical
expertise to be effective with those tools. Businesses incur costs when they
adopt these tools or change their existing source code to run on newer
versions: technical debt, in other words. In the Big Tools era, there is no
standard on how these tools come together to compose a data analytics
architecture. Most of these tools are unknown to the business world; some
were even unknown to us. Apache Beam and Apache SAMOA are
good examples. The latest trend in the big data domain is to provide
a level of abstraction over popular data processing platforms. Apache
Beam implements its dataflow programming model on multiple processing platforms
such as Apache Spark and Apache Flink. Apache SAMOA enables programmers to apply machine
learning algorithms to data streams; applications developed with SAMOA can be
executed on Apache Storm and Apache Samza. Moreover, new models and tools
continue to emerge at a fast pace in the Big Data domain, and there is no established
method to track the newest developments, particularly for open source tools.
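The abstraction these tools provide can be pictured as a pipeline description that is decoupled from the engine executing it. Below is a minimal Python sketch of that idea; the names (`Pipeline`, `Runner`, `LocalRunner`) are illustrative inventions for this post, not the actual Beam or SAMOA APIs.

```python
# Toy sketch of the runner-abstraction idea behind tools like Apache Beam:
# a pipeline is described once, then handed to any "runner" that knows how
# to execute it on a particular platform. All names here are hypothetical.

from typing import Callable, Iterable, List


class Pipeline:
    """A pipeline is simply an ordered list of transforms."""

    def __init__(self) -> None:
        self.transforms: List[Callable[[Iterable], Iterable]] = []

    def apply(self, transform: Callable[[Iterable], Iterable]) -> "Pipeline":
        self.transforms.append(transform)
        return self


class Runner:
    """Base class: each processing platform would supply its own runner."""

    def run(self, pipeline: Pipeline, data: Iterable) -> list:
        raise NotImplementedError


class LocalRunner(Runner):
    """Executes the pipeline in-process. A hypothetical SparkRunner or
    FlinkRunner would instead translate the same transforms into its
    platform's native API."""

    def run(self, pipeline: Pipeline, data: Iterable) -> list:
        for transform in pipeline.transforms:
            data = transform(data)
        return list(data)


# The pipeline definition is independent of the runner that executes it.
pipeline = Pipeline()
pipeline.apply(lambda xs: (x * 2 for x in xs))
pipeline.apply(lambda xs: (x for x in xs if x > 4))

result = LocalRunner().run(pipeline, [1, 2, 3, 4])
print(result)  # [6, 8]
```

Swapping `LocalRunner` for a platform-specific runner would leave the pipeline definition untouched, which is exactly the portability argument these abstraction layers make.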
We are working towards developing an open
source big data analytics architecture. We are keeping it as simple as
possible while providing a comprehensive picture of the big data analytics lifecycle.
For academia, the architecture will show the state of the art, the tools that
are missing, and the tools that are mature enough to be used as part of research.
It will also provide a method for tracking notable new open source tools
popping up in different sources. For technical people, it will help determine
which tool to use for a particular implementation. Small and medium-sized
enterprises can provide services using some of these tools, addressing gaps in
a bigger architecture. For an established firm trying to develop a strategy,
the architecture will provide a comprehensive picture of what fits where.
Commercial big data solution providers can also benefit from this architecture:
they will see the capabilities they lack and can collaborate with smaller
enterprises to provide them.
Mert Gokalp and Keres Kayabay are working with Mohamed Zaki
to build this architecture. We will publish a working paper soon on this topic.