“Big Data – An Introduction” by Subu Sangameswar is exactly what the title professes the book to be: a succinct, short and simple introduction to the topical concept of Big Data. American organizational theorist, management consultant and author Geoffrey Moore once said, “Without big data, you are blind and deaf and in the middle of a freeway.” In an era characterized by exponential leaps in technological advance and knowledge accumulation, no company would like to find itself not only blind and deaf, but also in the middle of a metaphorical economic freeway. It is access to data that prevents organisations from taking the path of Moore’s perdition.
As Mr. Sangameswar reveals, in the year 2010 alone, more than 13 Exabytes of data were mined across the world. To put this number in perspective, that is over 50,000 times the data in the Library of Congress! In a shape-up-or-ship-out competitive world, Big Data confers competitive advantage to such an extent that organisations exploiting its worth are more than twice as likely to outperform their peers. So what exactly is Big Data, and what are its attendant features, scope, advantages and perils? Here are some of the salient takeaways from Mr. Sangameswar’s work:
- Big Data encompasses data that is epitomized by scale and complexity. The term ‘Big Data’ is not solely restricted to describing the data element. It also envelops the tools, processes, and procedures that enable an organization to create, manage and manipulate data;
- Big Data, in its normal parlance, includes traditional enterprise data, machine-generated and sensor data, and social data;
- Big Data, as is to be expected, is gargantuan in size. McKinsey Global Institute estimates that data volume is growing 40% per annum and is expected to grow 44 times between 2009 and 2020;
- The three classical ‘hallmarks’ of Big Data are Volume, Velocity and Variety. Mr. Sangameswar terms these the ‘3Vs’;
- The volume of data generated globally is close to unthinkable. Bernard Marr, writing for Forbes magazine, informs his awed readers that “there are 2.5 quintillion bytes of data created each day at our current pace, but that pace is only accelerating with the growth of the Internet of Things (IoT)”. Between 2016 and 2018 alone, 90 percent of the data in the world was generated;
- Velocity refers to the rate at which data manifests in an organization before, in turn, being processed. Great stress is laid on the turnaround of the data by organizations. The speed at which data is not just captured but also analysed can make or mar the process of decision making;
- Variety refers to the various forms and shapes that the data arrives in. Images, videos and texts are all myriad varieties of data that lend themselves to identification, filtration, analysis and evaluation;
- Realising the impact that Big Data can have upon a business, entrepreneurs and organisations have evolved Models of Big Data with a view to adapting and prospering. While some models concentrate on employing data to create new products, others involve brokering this information. A third model builds networks to deliver products at the relevant time and location;
- While Big Data comes with its own advantages, it would be absurd to view it solely with rose tinted glasses. Big Data has its own set of perils and pitfalls. Separating the wheat from the chaff and noise from the signal is a perennial challenge of Big Data. How does an organization effectively capture the most relevant data and deliver it to the right people at the right time?
- Guarding against breaches of security and privacy is the single biggest challenge of Big Data. As the Cambridge Analytica and Facebook episode demonstrated, data and information can wreak havoc upon even the sovereign prospects of a nation, as was the case with the United States of America and the Trump election fracas;
- The primary framework of Big Data involves four distinct phases – Capture, Integrate, Analyse and Share. Capturing involves weeding out noise from signal and assimilating the relevant data. Integration involves getting the captured data ready for analysis. Analysis involves statistical analysis as well as Data Mining. Finally Sharing results in dissemination of data to the appropriate audience via the most relevant medium;
- There is a plethora of tools available in the market for facilitating Data Analysis on a gigantic scale and size. Apache Hadoop, for example, brings the ability to cheaply process humongous volumes of data. MapReduce provides a framework for writing applications that process significant amounts of both structured and unstructured data;
- Other notable tools that one can avail of are HBase, Hadoop Distributed File System (“HDFS”), Hive, Apache Pig, Zookeeper, and Oozie.
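The four-phase framework the book describes – Capture, Integrate, Analyse, Share – can be sketched as a simple pipeline. The function names and sample records below are invented purely for illustration; they are not from Mr. Sangameswar’s book.

```python
# An illustrative sketch of the Capture -> Integrate -> Analyse -> Share flow.
# All names and data here are hypothetical, for demonstration only.

def capture(raw_records):
    # Capture: weed out noise, keeping only records that carry a signal.
    return [r for r in raw_records if r.get("value") is not None]

def integrate(records):
    # Integrate: normalise the captured data into one analysable shape.
    return [{"source": r["source"], "value": float(r["value"])} for r in records]

def analyse(records):
    # Analyse: a trivial stand-in for statistical analysis and data mining.
    values = [r["value"] for r in records]
    return {"count": len(values), "mean": sum(values) / len(values)}

def share(summary):
    # Share: disseminate the result to the appropriate audience.
    return f"Report: {summary['count']} records, mean value {summary['mean']:.1f}"

raw = [{"source": "sensor", "value": 10},
       {"source": "social", "value": None},
       {"source": "enterprise", "value": 20}]
print(share(analyse(integrate(capture(raw)))))  # Report: 2 records, mean value 15.0
```

In a real deployment each phase would be a distributed system rather than a single function, but the hand-off between phases follows the same shape.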
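The MapReduce model mentioned above is easiest to grasp through the classic word-count example. The following is a single-process Python sketch of the map/shuffle/reduce idea, not Hadoop itself:

```python
from collections import defaultdict

# Word count in the MapReduce style: each document is mapped to (word, 1)
# pairs, the pairs are shuffled (grouped) by key, and the groups are reduced
# to per-word totals. Hadoop runs these phases across a cluster; this sketch
# runs them in one process to show the model.

def map_phase(document):
    for word in document.lower().split():
        yield (word, 1)

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data is big", "data grows fast"]
pairs = [p for d in docs for p in map_phase(d)]
counts = reduce_phase(shuffle(pairs))
print(counts["big"])   # 2
print(counts["data"])  # 2
```

Because the map and reduce steps are independent per document and per key, the same program structure scales from this toy example to the humongous structured and unstructured datasets the book discusses.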
To use the now abused cliché, quoted with a frequency bordering on the irritating, “Data is the new oil.” Big Data is a powerful medium which bestows upon the user immense potential, both economic and social. However, injudicious use of Big Data would also, in all probability, lead the cavalier organization towards unintended consequences. Big Data brings along with it the formidable risk to data privacy. Organisations such as Amazon, Google and Facebook make use of sensitive data, personal customer information and strategic documents. With the world awash in confidential data, a data breach at one single point may be enough to create pandemonium. Reputations may go up in smoke, as may painstakingly built fortunes and fame. Not to mention legal actions and punitive penalties. Taking measures for data privacy is no longer a paean to appreciable initiatives or best practices. It is an uncompromising mechanism of inevitable compliance.