Big Data technology has grown rapidly in recent years, and Big Data tools such as Hadoop are used extensively in many fields. This post discusses Big Data: how it works, its categories, attributes, applications, advantages and disadvantages.
Table of Contents
- 1 What is Big Data
- 2 Function Mechanism of Big Data
- 3 Categories of Big Data
- 4 Attributes of Big Data
- 5 Applications of Big Data
- 6 Advantages of Big Data
- 7 Disadvantages of Big Data
- 8 Big Data Hadoop Tool
What is Big Data
Big Data refers to data sets that are so large and complex that they exceed the storage capacity and processing power of a conventional computer.
These data sets are too large for everyday computing tools to handle, so specialized software tools are needed to capture, analyze, share, transfer, manage and process the data.
Function Mechanism of Big Data
Big Data supports real-time decision-making by assessing the flood of facts and figures that comes from social media, logistics, financial and retail databases.
It helps in understanding the past, predicting the future and detecting patterns in data sets.
Categories of Big Data
Big Data falls into three main categories:
- Structured Data
- Unstructured Data
- Semi-structured Data
Structured data has a defined format and size, which makes it precise and efficient to work with. It is the most systematic data model, because the data can easily be stored, retrieved, organized and manipulated. This type of data resides in relational databases.
Example: Data warehouses, Enterprise systems, Databases
Unstructured data cannot be neatly ordered and usually has no row-and-column structure. Big Data software tools like Hadoop can organize and manage such data, which is often extremely complex, very large and rapidly changing.
Example: Text documents, Audio/video streams, log files
Semi-structured data is self-describing: its format is implied and can be deduced from the data itself. Not all records need to have the same fields, and the schema can differ within a single database and change considerably over time.
Example: HTML, XML, RDF
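To make the difference between the three categories concrete, here is a minimal Python sketch. The sample records (names, amounts, order tags) are invented for illustration and are not taken from any real system. It shows that structured data can be read by column name, semi-structured data carries its own tags but may omit fields, and unstructured text has no fields at all.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: fixed columns, every row has the same fields (hypothetical sample).
structured = "id,name,amount\n1,Alice,30\n2,Bob,45\n"
rows = list(csv.DictReader(io.StringIO(structured)))
total_amount = sum(int(row["amount"]) for row in rows)

# Semi-structured: self-describing tags, but records may differ.
# The second order has no <amount> element.
semi_structured = """
<orders>
  <order id="1"><name>Alice</name><amount>30</amount></order>
  <order id="2"><name>Bob</name></order>
</orders>
""".strip()
root = ET.fromstring(semi_structured)
orders_with_amount = [
    order.get("id")
    for order in root.findall("order")
    if order.find("amount") is not None
]

# Unstructured: free text with no rows or tags; any structure must be inferred.
unstructured = "Alice paid 30 yesterday. Bob has not paid yet."
word_count = len(unstructured.split())

print(total_amount, orders_with_amount, word_count)
```

The structured sample can be summed directly by column, while the semi-structured sample has to be checked record by record because the schema varies.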
Attributes of Big Data
The attributes of Big Data are as follows:
Fig. 2 – Attributes of Big Data
Volume of Data
- Recorded and transactional data that accumulates over time.
- Scaling to handle bulky data.
Example: High resolution sensors
Velocity of Data
- Speed at which the data is generated.
- Processing and analysis of streaming data.
Example: Improved connectivity
Variety of Data
- Different forms of data.
- Heterogeneous & noisy data
Example: Structured Data, Unstructured Data, Semi-structured Data
Veracity of Data
- Incoming data from unreliable sources
- Inaccuracy of the data
Example: Costing, Source availability issues
Value of Data
- Scientifically relevant data
- Long-running studies
Example: Simulation, Hypothetical events
Applications of Big Data
Fig. 3 – Applications of Big-Data
The applications of Big Data in various fields are as follows:
In Health / Life Science
- Discovering new medicines and developing them further.
- Analysis of disease patterns
In Retail / Consumer
- Managing supply chains
- Targeting events
- Customer-based programs
- Marketing segments
In Digital Media
- Controlling campaigns
- Targeting advertisements
- Click fraud prevention
In Financial Services
- Risk analysis and management
- Fraud detection
- Compliance and regulatory issues
- Delivering the right offers at the right time
- Efficient, highly targeted engines that use predictive analytics
Advantages of Big Data
Its advantages are as follows:
- It yields useful insights and helps identify the root causes of issues in real time.
- It strengthens cyber surveillance, which is one reason for the boom in Big Data software.
- It helps improve the health care sector and enables deeper analysis in digital forensics.
- Because many Big Data tools, such as Hadoop, are open source, they are widely accessible and add-ons appear frequently.
- It provides flexibility in financial markets and enhances the way sports are consumed.
Disadvantages of Big Data
Its disadvantages are as follows:
- Big Data can lead to breaches of confidentiality for certain kinds of data.
- Keeping up with frequent updates requires a lot of agility to keep the data consistent.
- It is not always an accommodating environment for analysts and data mining experts, because turning continuously arriving data into analysis can be an uphill task.
- It is not useful in the short run, and handling such large volumes of data can be strenuous.
- There are always technical and analytical challenges.
Fig. 4 – Big Data Hadoop Tool
Big Data Hadoop Tool
Hadoop sits at the core of the emerging Big Data environment and supports its primary activities. This readily available software framework is used in machine learning applications, predictive analytics, data mining and similar areas. Its dominant use is batch processing.
Apache Hadoop is a well-known open-source software framework that makes it easier to use a cluster of networked computers to solve problems involving massive amounts of data.
Components of Big-Data Hadoop Tool
The important components of Big Data Hadoop Tool are:
- Hadoop distributed file system (HDFS)
- Hadoop YARN
- Hadoop MapReduce
Hadoop distributed file system (HDFS)
Hadoop distributed file system (HDFS) is used for storage of the data. It has a master/slave architecture that provides a fault-tolerant design.
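The fault tolerance mentioned above comes from splitting files into blocks and keeping copies of each block on more than one machine. The following Python sketch is a toy model only: the function names, the round-robin placement and the tiny block size are assumptions made for illustration, and real HDFS uses much larger blocks and its own placement policy.

```python
def split_into_blocks(data, block_size):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, data_nodes, replication):
    """Assign each block to `replication` distinct nodes, round-robin.

    Hypothetical placement rule; it assumes replication <= len(data_nodes).
    Returns {block_index: [node_name, ...]}, the kind of mapping a master
    node would keep in its metadata.
    """
    node_count = len(data_nodes)
    return {
        block: [data_nodes[(block + r) % node_count] for r in range(replication)]
        for block in range(num_blocks)
    }


def readable_after_failure(placement, failed_node):
    """True if every block still has at least one copy on a surviving node."""
    return all(
        any(node != failed_node for node in nodes)
        for nodes in placement.values()
    )


blocks = split_into_blocks(b"x" * 10, block_size=4)
nodes = ["dn1", "dn2", "dn3"]

replicated = place_replicas(len(blocks), nodes, replication=2)
single_copy = place_replicas(len(blocks), nodes, replication=1)

print(readable_after_failure(replicated, "dn2"))
print(readable_after_failure(single_copy, "dn2"))
```

With two copies of each block, the file survives the loss of any single node in this model; with one copy, losing a node loses the block stored on it.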
Hadoop YARN (Yet Another Resource Negotiator)
Hadoop YARN is used for job scheduling and cluster resource management, and it separates resource management from data processing frameworks such as MapReduce. It dynamically allocates pools of cluster resources to the applications that request them.
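Hadoop MapReduce
Hadoop MapReduce is the programming model used to process the data in parallel across the cluster. A job is split into map tasks, which process chunks of the input, and reduce tasks, which combine the intermediate results. In the original MapReduce engine, resources were allocated statically as fixed slots for designated tasks.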
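MapReduce processes data in two phases: a map phase that emits key/value pairs and a reduce phase that combines all values that share a key. The sketch below imitates that flow in plain Python on a single machine using the classic word count example. It does not use the Hadoop API; the function names and sample documents are assumptions chosen for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


# Each document stands in for an input split handled by a separate map task.
documents = ["big data big tools", "data tools data"]

mapped = [pair for doc in documents for pair in map_phase(doc)]
word_counts = reduce_phase(shuffle(mapped))

print(word_counts)
```

In a real cluster the map calls would run on different machines, close to the blocks they read, and the grouping step would move data across the network before the reduce tasks run.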
Author: Savitha Rishank Chegu