What is Data?
Data is a collection of facts, such as numbers, words, measurements, observations, or descriptions of things. Data is all around us, in many different forms.
Some Facts about Data
- In 2020, every person created about 1.7 MB of data every second.
- In the last two years alone, an astonishing 90% of the world’s data has been created.
- By 2025, humans are expected to generate 463 exabytes of data each day.
- 95 million photos and videos are shared every day on Instagram.
- By the end of 2020, the entire digital universe was expected to reach 44 zettabytes.
- Every day, 306.4 billion emails are sent and 500 million Tweets are posted.
- Google handles over 3.5 billion searches daily.
- WhatsApp users exchange up to 65 billion messages daily.
- Internet users generate about 2.5 quintillion bytes of data each day.
- In 2019, there were 2.3 billion active Facebook users, and they generated a lot of data.
What is Big Data?
From the facts above, you can see how big the data really is. Every day we humans create about 2,500,000,000,000,000,000 (2.5 quintillion) bytes of data. By one popular estimate, that would fill about 10 million Blu-ray discs, and if you stacked them, the pile would reach four times the height of the Eiffel Tower. Isn’t that amazing?
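To make 2.5 quintillion bytes easier to picture, the small Python sketch below converts it into larger storage units. It assumes decimal (SI) units, where each step is a factor of 1,000 (1 EB = 10^18 bytes); binary units such as EiB would give slightly smaller numbers. The names `DAILY_BYTES`, `UNITS`, and `describe` are hypothetical, chosen only for this illustration.

```python
# Daily figure quoted above: 2.5 quintillion bytes = 2.5 * 10**18 bytes.
DAILY_BYTES = 2_500_000_000_000_000_000

# Decimal (SI) storage units (an assumption): each step is a factor of 1,000.
UNITS = [("EB", 10**18), ("PB", 10**15), ("TB", 10**12), ("GB", 10**9)]


def describe(num_bytes: int) -> dict[str, float]:
    """Express a byte count in each decimal unit listed in UNITS."""
    return {name: num_bytes / size for name, size in UNITS}


if __name__ == "__main__":
    for name, value in describe(DAILY_BYTES).items():
        print(f"{value:,.1f} {name}")
```

Running it prints 2.5 EB, 2,500.0 PB, 2,500,000.0 TB and 2,500,000,000.0 GB, so the daily total is about 2.5 exabytes, or 2.5 million terabytes.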
But have you ever wondered how Google still responds in a fraction of a second? Or how you are able to log in to Facebook within seconds?
Have you ever wondered how these tech giants store this Big Data and how they analyze it?
Big Data Problems
There are three main problems associated with Big Data, known as the 3 V’s of Big Data:
- Volume: The amount of data matters. With big data, you’ll have to process high volumes of low-density, unstructured data. This can be data of unknown value, such as Twitter data feeds, clickstreams on a webpage or a mobile app, or sensor-enabled equipment. For some organizations, this might be tens of terabytes of data. For others, it may be hundreds of petabytes.
- Velocity: Velocity is the fast rate at which data is received and (perhaps) acted on. Normally, the highest-velocity data streams directly into memory rather than being written to disk. Some internet-enabled smart products operate in real time or near real time and require real-time evaluation and action.
- Variety: Variety refers to the many types of data that are available. Traditional data types were structured and fit neatly in a relational database. With the rise of big data, data comes in new unstructured data types. Unstructured and semistructured data types, such as text, audio, and video, require additional preprocessing to derive meaning and support metadata.
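To illustrate Variety, here is a minimal Python sketch that handles one example of each kind of data: a CSV snippet (structured), a JSON record (semi-structured), and a short piece of free text (unstructured). The sample records and the function names `load_structured`, `load_semi_structured`, and `preprocess_unstructured` are hypothetical; the "preprocessing" for the text is deliberately simplistic (lowercasing, splitting on whitespace, and extracting hashtags as metadata), just to show that unstructured data needs an extra step before it becomes usable.

```python
import csv
import io
import json

# Structured: fixed columns, fits a relational table directly.
csv_text = "user_id,country,age\n1,IN,28\n2,US,35\n"
# Semi-structured: self-describing, but fields can differ between records.
json_text = '{"user_id": 3, "tags": ["sports", "music"], "profile": {"age": 41}}'
# Unstructured: free text whose meaning must be extracted by preprocessing.
tweet_text = "Loving the new phone!!! Battery lasts all day #happy"


def load_structured(text: str) -> list[dict[str, str]]:
    """Each CSV row already maps onto named columns."""
    return list(csv.DictReader(io.StringIO(text)))


def load_semi_structured(text: str) -> dict:
    """JSON carries its own field names; nested values may need flattening later."""
    return json.loads(text)


def preprocess_unstructured(text: str) -> dict:
    """Toy preprocessing: tokenize on whitespace and pull out hashtags as metadata."""
    tokens = text.lower().split()
    hashtags = [t for t in tokens if t.startswith("#")]
    return {"tokens": tokens, "hashtags": hashtags}


if __name__ == "__main__":
    print(load_structured(csv_text)[0]["country"])            # IN
    print(load_semi_structured(json_text)["profile"]["age"])  # 41
    print(preprocess_unstructured(tweet_text)["hashtags"])    # ['#happy']
```

Real pipelines would use far richer text processing (language detection, sentiment analysis, and so on), but the split between the three kinds of input stays the same.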
How Are Big Data Problems Resolved?
The problem of Volume is solved using distributed storage, and the problem of Velocity is solved using distributed computing.
Distributed Storage: Storing data across many machines instead of a single machine is known as distributed storage. The topology used here is a master-slave topology: a master node, called the NameNode, keeps track of where the data is stored, and several slave nodes, called DataNodes, hold the data itself. The data is split into blocks that are spread across these nodes. Distributed storage can be implemented by various software; one example is Hadoop, whose file system is HDFS (Hadoop Distributed File System).
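A toy, in-memory Python sketch of the NameNode/DataNode split described above may help. It is not the Hadoop API, and its details are assumptions made for illustration: the class names, the round-robin block placement, a tiny 4-byte block size, and a replication factor of 2. Real HDFS uses a configurable block size (128 MB by default) and replication factor (3 by default), with rack-aware placement. The key point it shows is that the NameNode stores only metadata (which blocks make up a file and where they live), while the DataNodes store the bytes, and replication lets a file survive the loss of a node.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4   # bytes per block (assumption; HDFS defaults to 128 MB)
REPLICATION = 2  # copies per block (assumption; HDFS defaults to 3)


@dataclass
class DataNode:
    """Slave node: stores raw block contents keyed by block id."""
    name: str
    blocks: dict[str, bytes] = field(default_factory=dict)


@dataclass
class NameNode:
    """Master node: keeps only metadata about files and block locations."""
    datanodes: list[DataNode]
    # file name -> ordered list of (block_id, names of DataNodes holding it)
    namespace: dict[str, list[tuple[str, list[str]]]] = field(default_factory=dict)

    def write(self, filename: str, data: bytes) -> None:
        copies = min(REPLICATION, len(self.datanodes))
        placements = []
        for offset in range(0, len(data), BLOCK_SIZE):
            index = offset // BLOCK_SIZE
            block_id = f"{filename}#{index}"
            # Round-robin placement: the replicas land on different DataNodes.
            targets = [self.datanodes[(index + r) % len(self.datanodes)]
                       for r in range(copies)]
            for node in targets:
                node.blocks[block_id] = data[offset:offset + BLOCK_SIZE]
            placements.append((block_id, [n.name for n in targets]))
        self.namespace[filename] = placements

    def read(self, filename: str, failed: frozenset[str] = frozenset()) -> bytes:
        """Reassemble a file, skipping DataNodes listed in `failed`."""
        by_name = {n.name: n for n in self.datanodes}
        out = b""
        for block_id, holders in self.namespace[filename]:
            alive = [h for h in holders if h not in failed]
            if not alive:
                raise IOError(f"all replicas of {block_id} are lost")
            out += by_name[alive[0]].blocks[block_id]
        return out


if __name__ == "__main__":
    nodes = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
    namenode = NameNode(nodes)
    namenode.write("clicks.log", b"hello big data")
    print(namenode.namespace["clicks.log"])
    print(namenode.read("clicks.log", failed={"dn1"}))  # b'hello big data'
```

Even with `dn1` marked as failed, the file is still readable because every block has a second copy on another DataNode.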
Data exists in huge amounts. It is very useful, but its sheer volume makes it hard to handle with traditional methods. That is why we use distributed storage to solve Big Data problems.