
NoSQL Databases

What Is BIG DATA? Why?

The simple meaning of BIG DATA is a large volume of structured and unstructured data, data that cannot be kept/stored on one machine. In 2012 Gartner (an American research and advisory firm) introduced the 3Vs which describe Big Data.



  1. Volume - The amount of data that is generated and stored. 
  2. Velocity - The speed at which the data is generated and processed (the speed at which Big Data should be analysed).
  3. Variety - Data comes in different types: structured and unstructured.



By adding two more characteristics, this can also be called the 5Vs:
      4. Variability - Inconsistency of the data
      5. Veracity - Uncertainty of the data

_________________________________________________________

WHY?


  For a moment, imagine that you own a clothing shop. If you can track all the past data of your business, customers and trends, you can apply analytics to it, generate business ideas, and maximise the profit of your business.
  Simply put, Big Data analytics helps to harness data and use it to find new patterns/opportunities, which leads to more efficient business moves, operations, services and happier customers.

_________________________________________________________


METHODS/TERMS


RDBMSs (Relational Database Management Systems) face various kinds of difficulties when it comes to handling Big Data. The majority of the data arrives in a semi-structured format from different types of sources, and such unstructured data cannot be handled by traditional databases. Limited scalability, and the requirement that all the tuples/records of a relation be stored on one machine, were other main problems. This inability of RDBMSs led to the emergence of new technologies.



NoSQL - NoSQL does not mean 'no SQL'; it means 'not only SQL'. It refers to non-relational, distributed data stores.

Hadoop - Launched in 2006 as an open-source project under Apache, Hadoop is a distributed data processing platform.

HDFS - Hadoop Distributed File System. It provides scalable, fault-tolerant, cost-efficient storage for Big Data.
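
As a small illustration, here is a hedged Java sketch of writing a file into HDFS and reading it back, assuming a Hadoop client on the classpath and a NameNode at a placeholder address:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExample {
    public static void main(String[] args) throws Exception {
        // Placeholder NameNode address; adjust for a real cluster.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/demo/hello.txt");

        // Write a small file; HDFS splits files into blocks and replicates them across DataNodes.
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.writeUTF("Hello HDFS");
        }

        // Read the same file back.
        try (FSDataInputStream in = fs.open(file)) {
            System.out.println(in.readUTF());
        }

        fs.close();
    }
}
```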

HBase - It's a distributed, non-relational database. It can be used when we want real-time read/write access to our (Big) data.
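
Below is a minimal sketch of such a real-time write and read through the HBase Java client, assuming a running cluster and a hypothetical 'customers' table with an 'info' column family:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseExample {
    public static void main(String[] args) throws Exception {
        // Assumes an existing table 'customers' with column family 'info' (hypothetical names).
        try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = connection.getTable(TableName.valueOf("customers"))) {

            // Write: row key + column family:qualifier -> value.
            Put put = new Put(Bytes.toBytes("row-001"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Alice"));
            table.put(put);

            // Read the same row back immediately.
            Result result = table.get(new Get(Bytes.toBytes("row-001")));
            byte[] name = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
            System.out.println(Bytes.toString(name));
        }
    }
}
```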

MapReduce - It's the main processing model of Hadoop. 'Map' is the process where individual elements are broken down into key/value pairs. 'Reduce' is the process which takes the output of the 'Map' process as its input and combines that data into a smaller set of tuples.
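
To make the Map and Reduce steps concrete, here is the classic word-count job sketched with the Hadoop Java API; input and output paths come from the command line and are placeholders:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map: break each input line into (word, 1) key/value pairs.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce: combine all counts for the same word into a single total.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```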

YARN - It's a core part of Hadoop: a cluster resource management system.

Spark - It is a cluster computing system. It can run on Hadoop, standalone, or even in the cloud, and its data sources can include HBase and HDFS.
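
For comparison with the MapReduce job above, here is a hedged sketch of the same word count written against Spark's Java RDD API, assuming local mode and a placeholder input path:

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class SparkWordCount {
    public static void main(String[] args) {
        // "local[*]" runs Spark in-process; a real cluster would use a YARN or standalone master URL.
        SparkConf conf = new SparkConf().setAppName("SparkWordCount").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // Placeholder input path; could equally be a local file or an HDFS directory.
            JavaRDD<String> lines = sc.textFile("hdfs:///demo/hello.txt");

            // Same map/reduce logic as before, expressed as a chain of transformations.
            JavaPairRDD<String, Integer> counts = lines
                    .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
                    .mapToPair(word -> new Tuple2<>(word, 1))
                    .reduceByKey(Integer::sum);

            counts.collect().forEach(pair -> System.out.println(pair._1() + ": " + pair._2()));
        }
    }
}
```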

Hive - Hive provides data summarization, querying and analysis on top of Hadoop.
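
A minimal sketch of querying Hive from Java over JDBC, assuming a HiveServer2 endpoint at a placeholder host/port and a hypothetical 'sales' table:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryExample {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details; adjust the URL and credentials for a real cluster.
        String url = "jdbc:hive2://localhost:10000/default";
        try (Connection conn = DriverManager.getConnection(url, "hive", "");
             Statement stmt = conn.createStatement();
             // HiveQL looks like SQL, but Hive turns it into jobs that run over data stored in HDFS.
             ResultSet rs = stmt.executeQuery(
                     "SELECT item, SUM(amount) AS total FROM sales GROUP BY item")) {
            while (rs.next()) {
                System.out.println(rs.getString("item") + " -> " + rs.getLong("total"));
            }
        }
    }
}
```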

Kafka - This is an open-source stream-processing platform written in Java and Scala.
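
As a small illustration, here is a hedged sketch of a Kafka producer in Java, assuming a broker at a placeholder address and a hypothetical 'events' topic:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KafkaProducerExample {
    public static void main(String[] args) {
        // Placeholder broker address and topic name.
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // Each record is appended to the topic's log; consumers read the stream at their own pace.
            producer.send(new ProducerRecord<>("events", "order-1", "{\"item\":\"shirt\",\"qty\":2}"));
        }
    }
}
```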
_________________________________________________________



To Be Continued.....
(All of the above-mentioned methods/frameworks, and more, will be explained individually in upcoming blog posts)



