Introduction to Microsoft Big Data Solution – Microsoft HDInsight

Big Data is one of the hottest topics in data systems
nowadays. Many organizations are trying to find a way to start working with Big
Data, and there are many courses and conference sessions on the subject. Microsoft,
as a database and software vendor, has started to provide specific solutions for
Big Data. In this post you’ll learn about Big Data and some related terminology,
and get a high-level overview of Microsoft’s solution for Big Data.

Many people think that Big Data means any database larger
than 1TB, but this is not correct. A very simple definition of Big Data is:

Big Data is a collection of data sets with high volume, velocity
and variety, from which information can be extracted to support
decision making.

From the definition above, the three main dimensions of Big Data are
clear:

Volume: the size of the data

Variety: the different formats of the data

Velocity: how fast the data grows and how fast it has to be
processed

With the increasing number of database systems,
especially transactional systems, social networks, logging systems and many
other systems that produce a large number of transactions per unit of time, more
and more businesses face Big Data as time goes on.

The large volume and variety of the data, together with the concern
for velocity, make it harder and harder to work with Big Data in regular
relational database systems or even in data warehouses. So database vendors
started to think about methods and tools for dealing with Big Data in an
efficient way.

Microsoft also joined the Big Data vendors by introducing
Microsoft HDInsight.

What is HDInsight?

Microsoft HDInsight is powered by Hortonworks and Microsoft.
Hortonworks is a company that provides powerful Hadoop-based solutions for
Big Data. So HDInsight is a Hadoop-based solution for Microsoft Windows that
provides Big Data capabilities with Microsoft technologies.

What is Hadoop?

Hadoop is an open source Apache project for reliable, scalable,
distributed computing.

Hadoop provides distributed processing of large data sets
across clusters of computers using simple programming models.

The Hadoop ecosystem includes different components for working with
Big Data. Some of the main ones are listed below:

MapReduce

MapReduce is a programming model for processing large data
sets.

Hadoop’s MapReduce framework is for writing applications
that process large amounts of structured/semi-structured data in parallel across
large clusters.
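
To make the model more concrete, here is a minimal sketch of the classic
word count example in plain Python. It runs on a single machine and does not
use Hadoop at all; the function names (map_phase, shuffle, reduce_phase,
word_count) are my own for illustration, not part of any Hadoop API. In a real
cluster the map and reduce steps would run in parallel on many nodes and the
framework would handle the grouping in between.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key (done by the framework in Hadoop)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values collected for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    sample = ["big data is big", "data is everywhere"]
    print(word_count(sample))
```

The point of splitting the work this way is that each map call only needs one
line and each reduce call only needs one key, so both steps can be spread
across many machines.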

Pig

Pig provides a high-level scripting language (Pig Latin) whose
scripts are executed as MapReduce jobs.

Hive

Hive is a data warehouse system that enables querying and summarizing
large data sets through an SQL-like language (HiveQL), whose queries are run
as MapReduce jobs.
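
As a rough illustration of that idea, the Python sketch below shows how a
simple SQL-like aggregation (count and sum per group) can be thought of as a
map step followed by a reduce step. Everything here is an assumption for the
sake of the example: the page_views table, its columns and the function names
are hypothetical, and the way Hive actually plans and runs a query is more
involved than this.

```python
from collections import defaultdict

# Hypothetical rows of a table called page_views: (country, seconds_on_page).
PAGE_VIEWS = [
    ("NZ", 30),
    ("US", 12),
    ("NZ", 45),
    ("IR", 20),
    ("US", 8),
]

# The query being imitated, written in SQL-like form for reference:
#   SELECT country, COUNT(*), SUM(seconds_on_page)
#   FROM page_views
#   GROUP BY country


def map_row(row):
    """Map: use the GROUP BY column as the key and the measure as the value."""
    country, seconds = row
    return country, seconds


def reduce_group(country, seconds_list):
    """Reduce: compute the aggregates for all values that share one key."""
    return country, len(seconds_list), sum(seconds_list)


def run_query(rows):
    """Group the mapped pairs by key, then reduce each group; sorted for stable output."""
    grouped = defaultdict(list)
    for key, value in map(map_row, rows):
        grouped[key].append(value)
    return sorted(reduce_group(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    for country, views, total_seconds in run_query(PAGE_VIEWS):
        print(country, views, total_seconds)
```

The benefit of a tool like Hive is that you only write the query; the
map and reduce steps are generated for you.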

What Does Microsoft HDInsight Provide?

Microsoft HDInsight provides Apache Hadoop-based technology
for working with Big Data and for querying large data sets for information that
supports decision making.

Links for further reading:

More about Hadoop:

http://hortonworks.com/what-is-apache-hadoop/

Hortonworks partnership with Microsoft on HDInsight:

http://hortonworks.com/partners/microsoft/

Microsoft Big Data Solution web address:

http://www.microsoft.com/sqlserver/en/us/solutions-technologies/business-intelligence/big-data.aspx

Microsoft HDInsight PREVIEW installation:

http://www.microsoft.com/web/gallery/install.aspx?appid=HDINSIGHT-PREVIEW

In the next post I’ll explain how to install the
HDInsight Preview version and how to run some examples on it.

Reza Rad
Trainer, Consultant, Mentor
Reza Rad is a Microsoft Regional Director, an Author, Trainer, Speaker and Consultant. He has a BSc in Computer Engineering and more than 20 years’ experience in data analysis, BI, databases, programming, and development, mostly on Microsoft technologies. He has been a Microsoft Data Platform MVP for 12 consecutive years (from 2011 till now) for his dedication to Microsoft BI. Reza is an active blogger and co-founder of RADACAD. Reza is also co-founder and co-organizer of Difinity conference in New Zealand, Power BI Summit, and Data Insight Summit.
Reza is the author of more than 14 books on Microsoft Business Intelligence; most of these books are published under the Power BI category. Among these are books such as Power BI DAX Simplified, Pro Power BI Architecture, Power BI from Rookie to Rock Star, the Power Query books series, and Row-Level Security in Power BI.
He is an international speaker at Microsoft Ignite, Microsoft Business Applications Summit, Data Insight Summit, PASS Summit, SQL Saturday and SQL user groups. He is also a Microsoft Certified Trainer.
Reza’s passion is to help you find the best data solution; he is a data enthusiast.
His articles on different aspects of technologies, especially on MS BI, can be found on his blog: https://radacad.com/blog.
