Introduction to Microsoft Big Data Solution – Microsoft HDInsight

Posted on Jan 5, 2013 in Big Data, HDInsight

Big Data is one of the hottest topics in data systems
nowadays. Many organizations are looking for a way to start working with Big
Data, and there are many courses and conference sessions on the subject. Microsoft,
as a database and software vendor, has started to provide specific solutions for Big
Data. In this post you’ll learn about Big Data and some related terminology,
and get a high-level overview of the Microsoft solution for Big Data.

Many people think that any database larger than 1 TB is Big
Data, but this is not correct. A very simple definition of Big Data is:

Big Data is a collection of data sets with high volume, velocity
and variety, from which information can be extracted to support
decision making.

From the definition above, the three main dimensions of Big Data are
clear:

Volume: the size of the data

Variety: the different formats of the data

Velocity: how fast the data grows and how fast it has to be
processed

With the increasing number of database systems,
especially transactional systems, social networks, logging systems and many
other systems that produce a large number of transactions per unit of time, businesses
face Big Data more and more as time goes on.

The large volume of data, its variety and the concern
for velocity make it harder and harder to work with Big Data in regular
relational database systems or even in data warehouses. So database vendors
have started to think about methods and tools for dealing with Big Data in an
efficient way.

Microsoft has also joined the Big Data vendors by introducing
Microsoft HDInsight.

What is HDInsight?

Microsoft HDInsight is powered by Hortonworks and Microsoft.
Hortonworks is a company that provides Hadoop-based solutions for Big Data.
So HDInsight is a Hadoop-based solution for Microsoft Windows that provides
Big Data capabilities with Microsoft technologies.

What is Hadoop?

Hadoop is an open source Apache project for reliable, scalable,
distributed computing.

Hadoop provides distributed processing of large data sets
across clusters of computers using simple programming models.

The Hadoop project includes different components for working with
Big Data. Some of the main components of Apache Hadoop are listed below:

MapReduce

MapReduce is a programming model for processing large data
sets.

The MapReduce framework of Hadoop is for writing applications
that process large amounts of structured or semi-structured data in parallel across
large clusters.
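
To make the model more concrete, here is a minimal single-machine sketch of the classic word count written as map, shuffle and reduce steps. It is plain Python, not Hadoop code; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample input are made up for illustration. On a real cluster the framework would run the map and reduce steps in parallel on many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would do between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for files stored on a cluster.
    sample = ["big data is big", "data is everywhere"]
    print(word_count(sample))
```

The point of splitting the work this way is that each call to map_phase depends only on its own line, and each call to reduce_phase depends only on one key, so both steps can be spread across a cluster.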

Pig

Pig provides a high-level scripting language (Pig Latin)
whose scripts are executed as MapReduce jobs.

Hive

Hive is a data warehouse layer that lets you extract meaningful
information from large data sets through an SQL-like language (HiveQL), which
is run as MapReduce jobs.
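
As a rough illustration of the idea, the sketch below shows in plain Python what a simple grouped aggregate query computes. The table name page_views, its columns and the sample rows are hypothetical, and the code does not reflect how Hive plans or executes a query internally; it only shows the kind of grouping and summing that would otherwise be written by hand as a MapReduce job.

```python
from collections import defaultdict

# Hypothetical rows standing in for a table page_views(page, seconds).
ROWS = [
    ("home", 12),
    ("search", 30),
    ("home", 8),
    ("cart", 45),
    ("search", 10),
]


def total_seconds_by_page(rows):
    """Compute the equivalent of this HiveQL-style query:

        SELECT page, SUM(seconds) FROM page_views GROUP BY page;

    The GROUP BY column plays the role of the MapReduce key,
    and SUM plays the role of the reduce step.
    """
    totals = defaultdict(int)
    for page, seconds in rows:
        totals[page] += seconds
    return dict(totals)


if __name__ == "__main__":
    print(total_seconds_by_page(ROWS))
```

The query in the docstring is one line, while the hand-written version needs a loop and a dictionary. That gap is the reason an SQL-like layer is attractive for people who already know SQL.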

What Does Microsoft HDInsight Provide?

Microsoft HDInsight provides Apache Hadoop-based technology
for working with Big Data and for querying large data sets to get meaningful
data for decision making.

Links for further reading:

More about Hadoop:

http://hortonworks.com/what-is-apache-hadoop/

Hortonworks and Microsoft HDInsight:

http://hortonworks.com/partners/microsoft/

Microsoft Big Data Solution web address:

http://www.microsoft.com/sqlserver/en/us/solutions-technologies/business-intelligence/big-data.aspx

Microsoft HDInsight PREVIEW installation:

http://www.microsoft.com/web/gallery/install.aspx?appid=HDINSIGHT-PREVIEW

In the next post I’ll explain how to install the
HDInsight Preview version and how to run some examples on it.

Reza Rad
Reza Rad is an author, trainer, speaker and DW/BI consultant. He has a BSc in Computer Engineering and more than 15 years of experience in databases, programming and development, mostly on Microsoft technologies. He has been a Microsoft MVP in Data Platform for seven continuous years (from 2011 till now) for his dedication to Microsoft BI. He is the author of some SQL Server and BI books, and also of the online book Power BI from Rookie to Rock Star.
