Data Science Virtual Machine


Data Science Virtual Machine (DSVM) is a virtual machine on the Azure cloud that is customized for data science. DSVM comes with pre-installed and pre-configured tools that help users build AI applications, and it gives a data science team access to a consistent setup. This post gives a brief introduction to DSVM, shows how to set it up, and provides an overview of the installed tools.

Data Science Virtual Machine Overview

The Data Science Virtual Machine is a pre-installed and pre-configured environment of data science tools. DSVM can promote collaboration within a data science team. Having the required set of tools installed in the cloud reduces the need to maintain the software, along with the cost and time that maintenance takes.

DSVM is useful for trainers and educators who want to teach data science with a consistent setup; a pre-built environment is especially convenient for short-term training. DSVM is also useful for learning and comparing different machine learning tools such as Microsoft ML Server, SQL Server, Visual Studio, Jupyter, deep learning tools, and so forth. DSVM includes many different data science and deep learning tools:

·         Microsoft R Open with popular packages pre-installed

·         Microsoft ML Server (R, Python) Developer

·         Microsoft Office Pro-Plus

·         Anaconda Python 2.7, 3.5

·         Relational Database

·         Database Tools

·         Jupyter Notebook Server with R, Python, Julia, PySpark, Sparkmagic, and SparkR

·         Development tools such as Visual Studio 2017, RStudio Desktop and Server, PyCharm Community, and so forth

·         Power BI Desktop

·         Data Movement and Management tools such as Azure Storage Explorer and Microsoft Data Management Gateway

·         Machine Learning tools such as Azure Machine Learning, Weka, Rattle, and H2O

·         Deep learning tools such as Microsoft Cognitive Toolkit, TensorFlow, and Keras

·         Big Data Platforms such as Spark and Hadoop [1]


Set Up the Data Science Virtual Machine (DSVM)

To set up the DSVM, first log into the Azure portal, choose the AI and Machine Learning category, and then click Data Science Virtual Machine Windows 2016.


The next step is to configure the basics and choose the proper size. For data science work, the virtual machine should have at least four CPU cores and 14 GB or more of RAM [2]. Required fields are marked with a star.
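The sizing guideline above (at least four cores and 14 GB of RAM) can be expressed as a simple filter over candidate VM sizes. The following is a minimal sketch, not an Azure API: the `VmSize` type and `meets_dsvm_minimum` function are hypothetical helpers, and the size list is illustrative only, so confirm current specifications in the Azure portal before choosing.

```python
from dataclasses import dataclass

# Minimum recommended for data science work, per the guideline above:
# at least four CPU cores and 14 GB of RAM.
MIN_CORES = 4
MIN_RAM_GB = 14


@dataclass(frozen=True)
class VmSize:
    """Hypothetical description of a VM size: name, vCPU count, memory in GB."""
    name: str
    vcpus: int
    ram_gb: float


def meets_dsvm_minimum(size: VmSize) -> bool:
    """Return True if the size has enough cores AND enough memory."""
    return size.vcpus >= MIN_CORES and size.ram_gb >= MIN_RAM_GB


# Illustrative sizes only; check the portal for current specifications.
CANDIDATES = [
    VmSize("Standard_DS2_v2", 2, 7),
    VmSize("Standard_DS3_v2", 4, 14),
    VmSize("Standard_D4s_v3", 4, 16),
]

if __name__ == "__main__":
    for size in CANDIDATES:
        verdict = "OK" if meets_dsvm_minimum(size) else "too small"
        print(f"{size.name}: {size.vcpus} vCPUs, {size.ram_gb} GB RAM -> {verdict}")
```

Both conditions must hold: a size with many cores but little memory (or the reverse) is still rejected.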



The next step is to configure the settings, such as the location and names. In the last step, a summary of the DSVM is shown. Deploying the DSVM takes about five minutes. After the resource is created, click Go to resource, and an overview of the DSVM is shown.

As you can see in the figure below, the overview shows the location of the DSVM, the size of the VM, and so forth.


To connect to the DSVM, click Connect at the top of the window. You can then download the corresponding Remote Desktop Protocol (RDP) file from Azure.
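An RDP file is a plain-text list of `name:type:value` settings that Remote Desktop reads when you open it. The sketch below builds a minimal one, assuming you know the VM's public IP address and the admin user name chosen during setup. The host, user name, and helper functions are hypothetical, and the file Azure generates may contain additional settings, so treat this only as an illustration of the format.

```python
from pathlib import Path


def build_rdp_content(host: str, username: str, port: int = 3389) -> str:
    """Build a minimal .rdp file body pointing at the given VM."""
    lines = [
        f"full address:s:{host}:{port}",  # VM public IP or DNS name plus RDP port
        f"username:s:{username}",         # admin account chosen during setup
        "prompt for credentials:i:1",     # ask for the password when connecting
    ]
    # RDP files conventionally use Windows line endings.
    return "\r\n".join(lines) + "\r\n"


def write_rdp_file(path: Path, host: str, username: str) -> Path:
    """Write the settings as bytes so the CRLF endings are kept as-is."""
    path.write_bytes(build_rdp_content(host, username).encode("utf-8"))
    return path


if __name__ == "__main__":
    # Hypothetical values; use your own VM's public IP and admin name.
    print(build_rdp_content("203.0.113.10", "dsvmadmin"))
```

Calling `write_rdp_file(Path("my-dsvm.rdp"), ...)` would save a file that can be double-clicked on Windows to open a Remote Desktop session.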


After downloading the RDP file, open it to connect to the virtual machine and check the environment.
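One quick way to check the Python side of the environment is to confirm that the expected packages can be imported. The sketch below uses the standard library's `importlib.util.find_spec`, which returns `None` when a top-level module cannot be found. The package list is an assumption based on the tool list earlier in this post; the DSVM ships several Python environments, so run the script in the one you intend to use and adjust the names to your image.

```python
import importlib.util

# Assumed import names for tools mentioned in this post; adjust as needed.
EXPECTED_PACKAGES = ["numpy", "pandas", "sklearn", "tensorflow", "keras", "cntk"]


def check_packages(names):
    """Map each top-level module name to True if it is importable here."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages(EXPECTED_PACKAGES).items():
        print(f"{name:12} {'installed' if found else 'missing'}")
```

`find_spec` only locates the module without importing it, so the check stays fast even for heavy packages such as TensorFlow.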

In DSVM, Visual Studio is pre-installed with the R and Python tools (RTVS and PTVS). As you can see in the figure below, you can check for recent updates via Visual Studio [2].

RStudio is also installed in DSVM. Any required packages need to be installed in RStudio so that all users can access them.



Leila Etaati
Trainer, Consultant, Mentor
Leila is the first Microsoft AI MVP in New Zealand and Australia. She has a Ph.D. in Information Systems from the University of Auckland. She is the co-director and data scientist at RADACAD, a company with more than 100 clients around the world. She is the co-organizer of the Microsoft Business Intelligence and Power BI User Group (meetup) in Auckland, with more than 1,200 members, and the co-organizer of three main conferences in Auckland: SQL Saturday Auckland (2015 till now) with more than 400 registrations, Difinity (2017 till now) with more than 200 registrations, and Global AI Bootcamp 2018. She is a data scientist, BI consultant, trainer, and speaker, and a well-known international speaker at many conferences such as Microsoft Ignite, SQL PASS, Data Platform Summit, SQL Saturday, and Power BI World Tour in Europe, the USA, Asia, Australia, and New Zealand. She has over ten years' experience working with databases and software systems and has been involved in many large-scale projects for large companies. She is also an AI and Data Platform Microsoft MVP. Leila is an active technical Microsoft AI blogger for RADACAD.
