Data Science Virtual Machine



Data Science Virtual Machine (DSVM) is a virtual machine on the Azure cloud that is customized for doing data science. It comes with pre-installed and pre-configured tools that help users build AI applications, and it gives a data science team access to a consistent setup. This post gives a brief introduction to the DSVM, shows how to set it up, and provides an overview of the installed tools.

Data Science Virtual Machine Overview

The Data Science Virtual Machine comes with its tools pre-installed and pre-configured. Because everyone works from the same setup, the DSVM can promote collaboration within a data science team. Installing the required tools in the cloud also reduces the need to maintain the software, along with the cost and time that maintenance takes.

The DSVM can be useful for trainers and educators who want to teach data science with a consistent setup. Having a pre-built environment is good for short-term training. The DSVM is also useful for learning and comparing different machine learning tools such as Microsoft ML Server, SQL Server, Visual Studio, Jupyter, deep learning tools, and so forth. The DSVM includes many data science and deep learning tools:

·         Microsoft R Open with popular packages pre-installed

·         Microsoft ML Server (R, Python) Developer

·         Microsoft Office Pro-Plus

·         Anaconda Python 2.7, 3.5

·         Relational Database

·         Database Tools

·         Jupyter Notebook Server with R, Python, Julia, PySpark, Sparkmagic, and SparkR

·         Development tools such as Visual Studio 2017, RStudio Desktop and Server, PyCharm Community, and so forth

·         Power BI Desktop

·         Data Movement and Management tools such as Azure Storage Explorer and Microsoft Data Management Gateway

·         Machine Learning tools such as Azure Machine Learning, Weka, Rattle, and H2O

·         Deep learning tools such as Microsoft Cognitive Toolkit, TensorFlow, and Keras

·         Big Data Platforms such as Spark and Hadoop [1]


Set Up the Data Science Virtual Machine (DSVM)

To set up the DSVM, first log into the Azure portal. Then choose the AI and Machine Learning tools and click on Data Science Virtual Machine Windows 2016.


The next step is to set the size and choose the proper configuration. For data science work, the virtual machine should have more than four cores and 14 GB or more of RAM [2]. A star appears beside the required service.
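The sizing rule above can be written as a small filter. The following is only a sketch: the `VmSize` class, the `suitable_sizes` function, and the short candidate list are illustrative assumptions rather than anything provided by Azure, and the thresholds follow the text literally (more than four cores, 14 GB or more of RAM). Check the portal for the sizes actually offered in your region.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmSize:
    """Hypothetical record describing one VM size option."""
    name: str
    cores: int
    ram_gb: int


# Illustrative candidates only; the portal shows the sizes available to you.
CANDIDATE_SIZES = [
    VmSize("Standard_DS2_v2", cores=2, ram_gb=7),
    VmSize("Standard_DS3_v2", cores=4, ram_gb=14),
    VmSize("Standard_DS4_v2", cores=8, ram_gb=28),
    VmSize("Standard_DS5_v2", cores=16, ram_gb=56),
]


def suitable_sizes(sizes, more_than_cores=4, min_ram_gb=14):
    """Keep sizes with more than `more_than_cores` cores and at least `min_ram_gb` GB of RAM."""
    return [s for s in sizes if s.cores > more_than_cores and s.ram_gb >= min_ram_gb]


for size in suitable_sizes(CANDIDATE_SIZES):
    print(size.name, size.cores, "cores,", size.ram_gb, "GB RAM")
```

If you read the guideline as "four cores or more", pass `more_than_cores=3` so that four-core sizes are kept as well.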



The next step is to configure the settings, which identify the location and names. In the last step, a summary of the DSVM is shown. Deploying the DSVM takes about five minutes. After the resource has been created, click on Go to resource. An overview of the DSVM is then shown.

The figure below shows this overview, including the location of the DSVM, the size of the VM, and so forth.


To connect to the DSVM, click on the Connect option at the top of the window. The related Remote Desktop Protocol (RDP) file can then be downloaded from Azure.
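An RDP file is a plain-text list of settings, one per line, in the form `name:type:value`. If you want to confirm which address the downloaded file points to before opening it, a minimal sketch such as the one below can help. The function name `parse_rdp_settings` and the sample text are assumptions made for illustration; the address shown is a documentation placeholder, not a real DSVM.

```python
def parse_rdp_settings(text):
    """Parse RDP settings of the form name:type:value into a dictionary."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        # Split on the first two colons only, because the value may contain
        # colons itself (for example host:port).
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue  # skip lines that do not follow the expected layout
        name, kind, value = parts
        if kind == "i" and value.lstrip("-").isdigit():
            settings[name] = int(value)  # "i" marks an integer setting
        else:
            settings[name] = value       # "s" marks a string setting
    return settings


# Sample content with a placeholder address.
SAMPLE_RDP_TEXT = """full address:s:203.0.113.10:3389
prompt for credentials:i:1
"""

rdp_settings = parse_rdp_settings(SAMPLE_RDP_TEXT)
print("Connects to:", rdp_settings["full address"])
```

To use it on a real file, read the file's text first and pass it to `parse_rdp_settings`. The file's encoding may vary, so that step is left out here.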


After downloading the RDP file, open it to connect to the virtual machine and check the environment.

In the DSVM, Visual Studio comes pre-installed with the R and Python tools (RTVS and PTVS). As you can see in the figure below, you can check for recent updates via Visual Studio [2].

RStudio is also installed on the DSVM. Any required packages need to be installed in RStudio, after which all users are able to access them.
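On the Python side, a quick way to check the environment is to ask the interpreter which libraries it can find. This is a minimal sketch, assuming it is run with the Python environment you intend to use on the DSVM. The function name `available_modules` is made up for this example, and the module names are the usual import names of tools from the list earlier in this post (TensorFlow, Keras, Microsoft Cognitive Toolkit, and H2O). Whether each one is present depends on the image and the active environment.

```python
import importlib.util
import sys


def available_modules(names):
    """Return a dict mapping each top-level module name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Import names for some of the tools listed earlier in this post.
MODULES_TO_CHECK = ["tensorflow", "keras", "cntk", "h2o"]

print("Python version:", sys.version.split()[0])
for module_name, found in available_modules(MODULES_TO_CHECK).items():
    print(module_name, "is available" if found else "was not found")
```

The check only looks the modules up and does not import them, so it runs quickly even for large deep learning libraries.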



Leila Etaati
Dr. Leila Etaati is a Principal Data Scientist, BI Consultant, and Speaker. She has over 10 years of experience working with databases and software systems, and she has been involved in many large-scale projects for big companies. Leila holds a PhD from the Information Systems department at the University of Auckland, as well as an MS and BS in computer science. Leila is a Microsoft Data Platform MVP.
