Automating machine learning with SQL Server 2019
By Leila Etaati
Machine learning is a powerful tool for making predictions based on data. With a plethora of models and approaches available to choose from, simply knowing where to begin can be a project in itself. Data scientists can spend a significant amount of time configuring, running, testing, and comparing different machine learning methodologies, before ever generating an actual prediction.
Analyzing different algorithms and parameter settings in this way can be time-consuming.
To simplify this process, we DBAs can suggest automated machine learning in SQL Server 2019. This approach runs many different models on a dataset, compares their results against held-back data, and selects the model that is determined to have the best fit.
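The core loop behind automated model selection can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how SQL Server 2019 or any particular product implements it: the synthetic dataset, the three toy candidate models, and every function name (`make_data`, `fit_mean`, `fit_linear`, `fit_knn`, `auto_select`) are assumptions made up for this example.

```python
import random
from statistics import mean

def make_data(n=200, seed=0):
    """Synthetic data (illustrative assumption): y = 3x + 2 plus small noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [3 * x + 2 + rng.gauss(0, 0.5) for x in xs]
    return xs, ys

def fit_mean(xs, ys):
    """Baseline candidate: always predict the training mean."""
    m = mean(ys)
    return lambda x: m

def fit_linear(xs, ys):
    """Candidate: one-variable least-squares line."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def fit_knn(xs, ys, k=5):
    """Candidate: average the targets of the k nearest training points."""
    pairs = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return mean(y for _, y in nearest)
    return predict

def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given points."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))

def auto_select(xs, ys, holdout_fraction=0.25):
    """Fit every candidate on the training part, score each on the held-back part."""
    split = int(len(xs) * (1 - holdout_fraction))
    train_x, train_y = xs[:split], ys[:split]
    test_x, test_y = xs[split:], ys[split:]
    candidates = {"mean": fit_mean, "linear": fit_linear, "knn": fit_knn}
    scores = {
        name: mse(fit(train_x, train_y), test_x, test_y)
        for name, fit in candidates.items()
    }
    best = min(scores, key=scores.get)  # lowest held-back error wins
    return best, scores

xs, ys = make_data()
best_name, holdout_scores = auto_select(xs, ys)
print(best_name, holdout_scores)
```

Real automated machine learning tools search far larger spaces of models and hyperparameters, but the shape is the same: train each candidate, score it on data it has not seen, and keep the winner.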
One way to do this is with SQL Server 2019 Big Data Clusters. This technology uses Kubernetes containers to run and coordinate scalable clusters of SQL Server, Apache Spark, and HDFS instances. For the purposes of our discussion, what’s great about this is that the Apache Spark big data framework is built in, along with the MLlib machine learning library. Many machine learning automation APIs are accessible through Apache Spark and Big Data Clusters. A recent Microsoft blog post [URL: https://cloudblogs.microsoft.com/sqlserver/2019/01/09/how-to-automate-machine-learning-on-sql-server-2019-big-data-clusters/] focuses on the open-source H2O AutoML software, which can run your data through deep neural nets, generalized linear models, stacked ensembles, and more.
Another possibility is to invoke Azure Machine Learning directly from SQL Server. If your data is already in SQL Server tables and you have a version of SQL Server that includes SQL Server Machine Learning Services, this high-performance option is easy to access. Azure Machine Learning has highly advanced automation [URL: https://docs.microsoft.com/azure/machine-learning/service/concept-automated-ml] built in, which can train and tune your model for you against the target metrics you specify. Once you get the model back, you can explore how relevant specific data features are to its performance.
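One generic way to explore how much a trained model relies on each feature is permutation importance: shuffle one feature column at a time and measure how much the error grows. The sketch below shows that technique in plain Python. It is not the Azure Machine Learning API, and the synthetic rows, the stand-in `model` function, and the helper names are all hypothetical, chosen only to make the idea concrete.

```python
import random
from statistics import mean

def make_rows(n=300, seed=1):
    """Synthetic rows (assumption): the target depends on feature 0 only; feature 1 is noise."""
    rng = random.Random(seed)
    rows = [(rng.uniform(0, 1), rng.uniform(0, 1)) for _ in range(n)]
    targets = [5 * a + rng.gauss(0, 0.1) for a, _ in rows]
    return rows, targets

def model(row):
    """Hypothetical stand-in for a trained model handed back by an automated ML run."""
    return 5 * row[0]

def mse(predict, rows, targets):
    """Mean squared error of the predictions over all rows."""
    return mean((predict(r) - t) ** 2 for r, t in zip(rows, targets))

def permutation_importance(predict, rows, targets, n_features, seed=0):
    """For each feature, shuffle its column and report the increase in error."""
    rng = random.Random(seed)
    baseline = mse(predict, rows, targets)
    importances = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        rng.shuffle(column)  # break the link between feature j and the target
        shuffled = [row[:j] + (v,) + row[j + 1:] for row, v in zip(rows, column)]
        importances.append(mse(predict, shuffled, targets) - baseline)
    return importances

rows, targets = make_rows()
importances = permutation_importance(model, rows, targets, n_features=2)
print(importances)
```

A feature whose shuffling barely changes the error contributes little to the model's predictions, while a large increase suggests the model leans on it heavily.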
Whichever approach you use, automated machine learning makes it much easier to identify an appropriate way to solve a given problem. It lowers the barrier to entry, empowering the average DBA to make a meaningful contribution to modern intelligent analytics. I highly recommend you check out these capabilities and get familiar with how to use them.
To learn more about what you can do with SQL Server 2019, check out the free Packt guide Introducing Microsoft SQL Server 2019 [https://info.microsoft.com/ww-landing-introducing-sql-server-2019-content.html]. If you’re ready to jump to a fully managed cloud solution, check out The Essential Guide to Data in the Cloud [https://azure.microsoft.com/resources/essential-guide-to-data-in-the-cloud/].