PASS Summit Day 2 started with a keynote by Dr. Dewitt from MIT. I am personally so excited to hear this keynote. I’m sitting at the bloggers’ table again and will write highlights of today’s keynote here in this post. Please refresh this post to get updated information.
8:18 – Grant Fritchey stepped up on stage
8:21 – Over 36K members of PASS Virtual Chapters
8:23 – Denise McInerney stepped up on stage
8:26 – New Logo for SQL PASS
8:27 – Demo of the new PASS website shown; it will launch next year (SOON!)
8:29 – Summit 2017 Oct 31 to Nov 3 announced and live now
8:30 – Dr. Dewitt from MIT stepped up on stage to talk about Data Warehousing in the Cloud
8:32 – Why DW in Cloud?
8:34 – Why? Reduce Time to insights
8:35 – Why? Dynamically adjust capacity
8:36 – Scalable DW Fundamentals:
- Alternative architectures
- Partitioned tables
- The basis for scalable execution
- Partitioned Parallelism
- Software building blocks for scalable database systems
- Handling hardware failures
8:38 – Two alternative scalable DW designs
- Shared-Nothing
- Microsoft APS, Teradata, Netezza
- Shared-Storage
- Microsoft SQL DW, Snowflake, Databricks…
8:40 – Shared-Nothing architecture diagram
8:41 – Shared-Storage architecture
8:43 – Partitioned Tables
8:46 – Round-Robin Partitioning
8:47 – Hash (Key) Partitioning
8:48 – Table Replication
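To make the three table layouts above a bit more concrete, here is a minimal Python sketch of round-robin partitioning, hash partitioning, and replication. It is my own illustration and not taken from the slides: the function names, the `orders` sample rows, and the use of `zlib.crc32` as the hash function are all assumptions, and real systems distribute rows across physical nodes rather than Python lists.

```python
import zlib


def round_robin_partition(rows, num_nodes):
    """Deal rows out to the nodes in turn, like dealing cards."""
    partitions = [[] for _ in range(num_nodes)]
    for i, row in enumerate(rows):
        partitions[i % num_nodes].append(row)
    return partitions


def hash_partition(rows, key, num_nodes):
    """Send each row to the node selected by a hash of its key column."""
    partitions = [[] for _ in range(num_nodes)]
    for row in rows:
        # crc32 (an assumption for this sketch) is stable across runs,
        # unlike Python's built-in hash() on strings.
        node = zlib.crc32(str(row[key]).encode("utf-8")) % num_nodes
        partitions[node].append(row)
    return partitions


def replicate(rows, num_nodes):
    """Give every node its own full copy of the table."""
    return [list(rows) for _ in range(num_nodes)]


# Hypothetical sample data: six orders spread over three customers.
orders = [{"order_id": i, "cust_id": i % 3} for i in range(6)]

rr = round_robin_partition(orders, 2)      # even spread, no key locality
hp = hash_partition(orders, "cust_id", 2)  # same cust_id always on one node
rep = replicate(orders, 2)                 # full copy everywhere
```

Round-robin spreads rows evenly but says nothing about where a given key lives, while hash partitioning guarantees that all rows with the same key land on the same node.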
8:50 – Partitioned Parallelism
Used to parallelize the execution of relational operators (selects, joins, aggregates,…)
By both shared-storage and shared-nothing systems
Pipelining is used between operators to avoid unnecessary disk I/Os.
An example:
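The slide itself is not reproduced in this post. As a rough stand-in, here is a small Python sketch of the idea, assuming a filter followed by a SUM: each partition runs the same scan, select, and partial-aggregate chain, and a final step combines the partial results. Generators stand in for pipelining, since rows flow from one operator to the next without being materialized. The operator names and sample data are hypothetical, and the partitions are processed sequentially here where a real system would run them on separate nodes.

```python
def scan(partition):
    """Scan operator: yield rows from one partition, one at a time."""
    for row in partition:
        yield row


def select(rows, predicate):
    """Select operator: pass through only the rows matching the predicate."""
    for row in rows:
        if predicate(row):
            yield row


def partial_sum(rows, column):
    """Local aggregate: consume the pipeline and sum one column."""
    return sum(row[column] for row in rows)


def parallel_sum(partitions, predicate, column):
    """Run the same operator pipeline on every partition, then combine."""
    # Each list element below would be computed on a different node.
    partials = [
        partial_sum(select(scan(p), predicate), column) for p in partitions
    ]
    return sum(partials)


# Hypothetical sales table, already split into two partitions.
partitions = [
    [{"region": "east", "amount": 10}, {"region": "west", "amount": 5}],
    [{"region": "east", "amount": 7}, {"region": "east", "amount": 3}],
]

total = parallel_sum(partitions, lambda r: r["region"] == "east", "amount")
```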
8:53 – Turning to joins; this was the hardest part in the early days of parallel data processing
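I did not capture which join strategies were shown on the slides. One commonly described approach, which I am using here purely as an illustration, is to repartition both inputs on the join key so that matching rows end up on the same node, and then run an ordinary hash join locally on each node. The sketch below assumes that approach; the function names, the `crc32` hash, and the sample tables are all hypothetical.

```python
import zlib
from collections import defaultdict


def node_for(value, num_nodes):
    """Pick a node for a join-key value using a stable hash (an assumption)."""
    return zlib.crc32(str(value).encode("utf-8")) % num_nodes


def repartition(rows, key, num_nodes):
    """Shuffle rows so that equal key values land on the same node."""
    buckets = [[] for _ in range(num_nodes)]
    for row in rows:
        buckets[node_for(row[key], num_nodes)].append(row)
    return buckets


def local_hash_join(left, right, key):
    """Plain in-memory hash join: build on left, probe with right."""
    table = defaultdict(list)
    for l_row in left:
        table[l_row[key]].append(l_row)
    return [
        {**l_row, **r_row}
        for r_row in right
        for l_row in table.get(r_row[key], [])
    ]


def parallel_join(left, right, key, num_nodes):
    """Repartition both inputs on the key, then join node by node."""
    left_parts = repartition(left, key, num_nodes)
    right_parts = repartition(right, key, num_nodes)
    joined = []
    for l_part, r_part in zip(left_parts, right_parts):
        joined.extend(local_hash_join(l_part, r_part, key))
    return joined


customers = [{"cust_id": 1, "name": "Ann"}, {"cust_id": 2, "name": "Bob"}]
orders = [
    {"cust_id": 1, "order_id": 100},
    {"cust_id": 2, "order_id": 101},
    {"cust_id": 1, "order_id": 102},
    {"cust_id": 3, "order_id": 103},  # no matching customer
]

joined = parallel_join(customers, orders, "cust_id", 3)
```

If both tables are already hash partitioned on the join key, the shuffle step is unnecessary, which is one reason the choice of partitioning key matters.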
9:05 – Node Failures with Shared Storage
9:07 – A look at competitors: Amazon Redshift, Snowflake, Microsoft SQL DW
9:07 – Redshift: classic shared-nothing design
9:09 – Within a slice, columns are stored in 1MB blocks. Min and max value of each block retained in a “zone” map. Rich collection of compression options (RLE, Dictionary…)
Two sort options: Compound sort key, and “interleaved” sort key
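To show why keeping the min and max of each block is useful, here is a minimal Python sketch of a zone map used for block skipping. It is an illustration under my own assumptions, not Redshift's implementation: blocks are tiny Python lists instead of 1MB column blocks, and the function names are hypothetical.

```python
def build_zone_map(values, block_size):
    """Split a column into blocks and record (min, max) for each block."""
    blocks = [
        values[i:i + block_size] for i in range(0, len(values), block_size)
    ]
    zone_map = [(min(b), max(b)) for b in blocks]
    return blocks, zone_map


def scan_range(blocks, zone_map, low, high):
    """Return values in [low, high], reading only blocks that may match."""
    hits = []
    blocks_read = 0
    for block, (block_min, block_max) in zip(blocks, zone_map):
        # Skip the block if its value range cannot overlap the predicate.
        if block_max < low or block_min > high:
            continue
        blocks_read += 1
        hits.extend(v for v in block if low <= v <= high)
    return hits, blocks_read


# Hypothetical sorted column of 100 values in blocks of 10.
values = list(range(100))
blocks, zone_map = build_zone_map(values, 10)

hits, blocks_read = scan_range(blocks, zone_map, 25, 34)
```

With the column sorted, only the two blocks that overlap the range are read and the other eight are skipped, which is why the sort key choice matters so much for this technique. On unsorted data each block's min and max tend to span most of the value range, so far fewer blocks can be skipped.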
9:13 – Handling node failures in Redshift
Redshift summary
9:14 – Second system in the comparison: Snowflake, a shared-storage design
Compute decoupled from storage
Highly elastic
Leverages AWS
9:18 – Table Storage in Snowflake
9:20 – Virtual Warehouses
9:26 – Snowflake Summary
9:27 – Microsoft SQL DW
DWU Performance Metric
9:32 – Scaling up in SQL DW
9:32 – SQL DW Summary
9:37 – Wrap up of comparison
9:38 – Azure SQL DW is by far the best query engine on the planet!
9:41 – Dr. Dewitt thanks everyone.
Thank you, Dr. Dewitt. This was the best session of my life. Now I have to take a bit of time off to digest part of this awesome presentation 🙂