Wednesday, April 1, 2015

Moore’s Law, Cloud Computing, and DW/BI

Computer processing power and data storage capacities continue to increase, while costs per calculation and per megabyte of data storage continue to fall. In recent years, a number of massive computing platforms such as Amazon Web Services (AWS), Microsoft Azure, and others have become available to the public. Together, these two trends create significant opportunities for change in data warehousing (DW) and business intelligence (BI) practices.

In this post I discuss three recent changes in the world of DW and BI that can be linked to Moore’s Law, cloud computing, or both. I also discuss what led to these changes and how they may change the world of DW/BI, for better or worse.

But first, let's look at Moore's law and cloud computing.
  1. "The observation made in 1965 by Gordon Moore, co-founder of Intel, that the number of transistors per square inch on integrated circuits had doubled every year since the integrated circuit was invented. Moore predicted that this trend would continue for the foreseeable future." - Wikipedia
As shown in the picture above, the expansion of in-memory databases depends heavily on the evolution of the price of memory. It’s hard to argue against price predictions or Moore’s law. But accidents, even if rare, are still possible. Any significant change in the trend of memory costs, or in other hardware market conditions (e.g. an unpredicted decrease in the price of SSDs), could give Teradata and Pivotal the extra time and conditions to break into advanced hybrid storage solutions that would offer slightly slower but also less expensive products than their competitors’ in-memory databases.

For many years, Moore's law was a good predictor of the number of transistors in an integrated circuit, and it is often believed to have driven growth in many areas related to computers and integrated circuits. In the case of DW/BI, I believe that this multiplication of transistors in an integrated circuit has resulted in three very important changes:
1. Powerful Processors
2. Faster Memory
3. Smaller Circuits

Taken together, these changes mean that computer processing and data storage capabilities have also multiplied significantly, though not necessarily at the rate of Moore's law. Information systems are now capable not only of storing more information but also of processing it faster. However, the amount of data is also increasing, at a pace much faster than Moore's law, so the gains in computational power and storage space may at best keep pace with data growth unless we innovate.
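The arithmetic behind that concern can be sketched in a few lines of Python. This is only an illustration: the doubling periods used below (capacity doubling every two years, data doubling every year) are hypothetical assumptions chosen to show the effect, not measured figures.

```python
def growth_factor(years, doubling_period_years):
    """Relative size after `years` for something that doubles every `doubling_period_years`."""
    return 2.0 ** (years / doubling_period_years)


def shortfall(years, capacity_doubling_years, data_doubling_years):
    """How many times larger the data has grown than the capacity over `years`."""
    data = growth_factor(years, data_doubling_years)
    capacity = growth_factor(years, capacity_doubling_years)
    return data / capacity


if __name__ == "__main__":
    # Hypothetical assumption: capacity doubles every 2 years, data every 1 year.
    for years in (2, 4, 6, 8, 10):
        print(years, "years: data outgrows capacity by a factor of",
              shortfall(years, capacity_doubling_years=2, data_doubling_years=1))
```

Under these assumed rates the gap itself grows exponentially, which is why hardware improvements alone may not be enough and why approaches such as compression and on-demand capacity matter.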

Some of the innovations that have revolutionized the DW/BI industry are technologies like Hadoop and Cloud Computing.

Cloud computing is a type of computing in which a large group of remote servers is networked to allow centralized data storage and online access to computer services or resources. In short, it provides computing over the internet instead of the traditional PC model. Usually these services are provided by a vendor, and companies use the cloud infrastructure on a pay-per-use basis, which allows them to scale up or down quickly.

There are many forms of cloud computing, such as hosted web applications, clustering, and terminal services, but the most important form is virtualization.

Virtualization, in computing, refers to the act of creating a virtual (rather than actual) version of something, including but not limited to a virtual computer hardware platform, operating system (OS), storage device, or computer network resources.

Virtualization allows users to increase hardware resources on the fly and use the additional storage space and computing power that come with them. This means that a DW/BI system can now accommodate data as needed, instead of relying on an infrastructure team to set up a new server whenever demand changes.

Finally, let's look at the recent changes in the world of DW and BI that can be linked to Moore’s Law, cloud computing, or both:

1. In-memory technologies supercharge performance. The emergence of in-memory database architecture brings race car-like performance to data warehouses. The term "in-memory" is highly descriptive, of course. It refers to the ability to process large data sets in system RAM, accelerating number-crunching and reporting of actionable information.

2. Data compression enables higher-volume, higher-value analytics. The best way to counter non-stop data expansion is, unsurprisingly, data compression. Your organization’s data may be growing at 10X, but advanced compression methods, such as Oracle’s Hybrid Columnar Compression, can match that. Using compression, companies can capture and store more valuable data, and they can do it without 10X the cost and 10X the pain.

3. On-demand analytics environments meet the growing demand for rapid prototyping and information discovery. If you’re familiar with cloud computing’s software-as-a-service model, then you’ll appreciate the concept of “analytics as a service.” Technical breakthroughs such as Oracle Database 12c’s pluggable database feature make it easy for administrators to provide “sandboxes” in a data warehouse environment for use in support of new analytics projects.
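The first two ideas in this list can be illustrated with a small Python sketch. It is a toy under stated assumptions, not how any vendor's product works: SQLite's in-memory mode stands in for an in-memory database, simple run-length encoding stands in for columnar compression (it is not Oracle's Hybrid Columnar Compression), and the `sales` table and its columns are hypothetical.

```python
import sqlite3
from itertools import groupby


def in_memory_totals(rows):
    """Aggregate (region, amount) rows in a database that lives entirely in RAM."""
    conn = sqlite3.connect(":memory:")  # nothing is written to disk
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
        )
        return cursor.fetchall()
    finally:
        conn.close()


def rle_encode(column):
    """Run-length encode one column: consecutive repeats become (value, count) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def rle_decode(pairs):
    """Expand (value, count) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


if __name__ == "__main__":
    rows = [("East", 10.0), ("East", 5.0), ("West", 7.5)]
    print(in_memory_totals(rows))

    # A column sorted by region has long runs of repeated values.
    region_column = ["East"] * 1000 + ["West"] * 1000
    encoded = rle_encode(region_column)
    print(len(region_column), "values stored as", len(encoded), "pairs")
```

Storing values column by column tends to place similar values next to each other, which is what makes simple schemes like this effective. Real warehouse compression is considerably more sophisticated.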

Citations:
1. Wikipedia
2. http://nosql.mypopescu.com/post/73401837113/aster-data-hawq-gpdb-and-the-first-hadoop
3. http://www.forbes.com/sites/oracle/2014/03/10/the-top-10-trends-in-data-warehousing/