Wednesday, April 1, 2015

Moore’s Law, Cloud Computing, and DW/BI

Computer processing power and data storage capacities continue to increase, while the costs per calculation and per megabyte of storage continue to fall. In recent years, a number of massive computing platforms such as Amazon Web Services (AWS) and Microsoft Azure have become available to the public. Together, these two changes create significant opportunities for change in data warehousing (DW) and business intelligence (BI) practices.

In this blog post I will discuss three recent changes in the world of DW and BI that can be linked to Moore’s Law and cloud computing. I will also discuss what led to these changes and how they may change the world of DW/BI, for better or worse.

But first, let's understand Moore's Law and cloud computing.
  1. "The observation made in 1965 by Gordon Moore, co-founder of Intel, that the number of transistors per square inch on integrated circuits had doubled every year since the integrated circuit was invented. Moore predicted that this trend would continue for the foreseeable future." - Wikipedia
As shown in the picture above, the expansion of in-memory databases depends heavily on the evolution of the price of memory. It’s hard to argue against price predictions or Moore’s Law. But accidents, even if rare, are still possible. Any significant change in the trend of memory costs or other hardware market conditions (e.g. an unpredicted drop in the price of SSDs) could give Teradata and Pivotal the extra time and conditions to break into advanced hybrid storage solutions that would be slightly slower but also less expensive than their competitors’ in-memory databases.

For many years Moore's Law was a good predictor of the number of transistors in an integrated circuit, and it is often credited with driving growth in many areas related to computing. In the case of DW/BI, I believe that this multiplication of transistors has resulted in three very important changes:
1. Powerful Processors
2. Faster Memory
3. Smaller Circuits

Taken together, these mean that computer processing and data storage capabilities have also multiplied significantly, though not necessarily at the rate of Moore's Law. Information systems are now capable not only of storing more information but also of processing it faster. However, the amount of data is growing at a pace much faster than Moore's Law, so gains in processing power and storage may at best keep pace with data growth unless we innovate.
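To make the gap between capacity growth and data growth concrete, here is a minimal sketch of the compounding arithmetic. Both doubling periods are assumptions chosen for illustration (two years for capacity, one year for data volume); they are not measured figures from this post.

```python
def growth_factor(years: float, doubling_period_years: float) -> float:
    """Return how many times a quantity multiplies over `years`
    if it doubles once every `doubling_period_years`."""
    return 2 ** (years / doubling_period_years)


# Assumption: hardware capacity doubles every 2 years.
capacity = growth_factor(10, 2)
# Assumption (hypothetical): data volume doubles every year.
data = growth_factor(10, 1)

print(f"Capacity grows {capacity:.0f}x and data grows {data:.0f}x over 10 years")
print(f"Data outgrows capacity by a factor of {data / capacity:.0f}")
```

Under these assumed rates, a small difference in doubling period compounds into a large gap within a decade, which is why innovation beyond raw hardware matters.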

Some of the innovations that have revolutionized the DW/BI industry are technologies like Hadoop and Cloud Computing.

Cloud computing is a type of computing in which a large group of remote servers is networked to allow centralized data storage and online access to computer services or resources. In short, it provides computing over the internet instead of the traditional PC model. Usually these services are provided by a vendor, and companies use the cloud infrastructure on a pay-per-use basis, which allows them to scale up or down quickly.

There are many forms of cloud computing, such as hosted web applications, clustering, and terminal services, but the most important form is virtualization.

Virtualization, in computing, refers to the act of creating a virtual (rather than actual) version of something, including but not limited to a virtual computer hardware platform, operating system (OS), storage device, or computer network resources.

Virtualization allows users to quickly add hardware resources on the fly and use the extra storage space and computation power that come with them. This means that DW/BI systems can now accommodate data on an as-needed basis, instead of relying on an infrastructure team to set up a new server whenever demand changes.

Finally, let's look at the recent changes in the world of DW and BI that can be linked to Moore’s Law and cloud computing:

1. In-memory technologies supercharge performance. The emergence of in-memory database architecture brings race car-like performance to data warehouses. The term "in-memory" is highly descriptive, of course: it refers to the ability to process large data sets in system RAM, accelerating number-crunching and reporting of actionable information.

2. Data compression enables higher-volume, higher-value analytics. The best way to counter non-stop data expansion is, unsurprisingly, data compression. Your organization’s data may be growing at 10X, but advanced compression methods, such as Oracle’s Hybrid Columnar Compression, can match that. Using compression, companies can capture and store more valuable data, and they can do it without 10X the cost and 10X the pain.

3. On-demand analytics environments meet the growing demand for rapid prototyping and information discovery. If you’re familiar with cloud computing’s software-as-a-service model, then you’ll appreciate the concept of “analytics as a service.” Technical breakthroughs such as Oracle Database 12c’s pluggable database feature make it easy for administrators to provide “sandboxes” in a data warehouse environment for use in support of new analytics projects.
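The compression point in item 2 can be illustrated with a small sketch. It uses Python's general-purpose zlib module on a made-up, highly repetitive column of values. This is only a stand-in for the idea that repetitive warehouse columns compress well; it is not Oracle's Hybrid Columnar Compression, and the sample data is hypothetical.

```python
import zlib

# Hypothetical column of repetitive values, as a warehouse fact table might hold.
column = ",".join(["CA", "NY", "TX", "CA"] * 25_000).encode("utf-8")

# Compress with the standard-library zlib codec at its highest level.
compressed = zlib.compress(column, level=9)
ratio = len(column) / len(compressed)

print(f"{len(column)} bytes -> {len(compressed)} bytes (about {ratio:.0f}x smaller)")

# Compression is lossless: decompressing returns the original bytes.
restored = zlib.decompress(compressed)
print("Round trip OK:", restored == column)
```

Real data is rarely this repetitive, so actual ratios depend heavily on the data and on the compression method used.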

Citations:
1. Wikipedia
2. http://nosql.mypopescu.com/post/73401837113/aster-data-hawq-gpdb-and-the-first-hadoop
3. http://www.forbes.com/sites/oracle/2014/03/10/the-top-10-trends-in-data-warehousing/




Thursday, March 5, 2015

Presentation and Visualization Methods


Data visualization is the method of consolidating data into one collective, illustrative graphic. Traditionally, data visualization has been used for quantitative work (infographics are a popular example), but ways of representing qualitative work have proven to be equally powerful.

US Population Spike
Every day, our attention is being fought for. Data visualization excels at capturing a viewer’s attention and holding it through storytelling. It takes a complex problem that could easily be overlooked and simplifies it using design. Naturally, a new market has emerged. Companies have recognized the value of representing research data in an innovative way and have built platforms to connect designers, data experts, and marketing managers with businesses that want to share their research findings. When data is turned into visual content, users are more likely to engage with it and share it.



In this blog post, I will discuss data presentation and visualization methods for three very important lines of business:
  • Financial Accounting
  • Customer Relations Management
  • Health Care Industry

Financial Accounting


Financial accounting involves the preparation of financial statements available for public consumption. Stockholders, suppliers, banks, employees, government agencies, business owners, and other stakeholders are examples of people interested in receiving such information for decision-making purposes.

Presenting financial accounts information:
  1. Balance sheet and income statement: This is the most basic and common representation of a company's financial information. However, it's not suitable for all audiences.
  2. Graphs: Bar charts and line charts are another common way to represent financial statements. They give a better picture to people who do not understand financial lingo very well.
  3. Recommendation: Finding the right visual representation for all audiences is a difficult task in this line of business. I found an interesting visualization that is very intuitive for sharing this kind of information.
Balance Sheet for 10 years at PBoC

Customer Relations Management


Customer relationship management (CRM) is a system for managing a company's interactions with current and future customers. Users of this information make decisions at all levels of the organization.

Presenting CRM information:
  1. Excel sheet: Using conditional formatting in Excel is one of the most basic ways of representing CRM information.
  2. Bar charts: CRM dashboards usually contain several bar charts displaying data about orders, customers, sales calls, returns, and so on.
  3. Recommendation: Use a combination of visuals that represent the data interactively. Example (given below): use a map to let users select a location, and then show the information using a combination of pie, donut, bar, and cone graphs.
Sales Data using Slick data visuals


Health Care Industry

The healthcare industry is an aggregation of sectors within the economic system that provides goods and services to treat patients with curative, preventive, rehabilitative, and palliative care. Everyone on this planet is interested in this data!

Presenting Healthcare information:
  1. Radar graphs: These are a fairly simple way to represent information in the health care industry.
  2. Trend lines: These give us the ability to display trends, growth, and decline.
  3. Recommendation: A map-based interactive representation with a drill-down facility is one of the best ways to present health care data visually.

Health care costs by State in US

Conclusion: Know your audience and present the information in the way that is easiest for them to understand. A visualization should be intuitive, simple, appealing, and interactive.

References:
http://bridgeable.com/the-importance-of-data-visualization/
http://content.time.com/time/interactive/0,31813,1549966,00.html
www.wikipedia.com
http://jpkoning.blogspot.com/2012/11/data-visualization-peoples-bank-of.html
http://www.slickdata.com/sales/
http://community.copypress.com/wp-content/uploads/2013/03/HealthcareMap_Final5.png

Wednesday, February 18, 2015

Big Data Types

Big Unstructured Data vs. Structured Relational Data

What is structured data?
Data that resides in a fixed field within a record or file is called structured data. This includes data contained in relational databases and spreadsheets.
Structured data first depends on creating a data model – a model of the types of business data that will be recorded and how they will be stored, processed and accessed. This includes defining what fields of data will be stored and how that data will be stored: data type (numeric, currency, alphabetic, name, date, address) and any restrictions on the data input (number of characters; restricted to certain terms such as Mr., Ms. or Dr.; M or F).
Structured data has the advantage of being easily entered, stored, queried and analyzed. At one time, because of the high cost and performance limitations of storage, memory and processing, relational databases and spreadsheets using structured data were the only way to effectively manage data. Anything that couldn't fit into a tightly organized structure would have to be stored on paper in a filing cabinet.
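As a minimal sketch of what such a data model looks like in practice, the example below defines a table with typed fields and input restrictions using Python's built-in sqlite3 module. The table name, column names, and sample rows are hypothetical and chosen only to mirror the kinds of restrictions described above (a limited set of titles, M or F, a maximum number of characters).

```python
import sqlite3

# An in-memory SQLite database keeps the example self-contained.
conn = sqlite3.connect(":memory:")

# The data model: fixed fields, each with a type and optional restrictions.
conn.execute(
    """
    CREATE TABLE customer (
        title     TEXT CHECK (title IN ('Mr.', 'Ms.', 'Dr.')),
        name      TEXT NOT NULL CHECK (length(name) <= 50),
        gender    TEXT CHECK (gender IN ('M', 'F')),
        balance   NUMERIC,
        joined_on TEXT
    )
    """
)

# A row that fits the model is accepted.
conn.execute(
    "INSERT INTO customer VALUES ('Dr.', 'Ada Lovelace', 'F', 100.50, '2015-02-18')"
)

# A row that violates a restriction ('Sir' is not an allowed title) is rejected.
try:
    conn.execute("INSERT INTO customer VALUES ('Sir', 'Bob', 'M', 0, '2015-02-18')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
print("Invalid title rejected:", rejected)
print("Rows stored:", row_count)
conn.close()
```

Because every record follows the same model, the data can be queried and analyzed with simple, predictable statements.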

What is unstructured data?
The phrase "unstructured data" usually refers to information that doesn't reside in a traditional row-column database. As you might expect, it's the opposite of structured data -- the data stored in fields in a database.
Unstructured data files often include text and multimedia content. Examples include e-mail messages, word processing documents, videos, photos, audio files, presentations, webpages and many other kinds of business documents. Note that while these sorts of files may have an internal structure, they are still considered "unstructured" because the data they contain doesn't fit neatly in a database.
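The note about internal structure can be seen with an e-mail message. In the sketch below, Python's standard email module parses a made-up message: the headers are easy to extract as fields, but the body, which carries the actual business information, is still free text. The addresses and message content are hypothetical.

```python
from email import message_from_string

# A hypothetical e-mail: structured headers followed by a free-text body.
raw = """From: alice@example.com
To: bob@example.com
Subject: Q1 numbers

Hi Bob, sales in the east region dipped after the competitor launch.
"""

msg = message_from_string(raw)

# Headers behave like fields in a record.
subject = msg["Subject"]
sender = msg["From"]

# The body is just text; its meaning cannot be queried like a column.
body = msg.get_payload()

print("Subject:", subject)
print("Sender:", sender)
print("Body:", body.strip())
```

Pulling "which region" or "why sales dipped" out of the body would require text analysis rather than a simple database query, which is why such content is treated as unstructured.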


Growth of Unstructured Data:
Experts estimate that 80 to 90 percent of the data in any organization is unstructured. And the amount of unstructured data in enterprises is growing significantly -- often many times faster than structured databases are growing.



Video about structured and unstructured data


Monday, February 2, 2015

BI Product Analysis



Business Intelligence & Analysis Products Scan & Evaluation

This is my first post in an upcoming series on Business Intelligence (BI). I should also warn you: I am not an experienced blogger, so you will have to bear with me as I work on improving my blogging skills. Now that you have been given fair warning, gear up for what’s coming next. Today I am going to share some insights on a very interesting topic: “Business Intelligence & Analysis Products Scan & Evaluation”.

Yeah, I know it sounds boring, but trust me, it’s going to get better. Before I begin, let’s understand what BI means. Business intelligence (BI) is the set of techniques and tools for the transformation of raw data into meaningful and useful information for business analysis purposes. Let’s look at an example:

One major soda company used dashboards to view their international sales. They quickly noticed one region where sales had significantly dropped, and upon clicking on that area they saw that it was due to a local competitor with a new drink. They quickly created a new product to compete with the local business, and saw their sales numbers return to normal.

How cool is that!

Well, there are many tools on the market (probably hundreds), offering a variety of features and techniques. How do we choose the best one for our business?

Let us consider the following five BI vendors for analysis:


1. MICROSOFT BUSINESS INTELLIGENCE


2. TABLEAU BUSINESS INTELLIGENCE

3. MICROSTRATEGY

4. ORACLE BUSINESS INTELLIGENCE ENTERPRISE EDITION (OBIEE)

5. SAS BUSINESS INTELLIGENCE


Criteria for selecting a BI Vendor or a Product


Choosing a BI vendor is all about finding the right fit. This requires an understanding of not only the BI marketplace, but also your own business and your organization’s needs. What are your company’s most pressing issues? How can a BI platform solve these challenges?

Developing use cases, or real world examples of ways in which the software could be implemented, will help you ask the right questions and make the right choice.

Before investigating potential BI solutions, it is important to understand how the platform will be implemented and who will be using it. For today’s discussion, I will use the following criteria to evaluate my top five BI tools:

1. Customer support: All BI products have their problems, and no vendor’s software is immune to bugs (although some vendors do seem to have higher software quality standards than others). When a problem arises, customers can reasonably expect to contact technical support to resolve it. In some cases the resolution may be a workaround, and in others it may be a matter of waiting for a software patch or a new product release. Communication about the status of a resolution and built-in escalation procedures are both important. Customers have a responsibility here too: to ensure they are using the software in a supported environment and to report problems in a clear way that can be readily documented. Support should never be considered a replacement for training and reading quality documentation.

Weightage: I think a 25% weight-age should be given to customer support when considering a BI tool.

Analysis: As per my analysis, Microsoft and Tableau provide excellent support, MicroStrategy does a good job, and Oracle and SAS do a decent job at support.

2. Pricing and packaging: BI pricing and packaging continue to be confusing for BI buyers and, in many cases, unnecessarily complex. Most vendors offer named-user licensing, server-based licensing, or a combination of the two. There may be special packaging for departmental or SMB deployments.

Scores on this criterion reflect pricing transparency, the degree of customer complaints about inconsistent and confusing packaging policies, and of course the relative costs.

Weightage: Since this is one of the major factors when selecting a COTS product, I have given it a weight-age of 25%.

Analysis: As per my analysis, Tableau does the best job in terms of pricing and packaging, followed by the rest of the herd.

3. Business Query: This module is often referred to as "ad hoc query", but such terminology is misleading. Unable to wait weeks or months for IT to develop a new report, business users often demand the ability to create queries and reports themselves. The business environment changes at a rapid pace, and information requirements can change just as fast. Business query tools facilitate a self-service reporting environment and allow business users to answer their own questions and do their own analysis. This module assumes that the data mart has been established and that IT has developed a business view of the physical tables.

Weightage: In the world of business analysis this is a crucial factor, and thus I have given it a 20% weight-age.

Analysis: As per my analysis, MicroStrategy has an excellent module, and the rest of the herd does more or less well.

4. Dashboards: A dashboard is a visual form of information display used to monitor, at a glance, what is currently going on in the business. Any tool that can display multiple objects from multiple data sources can correctly be referred to as a dashboard. A dashboard has to meet several criteria based on user tastes. However, the following features are the basic minimum: interactive design, visual appeal, support for turning insight into action, maps, gauges, trend lines, and cross-tab functionality.

Weightage: This is one of the common features that every vendor provides. It is also one of the must-haves in a BI tool. So I have decided to give it a weight-age of 15%.

Analysis: Being a basic feature for any BI tool, every vendor does a good job at visualization. Some of them have been really innovative with their dashboards. For the current set of vendors I would rate them to be equally good at this.

5. Predictive Analysis: Predictive analysis is used in a variety of forward-looking applications such as fraud detection, customer scoring, risk analysis, and campaign management. Predictive analysis and the task of creating predictive models are usually reserved for specialist users, with SAS and SPSS leading the market. In an effort to make BI more actionable, some BI vendors are incorporating predictive analytics into their BI suites. This module is still new to the world of BI tools; however, it has the potential to become mainstream.

Weightage: I think this is a useful capability to have in your BI tool, and that's why I have decided to give it a weight-age of 15%.

Analysis: As per my analysis, Tableau does provide this capability, whereas the rest of the herd does a decent job at this.
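The weighting scheme described above (25%, 25%, 20%, 15%, 15%) can be turned into an overall score with a simple weighted sum. The sketch below shows the arithmetic only. The vendor names and the 1-5 scores are hypothetical placeholders, not the scores behind the charts in this post, so plug in your own ratings.

```python
# Weights taken from the five criteria discussed above; they sum to 100%.
WEIGHTS = {
    "customer_support": 0.25,
    "pricing_packaging": 0.25,
    "business_query": 0.20,
    "dashboards": 0.15,
    "predictive_analysis": 0.15,
}

# Hypothetical 1-5 ratings for placeholder vendors (NOT the post's actual scores).
SCORES = {
    "Vendor A": {
        "customer_support": 5,
        "pricing_packaging": 3,
        "business_query": 4,
        "dashboards": 4,
        "predictive_analysis": 3,
    },
    "Vendor B": {
        "customer_support": 4,
        "pricing_packaging": 5,
        "business_query": 3,
        "dashboards": 4,
        "predictive_analysis": 2,
    },
}


def weighted_score(ratings: dict) -> float:
    """Multiply each criterion's rating by its weight and add the results."""
    return sum(WEIGHTS[criterion] * ratings[criterion] for criterion in WEIGHTS)


# Rank vendors from highest to lowest weighted score.
ranking = sorted(
    ((vendor, weighted_score(ratings)) for vendor, ratings in SCORES.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

for vendor, score in ranking:
    print(f"{vendor}: {score:.2f}")
```

Changing the weights to match your own priorities can change the ranking, which is the practical meaning of "finding the right fit".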

Snapshot of Weighted Analysis



Detailed View



Conclusion and Recommendations: Microsoft may emerge as the winner of this competition. However, this doesn't mean that they are the best player in the game. As I discussed in the beginning, choosing a BI vendor is all about finding the right fit. This requires an understanding of not only the BI marketplace, but also your own business and your organization’s needs.

So do your homework before you decide on a BI tool for your organization!

Thank you for reading my blog, please post your comments, suggestions and your take on BI in the section below. Kudos!


References:
1. http://www.jenunderwood.com/2014/03/16/analyzing-gartners-2014-magic-quadrant-for-bi-and-analytics-platforms/
2. http://www.docurated.com/all-things-productivity/50-best-business-intelligence-tools
3. https://www.microstrategy.com/Strategy/media/downloads/about-us/MicroStrategy-BI-Scorecard-Summary-Q4-2012.pdf