Wednesday, February 18, 2015

Big Data Types

Big Unstructured Data vs. Structured Relational Data

What is structured data?
Data that resides in a fixed field within a record or file is called structured data. This includes data contained in relational databases and spreadsheets.
Structured data first depends on creating a data model – a model of the types of business data that will be recorded and how they will be stored, processed and accessed. This includes defining what fields of data will be stored and how that data will be stored: data type (numeric, currency, alphabetic, name, date, address) and any restrictions on the data input (number of characters; restricted to certain terms such as Mr., Ms. or Dr.; M or F).
Structured data has the advantage of being easily entered, stored, queried and analyzed. At one time, because of the high cost and performance limitations of storage, memory and processing, relational databases and spreadsheets using structured data were the only way to effectively manage data. Anything that couldn't fit into a tightly organized structure would have to be stored on paper in a filing cabinet.
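To make the idea of a data model concrete, here is a minimal Python sketch of a single structured record with typed fields and input restrictions like the ones described above. The record name, the field names and the 40-character limit are assumptions made up for illustration; a real system would normally declare these rules in the database schema itself.

```python
from dataclasses import dataclass
from datetime import date

# Restrictions on the data input, as in the examples above.
ALLOWED_TITLES = {"Mr.", "Ms.", "Dr."}
ALLOWED_GENDERS = {"M", "F"}
MAX_NAME_LENGTH = 40  # hypothetical limit, chosen only for this sketch


@dataclass
class CustomerRecord:
    """One row of structured data: every field has a fixed name and type."""
    title: str
    name: str
    gender: str
    birth_date: date
    balance: float

    def __post_init__(self):
        # Reject values that do not fit the data model.
        if self.title not in ALLOWED_TITLES:
            raise ValueError(f"title must be one of {sorted(ALLOWED_TITLES)}")
        if len(self.name) > MAX_NAME_LENGTH:
            raise ValueError(f"name is longer than {MAX_NAME_LENGTH} characters")
        if self.gender not in ALLOWED_GENDERS:
            raise ValueError(f"gender must be one of {sorted(ALLOWED_GENDERS)}")


def is_valid(**fields) -> bool:
    """Return True if the given field values form a valid record."""
    try:
        CustomerRecord(**fields)
        return True
    except ValueError:
        return False
```

Because every record has the same fields and passes the same checks, it is easy to store, query and compare records, which is the advantage structured data is known for.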

What is unstructured data?
The phrase "unstructured data" usually refers to information that doesn't reside in a traditional row-column database. As you might expect, it's the opposite of structured data -- the data stored in fields in a database.
Unstructured data files often include text and multimedia content. Examples include e-mail messages, word processing documents, videos, photos, audio files, presentations, webpages and many other kinds of business documents. Note that while these sorts of files may have an internal structure, they are still considered "unstructured" because the data they contain doesn't fit neatly in a database.


Growth of Unstructured Data:
Experts estimate that 80 to 90 percent of the data in any organization is unstructured. And the amount of unstructured data in enterprises is growing significantly -- often many times faster than structured databases are growing.








Analyzing Structured data
Structured data analysis is the statistical data analysis of structured data. This can arise either in the form of an a priori structure such as multiple-choice questionnaires or in situations with the need to search for structure that fits the given data, either exactly or approximately. This structure can then be used for making comparisons, predictions, manipulations, etc.

Types of structured data analysis:
Algebraic data analysis
Bayesian analysis
Cluster analysis
Combinatorial data analysis
Formal concept analysis
Functional data analysis
Geometric data analysis
Regression analysis
Shape analysis
Topological data analysis
Tree structured data analysis

Analyzing Unstructured data
Techniques such as data mining, Natural Language Processing (NLP), text analytics, and noisy-text analytics provide different methods to find patterns in, or otherwise interpret, this information. Common techniques for structuring text usually involve manual tagging with metadata or part-of-speech tagging for further text mining-based structuring. Unstructured Information Management Architecture (UIMA) provides a common framework for processing this information to extract meaning and create structured data about the information.
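As a rough illustration of "creating structured data about the information", the sketch below turns a raw e-mail message into a small record of fields. It uses only Python's standard `email` and `re` modules with simple pattern matching; it is not NLP and it does not use UIMA. The sample message, the function name and the chosen fields are assumptions invented for this example.

```python
import re
from email import message_from_string

# A made-up e-mail message used only for this sketch.
RAW_EMAIL = """\
From: alice@example.com
To: bob@example.com
Subject: Invoice 1042 overdue

Hi Bob, invoice 1042 for $250.00 was due on 2015-02-01. Please call 555-0100.
"""


def structure_email(raw: str) -> dict:
    """Extract a few structured fields from an unstructured e-mail."""
    msg = message_from_string(raw)
    body = msg.get_payload()  # plain string for a simple, non-multipart message
    return {
        # The headers are the "internal structure" the file already has.
        "sender": msg["From"],
        "recipient": msg["To"],
        "subject": msg["Subject"],
        # Patterns pulled out of the free text of the body.
        "dates": re.findall(r"\b\d{4}-\d{2}-\d{2}\b", body),
        "amounts": re.findall(r"\$\d+(?:\.\d{2})?", body),
        "word_count": len(body.split()),
    }


record = structure_email(RAW_EMAIL)
```

The resulting dictionary could be stored as a row in a database, even though the e-mail itself does not fit neatly into one. Real text analytics tools go much further than this, but the goal is the same.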

WHY NOBODY IS ACTUALLY ANALYZING UNSTRUCTURED DATA?

Role of data warehouses in future
Technologies like Web 2.0 and the rapid growth of unstructured data have pushed the data warehouse industry to evolve and come up with innovative tools and techniques. The data warehouse is going to play a significant role in future technologies such as Web 3.0, Natural Language Processing (NLP), Artificial Intelligence and Big Data.
In Web 3.0, the data warehouse is going to be useful in providing fast querying technology.
In Natural Language Processing, we need our data warehouses to store unstructured data in a way that is efficient to query and analyze.
In Artificial Intelligence, we need data warehouses to be able to store networks and relationships between data and not just the data itself.
And finally, in Big Data, the rate at which data is generated is rapidly increasing. We need ETL (extract, transform, load) processes that can transform and store big data in a very short time.
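For readers who have not seen an ETL process before, here is a toy sketch of the three steps using Python's standard `csv` and `sqlite3` modules. The sample data, the table name and the cleaning rules are assumptions made up for illustration, and an in-memory SQLite database stands in for a real data warehouse. It shows only the shape of the process and says nothing about doing it quickly at big data scale.

```python
import csv
import io
import sqlite3

# Made-up raw input: inconsistent spacing and case, one bad amount.
RAW_CSV = """order_id,customer,amount
1, alice ,19.99
2,BOB,5.00
3,alice,not-a-number
"""


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize names, convert types, skip unparseable rows."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # drop rows whose amount is not a number
        clean.append((int(row["order_id"]), row["customer"].strip().lower(), amount))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for a data warehouse
load(transform(extract(RAW_CSV)), conn)
alice_total = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE customer = 'alice'"
).fetchone()[0]
```

Once the data is loaded, it can be queried like any other structured data, as the final `SELECT` shows.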



