Big Unstructured Data vs. Structured Relational Data
What is structured data?
Data that resides in a fixed field within a record or file
is called structured data. This includes data contained in relational databases
and spreadsheets.
Structured data first depends on creating a data model – a
model of the types of business data that will be recorded and how they will be
stored, processed and accessed. This includes defining what fields of data will
be stored and how that data will be stored: data type (numeric, currency,
alphabetic, name, date, address) and any restrictions on the data input (number
of characters; restricted to certain terms such as Mr., Ms. or Dr.; M or F).
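As a rough illustration of such a data model, the sketch below declares a small table with typed fields and input restrictions using Python's built-in sqlite3 module. The `customer` table, its column names and the 50-character limit are hypothetical choices made for this example, not part of any particular system.

```python
import sqlite3

# Hypothetical data model: field names, types and limits are illustrative only.
SCHEMA = """
CREATE TABLE customer (
    id        INTEGER PRIMARY KEY,
    title     TEXT NOT NULL CHECK (title IN ('Mr.', 'Ms.', 'Dr.')),
    name      TEXT NOT NULL CHECK (length(name) <= 50),
    gender    TEXT CHECK (gender IN ('M', 'F')),
    joined_on TEXT NOT NULL,
    balance   NUMERIC NOT NULL DEFAULT 0
)
"""

def make_db():
    """Create an in-memory database that enforces the model above."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn

def try_insert(conn, title, name, gender, joined_on, balance):
    """Return True if the row fits the model, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO customer (title, name, gender, joined_on, balance) "
            "VALUES (?, ?, ?, ?, ?)",
            (title, name, gender, joined_on, balance),
        )
        return True
    except sqlite3.IntegrityError:
        # Raised when a CHECK or NOT NULL restriction is violated.
        return False
```

A row with the title "Dr." is accepted, while one with the title "Sir" or a 51-character name is rejected before it ever reaches the table.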
Structured data has the advantage of being easily entered,
stored, queried and analyzed. At one time, because of the high cost and
performance limitations of storage, memory and processing, relational databases
and spreadsheets using structured data were the only way to effectively manage
data. Anything that couldn't fit into a tightly organized structure would have
to be stored on paper in a filing cabinet.
What is unstructured data?
The phrase "unstructured data" usually refers to
information that doesn't reside in a traditional row-column database. As you
might expect, it's the opposite of structured data -- the data stored in fields
in a database.
Unstructured data files often include text and multimedia
content. Examples include e-mail messages, word processing documents, videos,
photos, audio files, presentations, webpages and many other kinds of business
documents. Note that while these sorts of files may have an internal structure,
they are still considered "unstructured" because the data they
contain doesn't fit neatly in a database.
Growth of Unstructured Data:
Experts estimate that 80 to 90 percent of the data in any
organization is unstructured. And the amount of unstructured data in
enterprises is growing significantly -- often many times faster than structured
databases are growing.
Analyzing Structured data
Structured data analysis is the statistical data analysis of
structured data. This can arise either in the form of an a priori structure
such as multiple-choice questionnaires or in situations with the need to search
for structure that fits the given data, either exactly or approximately. This
structure can then be used for making comparisons, predictions and manipulations.
Types of structured data analysis:
Algebraic data analysis
Bayesian analysis
Cluster analysis
Combinatorial data analysis
Formal concept analysis
Functional data analysis
Geometric data analysis
Regression analysis
Shape analysis
Topological data analysis
Tree-structured data analysis
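To make one of these concrete, here is a minimal sketch of regression analysis: an ordinary least-squares fit of a straight line to paired numeric values, such as two columns of a table. The function name `fit_line` and the sample numbers are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept to paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x, and how x and y vary together around their means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, x):
    """Use the fitted structure to predict y for a new x."""
    return slope * x + intercept
```

Because the input is already organized into numeric fields, no preparation is needed before fitting; the fitted line can then be used for the comparisons and predictions mentioned above.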
Analyzing Unstructured data
Techniques such as data mining, Natural Language Processing
(NLP), text analytics, and noisy-text analytics provide different methods to
find patterns in, or otherwise interpret, this information. Common techniques
for structuring text usually involve manual tagging with metadata or
part-of-speech tagging for further text mining-based structuring. Unstructured
Information Management Architecture (UIMA) provides a common framework for processing
this information to extract meaning and create structured data about the
information.
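The idea of creating structured data about a piece of text can be sketched in a very simplified form. The example below uses only regular expressions and word counts from Python's standard library; it is not UIMA and not real NLP, and the patterns and field names are assumptions chosen for illustration.

```python
import re
from collections import Counter

# Deliberately simple patterns; real-world text needs far more robust handling.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
WORD_RE = re.compile(r"[a-z]+")

def structure_text(text, top_n=3):
    """Turn free text into a record with fixed fields that a database could store."""
    words = WORD_RE.findall(text.lower())
    return {
        "emails": EMAIL_RE.findall(text),
        "dates": DATE_RE.findall(text),
        "word_count": len(words),
        "top_terms": Counter(words).most_common(top_n),
    }
```

The resulting dictionary has the same fields for every document, so it can be stored and queried like any other structured record even though the source text cannot.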
Why is nobody actually analyzing unstructured data?
Here is an interesting answer: http://iianalytics.com/research/why-nobody-is-actually-analyzing-unstructured-data
Role of data warehouses in the future
Technologies like Web 2.0 and the rapid growth of unstructured
data have pushed the data warehouse industry to evolve and come up with
innovative tools and techniques. The data warehouse is going to play a significant
role in future technologies such as Web 3.0, Natural Language Processing (NLP),
Artificial Intelligence and Big Data.
In Web 3.0, the data warehouse is going to be useful in providing
fast querying technology.
In NLP, we need our data warehouses to store unstructured data in a
form that can be queried and analyzed efficiently.
In Artificial Intelligence, we need data warehouses to be
able to store networks and relationships between data and not just the data
itself.
And finally, in Big Data, the rate at which data is generated is
rapidly increasing. We need ETL (extract, transform, load) processes that can
transform and store big data in a very short time.
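For readers unfamiliar with ETL, the following is a minimal single-machine sketch of the three steps using Python's standard library. The `events` table, the `user` and `amount` columns and the cleaning rules are hypothetical; a real big data pipeline would distribute this work across many machines.

```python
import csv
import io
import sqlite3

def etl(csv_text, conn):
    """Extract rows from CSV text, clean them, and load them into SQLite."""
    conn.execute("CREATE TABLE IF NOT EXISTS events (user TEXT, amount REAL)")
    reader = csv.DictReader(io.StringIO(csv_text))   # extract
    rows = []
    for rec in reader:
        try:
            user = rec["user"].strip().lower()       # transform: normalize names
            amount = float(rec["amount"])            # transform: enforce numeric type
        except (AttributeError, TypeError, ValueError):
            continue                                 # skip malformed records
        rows.append((user, amount))
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)  # load
    conn.commit()
    return len(rows)
```

The function returns the number of rows loaded, which makes it easy to see how many input records were dropped during the transform step.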