Our dedicated team of experts at Seo4u.com specializes in
Internet Data Mining, also known as Knowledge Discovery in
Databases (KDD), and Research.
We currently provide Data Mining and Internet Research
Services to a range of industries and business people,
such as Importers, Exporters, Research institutes, Internet
Marketing companies, Technical consultants, Business Directory
developers and Portal developers.
We provide accurate, useful data that helps our clients do
business confidently and carry out research work more
effectively and efficiently.
Our pricing is also competitive.
Definition and Process of Data Mining
What is data mining?
The past two decades have seen a dramatic
increase in the amount of information or data being stored in
electronic format. This accumulation of data has taken place at an
explosive rate. It has been estimated that the amount of
information in the world doubles every 20 months and the size and
number of databases are increasing even faster. The increase in
use of electronic data gathering devices such as point-of-sale or
remote sensing devices has contributed to this explosion of
available data.
The Growing Base of Data
Data storage became easier as large amounts of computing power
became available at low cost; that is, the falling cost of
processing power and storage made data cheap. There was also the
introduction of new machine learning methods for knowledge
representation, based on logic programming and similar
techniques, in addition to traditional statistical analysis of
data. The new methods tend to be computationally intensive, hence
the demand for more processing power.
With so much attention concentrated on the accumulation of data,
the problem became what to do with this valuable resource. It was
recognized that information is at the heart of business operations
and that decision-makers could make use of the data stored to gain
valuable insight into the business. Database management systems
gave access to the data stored, but this was only a small part of
what could be gained from the data. Traditional on-line
transaction processing (OLTP) systems are good at putting data
into databases quickly, safely and efficiently, but are not good
at delivering meaningful analysis in return. Analyzing data can
provide further knowledge about a business by going beyond the
data explicitly stored to derive knowledge that is only implicit
in it.
This is where Data Mining or Knowledge Discovery in Databases
(KDD) has obvious benefits for any enterprise.
The term data mining has been stretched beyond its limits to apply
to any form of data analysis. Some of the numerous definitions of
Data Mining, or Knowledge Discovery in Databases, are:
Data Mining, or Knowledge Discovery in Databases (KDD) as it is
also known, is the nontrivial extraction of implicit, previously
unknown, and potentially useful information from data. This
encompasses a number of different technical approaches, such as
clustering, data summarization, learning classification rules,
finding dependency networks, analysing changes, and detecting
anomalies.
William J Frawley, Gregory Piatetsky-Shapiro and Christopher J
Matheus
Data mining is the search for relationships and global patterns
that exist in large databases but are `hidden' among the vast
amount of data, such as a relationship between patient data and
their medical diagnosis. These relationships represent valuable
knowledge about the database and the objects in the database and,
if the database is a faithful mirror, of the real world registered
by the database.
Marcel Holsheimer & Arno Siebes (1994)
The analogy with the mining process is described as:
Data mining refers to "using a variety of techniques to identify
nuggets of information or decision-making knowledge in bodies of
data, and extracting these in such a way that they can be put to
use in the areas such as decision support, prediction, forecasting
and estimation. The data is often voluminous, but as it stands of
low value as no direct use can be made of it; it is the hidden
information in the data that is useful"
Clementine User Guide, a data mining toolkit
Basically data mining is concerned with the
analysis of data and the use of software techniques for finding
patterns and regularities in sets of data. It is the computer
which is responsible for finding the patterns by identifying the
underlying rules and features in the data. The idea is that it is
possible to strike gold in unexpected places as the data mining
software extracts patterns not previously discernible or so
obvious that no one has noticed them before.
Data mining analysis tends to work from the data up, and the best
techniques are those developed with an orientation towards large
volumes of data, making use of as much of the collected data as
possible to arrive at reliable conclusions and decisions. The
analysis process starts with a set of data and uses a methodology
to develop an optimal representation of the structure of the data,
during which time knowledge is acquired. Once knowledge has been
acquired, it can be extended to larger sets of data, working on
the assumption that the larger data set has a structure similar to
the sample data. Again, this is analogous to a mining operation
where large amounts of low-grade material are sifted through in
order to find something of value.
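The sample-then-extend process described above can be sketched in
a few lines of Python. This is only an illustration under stated
assumptions: the spend figures, the buyer flags and the helper
name learn_threshold are invented here, and a single cut-off value
stands in for whatever "representation of the structure" a real
tool would build.

```python
def learn_threshold(sample):
    """Find the spend cut-off that best separates buyers from
    non-buyers in a small labelled sample (hypothetical data)."""
    best_threshold, best_correct = None, -1
    # Try each observed spend value as a candidate cut-off.
    for threshold in sorted({spend for spend, _ in sample}):
        # Count records where "spend >= threshold" agrees with the label.
        correct = sum((spend >= threshold) == bought for spend, bought in sample)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold


# Small labelled sample: (annual spend, bought the product?). Invented values.
sample = [(120, False), (150, False), (310, True),
          (400, True), (90, False), (280, True)]
threshold = learn_threshold(sample)

# Larger, unlabelled data set, assumed to share the sample's structure.
full_data = [75, 130, 290, 500, 260, 45]
predictions = [spend >= threshold for spend in full_data]

print("learned cut-off:", threshold)
print("predicted buyers:", predictions)
```

The predictions are only as good as the assumption stated in the
text: if the larger data set is structured differently from the
sample, the learned cut-off will not carry over.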
Data Mining Models
IBM have identified two types of model or modes of operation which
may be used to unearth information of interest to the user.
Verification Model
The verification model takes a hypothesis from the user and tests
its validity against the data. The emphasis is on the user,
who is responsible for formulating the hypothesis and issuing the
query on the data to affirm or negate the hypothesis.
In a marketing division, for example, with a
limited budget for a mailing campaign to launch a new product, it
is important to identify the section of the population most likely
to buy the new product. The user formulates a hypothesis to
identify potential customers and the characteristics they share.
Historical data about customer purchases and demographic
information can then be queried to reveal comparable purchases and
the characteristics shared by those purchasers, which in turn can
be used to target a mailing campaign. The whole operation can be
refined by `drilling down' so that the hypothesis reduces the
`set' returned each time until the required limit is reached.
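As a rough sketch of the drill-down just described, the Python
below narrows a set of customer records one condition at a time
until it fits a mailing limit. Everything in it is assumed for
illustration: the records, the field names, the three conditions
and the helper name drill_down are hypothetical, and in practice
the conditions would be queries issued against a database.

```python
# Hypothetical customer records; all fields and values are invented.
customers = [
    {"id": 1, "age": 34, "region": "north", "bought_similar": True},
    {"id": 2, "age": 52, "region": "south", "bought_similar": True},
    {"id": 3, "age": 29, "region": "north", "bought_similar": False},
    {"id": 4, "age": 41, "region": "north", "bought_similar": True},
    {"id": 5, "age": 38, "region": "east", "bought_similar": True},
    {"id": 6, "age": 27, "region": "north", "bought_similar": True},
    {"id": 7, "age": 45, "region": "south", "bought_similar": False},
    {"id": 8, "age": 31, "region": "north", "bought_similar": True},
]

# The user's hypothesis, refined step by step (each step is one condition).
predicates = [
    ("bought a similar product", lambda r: r["bought_similar"]),
    ("lives in the north", lambda r: r["region"] == "north"),
    ("is under 40", lambda r: r["age"] < 40),
]


def drill_down(records, predicates, limit):
    """Apply the conditions in order, stopping once the set fits the limit."""
    selected = list(records)
    applied = []
    for name, predicate in predicates:
        if len(selected) <= limit:
            break  # the set is already small enough for the budget
        selected = [r for r in selected if predicate(r)]
        applied.append(name)
    return selected, applied


selected, applied = drill_down(customers, predicates, limit=3)
print("conditions used:", applied)
print("customers to mail:", [r["id"] for r in selected])
```

Note that every condition comes from the user; the code only
filters records that were already there.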
The problem with this model is that no new information is
created in the retrieval process; the queries only ever
return records that verify or negate the hypothesis. The
search process here is iterative in that the output is reviewed, a
new set of questions or hypotheses is formulated to refine the
search, and the whole process is repeated. The user is discovering the facts
about the data using a variety of techniques such as queries,
multidimensional analysis and visualization to guide the
exploration of the data being inspected.
Discovery Model
The discovery model differs in its emphasis in that it is the
system that automatically discovers important information hidden
in the data. The data is sifted in search of frequently occurring
patterns, trends and generalizations about the data without
intervention or guidance from the user. The discovery or data
mining tools aim to reveal a large number of facts about the data
in as short a time as possible.
An example of such a model is a bank database which is mined to
discover the many groups of customers to target for a mailing
campaign. The data is searched with no hypothesis in mind other
than for the system to group the customers according to the common
characteristics found.
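One way to picture the bank example is with a small clustering
sketch in Python. The text does not name an algorithm, so k-means
is used here purely as an assumed stand-in, and the customer
figures (age, balance in thousands) are invented. The number of
groups is also supplied by hand, which a real discovery tool might
estimate for itself.

```python
import math


def kmeans(points, k, iterations=20):
    """Minimal k-means: group points around k centroids."""
    # Deterministic start: first point, then the point farthest
    # from the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = []
    for _ in range(iterations):
        # Assign each customer to the nearest centroid.
        labels = [
            min(range(k), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        new_centroids = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[i])
        if new_centroids == centroids:
            break  # assignments have stabilised
        centroids = new_centroids
    return labels, centroids


# Hypothetical bank customers: (age, balance in thousands). Invented values;
# real features would normally be rescaled before measuring distances.
points = [(22, 1.5), (25, 2.0), (27, 1.0), (58, 40.0), (61, 45.0), (64, 38.0)]
labels, centroids = kmeans(points, k=2)

print("group per customer:", labels)
print("group centres:", centroids)
```

No hypothesis about the customers is supplied; the groups come
from the data alone, and it is left to the user to decide which
group is worth a mailing.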