A Research Framework for Machine Learning in Data Mining

1. Introduction:-
A data warehouse is a central store of data that has been extracted from operational data. Data in a data warehouse is typically subject-oriented, non-volatile, and of a historic nature, as contrasted with data used in an on-line transaction processing system. Data in data warehouses are often used in data mining and on-line analytical processing tools.

OLAP techniques do not process enterprise data to uncover hidden or previously unknown intelligence.
The data mining process, in contrast, takes data from a data warehouse as input and identifies the hidden patterns in it; that is, it extracts hidden predictive information from the data warehouse, here by means of neural network tools.

It identifies the hidden patterns through classification and clustering techniques. Several experiments have already been carried out to train the network architecture on the data set, using a backpropagation neural network with different activation functions. Further studies may be carried out in classification and clustering. Classification is done through supervised machine learning, and clustering through unsupervised machine learning. Mining association rules is termed pattern mining. Classification, clustering, and pattern mining are therefore the important issues in data mining.

Association rule mining, or pattern mining, is one of the most popular data mining methods. However, mining association rules or patterns often yields a very large number of rules, leaving the analyst with the task of going through all of them to discover the interesting ones. Sifting manually through large sets of rules is time-consuming and strenuous. Visualization has a long history of making large amounts of data more accessible using techniques such as selecting and zooming. However, most association rule visualization techniques still fall short when faced with a large number of rules.

2. Motivations:-
The first motivation came from a Machine Learning workshop: a simple, universal model, built on complex mathematics, that is used across diverse areas of real-world applications.
The second is a keen interest in learning MATLAB, a matrix software package new to me, for solving such a universal model.
This interest drives me to learn and explore Machine Learning.

3. Aim and objective:-
To achieve improved pattern classification using backpropagation through an optimum machine learning model (a neural network, or NN, model).
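As an illustration of the kind of model this aim refers to, the following is a minimal sketch of backpropagation for a small feed-forward network with sigmoid units, written in plain Python. The network size, learning rate, random seed, XOR toy data, and all function names are assumptions chosen for the example; they are not the configuration used in this research.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    # Each hidden row is [w1, w2, bias]; w_out is [one weight per hidden unit..., bias].
    hidden = [sigmoid(row[0] * x[0] + row[1] * x[1] + row[2]) for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out[:-1], hidden)) + w_out[-1])
    return hidden, output


def total_error(data, w_hidden, w_out):
    # Sum of squared errors, E = 0.5 * sum (target - output)^2.
    return sum(0.5 * (t - forward(x, w_hidden, w_out)[1]) ** 2 for x, t in data)


def train(data, n_hidden=4, lr=0.5, epochs=5000, seed=0):
    # Hypothetical hyperparameters, picked only for this toy example.
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    initial = total_error(data, w_hidden, w_out)
    for _ in range(epochs):
        for x, t in data:
            hidden, y = forward(x, w_hidden, w_out)
            # Error signal at the output unit (sigmoid derivative is y * (1 - y)).
            delta_out = (t - y) * y * (1 - y)
            # Propagate the error signal back to the hidden units,
            # using the output weights as they were before this update.
            delta_hidden = [delta_out * w_out[j] * hidden[j] * (1 - hidden[j])
                            for j in range(n_hidden)]
            for j in range(n_hidden):
                w_out[j] += lr * delta_out * hidden[j]
            w_out[-1] += lr * delta_out
            for j in range(n_hidden):
                w_hidden[j][0] += lr * delta_hidden[j] * x[0]
                w_hidden[j][1] += lr * delta_hidden[j] * x[1]
                w_hidden[j][2] += lr * delta_hidden[j]
    return w_hidden, w_out, initial


# XOR: a small pattern classification task that is not linearly separable.
xor_data = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
w_hidden, w_out, initial_error = train(xor_data)
final_error = total_error(xor_data, w_hidden, w_out)
print("error before training:", initial_error)
print("error after training:", final_error)
```

Swapping `sigmoid` for another differentiable activation function only requires changing the derivative terms `y * (1 - y)` and `hidden[j] * (1 - hidden[j])` accordingly.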

4. Artistic work:-

4.1 Classification and Clustering:-
Classification is the process of finding a set of models (or functions) that describe and distinguish data classes or concepts, in order to predict the class of objects whose class label is unknown. The derived model is based on the analysis of a set of training data (i.e. data objects whose class label is known) [1].
Unlike classification and prediction, which analyze class-labeled data objects, clustering analyzes data objects without consulting a known class label [1]. Clusters of objects are formed so that objects within a cluster have high similarity to one another but are very dissimilar to objects in other clusters. Each cluster that is formed can be viewed as a class of objects from which rules can be derived. This is how patterns are identified through classification and clustering techniques.
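The difference between the two tasks can be sketched in a few lines of Python. The example below uses a nearest-centroid classifier for the labeled case and a basic k-means loop for the unlabeled case; both techniques, the toy points, and the function names are assumptions chosen for illustration and are not taken from [1].

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def fit_nearest_centroid(labeled):
    # Classification: the class labels of the training objects are known.
    groups = {}
    for point, label in labeled:
        groups.setdefault(label, []).append(point)
    return {label: centroid(pts) for label, pts in groups.items()}


def classify(model, point):
    # Predict the label whose centroid is closest to the new object.
    return min(model, key=lambda label: sq_dist(model[label], point))


def kmeans(points, k, iterations=10):
    # Clustering: no labels are consulted, groups emerge from similarity alone.
    centers = list(points[:k])  # naive deterministic initialisation (assumption)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(centers[j], p))
            clusters[nearest].append(p)
        centers = [centroid(c) if c else centers[j] for j, c in enumerate(clusters)]
    return clusters


training = [((1.0, 1.2), "low"), ((0.8, 1.0), "low"),
            ((5.0, 5.2), "high"), ((5.3, 4.9), "high")]
model = fit_nearest_centroid(training)
predicted = classify(model, (4.8, 5.0))

unlabeled = [(1.0, 1.2), (5.0, 5.2), (0.8, 1.0), (5.3, 4.9)]
clusters = kmeans(unlabeled, k=2)
print(predicted, clusters)
```

The classifier needs the labels `"low"` and `"high"` to build its model, while `kmeans` recovers the same two groups without ever seeing them.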

4.2 Machine Learning:-

Supervised machine learning:-
Supervised machine learning consists of presenting an input pattern and modifying the network parameters (weights) to reduce the distance between the computed output and the desired output.
An important issue in supervised learning is the problem of error convergence, i.e. the minimization of the error between the desired (target) and computed unit values. The aim is to determine a set of weights that minimizes this error. One well-known method, common to many learning paradigms, is the least mean square (LMS) convergence rule, also known as the delta learning rule [3].
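A minimal sketch of the delta (LMS) rule for a single linear unit is given below. After each pattern, every weight is moved by the learning rate times the error times the corresponding input. The toy data, the learning rate, the number of epochs, and the function name are assumptions made for this example.

```python
def train_lms(samples, targets, lr=0.05, epochs=500):
    """Train one linear unit with the delta (LMS) rule."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, t in zip(samples, targets):
            y = sum(w * xi for w, xi in zip(weights, x)) + bias  # computed output
            error = t - y                                        # desired minus computed
            # Delta rule: w <- w + lr * error * input
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


# Hypothetical noise-free targets generated from t = 2*x1 - 1*x2 + 0.5.
samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
targets = [2 * a - b + 0.5 for a, b in samples]

weights, bias = train_lms(samples, targets)
print(weights, bias)
```

Because the targets here are an exact linear function of the inputs, the error can be driven close to zero and the learned weights approach the generating coefficients; with noisy data the rule would settle near the least-squares solution instead.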

Unsupervised machine learning:-
Unsupervised learning uses no external teacher and is based only on local information. It is also referred to as self-organization, in the sense that it self-organizes the data presented to the network and detects their emergent collective properties. Paradigms of unsupervised learning include Hebbian learning and competitive learning.
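Competitive learning can be sketched as a winner-take-all update: for each input, only the closest prototype (weight vector) is moved toward that input, and no target output is ever supplied. The data points, initial prototypes, learning rate, and function name below are assumptions chosen for illustration.

```python
def competitive_learning(points, prototypes, lr=0.2, epochs=20):
    """Winner-take-all learning: only the closest prototype is updated."""
    protos = [list(p) for p in prototypes]
    for _ in range(epochs):
        for x in points:
            # The unit whose weight vector is nearest to the input wins.
            winner = min(
                range(len(protos)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(x, protos[j])),
            )
            # Move the winner a fraction lr of the way toward the input.
            protos[winner] = [w + lr * (xi - w) for w, xi in zip(protos[winner], x)]
    return protos


# Two unlabeled groups of points, one near (0, 0) and one near (5, 5).
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
initial = [(1.0, 1.0), (4.0, 4.0)]

protos = competitive_learning(points, initial)
print(protos)
```

Each prototype ends up inside one of the two groups, which is the self-organizing behavior described above: the structure is found from the data alone.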

4.3 Pattern Mining

Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information - information that can be used to increase revenue, cut costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.

Association rule mining [4], or pattern mining, can also be used effectively in software defect prediction, association, and classification.

"Pattern mining" is a data mining method that involves finding existing patterns in data. The patterns are often referred to as association rules. The original motivation for searching for association rules came from the desire to analyze supermarket transaction data, that is, to examine customer behavior in terms of the purchased products. For example, the association rule "beer => potato chips (80%)" states that four out of five customers who bought beer also bought potato chips.
A microarray database [5] is a typical relational database that contains a large number of columns and a small number of rows. It poses a great challenge for existing associated pattern mining algorithms that discover patterns in the item enumeration space, whose complexity grows with the number of columns. So instead of searching the large number of columns in a microarray database (a bioinformatics database), its associated framing patterns should be searched.
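The percentage attached to a rule such as the beer and potato chips example in this section is its confidence: the fraction of transactions containing the left-hand side that also contain the right-hand side. The sketch below computes support and confidence over a small, made-up list of transactions; the data and function names are assumptions for illustration only.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent => consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


# Hypothetical supermarket transactions: 5 contain beer, 4 of those contain chips.
transactions = [
    {"beer", "potato chips"},
    {"beer", "potato chips", "bread"},
    {"beer", "potato chips"},
    {"beer", "potato chips", "milk"},
    {"beer", "bread"},
    {"milk", "bread"},
    {"milk"},
    {"bread", "potato chips"},
]

rule_support = support(transactions, {"beer", "potato chips"})
rule_confidence = confidence(transactions, {"beer"}, {"potato chips"})
print(rule_support, rule_confidence)
```

Here four of the five beer transactions also contain potato chips, so the rule has a confidence of 80%, while its support (the share of all transactions containing both items) is 50%.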

5. Research plan

The plan is to search large volumes of data automatically for certain patterns, and to create concise representations of the data that can be used for future prediction or classification tasks.

5.1 Literature review

Association rule learning (dependency modeling) - Searches for relationships between data. For example, a supermarket might gather data on customer purchasing habits. Using association rule learning, the supermarket can determine which products are frequently bought together and use this information for marketing purposes. This is sometimes referred to as market basket analysis.
Pattern mining is a movement from data to knowledge. The patterns are the association rules or correlations within large data sets. It helps uncover interesting relationships in large and complex data.
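A first step in market basket analysis is to count which items occur together often enough to be interesting. The sketch below counts every pair of items per basket and keeps the pairs whose support reaches a threshold. The baskets, the threshold of 0.4, and the function name are assumptions for illustration; practical miners such as Apriori extend the same idea to larger itemsets.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return item pairs whose support is at least min_support."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives each pair one canonical order, e.g. ("bread", "butter").
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


# Hypothetical customer baskets.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]

pairs = frequent_pairs(baskets, min_support=0.4)
for pair, value in sorted(pairs.items()):
    print(pair, value)
```

With these baskets, bread and butter appear together in three of five baskets, so that pair has the highest support of the pairs that pass the threshold.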

While thinking about data mining concepts, various questions come to mind:
How can association rules or patterns be recognized?
How can a machine be taught? In other words, how is a machine able to learn?
How are different features extracted and mapped into the machine?
Data mining is able to address some of these questions, for example:
Identify the dependencies or relationships between data.
Find association rules or patterns in the data.
Data mining works well when the relations contain a limited number of attributes.
Various methods have already been proposed for finding the relationships among data.

[A] Principal Components Analysis (PCA) and decision trees [2] are limited in their ability to find relationships among data.

[B] Kernel methods [2] have the potential to detect generalized relationships among the data.
Kernel methods map data into a higher-dimensional vector space [2] in order to detect structure and complex relations in the data more easily.
In this approach, a non-linear relationship in one space is transformed into a linear relationship in another space.
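The mapping idea can be illustrated with a degree-2 polynomial feature map. In the sketch below, points inside and outside the unit circle cannot be separated by a straight line in the original two-dimensional space, but after mapping to three dimensions a plane separates them. The feature map `phi`, the kernel `poly_kernel`, and the sample points are assumptions chosen for this example and are not taken from [2].

```python
import math


def phi(x):
    """Explicit degree-2 feature map from 2-D input space to 3-D feature space."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def poly_kernel(x, y):
    """Homogeneous polynomial kernel of degree 2: k(x, y) = (x . y)^2."""
    return dot(x, y) ** 2


def linear_score(x):
    # In feature space the circle x1^2 + x2^2 = 1 becomes the plane z1 + z3 = 1.
    z = phi(x)
    return z[0] + z[2] - 1.0


inner = [(0.5, 0.0), (0.0, -0.5), (0.3, 0.3)]   # inside the unit circle
outer = [(2.0, 0.0), (0.0, -2.0), (1.5, 1.5)]   # outside the unit circle

print([linear_score(p) < 0 for p in inner])
print([linear_score(p) > 0 for p in outer])

# The kernel returns the feature-space inner product without building phi explicitly.
a, b = (1.0, 2.0), (3.0, -1.0)
print(dot(phi(a), phi(b)), poly_kernel(a, b))
```

The last line shows why kernels are convenient: `poly_kernel(a, b)` gives the same value as the inner product of the mapped points, so an algorithm that only needs inner products never has to compute the higher-dimensional vectors.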

Various other methods based on kernel methods, such as support vector machines and kernel PCA, are also used to identify non-linear relationships among data.

5.2 Task planning

1. Background and motivation for my work.
2. Review of the current state of research.
3. My contribution toward achieving the goal.
4. Problem findings.
5. Methodology and techniques to be used for my research.
6. Implementation and experimentation.
7. Publication and conference presentation.
8. Tools required to complete my research.
9. Collection of my cited research papers to read further.
10. Thesis preparation and final presentation.

5.3 Overall research goals

I. Instead of relying on empirical study alone, the network can be optimized to a suitable corrective architecture with the help of a Genetic Algorithm.
II. Improve the backpropagation algorithm for better pattern recognition.
III. Hybridization can be done at the architecture level to generate more accurate results.
IV. Better classification can be achieved given an accurate and sufficiently large database.

5.4 Experimental platform
The experimental platform is currently under construction.

6. References:-

[1] Jiawei Han, Micheline Kamber. Data Mining: Concepts and Techniques.

[2] Cristianini N. Kernel Methods for General Pattern Analysis. www.kernel-methods.net

[3] S.N. Sivanandam. Neural Network a practical approach.

[4] Nilamadhab Mishra. "Art of Software Defect Association and Correction Using Association Rule Mining", International Journal of Computer Engineering and Technology (ISSN Print: 0976-6367, ISSN Online: 0976-6375), Volume 1, Issue 1 (June 2010), pp. 275-285.

[5] Nilamadhab Mishra. "A Framework for Associated Pattern Mining over Microarray Database", Journal of Global Research in Computer Science (ISSN 2229-371X), Vol. 2, No. 2, February 2011, pp. 08-11.

About Author / Additional Info:
Nilamadhab Mishra, author of references [4] and [5] listed above.

