
The IFSVM is then trained with these membership values. This paper proposes a
new approach that uses the membership values to reduce the misclassification
rate: a threshold value is set based on the membership values, and if a point
is still misclassified after training the fuzzy approach but its membership
value is greater than the threshold, it is predicted as its actual class.

 

 

 

Fig1: Proposed Methodology


     The paper is organized as follows. The preprocessing of raw data is
presented in Section 2 and the proposed feature selection (FS) approaches are
discussed in Section 3. Classification using IFSVM is presented in Section 4.
The experimental results and analysis on the four selected data sets are
discussed in Section 5, and the work is concluded in Section 6.

2. Preprocessing

     In this work, data sets from the UCI Machine Learning Repository have been
considered. Before developing a model, the data has to be analyzed and
understood in order to know the structure and relevance of its features. The
data includes a number of missing values for several features, and some feature
values are continuous while others are discrete. Noise can also be present in
the data, which demands cleaning when preparing the dataset for classification
analysis [15]. In this hybrid approach, as part of cleaning, the missing values
of an attribute have been replaced by the mean of all the values of that
attribute, and the continuous values have been discretized using the equal area
method.
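
The two cleaning steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `impute_mean` and `equal_frequency_bins` are hypothetical, missing values are assumed to be encoded as `None`, and the "equal area method" is interpreted here as equal-frequency binning, which is an assumption because the text does not define it.

```python
from bisect import bisect_right
from statistics import mean

def impute_mean(column):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]

def equal_frequency_bins(column, n_bins):
    """Discretize values into n_bins bins holding roughly equal counts.

    Assumption: 'equal area' is read as equal-frequency binning.
    Cut points are taken from the sorted values, so equal values
    always fall into the same bin.
    """
    ordered = sorted(column)
    n = len(ordered)
    cuts = [ordered[k * n // n_bins] for k in range(1, n_bins)]
    return [bisect_right(cuts, v) for v in column]

# Hypothetical attribute with one missing value
raw = [1.0, None, 3.0, 8.0]
clean = impute_mean(raw)                # [1.0, 4.0, 3.0, 8.0]
bins = equal_frequency_bins(clean, 2)   # [0, 1, 0, 1]
```

Taking the cut points from the data rather than from fixed-width intervals keeps the bins balanced even when an attribute is heavily skewed.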

 

3. Feature Selection

     After preprocessing, feature subset selection (FS) has been employed. FS
refers to the problem of selecting those input attributes that are most
predictive of a given outcome. Unlike other dimensionality reduction methods,
feature selectors preserve the original meaning of the features after
reduction. FS is mainly employed where the given dataset has a large number of
features, because noisy, irrelevant and redundant features in the feature space
may lead to poor classification performance. FS techniques have also been
applied to small and medium-sized datasets in order to locate the most
informative features for later use. The importance of feature selection is that
it reduces the problem size while improving the quality and speed of
classification. In this work FS has been performed based on fuzzy rough sets
and fuzzy soft sets.

3.1 Fuzzy Rough

     The most important and widely used concept of fuzzy rough sets was
originated by Dubois and Prade [11]. Fuzzy-rough set theory is powerful in
handling the vagueness and uncertainty often present in data. Because most
real-world problems are fuzzy in nature, extending rough approximation into
fuzzy space helps in solving them. First, fuzzification is done. Then the
dependency of each attribute is calculated and the attribute with the highest
dependency is selected. This procedure continues until the dependency does not
change. Finally, an attribute set with reduced features is obtained.
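
The greedy procedure described above (add the attribute with the highest dependency until the dependency stops changing) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the similarity relation `1 - |a(x) - a(y)| / range`, the use of the minimum to combine attributes, and the simplification of the lower approximation for crisp class labels are choices made here, and the names `similarity`, `dependency` and `greedy_reduct` are hypothetical.

```python
def similarity(col, i, j, span):
    """Fuzzy similarity of samples i and j on one attribute (1.0 = identical)."""
    if span == 0:
        return 1.0
    return max(0.0, 1.0 - abs(col[i] - col[j]) / span)

def dependency(X, y, attrs):
    """Fuzzy-rough dependency of the class labels y on the attribute subset attrs.

    Assumption: with crisp labels, the membership of a sample in the lower
    approximation of its own class is the minimum of (1 - similarity) over
    all samples that carry a different label.
    """
    n = len(y)
    cols = [[row[a] for row in X] for a in attrs]
    spans = [max(c) - min(c) for c in cols]
    total = 0.0
    for i in range(n):
        lower = 1.0
        for j in range(n):
            if y[j] != y[i]:
                sim = min((similarity(c, i, j, s) for c, s in zip(cols, spans)),
                          default=1.0)
                lower = min(lower, 1.0 - sim)
        total += lower
    return total / n

def greedy_reduct(X, y):
    """Add the attribute with the highest dependency until it stops improving."""
    selected, best = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining:
        scores = {a: dependency(X, y, selected + [a]) for a in remaining}
        candidate = max(scores, key=scores.get)
        if scores[candidate] <= best + 1e-12:
            break  # dependency no longer changes
        selected.append(candidate)
        remaining.remove(candidate)
        best = scores[candidate]
    return selected, best

# Hypothetical data: attribute 0 separates the classes, attribute 1 is constant
X = [[0.0, 0.5], [0.0, 0.5], [1.0, 0.5], [1.0, 0.5]]
y = [0, 0, 1, 1]
selected, gamma = greedy_reduct(X, y)  # selected == [0], gamma == 1.0
```

The quadratic loop over sample pairs is kept for readability; it is adequate for data sets of the size used in this work but would need vectorizing for larger ones.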

 

3.2 Fuzzy Soft

     Fuzzy soft sets combine fuzzy sets and soft sets. Fuzzification is the
process of converting continuous data into categorical data by assigning a
membership value in the range [0, 1]. Fuzzification is done by applying
membership functions. There are many membership functions, such as the
triangular, sigmoid and trapezoidal membership functions. The triangular
membership function has been selected for this work because of its simple
formula and computational efficiency. In fuzzy soft sets, fuzzification is done
first and normal parameter reduction is done next.
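
As an illustration of the triangular membership function mentioned above, a minimal Python sketch is given here. The parameter names `a`, `b` and `c` (left foot, peak and right foot) are assumptions, since the text does not state how the triangle is parameterized.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 outside (a, c), rising to 1 at the peak b.

    Assumes a < b < c.
    """
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical fuzzy set "medium" on a 0..10 scale
print(triangular(5.0, 0.0, 5.0, 10.0))   # 1.0 at the peak
print(triangular(2.5, 0.0, 5.0, 10.0))   # 0.5 halfway up the left side
```

Only two subtractions and one division are needed per value, which is the computational simplicity the text refers to.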

Definition:

Let $\mathcal{P}(U)$ denote the set of all fuzzy sets of $U$. Let
$A_i \subset E$. A pair $(F_i, A_i)$ is called a fuzzy soft set over $U$, where
$F_i$ is a mapping given by $F_i : A_i \to \mathcal{P}(U)$ [12].

 

4. Classification Based on IFSVM

     After selecting the relevant features from the dataset, the model is
trained with the IFSVM classification technique. In this work a machine
learning technique based on a fuzzy approach has been implemented. One of the
main drawbacks of the standard SVM is that its training process is sensitive to
outliers or noise in the training dataset due to overfitting. In many
real-world applications, because of this overfitting problem, the training
process is particularly sensitive to those sample points which are far away
from their own class in the training dataset. SVM treats all data points with
equal importance in classification problems, but different input points can
make different contributions to the learning of the decision surface. Such
uncertain points may be treated as more important than others when making the
decision, which leads to the problem of overfitting. As fuzzy approaches are
effective in solving uncertain problems, this problem of SVM can be handled
with IFSVM. It is very important to assign each data point in the training
dataset a membership in order to decrease the effect of those outliers or
noise.

     Data points with a large membership value can be treated as more
representative points of their class, whereas points with a small membership
value should be considered less significant, so the contribution of abnormal
data points with minimum membership towards the error is reduced. At present,
fuzzy-based ML techniques face two main difficulties: how to set the fuzzy
memberships and how to decrease the computational complexity. Computing fuzzy
memberships is still a challenge. This work concentrates on two approaches for
computing the fuzzy membership values, namely the iterative approach and fuzzy
clustering. The membership value determines how important it is to classify a
data sample correctly.

4.1 Determining Membership Values

4.1.1 Iterative Approach

The membership values have been calculated only for the misclassified points,
using two membership functions, as follows:

The slack variable is calculated as $\xi_i = \max\{0, 1 - d_i g(x_i)\}$.

For Membership Function 1 (MF1), $\xi_i$ is given as

 (

) =

For Membership Function 2 (MF2), the slack variable is $l_{sig}(\xi_i) = \tanh(\xi_i)$.

The MF1 applied is given below:

$$h_{cnt}(\xi) = h(\xi) = \begin{cases} 1 & \text{if } \xi < 1 \\ 1/\xi & \text{otherwise} \end{cases}$$

where the membership is inversely proportional to the distance from the
hyperplane.

The MF2 employed is presented below:

$$h_{sig}(\xi) = \operatorname{sech}^2(\xi)$$

where $\xi_i = \max\{0, 1 - d_i g(x_i)\}$, $d_i$ is the class label, and
$g(x_i) = w^T \phi(x_i) + b$.

4.1.2 Fuzzy Clustering

The steps applied in fuzzy clustering are as follows:

1) Selected a clustering algorithm.
2) Applied the selected clustering algorithm to the training data set.
3) Determined the subset of clusters that contain both normal and abnormal
   data, and marked this subset as MIXEDCLUS.
4) For each data point $x \in$ MIXEDCLUS, the fuzzy membership value has been
   set to 1.
5) For each data point $x \notin$ MIXEDCLUS, identified the center of the
   cluster closest to $x$ and then calculated the fuzzy membership of $x$ with
   that cluster.

4.2 IFSVM

     In IFSVM, membership values are generated iteratively based on the
positions of the training vectors relative to the SVM decision surface itself.
The IFSVM calculates the membership values without making a-priori assumptions
about the shape of the distribution of the training vectors. This method makes
use of the result of the SVM training process and information about incorrectly
classified training vectors (error vectors) to tune the membership values. The
FSVM is then retrained with these new values, and the process is repeated
either for a fixed number of iterations or until the membership values converge
[14].

ALGORITHM

Step 1: SVM has been applied on the given dataset. Training an SVM on the given
dataset requires the values of $\alpha$, $w$ and $b$. The values
$\alpha_1 \dots \alpha_N$ have been calculated such that

$$Q(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j x_i^T x_j$$

is maximized subject to $\sum_i \alpha_i y_i = 0$ and $0 \le \alpha_i \le C$
for all $\alpha_i$. After calculating $\alpha$, the weight vector has been
calculated as

$$w = \sum_i \alpha_i y_i x_i \quad (0 \le \alpha_i \le C)$$

Then the value of $b$ is obtained using the equation
$$b = \frac{1}{n} \sum_{sv} \Big( y_{sv} - \sum_i \alpha_i y_i \, x_i \cdot x_{sv} \Big)$$

Classification:

     If $w^T \phi(x_i) + b > 0$, the class is 1
     else if $w^T \phi(x_i) + b < 0$, the class is -1

Misclassified points have been determined by comparing the original data with
the predicted classes. MF values have been calculated for each misclassified
point using MF1 and MF2.

Step 2:
     1) Set $s = 1$.
     2) Solved the FSVM dual training problem. The values
        $\alpha_1 \dots \alpha_N$ have been calculated such that

        $$Q(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j x_i^T x_j$$

        is maximized subject to $\sum_i \alpha_i y_i = 0$ and
        $0 \le \alpha_i \le s_i C$ for all $\alpha_i$.
     3) For all $i \in Z_N$ set (where $0 < \mu < 1$):

        $$s_i = s_i^{previous} + \mu \big( h(\xi_i) - s_i^{previous} \big)$$

     4) Stop if the termination condition has been met.
     5) Otherwise repeat from step 2.

The termination conditions employed are:

·       The first (and simpler) termination condition is to stop after a fixed
        number (n) of iterations.
·       The second termination condition is to continue until the rate of
        change of $s$ becomes sufficiently small, indicating that the class
        membership vector $s$ has converged to some value:

        $$|s_i - s_i^{previous}| \le 10^{-3} \quad \forall i \in Z_N$$

Step 3: The threshold value has been fixed based on MF.

Step 4: Prediction:
     If predicted = actual, then final prediction = predicted class
     Else if predicted ≠ actual and threshold value <= MF, then final prediction = predicted class
     Otherwise final prediction = actual class

Step 5: Calculated accuracy.

5. Experimental Results and Analysis

Data set Name    | No. of instances & No. of classes | No. of attributes before FS | No. of attributes after Fuzzy Soft FS | No. of attributes after Fuzzy Rough FS
Thyroid          | 3772, 2 classes                   | 30                          | 21                                    | 16
Thoracic Surgery | 470, 2 classes                    | 16                          | 7                                     | 14
Heart            | 270, 2 classes                    | 13                          | 10                                    | 7
Breast           | 699, 2 classes                    | 9                           | 8                                     | 8

Table 3.1 Comparisons of the different feature selection approaches

Dataset Name     | Classification Technique | Kernel function | Membership function | Before FS | After FR FST | After FS FST
Thyroid          | IFSVM                    | linear          | MF1                 | 96.78     | 93.67        | 97.13
Thyroid          | IFSVM                    | linear          | MF2                 | 94.78     | 98.53        | 98.41
Thyroid          | IFSVM                    | RBF             | MF1                 | 90.61     | 90.62        | 98.89
Thyroid          | IFSVM                    | RBF             | MF2                 | 87.67     | 89.56        | 97.56
Thyroid          | IFSVM                    | Poly            | MF1                 | 93.45     | 97.53        | 97.53
Thyroid          | IFSVM                    | Poly            | MF2                 | 93.5      | 97.53        | 97.93
Breast Cancer    | IFSVM                    | linear          | MF1                 | 99.14     | 99.8         | 98.29
Breast Cancer    | IFSVM                    | linear          | MF2                 | 99.14     | 99           | 97.43
Breast Cancer    | IFSVM                    | RBF             | MF1                 | 99.14     | 99.87        | 99.14
Breast Cancer    | IFSVM                    | RBF             | MF2                 | 99.8      | 99.14        | 96.58
Breast Cancer    | IFSVM                    | Poly            | MF1                 | 99.25     | 99.5         | 98.29
Breast Cancer    | IFSVM                    | Poly            | MF2                 | 99.14     | 99.5         | 68.29
Thoracic surgery | IFSVM                    | linear          | MF1                 | 96        | 99.5         | 99.5
Thoracic surgery | IFSVM                    | linear          | MF2                 | 96.20     | 96           | 99.5
Thoracic surgery | IFSVM                    | RBF             | MF1                 | 99.56     | 98           | 99.5
Thoracic surgery | IFSVM                    | RBF             | MF2                 | 96        | 97           | 99.5
Thoracic surgery | IFSVM                    | Poly            | MF1                 | 96.20     | 98.93        | 97.46
Thoracic surgery | IFSVM                    | Poly            | MF2                 | 95        | 95.23        | 96
Heart Disease    | IFSVM                    | linear          | MF1                 | 99.5      | 95.5         | 99.56
Heart Disease    | IFSVM                    | linear          | MF2                 | 97.7      | 95.5         | 99.5
Heart Disease    | IFSVM                    | RBF             | MF1                 | 99.25     | 97.7         | 98.52
Heart Disease    | IFSVM                    | RBF             | MF2                 | 97.7      | 95.5         | 98.52
Heart Disease    | IFSVM                    | Poly            | MF1                 | 97.7      | 95.5         | 97.7
Heart Disease    | IFSVM                    | Poly            | MF2                 | 95.5      | 93.3         | 97.2

Table 3.2 Comparisons of Performance of the Different Classifiers

6. Conclusion

     In this work, the proposed machine learning techniques based on fuzzy
approaches are used for multi-classification problems. Fuzzy rough and fuzzy
soft approaches have been used for feature selection. Of these two feature
selection approaches, fuzzy soft gives better results, but with high time
complexity. In all methods, IFSVM gives promising results on different medical
data sets. When computing multi-classification problems, unclassifiable regions
may exist if a data point belongs to more than one class or does not belong to
any class. When the number of classes increases, the time complexity also
increases. This problem can be handled by a novel model based on support vector
domain combined with kernel-based fuzzy clustering.
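
As an illustration of the iterative membership update in Step 2 of the algorithm in Section 4.2, a minimal Python sketch follows. It covers only the slack variable, the two membership functions and the update and convergence rules. Retraining the FSVM between updates is deliberately left out, and the function names, the default `mu=0.5` and the sample values are assumptions made for this example.

```python
import math

def slack(d, g):
    """Slack variable xi = max(0, 1 - d * g(x)) for label d and decision value g(x)."""
    return max(0.0, 1.0 - d * g)

def h_cnt(xi):
    """MF1: 1 for xi < 1, otherwise 1 / xi."""
    return 1.0 if xi < 1.0 else 1.0 / xi

def h_sig(xi):
    """MF2: sech^2(xi)."""
    return 1.0 / math.cosh(xi) ** 2

def update_memberships(s_prev, xis, h, mu=0.5):
    """One pass of s_i = s_i_prev + mu * (h(xi_i) - s_i_prev), with 0 < mu < 1."""
    return [s + mu * (h(xi) - s) for s, xi in zip(s_prev, xis)]

def converged(s_new, s_prev, tol=1e-3):
    """Second termination condition: every membership changed by at most tol."""
    return all(abs(a - b) <= tol for a, b in zip(s_new, s_prev))

# Hypothetical labels and decision values for two training points
labels = [1, -1]
decision_values = [1.0, 3.0]          # the second point is badly misclassified
xis = [slack(d, g) for d, g in zip(labels, decision_values)]   # [0.0, 4.0]

s = [1.0, 1.0]                        # Step 2.1: start with s = 1
s_next = update_memberships(s, xis, h_cnt)   # [1.0, 0.625]
print(s_next, converged(s_next, s))
```

In a full implementation the FSVM would be retrained with the bound `s_i * C` after every update, so the slack values would change from one iteration to the next.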