Then the IFSVM is trained with these membership values. In addition, this paper proposes a new approach for reducing the misclassification rate: a threshold is set on the membership values, so that if, after training, a point is still misclassified by the fuzzy approach but its membership value is greater than the threshold, it is predicted as its actual class.

Fig. 1: Proposed Methodology

The paper is organized as follows. The preprocessing of the raw data is presented in Section 2, and the proposed FS approaches are discussed in Section 3. Classification using IFSVM is presented in Section 4. The experimental results and analysis on the four selected data sets are discussed in Section 5, and the work is concluded in Section 6.

2. Preprocessing

In this work, data sets from the UCI Machine Learning Repository have been considered. Before developing a model, the data has to be analyzed and understood in order to know the structure and relevance of its features. The data includes a countable number of missing values across several features, and some feature values are continuous while others are discrete. Noise may also be present in the data, which demands cleaning when preparing the dataset for classification analysis [15]. In this hybrid approach, as part of cleaning, missing attribute values have been replaced by the mean of all values of that attribute, and the continuous values have been discretized using the equal-area method.
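A minimal sketch of the two cleaning steps described above, assuming "equal area" means equal-frequency binning (each bin covers the same area of the histogram); the function names and bin count are illustrative, not from the paper:

```python
import numpy as np

def impute_mean(col):
    """Replace missing values (NaN) with the mean of the observed values."""
    col = col.astype(float)
    col[np.isnan(col)] = np.nanmean(col)
    return col

def discretize_equal_area(col, bins=4):
    """Equal-area (equal-frequency) discretization: interior quantiles are
    used as bin edges, so each bin holds roughly the same number of samples."""
    edges = np.quantile(col, np.linspace(0, 1, bins + 1)[1:-1])
    return np.digitize(col, edges)

x = np.array([1.0, np.nan, 3.0, 4.0, 100.0, 2.0, np.nan, 5.0])
x = impute_mean(x)
labels = discretize_equal_area(x, bins=4)
```

Mean imputation keeps the attribute's overall average unchanged, and quantile-based edges make the discretization robust to the outlier (100.0) in a way that equal-width bins would not be.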

3. Feature Selection

After preprocessing, feature subset selection has been employed. FS refers to the problem of selecting those input attributes that are most predictive of a given outcome. Unlike other dimensionality reduction methods, feature selectors preserve the original meaning of the features after reduction. FS is mainly employed when a huge number of features, or a large feature space, is present in the given dataset, because noisy, irrelevant, and redundant features in the feature space may lead to poor classification performance. FS techniques have also been applied to small and medium-sized datasets in order to locate the most informative features for later use. The importance of feature selection lies in reducing the problem size, thereby improving the quality and speed of classification. In this work, FS has been performed based on fuzzy rough sets and fuzzy soft sets.

3.1 Fuzzy Rough

The most important and widely used concept of fuzzy rough sets was originated by Dubois and Prade [11]. It demonstrates the power of fuzzy-rough set theory in handling the vagueness and uncertainty often present in data. Because of the fuzzy nature of most real-world problems, extending rough approximation into fuzzy space helps in solving them. The procedure is as follows: first, fuzzification is performed; then the dependency of each attribute is calculated and the attribute with the highest dependency is selected; this selection is repeated until the dependency no longer changes. Finally, an attribute set with reduced features is obtained.
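The greedy selection loop described above can be sketched generically. This is an illustrative sketch only: the `dependency` function is a placeholder for the fuzzy-rough dependency degree (not implemented here), and the stopping rule "until dependency does not change" is realized as "stop when no candidate improves it":

```python
def greedy_reduct(features, dependency):
    """Greedy forward selection: repeatedly add the feature that most
    increases the dependency measure of the selected subset, stopping
    when the dependency no longer improves."""
    selected, best = [], 0.0
    candidates = set(features)
    while candidates:
        gains = {f: dependency(selected + [f]) for f in candidates}
        f_best = max(gains, key=gains.get)
        if gains[f_best] <= best:
            break  # dependency did not change; reduct found
        best = gains[f_best]
        selected.append(f_best)
        candidates.remove(f_best)
    return selected

# Toy dependency measure for demonstration: only 'a' and 'b' matter.
dep = lambda s: len(set(s) & {'a', 'b'}) / 2.0
reduct = greedy_reduct(['a', 'b', 'c', 'd'], dep)
```

With the toy measure, the loop picks 'a' and 'b' (in some order) and then stops, since adding 'c' or 'd' leaves the dependency unchanged.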

3.2 Fuzzy soft

Fuzzy soft sets are the combination of fuzzy sets and soft sets. Fuzzification is the process of converting continuous data into categorical data by assigning a membership value in the range [0, 1]; it is done by applying membership functions. There are many membership functions, such as the triangular, sigmoid, and trapezoidal membership functions. The triangular membership function has been selected for this work because of its simple formula and computational efficiency. In fuzzy soft sets, fuzzification is performed first, followed by normal parameter reduction.
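The triangular membership function chosen above has a three-parameter closed form (left foot a, peak b, right foot c); a minimal sketch:

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 outside (a, c), rising linearly to 1 at the
    peak b and falling linearly back to 0 at c. Requires a < b < c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)   # rising edge
    return (c - x) / (c - b)       # falling edge
```

Its simplicity is the point: each evaluation is one comparison and at most one division, which keeps fuzzification cheap over large datasets.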

Definition: Let F(U) denote the set of all fuzzy sets of U, and let Ai ⊆ E. A pair (Fi, Ai) is called a fuzzy soft set over U, where Fi is a mapping given by Fi : Ai → F(U) [12].

4. Classification Based on IFSVM

After selecting the relevant features from the dataset, the IFSVM classification technique is trained. In this work, a fuzzy-approach-based machine learning technique has been implemented. One of the main drawbacks of the standard SVM is that its training process is sensitive to outliers or noise in the training dataset due to overfitting. In many real-world applications, because of this overfitting problem, SVM training is particularly sensitive to sample points that lie far away from their own class in the training dataset. The SVM considers all data points with equal importance in classification problems, but different input points can make different contributions to learning the decision surface, and such uncertain points may carry more weight than others in the decision, which leads to overfitting. As fuzzy approaches are effective in solving uncertain problems, this problem of the SVM can be handled with IFSVM. It is very important to assign each data point in the training dataset a membership value in order to decrease the effect of outliers and noise.

Data points with a large membership value can be treated as more representative of their class, whereas points with a small membership value are considered less significant; in this way the contribution towards the error of abnormal data points with minimum membership is reduced. At present, fuzzy-based ML techniques face two main difficulties: how to set the fuzzy memberships and how to decrease the computational complexity; computing the fuzzy memberships in particular is still a challenge. This work concentrates on two approaches for computing the fuzzy membership values, namely an iterative approach and fuzzy clustering. The membership value determines how important it is to classify a data sample correctly.

4.1 Determining Membership Values

4.1.1 Iterative Approach

The membership values are calculated only for the misclassified points, using two membership functions, as follows.

First the slack variable is calculated: ξi = max{0, 1 − di g(xi)}.

For Membership Function 1 (MF1), the membership is given by

hcnt(ξ) = 1, if ξ < 1
hcnt(ξ) = 1/ξ, otherwise

where the membership is inversely proportional to the distance from the hyperplane.

For Membership Function 2 (MF2), the slack variable is transformed as lsig(ξi) = tanh(ξi), and the membership employed is

hsig(ξ) = sech²(ξ)

where ξi = max{0, 1 − di g(xi)}, di is the class label, and g(xi) = wTφ(xi) + b.

4.1.2 Fuzzy Clustering

The steps applied in fuzzy clustering are as follows:
1) Select a clustering algorithm.
2) Apply the selected clustering to the training data set.
3) Determine the subset of clusters that contain both normal and abnormal data, and mark this subset as MIXEDCLUS.
4) For each data point x ∈ MIXEDCLUS, set the fuzzy membership value to 1.
5) For each data point x not in MIXEDCLUS, identify the center of the cluster closest to x and calculate the fuzzy membership of x with respect to that cluster.

4.2 IFSVM

In IFSVM, membership values are generated iteratively based on the positions of the training vectors relative to the SVM decision surface itself. The membership values are calculated using an IFSVM that makes no a-priori assumptions about the shape of the distribution of the training vectors. This method makes use of the result of the SVM training process, together with information about incorrectly classified training vectors (error vectors), to tune the membership values. The FSVM is then retrained with these new values, and the process is repeated either for a fixed number of iterations or until the membership values converge [14].

ALGORITHM

Step 1: SVM is applied on the given dataset. To train the SVM on the given dataset, the α, w, and b values must be calculated. The values α1…αN are calculated such that

Q(α) = Σi αi − ½ Σi Σj αi αj yi yj xiT xj

is maximized subject to Σi αi yi = 0 and 0 ≤ αi ≤ C for all i. After calculating α, the weight vector is calculated as

w = Σi αi yi xi (0 ≤ αi ≤ C)

Then b is obtained using the equation

b = (1/n) Σsv (ysv − Σi αi yi xi · xsv)

Classification: if wTφ(x) + b > 0 the predicted class is 1, otherwise (wTφ(x) + b < 0) it is −1.

Misclassified points are determined by comparing the original classes with the predicted classes. MF values are then calculated for each misclassified point using MF1 and MF2.
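A minimal sketch of the slack variable and the two membership functions MF1 and MF2 mentioned above (sech²(ξ) is computed as 1/cosh²(ξ)); function names are illustrative:

```python
import math

def slack(d, g):
    """xi = max(0, 1 - d*g(x)); d is the +/-1 class label, g the decision value."""
    return max(0.0, 1.0 - d * g)

def mf1(xi):
    """MF1 (h_cnt): membership 1 inside the margin (xi < 1), otherwise
    inversely proportional to the slack, i.e. to the distance past the margin."""
    return 1.0 if xi < 1.0 else 1.0 / xi

def mf2(xi):
    """MF2 (h_sig): sech^2(xi), decaying smoothly as the slack grows."""
    return 1.0 / math.cosh(xi) ** 2
```

Correctly classified points beyond the margin get slack 0, so both functions assign them full membership 1; badly misclassified points (large ξ) receive memberships near 0 and therefore contribute little to the retraining error.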
Step 2:
1) Set s = 1.
2) Solve the FSVM dual training problem: the values α1…αN are calculated such that Q(α) = Σi αi − ½ Σi Σj αi αj yi yj xiT xj is maximized subject to Σi αi yi = 0 and 0 ≤ αi ≤ si C for all i.
3) For all i ∈ ZN, set (where 0 < µ < 1):
si = si^prev + µ(h(ξi) − si^prev)
4) Stop if a termination condition has been met.
5) Otherwise repeat from step 2.

The termination conditions employed are:
· The first (and simpler) termination condition is to stop after a fixed number (n) of iterations.
· The second termination condition is to continue until the rate of change of s becomes sufficiently small, indicating that the class membership vector s has converged to some value: |si − si^prev| ≤ 10⁻³ ∀ i ∈ ZN.
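The damped membership update and the convergence check can be sketched as follows (the FSVM dual solve itself is omitted; `h` is whichever membership function, MF1 or MF2, is in use):

```python
def update_memberships(s_prev, xi, h, mu=0.5, tol=1e-3):
    """One iteration of the membership update:
        s_i <- s_i + mu * (h(xi_i) - s_i),  with 0 < mu < 1.
    Returns the new membership vector and whether it has converged
    (max per-component change <= tol)."""
    s_new = [s + mu * (h(x) - s) for s, x in zip(s_prev, xi)]
    converged = max(abs(a - b) for a, b in zip(s_new, s_prev)) <= tol
    return s_new, converged

# One step with MF1-style memberships: a point inside the margin keeps s=1,
# a badly misclassified point (xi=4 -> h=0.25) is pulled halfway toward 0.25.
h = lambda x: 1.0 if x < 1.0 else 1.0 / x
s, done = update_memberships([1.0, 1.0], [0.5, 4.0], h, mu=0.5)
```

Because 0 < µ < 1, each si moves only part of the way toward h(ξi) per iteration, which damps oscillation between successive FSVM retrainings; repeated application drives the change below the 10⁻³ tolerance.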
Step 3: the threshold value is fixed based on the MF.
Step 4: Prediction:
If predicted = actual, then final prediction = predicted class.
Else, if predicted ≠ actual and threshold ≤ MF, then final prediction = predicted class.
Otherwise, final prediction = actual class.
Step 5: calculate accuracy.
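The Step 4 decision rule above is a small branch; a minimal sketch (names illustrative):

```python
def final_prediction(predicted, actual, mf, threshold):
    """Step 4 rule: keep the predicted class when it matches the actual
    class, or when it disagrees but its membership value clears the
    threshold; otherwise fall back to the actual class."""
    if predicted == actual:
        return predicted
    if mf >= threshold:
        return predicted  # confident disagreement is trusted
    return actual
```

The threshold thus acts as a confidence gate: only misclassifications made with sufficiently high membership are allowed to stand.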
5. Experimental Results and Analysis
Dataset          | Instances, Classes | Attributes before FS | After fuzzy soft FS | After fuzzy rough FS
Thyroid          | 3772, 2 classes    | 30                   | 21                  | 16
Thoracic Surgery | 470, 2 classes     | 16                   | 7                   | 14
Heart            | 270, 2 classes     | 13                   | 10                  | 7
Breast           | 699, 2 classes     | 9                    | 8                   | 8

Table 3.1: Comparison of the different feature selection approaches
Dataset          | Technique | Kernel | MF  | Before FS | After fuzzy rough FS | After fuzzy soft FS
Thyroid          | IFSVM     | linear | MF1 | 96.78     | 93.67                | 97.13
Thyroid          | IFSVM     | linear | MF2 | 94.78     | 98.53                | 98.41
Thyroid          | IFSVM     | RBF    | MF1 | 90.61     | 90.62                | 98.89
Thyroid          | IFSVM     | RBF    | MF2 | 87.67     | 89.56                | 97.56
Thyroid          | IFSVM     | Poly   | MF1 | 93.45     | 97.53                | 97.53
Thyroid          | IFSVM     | Poly   | MF2 | 93.5      | 97.53                | 97.93
Breast Cancer    | IFSVM     | linear | MF1 | 99.14     | 99.8                 | 98.29
Breast Cancer    | IFSVM     | linear | MF2 | 99.14     | 99                   | 97.43
Breast Cancer    | IFSVM     | RBF    | MF1 | 99.14     | 99.87                | 99.14
Breast Cancer    | IFSVM     | RBF    | MF2 | 99.8      | 99.14                | 96.58
Breast Cancer    | IFSVM     | Poly   | MF1 | 99.25     | 99.5                 | 98.29
Breast Cancer    | IFSVM     | Poly   | MF2 | 99.14     | 99.5                 | 68.29
Thoracic Surgery | IFSVM     | linear | MF1 | 96        | 99.5                 | 99.5
Thoracic Surgery | IFSVM     | linear | MF2 | 96.20     | 96                   | 99.5
Thoracic Surgery | IFSVM     | RBF    | MF1 | 99.56     | 98                   | 99.5
Thoracic Surgery | IFSVM     | RBF    | MF2 | 96        | 97                   | 99.5
Thoracic Surgery | IFSVM     | Poly   | MF1 | 96.20     | 98.93                | 97.46
Thoracic Surgery | IFSVM     | Poly   | MF2 | 95        | 95.23                | 96
Heart Disease    | IFSVM     | linear | MF1 | 99.5      | 95.5                 | 99.56
Heart Disease    | IFSVM     | linear | MF2 | 97.7      | 95.5                 | 99.5
Heart Disease    | IFSVM     | RBF    | MF1 | 99.25     | 97.7                 | 98.52
Heart Disease    | IFSVM     | RBF    | MF2 | 97.7      | 95.5                 | 98.52
Heart Disease    | IFSVM     | Poly   | MF1 | 97.7      | 95.5                 | 97.7
Heart Disease    | IFSVM     | Poly   | MF2 | 95.5      | 93.3                 | 97.2

Table 3.2: Comparison of the performance of the different classifiers
6. Conclusion
In this work, the proposed fuzzy-approach-based machine learning techniques are used for classification problems. Fuzzy rough and fuzzy soft approaches have been used for feature selection; among these, fuzzy soft gives better results, though with higher time complexity. In all methods, IFSVM gives promising results on the different medical data sets. In multi-class classification, unclassifiable regions may exist when a data point belongs to more than one class or does not belong to any class, and as the number of classes increases the time complexity also increases. This problem can be handled by a novel model based on a support vector domain combined with kernel-based fuzzy clustering.