Adversarial Regularized Class Incremental Autoencoder Technique with Convolution Neural Network for Secure Clustering of Short Text in the High Dimensional Data

Kiruthika B, Dr. B. Srinivasan, Dr. P. Prabhusundhar

Abstract

Clustering of high-dimensional data with differential privacy has recently gained significant attention in large-scale distributed cloud data centers, where the goal is to cluster high-dimensional data while keeping the outsourced data secure. In particular, a deep learning architecture based on a deep adversarial regularized hierarchical autoencoder provides an efficient clustering solution for high-dimensional data containing nonlinear long text. However, it suffers from class imbalance, which appears as over-fitting and under-fitting, and it has high computational complexity. To mitigate these challenges, a class-incremental deep adversarial regularized multi-view autoencoder technique is proposed in this work. Initially, missing values are predicted using factor analysis, and dimensionality reduction and data normalization are applied to the high-dimensional data using principal component analysis. Principal component analysis eliminates the reconstruction errors that short text causes in the dimension space and projects the data to a sparse matrix. Discriminative features are then selected from the sparse matrix using nonlinear discriminant analysis. The selected features are fed to a convolutional neural network, which forms the clusters using a fully connected layer with an activation function followed by a softmax layer. The convolutional neural network can cluster structurally similar attributes and their information in the normalized short text as the number of classes increases. The proposed model maximizes intra-cluster similarity and inter-cluster variation by computing the data affinity of the new representation. Further, the autoencoder model encodes the cluster structure, including attribute information, by reconstructing the adjacency matrix. The decoder model retrieves or suggests data records on the basis of the attribute similarity or structural similarity of the record attributes or user attributes of the high-dimensional data.
Detailed experiments were performed on benchmark datasets to compare the performance of the proposed model with conventional approaches using cross-validation. The results indicate that the proposed architecture achieves good accuracy and effectiveness on high-dimensional data containing short text.
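
The data-affinity idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes cosine similarity as the affinity measure, and the helper names `cosine` and `cluster_affinity`, as well as the toy vectors and labels, are hypothetical. It compares the mean similarity of pairs inside the same cluster with the mean similarity of pairs from different clusters.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cluster_affinity(vectors, labels):
    """Return (mean intra-cluster similarity, mean inter-cluster similarity).

    Assumption: affinity is measured by cosine similarity over all pairs.
    """
    intra, inter = [], []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            sim = cosine(vectors[i], vectors[j])
            if labels[i] == labels[j]:
                intra.append(sim)
            else:
                inter.append(sim)

    def mean(xs):
        return sum(xs) / len(xs) if xs else 0.0

    return mean(intra), mean(inter)


# Toy representation of four short texts assigned to two clusters (made-up values).
vectors = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [0.0, 0.1, 1.0],
    [0.1, 0.0, 0.9],
]
labels = [0, 0, 1, 1]

intra, inter = cluster_affinity(vectors, labels)
print(f"intra-cluster similarity: {intra:.3f}")
print(f"inter-cluster similarity: {inter:.3f}")
```

Under this assumption, a well-separated clustering shows a high intra-cluster value and a low inter-cluster value. The pairwise loop is quadratic in the number of records, so it is suitable only for small illustrative inputs.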
