Code:-
JPJA2311
Abstract:-
As a side effect of increasingly popular social media, cyber bullying has emerged as a serious problem afflicting children, adolescents and young adults. Machine learning techniques make automatic detection of bullying messages in social media possible, and this could help to construct a healthy and safe social media environment. In this research area, one critical issue is learning a robust and discriminative numerical representation of text messages. In this paper, we propose a new representation learning method to tackle this problem. Our method, named Semantic-Enhanced Marginalized Denoising Auto-Encoder (smSDA), is developed via semantic extension of a popular deep learning model, the stacked denoising autoencoder. The semantic extension consists of semantic dropout noise and sparsity constraints, where the semantic dropout noise is designed based on domain knowledge and the word embedding technique. Our proposed method is able to exploit the hidden feature structure of bullying information and learn a robust and discriminative representation of text. Comprehensive experiments on two public cyber bullying corpora (Twitter and MySpace) are conducted, and the results show that our proposed approaches outperform other baseline text representation learning methods.
Existing System:-
Disadvantages of Existing System:-
- The first and also critical step is numerical representation learning for text messages.
- Secondly, cyber bullying is hard to describe and judge from a third-party view due to its intrinsic ambiguities.
- Thirdly, to protect Internet users and their privacy, most bullying posts are deleted, so only a small portion of messages is left on the Internet.
Proposed System:-
- Three kinds of information, including text, user demography, and social network features, are often used in cyber bullying detection. Since the text content is the most reliable, our work here focuses on text-based cyber bullying detection.
- In this paper, we investigate one deep learning method named stacked denoising autoencoder (SDA). SDA stacks several denoising autoencoders and concatenates the output of each layer as the learned representation. Each denoising autoencoder in SDA is trained to recover the input data from a corrupted version of it. The input is corrupted by randomly setting some of the input features to zero, which is called dropout noise. This denoising process helps the autoencoders to learn a robust representation.
- In addition, each autoencoder layer is intended to learn an increasingly abstract representation of the input.
- In this paper, we develop a new text representation model based on a variant of SDA: marginalized stacked denoising autoencoders (mSDA), which adopts linear instead of nonlinear projection to accelerate training and marginalizes the infinite noise distribution in order to learn more robust representations.
- We utilize semantic information to expand mSDA and develop Semantic-enhanced Marginalized Stacked Denoising Autoencoders (smSDA). The semantic information consists of bullying words. An automatic extraction of bullying words based on word embeddings is proposed so that the human labor involved can be reduced. During training of smSDA, we attempt to reconstruct bullying features from other normal words by discovering the latent structure, i.e., the correlation, between bullying and normal words. The intuition behind this idea is that some bullying messages do not contain bullying words. The correlation information discovered by smSDA helps to reconstruct bullying features from normal words, and this in turn facilitates detection of bullying messages that contain no bullying words.
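The dropout noise described above can be illustrated with a minimal Java sketch. It is not the project's actual code: the class name `DropoutNoise`, the method `corrupt`, and the sample vector are hypothetical, and the sketch covers only the plain corruption step (each input feature is zeroed independently with a fixed probability), not the marginalized training of mSDA or the semantic variant used by smSDA.

```java
import java.util.Arrays;
import java.util.Random;

/** Illustrative dropout-noise corruption of a bag-of-words vector (hypothetical helper). */
final class DropoutNoise {

    /**
     * Returns a corrupted copy of the input: each feature is set to zero
     * independently with probability dropProb. The input array is not modified.
     */
    static double[] corrupt(double[] input, double dropProb, Random rng) {
        if (dropProb < 0.0 || dropProb > 1.0) {
            throw new IllegalArgumentException("dropProb must be in [0, 1]");
        }
        double[] corrupted = Arrays.copyOf(input, input.length);
        for (int i = 0; i < corrupted.length; i++) {
            // nextDouble() is uniform in [0, 1), so the feature is dropped with probability dropProb
            if (rng.nextDouble() < dropProb) {
                corrupted[i] = 0.0;
            }
        }
        return corrupted;
    }

    public static void main(String[] args) {
        // Toy term counts for one message (assumed data, for illustration only)
        double[] bow = {1, 0, 2, 1, 0, 3};
        double[] noisy = corrupt(bow, 0.5, new Random(42));
        System.out.println("original : " + Arrays.toString(bow));
        System.out.println("corrupted: " + Arrays.toString(noisy));
    }
}
```

A denoising autoencoder would then be trained to map `noisy` back to `bow`, which is what pushes it toward features that survive missing words.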
Advantages of Proposed System:-
- Our proposed Semantic-enhanced Marginalized Stacked Denoising Autoencoder is able to learn robust features from the bag-of-words (BoW) representation in an efficient and effective way. These robust features are learned by reconstructing the original input from corrupted (i.e., missing) ones. The new feature space can improve the performance of cyber bullying detection even with a small labeled training corpus.
- Semantic information is incorporated into the reconstruction process by designing semantic dropout noise and imposing sparsity constraints on the mapping matrix. In our framework, high-quality semantic information, i.e., bullying words, can be extracted automatically through word embeddings.
- Finally, these specialized modifications make the new feature space more discriminative, and this in turn facilitates bullying detection.
- Comprehensive experiments on real data sets have verified the performance of our proposed model.
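One plausible way to extract bullying words automatically through word embeddings is to start from a few seed words and keep every vocabulary word whose embedding is close to a seed. The Java sketch below assumes cosine similarity and a fixed threshold; the class `BullyingWordExpansion`, the toy two-dimensional vectors, the seed word, and the threshold value are all hypothetical and are not taken from the project.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative seed-word expansion over word embeddings (hypothetical helper). */
final class BullyingWordExpansion {

    /** Cosine similarity of two equal-length vectors; returns 0 if either has zero norm. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns non-seed words whose similarity to at least one seed reaches the threshold. */
    static List<String> expand(Map<String, double[]> embeddings, List<String> seeds, double threshold) {
        List<String> expanded = new ArrayList<>();
        for (Map.Entry<String, double[]> entry : embeddings.entrySet()) {
            if (seeds.contains(entry.getKey())) {
                continue; // seeds are already known bullying words
            }
            for (String seed : seeds) {
                double[] seedVector = embeddings.get(seed);
                if (seedVector != null && cosine(entry.getValue(), seedVector) >= threshold) {
                    expanded.add(entry.getKey());
                    break; // one matching seed is enough
                }
            }
        }
        return expanded;
    }

    public static void main(String[] args) {
        // Toy embeddings (assumed values); real vectors would come from a trained model
        Map<String, double[]> embeddings = new LinkedHashMap<>();
        embeddings.put("stupid", new double[] {0.9, 0.1});
        embeddings.put("dumb", new double[] {0.85, 0.2});
        embeddings.put("hello", new double[] {0.1, 0.95});
        embeddings.put("idiot", new double[] {0.8, 0.3});
        System.out.println(expand(embeddings, Arrays.asList("stupid"), 0.9));
    }
}
```

The threshold trades coverage against precision: a lower value pulls in more candidate words but also more ordinary ones, so in practice it would need tuning on the target corpus.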
Hardware Requirements:-
Software Requirements:-
- Operating System : Windows 10/11.
- Coding Language : Java.
- Frontend : JSP, HTML, CSS, JavaScript.
- IDE Tool : Apache NetBeans IDE 16.
- Database : MySQL.
Cost:-
Rs 2000
Detection of Bullying Messages in Social Media
Additional Information
Tools Used:
Java
Cost:
Rs 2000