FusionNet is a neural network model designed to improve sentiment analysis by combining semantic and statistical representations of text. It leverages GloVe-based embeddings and combines convolutional and recurrent networks for better contextual understanding of input data.
Its host repository contains a comparison with classical machine-learning techniques (Naïve Bayes, Support Vector Machines) and contemporary approaches such as fine-tuned transformers.
FusionNet can be integrated with customer relationship management systems and recommendation engines to provide insights into the latest cinematic trends.
Project Overview
The project explores multiple machine learning techniques for sentiment analysis on the Rotten Tomatoes dataset, including classical methods like Naive Bayes and modern approaches such as FusionNet. The final model integrates semantic embeddings and statistical preprocessing to enhance performance. FusionNet builds upon the architecture proposed by Luo et al. (2022), employing GloVe embeddings and advanced neural networks for optimal sentiment prediction.
Project Purpose
This project was the winning submission to an intraclass Datathon held by IE Adjunct Professor Alejandro Vaca Serrano.
The final product was inspired by the need to overcome the limitations of traditional machine learning methods in text vectorization, particularly in capturing word order and meaning. By leveraging advanced neural network architectures, FusionNet aims to enhance classification accuracy.
Technical Details
FusionNet was constructed with the following:
- Embeddings: Pretrained GloVe word embeddings (100-dimensional vectors).
- Model Architecture: A combination of convolutional layers, GRUs, and feed-forward networks.
- Frameworks and Libraries: PyTorch and PyTorch Lightning.
- Training Strategies: Includes techniques like early stopping, hyperparameter tuning, and binary cross-entropy loss for optimization.
- Datasets: Rotten Tomatoes dataset and IMDb reviews, with preprocessing to handle tokenization, stop-word removal, and padding.
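The components above can be sketched in PyTorch as a minimal FusionNet-style model: a GloVe-initialized embedding layer feeding parallel convolutional and bidirectional-GRU branches, whose features are concatenated and passed to a feed-forward head producing a single logit for binary cross-entropy. The layer sizes, kernel size, and fusion scheme here are illustrative assumptions, not the project's exact configuration.

```python
import torch
import torch.nn as nn

class FusionNetSketch(nn.Module):
    """Illustrative CNN + GRU fusion model for binary sentiment classification."""

    def __init__(self, vocab_size: int, embed_dim: int = 100,
                 conv_channels: int = 64, gru_hidden: int = 64):
        super().__init__()
        # In the project, weights would be initialized from pretrained
        # 100-dimensional GloVe vectors, e.g. embedding.weight.data.copy_(...)
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.conv = nn.Conv1d(embed_dim, conv_channels, kernel_size=3, padding=1)
        self.gru = nn.GRU(embed_dim, gru_hidden, batch_first=True,
                          bidirectional=True)
        self.head = nn.Sequential(
            nn.Linear(conv_channels + 2 * gru_hidden, 64),
            nn.ReLU(),
            nn.Linear(64, 1),  # single logit, trained with BCEWithLogitsLoss
        )

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        emb = self.embedding(token_ids)                        # (B, T, E)
        conv_out = torch.relu(self.conv(emb.transpose(1, 2)))  # (B, C, T)
        conv_feat = conv_out.max(dim=2).values                 # global max pool -> (B, C)
        _, h_n = self.gru(emb)                                 # h_n: (2, B, H)
        gru_feat = torch.cat([h_n[0], h_n[1]], dim=1)          # (B, 2H)
        fused = torch.cat([conv_feat, gru_feat], dim=1)        # (B, C + 2H)
        return self.head(fused).squeeze(-1)                    # (B,) logits

model = FusionNetSketch(vocab_size=5000)
logits = model(torch.randint(1, 5000, (4, 20)))  # batch of 4 padded sequences
```

Training would pair these logits with `nn.BCEWithLogitsLoss`, matching the binary cross-entropy objective listed above.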
Challenges and Solutions
Challenge One: Balancing the need for high-dimensional embeddings with computational efficiency.
- Solution: Conducting sensitivity analyses to determine the optimal embedding dimensions and kernel sizes.
Challenge Two: Training on large datasets with limited resources.
- Solution: Utilizing Google Colab's GPU resources and adopting half-precision floating-point (fp16) training to reduce computational overhead.
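A configuration sketch of how fp16 training and early stopping might be wired up with the PyTorch Lightning `Trainer` (Lightning 2.x API); the monitored metric name `"val_loss"` is an assumption about what the LightningModule logs, not confirmed by the source.

```python
import lightning.pytorch as pl
from lightning.pytorch.callbacks import EarlyStopping

trainer = pl.Trainer(
    accelerator="gpu",          # e.g. a Colab GPU runtime
    precision="16-mixed",       # fp16 mixed precision to cut memory and compute
    max_epochs=20,
    callbacks=[EarlyStopping(monitor="val_loss", patience=3, mode="min")],
)
# trainer.fit(model, train_dataloader, val_dataloader)
```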
Collaboration and Teamwork
The project was developed collaboratively, incorporating insights from published research and leveraging feedback from peers and professors. The teamwork facilitated rigorous experimentation and implementation of cutting-edge methodologies.
Learning and Takeaways
- Gained practical experience in designing and optimizing neural networks for sentiment analysis.
- Developed skills in preprocessing text data, implementing machine learning pipelines, and tuning hyperparameters for neural network models.
Future Development
- Expanding the model to handle multi-class sentiment analysis.
- Exploring alternative embeddings like FastText or Word2Vec for comparison.