Imbalanced Data Analyzer

During my final semester of college, I took a course in Data Mining. The professor, Nitesh Chawla, is also my research advisor and the director of iCeNSA. He said that my group and I could continue my research project for the class. Professor Chawla wanted a prototype of a one-stop-shop web framework for the automated classification of large, imbalanced datasets.

The Solution

From a framework perspective, Django made the most sense because it let me use the scientific and mathematical libraries that Python provides, specifically Pandas and Scikit-Learn. To build the API, I used Django REST framework, and Django's built-in admin panel handled uploading and storing the dataset files we tested the application on.
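Once a dataset file is uploaded, a natural first check is how skewed its classes are. The sketch below is a minimal illustration of that check, not the project's code. It uses only the standard library's `csv` module so it runs anywhere, and it assumes the class label sits in a column named `label`. That column name, the helper names, and the toy rows are all hypothetical.

```python
import csv
import io
from collections import Counter


def class_distribution(csv_file, target_column):
    """Count how many rows belong to each class in a CSV dataset.

    `target_column` is the (hypothetical) name of the column holding the class label.
    """
    reader = csv.DictReader(csv_file)
    return Counter(row[target_column] for row in reader)


def imbalance_ratio(counts):
    """Size of the largest class divided by the smallest (1.0 means balanced)."""
    return max(counts.values()) / min(counts.values())


# An in-memory stand-in for an uploaded dataset file.
sample = io.StringIO(
    "feature_a,feature_b,label\n"
    "0.1,1.2,negative\n"
    "0.4,0.9,negative\n"
    "0.3,1.1,negative\n"
    "0.2,1.0,negative\n"
    "2.5,3.1,positive\n"
)
counts = class_distribution(sample, "label")
print(counts)                   # Counter({'negative': 4, 'positive': 1})
print(imbalance_ratio(counts))  # 4.0
```

With Pandas, `df["label"].value_counts()` returns the same per-class counts for a loaded DataFrame `df`.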

[Figure: The dashboard showing the results of each data analysis]

Whenever a user creates an analysis from the front-end application, the back end stores it and then processes it in a background Celery task that runs the selected classifier. When the task finishes, the resulting accuracy, precision, F1 score, and support are saved to the database. The task also renders the precision graphs and ROC curves as PNG images, saves them, and sends them back to the front end. In the end, the project served as a successful proof of concept and a starting point for the one-stop shop.
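On imbalanced data, accuracy alone can look excellent while the minority class is never predicted, which is why per-class precision, F1 score, and support are worth saving alongside it. The sketch below computes those numbers by hand with the standard library so the definitions are visible. It is an illustration under assumptions, not the project's implementation: the function name and toy labels are hypothetical, and the project presumably obtained these values from Scikit-Learn inside its Celery task.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return overall accuracy plus per-class precision, recall, F1, and support."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)

    support = Counter(y_true)    # number of true instances of each class
    predicted = Counter(y_pred)  # number of times each class was predicted
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    per_class = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = true_pos[label]
        # Undefined ratios (division by zero) are reported as 0.0.
        precision = tp / predicted[label] if predicted[label] else 0.0
        recall = tp / support[label] if support[label] else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class[label] = {"precision": precision, "recall": recall,
                            "f1": f1, "support": support[label]}
    return accuracy, per_class


# A classifier that always predicts the majority class on a 9:1 dataset.
y_true = ["neg"] * 9 + ["pos"]
y_pred = ["neg"] * 10
accuracy, report = classification_metrics(y_true, y_pred)
print(accuracy)       # 0.9
print(report["pos"])  # {'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'support': 1}
```

The 90% accuracy hides an F1 score of zero on the minority class. In Scikit-Learn, `sklearn.metrics.precision_recall_fscore_support` and `sklearn.metrics.classification_report` report the same per-class values.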