American inventor Dean Kamen once said, “Every once in a while, a new technology, an old problem, and a big idea turn into an innovation.” In December 2018, those words came to mind as 32 teams of graduate students participated in the UIC Student Project Expo, organized through the University of Illinois Business College’s capstone graduate course in the Business School’s Information and Decision Sciences curriculum.
The capstone is a project-based course in which student teams of four (on average), at both the undergraduate and graduate levels, carry out an information systems and data analysis project. Projects are typically sponsored by a university researcher or a company from private industry.
This fall, Ekta co-sponsored a project with Sigma Chi, the national fraternal organization, to identify strategies for consolidating over 200 databases. Our team, led by Saumya Agrawal, a student in the Master’s program, won the grand prize, beating out competition that included many much larger and well-established players such as Weber, FCB, UI Health and Hub Group. The projects were judged in three categories, and we emerged victorious in quality, presentation and project output.
The Ekta Team Challenge
The Ekta team consisted of four members: Saumya Agrawal (team captain), Ajinkya Tope, Mohammed Rehan, and Srija Gupta. The group presented a groundbreaking data normalization and unification project that demonstrated a path to resolving the multiple data sources, along with custom business metrics indicating the value of a “true-up” between any two databases.
In August, the Sigma Chi organization asked Ekta to develop a systematic, automated approach to semantic mapping, data unification, cleansing, and characterization of the amount, rate and ordinal drift between multiple data sources.
The Data Normalization and Unification Project (UNICON for short!) operates in six steps:
1. Ingesting The Databases. The first step was converting all databases to Excel and loading them into a local SQL database.
2. Data Wrangling. The data was converted into a standard format.
3. Exploratory Data Analysis. The National database and the Chapter databases were analyzed separately.
4. Deploying Database On Cloud. After the database was corrected and unified, it was deployed to Azure for accessibility and usability.
5. Schema Matching. Columns of the National and Chapter databases were mapped using the Levenshtein distance algorithm and a bag-of-words approach.
6. Developing A Semi-supervised Tool. A web application was built to automate the process and assist humans in resolving database record conflicts.
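For readers curious about the schema matching in step 5, here is a minimal Python sketch of how Levenshtein distance and a bag-of-words comparison could be combined to map Chapter columns onto National columns. It is an illustration only, not the team's actual code: the function names, the sample column names, the 0.6 threshold, and the choice to take the higher of the two scores are all assumptions made for this example.

```python
import re
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string on the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive edit distance scaled to the range 0..1."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def tokens(name: str) -> Counter:
    """Bag of words for a column name: split on camelCase boundaries
    and on any non-alphanumeric character, ignoring word order."""
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
    return Counter(t for t in re.split(r"[^A-Za-z0-9]+", spaced.lower()) if t)


def bag_similarity(a: str, b: str) -> float:
    """Shared words divided by total distinct words (Jaccard on bags)."""
    bag_a, bag_b = tokens(a), tokens(b)
    union = sum((bag_a | bag_b).values())
    return sum((bag_a & bag_b).values()) / union if union else 0.0


def match_columns(national, chapter, threshold=0.6):
    """Map each chapter column to its best national column, or to None
    when no candidate reaches the (assumed) threshold."""
    mapping = {}
    for col in chapter:
        scored = [
            (max(name_similarity(col, n), bag_similarity(col, n)), n)
            for n in national
        ]
        best_score, best = max(scored, default=(0.0, None))
        mapping[col] = best if best_score >= threshold else None
    return mapping


# Hypothetical column names, for illustration only.
national_columns = ["first_name", "last_name", "email_address", "chapter_id"]
chapter_columns = ["FirstName", "name_last", "EmailAddr", "Phone"]
result = match_columns(national_columns, chapter_columns)
```

In this sketch the two measures cover for each other: the bag-of-words score catches reordered names such as name_last, while the edit distance catches abbreviations such as EmailAddr. Columns left unmatched, like Phone here, are the kind of case a human reviewer would settle in a tool like the one described in step 6.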
The project’s detailed analysis was performed using R, Tableau, Python, Excel and SQL. The semantic mapping and record unification were implemented as a custom web application built from scratch. All of this, along with the unique use of the Levenshtein algorithm, is a fantastic example of the type of leading-edge solutions the competition is intended to produce. We are so proud of Saumya and the team’s accomplishment and are excited to harness her talents for our Ekta clients.