Introduction
For my DGAH midterm project, I was tasked with showcasing the tools we’ve learned in class so far by applying them to one of the provided data sets. While there were many interesting options to choose from, I decided to go down a route I was somewhat familiar with. I am by no means an expert on X-Men trivia, but I do have some basic (emphasis on basic) knowledge of the X-Men. So, I figured, why not? If I have to choose one of the provided data sets, I might as well choose the one I know something about.
Sources
For this assignment, I chose to use the uncanny-xmen-200-249-characters.csv file. The file lists the characters who appear in Uncanny X-Men issues 200–249. On top of that, it contains statistics for each character, such as kill count and whether they were rendered unconscious, captured, declared dead, etc.
Before I could begin creating my visual model, there was some data cleaning to do. Using OpenRefine, the tool we learned in class, I was able to do most of my data cleaning, such as trimming whitespace, merging the variations of a name into a single spelling, and sorting the data to fit my needs. However, I still needed to do some manual cleaning, which I did in Excel. Some of the manual cleaning involved changing character names such as “Rogue = Name Unknown” to “Rogue.” Additionally, I had to add the issue number next to each character’s name in order for my visual model to function as I wanted it to. This was a simple change from “Kitty Pryde” to “Kitty Pryde (203)” and so on.
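I did these steps by hand in OpenRefine and Excel, but the same cleaning rules could be sketched in Python. This is only a rough illustration, not what I actually ran: the function names clean_name and label_with_issue are made up for this example, and it assumes every annotation I wanted to drop follows an equals sign, as in “Rogue = Name Unknown.”

```python
import re


def clean_name(raw: str) -> str:
    """Trim whitespace and drop a trailing '= ...' annotation.

    Hypothetical helper: assumes annotations always follow an equals sign.
    """
    name = raw.strip()
    # Keep only the part before the first '=', e.g. "Rogue = Name Unknown" -> "Rogue"
    name = re.split(r"\s*=\s*", name, maxsplit=1)[0]
    # Collapse any internal runs of whitespace into a single space
    return re.sub(r"\s+", " ", name).strip()


def label_with_issue(raw: str, issue: int) -> str:
    """Append the issue number so each name is unique per issue."""
    return f"{clean_name(raw)} ({issue})"


print(clean_name("  Rogue = Name Unknown "))
print(label_with_issue("Kitty Pryde", 203))
```

The two print calls reproduce the two examples from the paragraph above: the first drops the annotation, and the second attaches the issue number.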
Process and Presentation
After much deliberation, I decided to base my visualization on the hand-holding relationships between the characters, grouped by issue. There was enough data on this relationship to make a meaningful visual representation. While I initially tried creating my visual model in RAWGraphs.io, I found that I could make a much cleaner version in Flourish. However, I did run into difficulties making the network links function properly. Either the links were not there at all, or there were mismatched links and isolated points scattered around. The only way I found to solve this issue was by adding the issue number next to the characters’ names, as mentioned in the Sources section.
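To illustrate why the issue number helped, here is a small Python sketch of how a links table could be built. I am assuming the network tool matches each link endpoint to a point by its exact label, which fits what I saw but is my guess at the cause. The function name build_links and the sample rows are invented for this example and are not taken from the actual CSV.

```python
def build_links(rows):
    """Build per-issue links and the unique points they connect.

    rows: iterable of (issue, character, partner) tuples (hypothetical layout).
    """
    links = []
    points = set()
    for issue, character, partner in rows:
        # Label both endpoints with the issue so the same character in two
        # issues becomes two separate points instead of one shared point
        source = f"{character.strip()} ({issue})"
        target = f"{partner.strip()} ({issue})"
        links.append({"source": source, "target": target, "issue": issue})
        points.update((source, target))
    return links, sorted(points)


# Made-up sample rows, for illustration only
sample = [
    (203, "Kitty Pryde", "Rogue"),
    (210, "Kitty Pryde ", "Rogue"),
]
links, points = build_links(sample)
for link in links:
    print(link)
print(points)
```

Because every label carries its issue number and stray whitespace is trimmed, each link endpoint has exactly one matching point, which is the condition the links seemed to need.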
By hovering your mouse pointer over a link in my model, you will see the names of the characters who had a hand-holding relationship. Each link is color-coded by issue number, and the legend for the colors can be found on the top left. Additionally, if you only want to see the relationships for certain issues, you can click on an issue number on the top left to hide that issue. Just to add some “creativity,” I included the X-Men logo on the top right.
Significance
One thing this midterm project definitely taught me is that data cleaning can really make your life easier. Even simple cleaning, like removing leading or trailing whitespace, can make a real difference. Also, if I don’t need certain columns of data, it doesn’t hurt to remove them. Data isn’t always about having as much information as possible, but about having enough information for your specific purposes. This is one of the differences I found between DGAH and data science. In data science, I find that it is very helpful to have a ton of information, as long as it is stored in a clear and organized manner. In DGAH, you really only need the data that relates to your goal.
One approach I considered, which would’ve been cool but time-consuming, was to showcase every single type of character relationship by issue number. For example, instead of the top left legend only showing the issue number, there would also be a section for the different types of relationships recorded in the original raw CSV file. I can only imagine how crazy the visual model would’ve looked with everything on the screen.