The Final Project - I joined a Bootcamp (2/2)

Sometimes consistency seems to be more relevant than quality. Hence, I will try writing a new blog post with no promises regarding the quality of the content. Wow, I am already back to stating such premises, exactly as in my first blog. Apparently, there is no difference between being 16 and 25.

In our Bootcamp, the last week was fully dedicated to a final project. It was quite a challenge: after a lot of group work in which each member could bring their own strengths, all of a sudden we had to work solo through the whole process.

The only possible mood at the beginning of the week:

Jokes aside, the reality is that every student spent quite some time thinking about their final project well ahead of the final week. Because four days to think of a topic, find the data, and then run some analyses/models is just not enough.

Topic Set-Up

I knew I wanted to focus on topics that had to do with social issues or the environment, as these are fields in which I would like to work in the future.

Finding a Topic

From the very beginning, one thing appeared evident. When the time is as limited as during the final project week, you have to carefully balance this section and the following one (finding the data!). 

First, let yourself get inspired by themes you would like to learn more about. Research around, typing all possible keywords to find useful data. Then, have a reality check with yourself: are you going to find data that fits your project idea exactly? You have to keep in mind that you do not have that much time for scraping data or creating your own datasets.
If the answer is no, it is time to start looking at spin-offs of your original plan, or different project ideas altogether.

Sometimes it is easier to just skip this section, and focus on looking for data that is already somewhat clean and usable.

That said, I started my topic exploration by visiting the UN Sustainable Development Goals. In my opinion, if I want to spend time working on a project, it needs to have a certain impact and relevance.
Here you can read more about each goal and check all the subtopics and work currently being done by the UN.



Initially, I really wanted to focus on renewable energy (goal seven) and the way it is spreading in Europe in relation to funding and promotional projects by governments. I explored the very cool charts and datasets on energy and climate change provided by IEA and NASA, but I could not come up with something that would fit within the week we had available.

Last-minute, I decided to change my topic to focus on women's rights and gender equality instead. Even more relevant considering that our final project coincided with International Women's Day.


So much work is still needed to achieve the Fifth UN Goal, in spite of all the initiatives that are already being organized (you can read about them here). 
Way too often, we hear of women mistreated, raped, killed. And the scary thing is that we do not even hear about the majority of cases.
Recently, the case of Sarah Everard seems to have raised more awareness on the topic, because of the organized protests that took place despite the lockdown regulations.
Hearing of Sarah is heartbreaking. 
Young, thinking and dreaming while walking home in what should have been a very normal afternoon. 
And it was not. 
And her dreams are no more, either.
Her story shook the conscience of the entire country. 
It is important to remember that she is not alone. Similar cases happen way too often everywhere, and more frequently than not, within what should be the 'safety' of the household.

I started researching data on femicides and domestic violence around the world. Additionally, I have been wondering about the impact that Covid-19 has had on the already too frequent cases of domestic violence and femicides, so I focused specifically on data for 2020 and a few more years.

Where is the Data?

Everywhere. Well, since this does not help and it took me some time to collect all the data sources I needed, I will list here some of the main platforms I consulted.

Kaggle

This platform offers a great variety of simple datasets that can come in handy when you have to complete a project rather quickly. Often, the datasets are already clean and can be easily combined with other data you have. Additionally, as it is a public community, it can be useful to browse around and check what other people have been working on. The aim is not to copy, of course, but to let yourself get inspired and combine things you like from different projects.

Some cool datasets I found for my project ideas: renewables here and femicides here.

Data.World

This platform is slightly less user-friendly than Kaggle, but it has a lot to offer. In fact, between datasets, projects, and queries, I have a feeling I could have taken advantage of its research function way more. Time was scarce, and I have only scratched the surface of my research.

Here you can find a dataset I considered about violence against women and girls.

OECD

This is a classic. Whatever project you are working on, if you need to complement it with social/economic/political data, there will be something for you here.

When still considering pre- and during Covid as a comparison, I thought of looking at unemployment and similar statistics to analyze inequality developments.

World Bank

Similar to OECD, the World Bank website offers plenty of databases and statistics you can play around with.

ACLED

This is an amazing collection of data on crises and conflicts around the world. You can export what you need using their data export tool.

Unfortunately, they did not have much on Turkey (my final project) and definitely not enough to relate to the femicides dataset I was already working on.

Other cool platforms to check out

Project

Okay, cool. 
You now have way too many ideas and possible data sources?
That is normal. Breathe in, breathe out.

I started exactly like that as well. First, I explored topics, and exactly as in design thinking, the explorative phase is messy. For a moment, it may seem like you cannot get out of the list of options you wrote down and that keeps increasing. 

But you must.

I usually work with Notion. For this first research phase, I just used a simple empty note and started saving all the links. At the end of each research session, I would cluster the links per topic.
I had given myself the end of the week prior to the project week as a deadline.

first brainstorm notes

However interesting certain topics were, I decided to prioritize something that was also doable. 
I soon realized that it is much easier to find data from a few years back than very recent data. 

And here is how the idea of pre- and during Covid was thrown in the bin.

However, I was still eager to explore variability and patterns related to femicides and domestic violence. Aware of the political polarization that has been taking place in Turkey in the last few years, I decided to dig deeper into the topic of femicides in Turkey.
I was lucky enough to find an amazing database that contained somewhat clean data for femicide cases in the time frame of 2010 to 2017.

Great, then Turkey it is. 

As you can see, for me it has been a lot of back and forth between the ideal project and available sources. I would be interested in hearing from anybody who works in a different way!

Additionally, the dataset came with some research already done on GDP, region, education, and more features on a province level. In an ideal scenario, I would have researched such features by myself using the sources above. I did not necessarily agree with the way education or religion had been defined as features. Furthermore, it would have been great to add extra features such as tourism, different ethnic groups, urban vs rural population, etc.

These are all limitations that one can mention when presenting the project, because it would have been impossible to integrate them in the time available.

My original plan was to explore political polarization vs femicides patterns. Hence, I focused on exploring only how to add this feature on top of my data. Trust me, this was already quite some work.

And no, I do not speak Turkish. Yes, I had to spend hours on Google Translate.
Note to self, work with a country with an official language you know when on a tight schedule.

I moved to a new note page on Notion, and I started collecting sources that could be specifically useful for my project. This time, I wrote down what each link was about and stored all PDFs and datasets in a folder on my computer.

Whenever I save a link to an academic paper I know I will use, I tend to save it already in APA citation form so that I will be able to easily include it in any presentation or description of the project later on.
If you cannot be bothered to make a citation form yourself, this is a great website for it. 

Anyway, a lot of cool data can be found on the Turkish Statistical Institute website.

Save your sources!

Anyhow, how did I classify the political aspect specifically?

I decided to look at municipal (province) elections in the year just before my dataset (2009), during (2014), and just after (2019).

I collected the results for each province in an Excel spreadsheet. I saved the name of the winning party and the percentage of votes achieved.

Finally, I had to read some articles to get an idea of party tendencies to be able to later classify them on a numerical scale. 

From my Jupyter file:
'Until 2011, in Turkey, instead of using left–right ideology, parties were better scaled on their position in the religious–secular issue. The Turkish left-wing differed from the European understanding of the left, having close ties with the military (Aydogan, A., & Slapin, J. (2012). Left-Right Reversed: Parties and Ideology in Modern Turkey. SSRN Electronic Journal. doi: 10.2139/ssrn.2165409). Nowadays, however, it can be argued that their positions changed drastically from what was presented in the article. For example, the AKP party moved away from its past secular and liberal stance, becoming more and more conservative. Thus, we will scale parties as follows (from 2 = left/secular to -2 = right/religious).'

Ideally, a future project would also look at how each party can be placed on the scale based on the historical moment. Each party would then not be assigned the same scale point across the three elections considered.
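For anyone who wants to try something similar, here is a minimal sketch of how the spreadsheet of winners could be turned into a numeric score with pandas. It is not the actual notebook code: the province names, party names, vote shares, and scores below are all made-up placeholders, and the column names are assumptions.

```python
import pandas as pd

# Hypothetical election results: one row per province and election year.
# Province names, party names, and vote shares are placeholders, not real data.
results = pd.DataFrame(
    {
        "province": ["Province A", "Province A", "Province B", "Province B"],
        "year": [2009, 2014, 2009, 2014],
        "winning_party": ["Party X", "Party Y", "Party Z", "Party X"],
        "vote_share": [41.2, 38.5, 52.0, 47.3],
    }
)

# Hypothetical scale: 2 = left/secular ... -2 = right/religious.
party_scale = {"Party X": -2, "Party Y": 1, "Party Z": 2}

# Translate each winning party into its score on the scale.
results["party_score"] = results["winning_party"].map(party_scale)

# Fail loudly if a party is missing from the scale instead of silently keeping NaN.
missing = results.loc[results["party_score"].isna(), "winning_party"].unique()
assert len(missing) == 0, f"Parties without a score: {missing}"

# One row per province, one column per election year.
political = results.pivot(index="province", columns="year", values="party_score")
print(political)
```

To let a party's position change over time, the dictionary could be keyed by (party, year) pairs instead of by party name alone.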

Other challenges, anybody?

  • I had to cluster all the reasons given for the femicides in more reasonable buckets, some examples:
    • Household Expectations (e.g. 'Woman not preparing food', 'Woman not washing clothes'...)
    • Male Service Expectations (e.g. 'Woman not listening to man's troubles', 'Woman not having children'...)
    • Masculinity & Honor (e.g. 'The man cannot control his nerves', 'Message from another man on the woman's phone'...)
    • Separation & Jealousy (e.g. 'Woman's desire to leave', 'Desire for divorce'...)
    • Woman Decision and Preferences (e.g. 'The woman asking the man not to drink alcohol', 'Woman not saying her phone password'...)
  • I clustered the types of killers in buckets. Unfortunately, the majority were family members and men close to the woman.
  • I had to merge my regional data with the political dataset I created.
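The bucketing and merging steps above can be sketched roughly like this. The reason strings come from the examples in the list, but the DataFrames, the column names, the "Other" fallback bucket, and the party_score_2014 column are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical case-level data; provinces are placeholders.
cases = pd.DataFrame(
    {
        "province": ["Province A", "Province A", "Province B"],
        "reason": [
            "Woman not preparing food",
            "Desire for divorce",
            "Woman not saying her phone password",
        ],
    }
)

# Lookup table from raw (translated) reason to a broader bucket.
reason_buckets = {
    "Woman not preparing food": "Household Expectations",
    "Woman not washing clothes": "Household Expectations",
    "Woman's desire to leave": "Separation & Jealousy",
    "Desire for divorce": "Separation & Jealousy",
    "Woman not saying her phone password": "Woman Decision and Preferences",
}

# Reasons that are not in the lookup fall into an assumed "Other" bucket.
cases["reason_bucket"] = cases["reason"].map(reason_buckets).fillna("Other")

# Hypothetical province-level political table (one row per province).
political = pd.DataFrame(
    {"province": ["Province A", "Province B"], "party_score_2014": [-2, 1]}
)

# Left merge keeps every case; validate raises if a province appears twice
# in the political table, which would silently duplicate cases.
merged = cases.merge(political, on="province", how="left", validate="many_to_one")
print(merged)
```

A left merge with validate="many_to_one" is a cheap safety net here: the number of cases cannot change without an error being raised.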
Still following? Here is some translation fun, for a change.

translation of reasons given for femicide


After cleaning...

I moved to the EDA. 
As I already had so little data, I was a bit wary of dropping things. 

Besides some basic explorations, I only dropped some outliers using the Z Score method.
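As a rough illustration of the Z Score method, here is a minimal sketch with pandas. The helper name drop_zscore_outliers, the column name, the toy numbers, and the threshold of 3 (a common convention, not necessarily the one used in this project) are all assumptions.

```python
import pandas as pd


def drop_zscore_outliers(df, columns, threshold=3.0):
    """Keep only the rows whose z-score is within the threshold in every given column."""
    # z-score: distance from the column mean, measured in standard deviations.
    z = (df[columns] - df[columns].mean()) / df[columns].std(ddof=0)
    keep = (z.abs() <= threshold).all(axis=1)
    return df[keep]


# Toy data: twenty ordinary values and one extreme value.
data = pd.DataFrame({"cases": list(range(10, 30)) + [1000]})

cleaned = drop_zscore_outliers(data, ["cases"])
print(len(data), "rows before,", len(cleaned), "rows after")
```

With very small datasets, it is worth checking how many rows the threshold actually removes before committing to it.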

To pre-process my data, I only used the Normalizer, as none of the transformations I tried (log, sqrt) really seemed to improve my distributions.
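One thing worth knowing about scikit-learn's Normalizer is that it rescales each row (sample) to unit norm, rather than each column. The tiny sketch below, with made-up numbers, shows that behavior. It assumes the default L2 norm, which may or may not match the settings used in the project.

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Toy feature matrix: three samples, two features.
X = np.array([[3.0, 4.0], [1.0, 0.0], [6.0, 8.0]])

# Each row is divided by its own L2 norm, so every row ends up with length 1.
X_norm = Normalizer(norm="l2").fit_transform(X)

print(X_norm)
# rows: [0.6, 0.8], [1.0, 0.0], [0.6, 0.8]
```

Note how the first and third rows become identical: only the direction of each sample is kept, not its magnitude. Column-wise scalers such as StandardScaler or MinMaxScaler behave differently.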

Finally

I moved to apply models.
I tried three different models and evaluated them based on the R² score. 
Linear Regression – Final Model. 
Random Forest Regressor.
KNeighborsRegressor.
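A comparison like the one above could be set up roughly as follows with scikit-learn. This is only a sketch on synthetic data, not the project's data or results. It assumes the score in question is R² (computed here with r2_score), and the hyperparameters are arbitrary.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic data with a clear linear signal, used only to make the sketch runnable.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "LinearRegression": LinearRegression(),
    "RandomForestRegressor": RandomForestRegressor(n_estimators=100, random_state=0),
    "KNeighborsRegressor": KNeighborsRegressor(n_neighbors=5),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

for name, score in scores.items():
    print(f"{name}: R2 = {score:.3f}")
```

Scoring on a held-out split matters even more with small datasets, where a model can easily look good on the data it was trained on.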

To be completely honest with you all, when I started my project, my aim was solely exploratory. I felt peer-pressured into attempting some predictive model and I figured I could just do it on top. As expected, the data was not sufficient and it made it impossible to predict anything.

Cool try, Alice. Thank you very much.

Visualize and Present

As expected, I spent quite some time working in Tableau to discover possible patterns and relations in the data I had available. For the whole thing, you can check Tableau here.

Some outcomes:

An overview of average cases scaled on 1 to 100 (100 based on the province with most cases)


No super clear trend appears. However, when checking GDP (darker = higher) and femicides avg (numbers), it appears that higher numbers are present on the right side of the map. Traditionally, this side has lower GDP and more ethnic diversity. These are just considerations, not necessarily causes.


The woman's partner is the most frequent perpetrator of the femicide.



Anyway, this was all to give an idea of how to work on the project. The rest can be found on my GitHub repo.

How to present all of this?

Repeat after me
Storytelling is key

And please, please use Canva or some proper slides template.

Let me paint a picture of what your final day's presentation will look like. Twenty-something people have to present their work in a couple of hours. Everybody's project has more or less the same features, as there are limits to what can be learned in 9 weeks. 

Also, until their own turn is done, at least 60% of each person's brain capacity will actually be busy comparing their work to yours. And once they are done presenting, they will switch off even the 40% capacity they had left, ignoring at least the 2 presentations following theirs.

I hope this makes it crystal clear that being compelling is quite relevant.
Also, do it for your classmates, for your mama, for whoever you want, but time yourself in advance. It is not cool to go over time, while the others tried sticking to the time limit.

Here you can sneak peek at my presentation. 
Probably one of the coolest ones of our cohort was this one. Yes, you can write to Manuel to tell him how good he is.


Wow, 49624962 links later this post is done(?).

Not sure, but I sure am.
