07. July 2019
100 Best – Hip Hop 3/4
Another summer Sunday, another day for radioeins to award the 100 best songs. This time it is all about Hip-Hop. Personally, I’m a big fan of current Hip-Hop artists such as Kendrick, and of the recently released records by Fatoni and Tua. But knowing the editors and listeners of radioeins, the top songs will probably come from the 80s, the early era of Hip-Hop.
Wrap-Up
Just as in last week’s post, I will wrap up my current code and see whether I can catch some strange behavior from the radioeins.de page.
1 – Inconsistent number of jury members
Last week, I noticed that the number of jury members changed between page calls. To check this, I wrote a small loop that calls the website several times and prints any jury members it has not seen before. The code is nothing fancy, except for the %notin% comparison, which I like very much. The goal is to observe whether the number of judges really varies across multiple page calls.
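The idea of that loop can be sketched roughly as follows. This is a Python sketch under stated assumptions, not the original code: `fetch_jury` is a hypothetical stand-in for whatever function scrapes the jury names from a single page call, and Python's `not in` plays the role of %notin%.

```python
from typing import Callable, Iterable, Set


def watch_jury(fetch_jury: Callable[[], Iterable[str]], calls: int = 10) -> Set[str]:
    """Call the page several times and report jury members not seen before.

    fetch_jury is a hypothetical scraper that returns the jury names
    found in one page call.
    """
    seen: Set[str] = set()
    for i in range(1, calls + 1):
        names = list(fetch_jury())
        # Keep only the names that did not show up in any earlier call.
        new = sorted({name for name in names if name not in seen})
        if new:
            print(f"call {i}: {len(names)} members, new: {new}")
        seen.update(names)
    return seen
```

Passing the scraper in as a function keeps the loop independent of how the page is actually fetched, so it can be tried out with canned responses first.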
2 – Empty columns
Have a closer look at Jacho. If you use the element selector, you can see a fourth, empty column in his top 10 table. I can’t tell whether this is intentional or an accident. Either way, I have to check for this kind of behavior: I compare the number of columns with the expected number and remove the unnecessary fourth column.
It just occurs to me that this is very case-specific and that I should generalize the check a little more.
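A more general version of that check could look like the sketch below. It is written in Python and is not the original code. It assumes the scraped table is available as a list of rows of strings and that three columns are expected (which follows from the fourth one being unnecessary). The function name is made up for illustration.

```python
from typing import List


def drop_empty_columns(rows: List[List[str]], expected: int = 3) -> List[List[str]]:
    """Remove columns whose cells are all empty, then check the width."""
    if not rows:
        return rows
    width = max(len(row) for row in rows)
    # Pad short rows so that every row can be indexed by column.
    padded = [row + [""] * (width - len(row)) for row in rows]
    # A column is kept if at least one of its cells has visible content.
    keep = [c for c in range(width) if any(row[c].strip() for row in padded)]
    if len(keep) != expected:
        raise ValueError(f"expected {expected} columns, found {len(keep)}")
    return [[row[c] for c in keep] for row in padded]
```

Instead of removing "the fourth column", this removes any column that is empty in every row and raises an error if the remaining width is still unexpected, so new kinds of oddities do not slip through silently.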
3 – Apostrophes
Let’s take a look at last week’s number 3 of the 100 best hippie songs: Buffalo Springfield with For what it’s worth. Now, a big question: what do you use as your standard apostrophe? Of course, there is no other way than using shift + #, but it appears that the radioeins charts write it in several different ways.
In fact, there are four different ways of writing an apostrophe. I wondered whether it makes sense to define a single apostrophe and change all differing entries. I came to the conclusion that I can simply remove all apostrophes, because they make reading easier but don’t really serve an identifying purpose. The same goes for other punctuation marks.
Maybe I should make an analysis of the most popular apostrophes.
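Such an analysis could start with a simple count, as in the Python sketch below (not the original code). The four variants found on the page are not listed in this post, so the set of apostrophe-like characters here is an assumption.

```python
from collections import Counter
from typing import Iterable

# Assumed set of apostrophe-like characters: ' (U+0027), ’ (U+2019),
# ` (U+0060) and ´ (U+00B4). The variants actually used on the page may differ.
APOSTROPHES = "'\u2019\u0060\u00b4"


def count_apostrophes(titles: Iterable[str]) -> Counter:
    """Count how often each apostrophe-like character occurs in the titles."""
    counts: Counter = Counter()
    for title in titles:
        counts.update(ch for ch in title if ch in APOSTROPHES)
    return counts
```

Calling `count_apostrophes(titles).most_common()` on all scraped titles would then rank the variants by frequency.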
In addition to removing punctuation, I’ll also convert upper-case letters to lower case for songs and artists. It shouldn’t affect the identifying value, but it should improve the final results.
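Both cleaning steps can be sketched in a few lines of Python. This is an illustration and not the original code: judging by the tables in this post, the original cleaning also drops words such as "the" and keeps a few signs such as &, which this sketch does not reproduce.

```python
import unicodedata


def normalize(text: str) -> str:
    """Lower-case text and drop whitespace, punctuation and accent-like signs."""
    kept = []
    for ch in text.lower():
        category = unicodedata.category(ch)
        # P* covers punctuation such as ' and ’; Sk covers ` and ´.
        if ch.isspace() or category.startswith("P") or category == "Sk":
            continue
        kept.append(ch)
    return "".join(kept)
```

With this, spellings that differ only in case, spacing or the kind of apostrophe collapse to the same key, for example `forwhatitsworth`.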
Similarities
Because things like punctuation don’t change the text very much, I was wondering whether there is a way to calculate the difference between two texts. I should write a more detailed blog post about this method, but for now we can inspect the results of estimating the difference between song titles and between artists. I calculate a difference value for each of them. Afterwards, I add up my originally calculated scores, so that I can manually check whether I am missing some very significant songs or artists. The code is not very elegant, but it works for now.
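The post does not name the difference measure. The artist values in the table in this section are consistent with the Levenshtein (edit) distance, so the Python sketch below assumes that measure. It is an illustration, not the original code.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimal number of single-character edits."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]
```

For example, `edit_distance("pubicenemy", "publicenemy")` is 1 and `edit_distance("llcooljay", "llcoolj")` is 2, matching the corresponding rows of the table.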
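A peek at this table can be seen below. Here, value is the difference between the two song titles, artist is the difference between the two artist names, and new_score is the sum of the two original scores. The table basically reveals a lot of special cases of different spellings. Maybe I can include these values next week to improve my results. We can also see that this approach works pretty well for identifying similar songs or artists.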
row | col | value | artist | artist_row | artist_col | score_row | score_col | song_row | song_col | place_row | place_col | new_score |
---|---|---|---|---|---|---|---|---|---|---|---|---|
85 | 35 | 0 | 1 | snoopdoggfeatpharrellwillams | snoopdoggfeatpharrellwilliams | 12 | 24 | dropitlikeitshot | dropitlikeitshot | 85 | 35 | 36 |
491 | 45 | 0 | 1 | africabambaataa&soulsonicforce | afrikabambaataa&soulsonicforce | 2 | 20 | planetrock | planetrock | 491 | 45 | 22 |
276 | 90 | 0 | 1 | pubicenemy | publicenemy | 6 | 12 | dontbelievehype | dontbelievehype | 276 | 90 | 18 |
476 | 186 | 0 | 1 | fishmob | fischmob | 2 | 8 | susannezurfreiheit | susannezurfreiheit | 476 | 186 | 10 |
293 | 200 | 0 | 1 | käptnpeng&dietenktakelvondelphi | käptnpeng&dietentakelvondelphi | 6 | 8 | deranfangistnah | deranfangistnah | 293 | 200 | 14 |
209 | 208 | 0 | 1 | drdrefeatsnoopdoggkoruptnatedogg | drdrefeatsnoopdoggkuruptnatedogg | 8 | 8 | nextepisode | nextepisode | 209 | 208 | 16 |
198 | 54 | 0 | 2 | llcooljay | llcoolj | 8 | 17 | ineedlove | ineedlove | 198 | 54 | 25 |
514 | 470 | 0 | 4 | jurassicfive | jurassic5 | 1 | 2 | concreteschoolyard | concreteschoolyard | 514 | 470 | 3 |
274 | 238 | 0 | 5 | rootsfeaturingerykahbadu | rootsfeaterykahbadu | 6 | 7 | yougotme | yougotme | 274 | 238 | 13 |
217 | 113 | 0 | 6 | cooliofeatlv | coolio | 8 | 12 | gangstasparadise | gangstasparadise | 217 | 113 | 20 |
330 | 146 | 0 | 6 | missieelliott&dabrat | missyelliottfeatdabrat | 5 | 10 | sockit2me | sockit2me | 330 | 146 | 15 |
495 | 27 | 0 | 7 | tonelōc | icecube | 1 | 30 | itwasagoodday | itwasagoodday | 495 | 27 | 31 |
489 | 95 | 0 | 8 | amine | mcsolaar | 2 | 12 | caroline | caroline | 489 | 95 | 14 |
532 | 176 | 0 | 8 | beginner | absolutebeginner | 1 | 10 | hammerhart | hammerhart | 532 | 176 | 11 |
502 | 454 | 0 | 11 | saltnpepa | saltnpepafeatenvogue | 1 | 2 | whattaman | whattaman | 502 | 454 | 3 |
This week’s results
So much for the new scraping insights, let’s move on to my results for this week. As usual, I’m going to show the first 25 entries according to my web-scraping script.
Have fun re-listening to the show on Spotify, and see you next week.
place | artist | song | score | mentioned | average place |
---|---|---|---|---|---|
1 | grandmasterflash&furiousfive | message | 272 | 32 | 3.312500 |
2 | eminem | loseyourself | 134 | 20 | 4.550000 |
3 | missyelliott | geturfreakon | 125 | 18 | 4.444444 |
4 | sugarhillgang | rappersdelight | 109 | 20 | 5.750000 |
5 | publicenemy | fightpower | 109 | 18 | 5.277778 |
6 | cypresshill | insaneinbrain | 65 | 13 | 6.153846 |
7 | beastieboys | sabotage | 59 | 11 | 5.818182 |
8 | nwa | fuckthapolice | 57 | 9 | 5.000000 |
9 | beastieboys | intergalactic | 55 | 7 | 3.857143 |
10 | 2pacfeatdrdre&rogertroutman | californialove | 52 | 12 | 6.666667 |
11 | atribecalledquest | canikickit? | 49 | 8 | 5.125000 |
12 | rundmcvsaerosmith | walkthisway | 49 | 7 | 4.571429 |
13 | missyelliott | workit | 49 | 6 | 3.166667 |
14 | beastieboys | (yougotta)fightforyourright(toparty) | 48 | 6 | 3.666667 |
15 | houseofpain | jumparound | 43 | 9 | 6.333333 |
16 | kendricklamar | humble | 43 | 5 | 3.200000 |
17 | delasoul | memyself&i | 39 | 6 | 5.000000 |
18 | drdrefeatsnoopdogg | nuthinbutagthang | 38 | 5 | 3.400000 |
19 | beastieboys | sureshot | 38 | 4 | 2.250000 |
20 | nas | nystateofmind | 37 | 6 | 5.500000 |
21 | saltnpepa | pushit | 34 | 6 | 5.500000 |
22 | atribecalledquest | wepeople | 34 | 3 | 1.333333 |
23 | rootsfeatcodychesnutt | seed(20) | 33 | 5 | 4.800000 |
24 | beastieboys | sowhatchawant | 33 | 4 | 3.500000 |
25 | nwa | straightouttacompton | 32 | 6 | 6.000000 |