100 Best – Hippie and Protest Songs 2/4

Recap: Summer Songs

It is Sunday again, and my favorite radio station, radioeins, is doing another round of “The 100 best…”. This time, it is the 100 best hippie and protest songs.

I took a closer look at last week's web-scraping process and found some peculiarities. The webpage does not always seem to show all jury members: across different HTML reads, I got either 109 or 110 different people, so the results might not be consistent. I should inspect this behavior a bit more.
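One way to narrow this down is to compare the juror lists from two reads directly and see who is missing. A minimal sketch, assuming the scraper returns each read as a character vector of juror names; the function `compare_jury` and the names below are hypothetical:

```r
# Hypothetical helper: compare the juror names found in two reads of the same page.
compare_jury <- function(read_a, read_b) {
  read_a <- unique(trimws(read_a))  # ignore stray whitespace and duplicates
  read_b <- unique(trimws(read_b))
  list(
    n_a       = length(read_a),
    n_b       = length(read_b),
    only_in_a = setdiff(read_a, read_b),  # jurors missing from the second read
    only_in_b = setdiff(read_b, read_a)   # jurors missing from the first read
  )
}

# Toy example with made-up names
res <- compare_jury(c("Juror A", "Juror B", "Juror C"), c("Juror A", "Juror C"))
res$only_in_a
```

Running this over several reads should show whether it is always the same person who drops out or a different one each time.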

Naming Schemes

Another problem was that the same songs and artists appear under different names, which means the final scores do not add up. As a next step, I will point out some naming schemes that lead to false results.

1 – Different versions of the same song: live versions

artist          song                  avg_place score mentioned
  <chr>           <chr>                     <dbl> <dbl>     <int>
1 Kool & The Gang Summer Madness             11.4    26         5
2 Kool & The Gang Summer Madness (Live)      10       1         1

-> Remove the live indication from song names so that both versions are counted as one song.
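A small pitfall here: in a regular expression, `(Live)` is a capture group, so `gsub("(Live)", "", song)` removes only the word Live and leaves empty brackets behind. A sketch of a pattern that escapes the brackets and also drops the preceding space (the vector `song` is a toy example):

```r
song <- c("Summer Madness", "Summer Madness (Live)")

# Unescaped brackets form a regex group that matches just "Live"
naive <- gsub("(Live)", "", song)          # "Summer Madness" "Summer Madness ()"

# Escaped brackets plus the leading whitespace remove the whole tag
cleaned <- gsub("\\s*\\(Live\\)", "", song)  # "Summer Madness" "Summer Madness"
```

Alternatively, `gsub(" (Live)", "", song, fixed = TRUE)` matches the literal text without any regex interpretation.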

2 – “The”

Some jury members use the article in a band's name, others don't. Since the article is ignored anyway when bands are sorted alphabetically for the final ranking, I simply remove it from the data. Here is a nice example:

artist         song            avg_place score mentioned
  <chr>          <chr>               <dbl> <dbl>     <int>
1 The Beach Boys Good Vibrations      5.33    56         9
2 Beach Boys     Good Vibrations      5        6         1
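Where the article is removed matters. An unanchored `gsub("The", "", ...)` also hits "The" inside words such as "They" or "There", which is how "The Times They Are A-Changin'" turns into "TimesyAreA-Changin'" in the prediction table at the end of this post. A sketch with anchored patterns, using toy vectors rather than the scraped data:

```r
artist <- c("The Beach Boys", "Beach Boys")
song   <- "The Times They Are A-Changin'"

# Unanchored pattern: also removes the "The" inside "They"
naive <- gsub("The", "", song)                           # " Times y Are A-Changin'"

# Artists: drop "The" only as a leading article
artist_clean <- sub("^The\\s+", "", artist)              # "Beach Boys" "Beach Boys"

# Songs: drop "The" only as a whole word (\\b is a word boundary)
song_clean <- gsub("\\bThe\\b", "", song, perl = TRUE)   # " Times They Are A-Changin'"
```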

3 – “And” and “&”

Some artist names are written with “And” in one vote and with “&” in another, so I have to make them consistent.

See the Fresh Prince:

artist                             song       avg_place score mentioned
  <chr>                              <chr>          <dbl> <dbl>     <int>
1 DJ Jazzy Jeff & The Fresh Prince   Summertime      6.67    34         6
2 DJ Jazzy Jeff And The Fresh Prince Summertime      2       10         1
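A sketch of the unification, assuming "and" only needs replacing when it stands as a separate word. Requiring whitespace on both sides leaves names that merely contain the letters "and" untouched:

```r
artist <- c("DJ Jazzy Jeff & The Fresh Prince", "DJ Jazzy Jeff And The Fresh Prince")

# Replace a standalone "and" (any case) with "&"
unified <- gsub("\\s+and\\s+", " & ", artist, ignore.case = TRUE)
unified   # both: "DJ Jazzy Jeff & The Fresh Prince"
```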

4 – Special characters and line breaks

In some cases, extracting the information from the HTML leaves control characters in the data, such as the line break \n.

artist          song                                                          avg_place score mentioned
  <chr>           <chr>                                                             <dbl> <dbl>     <int>
1 Eddie Cochran   "Summertime\n  Blues"                                                 1    12         1
2 Meat Loaf       "You Took The Words\n  Right Out Of My Mouth (Hot Summer Nig…"        9     2         1
3 Die Toten Hosen "Eisgekühlter\n  Bommerlunder"                                       10     1         1
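If readable titles are wanted, one option is to collapse every run of whitespace, line breaks included, into a single space. A minimal base-R sketch on toy strings (stringr's `str_squish()` does the same job):

```r
song <- c("Summertime\n  Blues", "You Took The Words\n  Right Out Of My Mouth")

# Turn any run of spaces, tabs or "\n" into one space, then trim both ends
squished <- trimws(gsub("\\s+", " ", song))
squished   # "Summertime Blues" "You Took The Words Right Out Of My Mouth"
```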

5 – Spaces in general

As the following example shows, spacing itself is a problem. In the example above, the line break is followed by a double space, and elsewhere stray spaces (here inside the parentheses) change the song name and hinder a good prediction of the final chart. Since spaces serve no purpose here other than readability, I simply delete all of them.

 artist          song                                                          avg_place score mentioned
  <chr>           <chr>                                                             <dbl> <dbl>     <int>
1 Meat Loaf       "You Took The Words Right Out Of My Mouth ( Hot Summer Night )"       8     3         1
2 Meat Loaf       "You Took The Words\n  Right Out Of My Mouth (Hot Summer Nig… "       9     2         1
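Collapsing whitespace alone would not be enough for the Meat Loaf case, because the padding inside the brackets survives. A toy comparison of the two approaches; the strings are shortened fragments, not full titles:

```r
a <- "( Hot Summer Night )"
b <- "(Hot Summer Night)"

# Collapsing whitespace keeps the padding, so the strings still differ
squish_match <- trimws(gsub("\\s+", " ", a)) == b          # FALSE

# Deleting every whitespace character makes them match
delete_match <- gsub("\\s", "", a) == gsub("\\s", "", b)   # TRUE
```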

Here is an overview of the changes I made to the script. It is just quickly typed down; I will look for a better way to put this into a script and then upload it to my GitLab account.

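# Song titles: drop "(Live)" tags (the brackets must be escaped in a regex)
Results_df_nona$song <- gsub("\\s*\\(Live\\)", "", Results_df_nona$song)
# Drop "The" only as a whole word, so "They" and "There" stay intact
Results_df_nona$song <- gsub("\\bThe\\b", "", Results_df_nona$song, perl = TRUE)
# Drop all whitespace, including line breaks such as "\n  "
Results_df_nona$song <- gsub("\\s", "", Results_df_nona$song)
# Artist names: unify "And"/"and" to "&"
Results_df_nona$artist <- gsub("\\s+and\\s+", " & ", Results_df_nona$artist, ignore.case = TRUE)
# Drop the article "The", then all whitespace
Results_df_nona$artist <- gsub("\\bThe\\b", "", Results_df_nona$artist, perl = TRUE)
Results_df_nona$artist <- gsub("\\s", "", Results_df_nona$artist)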
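After cleaning, rows that belong to the same song share one key and can be merged. If only the summary table is at hand, score and mentions can simply be summed, but the average place has to be weighted by the number of mentions. A sketch using the two Kool & The Gang rows from the first example, with names already cleaned; the helper column `place_sum` is hypothetical:

```r
# Summary rows as printed in the first example, names already cleaned
res <- data.frame(
  artist    = c("Kool&Gang", "Kool&Gang"),
  song      = c("SummerMadness", "SummerMadness"),
  avg_place = c(11.4, 10),
  score     = c(26, 1),
  mentioned = c(5L, 1L)
)

# Sum of places per row, so that it can be added up across rows
res$place_sum <- res$avg_place * res$mentioned

merged <- aggregate(cbind(score, mentioned, place_sum) ~ artist + song,
                    data = res, FUN = sum)
merged$avg_place <- merged$place_sum / merged$mentioned  # mention-weighted average
merged$place_sum <- NULL
merged
```

This gives 27 points from 6 mentions and an average place of (11.4 · 5 + 10 · 1) / 6 ≈ 11.17.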

Hippie Songs

With these insights from last week's recap, I could simply apply my script to this week's top-100 hippie and protest song chart. Here is my prediction for the top 25 songs; you can listen to the countdown at radioeins.de.

place artist song score mentioned avg_place
1 JohnLennon Imagine 245 34 4.500000
2 JeffersonAirplane WhiteRabbit 141 21 4.761905
3 BuffaloSpringfield ForWhatIt'sWorth 93 14 5.000000
4 Mamas&Papas CaliforniaDreamin' 92 15 5.066667
5 JoeCocker WithALittleHelpFromMyFriends 82 11 3.909091
6 BobDylan TimesyAreA-Changin' 72 12 5.333333
7 JimiHendrixExperience PurpleHaze 67 10 4.800000
8 BobDylan Blowin'InWind 63 9 4.444444
9 JoniMitchell Woodstock 60 10 5.400000
10 BarryMcGuire EveOfDestruction 60 10 5.300000
11 JanisJoplin MercedesBenz 59 8 4.250000
12 Beatles AllYouNeedIsLove 59 6 2.500000
13 JanisJoplin MeAndBobbyMcGee 53 11 6.272727
14 JeffersonAirplane SomebodyToLove 53 9 5.333333
15 Byrds Turn!Turn!Turn!(ToEverythingreIsASeason) 53 7 4.142857
16 GilScott-Heron RevolutionWillNotBeTelevised 52 8 5.125000
17 ScottMcKenzie SanFrancisco(BeSureToWearFlowersInYourHair) 49 11 6.727273
18 PlasticOnoBand GivePeaceAChance 49 9 5.888889
19 RichieHavens Freedom 48 8 5.375000
20 JimiHendrix Star-SpangledBanner 45 7 4.857143
21 EdwinStarr War 44 8 5.875000
22 CreedenceClearwaterRevival FortunateSon 44 7 5.142857
23 JimiHendrixExperience AllAlongWatchtower 43 8 5.750000
24 CannedHeat GoingUpCountry 38 8 6.375000
25 NeilYoung HeartOfGold 38 7 5.714286
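The table appears to be sorted by score and, for equal scores, by the number of mentions (see places 11 and 12, or 13 to 15). JoniMitchell and BarryMcGuire tie on both, so some further rule decides their order. A sketch of the two-key sort, where `rank_songs` is a hypothetical helper and the three rows are taken from the table:

```r
# Hypothetical helper: rank by score, break ties by number of mentions
rank_songs <- function(df, n = 25) {
  ranked <- df[order(-df$score, -df$mentioned), ]
  ranked$place <- seq_len(nrow(ranked))
  head(ranked, n)
}

# Three rows from the table above, in scrambled order
top <- data.frame(
  artist    = c("Beatles", "JanisJoplin", "JoniMitchell"),
  song      = c("AllYouNeedIsLove", "MercedesBenz", "Woodstock"),
  score     = c(59, 59, 60),
  mentioned = c(6L, 8L, 10L)
)
rank_songs(top)
```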