14. July 2019
100 Best GDR hits 4/4
Sadly, I didn’t have much time to dig deeper into the text similarities I briefly introduced last week, so I’ll simply check that the code is reproducible and validate the results.
GDR Songs
After setting the new URL https://www.radioeins.de/musik/die-100-besten-2019/ostsongs/, the code ran smoothly. Here are my results for today’s vote on the 100 best East German songs.
place | artist | song | score | mentioned | average place |
1 | city | amfenster | 188 | 24 | 3.666667 |
2 | ninahagen | duhastdenfarbfilmvergessen | 144 | 21 | 4.571429 |
3 | karussell | alsichfortging | 105 | 17 | 5.235294 |
4 | sandow | borningdr | 101 | 15 | 4.600000 |
5 | silly | bataillondamour | 99 | 15 | 4.800000 |
6 | herbstinpeking | bakschischrepublik | 93 | 16 | 5.500000 |
7 | pankow | langeweile | 87 | 16 | 5.750000 |
8 | silly | montklamott | 80 | 12 | 4.750000 |
9 | silly | verlorenekinder | 75 | 9 | 3.333333 |
10 | karat | derblaueplanet | 74 | 13 | 5.615385 |
11 | holgerbiege | sagtemaleindichter | 69 | 11 | 4.909091 |
12 | feelingb | artig | 58 | 10 | 5.500000 |
13 | karat | übersiebenbrückenmusstdugehn | 50 | 7 | 4.428571 |
14 | horstkrügerband | dietagesreise | 43 | 7 | 5.142857 |
15 | puhdys | gehzuihr | 41 | 7 | 5.285714 |
16 | czesławniemen | jednegoserca | 34 | 3 | 1.333333 |
17 | puhdys | altwieeinbaum | 33 | 6 | 5.500000 |
18 | city | derkingvomprenzlauerberg | 33 | 5 | 4.400000 |
19 | electricbeatcrew | herewecome | 33 | 4 | 3.500000 |
20 | dieanderen | freitagabendinberlin(gelbeworte) | 32 | 4 | 3.250000 |
21 | manfredkrug | wennsdraußengrünwird | 29 | 4 | 4.250000 |
22 | keimzeit | klingklang | 28 | 6 | 6.666667 |
23 | dieskeptiker | dadainberlin | 28 | 5 | 5.600000 |
24 | klausrenftcombo | werdieroseehrt | 28 | 5 | 5.800000 |
25 | pankow | aufruhrindenaugen | 28 | 4 | 4.500000 |
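For anyone who wants to reproduce a table like this, here is a minimal Python sketch of the aggregation step. It is not the code used for this post. It assumes each juror submitted a ranked list of (artist, song) pairs. The helper names `normalize` and `aggregate` and the points-per-place mapping `POINTS` are hypothetical, because the post doesn’t state the show’s scoring rule. Names are normalized the way the keys in the table look (lowercased, whitespace removed), so “Nina Hagen” and “nina hagen” count as the same artist.

```python
from collections import defaultdict

# Hypothetical points per place (place 1 earns the most). The show's actual
# scoring rule is NOT stated in the post; this mapping is an assumption.
POINTS = {1: 12, 2: 10, 3: 8, 4: 7, 5: 6, 6: 5, 7: 4, 8: 3, 9: 2, 10: 1}


def normalize(name: str) -> str:
    """Lowercase and drop all whitespace, e.g. 'Am Fenster' -> 'amfenster'."""
    return "".join(name.lower().split())


def aggregate(ballots):
    """Turn ranked juror lists into (artist, song, score, mentioned, avg place) rows."""
    score = defaultdict(int)
    places = defaultdict(list)
    for ballot in ballots:
        for place, (artist, song) in enumerate(ballot, start=1):
            key = (normalize(artist), normalize(song))
            score[key] += POINTS.get(place, 0)  # places beyond 10 earn nothing
            places[key].append(place)
    rows = [
        (artist, song, score[(artist, song)], len(p), sum(p) / len(p))
        for (artist, song), p in places.items()
    ]
    # Highest score first; ties broken by more mentions, then better average place.
    rows.sort(key=lambda r: (-r[2], -r[3], r[4]))
    return rows


if __name__ == "__main__":
    example = [
        [("City", "Am Fenster"), ("Nina Hagen", "Du hast den Farbfilm vergessen")],
        [("Nina Hagen", "Du hast den Farbfilm vergessen"), ("city", "Am Fenster")],
    ]
    for row in aggregate(example):
        print(row)
```

The normalization is deliberately crude: it only absorbs differences in case and spacing, not typos or different names for the same act, which is exactly where the problems below start.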
Problem analysis
I have the feeling that this week’s data contains more ambiguous artist and song names than ever. Here are two nice examples.
The Hungarian band Omega made a song called Gyöngyhajú Lány, which got a German adaptation called Perlen im Haar (Pearls in Her Hair). Trivia: the song famously appeared in the great movie This Ain’t California (YT) and in Kanye West’s New Slaves (YT). Even though both artists were signed to the same label, Omega seemed a little angry, as they would have liked Kanye to formally ask for permission to sample their song.
But back to the topic.
Apparently the jury members liked these songs, but which one specifically? Let’s take a look at the charts table and filter for Omega.
We see a German and a Hungarian version, but also an entry that is kind of both. Originally I wanted to count both versions together as the Hungarian one. This would be closest to the show’s last rule and would resolve all the problems. Then I heard the German song on the radio and realized they are going to play both. But I didn’t know how to handle the last entry, which is ambiguous. This is a nice example showing that you can’t fully automate the summary of human-generated content; sometimes it just boils down to an arbitrary decision by the evaluator.
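One way to keep such arbitrary decisions transparent is to write them down as an explicit alias table instead of burying them in the matching logic. The Python sketch below (not the post’s actual code) maps normalized (artist, song) keys to a canonical key and sets aside anything it doesn’t cover for manual review. All entry strings are hypothetical placeholders, including the accent-free spelling and the mixed German/Hungarian entry, since the filtered Omega rows aren’t reproduced here.

```python
# Evaluator's manual decisions: normalized (artist, song) -> canonical key.
# Hypothetical entries. Both versions are kept apart because the show plays
# both; a spelling variant without accents is folded into the Hungarian title.
ALIASES = {
    ("omega", "gyöngyhajúlány"): ("omega", "gyöngyhajúlány"),
    ("omega", "gyongyhajulany"): ("omega", "gyöngyhajúlány"),
    ("omega", "perlenimhaar"): ("omega", "perlenimhaar"),
}


def resolve(entries, aliases):
    """Split entries into canonical keys and entries left for a human to decide."""
    resolved, review = [], []
    for artist, song in entries:
        key = aliases.get((artist, song))
        if key is None:
            review.append((artist, song))  # e.g. an entry naming both versions
        else:
            resolved.append(key)
    return resolved, review


if __name__ == "__main__":
    entries = [
        ("omega", "perlenimhaar"),
        ("omega", "gyongyhajulany"),
        ("omega", "perlenimhaar/gyöngyhajúlány"),  # hypothetical ambiguous entry
    ]
    print(resolve(entries, ALIASES))
```

The original plan, counting everything as the Hungarian song, would only mean pointing the `("omega", "perlenimhaar")` key at `("omega", "gyöngyhajúlány")`. The ambiguous entry still ends up in the review list either way, which is the point: the code records the decision but cannot make it.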
The next example is a little simpler, but it also shows a problem with summarising data when people are too clever ;) Karussell landed close to the top, in 3rd place. The song ‘Als ich fortging’ actually scored even better than it appears at first sight, because some jurors named the lead singer instead of the band.
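Once the alias is known, folding the singer’s votes into the band’s row is simple arithmetic: scores and mentions add up, while the average place has to be weighted by the number of mentions rather than averaged directly. A minimal Python sketch, assuming rows in the (artist, song, score, mentioned, average place) layout of the table above; `ARTIST_ALIASES`, the placeholder key `"leadsinger"` and the singer’s numbers are hypothetical, since the post doesn’t list that row.

```python
from collections import defaultdict

# Hypothetical alias: normalized lead singer name -> normalized band name.
ARTIST_ALIASES = {"leadsinger": "karussell"}


def merge_rows(rows, aliases):
    """Re-key rows through the alias map and combine rows that now coincide."""
    score = defaultdict(int)
    mentions = defaultdict(int)
    place_sum = defaultdict(float)
    for artist, song, s, m, avg in rows:
        key = (aliases.get(artist, artist), song)
        score[key] += s
        mentions[key] += m
        place_sum[key] += avg * m  # average * mentions recovers the sum of places
    merged = [
        (a, t, score[(a, t)], mentions[(a, t)], place_sum[(a, t)] / mentions[(a, t)])
        for (a, t) in score
    ]
    merged.sort(key=lambda r: (-r[2], -r[3], r[4]))
    return merged


if __name__ == "__main__":
    rows = [
        ("karussell", "alsichfortging", 105, 17, 5.235294),  # from the table
        ("leadsinger", "alsichfortging", 10, 2, 4.0),  # hypothetical row
    ]
    print(merge_rows(rows, ARTIST_ALIASES))
```

With these made-up numbers the merged row would have 115 points from 19 mentions, illustrating how split votes can hide part of a song’s real support.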
Neither problem can really be solved automatically; both need human assessment.
That’s it for this week’s web scraping. Have fun relistening to some old GDR hits on Spotify, some of which still influence modern pop songs.