Week 4 Practical

UK Top 40 Singles Chart

https://www.officialcharts.com/charts/uk-top-40-singles-chart/

Inspect the HTML for each single. The chart isn’t visible when you open the local version of the page, but the data is still in the HTML.

Using the methods shown in this lecture, recreate the same dataframe as the one read in from top40s.csv. A local copy of the webpage is available in the code files as top40s.html.

library(xml2)
library(dplyr)
library(readr)

top40_soln <- read_csv(file.path("data","top40s.csv"))
## Rows: 40 Columns: 2
## -- Column specification ----------------------------------------------
## Delimiter: ","
## chr (2): song, artist
## 
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
print(top40_soln, n = 40)
## # A tibble: 40 x 2
##    song                        artist                        
##    <chr>                       <chr>                         
##  1 STICK SEASON                NOAH KAHAN                    
##  2 MURDER ON THE DANCEFLOOR    SOPHIE ELLIS-BEXTOR           
##  3 BEAUTIFUL THINGS            BENSON BOONE                  
##  4 LOSE CONTROL                TEDDY SWIMS                   
##  5 PRAISE JAH IN THE MOONLIGHT YG MARLEY                     
##  6 PRADA                       CASSO/RAYE/D-BLOCK EUROPE     
##  7 CRUEL SUMMER                TAYLOR SWIFT                  
##  8 GREEDY                      TATE MCRAE                    
##  9 TEXAS HOLD 'EM              BEYONCE                       
## 10 YES AND                     ARIANA GRANDE                 
## 11 HOMESICK                    NOAH KAHAN & SAM FENDER       
## 12 CARNIVAL                    KANYE WEST/TY DOLLA SIGN      
## 13 HOUDINI                     DUA LIPA                      
## 14 UNWRITTEN                   NATASHA BEDINGFIELD           
## 15 ALIBI                       ELLA HENDERSON FT RUDIMENTAL  
## 16 POPULAR                     WEEKND/PLAYBOI CARTI/MADONNA  
## 17 BURN                        KANYE WEST/TY DOLLA SIGN      
## 18 BACK TO ME                  KANYE WEST/TY DOLLA SIGN      
## 19 REDRUM                      21 SAVAGE                     
## 20 NEVER LOSE ME               FLO MILLI                     
## 21 LEAVEMEALONE                FRED AGAIN & BABY KEEM        
## 22 DNA (LOVING YOU)            BILLY GILLIES FT HANNAH BOLEYN
## 23 I REMEMBER EVERYTHING       ZACH BRYAN FT KACEY MUSGRAVES 
## 24 WHATEVER                    KYGO & AVA MAX                
## 25 EXES                        TATE MCRAE                    
## 26 RICH BABY DADDY             DRAKE FT SEXYY RED & SZA      
## 27 NOTHING MATTERS             LAST DINNER PARTY             
## 28 LOVIN ON ME                 JACK HARLOW                   
## 29 ASKING                      SONNY FODERA/MK/DOUGLAS       
## 30 ON MY LOVE                  ZARA LARSSON & DAVID GUETTA   
## 31 FOREVER                     NOAH KAHAN                    
## 32 SCARED TO START             MICHAEL MARCAGI               
## 33 TOXIC                       SONGER                        
## 34 HOME                        GOOD NEIGHBOURS               
## 35 PERFECT (EXCEEDER)          MASON/PRINCESS SUPERSTAR      
## 36 ONE OF THE GIRLS            WEEKND/JENNIE/LILY ROSE DEPP  
## 37 ABRACADABRA                 WES NELSON FT CRAIG DAVID     
## 38 FAST CAR                    TRACY CHAPMAN                 
## 39 SELFISH                     JUSTIN TIMBERLAKE             
## 40 LIL BOO THANG               PAUL RUSSELL

You will need to process the number one single separately from the other 39, because its chart item has a different class attribute from the rest.
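
As a rough sketch of why two queries are needed, the example below runs an exact class match against a tiny, made-up page. The toy_html string and the toy_* names are hypothetical stand-ins; only the two class strings are taken from the chart page.

```r
library(xml2)

# Hypothetical miniature of the chart page: one number one item and two others.
toy_html <- '<html><body>
  <div class="primis chart-item relative text-right"><span>A</span></div>
  <div class="chart-item relative text-right"><span>B</span></div>
  <div class="chart-item relative text-right"><span>C</span></div>
</body></html>'
toy_doc <- read_html(toy_html)

# An exact @class comparison matches the whole attribute string,
# so each kind of chart item needs its own query.
toy_top  <- xml_find_all(toy_doc, './/div[@class="primis chart-item relative text-right"]')
toy_rest <- xml_find_all(toy_doc, './/div[@class="chart-item relative text-right"]')

length(toy_top)   # 1
length(toy_rest)  # 2
```

Neither query picks up the other kind of item, which is why the number one single has to be collected on its own.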

For each chart item there is a collection of 11 span elements, counting spans nested inside other spans. The fourth and fifth span elements contain the song and artist for your data.
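
The counting can be illustrated on a cut-down, hypothetical chart item (item_html below is invented and has only five spans rather than 11). It is a minimal sketch showing that a nested span takes up a position of its own, so the song and artist land at positions four and five.

```r
library(xml2)

# Hypothetical, shortened chart item: the first span wraps a nested span.
item_html <- '<div class="chart-item relative text-right">
  <span class="chart-key"><span class="sr-only">Number </span>2</span>
  <span class="movement-icon"></span>
  <span>SONG TITLE</span>
  <span>ARTIST NAME</span>
</div>'
item <- read_html(item_html)

# ".//span" returns every descendant span in document order,
# so the nested "sr-only" span is counted as element 2.
spans <- xml_find_all(item, ".//span")

song   <- xml_text(spans[4])
artist <- xml_text(spans[5])
```

Here length(spans) is 5, song is "SONG TITLE" and artist is "ARTIST NAME".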

Week 4 Practical Solution

#download.file("https://www.officialcharts.com/charts/uk-top-40-singles-chart/", file.path("data", "top40s.html"))
library(xml2)
library(dplyr)
library(readr)

top40_soln <- read_csv(file.path("data","top40s.csv"))
## Rows: 40 Columns: 2
## -- Column specification ----------------------------------------------
## Delimiter: ","
## chr (2): song, artist
## 
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
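# Parse the local copy of the chart page
html_doc <- read_html(file.path("data", "top40s.html"))

# The number one single sits in a div with an extra "primis" class
top <- xml_find_all(html_doc, ".//div[@class=\"primis chart-item relative text-right\"]")

# The other 39 singles share this class string
top39 <- xml_find_all(html_doc, ".//div[@class=\"chart-item relative text-right\"]")

# List every span (including nested ones) inside the number one item
top_spans <- xml_find_all(top, ".//span")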

top_spans
## {xml_nodeset (11)}
##  [1] <span class="digits1 chart-key font-bold"><span class="sr-only ...
##  [2] <span class="sr-only">Number </span>
##  [3] <span class="movement-icon"></span>
##  [4] <span>STICK SEASON</span>
##  [5] <span>NOAH KAHAN</span>
##  [6] <span title="Last week">LW: <span class="text-brand-pink font- ...
##  [7] <span class="text-brand-pink font-bold">1</span>
##  [8] <span class="hidden sm:inline-block">, </span>
##  [9] <span class="text-brand-cobalt font-bold">1</span>
## [10] <span>, </span>
## [11] <span class="text-brand-pink font-bold">20</span>
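# Return the song and artist of one chart item as a one-row tibble
select_song_and_artist <- function(node){
  spans <- xml_find_all(node, ".//span")
  song <- xml_text(spans[4])    # fourth span holds the song title
  artist <- xml_text(spans[5])  # fifth span holds the artist
  return(tibble(song = song, artist = artist))
}

# `top` holds a single chart item, so it can be passed in directly
top_row <- select_song_and_artist(top)
# Apply the same function to each of the other 39 chart items
top39_rows <- lapply(top39, select_song_and_artist)

# Stack the number one row on top of the list of 39 one-row tibbles
top40 <- bind_rows(top_row, top39_rows)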
# top40 %>% write_csv(file.path("data","top40s.csv"))

print(top40)
## # A tibble: 40 x 2
##    song                        artist                   
##    <chr>                       <chr>                    
##  1 STICK SEASON                NOAH KAHAN               
##  2 MURDER ON THE DANCEFLOOR    SOPHIE ELLIS-BEXTOR      
##  3 BEAUTIFUL THINGS            BENSON BOONE             
##  4 LOSE CONTROL                TEDDY SWIMS              
##  5 PRAISE JAH IN THE MOONLIGHT YG MARLEY                
##  6 PRADA                       CASSO/RAYE/D-BLOCK EUROPE
##  7 CRUEL SUMMER                TAYLOR SWIFT             
##  8 GREEDY                      TATE MCRAE               
##  9 TEXAS HOLD 'EM              BEYONCE                  
## 10 YES AND                     ARIANA GRANDE            
## # i 30 more rows
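
One possible way to confirm that a scraped dataframe matches the one read from the CSV is sketched below. The scraped and expected tibbles are small hypothetical stand-ins for top40 and top40_soln, with a deliberate mismatch in the second row. Columns are compared one at a time because a tibble returned by read_csv carries extra attributes that a whole-object comparison may flag.

```r
library(dplyr)

# Hypothetical stand-ins for top40 (scraped) and top40_soln (read from the CSV).
scraped  <- tibble(song = c("SONG A", "SONG B"), artist = c("ARTIST A", "ARTIST X"))
expected <- tibble(song = c("SONG A", "SONG B"), artist = c("ARTIST A", "ARTIST B"))

# Compare the character columns directly; TRUE only if every value agrees.
same_data <- identical(scraped$song, expected$song) &&
  identical(scraped$artist, expected$artist)

# anti_join() keeps the rows of `scraped` that have no match in `expected`,
# which points at the entries that need another look.
mismatches <- anti_join(scraped, expected, by = c("song", "artist"))
```

With this toy data same_data is FALSE and mismatches contains the single "SONG B" row. Substituting top40 and top40_soln would run the same check on the real chart.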