1 Project Description

Tidy Tuesday has a weekly data project aimed at the R ecosystem. An emphasis is placed on understanding how to summarize and arrange data to make meaningful charts with ggplot2, tidyr, dplyr, and other tools in the tidyverse ecosystem.

2 Dataset

Data came from Steam Spy. The data includes time played, ownership, release date, publishing information, and for some a metascore.

3 Setup

3.1 Load Libraries

if (!require("pacman")) install.packages("pacman")
pacman::p_load("tidyverse","naniar","zoo","textclean","lubridate","grid","gridExtra")

theme_set(theme_minimal())

3.2 Import Data

df <- read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-07-30/video_games.csv")

4 Data Wrangling & Analysis

  • There are 26688 observations and 10 variables.

4.1 View Data

glimpse(df)
## Observations: 26,688
## Variables: 10
## $ number           <dbl> 1, 3, 21, 47, 36, 52, 2, 4, 14, 40, 9, 17, 43...
## $ game             <chr> "Half-Life 2", "Counter-Strike: Source", "Cou...
## $ release_date     <chr> "Nov 16, 2004", "Nov 1, 2004", "Mar 1, 2004",...
## $ price            <dbl> 9.99, 9.99, 9.99, 4.99, 9.99, NA, 14.99, 4.99...
## $ owners           <chr> "10,000,000 .. 20,000,000", "10,000,000 .. 20...
## $ developer        <chr> "Valve", "Valve", "Valve", "Valve", "Valve", ...
## $ publisher        <chr> "Valve", "Valve", "Valve", "Valve", "Valve", ...
## $ average_playtime <dbl> 110, 236, 10, 0, 0, 16, 0, 0, 0, 0, 0, 0, 0, ...
## $ median_playtime  <dbl> 66, 128, 3, 0, 0, 10, 0, 0, 0, 0, 0, 0, 0, 27...
## $ metascore        <dbl> 96, 88, 65, NA, NA, NA, 93, 87, 73, NA, 80, 6...
head(df)
## # A tibble: 6 x 10
##   number game  release_date price owners developer publisher
##    <dbl> <chr> <chr>        <dbl> <chr>  <chr>     <chr>    
## 1      1 Half~ Nov 16, 2004  9.99 10,00~ Valve     Valve    
## 2      3 Coun~ Nov 1, 2004   9.99 10,00~ Valve     Valve    
## 3     21 Coun~ Mar 1, 2004   9.99 10,00~ Valve     Valve    
## 4     47 Half~ Nov 1, 2004   4.99 5,000~ Valve     Valve    
## 5     36 Half~ Jun 1, 2004   9.99 2,000~ Valve     Valve    
## 6     52 CS2D  Dec 24, 2004 NA    1,000~ Unreal S~ Unreal S~
## # ... with 3 more variables: average_playtime <dbl>,
## #   median_playtime <dbl>, metascore <dbl>
summary(df)
##      number         game           release_date           price        
##  Min.   :   1   Length:26688       Length:26688       Min.   :  0.490  
##  1st Qu.: 821   Class :character   Class :character   1st Qu.:  2.990  
##  Median :2356   Mode  :character   Mode  :character   Median :  5.990  
##  Mean   :2904                                         Mean   :  8.947  
##  3rd Qu.:4523                                         3rd Qu.:  9.990  
##  Max.   :8846                                         Max.   :595.990  
##                                                       NA's   :3095     
##     owners           developer          publisher        
##  Length:26688       Length:26688       Length:26688      
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##                                                          
##  average_playtime   median_playtime     metascore    
##  Min.   :   0.000   Min.   :   0.00   Min.   :20.00  
##  1st Qu.:   0.000   1st Qu.:   0.00   1st Qu.:66.00  
##  Median :   0.000   Median :   0.00   Median :73.00  
##  Mean   :   9.057   Mean   :   5.16   Mean   :71.89  
##  3rd Qu.:   0.000   3rd Qu.:   0.00   3rd Qu.:80.00  
##  Max.   :5670.000   Max.   :3293.00   Max.   :98.00  
##  NA's   :9          NA's   :12        NA's   :23838
sapply(df, function(x) n_distinct(x)) %>% sort()
##           owners        metascore            price  median_playtime 
##               13               74              200              297 
## average_playtime     release_date           number        publisher 
##              387             2977             8846            13954 
##        developer             game 
##            16753            26611

4.2 Missing Values

View missing values in more detail.

#Visualize missing values
gg_miss_var(df) + labs(title="Missing Values")

#see count of missing values
na_values <- function(df){
  na <- colSums(is.na(df)) %>% sort(decreasing=TRUE)
  na[na>0]
}

na_values(df)
##        metascore            price        developer        publisher 
##            23838             3095              151               95 
##  median_playtime average_playtime             game 
##               12                9                3

4.3 Data Wrangling

Remove/replace missing values and drop unnecessary columns

#group by most specific attributes to less specific to get the mean for missing price
groups <- c("game","publisher","developer")

for(group in groups){
  df <- df %>% group_by(get(group)) %>% mutate(price = na.aggregate(price)) %>% ungroup()
}

df$average_playtime[(is.na(df$average_playtime))] <- 0

#check missing values again
na_values(df)
##       metascore           price       developer      get(group) 
##           23838            1979             151             151 
##       publisher median_playtime            game 
##              95              12               3
#for the remaining missing prices, use the mean of the entire dataset
df <- df %>% mutate(price = na.aggregate(price)) %>% select(-11,-metascore)

na_values(df)
##       developer       publisher median_playtime            game 
##             151              95              12               3
#check for string abnormalities in game
check_text(df$game)
## 
## 
## ===========
## CONTRACTION
## ===========
## 
## The following observations contain contractions:
## 
## 523, 988, 1450, 2029, 3304, 3546, 3836, 3859, 3992, 5328...[truncated]...
## 
## This issue affected the following text:
## 
## 523: Sam & Max 205: What's New Beelzebub?
## ...[truncated]...
## 988: nail'd
## ...[truncated]...
## 1450: Who's That Flying?!
## ...[truncated]...
## 2029: Monaco: What's Yours Is Mine
## ...[truncated]...
## 3304: Let's Sing
## ...[truncated]...
## 3546: It's A Wipe!
## ...[truncated]...
## 3836: Frayed Knights: The Skull of S'makh-Daon
## ...[truncated]...
## 3859: It's time to get out from the solar system
## ...[truncated]...
## 3992: Who's Your Daddy
## ...[truncated]...
## 5328: I Can't Escape: Darkness
## ...[truncated]...
## 
## *Suggestion: Consider running `replace_contraction`
## 
## 
## =====
## DIGIT
## =====
## 
## The following observations contain digits/numbers:
## 
## 1, 4, 6, 7, 8, 13, 14, 21, 22, 24...[truncated]...
## 
## This issue affected the following text:
## 
## 1: Half-Life 2
## ...[truncated]...
## 4: Half-Life 2: Deathmatch
## ...[truncated]...
## 6: CS2D
## ...[truncated]...
## 7: Unreal Tournament 2004: Editor's Choice Edition
## ...[truncated]...
## 8: DOOM 3
## ...[truncated]...
## 13: STAR WARS Battlefront (Classic, 2004)
## ...[truncated]...
## 14: Jagged Alliance 2 - Wildfire
## ...[truncated]...
## 21: SuperPower 2 Steam Edition
## ...[truncated]...
## 22: Dark Fall 2: Lights Out
## ...[truncated]...
## 24: Guilty Gear X2 #Reload
## ...[truncated]...
## 
## *Suggestion: Consider using `replace_number`
## 
## 
## ========
## EMOTICON
## ========
## 
## The following observations contain emoticons:
## 
## 75, 138, 167, 174, 221, 266, 276, 318, 337, 341...[truncated]...
## 
## This issue affected the following text:
## 
## 75: RPG Maker XP
## ...[truncated]...
## 138: X3: Reunion
## ...[truncated]...
## 167: 688(I) Hunter/Killer
## ...[truncated]...
## 174: Xpand Rally
## ...[truncated]...
## 221: Hidden Expedition: Video Games
## ...[truncated]...
## 266: Lost Planet: Extreme Condition
## ...[truncated]...
## 276: Peggle Extreme
## ...[truncated]...
## 318: eXperience 112
## ...[truncated]...
## 337: Redneck Kentucky and the Next Generation Chickens
## ...[truncated]...
## 341: Hidden Expedition: Everest
## ...[truncated]...
## 
## *Suggestion: Consider using `replace_emoticons`
## 
## 
## =======
## ESCAPED
## =======
## 
## The following observations contain escaped back spaced characters:
## 
## 8031, 9271, 14257, 19655, 22675, 23609, 24651
## 
## This issue affected the following text:
## 
## 8031: Major\Minor - Complete Edition
## 9271: Major\Minor
## 14257: C:\raft
## 19655: HENTAI CASINO \ <U+6027><U+8D4C><U+573A>
## 22675: KAMASUTRA \ <U+7231><U+7ECF>
## 23609: PUSSY PASSWORD \ <U+732B><U+7684><U+5BC6><U+7801>
## 24651: Meme Supreme ¯\_(<U+30C4>)_/¯
## 
## *Suggestion: Consider using `replace_white`
## 
## 
## ====
## HASH
## ====
## 
## The following observations contain Twitter style hash tags (e.g., #rstats):
## 
## 24, 852, 881, 7121, 8130, 8260, 8718, 8965, 9519, 12138...[truncated]...
## 
## This issue affected the following text:
## 
## 24: Guilty Gear X2 #Reload
## ...[truncated]...
## 852: Freeze Tag Fun Pack #1
## ...[truncated]...
## 881: Freeze Tag Fun Pack #2
## ...[truncated]...
## 7121: Case #8
## ...[truncated]...
## 8130: htoL#NiQ: The Firefly Diary / htoL#NiQ-<U+30DB><U+30BF><U+30EB><U+30CE><U+30CB><U+30C3><U+30AD>-
## ...[truncated]...
## 8260: #KILLALLZOMBIES
## ...[truncated]...
## 8718: Light Repair Team #4
## ...[truncated]...
## 8965: #SelfieTennis
## ...[truncated]...
## 9519: ComixPlay #1: The Endless Incident
## ...[truncated]...
## 12138: #Archery
## ...[truncated]...
## 
## *Suggestion: Consider using `qdapRegex::ex_tag' (to capture meta-data) and/or replace_hash
## 
## 
## ==========
## INCOMPLETE
## ==========
## 
## The following observations contain incomplete sentences (e.g., uses ending punctuation like '...'):
## 
## 1804, 2081, 4336, 4692, 4927, 4938, 6674, 10349, 10951, 12256...[truncated]...
## 
## This issue affected the following text:
## 
## 1804: Hamlet or the Last Game without MMORPG Features, Shaders ...
## ...[truncated]...
## 2081: BIT.TRIP Presents... Runner2: Future Legend of Rhythm Alien
## ...[truncated]...
## 4336: Oh...Sir! Prototype
## ...[truncated]...
## 4692: I must kill...: Fresh Meat
## ...[truncated]...
## 4927: Home is Where One Starts...
## ...[truncated]...
## 4938: Always The Same Blue Sky...
## ...[truncated]...
## 6674: Oh...Sir!! The Insult Simulator
## ...[truncated]...
## 10349: Karaski: What Goes Up...
## ...[truncated]...
## 10951: Icity - a Flight Sim ... and a City Builder
## ...[truncated]...
## 12256: Oh...Sir! The Hollywood Roast
## ...[truncated]...
## 
## *Suggestion: Consider using `replace_incomplete`
## 
## 
## ====
## KERN
## ====
## 
## The following observations contain kerning (e.g., 'The B O M B!'):
## 
## 3976, 4309, 8779, 9413, 13383, 15065, 15260, 16274, 20282, 22000...[truncated]...
## 
## This issue affected the following text:
## 
## 3976: HELLDIVERS A New Hell Edition
## ...[truncated]...
## 4309: NOT A HERO
## ...[truncated]...
## 8779: C O S M
## ...[truncated]...
## 9413: L U N E
## ...[truncated]...
## 13383: RHEM I SE: The Mysterious Land
## ...[truncated]...
## 15065: FROG X BIRD
## ...[truncated]...
## 15260: B A S E M E N T
## ...[truncated]...
## 16274: <U+57CB><U+846C> DIG A BOO
## ...[truncated]...
## 20282: BATTLE X Arcade
## ...[truncated]...
## 22000: CROSS X CARROT
## ...[truncated]...
## 
## *Suggestion: Consider using `replace_kern`
## 
## 
## =============
## MISSING VALUE
## =============
## 
## The following observations contain missing values:
## 
## 7763, 12553, 22303
## 
## *Suggestion: Consider running `drop_NA`
## 
## 
## ==========
## MISSPELLED
## ==========
## 
## The following observations contain potentially misspelled words:
## 
## 4, 10, 12, 19, 21, 26, 27, 28, 31, 35...[truncated]...
## 
## This issue affected the following text:
## 
## 4: H<<al>>f-<<Lif>>e 2: <<<<De>>a<<th>>match>>
## ...[truncated]...
## 10: <<Hitman>>: Contracts
## ...[truncated]...
## 12: <<<<Neighbour>>s>> from H<<el>>l Compilat<<io>>n
## ...[truncated]...
## 19: <<Wo>>rld <<Wa>>r II: <<Panzer>> Claws
## ...[truncated]...
## 21: <<<<Su>>perPower>> 2 Steam <<E<<di>>t<<io>>>>n
## ...[truncated]...
## 26: <<<<pla>>netar<<ian>>>> ~<<th>>e re<<ver>>ie of a litt<<le>> <<pla>>net~
## ...[truncated]...
## 27: <<Co<<de>>name>>: <<<<Panzer>>s>>, Pha<<se>> One
## ...[truncated]...
## 28: <<Kohan>> II: Kings of <<Wa>>r
## ...[truncated]...
## 31: <<Wa>>rlords <<Batt<<le>>cry>> III
## ...[truncated]...
## 35: <<Kon<<un>>g>> 2
## ...[truncated]...
## 
## *Suggestion: Consider running `hunspell::hunspell_find` & `hunspell::hunspell_suggest`
## 
## 
## ========
## NO ALPHA
## ========
## 
## The following observations contain elements with no alphabetic (a-z) letters:
## 
## 2135, 2249, 3282, 4451, 5591, 5718, 6317, 6440, 7371, 11180...[truncated]...
## 
## This issue affected the following text:
## 
## 2135: 140
## ...[truncated]...
## 2249: 10000000
## ...[truncated]...
## 3282: 1849
## ...[truncated]...
## 4451: <U+0410><U+0441><U+0442><U+0440><U+043E><U+043B><U+043E><U+0440><U+0434><U+044B>: <U+041E><U+0431><U+043B><U+0430><U+043A><U+043E> <U+041E><U+043E><U+0440><U+0442><U+0430>
## ...[truncated]...
## 5591: <U+6804><U+51A0><U+306F><U+541B><U+306B> <U+30EC><U+30B8><U+30A7><U+30F3><U+30C9><U+30D1><U+30C3><U+30AF>
## ...[truncated]...
## 5718: <U+0639><U+0627><U+0644><U+0645> <U+0623><U+0631><U+064A><U+0628>
## ...[truncated]...
## 6317: <U+30EB><U+30CA><U+30C6><U+30A3><U+30C3><U+30AF><U+30C9><U+30FC><U+30F3> <U+30EC><U+30B8><U+30A7><U+30F3><U+30C9><U+30D1><U+30C3><U+30AF>
## ...[truncated]...
## 6440: <U+30EB><U+30CA><U+30C6><U+30A3><U+30C3><U+30AF><U+30C9><U+30FC><U+30F3> <U+524D><U+9014><U+3078><U+306E><U+9053><U+6A19>
## ...[truncated]...
## 7371: <U+7D05><U+8718><U+86DB><U+5916><U+4F1D>:<U+6697><U+6226>
## ...[truncated]...
## 11180: <U+9F99><U+9B42><U+65F6><U+523B>
## ...[truncated]...
## 
## *Suggestion: Consider cleaning the raw text or running `filter_row`
## 
## 
## ==========
## NO ENDMARK
## ==========
## 
## The following observations contain elements with missing ending punctuation:
## 
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10...[truncated]...
## 
## This issue affected the following text:
## 
## 1: Half-Life 2
## ...[truncated]...
## 2: Counter-Strike: Source
## ...[truncated]...
## 3: Counter-Strike: Condition Zero
## ...[truncated]...
## 4: Half-Life 2: Deathmatch
## ...[truncated]...
## 5: Half-Life: Source
## ...[truncated]...
## 6: CS2D
## ...[truncated]...
## 7: Unreal Tournament 2004: Editor's Choice Edition
## ...[truncated]...
## 8: DOOM 3
## ...[truncated]...
## 9: Beyond Divinity
## ...[truncated]...
## 10: Hitman: Contracts
## ...[truncated]...
## 
## *Suggestion: Consider cleaning the raw text or running `add_missing_endmark`
## 
## 
## ====================
## NO SPACE AFTER COMMA
## ====================
## 
## The following observations contain commas with no space afterwards:
## 
## 233, 235, 255, 303, 385, 556, 891, 1243, 1570, 3057...[truncated]...
## 
## This issue affected the following text:
## 
## 233: Warhammer 40,000: Dawn of War - Dark Crusade
## ...[truncated]...
## 235: Warhammer 40,000: Dawn of War - Game of the Year Edition
## ...[truncated]...
## 255: Warhammer 40,000: Dawn of War – Winter Assault
## ...[truncated]...
## 303: 7,62 High Calibre
## ...[truncated]...
## 385: Warhammer 40,000: Dawn of War - Soulstorm
## ...[truncated]...
## 556: Warhammer 40,000: Dawn of War II
## ...[truncated]...
## 891: Warhammer 40,000: Dawn of War II Chaos Rising
## ...[truncated]...
## 1243: Warhammer 40,000: Space Marine
## ...[truncated]...
## 1570: Warhammer 40,000: Dawn of War II: Retribution
## ...[truncated]...
## 3057: Warhammer 40,000: Armageddon
## ...[truncated]...
## 
## *Suggestion: Consider running `add_comma_space`
## 
## 
## =========
## NON ASCII
## =========
## 
## The following observations contain non-ASCII text:
## 
## 50, 74, 117, 159, 176, 208, 210, 255, 340, 458...[truncated]...
## 
## This issue affected the following text:
## 
## 50: KóterGame
## ...[truncated]...
## 74: Telltale Texas Hold ‘Em
## ...[truncated]...
## 117: Total War: MEDIEVAL II – Definitive Edition
## ...[truncated]...
## 159: Marc Ecko's Getting Up: Contents Under Pressure
## ...[truncated]...
## 176: Disney•Pixar Cars
## ...[truncated]...
## 208: Disney•Pixar Cars: Radiator Springs Adventures
## ...[truncated]...
## 210: 18 Wheels of Steel: Haulin’
## ...[truncated]...
## 255: Warhammer 40,000: Dawn of War – Winter Assault
## ...[truncated]...
## 340: Disney•Pixar Cars Mater-National Championship
## ...[truncated]...
## 458: Telltale Texas Hold ‘Em
## ...[truncated]...
## 
## *Suggestion: Consider running `replace_non_ascii`
## 
## 
## ==================
## NON SPLIT SENTENCE
## ==================
## 
## The following observations contain unsplit sentences (more than one sentence per element):
## 
## 193, 198, 319, 501, 536, 619, 669, 978, 1064, 1101...[truncated]...
## 
## This issue affected the following text:
## 
## 193: Big Money! Deluxe
## ...[truncated]...
## 198: Typer Shark! Deluxe
## ...[truncated]...
## 319: Ride! Carnival Tycoon
## ...[truncated]...
## 501: Ride! Carnival Tycoon
## ...[truncated]...
## 536: Discovery! A Seek and Find Adventure
## ...[truncated]...
## 619: Time Gentlemen, Please! and Ben There, Dan That! Special Edition Double Pack
## ...[truncated]...
## 669: AaAaAA!!! - A Reckless Disregard for Gravity
## ...[truncated]...
## 978: The Misadventures of P.B. Winterbottom
## ...[truncated]...
## 1064: M.U.D. TV
## ...[truncated]...
## 1101: A.I.M. Racing
## ...[truncated]...
## 
## *Suggestion: Consider running `textshape::split_sentence`
## 
## 
## ====
## TIME
## ====
## 
## The following observations contain timestamps:
## 
## 7515, 24867
## 
## This issue affected the following text:
## 
## 7515: Memory's Dogma CODE:01
## 24867: Rise:30 Minutes to Extinction
## 
## *Suggestion: Consider using `replace_time`
## 
## 
## ===
## URL
## ===
## 
## The following observations contain URLs:
## 
## 26024
## 
## This issue affected the following text:
## 
## 26024: Driftpunk Racer
## 
## *Suggestion: Consider using `replace_url`
#fix strings and other data anomalies
df <- df %>% mutate(game = str_replace_all(game, "[^\x20-\x7E]", ""), #removes unicode characters
                    game = replace_non_ascii(game), #did not work for everything. line above is better
                    game = replace_emoticon(game),
                    game = replace_date(game),
                    game = replace_hash(game),
                    game = replace_kern(game),
                    game = str_trim(game),
                    game = str_squish(game),
                    game = str_to_title(game),
                    game = str_replace_all(game, "\\d/\\d", ""),
                    game = str_replace(game, "^[:punct:]+", ""),
                    game = str_replace(game, "\\|.*", ""),
                    game = str_replace(game, "(?<=\\w):(?=\\w)", ": "),
                    game = str_replace(game, "\\s-|-\\s", ": "),
                    game = paste0(game, ": "), #add : to end to make it easy to extract into a group
                    game = str_replace(game, "[:punct:]:$", ":"),
                    game_group = str_extract(game,"([^:]+(?=[:punct:]+\\s))"),
                    game_group = str_trunc(game_group,25),
                    game = str_replace(game, ": $", ""), #clean it up: remove the trailing :
                    publisher = str_replace_all(publisher, "[:punct:]+", ""),
                    publisher = str_replace_all(publisher, "[^\x20-\x7E]", ""),
                    publisher = str_to_title(publisher),
                    owners = str_replace_all(owners, ",", ""),
                    owners_lower = str_extract_all(owners, "\\d+(?=\\s\\.\\.\\s)") %>% as.integer(),
                    owners_upper = str_extract_all(owners, "(?<=\\s\\.\\.\\s)\\d+") %>% as.integer(),
                    release_date = as.Date(release_date, "%b %d, %Y"),
                    average_playtime_group = case_when(average_playtime <= max(average_playtime)*0.2 ~ "Very Low",
                                                       average_playtime > max(average_playtime)*0.2 & 
                                                         average_playtime <= max(average_playtime)*0.4 ~ "Low",
                                                       average_playtime > max(average_playtime)*0.4 & 
                                                         average_playtime <= max(average_playtime)*0.6 ~ "Medium",
                                                       average_playtime > max(average_playtime)*0.6 & 
                                                         average_playtime <= max(average_playtime)*0.8 ~ "High",
                                                       average_playtime > max(average_playtime)*0.8 ~ "Very High"),
                    average_playtime_group = fct_relevel(average_playtime_group,c("Very Low","Low","Medium","High","Very High"))) 

#remove rows that do not have any alpha characters
df <- keep_row(df, "game","\\w+")

#change strings back to factors to make them easier to work with
df <- df %>% mutate_if(is.character, as.factor)

5 Visualizations

df_price <- df %>% group_by(game_group) %>% summarize(total_revenue=sum(price)) %>% ungroup() %>%
  mutate(game_group = fct_reorder(game_group, total_revenue)) %>% arrange(total_revenue) %>% tail(20)

#practice with lollipop chart
df_price %>% ggplot(aes(game_group,total_revenue))+
  geom_point(size=1.75, color="gray10")+
  geom_segment(aes(x=game_group, xend=game_group, y=0, yend=total_revenue), size=1.25, color="deepskyblue4")+
  scale_color_gradient()+
  coord_flip()+
  labs(title="Highest Grossing Video Games", caption="Data: Steam Spy | Graphics: Cat Williams @catrwilliams",
       x = "Game", y = "Total Revenue")+
  theme(plot.title = element_text(size=14, hjust=0.5, face="bold"),
        plot.caption = element_text(size=6))

#################

#determine if there is a relationship between price and average playtime
outliers <- boxplot(df$price, plot=FALSE)$out
price_lim <- summary(outliers)[["3rd Qu."]]

outliers <- boxplot(df$average_playtime, plot=FALSE)$out
playtime_lim <- summary(outliers)[["3rd Qu."]]

p1 <- df %>% ggplot(aes(price, average_playtime))+
  geom_point(alpha=0.3,color="deepskyblue4")+
  labs(title="Summary for all data",x="Game Price",y="Average Playtime")+
  theme(text= element_text(color="gray10"),
        plot.title = element_text(size=13,hjust=0.5),
        axis.title = element_text(size=10))

p2 <- df %>% ggplot(aes(price, average_playtime))+
  geom_point(alpha=0.3,color="deepskyblue4")+
  xlim(0,price_lim)+
  ylim(0,playtime_lim)+
  labs(title="Zoomed in to remove outliers",x="Game Price",y="Average Playtime")+
  theme(text= element_text(color="gray10"),
        plot.title = element_text(size=13,hjust=0.5),
        axis.title = element_text(size=10))

tg <- grobTree(textGrob("Price vs. Average Time Played: No Distinct Relationship", 
                        y=1, 
                        vjust=1, 
                        gp=gpar(fontface="bold", fontsize = 16, color="deepskyblue4")),
               cl="titlegrob")

heightDetails.titlegrob <- function(x) do.call(sum,lapply(x$children, grobHeight))

#create final plot
p3 <- grid.arrange(arrangeGrob(p1,p2,nrow=1),top = tg)

#################

df_publisher <- df %>% group_by(publisher) %>% summarize(total_price=sum(price)) %>% ungroup() %>% 
  filter(publisher != "") %>% mutate(publisher = fct_reorder(publisher, total_price)) %>% 
  arrange(total_price) %>% tail(20)

df_publisher %>% ggplot(aes(publisher, total_price))+
  #geom_col(fill="deepskyblue4")+
  geom_point(size=1.75, color="gray10")+
  geom_segment(aes(x=publisher, xend=publisher, y=0, yend=total_price), size=1.25, color="deepskyblue4")+
  coord_flip()+
  labs(title="Top Video Game Publishers by Total Revenue", x="Revenue", y="Publisher")+
  theme(text= element_text(color="gray10"),
        plot.title = element_text(size=14, hjust=0.5, face="bold"),
        axis.title = element_text(size=10))

#################

#does having more players encourage more playtime?
df %>% ggplot()+
  geom_segment(aes(x=owners_lower, xend=owners_upper, y=average_playtime_group, yend=average_playtime_group), 
               size=1.25, color="deepskyblue4")+
  scale_x_continuous(labels=c("0","50","100","150","200"))+
  labs(title="Video Games: Does having more players encourage more playtime?", 
       subtitle="It appears that more players correlates with lower average playtime",
       x="Number of Players (in millions)", y="Time Played", 
       caption="Data: Steam Spy | Graphics: Cat Williams @catrwilliams")+
  theme(plot.title=element_text(hjust=0.5, color="deepskyblue4", face="bold"),
        text = element_text(color="gray10"),
        plot.subtitle=element_text(hjust=0.5, face="bold"),
        plot.caption = element_text(size=6),
        axis.title = element_text(size=10))

ggsave("video-games.png")

6 Conclusions

I do not have specific knowledge of most video games, however, I have heard the names of many of the highest grossing video games in passing. They also appear to be of multiple genres (ie. war, mystery, trivia) which comes as a surprise to me. I would not have placed a trivia game in the top 20 highest grossing video games.

Another surprise to me is that there is no obvious relationship between average time played and the price of the game. I would have expected more expensive games to be played more as it is a larger investment and logically higher prices tend to equate to better things. What these charts do show is that lower priced games are slightly more popular, due to the concentration of the data points closer to 0.

As for top video game publishers, it appears that Magix Software is doing much better than the other publishers. Big Fish Games, Ubisoft, Koei Tecmo Games, and Slitherine are in a slightly higher bracket than the other top performers. After that, revenue numbers seem to decrease gradually. I also wonder if the top publishers correlate to the highest grossing games. Further analysis would need to be done to determine this.

Lastly, having more overall players for a particular game does not encourage more playtime. At first thought it would seem that more popular games would have more playtime. Based on the data, however, one may infer that because there are more people involved (perhaps only occasional players) with more popular games, it skews the average rankings.