Getting app profile data from Google Play
How to get localized app profiles from Google Play using R and rvest.
In this second part of the ASO series, after reviewing how to automatically retrieve app rankings from Google Play, I'm going to cover how to get the profile data of any app in the Google Play Store.
As in the previous article, I will use R and RStudio and share some code snippets that might be relevant for you.
Building app profile URLs
You should start with a list of apps for which you'd like to get profile data.
I suggest using an app_id value with the following format to identify each app in your script and to easily concatenate the localized URL.
By ending your URL in "&hl=es" you request the Spanish localization of that app (if available). You can change the localization by adjusting the URL ending to any of the language codes available for apps hosted in Google Play.
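As a minimal sketch of this concatenation, the helper below builds the localized URL for any app id and language code. The function name build_profile_url and the list of language codes are assumptions made for illustration; they are not part of the original script.

```r
# Hypothetical helper (not from the original script): builds the localized
# Google Play profile URL for an app id and a language code.
build_profile_url <- function(app_id, lang = "es") {
  paste0("https://play.google.com", app_id, "&hl=", lang)
}

app_id <- "/store/apps/details?id=women.workout.female.fitness"
langs <- c("es", "en", "fr")  # example language codes

# paste0() is vectorized, so one call returns one URL per language
urls <- build_profile_url(app_id, langs)
print(urls)
```

Keeping the app id and the language code as separate arguments makes it easy to loop over several localizations of the same app later on.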
app_id <- '/store/apps/details?id=women.workout.female.fitness'
url <- paste0('https://play.google.com', app_id, "&hl=es")
Retrieving app profile data
The core of the script is built on the rvest library, which lets you simulate a user interacting with a website, using forms and navigating from page to page.
library(rvest)
library(stringr) # provides str_detect(), used further down
page <- session(url)
Once the app profile page has been saved in memory, you just need to:
Classify how the information is structured for app profiles hosted in Google Play.
Extract the data accordingly, and save it for further analysis.
In the following example I'm extracting the app title, the long description, the installs, the score and the ratings.
As you can see, each field has its own class, which you can find using your browser's web inspector.
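To see how such class-based selectors behave before pointing them at a live page, you can try them on a small inline document. The sketch below assumes rvest 1.0 or later (for minimal_html()) and uses made-up markup and class names, not the real Google Play ones.

```r
library(rvest)

# Made-up markup for illustration only; these are not Google Play's classes.
demo_page <- minimal_html('
  <h1 class="title main">Demo App</h1>
  <div class="stat">4.5</div>
  <div class="stat">1M+</div>
')

# Attribute selector: matches the full class string exactly
title <- demo_page %>% html_nodes("[class='title main']") %>% html_text()

# Several nodes can share one class; they come back in document order
stats <- demo_page %>% html_nodes("[class='stat']") %>% html_text()

# A class that is not on the page yields an empty vector, not an error
missing <- demo_page %>% html_nodes("[class='nope']") %>% html_text()

print(title)
print(stats)
print(length(missing))
```

Because a missing class returns an empty vector, indexing it with [1] gives NA, which is convenient when filling a data frame field by field.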
The code includes some conditional statements covering the known ways in which these fields are arranged across Google Play.
There is also some defensive programming to prevent the for loop from stopping when an error occurs while finding the right case.
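The defensive idea can be shown in isolation. The sketch below uses tryCatch(), an alternative to checking the class of a try() result, to turn an error into NA so that the loop moves on to the next item. The function parse_score and the input values are hypothetical.

```r
# Hypothetical extractor that fails on unexpected input
parse_score <- function(x) {
  if (!grepl("^[0-9.]+$", x)) stop("unexpected format: ", x)
  as.numeric(x)
}

raw <- c("4.5", "n/a", "3.9")  # made-up values; the second one triggers an error
scores <- rep(NA_real_, length(raw))

for (i in seq_along(raw)) {
  # On error, store NA and continue with the next element
  scores[i] <- tryCatch(parse_score(raw[i]), error = function(e) NA_real_)
}

print(scores)
```

Without the tryCatch() wrapper the loop would stop at the second element and the third value would never be processed.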
Last but not least, the classes and the app profile structure in Google Play might change, so make sure to review the output of your script regularly.
# Helper added for readability: text of every node matching `selector`,
# or an empty vector if the lookup fails, so one bad page cannot stop the loop.
get_text <- function(page, selector) {
  out <- try(page %>% html_nodes(selector) %>% html_text(), silent = TRUE)
  if (inherits(out, 'try-error')) character(0) else out
}

# Assumed context, not shown in the original excerpt: `app_ids` is a character
# vector of app ids and `result` is a data frame with one row per app.
for (i in seq_along(app_ids)) {
  url <- paste0('https://play.google.com', app_ids[i], "&hl=es")
  page <- try(session(url), silent = TRUE)
  if (!inherits(page, 'try-error')) {
    # title: try the first known class, then fall back to the second one
    title <- get_text(page, "[class='Fd93Bb F5UCq p5VxAd']")
    if (length(title) == 0) {
      title <- get_text(page, "[class='Fd93Bb F5UCq xwcR9d']")
    }
    result$Title[i] <- title[1]  # NA when neither class is found

    # long description
    result$Long_desc[i] <- get_text(page, "[class='bARER']")[1]

    # Apps with ratings list the score first and the installs second;
    # apps without ratings list the installs first.
    check_case <- get_text(page, "[class='l8YSdd']")
    stats <- get_text(page, "[class='ClM7O']")
    if (any(stringr::str_detect(check_case, 'star'), na.rm = TRUE)) {
      result$Installs[i] <- stats[2]
      result$Ratings[i] <- get_text(page, "[class='g1rdde']")[1]
      result$Score[i] <- stats[1]
    } else {
      result$Installs[i] <- stats[1]
      result$Ratings[i] <- NA
      result$Score[i] <- NA
    }
  }
}
Some tips for your script
As with the app rankings, here are some tips for building your utility script:
Wrap the app profile data retrieval in a for loop that runs once per app id.
Run timing tests to understand how long it takes your machine and settings to retrieve a single app profile. Then plan the job accordingly within your machine's schedule.
Add further information to your working data frame, such as the retrieval date, and schedule the whole script as a daily (early-morning) cron job to track how the profile of any given app evolves over time.
If you are interested in retrieving profile data for a six-digit (or larger) number of apps, I suggest parallelizing the core for loop that extracts the app profiles from Google Play. Drop me a line in the comments if you'd like to know more about this topic.
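The tips above can be sketched as follows. This is only an illustration under stated assumptions: fetch_profile_stub is a hypothetical stand-in that simulates a retrieval instead of calling Google Play, the app ids are made up, and the base parallel package with two local workers is just one possible way to parallelize.

```r
library(parallel)

# Hypothetical stand-in for the real retrieval: it only simulates the work.
fetch_profile_stub <- function(app_id) {
  Sys.sleep(0.05)
  data.frame(App_id = app_id, Title = paste("Title of", app_id))
}

app_ids <- c("app.one", "app.two", "app.three", "app.four")  # made-up ids

# Timing test: seconds needed for a single profile
elapsed <- system.time(fetch_profile_stub(app_ids[1]))[["elapsed"]]

# Sequential run, stamped with the retrieval date
result <- do.call(rbind, lapply(app_ids, fetch_profile_stub))
result$Retrieval_date <- Sys.Date()

# Parallel run of the same work on two local worker processes
cl <- makeCluster(2)
result_par <- do.call(rbind, parLapply(cl, app_ids, fetch_profile_stub))
stopCluster(cl)

print(elapsed)
print(result)
```

Multiplying the single-profile time by the number of apps gives a rough estimate of the sequential run time, which helps decide whether parallelizing is worth the extra setup.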
Tracking app profiles is especially important if you want to track the keywords used by your own and your competitors' apps.
If you would like to know how to retrieve the keywords present in an app profile extracted with the script above, I encourage you to stay tuned, subscribe, and wait for the third part of this series.
Should you have any comments or requests, do not hesitate to contact me by email at datadventures@substack.com.
