We work with many different data sources to collect the largest and most diverse set of data points possible. Each day we process billions of data points and turn them into actionable insight.
1. Download and Revenue Data
We have over 225,000 mobile apps that share their download, revenue, and advertising data directly with us. We use this data, which comprises over 40 billion data points, to build thousands of models that estimate daily downloads and revenue for every app in more than 50 countries.
Furthermore, our technology and indexing engines scan the iOS App Store & Google Play every hour for every publicly available data point, creating an extremely granular time series of app market data. We collect 35 different metadata points hourly (rank, ratings, price, etc.) for each of our 50+ indexed countries.
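One hourly observation can be pictured as a flat record keyed by app, country, and timestamp. The sketch below is illustrative only, with hypothetical field names showing a few of the 35 tracked metadata points:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class AppSnapshot:
    """One hourly observation of an app's public store metadata.

    Only a few of the 35 tracked fields are shown; names are hypothetical.
    """
    app_id: str            # store identifier of the app
    country: str           # ISO country code of the store index
    captured_at: datetime  # hour of the scan
    rank: int              # position in the store's chart
    rating_avg: float      # average star rating
    rating_count: int      # total number of ratings
    price: float           # list price in local currency

snapshot = AppSnapshot(
    app_id="com.example.game",
    country="US",
    captured_at=datetime(2024, 1, 1, 12, tzinfo=timezone.utc),
    rank=17,
    rating_avg=4.6,
    rating_count=128_450,
    price=0.0,
)
```

Indexing one such record per app, per country, per hour is what yields the granular time series described above.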
It is the combination of these two data sets that lets us accurately estimate performance for the entire market.
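To make the combination concrete, here is a minimal sketch of one way panel data can calibrate a rank-to-downloads curve: apps in the panel report actual downloads, we observe their chart ranks, and a fitted curve then estimates downloads for any app whose rank we scan. The power-law form and all figures are illustrative assumptions, not our actual models:

```python
import math

# Hypothetical panel observations: (chart rank, actual daily downloads)
# reported directly by apps that share data with us. Values are illustrative.
panel = [(1, 90_000), (5, 32_000), (10, 18_500), (25, 8_200),
         (50, 4_100), (100, 2_050), (250, 900)]

# Fit log(downloads) = log_a - b * log(rank) by ordinary least squares
# in log-log space (a simple power-law curve).
xs = [math.log(rank) for rank, _ in panel]
ys = [math.log(downloads) for _, downloads in panel]
n = len(panel)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
b = -sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
    / sum((x - mean_x) ** 2 for x in xs)
log_a = mean_y + b * mean_x

def estimate_downloads(rank: int) -> int:
    """Estimate daily downloads for any app from its observed chart rank."""
    return round(math.exp(log_a - b * math.log(rank)))
```

In practice one such curve would be fit per country, category, and day, which is why thousands of models are needed rather than one.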
2. SDK Data
While our download and revenue data is based on models created by our data science team, our SDK data is factual. Our proprietary SDK Recognition technology tracks which SDKs are installed in or removed from the top 500,000 free apps across both the App Store & Google Play, covering approximately 10,000 monitored SDKs.
To do so, we:
- Download the app
- Decompile the app
- Decrypt the app
- Analyze the code and fingerprint (recognize) which known SDKs are currently installed in the app.
We re-run our "SDK Recognition" technology every time an app pushes out a new release or update, so we can create a granular timeline of what tools an app is using or has used. In addition, API call data goes beyond install and uninstall data to provide granular insight into the specific SDK features being used by each app.
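The recognition step above can be sketched as matching an app's decompiled class names against a table of known SDK signatures. Both the signature table and the matching rule below are simplified assumptions for illustration, not our real signature set:

```python
# Hypothetical signature table: SDK name -> package prefixes that identify it
# in decompiled code. Entries are illustrative.
SDK_SIGNATURES = {
    "Firebase Analytics": ["com.google.firebase.analytics"],
    "Facebook SDK": ["com.facebook.sdk", "com.facebook.internal"],
    "AdMob": ["com.google.android.gms.ads"],
}

def recognize_sdks(class_names: list[str]) -> set[str]:
    """Return the known SDKs whose signatures appear among an app's
    decompiled class names."""
    found = set()
    for sdk, prefixes in SDK_SIGNATURES.items():
        if any(cls.startswith(p) for cls in class_names for p in prefixes):
            found.add(sdk)
    return found

recognize_sdks(["com.google.firebase.analytics.FirebaseAnalytics",
                "com.example.app.MainActivity"])  # → {"Firebase Analytics"}
```

Running this match against every new release, and diffing the result against the previous release, is what produces the install/uninstall timeline.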
We currently track every major SDK. And in the event a customer needs data on a lesser-known SDK, we can provide it within 72 hours.
3. How We Measure Accuracy
When we build our models, we deliberately hold back a portion of our data: real download and revenue figures that are excluded from model training and set aside for testing. We then measure our models' estimates against this actual, unseen data to gauge accuracy. We repeat this process every month to make sure we remain consistently accurate.
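This is standard holdout testing. A minimal sketch, assuming mean absolute percentage error as the accuracy metric (the actual metrics used may differ), with illustrative figures:

```python
def mape(actuals: list[float], estimates: list[float]) -> float:
    """Mean absolute percentage error between actual and estimated values."""
    return sum(abs(a - e) / a for a, e in zip(actuals, estimates)) / len(actuals)

# Illustrative holdout set: real downloads withheld from model training,
# alongside the model's estimates for the same apps.
held_out_actuals = [12_000, 8_500, 640, 150_000]
model_estimates = [11_400, 9_100, 700, 141_000]

error = mape(held_out_actuals, model_estimates)
accuracy = 1 - error  # share of the actual values the estimates got right, on average
```

Because the held-out apps never influence the model, the measured error is an honest estimate of how the model performs on the apps it has never seen.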