Extrapolating likely pageview counts based on website rankings in the CrUX dataset.
CrUX (Chrome User Experience Report) is a well-established dataset, published by Google, giving developers a glimpse at the real-world performance of websites. It’s very interesting to correlate with technology detection to look for possible patterns in the data. It doesn’t say how much traffic do specific websites get exactly, but we can try to estimate that! We can reuse pageview data from analytics.usa.gov, comparing it with websites’ CrUX ranks, to get a rough idea of how many pageviews websites in different CrUX ranks might get.
Data sources
Source | Description | Last updated |
---|---|---|
analytics.usa.gov | Top hostnames, 30 days (10k websites) | 2024-09-09 |
Tranco | “latest list”, 30 days (1M websites) | 2024-09-09 |
CrUX | Cached Chrome Top Million Websites | 2024-08-14 |
Results
CrUX Rank | Pageviews (median) | Extrapolated |
---|---|---|
1,000 | 3,743,794,604 | |
5,000 | 969,861,891 | |
10,000 | 505,802,570 | |
50,000 | 105,333,939 | |
100,000 | 33,557,875 | |
500,000 | 7,759,115 | |
1,000,000 | 2,335,489 | |
5,000,000 | 612,733 | |
10,000,000 | 293,886 | |
50,000,000 | 53,367 |
5M, 10M, 50M values extrapolated based on a power series trend line of 7.73E+12x^-1.06
. R² = 0.992.
View the data in Google Sheets: CrUX rank to pageviews
The above results are based on CrUX ranks only. Tranco ranks results are shared below for reference.
Methodology
With DuckDB:
create table tranco as select * from './top-1m.csv';
create table analytics as select * from './top-10000-domains-30-days.csv';
create table crux as select * from './20240814-crux-current.csv';
create table mapping as (
select
a.hostname,
crux.origin as crux_origin,
column1 as tranco_domain,
a.pageviews,
a.visits,
rank as crux_rank,
column0 as tranco_rank,
from analytics a
-- CrUX has full origins including protocol. Analytics data uses domain only.
join crux on a.hostname = regexp_extract(crux.origin, '^(?:https?:\/\/)?([^\/]+)', 1)
-- Tranco has a tendency to use root domains even for sites served on www.
left join tranco on regexp_extract(a.hostname, '^(?:www\.)?(.+)', 1) = tranco.column1
order by rank asc
);
copy(select * from mapping) to './crux-tranco-analytics-mapping.csv';
CrUX full results
Pageview scores are given for 365 days.
crux_rank | min_pageviews | max_pageviews | median_pageviews | avg_pageviews | count |
---|---|---|---|---|---|
1000 | 2944146696 | 4543442510 | 3743794603 | 3743794603 | 2 |
5000 | 2615 | 4086099226 | 969861891 | 1160648800 | 17 |
10000 | 77886985 | 1670330142 | 505802570 | 546856260 | 8 |
50000 | 1606 | 720381544 | 105333938 | 140603434 | 55 |
100000 | 2932 | 390257476 | 33557874 | 50807897 | 70 |
500000 | 1399 | 1249634537 | 7759115 | 18421315 | 336 |
1000000 | 1679 | 67022845 | 2335489 | 3763187 | 283 |
select
crux_rank,
min(pageviews * 365.0 / 30.0) as min_pageviews_365,
max(pageviews * 365.0 / 30.0) as max_pageviews_365,
median(pageviews * 365.0 / 30.0) as median_pageviews_365,
avg(pageviews * 365.0 / 30.0) as avg_pageviews_365,
count(pageviews) as count_pageviews
from
mapping
group by
crux_rank
order by
crux_rank;
Caveats
- CrUX ranks origins (including protocol and full), while Tranco ranks hostnames.
- Only 771 data points are available, 80% of which are for the 500k/1M ranks.
- The date ranges differ, so the site traffic reflected in the ranks and page views are for different time periods.
- The pageviews dataset is for websites primarily intended for a USA audience, while the rankings are global.
- Yearly pageview data is extrapolated from a specific 30-day period over the summer months in North America.
- There is no data for CrUX ranks above 1M (5M, 10M, 50M).