
CrUX ranking to pageviews

Extrapolating likely pageview counts based on website rankings in the CrUX dataset.

CrUX (Chrome User Experience Report) is a well-established dataset, published by Google, that gives developers a glimpse of the real-world performance of websites. It’s interesting to correlate it with technology detection to look for possible patterns in the data. CrUX doesn’t say exactly how much traffic specific websites get, but we can try to estimate that! We can take pageview data from analytics.usa.gov and compare it with the same websites’ CrUX ranks to get a rough idea of how many pageviews websites at different CrUX ranks might get.

Data sources

Source             Description                              Last updated
analytics.usa.gov  Top hostnames, 30 days (10k websites)    2024-09-09
Tranco             “latest list”, 30 days (1M websites)     2024-09-09
CrUX               Cached Chrome Top Million Websites       2024-08-14

Results

CrUX rank     Yearly pageviews (median)     Extrapolated
     1,000                3,743,794,604
     5,000                  969,861,891
    10,000                  505,802,570
    50,000                  105,333,939
   100,000                   33,557,875
   500,000                    7,759,115
 1,000,000                    2,335,489
 5,000,000                                       612,733
10,000,000                                       293,886
50,000,000                                        53,367

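The 5M, 10M, and 50M values are extrapolated from a power trend line fitted to the seven medians above: y = 7.73E+12 · x^-1.06, where x is the CrUX rank and y is the median yearly pageviews. R² = 0.992.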
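As a rough cross-check, the trend line can be reproduced outside the spreadsheet. The sketch below assumes the fit is an ordinary least-squares line in log-log space, which is how spreadsheet power trend lines are commonly computed. The function names are illustrative and not part of the original analysis.

```python
import math

# Median yearly pageviews per CrUX rank, copied from the results table.
MEDIANS = {
    1_000: 3_743_794_604,
    5_000: 969_861_891,
    10_000: 505_802_570,
    50_000: 105_333_939,
    100_000: 33_557_875,
    500_000: 7_759_115,
    1_000_000: 2_335_489,
}


def fit_power_law(points):
    """Fit y = a * x**b by least squares on (ln x, ln y); returns (a, b, r_squared)."""
    xs = [math.log(x) for x in points.keys()]
    ys = [math.log(y) for y in points.values()]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                       # exponent (slope in log-log space)
    a = math.exp(mean_y - b * mean_x)   # coefficient (intercept in log-log space)
    r_squared = sxy ** 2 / (sxx * syy)  # R² of the log-log regression
    return a, b, r_squared


def extrapolate(rank, a, b):
    """Estimated yearly pageviews for a given CrUX rank."""
    return a * rank ** b


a, b, r_squared = fit_power_law(MEDIANS)
print(f"a = {a:.3g}, b = {b:.3f}, R^2 = {r_squared:.3f}")
for rank in (5_000_000, 10_000_000, 50_000_000):
    print(f"{rank:>11,}: {extrapolate(rank, a, b):,.0f}")
```

The fitted coefficients should land close to the published 7.73E+12 and -1.06. Small differences in the extrapolated values are expected, because the published coefficients are rounded.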

View the data in Google Sheets: CrUX rank to pageviews

Yearly pageviews by CrUX rank (log scale)

The above results are based on CrUX ranks only. Tranco rank results are shared below for reference.

Methodology

With DuckDB:

-- Load the three source files. The Tranco CSV has no header row,
-- so DuckDB names its columns column0 (rank) and column1 (domain).
create table tranco as select * from './top-1m.csv';
create table analytics as select * from './top-10000-domains-30-days.csv';
create table crux as select * from './20240814-crux-current.csv';
-- One row per analytics hostname that also appears in CrUX,
-- with the Tranco rank attached where a matching domain exists.
create table mapping as (
  select
    a.hostname,
    crux.origin as crux_origin,
    tranco.column1 as tranco_domain,
    a.pageviews,
    a.visits,
    crux.rank as crux_rank,
    tranco.column0 as tranco_rank
  from analytics a
  -- CrUX has full origins including protocol. Analytics data uses domain only.
  join crux on a.hostname = regexp_extract(crux.origin, '^(?:https?:\/\/)?([^\/]+)', 1)
  -- Tranco has a tendency to use root domains even for sites served on www.
  left join tranco on regexp_extract(a.hostname, '^(?:www\.)?(.+)', 1) = tranco.column1
  order by crux_rank asc
);
-- Export the joined data for use in the spreadsheet.
copy(select * from mapping) to './crux-tranco-analytics-mapping.csv';
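To make the two join conditions easier to follow, here is a small sketch of the same normalization steps outside SQL. It reuses the regular expressions from the query. The function names and the sample origins are illustrative assumptions, not part of the original pipeline.

```python
import re

# Same patterns as in the DuckDB query above.
ORIGIN_RE = re.compile(r"^(?:https?://)?([^/]+)")
WWW_RE = re.compile(r"^(?:www\.)?(.+)")


def origin_to_hostname(origin: str) -> str:
    """Drop the protocol from a CrUX origin so it can match an analytics hostname."""
    match = ORIGIN_RE.match(origin)
    return match.group(1) if match else ""


def strip_www(hostname: str) -> str:
    """Drop a leading 'www.' so the hostname can match a Tranco root domain."""
    match = WWW_RE.match(hostname)
    return match.group(1) if match else ""


# Illustrative inputs only.
for origin in ("https://www.nasa.gov", "https://tools.usps.com"):
    hostname = origin_to_hostname(origin)
    print(origin, "->", hostname, "->", strip_www(hostname))
```

Note that only a leading `www.` is removed. Other subdomains are kept as they are, so a hostname such as `tools.usps.com` only matches Tranco if Tranco lists that exact name.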

CrUX full results

Pageview counts are annualized to 365 days from the 30-day analytics data.

crux_rank  min_pageviews  max_pageviews  median_pageviews  avg_pageviews  count
1000       2944146696     4543442510     3743794603        3743794603     2
5000       2615           4086099226     969861891         1160648800     17
10000      77886985       1670330142     505802570         546856260      8
50000      105333938 is the median; full row follows
select
    crux_rank,
    min(pageviews * 365.0 / 30.0) as min_pageviews_365,
    max(pageviews * 365.0 / 30.0) as max_pageviews_365,
    median(pageviews * 365.0 / 30.0) as median_pageviews_365,
    avg(pageviews * 365.0 / 30.0) as avg_pageviews_365,
    count(pageviews) as count_pageviews
from
    mapping
group by
    crux_rank
order by
    crux_rank;
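The aggregation above can also be sketched in a few lines of Python, which may help when checking individual rows by hand. The sample rows below are made up for illustration and are not taken from the dataset. The function names are likewise assumptions.

```python
from collections import defaultdict
from statistics import mean, median


def annualize(pageviews_30d: float) -> float:
    """Scale a 30-day pageview count to 365 days, as the query does."""
    return pageviews_30d * 365.0 / 30.0


def summarize(rows):
    """Group (crux_rank, pageviews_30d) pairs by rank and compute yearly stats."""
    by_rank = defaultdict(list)
    for crux_rank, pageviews_30d in rows:
        by_rank[crux_rank].append(annualize(pageviews_30d))
    return {
        rank: {
            "min": min(values),
            "max": max(values),
            "median": median(values),
            "avg": mean(values),
            "count": len(values),
        }
        for rank, values in sorted(by_rank.items())
    }


# Hypothetical (crux_rank, 30-day pageviews) rows.
SAMPLE = [
    (1000, 300_000_000),
    (1000, 240_000_000),
    (5000, 90_000_000),
    (5000, 60_000_000),
    (5000, 30_000_000),
]

for rank, stats in summarize(SAMPLE).items():
    print(rank, stats)
```

The simple 365/30 scaling assumes the 30-day window is representative of the whole year, which is one of the caveats listed for this analysis.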

Caveats

  • CrUX ranks origins (including protocol and full hostname), while Tranco ranks hostnames.
  • Only 771 data points are available, 80% of which are for the 500k/1M ranks.
  • The date ranges differ, so the ranks and the pageviews reflect site traffic from different time periods.
  • The pageviews dataset is for websites primarily intended for a USA audience, while the rankings are global.
  • Yearly pageview data is extrapolated from a specific 30-day period over the summer months in North America.
  • There is no data for CrUX ranks above 1M (5M, 10M, 50M).
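The data point total and the share held by the two largest rank buckets can be checked against the count column of the full results table. This is a small arithmetic sketch; the variable names are illustrative.

```python
# Counts per CrUX rank, copied from the "count" column of the full results table.
COUNTS = {
    1_000: 2,
    5_000: 17,
    10_000: 8,
    50_000: 55,
    100_000: 70,
    500_000: 336,
    1_000_000: 283,
}

total = sum(COUNTS.values())
tail_share = (COUNTS[500_000] + COUNTS[1_000_000]) / total

print(f"total data points: {total}")
print(f"share in the 500k and 1M ranks: {tail_share:.1%}")
```

With so few sites in the top buckets (2 at rank 1,000 and 8 at rank 10,000), the medians for those ranks are much less stable than the ones for 500k and 1M.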