
Over the past few years, articles and blog posts have started to ask some version of the same question: “Why are all websites starting to look the same?”

These posts usually point out some common design elements, from large images with superimposed text, to hamburger menus, which are those three horizontal lines that, when clicked, reveal a list of page options to choose from.

My colleagues Bardia Doosti, David Crandall, Norman Su and I were studying the history of the web when we started to notice these posts cropping up. None of the authors had done any sort of empirical study, though. It was more of a hunch they had.

We decided to investigate the claim to see if there was any truth to the notion that websites are starting to look the same and, if so, to explore why this has been happening. So we ran a series of data mining studies that scrutinized nearly 200,000 images across 10,000 websites.

How do you even measure similarity?

It’s virtually impossible to study the entire internet; there are over a billion websites, with many times as many webpages. Since there’s no list of them all to choose from, drawing a random sample of the internet is off the table. Even if it were possible, most people only see a tiny fraction of those websites regularly, so a random sample might not even capture the internet that most people experience.

We ended up using the websites of the Russell 1000, the top U.S. businesses by market capitalization, which we hoped would be representative of trends in mainstream, corporate web design. We also studied two other sets of sites, one with Alexa’s 500 most trafficked sites, and another with sites nominated for Webby Awards.

Because we were interested in the visual elements of these websites, we used images of their web pages from the Internet Archive, which regularly preserves websites, as our data. And since we wanted to gather quantitative data comparing millions of website pairs, we needed to automate the analysis.

To do that, we had to settle on a definition of “similarity” that we could measure automatically. We investigated both specific attributes, like color and layout, and attributes learned automatically from data using artificial intelligence.

For the color and layout attributes, we measured how many pixel-by-pixel edits we would have to make to transform the color scheme or page structure of one website into another. For the AI-generated attributes, we trained a machine learning model to classify images based on which website they came from, then measured the attributes the model learned. Our previous work indicates that this does a reasonably good job of measuring stylistic similarity, but it’s very difficult for humans to understand which attributes the model focused on.
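
To make the idea of counting pixel-by-pixel edits concrete, here is a minimal sketch in Python. It is not the code used in the study: it assumes each screenshot has already been reduced to a small grid of integer labels (for example, quantized colors or layout regions), and the names `pixel_edit_distance` and `normalized_difference` are hypothetical.

```python
from typing import List

# Assumption: a screenshot is reduced to a grid of integer labels,
# e.g. quantized color indices or layout-region ids.
Grid = List[List[int]]


def pixel_edit_distance(a: Grid, b: Grid) -> int:
    """Count the cells that would have to change to turn grid a into grid b."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("grids must have the same shape")
    return sum(
        1
        for row_a, row_b in zip(a, b)
        for cell_a, cell_b in zip(row_a, row_b)
        if cell_a != cell_b
    )


def normalized_difference(a: Grid, b: Grid) -> float:
    """Express the edit count as a fraction of all cells (0.0 = identical)."""
    total = sum(len(row) for row in a)
    return pixel_edit_distance(a, b) / total if total else 0.0


# Toy grids, invented for illustration only.
site_a = [[0, 0, 1], [2, 2, 1]]
site_b = [[0, 1, 1], [2, 2, 3]]
edits = pixel_edit_distance(site_a, site_b)
```

In this toy case two of the six cells differ, so turning one grid into the other takes two edits. A real pipeline would also have to decide how to resize screenshots and how coarsely to quantize colors, choices this sketch leaves out.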

How has the internet changed?

We found that across all three metrics – color, layout and AI-generated attributes – the average differences between websites peaked between 2008 and 2010 and then decreased between 2010 and 2016. Layout differences decreased the most, declining over 30% in that time frame.
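
An "average difference between websites" for a given year can be read as the mean distance over every pair of sites captured that year. The sketch below shows that calculation under stated assumptions: the feature vectors are invented toy data, not results from the study, and `mean_pairwise_difference`, `percent_change` and `hamming` are hypothetical helper names.

```python
from itertools import combinations
from statistics import mean
from typing import Callable, Sequence, Tuple

Features = Tuple[int, ...]


def hamming(a: Features, b: Features) -> int:
    """Stand-in distance: number of positions where two vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def mean_pairwise_difference(
    items: Sequence[Features],
    distance: Callable[[Features, Features], float],
) -> float:
    """Average the distance over every unordered pair of items."""
    pairs = list(combinations(items, 2))
    if not pairs:
        raise ValueError("need at least two items")
    return mean(distance(a, b) for a, b in pairs)


def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent (negative = decrease)."""
    return (new - old) / old * 100.0


# Invented toy data: three "sites" per year, four features each.
toy_by_year = {
    2010: [(0, 0, 0, 0), (1, 1, 1, 1), (0, 1, 0, 1)],
    2016: [(0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 1)],
}
averages = {
    year: mean_pairwise_difference(sites, hamming)
    for year, sites in toy_by_year.items()
}
change = percent_change(averages[2010], averages[2016])
```

Because the number of pairs grows quadratically with the number of sites, a calculation like this over thousands of sites quickly reaches millions of comparisons, which is why the analysis had to be automated.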