November 2, 2022

Data and the Art of NFT Discovery

Kyle Waters asks how we can develop recommendation systems for NFTs without reproducing old power structures
Credit: Mario Klingemann, Hic et Nunc - State of the Art - March 18th 2021 (detail), 2021. Courtesy of the artist

Today, there are millions of NFTs, tens of thousands of ERC-721 smart contracts, and dozens of marketplaces across multiple blockchains. As a result, the NFT ecosystem is full of noise and fragmentation. Whether you’re a newbie collector or a crypto OG, it’s hard to find new and compelling work. But the problem of content recommendation is not unique to Web3.

Thanks to their stockpiles of data and years of testing and training, Web2 companies like Netflix and Spotify have mastered the art of discovery. But are their approaches transferable to NFTs, which pose new and unique challenges for discovery?

Here, I dive into some alternative approaches to NFT discovery.


Follow the data

Perhaps the simplest approach to NFT discovery is ranking collections by sales volume. For better or worse, rankings and leaderboards are convenient data points for establishing the most popular collections. They also serve as a natural point of gravity for collectors, who often sort by sales volume. A benefit of this approach is its verifiability: blockchain data is readily available, and anyone can independently verify rankings by running an Ethereum node and collecting the data themselves.
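As a rough illustration, a volume leaderboard amounts to summing sale prices per collection and sorting the totals. The sketch below assumes the sale records have already been decoded from on-chain marketplace events; the collection names, prices, and the `volume_leaderboard` helper are hypothetical and not taken from any real marketplace.

```python
from collections import defaultdict

# Hypothetical sale records: (collection, price_in_eth). In practice these
# would be decoded from marketplace events pulled from a node.
sales = [
    ("CollectionA", 1.5),
    ("CollectionB", 0.4),
    ("CollectionA", 2.0),
    ("CollectionC", 3.0),
    ("CollectionB", 0.6),
]


def volume_leaderboard(sales):
    """Sum sale prices per collection and sort from highest to lowest volume."""
    totals = defaultdict(float)
    for collection, price in sales:
        totals[collection] += price
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


leaderboard = volume_leaderboard(sales)
for rank, (collection, volume) in enumerate(leaderboard, start=1):
    print(rank, collection, volume)
```

Because every input is public, anyone who collects the same records should arrive at the same ranking.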

Leaderboards are a natural sorting mechanism, but they are not fine-tuned to individual preferences.

Blockchain data can do more than rank collections: it enables us to peer into complex networks of artists and collectors. Network scientists immediately recognized the power of NFT data to unveil emergent ownership patterns.

However, wash trading, in which collectors sell works to themselves to give the appearance of volume and interest, remains a thorny issue for NFT marketplaces. It often takes independent crypto researchers like takenstheorem to visualize the links between accounts trading back and forth with each other.
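One simple heuristic for spotting back-and-forth trading is to flag pairs of wallets that pass the same token in both directions. The sketch below is only an assumption about how such a check might look, not a description of any researcher's or marketplace's method; the wallet addresses, token IDs, and the `suspicious_pairs` helper are hypothetical. Real wash trading often involves more than two wallets, so a check like this would catch only the most obvious cases.

```python
from collections import Counter

# Hypothetical transfers: (token_id, seller, buyer).
transfers = [
    ("token1", "0xA", "0xB"),
    ("token1", "0xB", "0xA"),
    ("token1", "0xA", "0xB"),
    ("token2", "0xC", "0xD"),
]


def suspicious_pairs(transfers, min_round_trips=1):
    """Flag wallet pairs that pass the same token in both directions."""
    directed = Counter(transfers)
    flagged = set()
    for (token, seller, buyer), count in directed.items():
        # How many times did the token travel the opposite way?
        reverse = directed.get((token, buyer, seller), 0)
        if min(count, reverse) >= min_round_trips:
            flagged.add((token, frozenset((seller, buyer))))
    return flagged


flagged = suspicious_pairs(transfers)
```

Here only the pair trading `token1` is flagged; the one-way sale of `token2` is left alone.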

Other collectors also liked

Another approach to NFT discovery is analyzing collectors who are similar to you. This approach assumes that if two collectors own works by an overlapping set of artists, they probably have similar tastes and might benefit from learning about artists whom one collects but not the other. This follows the logic of Facebook’s “mutual friends.” 
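The idea of overlapping taste can be made concrete with a set-overlap score. The sketch below uses Jaccard similarity (shared artists divided by all artists held by either collector), which is one common choice and an assumption here rather than the method behind any tool mentioned in this article. The collector and artist names are hypothetical.

```python
def jaccard(a, b):
    """Share of artists two collectors have in common (0 = none, 1 = identical)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical holdings: collector -> artists they own work by.
holdings = {
    "alice": {"artist1", "artist2", "artist3"},
    "bob": {"artist2", "artist3", "artist4"},
    "carol": {"artist5"},
}


def most_similar(target, holdings):
    """Rank the other collectors by overlap with the target's artists."""
    scores = [
        (other, jaccard(holdings[target], artists))
        for other, artists in holdings.items()
        if other != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


ranking = most_similar("alice", holdings)
```

In this toy data, "bob" shares two of four artists with "alice" and ranks first, so "artist4" would be a natural suggestion for her.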

With its transparent, distributed, and live record of who owns what, the blockchain is especially rich in data fit for network graph analysis. Any time an NFT is transferred, it is logged on the public ledger of the relevant chain, establishing a new connection between two crypto wallets. This data can then be used to develop a collector’s social graph. Below, I’ve used my network visualization tool to analyze Jason Bailey’s collector network on SuperRare. This tool is also capable of visualizing an artist’s community of collectors, so I’ve included generative artist Manoloide for good measure.

SuperRare users connected to Jason Bailey (artnomevault) and Manoloide. Data as of 6 August, 2022. Courtesy of Kyle Waters

The first graph shows the artists Jason collects, while the second shows Jason’s “mutual collectors” on SuperRare, who share the distinction of owning a work created by Manoloide. If we zoom in on one of Manoloide’s collectors, we can search for artists whom Jason has yet to collect. Let’s consider 6529Museum, the collection of pseudonymous collector punk6529. The following graph shows a full set of new artists for Jason to consider. Based on this network analysis, he might wish to check out Seerlight.

SuperRare users connected to 6529Museum. Data as of 6 August, 2022. Courtesy of Kyle Waters

While this is a grossly oversimplified model, it shows the power of a network-based approach. A more sophisticated model might rank results based on exactly how many of Jason’s mutual collectors own work by a particular artist. Extending this approach across blockchains — Ethereum and Tezos for example — would require linking artist wallets together to keep track of cross-chain provenance.
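The ranking idea described above can be sketched in a few lines: count, for each artist the target collector does not yet own, how many mutual collectors hold that artist's work. The holdings and the `recommend_artists` helper are hypothetical, and this is a simplified illustration rather than the algorithm used by any existing discovery tool.

```python
from collections import Counter


def recommend_artists(target, holdings):
    """Rank artists the target lacks by how many mutual collectors own them."""
    owned = holdings[target]
    counts = Counter()
    for collector, artists in holdings.items():
        # Skip the target and anyone who shares no artist with the target.
        if collector == target or not (artists & owned):
            continue
        counts.update(artists - owned)
    return counts.most_common()


# Hypothetical holdings: collector -> artists they own work by.
holdings = {
    "target": {"artist_m"},
    "collector_a": {"artist_m", "artist_x", "artist_y"},
    "collector_b": {"artist_m", "artist_x"},
    "collector_c": {"artist_z"},
}

recommendations = recommend_artists("target", holdings)
```

"artist_x" is owned by two mutual collectors and ranks above "artist_y", while "artist_z" never appears because "collector_c" shares nothing with the target.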

ClubNFT’s discovery tool takes a first pass at surfacing recommendations derived purely from blockchain network data. However, there is room to extend the algorithm beyond its current scope by incorporating additional network layers and token metadata, and even by going beyond the blockchain itself.

Mario Klingemann, Hic et Nunc - State of the Art - March 18th 2021, 2021. Courtesy of the artist

Show me more artworks like this

An alternative method is to approach the problem at the level of the NFT. For the image above, artist Mario Klingemann clustered over 25,000 NFTs from Tezos’s Hic et Nunc marketplace in April 2021 based on color similarity. More advanced computer vision techniques are also capable of finding good matches based on subject matter.
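To give a sense of what color similarity can mean in practice, the sketch below reduces each image to its average RGB color and ranks other images by distance to a query image. This is a deliberately crude stand-in, not a reconstruction of Klingemann's approach, which is not described here. The tiny pixel lists and the helper names are hypothetical; real images would first be loaded with an imaging library.

```python
import math


def mean_color(pixels):
    """Average RGB value of an image given as a list of (r, g, b) tuples."""
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) / n for i in range(3))


def nearest_by_color(query, images):
    """Sort the other images by Euclidean distance between mean colors."""
    q = mean_color(images[query])
    distances = [
        (name, math.dist(q, mean_color(pixels)))
        for name, pixels in images.items()
        if name != query
    ]
    return sorted(distances, key=lambda pair: pair[1])


# Hypothetical two-pixel "images" keyed by a made-up work name.
images = {
    "red_work": [(250, 10, 10), (240, 20, 20)],
    "orange_work": [(250, 120, 10), (240, 130, 20)],
    "blue_work": [(10, 10, 250), (20, 20, 240)],
}

matches = nearest_by_color("red_work", images)
```

The orange work lands closer to the red one than the blue work does, which matches intuition even with such a simple feature.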

NFT metadata — the arbitrary chunk of information that a token points to — can also potentially aid recommendations. For crypto art, metadata tends to consist of a JSON file hosted on IPFS (InterPlanetary File System). Metadata often contains tags, descriptions, and other attributes related to the content of a work. Applying data analysis to this metadata might help to uncover new works for collectors. However, without clear standards in place, harmonizing such information is incredibly difficult.
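The harmonization problem shows up even in a toy case. In the sketch below, one token lists keywords under a `tags` field while another stores them as `attributes` entries with `trait_type` and `value` keys. Both layouts appear in practice, but the sample documents and the `extract_tags` helper are hypothetical, and a real pipeline would have to handle many more variations.

```python
import json


def extract_tags(raw_metadata):
    """Collect lower-cased tags from two commonly seen metadata layouts."""
    meta = json.loads(raw_metadata)
    tags = set()
    # Layout 1: a flat list of strings under "tags".
    for tag in meta.get("tags", []):
        tags.add(str(tag).strip().lower())
    # Layout 2: a list of {"trait_type": ..., "value": ...} objects.
    for attribute in meta.get("attributes", []):
        if isinstance(attribute, dict) and "value" in attribute:
            tags.add(str(attribute["value"]).strip().lower())
    return tags


# Hypothetical metadata documents for two different tokens.
doc_a = '{"name": "Work A", "tags": ["Generative", " Abstract"]}'
doc_b = '{"name": "Work B", "attributes": [{"trait_type": "Style", "value": "generative"}]}'

shared = extract_tags(doc_a) & extract_tags(doc_b)
```

Only after normalizing case, whitespace, and field layout do the two works turn out to share the tag "generative".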

Metadata can provide detailed descriptions, tags, and other information, but at times it fails to offer the same information richness, which can potentially undermine the power of recommendation engines in Web3.

Another issue is that of copyminting, whereby content is copied and minted by someone seeking to pass themselves off as the authentic creator. This problem has recently escalated to the point where OpenSea is now using image recognition techniques to prevent it. Any visual recommendation engine likely needs to verify artist provenance to avoid recommending a copyminted NFT. Although ClubNFT’s discovery tool does not explicitly seek to remove potentially copyminted works, it does require works to have been collected by actual collectors, which provides some level of defense against malicious results. To learn the telltale signs of copyminting, watch this short video on scams from ClubNFT’s SAFE Course.

Seerlight, High-Rise, 2022. Courtesy of the artist


The common denominator in successful recommendation engines is large volumes of data. Based on the data available in the NFT ecosystem, there’s clearly an opportunity to build discovery systems that help collectors navigate the market, as well as tools that allow artists to surface (or resurface) works from the ether. Regardless of the approach chosen, we must heed the lessons of Web2 in preventing algorithmic bias from reproducing hegemonic power structures. Web3 tools must also avoid the kind of repeat recommendations that are liable to privilege certain artists over others. From the perspective of a data analyst, uncovering artists without a track record is a big challenge, but it can be overcome if we replace the culture of collecting stars with one of supporting emerging artists.


Kyle Waters is a Research Analyst at Coin Metrics, a firm specializing in blockchain analytics and intelligence. He co-authors the Coin Metrics weekly newsletter “State of the Network,” alongside other long-form research content spanning the cryptosphere. He is a Data Science Contributor to ClubNFT and has been contributing to the art and technology blog Artnome since 2018.