Laying the Foundations for Data Science Through MLOps

Company

KnownOrigin is a pioneering digital art marketplace built on Ethereum. Founded in 2018, it provides a platform for artists to authenticate, showcase, and sell their rare digital artwork as non-fungible tokens (NFTs). KnownOrigin's mission is to empower digital creators by offering a transparent and fair environment for them to monetise their work and engage with their fans.

https://knownorigin.io/

Headquarters

Manchester, UK

Industry

Digital Art

Background

As British mathematician Clive Humby famously said: “Data is the new oil. Like oil, data is valuable, but if unrefined it cannot really be used”. The goal of MLOps is to make that refinement possible at industrial scale.

When we were approached by KnownOrigin (now a subsidiary of eBay) back in 2022, they were already a successful NFT marketplace with a vibrant community of creators, buyers and sellers. They had all the data you could ask for, but it was very much unrefined. They needed our help to get that data into a format that would let them start doing smart, machine-learning-driven things with it.

We had previously worked with them to build systems for indexing NFT transactions across multiple blockchain platforms, so we already understood the systems they had in place and had identified the opportunity to build more powerful analytics tools.

The Plan

One of the challenges with having so much data is simply knowing where to start. What analytics will bring the biggest impact? And how do you build a system that integrates with an existing tech stack in a way that is seamless and won’t become bogged down in technical debt?

For KnownOrigin, one of their key value propositions was empowering digital art sellers, and a key part of that is giving sellers all the information they need to make informed decisions about how to market and sell their artwork. After some discussion with KnownOrigin, we came away with the initial idea of using the data to give sellers an answer to the question “who is buying your NFTs?”. In particular, we looked to categorise their customers into those looking to flip assets quickly and those who treated them as a long-term investment.

We knew that if we could develop such a categorisation, it would demonstrate the utility of MLOps, but we also knew that it was really only the beginning. So, we formed a plan: we would build infrastructure to allow KnownOrigin to harness their data and the power of machine learning, customer categorisation would serve as an initial case study, and our findings would inform more complex analytics further on in the project.

Matt Squire & James Morgan planning it out.

Building the Infrastructure

When choosing the tools for the job, we selected the ones already in use at KnownOrigin wherever possible. All the infrastructure we built was hosted on Google Cloud Platform, using the client's existing accounts and running on their Kubernetes cluster. This allowed for seamless integration into their existing offering.

When it came to data science and machine learning tools, there were fewer restrictions in terms of fitting into existing infrastructure, so we were able to weigh other factors like scalability, user experience, and pre-existing expertise. For the bulk of our experimentation, we built and ran machine learning pipelines using ZenML, using integrations with MLflow for experiment tracking and as an artifact store. When it came to deploying the models, we used Seldon to get the finished models production-ready.

Harnessing the Data

To demonstrate the utility of what we built, we ultimately tackled three data science challenges over the course of the project:

Customer categorisation: We derived a data-driven metric that categorised customers as long-term investors or short-term flippers.
Time-to-sale prediction: Using information from historical sales, we developed a model to predict time on market for NFTs based on things like time of purchase, purchase cost and rarity.
Recommendation engine: We investigated techniques for building a recommendation engine, based both on visual similarities between pieces and on similarities between different customers’ profiles.

These challenges were chosen not just because their solutions would provide significant value to KnownOrigin, but also because they ramped up in complexity. The first could be done using relatively simple data about a customer’s transaction history, while the last could require using computer vision models to understand the contents of the artwork itself.

For the first task, before jumping into building a bunch of cool machine learning models, we started by just looking at the data and seeing what useful metrics we could build in a purely data-driven way. We developed a metric based on comparing the average sale time per wallet to the population average, initially honing our proof of concept in a series of notebooks we shared with the client, before solidifying the final product into a ZenML pipeline, pulling data from the Postgres-backed indexer we’d previously built.
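A minimal sketch of a metric of this kind is shown below. It is an illustration only, not the pipeline code from the project: the wallet addresses, holding periods, the choice to average over all sales rather than over wallets, and the threshold of 1.0 are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical records: (wallet, days between buying an NFT and reselling it).
holding_periods = [
    ("0xaaa", 2), ("0xaaa", 4), ("0xaaa", 3),
    ("0xbbb", 200), ("0xbbb", 150),
    ("0xccc", 30), ("0xccc", 50),
]


def categorise(records, threshold=1.0):
    """Label each wallet by its average sale time relative to the population.

    A ratio below `threshold` means the wallet resells faster than average.
    The threshold value is an assumption, not a figure from the project.
    """
    population_avg = mean(days for _, days in records)

    # Group the holding periods by wallet.
    by_wallet = {}
    for wallet, days in records:
        by_wallet.setdefault(wallet, []).append(days)

    labels = {}
    for wallet, days in by_wallet.items():
        ratio = mean(days) / population_avg
        label = "flipper" if ratio < threshold else "long-term investor"
        labels[wallet] = (label, ratio)
    return labels


labels = categorise(holding_periods)
for wallet, (label, ratio) in labels.items():
    print(f"{wallet}: {label} (ratio {ratio:.2f})")
```

In practice the threshold would need tuning against real transaction data, since a handful of very long holding periods can pull the population average upwards.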

To find a second task, we looked at what information we could surface in a user’s dashboard that would help them in conducting transactions. As a buyer, it’s useful to know how long an NFT is likely to spend on the secondary market when you come to sell it. Is it something you can flip quickly if you need to liquidate your position, or is it something with a small pool of buyers who you may have to wait around for? For this reason, we started looking at whether predicted time-to-sell was another metric we could add.

Compared to the previous metric, this proved trickier: there are a lot of factors that can feed into sale time, and they can interact in unexpected ways. That made it a natural candidate for prediction via machine learning. In particular, we created a survival analysis model based around Cox’s proportional hazards model, again implemented in ZenML.
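To make the idea concrete, here is a small, self-contained sketch of fitting a Cox proportional hazards model with a single covariate by maximising the partial likelihood. It is not the model built for the project: the listings, the binary `is_rare` feature, and the plain gradient-ascent fitting loop are assumptions chosen to keep the example short, and it assumes no two sales share the same time. A production model would normally use a dedicated survival analysis library and several covariates.

```python
import math

# Hypothetical listings: (days_on_market, sold, is_rare).
# sold == 0 means the piece is still listed, i.e. the observation is censored.
listings = [
    (2, 1, 0), (3, 1, 0), (5, 1, 1), (6, 1, 0), (8, 1, 0),
    (9, 0, 1), (12, 1, 1), (15, 1, 0), (20, 1, 1), (25, 0, 1),
]


def score(beta, rows):
    """Gradient of the Cox partial log-likelihood with respect to beta."""
    grad = 0.0
    for t_i, sold_i, x_i in rows:
        if not sold_i:
            continue  # censored rows only ever appear inside risk sets
        # Risk set: everything still on the market at time t_i.
        risk = [x_j for t_j, _, x_j in rows if t_j >= t_i]
        weights = [math.exp(beta * x_j) for x_j in risk]
        expected_x = sum(w * x for w, x in zip(weights, risk)) / sum(weights)
        grad += x_i - expected_x
    return grad


def fit_cox(rows, learning_rate=0.5, steps=200):
    """Maximise the (concave) partial log-likelihood by gradient ascent."""
    beta = 0.0
    for _ in range(steps):
        beta += learning_rate * score(beta, rows)
    return beta


beta = fit_cox(listings)
hazard_ratio = math.exp(beta)
print(f"beta = {beta:.3f}, hazard ratio = {hazard_ratio:.3f}")
```

A hazard ratio below 1 for `is_rare` would mean that, in this toy data, rare pieces sell more slowly than common ones at any given moment. The censored rows matter: dropping unsold listings would bias the estimate towards shorter sale times.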

Finally, we investigated whether it would be feasible to construct a recommendation system, allowing sellers to see wallets which had bought similar products and may therefore be potential customers. We investigated several approaches, from collaborative filtering (determining a customer’s preferences by comparing them to similar customers) to feature extraction on the images themselves (determining similarities between the images a user currently holds and others on the market). In our final demo to the client, we showed some really compelling results using FAISS (Meta’s library for similarity search over dense vectors, applied here to image features) to generate good recommendations for buyers.
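The image-similarity approach can be sketched as a nearest-neighbour search over embedding vectors. The example below uses a brute-force cosine similarity in plain Python as a stand-in for a vector index such as FAISS. The NFT names and three-dimensional embeddings are hypothetical; real image features would come from a computer vision model and have far more dimensions.

```python
import math

# Hypothetical image embeddings, one vector per artwork.
embeddings = {
    "nft_a": [0.9, 0.1, 0.0],
    "nft_b": [0.8, 0.2, 0.1],
    "nft_c": [0.0, 0.9, 0.4],
    "nft_d": [0.1, 0.8, 0.5],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def recommend(held, catalogue, k=2):
    """Rank pieces the wallet does not hold by similarity to what it holds."""
    scored = []
    for name, vector in catalogue.items():
        if name in held:
            continue
        # Score each candidate by its closest match among the held pieces.
        best = max(cosine(vector, catalogue[h]) for h in held)
        scored.append((best, name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


top = recommend({"nft_a"}, embeddings, k=2)
print(top)
```

Brute-force search like this compares every candidate with every held piece, which is fine for a demo but does not scale to a full marketplace catalogue. That is the gap a dedicated similarity search library is designed to fill.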

While we were aiming for more than just a proof of concept, we also saw what we were doing as a crucial part of laying the groundwork for future data science projects at KnownOrigin. If you already have tangible results and solid data pipelines in place, bringing on more data scientists and doing further development is a much more attractive proposition.

Knowledge-Transfer and Productionising

Working closely with the customer and upskilling them is a key part of how we work at Fuzzy Labs. KnownOrigin have a very capable technical team, but they don't have the specific knowledge and experience that we have in MLOps.

As such, we worked hard to ensure that over the course of the project, we gave KnownOrigin the skills and tools to enable them to continue developing new features after we completed our work together.

We used two main strategies to accomplish this. Firstly, we held regular demos, incrementally showing our progress and highlighting any challenges or major victories along the way. Secondly, at the end of the project, we presented a technical deep dive session, in which we gave a complete overview of the systems we had built, allowed KnownOrigin to ask all the questions they needed and gave a vision for what role we believed the system could play in their future. From this, they were able to build a complete understanding of the ZenML pipelines that we’d built, meaning that they were able not only to maintain the current work, but to expand it as they saw fit.

Conclusion

In large organisations, neither MLOps nor data science truly exists in isolation from the other. MLOps is the core that makes data science work, whilst data science brings purpose to all the efforts that go into solid MLOps engineering. In this project, we were able to bring both to the table, growing the infrastructure needed to make it possible for KnownOrigin to properly exploit the signal in their data, whilst also demonstrating its utility with increasingly complex business applications built on that data.

What KnownOrigin Had to Say

“We’re a tech-led business that took a risk on blockchain technologies before it was cool! Today, we’re one of the major digital art marketplaces, with a specialist tech team that handles vast amounts of transaction data on our platform. We know our onions technically and wanted to work with a partner we could trust—one that would live up to our standards as we explored how to use machine learning to provide additional insights for our customers.

We knew about Fuzzy Labs through the Manchester tech ecosystem and were confident they had the people and skills to guide us on this journey. They didn’t disappoint. We now have a deeper understanding of our data than we did before, but more importantly, we’ve laid the foundation to continue our machine learning journey and deliver genuine AI-driven capabilities to our platform users.

The Fuzzy Labs team didn’t just build something to sit alongside our existing stack—they took the time to understand our current GCP and Kubernetes environment and built a solution that integrated seamlessly, allowing us to easily maintain it moving forward.”

James Morgan - Co-Founder & CTO