Background
As Steve Ballmer, Microsoft’s famously exuberant former CEO, taught us with his “developers, developers, developers” chant, developers are the lifeblood of technology companies. Our client was looking for innovative ways to engage with and grow its developer community.
Our client is a global hardware company. They depend on talented engineers, makers, and developers around the world to build end-user-facing products on top of their hardware. When developers buy into a platform, they use it to build great products, and that is ultimately what drives growth and revenue for this client.
A significant barrier to developer adoption is the quality of documentation and educational material. It’s not just about how well-written it is, but how easy it is to find the right information at the right time. We set out to give developers a tool that would help them quickly get up to speed on the specifics of the hardware and its supporting software ecosystem, and we looked to open source large language models (LLMs) to do this.
Below we detail the journey we took together: from building a working prototype that proved the art of the possible and won business buy-in, to validating our approach with developers, to rolling it out to production in a scalable and robust way.
Working prototype to get buy-in
Data is what makes or breaks a project like this. Everything else comes second to getting the data right, so to begin with, we ran a discovery workshop to help us do just that.
The focus would be on educational content and documentation. As well as text from PDFs, we had videos and presentations to work with. Throughout, we worked closely with domain experts to understand this content and to map out the improved experience we wanted to give the developers accessing it.
The client was keen to know that they would remain in full control of both the data and the technology we used. They had considered a software-as-a-service offering such as OpenAI’s, but this would mean sharing large amounts of data with a third party, and not having full ownership of the underlying technology. Our open source approach was attractive for this reason: because we build everything on a foundation of open source, there’s no vendor lock-in, and the customer has complete autonomy over how they use the end product, which is theirs to own.
For a working prototype, we wanted to demonstrate how an off-the-shelf open source large language model, augmented with their content, could guide users to the right material. We built data pipelines that transformed each content type into a common format, so that the model could work across all relevant content. All of this data was pushed to a vector database, and we used retrieval augmented generation (RAG) to build a conversational agent capable of directing users to relevant content.
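To make this concrete, here’s a minimal sketch of that ingest-and-retrieve loop. The embedding model, the example chunks, and the prompt wording are illustrative assumptions, not the client’s actual stack:

```python
# Minimal RAG sketch: embed normalised content chunks, retrieve the most
# relevant ones for a question, and stitch them into the LLM prompt.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model

# 1. Ingest: every content type (PDF text, video transcripts, slides)
#    is normalised into plain-text chunks before embedding.
chunks = [
    "How to flash firmware onto the dev board...",
    "Pin-out reference for the GPIO header...",
    "Getting started with the SDK's sensor API...",
]
index = embedder.encode(chunks, normalize_embeddings=True)

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question (cosine similarity)."""
    q = embedder.encode([question], normalize_embeddings=True)
    scores = (index @ q.T).squeeze()
    return [chunks[i] for i in np.argsort(-scores)[:k]]

# 2. Generate: retrieved chunks ground the model's answer in the
#    client's own documentation.
context = "\n".join(retrieve("How do I flash the firmware?"))
prompt = f"Answer using only this documentation:\n{context}\n\nQuestion: ..."
```

In the real system, a dedicated vector database plays the role of the in-memory index here, but the retrieve-then-prompt shape is the same.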
Our working prototype served to demonstrate what’s possible. It enabled us to show the wider business that this tooling had the potential to enhance developer experience, making the case for further investment.
Developer validation
Without external feedback, it’s impossible to measure how a tool like this would impact developers. So we needed to get in front of them, to test and validate our thinking. But before we could do that, we needed a version 2 that incorporated internal feedback on the prototype and put us in the best position to gather external feedback from users.
We focused on a number of key areas:
- Guardrails: as users would be interacting directly with a conversational agent, we needed to prevent off-topic questions from being asked, and to avoid harmful or commercially sensitive information surfacing in responses (a sketch of one such filter follows this list).
- Capturing feedback: to help us improve the model’s responses, we set up mechanisms to capture user feedback, and automation to support feedback-driven model improvements (see the second sketch below).
- Load testing and initial scaling: although we didn’t need to handle production-scale traffic, we still needed to ensure that the system would scale to the number of users involved in testing. We performed load testing and undertook some initial scaling work, paving the way for more scale later on (see the third sketch below).
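For the guardrails, one common pattern is an embedding-based input filter: compare each question against embeddings of allowed topics and reject anything too dissimilar. A minimal sketch, where the topic list and threshold are hypothetical placeholders:

```python
# Hedged guardrail sketch: reject questions that aren't close enough to
# the platform's known topics. Values below are illustrative only.
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
ALLOWED_TOPICS = embedder.encode(
    ["hardware setup", "SDK usage", "firmware", "driver installation"],
    normalize_embeddings=True,
)

def is_on_topic(question: str, threshold: float = 0.35) -> bool:
    """True if the question resembles at least one allowed topic."""
    q = embedder.encode([question], normalize_embeddings=True)
    return float((ALLOWED_TOPICS @ q.T).max()) >= threshold
```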
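For feedback capture, the mechanism can be as simple as persisting each rating alongside the question and answer, so downstream jobs can mine it for model improvements. A sketch, with a JSONL file standing in for whatever store the real system uses:

```python
# Feedback-capture sketch: append each rating to a store that
# feedback-driven improvement jobs can read later.
import json
import time

def record_feedback(question: str, answer: str, rating: int) -> None:
    """rating: +1 (helpful) or -1 (unhelpful), as captured in the UI."""
    with open("feedback.jsonl", "a") as f:
        f.write(json.dumps({
            "ts": time.time(),
            "question": question,
            "answer": answer,
            "rating": rating,
        }) + "\n")
```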
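And for load testing, a tool like Locust makes it easy to simulate a cohort of testers chatting with the agent; the endpoint and message below are assumptions, not the real API:

```python
# Illustrative load-test scenario: simulated developers asking questions
# with realistic think time between requests.
from locust import HttpUser, task, between

class DeveloperUser(HttpUser):
    wait_time = between(2, 10)  # seconds of think time between questions

    @task
    def ask_question(self):
        self.client.post("/chat", json={"message": "How do I flash firmware?"})
```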
Alongside validating and iterating on our prototype, we also worked with the client’s technical team, upskilling them so that they were ready to take on and run the solution in the future. For this client, building that in-house base of knowledge around large language models was extremely important.
Getting to production
After a successful validation stage, the client was keen to get what we had built ready for production.
The groundwork was already laid: good MLOps practice needs to be in place from day one. That meant we had infrastructure as code, reproducible builds, and automated data pipelines ready to go.
For production, the focus was on scale, and on putting in place the tooling necessary to maintain the solution over the long term:
- Scaling: taking a large language model from prototype to supporting dozens or hundreds of requests per hour is especially challenging. We needed not only to make it scale, but to do so cost-effectively by optimising GPU utilisation (sketched after this list).
- Logging and monitoring: a key component of running a production-grade application, logging and monitoring allow us and our client to understand traffic and resource usage, and to identify faults in the system (see the second sketch below).
- Further guardrails: based on feedback from the client, we had a number of improvements to make to the guardrails. The balance between what’s allowed and disallowed in a guardrails system is tricky to get right, and we learnt a lot from the validation stage that helped us tune it here.
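On the scaling point, the core trick behind cost-effective GPU serving is batching: hold incoming requests for a few milliseconds and push them through the model together, so one forward pass serves many users. A sketch of that idea, with placeholder timings and a stand-in generate() function:

```python
# Dynamic batching sketch: queue requests briefly, then run them through
# the model in one batch to keep the GPU busy. Values are illustrative.
import asyncio

MAX_BATCH, MAX_WAIT_S = 8, 0.05
QUEUE: asyncio.Queue = asyncio.Queue()

async def ask(prompt: str) -> str:
    """Called per user request: enqueue and await the batched answer."""
    fut = asyncio.get_running_loop().create_future()
    await QUEUE.put((prompt, fut))
    return await fut

async def batch_worker(generate) -> None:
    """Drain the queue in small batches; generate() stands in for the
    real inference call, taking and returning lists of strings."""
    while True:
        batch = [await QUEUE.get()]
        deadline = asyncio.get_running_loop().time() + MAX_WAIT_S
        while len(batch) < MAX_BATCH:
            remaining = deadline - asyncio.get_running_loop().time()
            if remaining <= 0:
                break
            try:
                batch.append(await asyncio.wait_for(QUEUE.get(), remaining))
            except asyncio.TimeoutError:
                break
        answers = generate([p for p, _ in batch])  # one batched forward pass
        for (_, fut), answer in zip(batch, answers):
            fut.set_result(answer)
```

Modern open source inference servers build this in; the point is that amortising each forward pass across many requests is what makes GPU serving cost-effective.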
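And for logging and monitoring, Prometheus-style counters and histograms are a typical choice; the metric names here are made up for illustration:

```python
# Monitoring sketch: count requests and errors, time responses, and
# expose the metrics for a scraper to collect.
import time
from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("chat_requests_total", "Chat requests served")
ERRORS = Counter("chat_errors_total", "Chat requests that failed")
LATENCY = Histogram("chat_latency_seconds", "End-to-end response time")

def handle(question: str, answer_fn) -> str:
    """Wrap the answering function with traffic and latency metrics."""
    REQUESTS.inc()
    start = time.time()
    try:
        return answer_fn(question)
    except Exception:
        ERRORS.inc()
        raise
    finally:
        LATENCY.observe(time.time() - start)

start_http_server(9100)  # expose /metrics on an illustrative port
```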
At the same time, we continued to work with the customer’s engineers to train them on the technical nuances associated with running LLMs in production, enabling them to add further content types, maintain guardrails, and continue to scale and improve the solution.
Outcomes
It’s fair to say that we’re all on a journey with GenAI, and LLMs in particular, to see a) how they can impact our business and b) what the right choice of technology is for our organisation. In this case, we’ve helped our client build a strong business case internally, and laid foundations that enable them to push this service out to external customers and realise tangible value from their investment.
At the same time, we have helped to upskill their engineering team on the fundamentals of an open source, RAG-based system that has no vendor lock-in and mitigates the data privacy concerns that can come with a proprietary solution. These strong technical foundations will allow them to keep pace with the latest open source LLMs as they become available, without having to rewrite the existing technology stack. We’re excited to see how these foundations help them on their mission to drive developer engagement, and to continue supporting them on this journey.