Architecting an AI-native form builder
Arjun S on Apr 14, 2024
At Metaforms, we’ve been building AI-first products since GPT-3. When we pivoted and set out to build an AI-native form (and form builder), we quickly realized that many fundamental aspects of the system design and architecture had to be different to make AI-product integration as seamless as possible. In my last blog post, I talked about the abstractions and internal tooling we built over time that have been helping us ship LLM completion features rapidly. In this post, I’ll talk about the more tactical implementation details and system design decisions that went into building our AI form builder, and how they help us deliver a better product experience.
Quick context about the form builder:
A typical form builder like Google Forms, Typeform, or Fillout has roughly the same flow - users define questions, configure each question (data type, required/optional, placeholders, etc.), configure conditional branches, and so on.
With Metaforms, we reimagined the AI-native form builder experience from the ground up. Our flow looks something like this:
Write the goal of this form, like “Find out why students are not using the referral program despite the cash incentive”.
Share context about your company/product/service by writing, uploading docs, or sharing URLs.
Set up your welcome screen.
Define mandatory questions (Journey) by describing them in natural language.
Identify and prioritize topics to help the AI form ask deeper follow-up questions and collect insights that are most relevant to you.
At each step, the builder heavily uses LLM completions to help users build the best form. A few examples of how we do that (a sketch of one such assist follows the list):
Autocomplete goal/context to help with cold-start problem
Suggestions for the user to expand their goal with specific sub-goals
Suggestions for the user to add context that is relevant to the goal and most likely to be helpful when deciding what deeper questions to ask respondents, or even to answer questions that people might have while filling out the form
First draft suggestion to solve the cold-start problem
Suggest the most relevant topics to collect deeper insights on and let users prioritize
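To make this concrete, here’s a minimal sketch of what one such assist could look like - streaming a goal autocompletion with the OpenAI Node SDK. The model choice, prompt, and the suggestGoalCompletion helper are illustrative, not our exact implementation:

```typescript
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Stream a draft completion of a partially written form goal so the builder
// can render the suggestion token by token instead of waiting for the whole
// completion to finish.
async function suggestGoalCompletion(
  partialGoal: string,
  onToken: (token: string) => void,
): Promise<void> {
  const stream = await openai.chat.completions.create({
    model: "gpt-4o-mini", // illustrative model choice
    stream: true,
    messages: [
      {
        role: "system",
        content:
          "You help users of a form builder articulate the goal of their form. " +
          "Complete the user's partial goal in one or two sentences.",
      },
      { role: "user", content: partialGoal },
    ],
  });

  for await (const chunk of stream) {
    const token = chunk.choices[0]?.delta?.content;
    if (token) onToken(token); // e.g. push over the WebSocket to the builder UI
  }
}
```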
Problem statements:
React to events in real-time
Every action the user performs in the builder gives Metaforms better context about what the user is trying to build, and the rest of the experience should reflect that understanding to help users build a form that can surface net-new relevant insights. Intelligent products should be able to react to changes in real-time; a small sketch of this kind of reactive prefetching follows the examples below.
Examples:
By the time you write your goal and move to the context section, you should already have relevant context suggestions
By the time you reach the “Deeper Insights” section, you should already have a set of topic suggestions based on your form’s goal and context
…and so on.
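A minimal sketch of the reactive prefetching this implies; generateContextSuggestions and suggestionCache are hypothetical stand-ins for the real pipeline:

```typescript
// Hypothetical helpers: generateContextSuggestions calls the LLM,
// suggestionCache holds results until the goal changes again.
declare function generateContextSuggestions(goal: string): Promise<string[]>;
const suggestionCache = new Map<string, { goal: string; suggestions: string[] }>();

const DEBOUNCE_MS = 800;
const timers = new Map<string, NodeJS.Timeout>(); // formId -> pending timer

function onGoalUpdated(formId: string, goal: string): void {
  clearTimeout(timers.get(formId)); // the user is still typing; restart the wait
  timers.set(
    formId,
    setTimeout(async () => {
      // Kick off the completion in the background: by the time the user moves
      // to the context section, suggestions are cached and can show instantly.
      const suggestions = await generateContextSuggestions(goal);
      suggestionCache.set(formId, { goal, suggestions });
    }, DEBOUNCE_MS),
  );
}
```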
Enable better state management, caching, and real-time updates to carry out complex chains
LLM completion latency fundamentally limits how many tokens you can generate sequentially before the user experience starts to degrade or the product becomes entirely infeasible. You can get around this in several ways:
Stream results and deliver value piece by piece as soon as they are made available
Break down larger problems into smaller ones, and have LLMs solve them in parallel
Heavily cache intermediate results in a chain, track what is outdated / still valid as changes occur, and selectively decide what completions in a chain to re-trigger
Abort in-flight steps within the chain when newer changes supersede ongoing requests
Practical implementation of this quickly gets complicated in a typical request-response model; a sketch of the chain mechanics follows.
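In this sketch (all names are illustrative), intermediate results are cached by their inputs, independent sub-problems run in parallel, and an AbortSignal is threaded through so in-flight steps can be cancelled when newer edits supersede them:

```typescript
const stepCache = new Map<string, unknown>(); // cacheKey -> last valid result

async function runStep<T>(
  cacheKey: string,
  compute: (signal: AbortSignal) => Promise<T>,
  signal: AbortSignal,
): Promise<T> {
  // Skip the completion entirely if the cached result is still valid.
  if (stepCache.has(cacheKey)) return stepCache.get(cacheKey) as T;
  const result = await compute(signal);
  stepCache.set(cacheKey, result);
  return result;
}

async function buildSuggestions(
  form: { goal: string },
  signal: AbortSignal, // aborted by the edit stream when newer changes arrive
) {
  // Independent sub-problems run in parallel instead of one long completion.
  const [topics, contextIdeas] = await Promise.all([
    runStep(`topics:${form.goal}`, (s) => suggestTopics(form.goal, s), signal),
    runStep(`context:${form.goal}`, (s) => suggestContext(form.goal, s), signal),
  ]);
  return { topics, contextIdeas };
}

declare function suggestTopics(goal: string, signal: AbortSignal): Promise<string[]>;
declare function suggestContext(goal: string, signal: AbortSignal): Promise<string[]>;
```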
Process events independently of the client session
While we see that users understand the overall efficiency gains and are often willing to wait longer for AI features, bad UX around AI features that require multiple steps and levels of planning leads to frustration. For example, today publishing a form on Metaforms takes around 10-40 seconds, depending on how many changes the user has made since last publishing the form. If closing the tab or momentarily losing connection causes the entire transaction to fail, that becomes very frustrating.
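To sketch what session-independence means in practice (taskStore and publishSteps are hypothetical stand-ins): the publish task runs entirely server-side and persists its status after every step, so a dropped connection never aborts the work and a reconnecting client simply reads the latest state back:

```typescript
interface PublishTask {
  formId: string;
  status: "running" | "done" | "failed";
  progress: number; // 0..1, for the progress UI
}

const taskStore = new Map<string, PublishTask>(); // stand-in for a real database

async function publishForm(formId: string): Promise<void> {
  const task: PublishTask = { formId, status: "running", progress: 0 };
  taskStore.set(formId, { ...task });
  try {
    for (const [i, step] of publishSteps.entries()) {
      await step(formId); // e.g. regenerate prompts, validate branches
      task.progress = (i + 1) / publishSteps.length;
      taskStore.set(formId, { ...task }); // progress survives disconnects
    }
    task.status = "done";
  } catch {
    task.status = "failed";
  }
  // A client that reconnects later reads this final state back.
  taskStore.set(formId, { ...task });
}

declare const publishSteps: Array<(formId: string) => Promise<void>>;
```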
Support concurrent users and sync changes in real-time
This isn’t a high priority for us right now, but we knew that at some point we’d want multiple users to edit forms simultaneously and collaborate without overwriting each other’s changes. We wanted an architecture that supports this, so we wouldn’t have to redesign the entire system when we get around to building it.
Solution:
We decided to go with a bidirectional client-server connection as opposed to a request-response model. We also adopted an event-driven architecture, which was best suited to all the problems mentioned above. It would make it extremely simple to (see the sketch after this list):
Implement multiple background tasks that reactively trigger based on different user actions
Keep track of which step caches are still valid and which must be marked outdated based on the edit stream
Implement chains with sequential and parallel tasks each of which is triggered by events that indicate preconditions being satisfied, or skipped if cached results for that particular step are still valid
Let each step in the chain independently stream results to the client
Broadcast results to all clients listening for changes in a given form (multiple tabs or users working on the same form)
Broadcast changes by one client to all other clients
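A minimal sketch of that wiring, using Node’s EventEmitter; the event names and handlers are illustrative, not our actual task graph:

```typescript
import { EventEmitter } from "node:events";

const formEvents = new EventEmitter();

// Background tasks reactively trigger on user actions, invalidating only the
// downstream step caches that the edit actually affects.
formEvents.on("goal.updated", ({ formId, goal }) => {
  invalidateStepCaches(formId, ["context.suggestions", "topics"]);
  void suggestContextInBackground(formId, goal); // fire-and-forget re-trigger
});

// Each step streams or broadcasts its results to every client listening on
// the form: multiple tabs, multiple users.
formEvents.on("step.completed", ({ formId, step, result }) => {
  broadcastToForm(formId, { type: "step.result", step, result });
});

declare function invalidateStepCaches(formId: string, steps: string[]): void;
declare function suggestContextInBackground(formId: string, goal: string): Promise<void>;
declare function broadcastToForm(formId: string, payload: unknown): void;
```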
We also decided to have a global state that is synced in real-time between the backend and all connected clients, and defined MongoDB-like $set update operations for the server/client to pass state deltas. As a result, whenever a user opens a form in the builder, they always start from the latest state (including the state of tasks still in progress from the last session) and receive updates from any ongoing tasks from that point onwards.
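To illustrate, a sketch of applying such a $set delta with a hypothetical applyDelta helper: a delta names dot-separated paths and new values, and server and clients apply the same operation to converge on a single shared state.

```typescript
type SetDelta = { $set: Record<string, unknown> };

function applyDelta(state: Record<string, any>, delta: SetDelta): void {
  for (const [path, value] of Object.entries(delta.$set)) {
    const keys = path.split(".");
    let node = state;
    for (const key of keys.slice(0, -1)) {
      node = node[key] ??= {}; // create intermediate objects as needed
    }
    node[keys[keys.length - 1]] = value;
  }
}

// Example: the server tells every connected client that a task finished.
const state = { tasks: { contextSuggestions: { status: "running", progress: 0.6 } } };
applyDelta(state, {
  $set: {
    "tasks.contextSuggestions.status": "done",
    "tasks.contextSuggestions.progress": 1,
  },
});
```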
Also, all our services were already on ECS, and we loved how convenient that was compared to managing our own cluster or running on bare metal. We were also already using AWS API Gateway’s WebSocket support for our (conversational) form, so we refactored our backend and moved most of the builder functionality to talk to the backend over WebSockets as well.
AWS API Gateway's WebSocket feature was a great choice because it abstracts away the complexity of managing WebSocket connections when horizontally scaling services. Without it, we would have had to track which server each client is connected to and load balance by entity ID to make sure all users/sessions of a given form connect to the same server. Alternatively, we would have had to let all servers cross-communicate and stay aware of connections on other servers, so that messages on one server could be broadcast to every server holding connections associated with that entity. When scaling down, we would also have had to gracefully move all open connections from the server being killed to another server with remaining capacity.
Instead, API GW now holds all the connections and passes WebSocket payloads through to backend services as REST POST requests, after inserting a connection ID (unique to each WebSocket session) into the payload. The request can then be load balanced statelessly and handled by any server, and whenever any server wants to talk to that client, it sends a POST request back to API GW with the same connection ID. The backend services just maintain a global mapping on Redis from each connection ID to the entity it is listening to, so that any server can broadcast changes directly to all relevant clients.
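A sketch of that broadcast path using the AWS SDK v3 and ioredis; the Redis key scheme and environment variable names are illustrative:

```typescript
import {
  ApiGatewayManagementApiClient,
  PostToConnectionCommand,
} from "@aws-sdk/client-apigatewaymanagementapi";
import Redis from "ioredis";

const redis = new Redis(process.env.REDIS_URL!);
const apiGw = new ApiGatewayManagementApiClient({
  endpoint: process.env.WS_API_ENDPOINT, // API GW WebSocket management endpoint
});

// Any server can broadcast: look up every connection ID subscribed to the
// form in the shared Redis mapping, then POST the payload back through API GW.
async function broadcastToForm(formId: string, payload: unknown): Promise<void> {
  const connectionIds = await redis.smembers(`form:${formId}:connections`);
  const data = Buffer.from(JSON.stringify(payload));
  await Promise.all(
    connectionIds.map((connectionId) =>
      apiGw
        .send(new PostToConnectionCommand({ ConnectionId: connectionId, Data: data }))
        // A gone connection means the client disconnected; clean up the mapping.
        .catch(() => redis.srem(`form:${formId}:connections`, connectionId)),
    ),
  );
}
```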
Conclusion:
As we continue to innovate and improve our platform, we're also looking to expand our team with passionate and high-agency engineers who share our vision for building AI-native products that help collect life-changing insights. If you're interested in joining us on this exciting journey, we'd love to hear from you! - Join the team