How I Nearly Got Fired For Running An A/B Test with Vanessa Larco, Former Partner at New Enterprise Associates
Vanessa Larco is a former Partner at NEA, where she led Series A and Series B investment rounds and worked with major consumer companies like DTC jewelry giant Mejuri, menopause symptom relief treatment Evernow, and home-swapping platform Kindred, as well as major enterprise SaaS companies like Assembled, Orby AI, Granica AI, EvidentID, Rocket.Chat, and Forethought AI. She is also a board observer at Forethought, SafeBase, Orby AI, Granica, Modyfi, and HEAVY.AI. She was a board observer at Robinhood until its IPO in 2021. Before she became an investor, she built consumer and enterprise tech herself at Microsoft, Disney, Twilio, and Box as a product leader.

Richie helps individuals and organizations get better at using data and AI. He's been a data scientist since before it was called data science, and has written two books and created many DataCamp courses on the subject. He is a host of the DataFramed podcast, and runs DataCamp's webinar program.
Key Quotes
You can't run an A/B test in a vacuum. You can't just say, I know nothing about your users, nothing about the product. I don't know the value prop. I'm just going to run this. Should it be a blue button or red button? I will build an amazing product off of these optimizations. I have yet to see where that works.
A/B testing is really helpful in some cases and not helpful in others. And it can be game changing in some ways and detrimental in others. And the art is in picking when an A/B test is applicable and is a strength and when it's actually not.
Key Takeaways
A/B testing can provide a competitive edge in user acquisition by allowing you to optimize the cost per user and target specific demographics effectively, as demonstrated by the success of niche themes like Gnome Town in social gaming.
When implementing A/B testing, it's crucial to balance short-term gains with long-term user experience, as excessive short-term optimizations can lead to user churn and platform penalties, as seen in the social gaming industry's decline.
A/B testing is most effective for optimizing small, contained elements like ad copy or button placement, but less so for creating entirely new products or major pivots, where a combination of data and creative intuition is necessary.
Transcript
Richie: Welcome to the show.
Vanessa: Hi. Thanks for having me.
Richie: Brilliant. So I'd like to start with your first experiences of AB testing. How did you get into it?
Vanessa: Yes, so when I left Xbox up in Seattle and moved to the Bay Area, I joined Playdom. And this was peak social gaming, when everyone was using Facebook and everyone was getting spammed to water their crops in these virtual games. That's when I joined Playdom, and the culture from the very beginning was A/B testing.
I think Zynga was probably the pioneer of that idea, and the whole ecosystem just adopted it: we're gonna build games not off of gut, we're gonna build them off of data. And coming from Xbox, which is console gaming, which is very creative-driven, gut-driven, I was fascinated by that.
And so I bought into the ethos that everything should be A/B tested and all decisions should be backed with data. It's a bit extreme, the right thing is probably something in the middle, but it was a really fascinating time to be a data-driven product manager.
Richie: Yeah, that's kind of wild how the rise of A/B testing in games came from farming games. It's not something you would imagine would be the driver for technological innovation.
Vanessa: Gaming, surprisingly, is an early adopter of tech innovations. There's actually a lot of money in gaming. But it's hard to have a hit. So you'll take any edge you can get, which is why you ado...
Richie: So do you wanna sort of spell it out like how might doing all this experimentation provide some sort of competitive edge then?
Vanessa: There were some really clever things we did because we knew that getting users in the door was the key to making money. You can't make money if you don't have any users. How much you paid to acquire a user was very important. Now, that seems very trivial today, but back in the day, there wasn't a whole lot of attribution.
You couldn't figure out where you were getting your customers from, right? You put up a billboard and hope someone saw it, and you never knew where your signups came from. The internet made that a little better, but when you acquired a user on Facebook, you knew exactly what ad they saw, how many times they saw it, how much it cost to click, and then they installed your game.
So you saw the whole funnel and then you knew exactly who they were. You knew their age, you knew their gender, you knew where they lived, city, state, and you just had all this information you didn't really have before. And so the insight was like, man, if we could figure out how to target, that makes it so we can really optimize how much we paid to acquire which types of users.
And we don't acquire anyone we don't need, so it's not wasteful, right? So the user acquisition cost control was a huge unlock, and we worked backwards from that. We're like, well, would it cost less if you had a game where the title and the art and the characters appealed more to the demographic you were targeting?
Logic would say yes. But how do you know what they want besides a focus group and whatnot? So we're like, you know what? We're gonna build a game. We know it's gonna have farming mechanics, but we don't know what theme and we don't know what to call it. So we would brainstorm 12 different themes.
Gnome Town, Candy Land, Chocolate Factory, Rainbow Town. I mean, we had one that was dinosaur-themed. We had a voodoo New Orleans theme, and we came up with like a dozen themes, and we had a boy character and a girl character. And so we standardized the little ad icon. And so the boy character and the girl character we themed to the dinosaur theme, voodoo theme, Candy Land theme, gnome theme.
And then the background would also match that. So it was very templatized, so you could control the variables. We knew it: girl character, boy character, background, square, all the squares the same size. And so we would just test the theme, just run these ads. It would land you on a sign-up for a waitlist.
The game doesn't exist yet, right? But we would just wanna see which one of these ads would be the cheapest to convert the user, meaning that they'd only have to see it a couple times before they'd click install. And the more impressions, the more expensive it got. And it was funny because we wanted A/B testing to be part of the culture of our studio.
So we would take bets. We would ask everyone on our team, which one do you think is gonna win? And so everyone would bet on their theme with money. Probably not kosher these days, but it was very much allowed back then. We'd put the money into a jar, we'd make our bets, and we'd run the test for two weeks and then we'd crown the winner.
And it was funny because when we ran all these tests, the one nobody thought would win was Gnome Town. Like a gnome-themed game. Turns out, for the demographic we were going after, they really like gnomes. It's not that the whole demographic did, but there was a niche demographic that, if they saw a gnome, they would absolutely play the game.
So our CAC, our cost for user acquisition, was in the pennies for that theme, and it was like 50 cents, 70 cents for other themes. And so we're like, I guess we're building a gnome game. You know, we would've never picked it, but for our audience, it's the cheapest theme to acquire a user. And then we're like, all right, well, let's test a bunch of different names.
Gnome Town, Gnome Village, Gnome Story, Gnome Candy. So we ran a bunch of tests: which name converts better? Gnome Town won. I'm like, all right, we're building a game about gnomes. It's gonna be called Gnome Town because our cost to acquire users is 10 cents. And then we build the game.
We're like, should it have this storyline? Should it have these characters? And so we would test different quests. We would test different mechanisms to earn the yellow digital house. Should you do a spinner? Should you finish a quest to get it? Should you just buy it off the little marketplace? And we'd run all these tests.
We were running probably like 30 tests a week just to determine the direction of the game and the game mechanics.
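To make the mechanics of that themed-ad test concrete, here is a minimal sketch of comparing cost per acquired user across standardized ad variants. This is not Playdom's code; the theme names, spend, and install counts are invented for illustration.

```python
# Hypothetical ad-variant results: total spend (dollars) and installs per theme.
# In practice these numbers would come from the ad platform's reporting.
results = {
    "gnome":    {"spend": 180.0, "installs": 2100},
    "candy":    {"spend": 250.0, "installs": 480},
    "dinosaur": {"spend": 250.0, "installs": 390},
    "voodoo":   {"spend": 250.0, "installs": 350},
}

def cost_per_install(spend: float, installs: int) -> float:
    """Cost to acquire one user (CAC) for a single ad variant."""
    return spend / installs if installs else float("inf")

# Rank themes from cheapest to most expensive user acquisition.
ranked = sorted(
    results.items(),
    key=lambda item: cost_per_install(item[1]["spend"], item[1]["installs"]),
)

for theme, r in ranked:
    print(f"{theme:10s} CAC = ${cost_per_install(r['spend'], r['installs']):.2f}")

print(f"Cheapest theme to acquire a user: {ranked[0][0]}")
```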
Richie: That's pretty amazing. And I suppose that's kind of the beauty of the testing, right? None of the people on the team could predict which theme was gonna be best. I certainly wouldn't have been able to guess that gnomes were gonna win. So, yeah, the data shows something the humans couldn't have got to themselves.
Vanessa: Well, here's what we got wrong though. We tested the conversion. So it turns out, if there's a really enthusiastic niche, your numbers will look really great if you hit that niche. The problem is you don't know the depth of the market. And we couldn't figure out how to measure TAM for a theme. And we ended up landing on choosing a city, a small city where we could flood it with ads to see how deep the market went. We'd see like, okay, CAC: 10 cents, 10 cents, 10 cents, $20. Once you hit all the gnome enthusiasts, whoever is not a gnome enthusiast would be very expensive to acquire. We didn't see that coming, right? So our game grew super fast.
And then once we acquired every gnome enthusiast in the US, we couldn't acquire anyone else because CAC was so high, and we didn't see that coming until we hit a wall about 12 months after launching the game. So that was another learning: you can run tests, but then you need to go a couple steps deeper.
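As a rough illustration of that wall, here is a hypothetical sketch of blended and marginal CAC in a market with a small enthusiast niche. All numbers are invented; the point is only that acquisition stays cheap until the niche is exhausted and then the cost of each additional user jumps.

```python
# Hypothetical market: a small pool of gnome enthusiasts converts for pennies;
# everyone else is expensive. Numbers are invented for illustration only.
NICHE_SIZE = 50_000      # users who convert cheaply
NICHE_CAC = 0.10         # dollars per enthusiast acquired
MAINSTREAM_CAC = 20.00   # dollars per non-enthusiast acquired

def marginal_cac(users_acquired_so_far: int) -> float:
    """Cost of acquiring the next user, given how many we already have."""
    return NICHE_CAC if users_acquired_so_far < NICHE_SIZE else MAINSTREAM_CAC

def blended_cac(target_users: int) -> float:
    """Average cost per user if we aim for `target_users` in total."""
    cheap = min(target_users, NICHE_SIZE)
    expensive = max(target_users - NICHE_SIZE, 0)
    return (cheap * NICHE_CAC + expensive * MAINSTREAM_CAC) / target_users

for target in (10_000, 50_000, 60_000, 200_000):
    print(f"{target:>7,} users -> blended CAC ${blended_cac(target):.2f}, "
          f"marginal CAC ${marginal_cac(target):.2f}")
```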
Richie: That's very interesting. Yeah, I can certainly see how you either love it or you hate it if you've got a gnome theme. So I can imagine there is that sort of wall to hit. So, talk me through, what do you do then? How do you deal with predicting these longer-term things?
Vanessa: I think that's the biggest drawback of A/B testing: it is rather short-term focused. Nobody really runs an A/B test for a year. You don't have the time. And if you do, by the time you have the learnings, it's retroactive. You'll learn what you should have done a year ago, and it doesn't help you a whole lot at that moment or going forward, because you probably already knew that you were in trouble at that point.
And so where I landed was: A/B testing is really helpful in some cases and not helpful in others. And it can be game changing in some ways and detrimental in others. And the art is in picking when an A/B test is applicable and is a strength and when it's actually not. And the general rule I've landed on is that it's at its best for optimizations.
Like you have a button, you have something small and contained, great thing to test. Ad copy, where should it go? Slogans, titles, all these things. Great. For net new, like if you wanna launch a whole new product or if you wanna do a step-function change or pivot, it's not that helpful. You have to start from something.
So if you're trying to use A/B testing to design that thing you're starting, you're gonna end up with a Frankenstein with no opinion. It's just gonna be a hodgepodge of micro-optimizations, which is not cohesive. So I think you have to go with gut, and then you can test things around the edges to validate some of the decisions.
But it can't be how you start; it can't be the spark of the initial product or idea. Now, with Gnome Town, we knew it was gonna be a farming game. We had certain constraints, we had certain budgets, so we were like, we know it's a game, we know it's farming mechanics, we know there's gonna be characters, there's gonna be a quest system.
We had all that engine already built out. We just needed to skin it and theme it.
Richie: So A/B tests are much better for these smaller decisions rather than the grand strategic sort of vision, I guess. Yeah, it's harder to do a test for the whole game.
Vanessa: Yeah. And I would also say that one thing I did really appreciate about A/B testing is I learned a lot about my users. So by having my team bet on what we thought was gonna be the winning shard, we started getting really good at predicting our users' behaviors and getting in the mind of our users, and then we would supplement that with actual focus groups.
So we knew that if we put up a virtual good for sale in the marketplace, it would convert like 10%. If we made people spin for it, like you had to do a little spinner and try to win it, we would get close to a hundred percent conversion, and people would spend quadruple on it trying to spin to win it versus just buying it from the marketplace.
And we also knew that if we did a candy-themed item, a strawberry or chocolate theme did really well, but if we did banana, it didn't. And so we started to know, by A/B testing these designs, this artwork, we're like, okay, our users like strawberries and they don't like bananas. I can't tell you why; a banana fountain probably looks not great.
But a strawberry one sells. A yellow home does better than a pink home. Can't tell you why, but we just know that our users like homes that are yellow, 'cause we A/B tested a bunch of different colored homes. So towards the end of the game, 'cause games peak and fall, we knew our users backwards and forwards.
We knew the colors they liked, we knew the home styles that they would buy. We knew what kinds of surprises they liked and animations the sounds, the more bubbly sounds did better. And so by AB testing, you start, and if you do it frequently and you're trying to unpack why, and you do these on a weekly basis, you really do get into the minds of your users.
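For reference, the week-to-week comparisons she describes (a yellow home versus a pink home, buy-it-now versus spin-to-win) boil down to comparing two conversion rates. A minimal two-proportion z-test sketch, with invented counts, might look like this; it is one standard way to check such a difference, not necessarily the method Playdom used.

```python
import math

def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z statistic, two-sided p-value) for the difference in
    conversion rate between variant A and variant B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Invented example: yellow home vs. pink home purchase conversion.
z, p = two_proportion_z_test(conv_a=520, n_a=10_000, conv_b=430, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.4f}")  # a small p-value suggests the gap is real
```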
Richie: That does seem incredibly important. I think it's something that almost every business wants: to know what their users want better, so they can better serve them. And actually, I know it is a bit of a sketchy culture, but betting on what the users want, like which tests are gonna be successful, that does provide some sort of good incentive for all your staff to really care about what the users want.
Because, you know, they've got some money riding on the line.
Vanessa: Yeah, it was a playful way to do it. But you want your team to care. I know it sounds super trivial, of course they should care about the customer, but it's not just caring about them, it's knowing them, like deeply knowing them. And so our target demographic was women, mid to late forties, in non-coastal cities.
And our team building this was late twenties, early thirties, mostly men in the Bay Area. And so you have to build user empathy. How do you build it? I think A/B testing really helped our team develop that. We almost gamified it: how well do you know our users and how well can you predict their behaviors?
Richie: I guess that's also a really important learning: you are not your users. It's said so often, particularly in software development, but I think people forget it sometimes. They build stuff that they would want to use themselves, but if you're not your own target market...
Vanessa: It's so much easier to build for yourself. But it also means that if most of the builders are pretty homogenous, then those areas are very competitive. And if you build something that you're not the user of, there's probably less competition there.
Richie: So I'd also like to talk a bit about how you go about scaling this, because you said you were running what, 30 tests a week? That seems like a lot. How do you get to that level of testing?
Vanessa: We built it in-house. This was before all the A/B test platforms, so we built our own A/B test platform in-house, and to release anything, the way you released it was through the platform. So it didn't make any sense to not test it. When you pressed the button, you had to say what percentage of our users this release goes to.
So to not test it, you'd have to put a hundred percent in the new release and zero in the test group. You'd have to actively try to not test it. So you have to make releasing an A/B test very easy. You also have to have people care about the results, 'cause it can be easy to launch, but it's such a pain in the butt to, two weeks later, draft the report and then try to make sense of it and try to see what the users were doing, especially if they adopted and abandoned it or had some wonky, unexpected behavior. And to do that every two weeks across 30 tests.
One of the things we did to combat that, one, was the betting, the gambling. But two, we also held brown-bag lunches where we would invite everybody in the company to sit in and learn about how our tests performed and what our hypothesis was. And we opened it up for people to also help us figure things out, especially if there was a weird result where some people loved it, some people hated it, or people adopted it and then dumped it, or adopted it but never monetized. And we're like, but if they like it, why wouldn't they buy it? If they tried it, why didn't they share it? For those head-scratchers, people really loved to hear about those.
And it was a company-wide thing. And so knowing that we had this cadence of having to give this presentation to the company also helped us stay motivated in looking at the results and trying to make sense of them.
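Vanessa doesn't go into the internals of Playdom's platform, but the release flow she describes, where every release goes out through the test platform with a percentage of users allocated to each variant, is commonly implemented with deterministic hashing. A hypothetical sketch, with invented experiment and variant names:

```python
import hashlib

def assign_variant(user_id: str, experiment: str, allocations: dict) -> str:
    """Deterministically map a user to a variant.

    `allocations` maps variant name -> fraction of traffic (sums to 1.0).
    The same user always lands in the same variant for a given experiment.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform value in [0, 1]
    cumulative = 0.0
    for variant, share in allocations.items():
        cumulative += share
        if bucket <= cumulative:
            return variant
    return list(allocations)[-1]  # guard against floating-point rounding

# "Not testing" is just a degenerate allocation: 100% to the new release.
full_rollout = {"new_release": 1.0}
experiment = {"control": 0.5, "quest_popup": 0.5}

print(assign_variant("user_42", "quest_popup_test", experiment))
print(assign_variant("user_42", "quest_popup_test", full_rollout))
```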
Richie: I like the idea that by trying to teach the rest of the company about what you are doing, that then helps you understand your own results. Because if you don't understand it, you're not gonna be able to explain it to everyone else. And I also like the idea that you share your weirdest results.
'cause that's gonna be more interesting to people. If you don't quite understand what's going on, then, just open it up to everyone else.
Vanessa: Yeah.
Richie: Okay, cool. So, the Facebook sort of social gaming bubble was, pretty short lived, I think. So do you wanna talk about what went wrong and how did it fall to pieces?
Vanessa: Well, a few things. Look, intuitively, we knew that something was up, because we would test a popup and it would increase monetization or virality. So we'd add in another popup and it would increase monetization and virality, and so the user would get two popups in a row, and it didn't seem to hurt retention, engagement, monetization, or virality.
We added a third popup, it seemed to just do better. We added a fourth popup, it seemed to do better. And intuitively you're like, this is a shitty experience. You log into the game and it's popup, popup, popup, popup. That doesn't seem like great product design, but the numbers were saying they would either share more or buy the thing, and it had no impact on retention.
They would still come back, they'd still play the game. So we did a lot of those things where I'm like, that's counterintuitive. That doesn't seem right. And so if you erode the user experience, if you spam users over and over again, they may tolerate it, but at some point they will just flip. And it's not like they'll flip slowly, they just have enough and you're dead and you can't get them back.
But the numbers don't show you that. I wish it would be like, oh, you know, that 10th popup really did you in, like the engagement fell by 20% from these users, right? They log in fewer times per day. No, there was nothing, nothing, nothing. And then, it also turns out that if you spam users and they get pissed, the platform will shut you down.
And so Facebook was like, gosh, Zynga and Playdom and all of these, the newsfeed is no longer what your friends are doing, which was the point of Facebook. The newsfeed was to tell you what people are up to, what changed, where they're going, where they were, the pictures they uploaded.
The newsfeed was just all, water your plants, Vanessa invited you to buy a gallon of milk, your home needs tending, right? It was all these Facebook games just spamming the entire newsfeed, and so it didn't only erode the gaming experience, it eroded the platform experience, and platforms don't like that, right?
If you mess up Apple's user experience, they'll shut it down. They don't care what API they gave you access to, they will shut it down overnight. They don't care. These platforms will protect their user experience above all. And so those are some insights we didn't have, right? We didn't know that if you piss off your users, they churn overnight and you can't get them back.
And we also didn't know, which is now pretty obvious, that if you erode an experience on a platform, the platform will shut you down. So overnight, Facebook was like, that's it, you gaming companies, you're not allowed to post to the newsfeed. And they shut off newsfeed access for us, and we were dead. Shutting off newsfeed access meant that we couldn't remind you to water your crops.
We couldn't remind you to tend your home. We couldn't remind you of anything. And it turns out the games weren't good enough to stay in the back of your mind, so people all stopped coming, and we'd been spamming them for months. The games had accumulated all these popups and it was just a shit experience.
And then everybody jumped ship and it was over so fast.
Richie: Yeah, it is kind of a disaster story. It sort of feels slightly comic in some ways. Like, yeah, we were spamming users and then they just revolted. So it's hard to be too sad about it, but I think there are some important lessons around how you interact with your users there, right?
Vanessa: And that A/B tests can tell you this is all looking great, but if, intuitively, you're like, this doesn't seem right, it probably isn't. And maybe you can design a test to figure it out and maybe you can't. But I think, at the crux, you still have to use some amount of judgment. And I think if you have a hunch that the A/B test is telling you something but maybe it's not the whole story, dig in. Dig in a bit deeper. Run the user study or the focus group. Look at other pieces of data. Maybe design a different test. I think sometimes there might be more to the results you're seeing, and that's where the creativity falls into place.
Richie: Yeah. I love the idea of combining, well, we've got some data that says this, but also I'm using my own brain to think about what this actually means. I guess the other trade-off here is between the short-term benefits, you've spammed some users, you've increased immediate engagement, but you also want to optimize for the long term.
So do you have any sense of like how you go at balancing these short and long-term objectives?
Vanessa: Like I said, A/B tests are really good for short-term objectives, not so great for long-term objectives. And gosh, if someone listening to this has figured it out, please shoot me a note, my LinkedIn DMs are open. I would love to hear about how you do long-term A/B testing where the results are influencing your decisions today.
It just seems like it doesn't add up. But I think when you're looking at the long term, you have to ask yourself some questions: what do we think our users are gonna get from this? If they stay on our platform in a year, in two years, in three years, how does this get better for them?
And if you're running a test and you're like, this isn't aligned with what we've decided the product should look like a year from now, then it probably doesn't work, and maybe you shouldn't do that in the short term. But I think the question I always push our teams to think through is, how does this get better and more useful in a year, or two, or three, or four, or five?
Richie: I like that. Yeah. So as a product manager, you've gotta remind the rest of your team, we do need to think a bit more long term rather than just...
Vanessa: It's not just long term, it's compounding value. So for some of the tests, you can ask yourself not, does it convert better today, but, does this compound value? And if you ask yourself, does this feature compound value, then you may design the test a bit differently than just pure what's the lift on monetization or what's the lift on virality.
Richie: Well, how might you test compounding value?
Vanessa: At Box, we knew that if you had all your documents in Box, it would be more useful to you. And we could run some analysis on that and see that the more files a user had, the more active they were on Box, or the more likely a company was to retain if they just had more files.
So it's not hard, from an analysis perspective, to prove that if you have more files, it's more useful. But also, just logic: if all your stuff is in one place, you're just gonna go to that place more often, because that's where you're gonna find the things you need. And so when we thought about compounding value, we're like, well, how do we get you to put more files in?
Because the more files you have in there, the more useful this is to you. At some point we realized that for some users, more files wasn't super helpful. And when we dug in, we were trying to figure out why, and it wasn't obvious in the data. So we ran some focus groups and they're like, well, now I have so many things in there, when I search, I can't find the thing.
And so then I have to organize it to find it. And so then we're like, oh, okay, if you have everything in there, yeah, it could be pretty hard to find something. So we added favorites, we added other shortcuts. So there was an, oh wait, there's too much, maybe it's hard to find, maybe we should solve that problem. But overall, when you think through, at a high level, where does this product fit in the user's life?
What is the value? Can you describe in a sentence what is the point? What are you solving for them? What's your value proposition? And then the features should add to that, should push that forward.
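The kind of analysis she alludes to, more files stored correlating with retention, can be approximated very simply. Here is a hypothetical sketch with made-up accounts; it shows correlation only, which is the caveat with any compounding-value argument.

```python
# Made-up account records: files stored and whether the account renewed.
accounts = [
    {"files": 12_000, "retained": True},
    {"files": 8_500,  "retained": True},
    {"files": 300,    "retained": False},
    {"files": 15_200, "retained": True},
    {"files": 1_100,  "retained": False},
    {"files": 450,    "retained": True},
    {"files": 700,    "retained": False},
    {"files": 9_900,  "retained": True},
]

def retention_by_bucket(records, threshold: int = 5_000):
    """Compare retention rate for accounts above vs. below a file-count threshold."""
    heavy = [r for r in records if r["files"] >= threshold]
    light = [r for r in records if r["files"] < threshold]
    rate = lambda group: sum(r["retained"] for r in group) / len(group)
    return rate(heavy), rate(light)

heavy_rate, light_rate = retention_by_bucket(accounts)
print(f"retention, >= 5k files: {heavy_rate:.0%}")
print(f"retention, <  5k files: {light_rate:.0%}")
# A persistent gap is consistent with "more files in -> more useful -> more
# likely to retain", but it is correlation, not proof of causation.
```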
Richie: I like that. Yeah. Again, we're going back to the idea that you need to understand what your users want and what problems you're solving for them.
Vanessa: You can't run an A/B test in a vacuum. You can't just say, I know nothing about your users, nothing about the product, I don't know the value prop, I'm just gonna run this. Should it be a blue button or red button? And I'll build an amazing product off of these optimizations. I have yet to see where that works.
Richie: So a lot of thinking about context, otherwise you're just gonna be doing nonsense. I guess that's true of a lot of things when working with data. Data's cheap, you can do anything with it, but if it doesn't make sense, then...
Vanessa: Look, actually, there was this rise and fall of the growth PM. I got a job offer a long time ago, when I was in product, to go be head of growth. I was like, I don't wanna be head of growth, I don't wanna be head of product and growth. But I think product and growth need to go hand in hand, because the growth people, or the departments that had been built at that time, when it was super popular to do growth hacking, aka A/B testing, they didn't care about the product.
They were just like, I just need to get more users in the door, so I will design all these tests. And they were separate from the product team, so they didn't actually care about the user experience, the value prop, or any of that. It was just, I'm compensated on getting as many people through the door as possible.
And I was like, having lived my experience at Playdom, I don't think that's the way to do this. That is very shortsighted. And so I know it seems obvious that you need to know what your product does and understand it deeply and have user empathy to be able to run good A/B tests. But that was not so obvious back in 2013, '14, and '15.
Richie: Yeah, I suppose the rise of A/B testing happened over the last decade just because all these tools have got more powerful. So, you mentioned Box. I'd love to hear about your time at Box. Was there a similar experimentation culture there as to what you had at Playdom?
Vanessa: No, and for good reason. In a game, when you're changing the character, when you're changing the landscape, the color palette, the quest system, users will pick it up or not use it. No big deal. When you're designing software that people rely on to get their jobs done, that people rely on extensively to meet their deadlines, you don't wanna mess with that. Like, imagine if Riverside just changed all of this overnight, right before an interview that you had, and you're fumbling around and your guests can't get in. You are going to be livid. And how much more mad would you get if Riverside's like, oh, sorry, you were in an A/B test, we just wanted to see how you would like this entirely new system, but we'll change it back if it doesn't perform well. What would be your response?
Richie: Yes, that happens a lot, actually. Riverside, this is the podcasting platform, they do change their user interface quite frequently, and I'm often like, hmm, not quite sure what I'm doing here, and I'd like to look like a professional when we're recording. So yeah, that is very difficult. When it's your job on the line, you want to understand how your software works.
Vanessa: Yeah. And if you're paying a lot of money for a platform, you don't take kindly to someone saying, oh, we're just testing this on you. You're like, no, I'm paying you money, so I'm not your guinea pig. Go guinea-pig someone else, right? And so you have to be careful. There's this, I call it an evil word, but it's a prominent thing you should think about if you're in enterprise software.
It's called change management. And basically, when you deploy to big enterprises, they have to train all of their employees, and they hold webinars and they hold office hours and they'll have booths in the lunch room to get everybody excited about the new piece of software that they're gonna roll out.
They spend six months to a year rolling something out and getting everybody trained. Now, if you get, you know, a thousand people in your organization trained: here's where the upload button is, here's where the download button is, this is how you share, here's the permissioning, do not click this button, it will make it a public link.
And then here comes Vanessa, runs an A/B test, has changed everything. You have to update your instruction manuals. You have to do retraining for everyone. So it's just different from a game, you know, or from an Amazon shopping experience, where you're gonna buy the thing whether the buy-now button's orange or yellow.
But if your job depends on it, and the button changed from orange to yellow, you're like, oh wait, it's yellow now, maybe I don't press it. I don't know what to do now. Maybe it's a different button. I don't wanna get fired for hitting the wrong button and publishing this to everybody.
And so it's different.
Richie: Yeah, it just seems like the stakes were a lot higher here. And I'm getting the sense from what you're talking about that there's a disaster story looming here. Can you tell us about that?
Vanessa: Okay. So I came from gaming and I was like, I need to know my users. I need to know what they like, what they don't like. And we had this thing where, when you launch a feature, some enterprises, some teams, some employees adopt it, but if only like 0.1% of my users adopt it and are using it, is it worth it to keep it alive, right?
There's UI debt, there's tech debt. We have to test it every time we do any launch on any other part of the product. So it's like, why don't we ever deprecate features? Why don't we ever just get rid of them? If like 10 people are using them and we have millions of people on the platform, we should just kill the stuff that doesn't work, right?
So I wanted to launch this new feature, which needed some space in this menu option. And I was like, well, I wonder if people will find it if it's in that menu or if I put it as a button at the top. So that was gonna be my A/B test. We'd never really run one before in the actual web app. And I was like, this will be great.
We'll learn a lot about how our users find things in the app, right? Do they always go to the menu or do they typically look across the top? Where do they first go when they wanna find something? Then, while I was at it, I was like, you know what, I should add another shard in here. There's this feature in the menu that really crowds it, and I think if I got rid of it, nobody would notice, but I should run that as a test.
Instead of running two separate tests, one for a while and then the next one, I decided to do what I did in gaming: gimme a four-way test. One with the control; one with the button at the top, no menu item, and keep the old one; one with it in the menu item; and one that kills the menu item.
And so I created these four shards. And the engineering team that worked on it was a data team. They were so excited to do a first A/B test. They built this platform to be able to do it. We were super jazzed. We didn't tell anyone that we were gonna run this. We just said, we're launching this new feature and it's gonna be a button on the top.
We didn't tell anyone that we also were gonna have it as a side menu option, and that a quarter of our users weren't gonna see the thing on the top, they were gonna see it on the menu option. So, okay, it may seem very obvious that this was a bad idea, but at the time we were so excited we were doing a test, and in gaming, I would just launch these tests every week, every other week, and nobody cared.
So we launch it, we put it in for the release to get bundled and everything and go out throughout the evening. And we had teams all over the world using Box. So in Japan, they got my test while we were all asleep. And it turns out there was this insurance company that really used that feature I was gonna deprecate.
They were the 0.01%. And remember, I sharded by user ID, not by company ID. So four different people at the same company, sitting right next to each other, could see four different UIs. Now you can imagine how this is gonna go, right? So this one poor person was like, I need to do this repetitive task over and over again.
And, oh my God, my button disappeared. Did your button disappear? And my button didn't disappear. What do you mean your button disappeared? Well, I can't do my job. I need this button to file this thing the way I'm supposed to. Let me see your screen. Yours is still there. Yeah, well, reboot your computer.
Log out, log back in. Oh my God, maybe your account got corrupted or maybe you got hacked. So they call customer support. And for the customer support agent, the way it worked is it would create a new UDID, like a new enterprise ID in that account, so they weren't in that shard, they just magically got placed in a different shard.
And so they're like, I have no idea why my screen looks different than yours, and your screen looks different than that person's screen. Code red, like, sound the alarms, ring the beepers, wake the SWAT team up. We've gotta figure out what's happening here. This is bad. So they wake up a huge chunk of the security team, engineering team, the on-call team.
Everybody's trying to figure out what's happening. They're up all night. I stroll into the office at 9:00 AM, before all of my engineering team, thinking I'm an overachiever for getting there early after working late. And there are people waiting for me at my desk, like, did you hear what happened to your product?
I was like, what happened to my product? They're like, it's a disaster. I'm like, what's a disaster? They're like, this team can't find this button and we don't know what's happening. We can't replicate the issue. And I was like, uh, we're running an A/B test. We're just trying to see how people like this new UI. And they're like, I'm sorry, what are you doing?
What are you doing? Yeah, yeah, yeah. We're running an AB test. We just wanna gauge user preferences. And also this button, no one ever uses it. So we figured what would happen if we got rid of it. They're like, one, you can't just get rid of things without informing people who are using it that you're gonna get rid of the thing.
I was like, yeah, but that runs against how you run AB tests. You can't tell people they're an AB test. You ruin the whole test. They didn't think that was funny, nor did they care about how you run an AB test and like the sanctity of the data, they were livid and they're like, you cannot run AB tests.
And this is a very important account that's supposed to renew next month, and now they are livid. And I was like, oh yeah, it's probably not a good idea to run an A/B test on a customer that's supposed to renew next month. That was sad. So then the account manager flips out, and they're like, Vanessa ran a test on your customer.
So within like two hours I get called in. It's the CEO, the head of sales, the head of customer success, the head of customer support. And I just get reamed. They're like, what were you thinking? And I was like, I just wanted to get to know our users better. And they're like, nope, we don't care. You're gonna cost us this account.
This is a disaster. Customers don't wanna be tested on. So I was able to defuse things and I was able to convince them: we do need to understand, we have to be able to deprecate features, we just do. This platform, we can't just add, add, add forever. It's gonna be a disaster. So when we get a feature wrong or it doesn't hit the mark, we have to be able to get rid of it.
We can't just keep it forever because 10 people use it now. We could probably go about it differently. Maybe we notify them. Maybe the test doesn't have to be perfect, because I don't wanna piss anyone off. But yeah, we have to be able to understand what happens when we get our features wrong.
So I convinced them on that. So we ended up coming up with a whole process for how you end-of-life a feature, how you test the impact of ending a feature. So we agreed on that. And then, when I launch something new, I wanna know if it actually hits the goal, which is either increase engagement or increase files uploaded or whatever.
I wanna know if it actually worked or didn't work, and that's the only way we were gonna get better as PMs on my team. That was a lot harder of a sell. So what we ended up agreeing on was we would not shard by user ID. We would shard by company ID, and then we'd bucket companies into different categories: large, medium, small.
We could categorize them by industry, so we'd be like, okay, we wanna test this on, say, the same number of companies in legal versus the same number of companies in healthcare, so we'd have a good mix. But I would never again shard on user ID. So everyone at the same company will always have the same experience.
Not the perfect way to run an A/B test, mathematically speaking, but if you wanna survive and not get destroyed by your team, that's probably the right way to do it. And then, what was the other concession? Oh, we'd have a blacklist: customers we would absolutely not run a test on, because they were either in the middle of a deployment and training their users, or they were about to renew, or something.
But they were like completely excused from AB testing. And that's where we landed.
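A hypothetical sketch of the compromise she lands on, assignment at the company level with strata you can balance and a do-not-test list, might look like the following. The company names, segments, and helper functions are invented for illustration.

```python
import hashlib

# Companies we must never experiment on (mid-deployment, about to renew, etc.).
BLACKLIST = {"acme-insurance", "globex-legal"}

def company_variant(company_id: str, experiment: str,
                    variants=("control", "treatment")) -> str:
    """Assign a whole company to one variant so every employee sees the same UI."""
    if company_id in BLACKLIST:
        return "control"  # excluded accounts always keep the existing experience
    digest = hashlib.sha256(f"{experiment}:{company_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

def assignments_by_stratum(companies, experiment: str):
    """Group assignments by (size, industry) so you can check that each
    stratum, e.g. large legal vs. large healthcare, has a reasonable mix."""
    grouped = {}
    for c in companies:
        stratum = (c["size"], c["industry"])
        grouped.setdefault(stratum, []).append(
            (c["id"], company_variant(c["id"], experiment))
        )
    return grouped

companies = [
    {"id": "acme-insurance",  "size": "large",  "industry": "insurance"},
    {"id": "initech",         "size": "medium", "industry": "software"},
    {"id": "umbrella-health", "size": "large",  "industry": "healthcare"},
]
print(assignments_by_stratum(companies, "menu_vs_top_button"))
```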
Richie: Okay, I have to say, it was probably not hilarious for you at the time, but it's hilarious to me now. I can certainly see, you're like, oh yeah, I'm doing something amazing, and then, yeah, the CEO's not happy. It's not a great place to be in career-wise, but...
I'm glad you did manage to eventually convince all your management around this. Actually, do you have any more insight on how you went about defusing this situation? I'm curious, if you're planning to start your own A/B testing program, what's the most sensible way of convincing people that this is a good idea?
Vanessa: I would first get their buy-in before you do it. Don't do what I did, which was like, everyone obviously wants us to do this, it makes so much sense, I'm just gonna run an A/B test. If you're in consumer, fine. But if you're in an enterprise company, absolutely get the buy-in from the people whose accounts it could affect: sales, customer success, customer support.
So if there is an issue, they're trained, they know what to do about it, and they can roll things back. You definitely wanna hear people's concerns, because people are gonna have a lot of concerns, especially if they've never done it before. And I would come up with protocols for how you would manage different situations.
I would say not to be too dogmatic about the proper way to run an A/B test. Yes, in an ideal world, you want it to run for a certain amount of time because you want statistical significance and you want as much diversity as possible, and you want it to be totally randomized. But you're gonna need a lot more flexibility.
The real world doesn't work that way. And having data that you think is good enough to give you directional feedback, I think that's a win. So I know that a lot of people out there who are very into data and AB testing will be like, no, what are you saying? But it's like you have to crawl before you walk.
You have to get your organization to buy into the benefits. And then I would say, share the results broadly. Once you get the buy-in and you can run some tests, share the learnings with everyone who you think would be interested, and help them get something out of this effort. Because it does require a lot of work from so many people in the organization when you do a program like this, and you want to reward them.
You want them to get something out of it too. And funny enough, there were some areas we felt more comfortable A/B testing and others that we were less excited to do testing on. For those, steer low risk. I would solicit feedback from customer support and customers, like, is there anything you wanna know about?
Like, if we got rid of this, would they care? If we added this, would they notice? And they're like, ooh, we get to play too. I'm like, yeah, gimme some suggestions for A/B tests. And so they're kind of able to feel like they're part of it too. It is a heavy lift. It does create friction in the system, so just make sure everyone has something to gain from it.
Richie: That is really fascinating, the idea of getting support involved, because I think support's one of those teams that just gets forgotten about until there's some sort of disaster. So, I mean, it's kind of sad for support, but yeah, getting them involved makes sense, 'cause they are gonna have to deal with customer requests if something goes wrong.
Like your example of the disastrous test. So yeah, , that's a very cool idea.
Vanessa: Side note: customer support was my secret weapon in being an effective product manager. I would pull in customer support to help. I would manually QA everything before it went out the door, because the buck stopped with me, right? If we did a release and it was buggy, or it was bad, or something was off by a pixel, it just looked sloppy.
That's on me. And so I would manually QA, it didn't matter what time we were done with the build, I manually checked everything. And I'd have customer support come in, if they were interested, to get early access to the build and file bugs on us too. They knew exactly where customers were gonna trip up, because all they do is see customers trip up.
And so they're like, ah, that's gonna give us a lot of customer support tickets, you gotta change it, people don't like it there. That's a right-click? Nope, nobody right-clicks, you can't do that. And they would just see all the areas that were gonna explode tickets in ways that I couldn't predict.
And so having customer support be a part of your development cycle is so valuable, so valuable. And also having them really like you and your product team is also really helpful. So when there is a bunch of bugs, they'll jump in, run upstairs, be like, hey, this is looking like it's gonna be a big problem, you need to prioritize a fix for this ASAP.
And so with customer support teams, the closer you can combine that with product and testing, the more success I think you'll have.
Richie: That seems like a very useful tip. I'm not sure many organizations do that, so I think maybe it should be done more widely. Okay, so how about the data team? Do you have any advice for the data team on what they need to do to run a more effective A/B testing program?
Vanessa: Data teams are super important. They've always been important, but I also feel like people don't understand where they spend the majority of their time. They think that all they do is run reports, or people ask them for the most trivial stuff, and they're like, that is not what we do. And so I think the one piece of advice I'd give to data teams is to continuously educate the organization on what big data investments they're making and what that unlocks for them.
I would try to set up as much self-serve as possible so that the data team can work on more strategic stuff rather than tasks that are not additive to the data pipeline and data strategy. I think some PMs are really well versed in how you can use data and others aren't. And so spend time with the product managers, like, hey, these are the tags we want you to have in your data, or this is how we want you to think about it, because this is, in return, what we'll be able to do for you. You just have to educate consistently and through the lens of, this is what it could do for you, this is what it could do for your team and for your customers if you collaborated with us in this way.
And data teams change, and their ethos and their goals change, like every five years, as there are bigger unlocks or different company priorities. And so I think the education just shouldn't end. I almost feel like there should be a data evangelist within the company, because data teams become so insular, because there's always so much to do.
They don't have time to evangelize. I think it could help them prioritize and help other teams help them as well, if there was just more of that education.
Richie: Brilliant. Yeah, I do like the idea of every company having a data evangelist. My job title is extremely rare. There should be more of us around.
Vanessa: Right.
Richie: Yeah. And certainly I'm right there with you on the idea that self-service analytics is the way forward. We talk about this a lot on the show.
It's one of those dreams that's kind of hard to reach, but I think most companies are getting closer.
Vanessa: I think with AI we can at least get the natural language query closer to a query that a system can understand. And I think that's the holy grail, to be honest. From the beginning, everyone's like, oh, you know, every PM I hired on my team at Playdom would have to learn SQL.
And there was this Stanford class that was online that I'd sign them up for, and everybody had to learn SQL. I was like, that's kind of a high barrier to entry. And then it got a little more natural-language-y, and then there were some suggestions the system could make for you, like auto-generating some reports and some dashboards and insights.
But I think if you could just ask your data a question, just the way you ask GPT a question, then the training becomes less about how to ask and it becomes about what to ask, and that still needs some training. You still need to tell people, you wanna know what your user did yesterday or how that jives with the change that you made three days ago, right? You still have to get people into the mindset of experimentation, so they have to ask the right question, but they don't have to worry about what words they're using.
Richie: So I guess the important skill is understanding what the business problem is you're trying to solve, and then at least getting a bit closer to turning it into a data problem.
So I'd like to talk a bit about your new role, since you've switched from product to being a venture capitalist. And since you're working with early stage companies, I guess there's not that much data there yet on each of the companies. So, can you still do experimentation there?
Vanessa: Well, I think you can do experimentation around user acquisition. You can always do that because you just target a bunch of users and see if they convert. some founders do it, others don't. I don't think it's a leading indicator of whether the company's gonna be successful or not.
It's just some founders just wanna lean on data to get some early answers early on. AB testing is hard when you don't have a lot of users. That is true, but understanding, your data for the users you do have is still helpful. And so even if you have a thousand, 2000, 5,000 beta users, probably not enough to run Navy test, but you can see.
How often they're using the product. You can see how long their sessions are on the product. You can see what clicks they're doing, what tasks they're accomplishing. And you may learn some new things of like, Hey, I really thought they'd come to, I dunno, book a trip. Instead of they're just browsing every morning and doing nothing but browsing.
Like, what does that mean? Like, well, maybe they're generating ideas. Maybe they're building wish, maybe you should do a wishlist builder. Like I, I don't know. But that is not the behavior you expected to see and you're seeing it. So let's run some tests on that. Let's launch some things. Let's see what's happening.
Or let's reach out and do a survey or video chat to some of these users. So I still think that I love working with teams when they have. A thousand, 2000 beta testers, because those are your most enthusiastic users. Those are gonna be your early adopters, your power users, they're gonna be the best cohort that you'll ever have.
But it's also the ones with the value proposition resonated and you could really understand why and see how do I make that more broadly applicable to people that have the problem, but it's not as burning. So that's fun. I still think looking at data is super fun. And it's interesting because VCs.
Don't typically understand product data. They understand revenue data where the money comes in and out of, they get that data backwards and forwards. They like put that into the spreadsheet and they know your CAC is this, your LTV is that your CAC to LTV ratio looks great.
It fits in this perfect little bucket that indicates great companies will invest, but that data shows up like way, that's a lagging indicator. That only happens when you've like met the need and understand the value prop and hit it like the nail on the head pretty perfectly. And so it's been fun to be a product person in venture because I can get conviction on companies based off their user data, their product data, and not their revenue data.
And then I know that it'll come, like if they figured this out, if the user behavior matches the value proposition, the rest will come. So it's been fun. it's a different use of my product skills that I did not expect to be applicable in this type of role.
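For reference, the revenue-side numbers she says VCs know backwards and forwards reduce to a simple ratio. A toy sketch with invented figures:

```python
def ltv_to_cac(monthly_revenue_per_user: float, gross_margin: float,
               lifetime_months: float, cac: float) -> float:
    """LTV:CAC ratio using a simple margin-adjusted lifetime-value estimate."""
    ltv = monthly_revenue_per_user * gross_margin * lifetime_months
    return ltv / cac

# Invented figures: $30/month, 80% gross margin, 24-month lifetime, $180 CAC.
print(f"LTV:CAC = {ltv_to_cac(30.0, 0.80, 24, 180.0):.1f}")
# A common rule of thumb looks for a ratio of roughly 3 or more.
```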
Richie: That is very cool, that even if you do only have a couple thousand users, you can still do some useful tests. You can still get some sort of interesting results. I guess there are fewer people to learn from, but you're gonna be able to learn a lot more about them.
Vanessa: Well, those people will be willing to like hop on a call with you. They'd be willing to get on a Zoom because your first cohort is your most enthusiastic cohort. And if you make them feel VIP founding members, give them some like badge of first ones here, then they feel some ownership in this product being successful.
And they will give you feedback, they will give you data, they will file bugs. And so that first cohort, that's the one to learn as much as you can from, qualitatively and quantitatively.
Richie: I feel like, particularly from a data point of view, you don't necessarily think of, oh, let's have an actual conversation with one of my users. But it works. Chatting with other humans is kind of a brilliant source of data, even if it's a little bit more inefficient to collect compared to clicks. But you know, it's gonna get you a lot.
Vanessa: Sort of, right? Like if you saw some weird behavior, you'd probably come up with like 13 different tests. You're just like, is it this? Is it that? Or you could just call 'em and be like, why'd you do that? And they're like, oh, in the mornings, instead of browsing Instagram, now I browse your app.
It's just super zen, I love the pretty pictures. And you're like, oh. How would you have found that out with data? You would've had to run so many tests. That was just faster. And then you can write a test to validate it. But you just shortcut a bunch of stuff. Sometimes just talking to your users will help you bypass a lot of brain damage.
Richie: I like that. Cool. Okay. So, I'm curious, how do you decide which product metrics to track?
Vanessa: Ah, I love this question. Okay, I know I've said this about a thousand times and it could probably be a drinking game on your podcast, but: understand your value proposition. What is the point? Why do you deserve to exist in the life of your user? What is the thing you do for them? Define that as succinctly as you can: I save them money on travel, I save them time on accounts reconciliation, I give them a higher conversion rate from lead to close, whatever it is, very succinct. And then from there you derive the metrics, right? Then it's like, okay, well then I wanna see leads come in, leads convert, and then it's like, okay, well, what are some leading indicators before that happens? And then what are some indicators before that?
And then you just go earlier and earlier and earlier. And then you have your leading indicators and your lagging indicators, and you track all of those things. But it has to start with the value prop. So many founders just start with the best-practice metrics: DAU over MAU, sessions per user per day.
And then they show me these metrics, and I'm like, but this doesn't map to your value prop, right? If you're looking at DAU over MAU and you're a travel app, people don't book travel every day. That's the wrong metric. So tell me what your point is, what the value is, and then backtrack into the metrics that show that you are actually delivering on that value.
Let's go earlier and earlier and earlier and figure out if we're on track to that.
Richie: Yeah, that's interesting, that the amount of user activity is a very common thing to track, but you don't necessarily always want that, actually. Do you have any examples of what can go wrong if you have misaligned objectives, so what you're tracking is different to what the business goal is?
Vanessa: Look, I think at Box we had an admin console, and my team was the web app, mobile app. So we tracked engagement, and everyone thought, everybody needs to track engagement, we need everybody to engage, 'cause when you engage, you retain, and if you retain, you can upsell.
But with the admin console, the admin chose Box because it's easier to administer, and you set the controls and set it and forget it. No IT admin wants to go into every single SaaS app's portal every single day. They would hate you for that. And so I remember they're like, ah, we can't get our engagement up.
And I'm like, wait, why do you want the engagement up? They're like, well, 'cause then we'll stay top of mind. I'm like, do they want that? And they talked to a few admins, who were like, no, no, we love Box. We set it up and then it just autopilots, it provisions, the security settings carry over, and it follows the org chart and the permissioning.
And it's just so great that I set it up and, unless something needs my attention, I just trust Box is doing what it's doing. And I was like, wait, so you guys should be looking at, like, the less engagement the better, because that means that they love you so much more, 'cause it's working and they trust you.
And everyone's head just exploded. They're like, oh, we want the anti-usage. And so then that team changed its success metric to how much customization they set up upfront: how many of our security features do they enable, and custom setups. And so what we want is, when we launch these different things, we want people to adopt them and set them up in their environment.
So it's actually feature adoption and not engagement.
Richie: That's absolutely wild, that if they'd just kept trying to boost this metric that they thought was a good idea, they'd have been actively making the product worse. So it would be a data-driven disaster.
Vanessa: exactly.
Richie: Okay. Alright. So yeah, definitely some lessons to be learned about, like, think really carefully about like what is the right metric for your product.
Alright. So, just to wrap up what's your final advice for people who wanna get better at running testing programs?
Vanessa: It all starts with the why. I think I've been talking about that a lot in this episode, but why do you want a testing program? What are you looking to get out of it? From there, who do you need the buy-in from? How do you provide value to your customers, to your team members, to the executives?
How will the product, how will the company, get better because of this? And really think through that. So start with, what is the goal of this? What is the crux of it? How would it improve things? And then get buy-in from everyone, and then go deliver on it. It sounds trivial, but it's so complicated in practice.
Richie: Yes, I feel it's easier said than done, but there are some definite steps to do this: get the buy-in first, then do the testing, not the other way around. Cool stuff. Alright, wonderful. Lots of insights there. Thank you so much for your time.
Vanessa: Thanks for having me.