Getting the Data For Your Data-Driven Decisions with Jonathan Bloch & Scott Voigt
Jonathan Bloch is CEO at Exchange Data International (EDI) and a seasoned businessman with 40 years' experience in information provision. He started work in the newsletter industry and ran the US subsidiary of a UK public company before joining its main board as head of its publishing division. He has been a director and/or chair of several companies and is currently a non-executive director of an FCA-registered investment bank. In 1994 he founded Exchange Data International (EDI), a London-based financial data provider. EDI now has over 450 clients across three continents and is based in the UK, USA, India and Morocco, employing 500 people.
Scott Voigt is CEO and co-founder at Fullstory. Scott has enjoyed helping early-stage software businesses grow since the mid 90s, when he helped launch and take public nFront—one of the world's first Internet banking service providers. Prior to co-founding Fullstory, Voigt led marketing at Silverpop before the company was acquired by IBM. Previously, he worked at Noro-Moseley Partners, the Southeast's largest Venture firm, and also served as COO at Innuvo, which was acquired by Google. Scott teamed up with two former Innuvo colleagues, and the group developed the earliest iterations of Fullstory to understand how an existing product was performing. It was quickly apparent that this new platform provided the greatest value—and the rest is history.
Richie helps individuals and organizations get better at using data and AI. He's been a data scientist since before it was called data science, and has written two books and created many DataCamp courses on the subject. He is a host of the DataFramed podcast, and runs DataCamp's webinar program.
Key Quotes
Behavioral data now lets you start to fold in intent in a weird way by programmatically understanding how was someone's mouse moving? Was it moving with intent? When they hit the site, did they know where they were going? Were they frustrated or confused? And believe it or not, those heuristics are in every visit to be able to understand the behavior of the visitor, even without knowing them. Then you can start to cluster those behaviors together to start to either change the experience and give a more personalized experience or remedy some sort of frustration.
There has been a paradigm shift in the way that companies are collecting this behavioral data. It's exciting to see. So up until a few years ago, the paradigm was, if I ran a website, I had to know as a product owner which moments in the customer's journey were important. The problem with that is that it's a very biased set of data, in that people have a hypothesis about what's important. Recently there's been this movement to something called full-capture, where by default you're just collecting all of that data in a privacy-friendly way, and then you're able to get retroactive. I think the result of that is going to be just better digital experiences for most of the users out there because site owners, app owners are going to finally know what was working and what wasn't.
Key Takeaways
Use behavioral data to personalize interactions, identifying key frustration points or indicators to improve the user journey and tailor marketing responses based on user intent.
Adopting a full data capture approach enables comprehensive analytics and retrospective insights, allowing teams to identify trends and behaviors they may not have initially recognized as valuable.
Synthetic data can be used to mimic real user interactions and behaviors without privacy risks, providing a solution for testing and improving customer experiences without compromising sensitive information.
Transcript
Richie Cotton: Hi, Scott and Jonathan, welcome to the show.
Scott Voigt: Thanks for having us, Richie.
Jonathan Bloch: Thanks for having us.
Richie Cotton: Big question to begin with, where do organizations get their data from?
Scott Voigt: Obviously from lots of places, Captain Obvious would say. In my world, I'm CEO of a company called Fullstory, and most of our customers are paying a lot of attention to where they're getting their first-party digital data. In our case, that really comes from the visitors to their websites or their web apps or even the mobile device.
And so they're just thinking about every specific action that a visitor is taking, and making sure that they're paying attention to the good ones and ignoring the bad ones or the private ones.
Richie Cotton: Okay, that seems fairly straightforward. You get some data directly when people come to your website, and then there's going to be data from elsewhere. Jonathan, do you want to talk a bit more about where people get their data from in your world?
Jonathan Bloch: Sure. So I'm CEO of a company called Exchange Data International, and we aggregate data from all exchanges around the world. So we get data from exchanges, from regulators, from central banks, from ministries of finance. All in all, we have probably 700 sources coming in daily, and then we have to translate them into English.
So we translate 39 languages a day because the world do...
Richie Cotton: Okay, that's a very good point: there's data all over the world, and you want it in a format that people in your organization can understand. Scott, you mentioned the difference between first-party data and third-party data. The third-party data, coming from elsewhere, is maybe a little murkier in terms of what it involves.
So, Jonathan, you mentioned some examples of just banks and exchanges that provide data. Where else might you get your data from? What else constitutes third-party data?
Scott Voigt: Well, in the digital realm there are all sorts of third-party sources where marketers can go to sort of buy profiles, if you will. If you've ever accepted some weird terms of service on different sites out there, they're collecting data on you, Facebook et al., and those are aggregated in different ways, sometimes anonymous, sometimes mapping back to you or your phone.
Those sorts of things are third-party data, where you're collecting data from around the web. First party really is, if you want to think about it in the real world, if someone were to walk into your store. They're in your store, your relationship is with that customer, you know them, you've created that relationship. That's first-party data.
And so the same would be true if somebody went to your website or your web app. You have that relationship with them, and they've usually accepted a terms of service on the visit; certainly in the EU that would be true. And that's first-party data.
Richie Cotton: All right. So if it's coming through a separate organization, that's going to be third-party data, where you've not got that direct relationship with the customer. And so, with this third-party data, it seems like there's going to be some sort of licensing involved. Jonathan, these sorts of regulations around data are your area of expertise. What sort of data licenses are there for getting hold of this third-party data?
Jonathan Bloch: In terms of data and licensing, it's a very fraught area, because there's a whole issue of copyright, there's a whole issue of trademarks, and increasingly, you find lawsuits about this. So for example, the New York Times and various other publications are suing the various AI operators saying, this is our data, how can you use it in your AI machines?
And those cases are still to be decided. And I spend a lot of my time dealing with this issue because the exchanges often say, we own the copyright to this data, but you can't own copyright in figures and numbers. So what they try and do is they try to circumvent the copyright in a very underhand way.
When you sign a contract with these exchanges, they say, You recognize that they have copyright, so they then bind you contractually. And I think you're going to see that in other areas start happening now, because as data becomes more valuable, particularly for AI, people are going to assert their rights over things they do not own, but then they'll bind you contractually.
So I suspect terms and conditions on websites and various other places will change significantly because of that.
Richie Cotton: That's fascinating. I didn't realize you couldn't copyright numbers, but it sounds like there's going to be some extra legal language on top of that that says, okay, this is my data, you have to pay to use it. And I guess just to follow up on that, if you are trying to monetize or sell your own corporate data, what do you have to do to protect it?
Jonathan Bloch: The only way you can protect it is by building moats around it: making it so unique, or making the process of obtaining it so difficult, that you become a unique provider. So we specialize in corporate actions data and we really only have four competitors worldwide, because it's so expensive to accumulate, and then you've got all the translation costs.
So you need to somehow make your data unique and find a niche where it's very difficult for others to enter.
Scott Voigt: It seems like we're entering a bit of the Wild West right now, Jonathan, if I'm reading you right. We were at headquarters, and we brought some of our best engineers and our best product managers on site this week to really just talk about AI: what we're doing with it, how we should think about it, and how we should think about it on our customers' behalf.
And again, coming back to the idea of behavioral data: when we as humans go to a site and read, we're having that first-party relationship. If a bot goes to that same website and reads and navigates around, well, in the past I don't think that was a big copyright violation if the Google bot hit your website. But if all of a sudden the OpenAI bot is hitting your website, or the Anthropic bot is doing something, it gets a little bit murkier there.
And so we were trying to figure out, well, cool. Like you said, Jonathan, you can create a moat around your content. But I do wonder if there aren't going to be some interesting moments where sites are trying to understand when an AI bot is hitting the site, and then in real time watch that AI as it's moving through the site and try to understand:
Okay, do we want to embrace this one and reject this one? Do we want to create a different experience for the bot, with different prices if you're an e-commerce customer, or different watermarked data for a publisher that maybe has something out there? And again, you might be able to put up some terms and conditions.
Good guys will honor them. Bad actors won't honor them. So are we going to see this moment where websites are putting up chaff to kind of confuse people when they come across? I don't know.
Jonathan Bloch: I think you're absolutely right. It's going to be the Wild West. Because it's a bit like if you look at the invention of drones. So now you've got anti drones to shoot down the drones. Right? And the issue with AI is AI is going to be ubiquitous. Whether Google will exist in 10, 15 years time is a real open question because your search engines are probably going to be AI powered. So,
Scott Voigt: They are for me already. As a human, I feel bad for Google because I just use OpenAI to say, "Hey, what time is the football game on?" And it just gives you the answer. And if you go to Google now and ask that same question, it doesn't give you the answer unless they're using Gemini, at which point they're spending money to give me the answer at the expense of not letting me click through to the sites where they would get paid.
And that's tough.
Jonathan Bloch: Well, I mean, if you look at the history of technology, who remembers Netscape, AltaVista, Wang computers? It's the way of all flesh.
Scott Voigt: To create, you must destroy.
Jonathan Bloch: Exactly. Absolutely.
Richie Cotton: Yes. A dangerous game, being in tech. Okay. So a lot of great points there on what bots should be allowed to do. And I think, yeah, historically most websites were happy for Google to scrape them because they benefited from being in search results. But now, with AI, they're not necessarily getting things back.
Now, Scott, you mentioned the idea of behavioral data before. Can you just explain what that is? Is it the same thing as customer data?
Scott Voigt: Yeah, I would say it's probably a click deeper than just data. I mean, to us customer data is who you are and how much you spent. Behavioral data, at least in the digital realm, is everything you did on your journey to buy that thing or accomplish that goal. And in the past, that was sort of a series of discrete steps.
I went to a page, I went to another page, I completed a transaction. Behavioral data now lets you start to fold in intent in a weird way by programmatically understanding how was someone's mouse moving. Was it moving with intent? When they hit the site, did they know where they were going? Were they frustrated or confused?
And believe it or not, those heuristics are in every visit to be able to understand the behavior of the visitor, even without knowing them. And then you can start to cluster those behaviors together to start to either change the experience and give a more personalized experience, or remedy some sort of frustration.
Say all of a sudden there's a spike. We created this concept we call a rage click, where people just hammer on the mouse in frustration. We've all done that. That's a behavioral signal that people are now paying attention to, so they can say there's been a spike in rage clicks.
And what do we need to do to remedy that so our customers can have a better experience?
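To make the rage-click signal concrete, here is a minimal Python sketch of how such a heuristic might be computed from a list of clicks. It is an illustration only, not Fullstory's implementation: the `Click` type, the `count_rage_clicks` function, and the thresholds (three clicks within one second, inside a 30-pixel box) are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Click:
    timestamp: float  # seconds since the start of the visit
    x: int            # pixel coordinates of the click
    y: int


def count_rage_clicks(clicks, min_clicks=3, window_s=1.0, box_px=30):
    """Count bursts of at least `min_clicks` clicks that happen within
    `window_s` seconds of the first click in the burst and stay within
    `box_px` pixels of it, horizontally and vertically.
    All thresholds are illustrative assumptions."""
    clicks = sorted(clicks, key=lambda c: c.timestamp)
    bursts = 0
    i = 0
    while i < len(clicks):
        first = clicks[i]
        j = i + 1
        # Extend the burst while clicks stay close in time and space.
        while (
            j < len(clicks)
            and clicks[j].timestamp - first.timestamp <= window_s
            and abs(clicks[j].x - first.x) <= box_px
            and abs(clicks[j].y - first.y) <= box_px
        ):
            j += 1
        if j - i >= min_clicks:
            bursts += 1
            i = j  # skip past the whole burst
        else:
            i += 1
    return bursts


visit = [
    Click(0.0, 100, 100),
    Click(0.2, 102, 101),
    Click(0.4, 101, 99),
    Click(5.0, 400, 400),
]
print(count_rage_clicks(visit))
```

Counting bursts per visit and then tracking the count over time is one plausible way to spot the "spike in rage clicks" described above.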
Richie Cotton: You mentioned personalization. That's a very hot topic, but what sort of personalization might you want to provide for different customers?
Scott Voigt: I'm going to struggle with a very simple example, and then you can start to extrapolate it. It's Friday night, you want to go buy a pizza, so you go to your favorite online pizza place, and perhaps I'm in the mood for a deal. And so I'm slowly scrolling through the page, I'm kind of hovering with behavioral intent over the coupons that are available there, but I'm not really doing something, and my tab goes dark all of a sudden, because maybe I've gone to my second favorite pizza place.
That's a behavior you can start to pay attention to during that experience. And perhaps, if I light that tab up, given that I'm on the fence as to whether I buy or not, you can give me a really good discount and save that play. Now, you don't give that same really good discount to everyone, because on a different night I might go there, go straight to my previous orders, and click on it, and there is high intent, a high behavioral signal that I am going to convert come hell or high water.
What do you do there? Well, you offer me the great cross-sell on those chicken wings that, you know, if I eat, I'm going to be hooked on them and I'm going to get them every time going forward. And it's really starting to personalize that experience around the intent and behavior of what those visitors are doing.
And you can imagine really starting to evolve that concept in, in many ways.
Richie Cotton: Okay. I like the idea that just having a last-minute save for the undecided customer is going to bring you more business. And Jonathan, is there an equivalent personalization in the financial data that you deal with? Does everyone just get standard-issue stuff, or do you customize for different groups?
Jonathan Bloch: I mean, we do. In terms of data, one of our major selling points is the customization of data, because our big competitors (we have only four competitors, which are three stock exchanges and Bloomberg) hate doing customization. And the reason they hate it is, first, it's expensive to set up.
Secondly, it's even more expensive to maintain, because you have to write a special script. Sometimes they don't work, sometimes things go wrong, so you've got to have a lot of engineers to deal with that. So customization becomes very important. So for example, somebody trades only a certain sector on the stock market.
Say they trade motor vehicle manufacturers. They don't want a stock market feed of all the stocks on that exchange. They just want the motor car manufacturers. And so we do that type of customization. And I think there's going to be a growing market for that.
So on the one hand, you have the people providing what I call commodity information. They give you everything at a low price. There'll be an increasing role for the boutique, which customizes to people's preferences and interests.
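As a rough illustration of the sector-level customization Jonathan describes, the following Python sketch filters a full exchange feed down to a single sector. The record layout, the `sector` field, and the placeholder tickers are assumptions made for the example; they do not describe EDI's actual feed format.

```python
def filter_feed_by_sector(records, sectors):
    """Keep only the records whose `sector` field matches one of the
    requested sectors (case-insensitive). Records without a sector
    are dropped."""
    wanted = {s.lower() for s in sectors}
    return [r for r in records if r.get("sector", "").lower() in wanted]


# Placeholder records; tickers and prices are made up for illustration.
full_feed = [
    {"ticker": "AAA", "sector": "Motor Vehicles", "close": 10.0},
    {"ticker": "BBB", "sector": "Banking", "close": 20.0},
    {"ticker": "CCC", "sector": "motor vehicles", "close": 30.0},
    {"ticker": "DDD", "close": 40.0},
]

custom_feed = filter_feed_by_sector(full_feed, ["Motor Vehicles"])
print([r["ticker"] for r in custom_feed])
```

The filter itself is trivial; as noted above, the real cost of customization lies in maintaining many such scripts for many clients over time.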
Richie Cotton: So the value of the data, you can get basically higher margins if you are giving customers exactly what they want. So the personalizations are going to basically bring you more money that way. Since you sort of mentioned that personalization is expensive. Scott, do you have a feeling or like what's a good way to get started with personalization of experiences for customers?
Scott Voigt: My instinct, and what we see with our customers, is that you start with the biggest points of pain. It turns out that if you haven't been paying attention to behavioral data and you start to pay attention to behavioral data, you'll quickly see that there are areas where you're just frustrating your customers.
You haven't been paying attention to a particular flow or journey. There are errors and lots of low-hanging fruit. Get those out of the way, and remove the roadblocks for customers so they can be successful in whatever they're trying to accomplish.
On the flip side of that, I think what you can start to do is understand intent earlier in a journey and start to pull forward the things that meet what it is that that customer is aiming for. And we see lots of customers that are just taking the baby steps towards that through understanding and then design.
Richie Cotton: I like that. Just start with don't annoy your customers. Stop being
Scott Voigt: Doesn't it feel like there are too many places out there that still just annoy their customers?
Richie Cotton: Absolutely. Many terrible websites, many terrible store experiences as well. Yeah, shopping can be hard. In fact, visiting most websites can be hard. All right, so I'd like to talk about data sourcing. Suppose your chief data officer says, "Okay, we need to be more mindful about where we get data from. We need a new strategy."
How do you form a data sourcing strategy?
Jonathan Bloch: Okay, well, the important thing to do is firstly to define exactly what data you want, the scope of it. Secondly, once you define the scope, you should go out and search for all the options that are available. And then you need to start looking at things like quality, delivery mechanism, price, and frequency of delivery, to take all the factors into account.
Often what people do is say, "I just want this data set," and they don't do exact definitions, etc. And that costs them a lot of money in the end. You've also got to define the purpose for which you want the data, so you don't land up with extraneous data.
So clear definition is the key for data sourcing.
Richie Cotton: Okay. So a lot of planning up front. Think about what you actually want in order to solve your goal rather than just going and buying things. All right. So once you've done your planning, what's the sort of next step to implementation?
Jonathan Bloch: Well, the next step, usually, is that people go on to Google and search for people providing the data. Often what they will do is issue what they call an RFP, a request for proposal, and they will set out exactly what data they're requiring, how it should be delivered, when it should be delivered, and for what purpose.
And then they start to get the proposals in, and then they start negotiating. They should then ask for sample data, or even better, a trial of the data. Because what people say the data is, and what it actually is, there can be a huge gap between the two.
Richie Cotton: Yeah, I can certainly see how you want to sample and just make sure that you are getting what you asked for. You'd think there'd be some sort of free sampling process, like you get when you go to the stores and they give you a little taster of what you're going to get. Okay, so who needs to be involved in this?
Like who is typically responsible for sourcing data?
Jonathan Bloch: Well, I just want to say, the reason why you've got to be so particular is that most data is supplied on subscription. So you're not just buying a one-off, you're committing for usually a one-year period, and it might be a bit longer. So you want to make absolutely sure that you're getting what you expect, because it's not a one-off payment.
And the people involved are quite a few. The user, firstly. Second of all, your compliance officer or your legal officer, to do the contracts. Thirdly, and most importantly, your IT people, because usually the data has to be integrated into your systems. And that is often easier said than done.
Richie Cotton: Okay, yeah, I can certainly see how integration is going to be tricky. Are there any tools or technologies or platforms that make sourcing data easier?
Jonathan Bloch: Well, there are several. I mean, firstly, you've got the whole concept of APIs now, which is really format agnostic, and you're just taking the data you require. Second of all, you've seen the growth of these various storage companies which add a layer on top of AWS or Azure, people like Snowflake, people like Databricks, and there are several of those.
And we come across a situation where a lot of financial institutions will now only take data which is stored in one of those storage facilities, because it makes it so much easier for them.
Richie Cotton: Okay, so use one of these sort of big cloud data warehousing platforms, and that's going to make your life easier. And is the same true when you're looking at this first party behavioral data?
Scott Voigt: I actually think it's a little bit different, and I think the reason is just the underlying cost. For Jonathan, I don't know your business very well, but, you know, rich financial information that has been cultivated and translated sounds like a heavy process. If you think about the amount of data in a visit, and the storage cost to collect that data, it really is just not that expensive.
And so, as I tell our team, we believe our customers should collect every valuable visit. And by and large, that's every visit. Because if they hit your site, you want to know, even if it was a bot that bounced, Who was it? Why was it in the cost?
So our job is to collect every visit and then extract the value from each one of those visits. The problem with that is, there is so much that can happen within every one of those visits. You might as well collect it all. Use Google as sort of a parallel. They wouldn't have been successful if they said, let's only spider the biggest websites out there.
Or let's only spider the first page of every website. Or let's only spider the top of each website. They went out and collected the entire internet. All of it, every word on every page sits in an index, and they were able to do that cost-effectively, because they didn't know who might want to search on what, or how that would fill into the algorithm.
And so I think if you're a company that cares remotely about digital experience, then you want to collect every bit of behavioral data that you can, even if you don't know exactly how you'll use it. Because the cost to process, the cost to store, the cost to infer, they're relatively reasonable at this point.
And, waves hands, says AI, we're getting into this moment where perhaps, over the past few years, you had to have a human that had a hypothesis about the data to go query it to get an answer or a result. With AI, you're now going to have machines that are constantly looking: who was happy, who was sad, who was confused?
How do we make this better? And you're just going to need every bit of data you can to make that good and true.
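The value of collecting everything up front is that a question nobody thought of at capture time can still be answered later. The Python sketch below illustrates that idea under simple assumptions: events are stored as dictionaries with hypothetical `visit_id`, `ts`, and `name` fields, and `visits_matching` is an illustrative helper, not part of any real product. It finds visits in which a sequence of steps occurred in order.

```python
from collections import defaultdict


def visits_matching(events, steps):
    """Return the ids of visits whose event stream contains `steps`
    in order (not necessarily adjacent)."""
    by_visit = defaultdict(list)
    for e in sorted(events, key=lambda e: e["ts"]):
        by_visit[e["visit_id"]].append(e["name"])

    matched = []
    for visit_id, names in by_visit.items():
        remaining = iter(names)
        # `step in remaining` consumes the iterator up to the match,
        # so later steps must appear after earlier ones.
        if all(step in remaining for step in steps):
            matched.append(visit_id)
    return sorted(matched)


# A captured log; the event names are invented for this example.
log = [
    {"visit_id": "v1", "ts": 1, "name": "view_menu"},
    {"visit_id": "v1", "ts": 2, "name": "hover_coupon"},
    {"visit_id": "v1", "ts": 3, "name": "tab_hidden"},
    {"visit_id": "v2", "ts": 1, "name": "view_menu"},
    {"visit_id": "v2", "ts": 2, "name": "checkout"},
    {"visit_id": "v3", "ts": 1, "name": "tab_hidden"},
    {"visit_id": "v3", "ts": 2, "name": "hover_coupon"},
]

# A question asked after the fact: who hovered over a coupon and then left the tab?
print(visits_matching(log, ["hover_coupon", "tab_hidden"]))
```

Had only pre-selected events been recorded, a new question like this one could not be asked of past visits.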
Richie Cotton: That's really interesting, the idea of just being able to ask more questions than you have data scientists for, which is because you can use AI to automate a lot of the data analysis.
Scott Voigt: Yeah, that's coming. It might be here at this point. Data scientists are expensive, and I don't know how much you guys have played with Claude and ChatGPT, but they're very, very good at just kind of plowing insights out of things.
Richie Cotton: Absolutely. And so, since both of you mentioned the cost of these things, how can you assess the relative value of data versus the cost of storing and processing it and getting insights from it?
Scott Voigt: The bigger the company, the more the data; the more the data, the more the cost and the more complex the data. That's one of the reasons, Jonathan, you mentioned Snowflake and Databricks. I mean, people like to take all of their data and then push it together.
And then querying all of that data together is still expensive, that querying process. In limited scope, if you only want to query, say, the financial data and nothing else around it, it's probably relatively cost-effective. If you only want to query the behavioral data, it's relatively cost-effective.
But, the more complex, the more expensive. And so, therein lies the rub.
Jonathan Bloch: Well, I think what you've seen already is the three major cloud providers, Azure, AWS and Google, being disrupted by newcomers. So you've got people like Akamai, where the costs are one twentieth of AWS, because I costed something out. So just as we said before about the potential demise of Google in the next 10, 15 years, you might also see some disruption in the cloud storage business.
And that's also one of the
Scott Voigt: To be clear, for the record, I'm long on Google. I think they're
Jonathan Bloch: why,
Scott Voigt: They also happen to be one of my investors. So I'm, I'm big on Google. They're going to be great.
Richie Cotton: Hopefully around long enough to keep funding you. Yeah.
Jonathan Bloch: But I mean, I'm not suggesting that they will totally disappear. What I'm suggesting is that AI will become much more ubiquitous. And in fact, that's bad news for all those companies with AI in their name, the 17,000 companies who raised a lot of money from venture capital. I don't see much future for them, because what will happen is the big guys, because of the cost of computing power for AI, will dominate.
So it'll be yet another area of technology where big guys dominate, be it, desktop software with Microsoft, Google with search engines, AI will have big players.
Richie Cotton: Certainly, yeah, at the foundation level where it just costs tens, hundreds of millions of dollars to train the thing, then yeah, there's only going to be space for a limited number of players.
Jonathan Bloch: But one of the reasons why we see the growth of APIs is because people do not want to store superfluous data. Because of the cost, they're very, how can I say, skillful in just extracting the data they require, and only storing that.
Richie Cotton: I've had quite a few guests on the show talking about what your data storage strategy should be. And the opinions seem to vary wildly, from "you should store absolutely everything, no matter how insignificant it seems now, because it might come in handy 10 years down the line" to "well, you know, there's no point in storing data that you don't know you're going to get value from."
Just store what you want. It's going to save you cost, simplify your life. Do either of you have an opinion on where you land with this?
Scott Voigt: I'll take a first stab. On a scale of zero to 10, if zero is only collect what you absolutely know you need and nothing else, and 10 is any bit of data that you can get, you should grab it no matter what and stick it in a warehouse, I probably am somewhere in the 7 realm. I tend to believe that you don't know what's going to be important, and that the costs will continue to move in the right direction.
And the ability to mine data, to extract value from data, will continue to increase. But if you're sure that there's never going to be value in it, then don't collect it. Like, be pragmatic.
Jonathan Bloch: I agree with Scott, except that you can never recreate history, and we have done various exercises with people who wanted to recreate history. And it's so expensive, it's almost unobtainable.
Scott Voigt: What do you mean by recreate history just so I'm on the same page?
Jonathan Bloch: Okay. So for example, somebody says, give me 20 years of stock market prices, and you've got 10 years. You now have to go back and get another 10 years. And that is almost impossible.
Scott Voigt: My version of that, by the way, is that there are customers out there that will look at costs and they might say, "Well, you know, we're pretty sure that those visits have no value at all. And so we're going to make sure that we don't ever collect the bot visits."
We've been talking about bot visits, right? "We're not going to collect them." It's not even that it's expensive to reconstruct history. Once you don't collect certain things, you can never collect those sorts of things. And so there's no Wayback Machine if you didn't collect a certain bit of data. So I agree with Jonathan there.
Yeah.
Richie Cotton: Okay, so better to collect stuff just in case. And I mean, I guess you can always delete stuff later on. Yeah.
Scott Voigt: That's my cheat code, that makes me feel okay as long as I say it within reason.
Richie Cotton: Actually, this reminds me, there's a sort of famous example at the BBC, the British Broadcasting Corporation. So, Doctor Who, the early episodes, they recorded in the 60s. And then at some point in the 1970s, someone was like, "You know what? Tape is expensive," and they recorded over the Doctor Who episodes. It was just saving a few British pounds, and those things would be worth a fortune now, but they just don't exist because someone recorded over them.
So yeah, best not to save a small amount of money at the expense of something that could be valuable in the future. All right. So, Jonathan, you mentioned that you need to translate a lot of the data you receive in order to increase the quality and the value. What sort of quality metrics do you need to look for, particularly when you're getting external data?
Jonathan Bloch: So, I mean, the problem you have is no data provider can be 100 percent accurate. Anybody who says that they're 100 percent accurate is basically bullshitting, to put it bluntly. And there's reasons why you cannot be 100 percent accurate. Firstly, somebody can mistype something, and it's only picked up a lot later, because it doesn't correlate with some other sources of data.
Second of all, translation is not a science, it's also an art. And words can mean different things in different contexts. And that's a big problem with AI, for example, that it often doesn't understand context. And I can give you a very real example which I've dealt with. So we pay a third-party company to use AI to scour the web for companies who are rumored to be going to IPO. So they've not yet registered to IPO. So a very big client comes back to me and says, we've received details of an IPO, but it comes out of a television script. Because the AI picked up some television script, which said, you know, such and such a company is going to IPO in five years' time. So it misses context.
Richie Cotton: Yeah, I can certainly see there's probably room for disaster with episodes of Succession or something or Silicon
Jonathan Bloch: The way we measure quality of our output is by looking at the support tickets raised by our clients to see whether there are any systemic errors appearing. And that's the only way that we've discovered you can really keep tabs on quality without having two or three sources for every piece of data, which often is not possible.
Richie Cotton: That's interesting though, the idea of two or three sources. So I guess if different sources match and they're hopefully independent, then you've got some guarantee that the number is going to be right. And Scott, is the same true for first party data? Like how do you assess the quality of it?
Scott Voigt: Yeah, I think it's a little bit different. So with first party data, you can just think of it as a machine keeping a record, and it is just a straight record of what that visitor did. And machines are just very good at keeping clean records. You visited a site, you went here, you went here, you went here. It's a very elegant log, if you want to think about it that way. And so, quality doesn't really play into the behavioral data itself. Where it does start to creep in is the interpretation of the behavioral data. So, it might not be intuitive if you run some sort of report about a journey where you thought something was going to be true.
And it wasn't true. And so then the human has to sort of check their intuitions, and you can spend a lot of quality cycles just validating. Yes, it recorded everything. Yes, it recorded everything correctly. It just didn't happen to line up with what you thought was gonna be true.
Richie Cotton: Okay, so, you got some sort of machine check to just say, is this the same as what I thought it was going to be? If it's different, then that's going to give you some impetus to go and look at it in,
Scott Voigt: In essence, it might be. I mean, the situation would be, like, did you miss something along the way? And there's probably a tier two, tier three check. Oversimplifying, was the machine running, did it see this? Was there a secondary source that also validated that, yes, it saw this?
That's pretty rare in that world.
Richie Cotton: I guess the related thing is, so suppose you've got high quality data, what do you need to worry about with regard to data privacy? Especially maybe the customer data, that's going to be a big thing, because it's got some personal information in there.
Scott Voigt: Yeah, you hit it right on the head. The way that behavioral data is collected is by, harking back to our earlier discussion, basically capturing everything, where everything is everything except very private information. And you just need to be very cautious about it. So we've had to develop lots of tools and techniques and processes to make sure that works when you get to a page and you're a company that does not want to understand anything personally identifiable.
Now, it's first party, so you know your customers, and often it's quite all right to know that Richie visited the website, and he typed Richie in the first name field. But if you're a company that does not want to know that information, then you have to have tools that say, for this block, don't ever collect that data, ever.
And that's one of those things that you constantly have to pay attention to. And fortunately, building those tools, learning those patterns, understanding those, have given us a bunch of different approaches. For example, there's one we would call private by default. You can still capture all of the behavioral data when a visitor hits a site, but you wouldn't collect any text for what the visitor saw.
Anything that they typed in, it's just no text, all behavior. And then you can start to say, well, we know those are some safe areas, so we'll allow that text to come through or that image to come through. If that helps paint a picture.
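The "private by default" idea Scott describes can be pictured with a small Python sketch. This is an illustration only, not Fullstory's implementation: the `Event` record, the region labels, and the `SAFE_REGIONS` allowlist are all hypothetical names invented for the example. Every event keeps its behavioral part (what kind of action, and where), while text is dropped unless the region has been explicitly marked safe.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical allowlist: page regions whose text is considered safe to record.
SAFE_REGIONS = frozenset({"nav", "product-title"})


@dataclass
class Event:
    kind: str            # e.g. "view", "input", "click"
    region: str          # hypothetical label for the part of the page
    text: Optional[str]  # text the visitor saw or typed, if any


def capture(event: Event, safe_regions=SAFE_REGIONS) -> Event:
    """Keep the behavior; drop the text unless the region is explicitly allowed."""
    if event.region in safe_regions:
        return event
    return Event(kind=event.kind, region=event.region, text=None)


raw = [
    Event("view", "product-title", "Blue running shoes"),
    Event("input", "first-name", "Richie"),
    Event("click", "checkout-button", "Pay now"),
]

recorded = [capture(e) for e in raw]
for e in recorded:
    print(e.kind, e.region, e.text)
```

The design choice is that the default branch is the masking one: a new page region that nobody has reviewed records behavior only, and text comes through only after someone adds that region to the allowlist.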
Richie Cotton: Okay. Yeah. So I guess for most websites, it's like, I'm happy if people know my name and that I visited. If it was, I don't know, trying to think of an example, a therapy website or something like that, where it's like, okay, I'm looking for mental health advice or something.
You maybe want that to be more anonymous. I'm sure there's plenty of
Scott Voigt: No, I, health information, financial information. Like, we have a number of bank customers. And that bank already has that data. They understand your account balance. But you just need to be so thoughtful about the propagation of financial data, where else it might show up, and what their systems are, and who has access to it. It's often best just left alone.
Let's not bring that data across ever.
Richie Cotton: Okay. That makes sense. Just be really mindful about who gets to see what. And Jonathan, since you're dealing with financial data, do you also see data privacy issues?
Jonathan Bloch: Well, we try not to gather any information relating to customers' holdings which identifies them as the holder. So when we get data from banks, we ask them to delete any customer information and just give us the securities for which they want information. It's better not to own that data, because in case you get a data breach, the repercussions could be enormous.
Richie Cotton: Yeah, certainly I think most people wouldn't want their financial holdings, their portfolio being made publicly available. So that seems pretty important. So one of the big hot topics at the moment is synthetic data. So data generated either programmatically or with AI. Have either of you seen any examples of its use?
Because I think this is one of the things that can help with data privacy.
Scott Voigt: I mentioned the AI summit that we did in the office here, and we were actually talking a fair bit about it. Is there a way to take the abstraction of a visitor and create sort of a synthetic visitor, if you will, that mirrors mouse movements and patterns and all of those things, such that you can step around those privacy concerns? You know, you just don't have to worry about privacy, because you can understand interaction by using an artificial visitor that really mimics human interaction, as opposed to just synthetic button clicking.
We haven't seen much of it yet. But, you know, as of the recording of this, yesterday Anthropic launched sort of a computer interaction layer where you can download Claude on your machine and tell it that you want it to fill out your expense report, and it can just open up spreadsheets, open up pictures, do what it needs to, and fill out an expense report.
And so this new frontier of having, is it synthetic? You know, having a machine be you and do things for you. It's going to be pretty interesting.
Richie Cotton: Okay. So, machines pretending to use websites is fascinating.
Scott Voigt: Oh, it's coming. Yes.
Richie Cotton: And Jonathan does synthetic data exist within the world of finance?
Jonathan Bloch: Well, I mean, I think people are beginning to use AI to write reports. And you've already seen a couple of lawsuits where the lawyers have been reprimanded because the AI produced some precedents which didn't exist.
Richie Cotton: Yeah, when you're making use of AI to generate things, then you absolutely want to make sure that some humans checked it to make sure that it's, it's manufactured
Jonathan Bloch: I mean, I think anybody using AI now has got to put a human layer over it because it's not quite there yet. And I think, you know, it's great if you want to do a wedding speech or something like that. But if you wanted to do institutional or business information, you need a human overlay.
Richie Cotton: Absolutely. I mean, certainly for writing questions for the podcast, I will use AI to research companies and come up with some questions and then I mostly ignore them and just make up my
Scott Voigt: Richie, we just assumed you were an AI this whole time. You're not? You're real?
Richie Cotton: Uh, yeah, I'm going to clone myself shortly and then I can record more episodes. All right. So, I'd also like to talk about regulations, because it seems like there are so many regulations around data, and around AI as well. So are there any regulations you need to be aware of when you are trying to find data sources?
Scott Voigt: Well, on the behavioral data front, certainly. I mean, there are lots of regulations. In Europe, GDPR has been a really good standard. In the States, we're seeing each state have its own privacy regulations, CCPA, for example, covers California, about what data can be collected and how. Most of those regulations are aimed back at third party data, the collection of third party data, and buying and selling data.
But there are things that you certainly have to pay attention to on a first party basis. You don't want to collect health information, you don't want to collect financial information, those sorts of things. And you want to make sure that the visitor is aware of what data is being traded. I mean, the emergence of every accept-cookies banner in the world is, I think, in large part because of those regulations. Jonathan probably knows more about this than I do, but it does seem that as it pertains to AI, there's a lot of talk about regulation, but there isn't anything that's very solid yet.
So everything's just kind of being decided in the courts, and that's going to take forever to shake out so that we know what it really looks like.
Richie Cotton: Yeah, certainly. I mean, you've got the EU AI Act, which is sort of coming into force gradually, but the rest of the world is still playing catch up, I think. And Jonathan, are there any regulations for finance that people need to be aware of?
Jonathan Bloch: Well, I mean, there's a huge amount of regulation which covers financial data, for example, insider trading. So the use of data to give somebody an advantage because they're trading on non publicly available data. And how you define non publicly available data becomes a very big issue.
Richie Cotton: Yeah, I'm curious about that because I always thought insider trading was very much about, well, overheard someone having a conversation about an IPO, which probably wasn't actually a TV show, and then I made a trade on this. So, yeah, if you're just using, data sources, tell me what can go wrong here.
Jonathan Bloch: Well, for example, what people have done is they've supplemented the data sources with the hire of consultants who have insider knowledge, and that's how data has historically been conveyed, and there have been several court cases as a result of that. So data can be communicated both verbally, by machine, in print, etc.
So that's one issue. The second issue is there's a lot of data which is mandated by regulation. People can manipulate that data. And we've seen court cases about that.
Richie Cotton: Okay. All right. So, basically be careful of what your consultants tell you and just make sure you're not breaking any laws if you act on that. But yeah that's pretty interesting, but you just need to be a little bit careful about making sure that you've not got data that's not publicly available when you're making your investments.
Jonathan Bloch: In fact, these days, when you supply data to hedge funds, they ask you to complete long due diligence questionnaires to make certain that they are not in receipt of any data they should not be receiving.
Richie Cotton: Okay. That sounds very sensible. I know hedge funds have good legal departments in general. So, yeah. Best pay attention to those people. All right. Super. So, just before we wrap up, what are you most excited about in the world of data at the moment? And you're not allowed to just say generative AI.
Jonathan Bloch: I think more and more data is becoming available. I think more and more things, you know, are being measured. More and more companies are realizing that the data they gather in the normal course of business can be made commercially available because it has other uses. So the general availability of data is increasing, and we are definitely not short of projects for bringing more data to market.
Richie Cotton: I do like that. Yeah. There's more data, more uses of data, and it's no longer big data. It's just data now, and you know, you do whatever you want with it. All right, super. Scott?
Scott Voigt: Jonathan, thank you for giving me time to think, because my answer wasn't going to be generative AI. Look, there has been a paradigm shift in the way that companies are collecting this behavioral data, and it's exciting to see. So, up until a few years ago, the paradigm was, if I ran a website, I had to know as a product owner which moments in the customer's journey were important, right? Add to cart, change the size, hit checkout, complete.
So I had to know that, and then I had to go convince an engineer to write code around each one of those moments, wait for that data to populate, and then get sort of a lackluster report telling me who did what. The problem with that is that it's a very biased set of data, in that people have a hypothesis about what's important, and that may not be what's important.
It's expensive, and it's incomplete. There's been this movement to something called full capture, where by default you're just collecting all of that data, in a privacy friendly way, and then you're able to analyze it retroactively. And so I think the result of that is going to be just better digital experiences for most of the users out there, because site owners and app owners are going to finally know what was working and what wasn't.
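To make the contrast with the old "instrument each moment in advance" approach concrete, here is a minimal Python sketch of a retroactive question asked against a full-capture log. The log format, the event names, and the `funnel_counts` helper are assumptions invented for illustration, not any vendor's API. The point is that the funnel steps are chosen after the data has already been collected.

```python
from collections import defaultdict

# Hypothetical full-capture log: (visitor_id, event_name), in time order.
EVENT_LOG = [
    ("v1", "view_product"), ("v1", "change_size"),
    ("v1", "add_to_cart"), ("v1", "checkout"),
    ("v2", "view_product"), ("v2", "add_to_cart"),
    ("v3", "view_product"), ("v3", "change_size"),
]


def funnel_counts(log, steps):
    """Count visitors who reached each step, in order, from already-stored events."""
    by_visitor = defaultdict(list)
    for visitor, name in log:
        by_visitor[visitor].append(name)

    counts = [0] * len(steps)
    for events in by_visitor.values():
        position = 0  # index of the next funnel step this visitor must hit
        for name in events:
            if position < len(steps) and name == steps[position]:
                counts[position] += 1
                position += 1
    return dict(zip(steps, counts))


# Two different questions, both asked after the fact, with no new tracking code.
size_funnel = funnel_counts(EVENT_LOG, ["view_product", "change_size", "add_to_cart"])
cart_funnel = funnel_counts(EVENT_LOG, ["view_product", "add_to_cart"])
print(size_funnel)
print(cart_funnel)
```

Because every event was stored, a second hypothesis (the cart funnel) costs only another function call rather than another round of engineering work and waiting for data to populate.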
Richie Cotton: That sounds like it's sort of radically changing the whole A/B testing approach that product teams have.
Scott Voigt: I mean, just even think about the idea of A versus B testing. Why is it not A, B, C, D, E, Z testing? It should be all of those things as you develop towards personal experiences. But if you didn't have the data to understand the behaviors, you couldn't really do that.
Richie Cotton: Wonderful. Okay. Yeah. Sounds like we've got a revolution on our hands. This was a fascinating conversation. So yeah, thank you both for your time.
Scott Voigt: Yeah, Richie. Thanks for having us.
Jonathan Bloch: Thanks for having us.