
The Forecast for Time Series Forecasts with Rami Krispin, Senior Manager of Data Science at Apple

Richie and Rami explore time series foundation models and the case for scaling, feature engineering in the business world, communicating forecast uncertainty to stakeholders, the evolving role of data scientists as architects, and much more.
Apr 20, 2026

Guest
Rami Krispin

Rami is a leading thinker on forecasting at scale. He is the author of "Hands-On Time Series Analysis with R" and the forthcoming "Applied Time Series Analysis and Forecasting" as well as the DataCamp course "Designing Forecasting Pipelines for Production".


Host
Richie Cotton

Richie helps individuals and organizations get better at using data and AI. He's been a data scientist since before it was called data science, and has written two books and created many DataCamp courses on the subject. He is a host of the DataFramed podcast, and runs DataCamp's webinar program.


Key Quotes

We are becoming more architects or designers rather than writing code and building modules from scratch. Think about someone doing architecture for buildings — they understand the engineering, they understand the materials needed to make it stable. This is the same transition we're going through, where we are vibe coding and becoming more designers rather than hard coders. And in this transition, when you understand what's happening under the hood, you can get better results.

If you remember a few years ago, Zillow were relying on their forecast to estimate the growth of prices — and they got it wrong. That really damaged the stock price. They were relying on an auto forecaster, and when it's doing great, it's great. But you need to understand that when you're using auto algorithms, sometimes you get it wrong, and this is where you need to think about how to mitigate those risks. Blaming the algorithm isn't fair. It's risk management rather than just algorithm.

Key Takeaways

1

Foundation models for time series are best suited for scale — when you have hundreds or thousands of series to forecast, they reduce manual effort significantly, but accuracy trade-offs are real and need to be actively managed.

2

Feature engineering remains a critical skill even as AI automates more of the forecasting pipeline. Knowing how to represent events, seasonality, and trend shifts in your data still separates good forecasts from bad ones — foundation models haven't solved this.

3

Risk management matters as much as model accuracy when forecasting at scale. Not every series carries the same weight — identifying your highest-priority forecasts and applying more scrutiny there is a more effective strategy than treating every series the same way.

Links From The Show

Forecasting: Principles and Practice

Transcript

Richie Cotton: Hi Rami, welcome to the show. 

Rami Krispin: Thank you. I'm happy to be here. Thank you for the invitation. 

Richie Cotton: Yeah, great to have you here. So the hot new thing with time series is time series foundation models. Similar to how we have foundation models for text, we've now got them for time series. But time series modeling has been around for decades now.

Why do we need foundation models for it? 

Rami Krispin: I think the short answer is scaling. When I started to work on time series data years ago, I used R, and the common time series object was the ts object. For those who remember, it's a notorious one: it was built for a time when people used monthly or quarterly datasets.

Since then, we've started to collect data at an exponential rate. Think about your phone or any electronic device collecting logs. Now, when we're working with time series data, this is probably the most common format of data, and we have more than we can actually process.

So trying to handle it with traditional methods that are not very scalable is very challenging. In recent years we moved from the statistical models, which are really great at what they were designed for, very structured time series, to more of a machine learning approach.

And now we're seeing models built on the foundation model approach that take it to the next level, where they're trained on large numbers of datasets and many different use cases, so they can handle a broader set of use cases at scale. I think scale is the critical point here: we can now really handle more datasets with less effort.

Richie Cotton: Okay, you mentioned the ts objects. Dealing with time series, I've spent a lot of time painfully wrangling those things. But I love the idea that this enables time series analysis at scale, so you can do a lot more forecasting.

Maybe talk me through the business benefits. Have you seen any success stories where you can do things now that you couldn't before?

Rami Krispin: Think about the general use case of retail, and also the forecasting competitions, like the M5 competition. I believe it was M5 or M4 that used Walmart datasets, SKUs. You have thousands of those. And the challenge in time series is that in the traditional process you need to go analyze each series, understand its behavior, understand what type of model you want to use, and it's literally impossible to do that at scale. If you have thousands of SKUs, think about Walmart.

For capacity planning, they want to know how much bread or ketchup or any other product they're going to sell, and plan accordingly. They have thousands or maybe hundreds of thousands of products that they're selling in their stores, and this is a great use case where you can get a decent forecast.

And again, there is a trade-off when you go to scale: you trade off some level of accuracy. You need to choose your battles, right? You may miss on some individual series, but in general you understand the overall picture, your demand signal, and then you can better plan your storage capacity and the whole process of getting the products. I think those are the use cases where people would like to use it, and I think that's where we're going.

Richie Cotton: Oh, wow. Yeah. I hadn't really thought about having a time series to forecast the demand for every single piece of stock, because you want to have exactly the right amount of the stock you're going to sell. And once you've got hundreds of thousands of products, that really is a scaling nightmare. We've had just-in-time delivery for decades now, at least in theory, but having the data to make it right and really optimize it, that's a real challenge.

Rami Krispin: You see some of the development coming from companies like Amazon, and the reason is that they have the same problem, right? Managing inventory.

Richie Cotton: Yeah, so certainly once you do that better stock management, you can save on warehousing costs, shipping costs, and I guess spoilage costs if you've got food or other things that are going to rot. So, a lot of business benefits there. I'd love to talk a bit about how these new models work, but to figure that out we've got to take a step back and talk about how time series work traditionally. Can you walk me through what traditional time series analysis involves? What's the process?

Rami Krispin: Yeah, so when you think about traditional time series, usually the first names that come up are the ARIMA or Holt-Winters models, models that have been here for decades, and they work like magic when you have a very structured time series with a low frequency, like monthly, quarterly, or daily data.

Those models are based on breaking the series into its components, the trend and the seasonality, and applying some modeling to each. It's a combination of models that you stack together to model those components. For example, ARIMA has three layers, right? The autoregressive part is the foundation of all time series models: a time series is very correlated with itself. That's the magic in time series. This hour is very correlated with an hour ago, so if you know the temperature an hour ago, it's easy to know what it will be the next hour. Then you take the seasonality, put it together, and you get a forecast that can be very accurate, at least short term. So those models take advantage of the lags and the seasonality, and they take care of the trend, and when you add those all together you get a decent forecast model.

The challenge with those traditional models is that they work really well when you have a structured time series where the demand signal is very clear. For example, the demand for electricity or natural gas is usually driven by the seasonality across the year. If you're looking at the hourly demand for electricity, we use more when we wake up and toward the end of the day, and then there is a drop, right? Those patterns are very predictable. Where they struggle is in the real world: in the business world, things change because policy changes, there are new products, or something else shifts. COVID came, right?

And this is where those models are less successful, and where you want to use either a regression model or a machine learning model. This is where feature engineering comes into the picture: you can tell the model about events, how to handle them, how to treat them, how to project them into the future in your forecast. And I think that's the magic. Time series forecasting in the business world is mainly feature engineering.

Richie Cotton: Yeah, that's interesting. I do remember from my earlier days as a data scientist: you think, okay, we've got this model, and then you're trying to figure out that you need a feature to say this is a weekend, or this is a public holiday. There are all these subtle things you need to understand about what's going on in your business and then turn into some kind of code representation in the data. You spend a lot of time just messing about: do we believe this has an effect on the data? It's a very manual process. Are these foundation models solving that? How do they differ from traditional time series analysis?

Rami Krispin: Yeah, I think that's still a challenge that I don't think anyone has solved. They're good at understanding that there is some event here, an outlier. But they cannot tell whether this event will occur in the future and put it into the forecast. There is no way to do that today, to my best knowledge, and that's the overall challenge of time series: you see a spike. Is this spike a one-time thing, or is it related to some event that could occur in the future? If you know it's going to occur, you put it in, and then you adjust your forecast: you learn from the event and apply it to the future. I think that's still a challenge we don't have a clear solution for.

What these models are good at is identifying those events programmatically. They cannot tell you what it means for the forecast. Maybe you can let an LLM estimate whether it is a one-time event or not, but it's very challenging for a model to make assumptions about the future of those outliers. Is it going to occur, and when? Because if it's not a seasonal event, if we have no indication, it's an outlier, right? If it's recurring, it's not an outlier. But if it's just one time, why did we see this event? Is it some new trend that we're going to see recurring? That's the challenge. I don't think those models have an answer for it yet.

Richie Cotton: Okay. Yeah, I remember last year Taylor Swift was on this huge tour, and whenever she went to play a gig, it broke all the hotel booking models, because everything got booked out. And I guess the same is true with the World Cup happening later in the summer: that's going to break everything in all the host cities. So there's a challenge of how you feed the models the right information about these rare events.

Rami Krispin: Yeah. And I think what could augment this, and I'm sure someone out there is doing it, or if not, this is an idea for people listening: take an LLM that goes and researches. If you see an outlier, create an agent that goes to the internet, tries to understand what caused the spike, and then concludes whether it's a one-off or something that will recur in the future. That's something I think we'll see at some point, if someone hasn't already built it. So I think we're marching toward a really autonomous forecasting process. But depending on what you're doing, if you're forecasting sensitive data for your company, I'm not sure people would feel comfortable yet giving that decision-making power to an autonomous agent.

In some cases it's okay to be wrong. The repercussions of a wrong forecast: going back to Walmart, you overestimate the demand for ketchup, and maybe you get stuck with some units nobody buys. There's some money involved, but it's not life or death; it's not a huge loss for the company.

There are other cases where you need to make decisions that are very critical, and in those cases you want to validate. This is usually what I do when forecasting at scale: let's say you have a lot of series to forecast, but you will always have the top series that carry more weight, and there you will spend more time. So when you're forecasting at scale, you also need to think about where you want to be more strategic and where you're okay letting the model do its thing, under the risk that the model gets it wrong. That's the threat of forecasting at scale. You need to think about risk management when you're building your strategy for how to forecast at scale.

Richie Cotton: I agree that having too much ketchup is costing money, but it's not a real disaster. You mentioned there are some more serious situations. Have you got an example of that? Like, what's the worst thing that could happen by getting your forecast wrong?

Rami Krispin: Think about it: you're doing capacity planning for energy. If you don't plan, let's say your power plant consumes natural gas or some other natural resource, or you're depending on wind or solar power, and you get it really wrong, you could end up without the ability to produce electricity, and that causes a lot of unhappy people. So that's where you want to have control. And there are examples. I remember a good one that doesn't necessarily involve AI. If you remember, a few years ago Zillow, the website...

Richie Cotton: The house prices site, yes.

Rami Krispin: They were relying on their forecast to estimate, if I remember correctly, the growth of the market, or of prices, and they got it wrong, and that really damaged the stock price. The company was really affected by this wrong estimate. And part of it, at least, and I was not there, I don't know the company from the inside, but from what I read and remember, they were relying on Prophet, which is an auto forecaster, and when it's doing great, it's great.

But you need to understand that when you're using auto models, auto algorithms, sometimes you get it wrong, and this is where you need to think about how to mitigate those risks. I think blaming the algorithm isn't fair. It's risk management rather than just the algorithm.

Richie Cotton: Absolutely. Oh, but we should probably clarify: this is Prophet, the Python package for time series,

Rami Krispin: Yeah. 

Richie Cotton: Profit making 

Rami Krispin: Yes, Prophet, the time series one. Yeah, a library that was released a long time ago by a team at, at the time, Facebook, now Meta. It was a really great library; it introduced features and ideas in time series that didn't exist at the time. But it was a side project of a team at Meta that never went to a production level, so its code architecture was not production-ready. It's three language layers, right? I used the R version, and the Python version is the same: it calls Stan, and Stan calls C or C++. So when you try to deploy it in production and deal with all the dependencies, it's very painful. And at some point the people who created it left Meta, at least one of them that I know, and so it wasn't maintained; it didn't really receive the care and love it should get as an open source project. The concept remains, but it was not production-ready.

I also think a lot of people took literally what was described in the library's documentation: an auto forecaster that can automate forecasting. Without understanding it. There were great features, and if you use those features in Prophet, you can really get a good forecast. But when you're using an auto algorithm, you just need to be aware that sometimes it gets it wrong.
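For context, here is a minimal sketch of using Prophet deliberately, switching on the kinds of features Rami alludes to (holidays, trend flexibility) rather than treating it as a black-box auto forecaster. The data frame, country choice, and parameter values are illustrative assumptions.

```python
import pandas as pd
from prophet import Prophet

# Prophet expects a frame with "ds" (dates) and "y" (values)
df = pd.DataFrame({
    "ds": pd.date_range("2022-01-01", periods=730, freq="D"),
    "y": range(730),  # replace with your actual series
})

m = Prophet(
    changepoint_prior_scale=0.1,   # control trend flexibility explicitly
    yearly_seasonality=True,
    weekly_seasonality=True,
)
m.add_country_holidays(country_name="US")  # encode known recurring events
m.fit(df)

future = m.make_future_dataframe(periods=90)
forecast = m.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())
```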

Richie Cotton: I think that's true of a lot of machine learning, a lot of AI: it's very easy to just believe what the bot says and not question it. That's always been a problem, I think. So we've just covered Prophet. Now, the latest generation of models, these foundation models for time series we talked about: Amazon's got Chronos, Salesforce has got Moirai, Google has one too, and there are a few startups doing these things. Do you have a sense of which are the good ones? Which ones should we be looking at? How do you make a decision about which model to use?

Rami Krispin: There are a couple of things you need to consider when you're choosing a model. One is that you want to benchmark it. You don't always need a foundation model to do a forecast; maybe ARIMA and Holt-Winters will do a great job at the cost of a fraction of a CPU, instead of trying to run it on a GPU or something like that. So the first thing is to identify the needs. The second thing is to experiment and understand which type of model works on your data. And it could be a combination; you can use different ones. That's what I love to do generally when I'm forecasting, not necessarily with foundation models, but with traditional forecasting too.

I like to run a horse race between different models using backtesting. The idea is that you use backtesting, which is like cross-validation in machine learning: you test each model on multiple partitions. You want to make sure the model is consistent in its results, so you're splitting your data into multiple training and testing sets.

The difference between backtesting and cross-validation is that in time series it's sequential. You have a window that you move, and each time you have a training partition and you predict the next data points as a new testing partition, then score it and get some score. This lets you understand whether the model is stable and can reproduce good results over time.

And when you have a lot of time series, not every series looks the same, right? The distribution and the underlying patterns might change between series, and some models will perform great with linear regression, others with some ARIMA. So it's inevitable that you have to find the best fit for each one. And I think it should be the same when you're using foundation models, because even within Chronos or the Salesforce models, there are different sizes of model. It's like an LLM, right? You have a billion parameters, or some larger number of parameters the model was trained on, and you don't necessarily need the big models to get a good result. You need to identify what works best for your data and make a decision based on that. And it could be a combination. You don't need to lock yourself into only one model; you can try different ones and then choose the one that works best for you.
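Here is a minimal sketch of the rolling-origin backtesting Rami describes: sequential train/test partitions, one score per partition, so you can check that a model's results are consistent over time. The metric (MAPE), the window sizes, and the seasonal-naive baseline are illustrative choices, not the exact setup from the episode.

```python
import numpy as np

def backtest(y: np.ndarray, fit_predict, initial: int, horizon: int, step: int):
    """Score a model over multiple sequential train/test partitions.

    fit_predict(train, horizon) must return `horizon` forecasts.
    Returns the MAPE on each partition so you can check consistency.
    """
    scores = []
    for end in range(initial, len(y) - horizon + 1, step):
        train, test = y[:end], y[end:end + horizon]
        pred = fit_predict(train, horizon)
        scores.append(np.mean(np.abs((test - pred) / test)))  # MAPE
    return scores

# Usage: race a seasonal-naive baseline (repeat the last season)
y = np.sin(np.arange(200) * 2 * np.pi / 12) * 10 + 100
seasonal_naive = lambda train, h: np.resize(train[-12:], h)
print(backtest(y, seasonal_naive, initial=120, horizon=12, step=12))
```

Swapping in other `fit_predict` callables (ARIMA, a regression, a foundation model) and comparing the score lists is the horse race he describes.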

Richie Cotton: Okay, yeah. I love the idea that you want to try a few different models and then do some proper testing on them. And the idea of backtesting is incredibly important, because it's one of those things that is quite peculiar to time series forecasting; you don't see it in other fields of machine learning. So the idea is to test the model up to a point in time, see whether it correctly predicts what comes next, and if so, then it's okay to use it for your forecasting. Okay, so we talked about choosing a model, and we talked about doing some testing. I feel like if you want to get a time series forecast into production, there are a lot of other steps we need to discuss. So talk me through your workflow, from "I'm going to build a time series model" to using it for real, I guess, stock forecasting in your warehouse, or whatever real predictions you want to make.

Rami Krispin: Yeah. It all starts with the data, right? I think people don't realize that in pretty much every data science project, you spend most of your time getting the right data in the right format. And when you think about production, you need to think about how often you want to refresh the forecast and how often you want to refresh your data. So start with the data: build a pipeline for the data and get the right KPIs that you want to forecast.

Once you have the data, you start with the experimentation. You want to identify which models you want to use, or which models are appropriate. At scale you may have different approaches, and there are different methods to analyze your data at scale, right? Cluster analysis and so on. You can find different clusters and, based on the patterns, fit the right type of model. In experimentation, the goal is to spend your resources running different experiments to identify what works best, like we said about the foundation models, where you can try different ones.

So that's the idea: you run backtesting, you iterate, and you look at the results and whether there is any room for improvement. If there is, you keep iterating until you reach the point where the marginal improvement is so low that it doesn't make sense to keep adding more features or training more models.

Once you get to that point, you move your models to production. You select the ones you want to deploy, instead of running everything all the time, and you register your models in production. Then comes the repeating process. Let's say you're predicting demand for electricity, and you do it every hour, predicting the next three days. You need to build a scheduler that goes and gets the data, feeds it to the model, and generates the forecast. That's something you need to think about.

And then the last and very important component in production is that you want to monitor your forecast, right? While you were getting great results in training and testing, that could change in production, because you may have a drift in the data, a pattern change. Something might cause the models to go out of tune, so monitoring is very critical in production. Every time you refresh, you want to go score your previous forecast, see whether you're aligned with the expectations you set from the backtesting results, and set some thresholds. You want an alert that tells you your model is starting to lose its accuracy. Then take it back to experimentation, fix it, and push it back to production.
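A minimal sketch of that monitoring step: score the previous forecast against the actuals that have since arrived, and alert when the error exceeds a threshold derived from backtesting. The metric and threshold value are illustrative assumptions.

```python
import numpy as np

def check_forecast_health(actuals, forecasts, mape_threshold: float) -> bool:
    """Return True if the model still meets its backtested accuracy."""
    actuals, forecasts = np.asarray(actuals), np.asarray(forecasts)
    mape = np.mean(np.abs((actuals - forecasts) / actuals))
    if mape > mape_threshold:
        print(f"ALERT: MAPE {mape:.1%} exceeds threshold {mape_threshold:.1%};"
              " send the model back to experimentation.")
        return False
    return True

# Usage: threshold set from what backtesting said to expect, e.g. 5%
check_forecast_health([100, 110, 95], [102, 118, 99], mape_threshold=0.05)
```

In a real pipeline this check would run on the same scheduler as the forecast refresh, with the alert going to a dashboard or a pager rather than stdout.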

Richie Cotton: That is interesting, the idea that once you've launched it, once it's in production, you're not done; you need to keep monitoring. I think it's the same with a lot of machine learning and AI stuff as well: the underlying situation in the world changes, and then maybe your model breaks, so you do need to keep track of things that way. Alright. So I was thinking about business intelligence: every company is just filled with all these time series dashboards of "oh, this metric has gone up and down over the last few months." It's often backward-looking. How do you go from that situation to starting to think about the future and building forecasts into your business thinking?

Rami Krispin: You mean like referring to specific events?

Richie Cotton: In general, really. I spent a lot of this morning staring at a load of DataCamp dashboards, and a lot of them are just "this is what happened in the last year." Most of the stuff on dashboards doesn't ever look forward in time. Is there a mindset shift there? How do you go about changing your analytics to start being forward-looking?

Rami Krispin: That's a good question.

Richie Cotton: It's a big one. 

Rami Krispin: Yeah. I think, and again, this is what I like to do and it may not represent the broader practice, but when you are building a forecast and delivering it to stakeholders, you also want to be able to explain it. This is where there's a gap between data scientists and business stakeholders who are not necessarily technical. Dashboards are a great way to explain some of the decisions you made about the forecast, and a way to communicate your forecast and present your results. Usually I like to explain some of the process, like the backtesting we talked about, show what the backtesting looks like, and then show the forecast. Sometimes I get asked why I used a linear regression, or why I used those features. So when I provide a dashboard with the forecast, I also highlight the features that were used and their impact, and let the stakeholders understand how the forecast depends on those decisions. That's one way. I'm not sure if it answers your question, but it lets people understand the forecast better and gives you a way to communicate it.

I also think visualization is very critical in communicating or explaining forecasts. For example, at scale, going back to the scale theme in time series: when you're horse-racing different models, one way is, if you have some knowledge, to start by applying that knowledge and then score and test. Or you come in without prior knowledge, and you don't think there's anything special in your time series; they're growing very organically. Then you just apply a set of models, get the map of your scores, and look for the ones that are not performing well, and go tune those, as opposed to overthinking the problem. This gives you the ability to move faster in some cases. We don't always have the time to sit and think: when you have a production line and a team comes with a problem, they give you a series and you need to give them results fast. So something I've found very helpful is to build those pipelines, or templates, that you just onboard the data into, understand what's going on, generate those forecasts, and then figure out what is working and what is not, and go fix it. There are cases where you see from the get-go that it's not going to work, and then you really need to be more strategic. But for a nice time series that looks almost academic in nature, where you could take a pen and continue the seasonal pattern or the line, this works really well.

Richie Cotton: Yeah, certainly. I can see that if you want to make this an enterprise-wide standard, where you're going to forecast as much as possible, then that communication aspect is going to be incredibly important: making sure that everyone, not just the data team, can understand what's going on. You need all your business teams to be able to look at the forecasts and also believe them. Where do you start? Are there any metrics that you think are good first use cases if you want to start introducing forecasting into your organization?

Rami Krispin: Yeah, I would think about the stuff that people resonate with and understand really fast, like seasonality and growth, right? If you're working in the financial world, this is the bread and butter. You want to show year over year that your series is growing. That's a good one that people understand: "hey, I don't think I have seasonality... oh yes, I do have seasonality." So we use a model that handles seasonality.

The second thing I really like to use is decomposition. STL decomposition specifically is my favorite. If you're not familiar with it, what it does is break the series down into its components: the estimated trend, the seasonality, and then the remainder. Essentially the remainder is all the patterns that are left, that are not represented by the seasonal or trend components.

What I like to do is enrich this by calculating the standard deviation of the remainder and setting ranges: between two and three standard deviations is orange, and above three in absolute value is red. Then I mark those points on the original series, and you can start a conversation about why we see a spike; there is some anomaly. You can have a better conversation with your business about identifying events, as opposed to just staring at a series, seeing some spike, and starting to talk. And again, it's an algorithm, so it's sometimes wrong, but generally it does a great job of showing where you have points that are not normal, where there are anomalies. It's easy to facilitate the conversation with business stakeholders when you come with this type of visualization, and to try to understand, going back to our earlier conversation, whether it's a one-time event or expected to recur in the future, and then annotate it accordingly.

The other component is the trend. I think that in forecasting, the most important component to spend time on is getting the trend right, because the seasonality is easy to capture. If you're somewhat wrong on the seasonality, you can get away with it, but if you're wrong on the trend, you might get penalized very hard, as you start to get this delta separating your forecast from the actuals over time. STL gives you a great estimate of your trend, and I augment it with another algorithm to detect the change points in the trend using piecewise regression. When you have this and you can automate it, you get a lot of information that can help when you're feeding it to the model.

So again, before you go use foundation models and all that stuff: when you understand those concepts, it's easier to understand how the foundation models work on the backend, because they're doing more or less the same thing, learning from many series, maybe following more of a reinforcement learning approach, but the idea is the same. You can build those algorithms that help you extract those insights automatically and feed them to the models.
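Here is a minimal sketch of the STL-based anomaly flagging Rami describes: decompose the series, take the standard deviation of the remainder, and flag points between two and three standard deviations (orange) and beyond three (red). The generated series and the injected spike are illustrative, and the piecewise-regression changepoint step he mentions is omitted for brevity.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL

# Monthly series with trend + seasonality + noise, plus one injected spike
idx = pd.date_range("2015-01-01", periods=120, freq="MS")
rng = np.random.default_rng(42)
y = pd.Series(np.arange(120) * 0.5
              + 10 * np.sin(np.arange(120) * 2 * np.pi / 12)
              + rng.normal(0, 1, 120), index=idx)
y.iloc[60] += 15  # an "event" the decomposition should surface

res = STL(y, period=12).fit()
remainder = res.resid
z = (remainder / remainder.std()).abs()

flags = pd.Series("normal", index=y.index)
flags[(z >= 2) & (z < 3)] = "orange"   # unusual: worth a conversation
flags[z >= 3] = "red"                  # anomaly: investigate the event

anomalies = flags != "normal"
print(y[anomalies].to_frame("value").assign(flag=flags[anomalies]))
```

Plotting the flagged points on top of the original series, as he suggests, turns the decomposition into a conversation starter with stakeholders rather than a purely statistical artifact.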

Richie Cotton: Okay. You mentioned the idea of divergence between reality and what the model is showing. Actually, we had Bill Azia, who runs the data team at Duolingo, on the show a few months ago, and he said that when he joined Duolingo, the data team was creating all these forecasts that were completely wrong, and no one trusted them. So we've talked about a few of these things, but how do you go about making sure that any forecast you produce is trustworthy? Because you're never going to get perfect forecasts all the time. So how do you make sure that people have the appropriate level of trust in the forecast?

Rami Krispin: I think first and foremost, it's communication with your stakeholders, and knowing your limitations. A forecast is good if your past reflects the future. There are cases where you see some recent change that your data cannot explain, for example COVID, right? When COVID started, nobody knew where it was going to go, and it impacted a lot of series, if you think about it. So I think you tell your stakeholders: look, the history cannot represent the future; we need a different approach. Let's do scenarios: bring your business model back into the forecast, apply some different what-if scenarios, and build them in using features. Being transparent with your stakeholders when you know you cannot create an accurate forecast, because of the limitations of your data, is important.

Sometimes people think it's magic. It's not magic, it's science, and science has limitations like every field of science. Knowing those limitations and communicating them with transparency is, I think, first and foremost how you build trust in your forecast.

Richie Cotton: Yeah, I do think the idea of communicating with whoever is going to be using your forecast is incredibly important. And there are some ideas that are quite tricky to communicate. I moved to the US the week Donald Trump first got into power, and there were a lot of people complaining about the election forecasters, because if you cast your mind back a decade, Hillary Clinton was the favorite to win. And then she didn't win, Donald Trump did, and a lot of people went, "oh, election science is absolute nonsense." But there was just a lot of uncertainty in the models, and that wasn't communicated very well to the public at large, I think. So talk me through: how do you communicate uncertainty around models well? Because otherwise people tend to get a shock.

Rami Krispin: Yeah. So the first thing I try to do, and I'm the first to admit that I'm not always successful, is advocate for prediction intervals. Use any type of method you want, but the goal is... sometimes stakeholders just want to know the point estimate, because they need to give one number, and it's very hard for them to understand why there is a range, because they need to deliver one number.

But I think it's important to explain the range and expand on it. The point estimate is just what we think is most likely, the middle point. If you think about a prediction interval, there is a range the true value could fall within, and explaining that, I think, is very critical.

I had a professor in my master's. I took a statistics course at the University of Michigan, at the business school, and he was a star professor, really funny, and he had a sentence that has resonated with me since then, and I understand it today more than I used to. He said: statisticians are very sad people. They know that they're wrong from the get-go, and they go and measure it. When you come with that state of mind, knowing that you're going to be wrong and that the question is how much, I think that's a very good start: you know your limitations. So knowing the risk of your forecast, or how wrong you're going to be, is fundamental. And that's part of why I love backtesting, for example: it enables you to measure how wrong you're going to be, or at least estimate it.
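One simple way to put a range around a point estimate, in the spirit of using backtesting to measure how wrong you're going to be, is an empirical interval built from past forecast errors. This is a sketch under that assumption, not the only way (or the episode's prescribed way) to produce prediction intervals; the error values and levels are illustrative.

```python
import numpy as np

def empirical_interval(point_forecast: float, backtest_errors, level: float = 0.95):
    """Turn past forecast errors (actual - forecast) into an interval."""
    lo_q, hi_q = (1 - level) / 2, 1 - (1 - level) / 2
    lo, hi = np.quantile(backtest_errors, [lo_q, hi_q])
    return point_forecast + lo, point_forecast + hi

errors = [-4.1, 2.3, -1.0, 5.2, 0.4, -2.8, 3.6, 1.1]  # collected via backtesting
print(empirical_interval(100.0, errors, level=0.8))
```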

Richie Cotton: Absolutely. Yeah, I do love the idea that knowing you're going to be wrong some of the time is an incredibly important mindset if you're going to be a statistician. I guess communicating that to your boss is often a bit tricky: "yeah, we might be wrong here." "No, get it right." But on that note, are there any other skills that you think are important if you want to get involved in time series forecasting, or want a career in this?

Rami Krispin: We're going to see more and more AI, foundation models that can self-drive. But I think if you want to be successful, and there's no right or wrong here, I always think that people who understand the statistics behind it can go farther than people who don't have this background. Again, it goes back to models like Prophet: people who used Prophet out of the box, without understanding how it works or how they can modify it, got less from the algorithm than people who actually knew how to use the features inside the model and tweak them. So knowing statistics, knowing how regression works, knowing how feature engineering works, I think will give you more, regardless of whether you're using machine learning, statistics, or AI foundation models.

Because foundation models too, like any model, are sometimes wrong. And when you look at the series, if you understand how those models work, you'll have a sense of why they're wrong. Maybe there's a feature: okay, let's build this feature, let's feed in these features, and so on. I think that's still a requirement, at least for the near future. Things are moving so fast that a year from now my kids could go and do a forecast without knowing anything. But I think there's still going to be some requirement for statistical knowledge.

Richie Cotton: I guess that's a problem in a lot of areas of data. With machine learning, tools like DataRobot have been trying to figure out automated feature engineering for a decade now, and more recently, in the field of generative AI, we've reached the point where a lot of exploratory data analysis, at least, can be automated. You just need to give it a bit of business context and it comes up with some quite good analysis. It sounds like with time series forecasting we're not quite at that state yet, and humans are still going to be needed, at least for the near future.

Rami Krispin: I think we're getting close; we are improving. From where I started years ago to where we are today, it's a completely different world. I was very skeptical about the foundation models when it all started, but I think that's where we're going; they're definitely going to be very dominant in the future, I believe. Things are moving really fast. But I feel like we are becoming, and this is general to data science, not just forecasting, more architects or designers rather than writing code and building modules from scratch.

And I think in this process, when you understand things as an architect... think about someone doing architecture for buildings, right? They're not necessarily building it from scratch themselves, but they understand the engineering, they understand the materials needed to build the building and make it stable, and so on. This is the same transition we're going through, where we are vibe coding and becoming more designers, rather than the hard coding that we used to do for many years until recently. And in this transition, I think, when you understand what's happening under the hood, you can get better results.

Richie Cotton: I'm totally there with you that there's this layer of abstraction now. You're not going so much into the low level of "I can write this specific line of R code or Python code"; now it's more "I'm doing something higher-level, trying to get a project result." Do you want to spell it out a bit more? What does this new data scientist, machine learning scientist, or time series forecaster role look like to you? What is this architect?

Rami Krispin: It all starts with some blueprint of what you want to build. At least that's how I like to do it. Think about: I want to build a pipeline to generate a forecast. Okay, what do I need? Write a requirements document, then start brainstorming. Maybe you already have the idea of what you want to build, and you iterate with the AI tool you're using, Claude or Codex or whichever, and say: this is what I want to build, let's create a plan. And they're really good at giving you recommendations.

Then my workflow is, instead of asking it to build everything from A to Z from scratch, to go step by step. I make sure I understand each component, and I modularize it, so that if something is going to break, I can understand where it breaks. We cannot review code the way we used to. In the past we'd do a PR, a code review, going through a big chunk of code and understanding it. Now the amount of code I create in one day is exponential; we cannot review it as we used to. So part of it is building in a way that we can then review with AI tools. That's critical. But it's also about not letting it write stuff like a black box: we're building with building blocks, so that if something goes wrong, or when we need to maintain it, it's easy to understand where to go.

The other thing, the opportunity I'm really excited about, is this. We all have time limitations; that's our big constraint, right? In the past, when building a library or supporting an application, usually, at least the way I like to do it, you get an MVP, the minimum requirements, so you can start to build something, and then you iterate over time and add more features. This is where you balance the time you allocate to development against delivering, because your workplace expects you to deliver and isn't always aware of the time required for development. So you put most of your effort into the critical features. Now, if there's a feature that I always wanted but didn't prioritize, I just go and ask, and in five minutes, at the cost of a few dollars or tokens, I have the feature. That's what's amazing: you can really go faster. What I enjoy building now is the whole list of features that were always on my backlog that I just didn't have time for. I can now build them and really use them. It's amazing.

Richie Cotton: Yeah. I think that seems to be the case for anyone working in data: you're never short of things to do. Things are being automated, but actually I think there's been a lack of capacity to get through all the things you could possibly do in a data team, by several orders of magnitude. Now we're getting to the point where, if I automate most of my job, then finally I'll be able to catch up and we can do all the things with data we wanted to do across the organization. So maybe we're finally reaching a golden age. I love it. Okay, just to finish, I always want more people for our listeners to learn from. Tell me, whose work are you most interested in right now?

Rami Krispin: So the one whose work I really like, and where I learned a lot of what I know, is Professor Rob Hyndman from Monash University. I think all of us who started back then started with his forecast library, and he has this book, Forecasting: Principles and Practice, I think that's the full name, and now there is a Python version. So that's something I really recommend people go and check out: his blog and his book.

I also really like the work that Nixtla are doing in the domain of time series. They did a great job building modules so that if you're coming from R to Python, they give you the good stuff that you had in R. I think part of it is that they worked with Professor Hyndman on bringing over all the statistical models. They have four blocks, four modules: one is the core statistical models, ARIMA and Holt-Winters; then there's the machine learning one; there are the neural network models; and the fourth one is the foundation models. So they have a really good library and functionality that I like.

There is another one, skforecast. It's two folks working on it, and I believe they have full-time jobs, working at IKEA or a similar company, but in their free time they're developing this library, and it's a great project. I really enjoy seeing the work they're doing. This is where you see people who work in the industry, because you can tell the features they're building are things you only need if you're working on real-life problems. They also have great documentation, and I highly recommend it.

So I think those are the ones I usually like to work with. There are many others that I'm probably missing; I just cannot use all the ones that are available. But there is a great community, in both R and Python, for time series, beyond just Nixtla and skforecast and all the libraries built in R. I think they're a good place to start, mainly because they have great documentation and it's easy to get started. There are other companies and libraries, like Darts, providing similar functionality, and they're also good options in Python.
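As a starting point with the Nixtla stack Rami recommends, here is a minimal sketch of the statsforecast workflow, racing an AutoARIMA against a Holt-Winters model on a single series. The example frame is illustrative; statsforecast expects long-format data with unique_id, ds, and y columns, which is what lets the same call scale to thousands of series.

```python
import pandas as pd
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA, HoltWinters

# One monthly series in Nixtla's long format (add rows for more series)
df = pd.DataFrame({
    "unique_id": "series_1",
    "ds": pd.date_range("2018-01-01", periods=60, freq="MS"),
    "y": range(60),  # replace with your actual values
})

sf = StatsForecast(
    models=[AutoARIMA(season_length=12), HoltWinters(season_length=12)],
    freq="MS",
)
forecast = sf.forecast(df=df, h=12)  # one forecast column per model
print(forecast.head())
```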

Richie Cotton: Okay, yeah, so many options. It's such a big community, because I guess every organization needs this capability in some form. Actually, Rob Hyndman was a very early DataCamp instructor. He taught a time series course way back in the day. Every now and then I think I should get him on the podcast, but he's based in Australia and the time zones are very tricky. But maybe one day we'll get Rob on the show.

Rami Krispin: Sorry to cut you off, but going back to Rob Hyndman: when I created my course on DataCamp and went to see the portal, it was a real moment of closure, because it reminded me that years ago, when I started with time series, I took Rob Hyndman's courses on DataCamp. That's how I started.

Richie Cotton: Oh, that's amazing. And it's come full circle now.

You've made your own course. Yeah. You're the new Rob. 

Rami Krispin: Yeah, so it really came full circle. A lot of the stuff that I learned is from his courses on DataCamp, his books, his blogs. Yeah.

Richie Cotton: Oh, wow. That's a fantastic story. Yeah. Congratulations, you've become your hero.

Wonderful. Thank you so much for your time, Rami. 

Rami Krispin: Thank you so much for having me here.
