Data Science in Finance
The financial world has been irrecoverably changed by the advent of data science. Find out how.
Dr. Yves J. Hilpisch is founder and managing partner of The Python Quants, a group focusing on the use of open source technologies for financial data science, artificial intelligence, algorithmic trading and computational finance. He is the author of Python for Finance (O'Reilly, 2nd edition, 2018), Listed Volatility and Variance Derivatives (Wiley, 2017), Derivatives Analytics with Python (Wiley, 2015) and Python for Finance (O'Reilly, 2014). Yves lectures on computational finance at the CQF Program and on algorithmic trading at the EPAT Program. He is also the director of the first online training program leading to a University Certificate in Python for Algorithmic Trading. Yves has written the financial analytics library DX Analytics and organizes conferences and bootcamps about Python for quantitative finance in Frankfurt, Berlin, Paris, London and New York. He has given keynote speeches at technology conferences in the United States, Europe and Asia.
Hugo is a data scientist, educator, writer and podcaster at DataCamp. His main interests are promoting data & AI literacy, helping to spread data skills through organizations and society and doing amateur stand up comedy in NYC.
Hugo: Hi there Yves and welcome to DataFramed.
Yves: Hi there, and thanks for having me.
Hugo: It's a real pleasure to have you on the show, and I'm really excited to have you here to talk about your work in finance, how you think about the use of Python in finance, and the implications of all of this with respect to data science in general. But before we get into that I just want to get a bit of context and learn a bit about you. So, maybe you can start by telling us what you're most known for in the data community.
Yves: Yeah, I think this is an easy one. So, I'm known for Python for finance. I started using Python more than ten years ago for finance. We started with computational finance and people said, "Well, you can't do that. It's too slow," and all these kinds of prejudices and reasons why you can't do something, and today the biggest institutions around the world use Python for exactly that. In particular for algo trading, so it has moved a little bit from the computational finance side to the algo trading side, and I think people now know me for this as well. So whenever somebody thinks of coding up their strategy and trying to deploy it automatically for trading, they use Python, and many people book our online courses or listen to talks or read the books.
How did you get to where you are now?
Hugo: Fantastic. And how did you get there? How did you get into Python, finance, and now data science? What type of journey led you to where you are now?
Yves: Well, afte...
Yves: But in the meantime, I started working as a management consultant because I thought, well, finance is nice and challenging, but if you later want to work in any kind of institution, any kind of organization, corporation or whatnot, it might be good to have seen a little bit more. So I started as a general management consultant working for different companies and on different types of projects, but again the focus was on finance. I didn't leave the financial industry that much. After I finished my PhD, I moved for the first time within Germany, to Hamburg. Back then it was around the hype of the internet startups. The web was then the hot thing, and so with the company I joined we did some consulting around web topics and so forth, but after the burst of the bubble, there was no work anymore. So I needed to look for something else, and what we did was start our own company.
Yves: I founded my first company in 2001. Actually pretty nice, because last Friday I was at the company for the summer fest. I left the company due to personal reasons and moved back to my hometown in the Saarland area due to the family, but we're still on good speaking terms and the company is going pretty well, I must say.
Hugo: That must have been an interesting experience 17 years later being back there. Right?
Yves: Yeah, actually the company has more than 70 consultants these days and has been taken over by a larger company. So I hardly knew all the people around there, but of course I knew the founders and some of the other ones, and this was pretty nice for me. Had a nice evening there. When I moved back, this was actually the starting point where I used Python for finance for the first time, because I had discovered it before moving back. I founded my own company, which I still own and which is now The Python Quants, and immediately I got started with some side projects which I couldn't pursue in the other context, with co-founders that didn't want to do something like that. So now I had the freedom to pursue my own stuff, and this was, among others, Python. We started when even NumPy wasn't around, so we used Numarray and Numeric and all this.
Hugo: Well that was going to be one of my questions. Like this is before numpy, this is before pandas, this is before so many of the technologies that people equate with the data science Python stack now, so it must have been a wild landscape.
Yves: Exactly, exactly. So things that are now kind of standard, and that you might teach in, I don't know, the third hour of a training with pandas, we needed to code up on our own, like time series analysis, or when we wanted to do something with computational finance and all these things. But I'm more than happy that, for example, Wes McKinney started the pandas project, and about the project having grown that much and providing us with all the nice capabilities that we use today in finance. For sure, it was really a different landscape, and many people can't imagine what it looked like then, but I was convinced, due to the beauty of the language and the whole approach, that this might be the future. And now, well, this time I think I was proved right given the success of Python in our industry, and yeah, today we don't do anything else. It's all about Python in finance and algo trading.
Hugo: Yeah, and particularly with the emergence of IPython as well and Jupyter notebooks, which back in 2001 weren't around either, so it's very exciting that all these developments have converged in this landscape that allows us to do what we do.
Yves: Yeah, it's fantastic, yes. I think from a scientific point of view, from a developer's point of view, data science point of view, or financial data science if you want to put it that way. Yeah, for sure.
Hugo: So what happened then, in your trajectory?
Yves: When I was talking about Python being a side project, indeed it was a side project, so we couldn't make any money out of that. But we did the regular consulting work that I was doing before, and I think it was maybe six years ago when we started getting real traction. Maybe seven years, I think it was 2011 when we got the first big client in Germany. The derivatives exchange Eurex approached us. Actually, one of the executives had seen a talk of mine at EuroPython in Florence, and this is how things work, and said, "Well, do you do training in this regard? And can you support us?" And I said, "Well, for sure, this is what we were waiting for!"
Yves: So this was more or less the formal starting point and yeah, it's been growing since, working with many banks and big hedge funds around the world and having hundreds of training students and thousands of people on our online platform that use Python, and yeah, so today we don't do anything else. People look up Python finance or whatever, I think on Google or wherever, they usually find us, and get to our trainings and to my talks. I've given I don't know how many talks. Well more than a hundred over the last five years at conferences around the world.
Where does your focus currently lie?
Hugo: For sure, so you write books, you consult, you develop training, you host meetups, are there other- I mean, not that these aren't enough, but these are the main things, that's where your focus lies currently?
Yves: Yeah actually, so I basically see us these days as a more or less content oriented company. So this is what I think our core is, indeed. So writing books, of course, is about content creation, but so is designing and delivering online training. I also put events in the same category, where of course you also need some content. We used to do more conferences as well, but these days we focus, for example with our partner Fitch Learning, on the bootcamp side, so this is also more about training and delivering the content.
Yves: The content, of course, is at the core, and also, I think, the skills and know-how which are then used, in addition to the content, where we consult clients around the world. We're working with a big broker in Manhattan, New York, and we're working with a hedge fund in London, which of course requires a certain skill set and know-how, but this is more or less where I'm coming from. The content side has become more and more important over these years, in particular with our online training, which is growing tremendously and which I'm really happy about. And you mentioned the meetups. I'm running a few meetups, actually, in London, Berlin, Frankfurt and Paris, and also one in New York, and whenever I'm there I try to do something there as well, so it's keeping me busy.
What is a quant?
Hugo: Definitely sounds like it. So you're known as the Python Quant, you work with a team of people and you call yourselves the Python Quants, and I think myself and our listeners know what Python is but I don't know whether everyone has a lot of context around what a quant is, I definitely don't feel I have enough. So maybe you can kind of unpack this term for us: quant.
Yves: Yeah, now if we want to get a little bit academic you would say, "Well, we have different types of quants." But I think these days it has narrowed down a little bit. So back in the days, when I started with the quant side of things in finance, what people typically understood as a quant was kind of the model quant. Somebody who is sitting down and comes up with a specific model, for example, to price a specific type of financial derivative. So a complex financial instrument that might depend on a couple of risk factors, like interest rates, or let's say a stock index, or maybe a basket of stocks, for example. So people sitting down and really doing research on the blackboard, or these days probably a whiteboard, or with pen and paper, back in the days. A little bit maybe on the computer, but more or less to document. These were really kind of mathematicians.
Hugo: Yeah, and physicists as well, right? A lot of physicists started this.
Yves: This is what people called back then the rocket scientists. Many physicists came to finance because the mathematics that you need to price financial derivatives is pretty similar to what physicists use on a daily basis in many of their areas. Also engineers and so forth. So this was the origin, so to say, of the quants, the rocket scientists, and so forth. But these days I think it's much more data driven.
Yves: So I would say these days a quant is more like a financial data scientist, because they need to crunch huge amounts of data, and I think it might differ to some extent depending on the area you have a look at. For example, if there is, let's say, an equity research analyst who crunches the numbers of certain companies, let's say Apple, you typically are not faced with that amount of data. But others work in a more systematic way, for example on systematic trading strategies, and they might even call themselves data engineers.
Yves: So one of the biggest and most successful hedge funds out there, Two Sigma, headquartered in New York, calls these guys data engineers. That's probably because they are more or less independent of what the finance professors, the people coming up with financial theories, say about how the market should behave. They have a pretty neutral approach in saying, "Well, let's apply whatever technique to the data that we can get our hands on and see how we can profit from that." So this is what I call, in all my talks these days, data driven finance instead of this kind of equation driven finance. Instead of sitting down and thinking about how the financial world should behave, they'd rather have a look at the data and try to figure out something, maybe not coming up with this kind of fantastic, nice single equation which might earn you a Nobel Prize in economics some day in the future, no, but with things that might work.
Yves: It's similar to what every big data company out there does, like all the social based companies like Twitter, or Spotify. How they come up with their recommendations is simply to have a look at the data for the recommender engines, use machine learning approaches, and recommend song number three when people usually hear three in combination with one and two. And this is what these data engineers, these quants, do these days as well: having a data driven mindset and saying, "Well, let's have a look at the data and apply whatever technique might bring us something in this regard." So people working with quantitative things, usually numerical data and more and more unstructured, text based data, this is what the quants do these days. We might still have a handful of the model quants that I started with, but it's just a handful compared to the thousands of others that work with financial data in this area.
Hugo: Okay, and so to reframe that slightly, one of the ideas there is that in the large data limit, theory laden models may be unnecessary, essentially.
Yves: Essentially yes, because when you have a look at the history, people have been pretty successful in coming up with nice models when they made, let's say, appropriate assumptions, typically in the form of normally distributed returns and linear relationships. But this is already where all these theories are doomed to fail, because we don't have normally distributed returns in markets. Across all the markets where you can analyze returns, they are not normally distributed, and in general the relationships that you face are nonlinear. Therefore, having a different look at the nonlinear, changing world with different algorithms might prove more fruitful than relying on the old theories from the 50s, 60s, 70s and 80s that are still in use today.
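The normality assumption Yves mentions can be checked with two summary statistics, skewness and excess kurtosis, which are both zero for a normal distribution. The following is a minimal sketch, not something from the conversation: the helper name `skew_and_excess_kurtosis` is hypothetical, and both samples are synthetic placeholders (a normal sample and a fat-tailed Student-t sample), not market data.

```python
import numpy as np


def skew_and_excess_kurtosis(returns):
    """Return sample skewness and excess kurtosis (both 0 for a normal distribution)."""
    r = np.asarray(returns, dtype=float)
    z = (r - r.mean()) / r.std()  # standardize the returns
    return float((z ** 3).mean()), float((z ** 4).mean() - 3.0)


rng = np.random.default_rng(42)
n = 100_000

# Synthetic placeholder data, NOT market data (assumption for illustration):
normal_returns = rng.normal(0.0, 0.01, n)                # normally distributed "returns"
fat_tailed_returns = 0.01 * rng.standard_t(df=5, size=n)  # fat-tailed "returns"

for name, r in [("normal", normal_returns), ("fat-tailed", fat_tailed_returns)]:
    skew, kurt = skew_and_excess_kurtosis(r)
    print(f"{name:>10}: skew={skew:+.3f}, excess kurtosis={kurt:+.3f}")
```

With real data one would pass log returns computed from a price series to the same function. A clearly positive excess kurtosis points to fatter tails than the normal distribution allows.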
What subdisciplines of finance will data science impact?
Hugo: Yeah, absolutely. So in general, what are the major subdisciplines of finance that you think data science is and can have a large impact in?
Yves: So I must say, I'm in general only on the investment and corporate banking side, so to say. I hardly have any point of contact with the retail side. So most people, everybody who listens to what we're talking about, of course do their financial stuff maybe on a daily basis with the apps on their phones or maybe with online banking on the web. But this is not the part I'm typically involved in. There, I think, there are tremendous success stories of data science. For example in credit lending, which is mostly automated these days using machine learning algorithms, like credit scoring and so forth. But again, this is not my field of expertise.
Yves: So what I'm mostly concerned with is the corporate and investment banking side, and there in particular derivatives and the trading side. And in general, this is how I think of this little world, smaller than the world of others: there is financial data science, so whenever you need to crunch the petabytes of data that are available these days, that is financial data science. As we discussed with regard to quants, you find people in different areas of a bank, for example, or a hedge fund, or another buy side company like an asset management company, who are concerned with financial data science, on a smaller or, in general these days, on a larger scale. So crunching any type of financial data, market data, unstructured data like news data and so forth. Then we have the trading side of things, where people try to come up with algorithms, and there are different types of algorithms. More and more we see people trying to apply AI based algorithms. In contrast to, let's say, the deterministic algorithms that you need to execute larger trades, they try to come up with machine learning, AI based algorithms to predict markets, and if they are good enough at predicting markets, they might benefit from that.
Yves: We recently started such an endeavor ourselves as well. Then, of course, there is computational finance, which includes areas such as derivatives and options pricing. This also includes risk management, which is still a big topic. So you have a big investment bank that is sitting on, let's say, a million plus derivatives positions overnight, and one of the major tasks overnight is to come up with some risk numbers for the complete portfolio. Maybe people have heard about value at risk or credit value at risk. It's really computationally demanding, and such jobs are running on huge clusters with thousands of computers overnight to crunch the numbers in an appropriate way for the bank to get, let's say, a decent view on the risk position. So computational finance usually is kind of the most demanding in this field.
Yves: And as I mentioned before, we got started on a small scale, but these days, for example, the biggest banks in the world, like Bank of America Merrill Lynch, mainly use Python as the implementation language for their trading and risk management platform, although the hardcore calculations are still done in C++. Not so much because Python is too slow, but because they started developing the pricing libraries decades ago, and back then there was no way around C++ in this computationally demanding area. So financial data science, algo trading and computational finance are at least the areas where we focus and apply data science techniques in the financial field.
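To make the overnight risk numbers mentioned above a bit more concrete, here is a minimal sketch of a simulation based value at risk figure. Everything in it is an assumption for illustration: the function name `value_at_risk` is hypothetical, the portfolio is a single position worth 1,000,000, and one-day returns are drawn from a normal distribution with 1% volatility, which is a simplification. A real bank would reprice every derivatives position under each simulated scenario.

```python
import numpy as np


def value_at_risk(pnl, confidence=0.99):
    """Loss level that is exceeded in only (1 - confidence) of the scenarios."""
    return float(-np.quantile(np.asarray(pnl, dtype=float), 1.0 - confidence))


rng = np.random.default_rng(7)
n_scenarios = 100_000
portfolio_value = 1_000_000.0

# Hypothetical one-day return scenarios (assumed normal with 1% volatility).
simulated_returns = rng.normal(0.0, 0.01, n_scenarios)
pnl = portfolio_value * simulated_returns  # profit and loss per scenario

var_99 = value_at_risk(pnl, confidence=0.99)
print(f"1-day 99% value at risk: {var_99:,.0f}")
```

The computational burden Yves describes comes from the repricing step that this sketch skips, not from taking the quantile.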
Financial Data Science vs Computational Finance
Hugo: Fantastic. Could you just slightly unpack the difference between financial data science and computational finance a bit more?
Yves: Yeah. Typically what you do in data science is that you have a look at the data that is there, meaning historical data. On a simple level, take end of day data of, say, Apple stock over 10 years. Then you have some 252 data points per year, so after 10 years you have roughly 2,500 data points. This is not really challenging these days, as we know, but this is basically what almost every financial theory is based on. But when you have a look at the Apple tick data, which is provided by NASDAQ or data providers such as Bloomberg and Thomson Reuters, with which we work, then you might get some 2,000 data points per quarter of an hour, or 8,000 per hour. So this is where we get to bigger data, and people need different techniques than, let's say, an Excel spreadsheet to work with such data.
Yves: But no matter what, it's typically historical data, and you might try to come up with some predictions, some forward looking numbers or whatever, based on the historical data. Computational finance, with regard to the areas that I've been describing, like risk management, is typically based on Monte Carlo simulation, which is by definition a forward looking technique. So while I might look backwards 10 years at Apple's stock, in computational finance, when I want to price a derivative, I look forward, let's say over three months or 12 months or two years, three years, and try to simulate the markets and model correlations between different risk factors to come up with a reasonably good understanding of what the future might look like in terms of market prices and other relevant quantities.
Yves: So my thinking is that in data science we look at the existing data and try to come up with certain points to predict, but computational finance in and of itself has a forward looking element that is dominant. They're trying to better understand what the future might look like, not coming up with a single, let's say, forecast for the Apple stock price in 12 months. No. Rather with the distribution of possible Apple stock prices in 12 months, based on 100,000 or 500,000 simulations of the Apple stock price.
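The forward looking idea described above, a distribution of possible prices in 12 months rather than a single forecast, can be sketched with a Monte Carlo simulation. This is a minimal illustration under stated assumptions: the stock is assumed to follow geometric Brownian motion, the function name `simulate_terminal_prices` is hypothetical, and the starting price, drift and volatility are made-up numbers, not Apple data.

```python
import numpy as np


def simulate_terminal_prices(s0, mu, sigma, horizon_years, n_paths, seed=0):
    """Simulate stock prices at the horizon under geometric Brownian motion."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_paths)  # one random shock per simulated path
    drift = (mu - 0.5 * sigma ** 2) * horizon_years
    diffusion = sigma * np.sqrt(horizon_years) * z
    return s0 * np.exp(drift + diffusion)


# Hypothetical parameters (assumptions): price 100, 5% drift, 25% volatility.
prices = simulate_terminal_prices(
    s0=100.0, mu=0.05, sigma=0.25, horizon_years=1.0, n_paths=100_000
)

p5, p50, p95 = np.percentile(prices, [5, 50, 95])
print(f"mean price in 12 months: {prices.mean():.2f}")
print(f"5% / 50% / 95% quantiles: {p5:.2f} / {p50:.2f} / {p95:.2f}")
```

The output is a whole distribution, so any quantity of interest (mean, quantiles, probability of ending below a level) can be read off the simulated prices.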
Derivatives and Options
Hugo: Fantastic. Now there are two terms that you've used that I just wanted you to explain briefly for me and the listeners, derivatives and options because none of us necessarily know much about finance.
Yves: Yeah, sure, sure, sure. These are ... Actually they are involved, and typically as, let's say, a general investor, or when you say you want to save for retirement, you don't get in touch with these instruments, but they are used in many different areas. First, they are used to do risk management. You can use derivatives in general, and options are a subclass of derivatives, to do risk management. You can also use them to speculate and so forth, but basically what they are is, and this is where the name comes from:
Yves: Their price is derived from another financial instrument. So for example, the Apple stock is traded on Nasdaq and you can buy it and the price might go up or down. This is a straightforward thing, but there are options traded on things like Apple stock or on the S&P 500 or on other instruments that derive their price directly from what the underlying, this is how it's called, in this case the Apple stock, is doing. A call option, for example, would represent the right to buy the Apple stock at a certain point in the future at a predetermined price.
Yves: So options in that sense typically represent some rights which you can exercise, but typically you are not required to exercise them. So this is where the optionality comes from. There are other derivatives, like futures, that are unconditional. You buy them and the price is also derived from something else, but this is more or less live or die. Once you're in, you can only sell this thing. But with the option you have the right to buy something at a predetermined price at a predetermined future date or over a certain period of time, or to sell it, in which case we speak of a put option: put options to sell, call options to buy. And the pricing of these instruments might get really tricky and involved, and you need advanced financial mathematics in order to come up with a proper price. The pricing of Apple stock is actually pretty simple for a single trader.
Yves: You simply open your browser and you look at the price and it's there, but to price derivatives is not straightforward. So this is a topic many people have worked on and many books have been written about.
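The right-but-not-obligation feature described above shows up directly in the payoff of an option at expiry: the holder exercises only when it is worth something. The sketch below also values a European call by simulation. It is an illustration under assumptions, not a method named in the conversation: the function names are hypothetical, the underlying is assumed to follow geometric Brownian motion under the risk-neutral measure, and the parameters are made up.

```python
import numpy as np


def call_payoff(s_t, strike):
    """Payoff of a call at expiry: buy at the strike only if that is worthwhile."""
    return np.maximum(s_t - strike, 0.0)


def put_payoff(s_t, strike):
    """Payoff of a put at expiry: sell at the strike only if that is worthwhile."""
    return np.maximum(strike - s_t, 0.0)


def mc_european_call(s0, strike, r, sigma, maturity, n_paths=200_000, seed=0):
    """Monte Carlo value of a European call under risk-neutral geometric Brownian motion."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_paths)
    s_t = s0 * np.exp((r - 0.5 * sigma ** 2) * maturity + sigma * np.sqrt(maturity) * z)
    # Discount the average simulated payoff back to today.
    return float(np.exp(-r * maturity) * call_payoff(s_t, strike).mean())


# Hypothetical parameters: price 100, strike 100, 5% rate, 20% volatility, 1 year.
price = mc_european_call(s0=100.0, strike=100.0, r=0.05, sigma=0.2, maturity=1.0)
print(f"Monte Carlo call value: {price:.2f}")
```

For these parameters the estimate should land close to the Black-Scholes-Merton closed-form value of roughly 10.45, up to simulation noise.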
Hugo: For sure. Great. That makes perfect sense. Now the other thing you mentioned, a couple of times when talking about the major subdisciplines of finance that data science is having an impact in, you talked about machine learning and artificial intelligence, so I was wondering what you see the role of these two coupled technologies and ways of thinking about modeling the world. What impact they're actually having or whether we have a healthy skepticism as well concerning things that are buzz terms as well as things that provide a lot of value. So how does this apply in finance?
Yves: Well, actually I'm getting to the trading side of things. Just today a book arrived which is called Pattern Trading. I discovered it in a magazine on the weekend. I said, "Well, let's have a look at this book. Looks pretty nice." And already the name suggests it: pattern trading means trading based on some patterns or price formations that you see with regard to a financial instrument. So for example, the Apple stock, or this can be, let's say, the gold price, or it can be the Euro/US Dollar exchange rate. The theory goes that when you see certain patterns in the prices, this might in one case signal a further upwards movement, in another case a downwards movement, or that the market is most likely to move sideways. But again, it's all based on patterns, and I think most of the listeners are completely aware of the fact that machine learning techniques are pretty good at learning about, let's say, the value of patterns.
Yves: First of all, recognizing patterns, and second of all, coming up with predictions based on patterns. Take the example from before, Spotify. When people listen to songs A, B, C, D, E, F and G, then the fantastic new song might be something for you, because you have also been listening to songs A, B, C, D, E, F and G. And this is the same with patterns. If you see patterns in markets then you might say, "Well, the market went, let's say, up, down, up, down." Then the machine might learn that with a high likelihood it's more probable that the market goes down afterwards, or up afterwards. So this is what the machines of course should do, they should learn what is happening. When I give talks, I typically have a couple of pictures showing patterns, and people start to nod and say, "Yeah, I know this one, I know this one, I've been trading on that one." But my argument is, and you have been asking me about how machine learning, deep learning, all these techniques might influence markets:
Yves: I'm usually not saying that there is nothing in these patterns, nor that there is something in these patterns. What I'm saying to people is that if there is something in these patterns, then for sure machines are better at recognizing these patterns, at learning these patterns, and then at executing trades based on these patterns. Because, I mean, we all know a human being might learn over the course of his trading lifetime, I don't know, 20, 40 patterns maybe. A machine doesn't have any issue in learning patterns which are pretty complex, let's say based on 100 features, for example, and maybe a hundred, 50,000 relevant patterns that it immediately recognizes. And of course, when we talk about trading, it's about seizing opportunities. The faster you can trade on what you see, the better it usually is, and the less emotional you are, the better it usually is.
Yves: And these are, I think, the advantages of the machines compared to human beings. I think we are not yet there that in every single area the machines and the algorithms will replace the traders. But there are good examples, like Goldman Sachs. There's this quote I'm always using, also from The Economist for example, that in 2000 Goldman Sachs had 600 equity traders on a single trading floor, and of the 600 there are just two left, just two people, and the rest is kind of replaced by technology. And of course technology doesn't build itself. So let's say the human resources have shifted from the trading skill to the technological skill.
Yves: I mean, you need people, of course, who build the systems and so forth. So when people say, "Well, this job is about to be replaced by machines," there must be people who build the machines and who write the software that really replaces the people. And this is what we see in the financial markets, I think in spectacular fashion: they are looking for more and more technologists, data scientists and programmers that are able to build the machines that you require these days in order to be successful in markets, but other stuff that has been done manually in the recent past is not en vogue anymore. And even high paying jobs are suffering in this regard, like the equity traders that I mentioned in the Goldman Sachs example.
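The pattern argument from this answer can be sketched in a few lines: encode recent price moves as an up/down pattern and count how often each pattern was followed by an up move. This is a deliberately simple frequency count, not a trained machine learning model and not a method from the book mentioned. The function name `next_move_frequencies` is hypothetical, and the price series is a synthetic random walk, so by construction there is nothing to find in it.

```python
from collections import defaultdict

import numpy as np


def next_move_frequencies(prices, window=4):
    """For each up/down pattern of length `window`, estimate how often the next move is up."""
    moves = (np.diff(np.asarray(prices, dtype=float)) > 0).astype(int)  # 1 = up, 0 = down or flat
    counts = defaultdict(lambda: [0, 0])  # pattern -> [times seen, times followed by an up move]
    for i in range(len(moves) - window):
        pattern = tuple(int(m) for m in moves[i:i + window])
        counts[pattern][0] += 1
        counts[pattern][1] += int(moves[i + window])
    return {p: (seen, ups / seen) for p, (seen, ups) in counts.items()}


# Synthetic random walk as placeholder data (assumption, NOT market data).
rng = np.random.default_rng(1)
prices = 100.0 + np.cumsum(rng.normal(0.0, 1.0, 5_000))

freqs = next_move_frequencies(prices, window=4)
for pattern, (seen, p_up) in sorted(freqs.items()):
    label = "".join("U" if m else "D" for m in pattern)
    print(f"{label}: seen {seen:4d} times, next move up {p_up:.2f}")
```

On a random walk the estimated frequencies should hover around one half, which matches Yves' cautious position: the method only pays off if the patterns carry information in the first place. A machine learning classifier would replace the frequency table with a model over many more features.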
How is data science disrupting finance?
Hugo: For sure, and actually this reminds me, you've got a great two page piece on your website which we'll link to, which you've written called Computational Finance: Why Python is taking over. And you actually, you quote Robin Wigglesworth from the Financial Times there. And what Robin wrote is, traders used to be first class citizens or the financial world, which is exactly what you were just saying about them. So many thousands being on the floor, but that's not true anymore. Robin continues. He writes. "Technologists are the priority now" And this was three years ago, so I went on what you've seen now in terms of technologists being the priority and I presume by technologists you actually include working data scientists in that. So I'm wondering how data science and technology is continuing to disrupt finance.
Yves: Yeah, I mean this is exciting. This describes it and it's already quite a while ago when this statement was made actually and what we see now, I mean this is usually when you have something new then people try to rush in one particular direction. But I think now we're getting back to a point we'll say, ""Well we might need the market savvy people still." You know, when you hire somebody who is pretty good at programming but has no experience in markets, what would you expect from these people to program into the trading applications in terms of risk and their safety measures and so forth. A little bit of understanding is simply required in that sense, and what we see and think this will be kind of the near future at least is that, these days they try to merge the worlds in the sense of that people doing, let's say simple or rather from a data point of view, simple equity research.
Yves: They start using our fantastic technologies like pandas for data crunching, a visualization with, let's say, I didn't know with plotly and all these things that we use on a daily basis to become better at their jobs and maybe at the same time accomplish more or being able to crunch ever increasing amount of data and also the traders, I mean, history was kind of like in a way that there was one trader, maybe two people on the left and the right hand side of traders. They were in real time programming the excel spreadsheet applications. If the trader had a new idea, so these days the trader and people were responsible. There are probably quiet come up with their own solutions that use different techniques than excel spreadsheet where some people are sitting on the left and the right and doing kind of like real time tweaks while the trader is trading.
Yves: So this is one example that I recently retweeted from. The Fortune World said, "Well, forget what's choosing the language Citigroup, once it's incoming investment bank analysts to knows Python." So even in a field like investment banking or let's say people working for consulting companies, they are these days required to have some programming knowledge. This hasn't been the case before which M&A banker used Python like 10 years ago or five. No body there, but these days people are instead of expected to know a little bit about Excel, they are just because in every field they are facing these huge amounts of data and people now know that, it's much more efficient to crunch these numbers and data by technology such as Python. They are expected to know a little bit about programming as well. And I wouldn't say that everybody these days should become a software developer, architect or engineer, but you know, kind of like with a little bit of training, you can accomplish quite a lot compared to the traditional approaches in this field. So more like emerging like hybrids. I think hybrid is kind of a trendy word anyway. So like the hybrid skill set that is required, market knowledge, your background, maybe more banking side, market side or which department do you belong to, but nevertheless to know coding afterwards. It's a little bit like math, you know, math never hurts. And I think to know about programming and data science doesn't have either these days. So English, math and Python. These are, I think the three most important language that anybody should master before getting a job or changing jobs.
Hugo: Exactly. And that actually reminds me that, when asked why Python matters so much for finance, you've said that you consider Python to be the English of the data and financial world. And I'm wondering why that's the case.
Yves: Sure. Many people say, "Well, you know, this language has this fantastic feature" or "Julia is faster" or this and that. But take the investment bank analysts we just talked about, who are supposed to learn Python: how many languages should such a person, in their expected capacity, learn? I mean, it's hard to master a single language properly. So I'm coming more from a time constraint, resource constraint point of view where I say, "Well, if you only have time to learn a single programming language, then it should be Python." If you only have time to learn one foreign spoken language, for almost everybody around the world it is these days English. And this is where I see the parallels and say, "Well, not too many people easily learn three or four languages, neither spoken languages nor programming languages."
Hugo: But why Python? What is it about Python?
Yves: Python, why Python? When I started back in the day, this was for me the first proper contact, I must say, with a scripting language, which on a high level allowed, yeah, fast interactivity. Even back then, even without IPython, you could do amazing things on the shell and so forth. I wasn't used to that. When I grew up I actually started coding Assembly and BASIC on a Commodore C64, you know, this is where I came from.
Yves: Then I did C at university, with compile cycles and so forth, and then I came across this fantastic scripting language, and on top of the interactivity, it was so close to mathematical language. So when you have the financial theory, an equation or a few equations, it's pretty straightforward, and without that much training you can translate what the math and the finance say into Python. This is what got me hooked.
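To illustrate how closely Python can mirror a pricing equation, here is a minimal sketch. The choice of formula is an assumption made for illustration (the conversation does not name one): the Black-Scholes-Merton value of a European call option, written with the standard library only. The function and parameter names are hypothetical.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function, via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bsm_call_value(S0: float, K: float, T: float, r: float, sigma: float) -> float:
    """Black-Scholes-Merton value of a European call option.

    S0: initial price of the underlying, K: strike price, T: time to maturity
    in years, r: constant risk-free short rate, sigma: constant volatility.
    """
    # The two lines below read almost exactly like the textbook definitions of d1 and d2.
    d1 = (log(S0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    # Call value = S0 * N(d1) - K * exp(-r * T) * N(d2)
    return S0 * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)
```

For example, `bsm_call_value(100.0, 100.0, 1.0, 0.05, 0.2)` evaluates to roughly 10.45.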
Yves: I think this is not the major argument these days anymore. From my point of view, Python is the orchestrating language for all the technologies that you need there. I think it is the best language to use for data science, where it is a first-class citizen. When people these days use TensorFlow, they use Python as their interface. We have the fantastic scikit-learn package and many, many others. I can't even get to a somewhat comprehensive list in this regard. And I think this is what actually differentiates Python from all the other competitors. Let's start with the established ones in our field, C++, Java or C#, or take Julia, which is a typical competitor, not in terms of numbers but in terms of being pretty close with regard to the syntax and the approach and so forth: the ecosystem is missing.
Yves: And I think the ecosystem is what makes Python unique. And for everybody who wants to enter the financial field today, on top of all my arguments, which might be subjective, the point is simply that almost every institution has chosen Python as the core language. So if you are after some kind of career in our field, Python is simply a good thing to have on your CV, because you might prefer some other more exotic, maybe faster or whatever language, but if your potential employer doesn't use this language, you probably won't get too many plus points in your evaluation afterwards. So this is more the career aspect: if you know Python, you can work in many, many fields and many companies these days.
Hugo: Exactly. Having Python on your CV and resume is incredibly important, but now having Python in your GitHub repository is also a huge step in the interview process, right? When applying for jobs in finance.
Yves: Sure, this is one of the fantastic aspects of living in the open source age, that you really can showcase what you've been working on. Of course, most people who have a professional job are usually not allowed to talk about what they're doing. The financial industry is really secretive in this regard, and this is why the industry loses many fantastic people, we might say, to the more open companies. But due to regulation, legal and all these things, people are hardly ever allowed to really talk about what they are doing. But of course you can do stuff on the side, and you can easily build your programming or data science CV on platforms such as GitHub. On the other hand, of course, these days there are side effects.
Yves: There are some companies specialized in crawling platforms like GitHub and looking for people, so maybe replacing the search for talent on LinkedIn or via other means. So every once in a while I get approached there as well and asked whether I would be interested in changing jobs. Maybe they should improve their research: then they would have seen that I'm running my own company, and maybe they wouldn't even ask. But for other people, this might represent an opportunity to present themselves, to showcase what they can do, and also to learn and get feedback from the community. If they're working on something of interest to others, they might even win over contributors or get feedback, and maybe even some fame in the community.
Hugo: For sure. And something we've really been talking around, with the emergence of Python as such an important tool in finance and data science and the use of GitHub and Jupyter notebooks, is that we've seen a huge shift in the past several decades across the board, not only in finance but in academic research and all types of quantitative, computational disciplines: a move from proprietary tools to open source tools, right?
Yves: Yeah, sure. I mean, this is one of the benefits if you have a huge community behind it. I think hardly any commercial company can keep up with such a community effort. You have millions of users of Jupyter notebooks, and not only for Python: of course we have multiple other kernels, so you can use R or Julia kernels and use the same environment. This is fantastic and the project has been growing tremendously. I'm really happy about it. Also for us, since we are more or less a content creation company, it is a fantastic means of creating content and sharing it. For example, I'm currently writing the second edition of my Python for Finance book. It is basically all Jupyter notebook based, all the code is in Jupyter notebooks, and one of my books was actually written 100 percent in Jupyter notebooks. I programmed a little workflow behind it that translated it into LaTeX, and finally Wiley published what ... all started out in Jupyter notebooks.
Yves: So there are many, many things that you can do, also with regard to something I'm a big fan of: reproducible research. When I was growing up, doing my research, you had mostly printed research papers back then, and people presented just the results, but you could never get to the point where you could say, "Well, what data have they been working with? How did they crunch the data? Are these results that they presented reliable?" and so forth. And so today, with access to many open data sources, and by providing, for example, Jupyter notebooks with not only text but also the code that crunches the data and presents the results, it's fantastic when you're after reproducible research.
What do you need to get started in Financial Data Science?
Hugo: Exactly. So for someone who wanted to get started in Financial Data Science the subject we've been talking about, what are the main types of skills and technologies that they would need to know to work at the intersection of finance and data science?
Yves: Python for sure. I mean, if you believe in polls and overviews and surveys and so forth, people typically say that the most common combination is Python, R and SQL. So you somehow need some databases, although I think we have multiple other options these days, like HDF5, so we don't necessarily need SQL. But Python for sure. Maybe some R here and there if you don't find, say, a statistical package in the Python universe, but this has hardly ever been the case for what we have been doing for ourselves and also for clients. Then of course machine learning and AI in general: to know about the basics of statistics, to know about the algorithms, what unsupervised learning and supervised learning are, all the techniques there, logistic regression, Gaussian naive Bayes, whatever jumps to mind. I'm pretty sure you don't need to be an expert in every single area of this field.
Yves: So rather than knowing all the theory, it's more about an applied perspective: to know what is there and to know what to apply when, given certain datasets. Then there are the points which I usually summarize as the basics. For example, in our certificate program, which is the largest, longest-running online training program that we have, we have a complete area called tools and skills, where we teach basic tools and skills that people, from our point of view, should know at least a little bit about: using editors, processes, setting up environments, deploying stuff in the cloud, working a little bit with Docker containers. Of course, all the topics that I've just mentioned would probably each require complete study over the years, but knowing the basics typically helps you a lot: the basics of Linux, a few command line tools, some dev tools, Python packaging, publishing, testing, all of these topics, software engineering basics in general. You may not be an expert after that, but I think the first 10% already helps you with 70% of your problems.
Yves: Then about data storage: working with data is important of course, particularly in the financial field. Thinking of trading, if you implement backtesting programs, you need to be able to work quickly with huge amounts of data, to crunch the data, to store it correctly, to store your results and all these things, but also, for example, to work with streaming data, which is actually pretty important in our case. Not in every area do you have the need to process data in real time, but in finance this is generally the case. For example, when you do algo trading, you need to be able to digest tick data, streaming data in real time, to crunch it, to resample it, to come up with signals in real time based on your trained models and to act in real time.
Yves: So this is, from experience, something that people have a hard time getting started with, but it's simply required, because otherwise how do you want to automate things with regard to algo trading if you don't know how to work with sockets and streaming, and maybe even streaming visualization, to implement your trading strategies, to stay with this example. So these are, from my point of view, the skills and technologies that people should know about.
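The pipeline described above (digest ticks, resample them, derive a signal) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of any particular trading system: the tick source is an in-memory list standing in for a real socket feed, the function names are hypothetical, and a simple momentum rule stands in for a trained model.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple

Tick = Tuple[float, float]  # (timestamp in seconds, price)
Bar = Tuple[int, float]     # (bar index, last price seen in that bar)


def resample_last(ticks: Iterable[Tick], bar_seconds: float) -> Iterator[Bar]:
    """Turn a stream of ticks into fixed-length time bars (last price per bar)."""
    current_bar = None
    last_price = None
    for timestamp, price in ticks:
        bar = int(timestamp // bar_seconds)
        if current_bar is not None and bar != current_bar:
            # A tick from a new interval arrived, so the previous bar is complete.
            yield current_bar, last_price
        current_bar, last_price = bar, price
    if current_bar is not None:
        # Flush the final (possibly incomplete) bar when the stream ends.
        yield current_bar, last_price


def momentum_signals(bars: Iterable[Bar], lookback: int) -> Iterator[Tuple[int, int]]:
    """Yield (bar index, signal) with +1 long, -1 short, 0 neutral or not enough data.

    Stand-in for a trained model: the sign of the price change over `lookback` bars.
    """
    history = deque(maxlen=lookback + 1)
    for index, close in bars:
        history.append(close)
        if len(history) <= lookback:
            yield index, 0
            continue
        change = close - history[0]
        yield index, (change > 0) - (change < 0)


# Made-up ticks for illustration only; a live system would read them from a socket.
ticks = [(0.2, 100.0), (0.8, 101.0), (1.1, 102.0), (1.9, 103.0), (2.5, 101.0), (3.2, 99.0)]
signals = list(momentum_signals(resample_last(ticks, bar_seconds=1.0), lookback=1))
```

Because both functions are generators, each tick is processed as it arrives and nothing requires the full history in memory, which is the property that matters when the list is replaced by a live feed.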
Hugo: Great. That's very useful. I think it isn't as though you just need to learn all of these things straight away, right? I mean essentially you can learn them on the go in a project based way.
Yves: Yeah, sure. This is what we usually tell people as well. We say, well, we provide you with a basic overview and maybe skip some details in the beginning. So with our program we have, let's say, a 12 week structured schedule where we try to cover the bases, but afterwards it's more practical things, practical modules, and people are supposed to do, for example, a final project. There they can select a topic, but they are then expected to apply what they learned to different things, and this is, from experience, where people really learn about this stuff. Maybe you know about it after the formal education part, but you really learn about it once you get started applying it.
Yves: This is why people write back to me and say, "Well, I couldn't imagine that one day we'd be sitting here spinning up cloud instances and deploying Python code in the cloud and doing remote monitoring based on socket streaming and so forth. Now it feels like second nature." But I'm pretty sure that when these people saw for the first time what this is all about, they thought, "Oh, I will never master that." So this is kind of a natural reaction. But of course it helps to apply it to something that interests you, when you have a purpose. This is when you get started learning about stuff and learning to master it.
Hugo: Exactly. You've partly answered this already, but where would you suggest beginners who want to work in Financial Data Science look?
Yves: I can hardly say anything else, because it's at the core of our business that we provide online trainings and also live trainings in the form of bootcamps, usually in London and New York. If you look carefully at our pages, we have a kind of broad offering which has been growing over the years and which of course is influenced by what we have been doing for more than 10 years. I've been working for the biggest hedge funds, for big banks and other financial institutions in this regard. We know what is expected, and this is on one of our subpages, training.tpq.io. The certificate that I mentioned before is really our flagship offering, where we have a 16 week program after which we hope that you're able to use Python for algo trading or for other financial things. But algo trading is the focus, and we are even able to award a German university certificate, because we are cooperating here with our local university. If you're doing a master's program within Europe, this is even good for five ECTS points (European Credit Transfer System), which might be interesting, though of course only for current or future students. But it's a formal certificate from a university for Python and algo trading or finance, depending on you.
Hugo: Fantastic. We'll make sure to link to those in the show notes as well.
Yves: Sure. Great.
What does the future of data science in finance look like?
Hugo: So in general, what does the future of data science in finance look like to you?
Yves: This is, from my point of view, an easy one. Data science will become a core discipline of finance. I said it before: we come from an era where brains and math equations have been driving finance, and in this regard data-driven finance is what will replace it. So we might lose quite a bit of the beauty of financial equations and modeling and so forth, but on the other hand we might get back what I like to call the scientific method in this context: we start with the data, we have a deep look at the data in every area that we have in the financial industry, and we apply the new algorithms. In that sense I think finance will be more driven by stuff that is developed outside of our industry than ever before. The nice theories that are still around and are applied, some of them successfully, others not that much, usually came from finance professors or finance practitioners.
Yves: But now people start using stuff maybe developed within, let's say, Google, where the major point was to have a good algorithm to play Go or to build a self-driving car, so a completely different background. But that background can easily be changed to a financial background with some adjustments, and in this regard I'm happy that since March this year we have the first proper book about financial machine learning. Many algorithms can easily be transferred, but we have some specifics to consider when we apply the algorithms: with regard to the data, how we crunch the data, how we manage the data, and what the special things are when we look at a financial time series in contrast to, let's say, a physical time series of whatever kind. In that sense it will become commonplace, not something special. It will become a core thing, just as when you ask, "Well, what do you think of technology in the banking industry?", many people say banks are essentially technology companies. I think the data-driven, AI-first finance future is not too far away, from my point of view.
Hugo: Awesome. So a question that popped into my head during that was, given that data science and data literacy are such important fundamental skills in finance now and will be more so, how long would it take for a working data scientist to get up and running to work in finance? Like someone who knows the scientific Python stack for example?
Yves: It depends a little bit on the area. I mean, the more quant oriented it is, the more sophisticated the financial stuff is that you are faced with, I think the longer it might take. There are some areas, again, on the retail side, which I'm not that much concerned with, where people might find it pretty easy because on the retail side, facing many consumers, this is similar to what people do on social media or the recommender engines that I mentioned a couple of times, I think there it's pretty straightforward to get up and running immediately.
Yves: But in other fields, depending on what you do, for example getting back to computational finance and derivatives pricing, there are now a couple of research papers, and last year I also gave a one-day elective, about the potential of applying machine learning techniques in the field of derivatives pricing and quantitative finance in general. There of course you need to have the background for what has been there before and what the basic rules are in order to apply this stuff, and there I think it might take a little while to get up and running. But in the end it all boils down to how good your math background is and what your programming and data science background is.
Yves: English of course is something I would assume is there, and with these three basic skills, you then learn the financial lingo and so forth. This is then usually the easy step, but the math, in some areas, you simply need to have in order to get up and running with the stuff that you're supposed to do there.
Call to Action
Hugo: So my final question is, do you have a final call to action for all our listeners out there?
Yves: Yeah, sure. I mean, focusing again on the Python for finance side: it started, let's say, three years ago, when people reached out and said, "Well, I'm interested in machine learning. I want to apply it in finance. I have this and that background and want to make a move." And still today, I must say, when people try to get into the field and to apply data science and machine learning and try to profit, be it within a corporate context or as a retail algo trader, let's say, trading their own cash positions, it's still hard to get up and running. I can only recommend, from experience and since we're doing it, to look for an appropriate integrated training program. You have so many things for free today. You have so many university-based things that might be remotely related to what you're looking for. But many people have told me and confirmed that they wasted months and months trying to look for that stuff on their own. So we have made a living out of putting this all together into integrated training programs and documentation. For example, with our program you get a 1,200-page documentation only about Python for finance and algo trading, so it's quite a bit. It doesn't necessarily have to be what we do, since not everybody is interested in algo trading, but find something like this, where you can say, "Well, this is a good program that provides me with the broad overview, but also with the details that I need to get started."
Yves: Then you should focus on the basics, and this is where many people in our program, I wouldn't say really complain, but they say, "Well, I'm having a really hard time with learning the basics." So for example, for people who have maybe just a Windows background, having used the regular tools there and so forth, setting up a droplet for the first time and moving around in a Linux environment might seem like a really difficult thing. But as I mentioned before, some people say after a few weeks, "Now it feels like second nature." This means you can get to the point where you know how to use Vim via SSH access to a cloud instance, deploy your automated code, and work with sockets and whatever. So master the basics, look at the nice tools that are there in the Linux world, and master the processes there.
Yves: I think then, once you are through the training and have the basics ready, be it more on the finance side or more on the programming side, you should get started with implementing many little projects that interest you. This is what we typically do as training samples. You might say, "Well, I want to have a little app where I simply put in, let's say, a stock symbol, and my Flask app that I host on a DigitalOcean droplet then shows me a chart with some simple moving average." For somebody who knows what to do, this might take less than an hour, but if you are just getting started with these things, it might take a week or two. But to have these projects where you indeed have an end result that you might even showcase to your colleagues or friends, or that you might put on GitHub or whatever, this is, I think, where the learning curve gets really steep.
Yves: Towards the end, I'm pretty sure everybody who wants to enter this field and wants to learn about all these fantastic technologies and the challenges in the field might have something in their head like building their own algorithmic trading operation, or coming up with their own derivatives pricer, or coming up with a machine learning application for portfolio management, for example. After that, they should come up with such a project, scope it, specify it and get started, and then work over weeks or even months and build something huge. I think this is where you collect on the way all the pieces that might have been missing before, because you have really been focusing on a few things. But once you get to the point where you master such a huge project, I think you can say, "Now I'm proficient enough to work in the industry or to get my major project up and running."
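As a rough idea of how small the stock-symbol app mentioned above can start out, here is a minimal sketch. Several things in it are assumptions for illustration: it uses a plain WSGI callable from the Python standard library instead of Flask (Flask apps are WSGI applications too), it returns JSON instead of drawing a chart, and the prices are made-up sample values rather than real market data. All names are hypothetical.

```python
import json
from urllib.parse import parse_qs

# Made-up closing prices, for illustration only (not real market data).
SAMPLE_PRICES = {"DEMO": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]}


def simple_moving_average(prices, window):
    """Return the simple moving average for every full window of `prices`."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        sum(prices[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(prices))
    ]


def app(environ, start_response):
    """WSGI callable answering requests such as /?symbol=DEMO&window=3."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    symbol = query.get("symbol", [""])[0].upper()
    try:
        window = int(query.get("window", ["3"])[0])
    except ValueError:
        window = 0  # treated as invalid below
    prices = SAMPLE_PRICES.get(symbol)
    if prices is None or window < 1:
        status = "400 Bad Request"
        payload = {"error": "unknown symbol or invalid window"}
    else:
        status = "200 OK"
        payload = {
            "symbol": symbol,
            "window": window,
            "sma": simple_moving_average(prices, window),
        }
    body = json.dumps(payload).encode("utf-8")
    headers = [("Content-Type", "application/json"), ("Content-Length", str(len(body)))]
    start_response(status, headers)
    return [body]

# To try it locally, one option is the standard library's development server:
#   from wsgiref.simple_server import make_server
#   make_server("127.0.0.1", 8000, app).serve_forever()
```

From here, the natural next steps match the project described in the conversation: swap the sample dictionary for a real data source, render a chart instead of JSON, and deploy the app on a cloud instance.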
Hugo: Fantastic. So to recap: select an appropriate integrated training program, then master the basics, then get your hands dirty with as many little Python projects as possible, and then do a larger, more challenging project.
Hugo: Fantastic. Yves, it has been such a pleasure having you on the show.
Yves: Thanks for having me. It's a pleasure as well.