
Inside Meta's Biggest and Best Open-Source AI Model Yet with Thomas Scialom, Co-Creator of Llama3

Adel and Thomas explore Llama 3.1 405B, its new features and improved performance, the challenges of training LLMs, the future of LLMs and AI, open- vs closed-source models, current research and future trends, and much more.
Jul 2024

Guest
Thomas Scialom

Thomas Scialom is a Senior Staff Research Scientist (LLMs) at Meta AI, and is one of the co-creators of the Llama family of models. Prior to joining Meta, Thomas worked as a Teacher, Lecturer, Speaker and Quant Trading Researcher.


Host
Adel Nehme

Adel is a Data Science educator, speaker, and Evangelist at DataCamp where he has released various courses and live training on data analysis, machine learning, and data engineering. He is passionate about spreading data skills and data literacy throughout organizations and the intersection of technology and society. He has an MSc in Data Science and Business Analytics. In his free time, you can find him hanging out with his cat Louis.

Key Quotes

I think we are reaching a level of intelligence where this kind of agentic behavior will start to increase more and more. It's already the case. And for that, you need a model that is able to produce structured output, to reason, to write a plan, to evaluate step by step, and to refine its own output.

I would say we have filled the gap between open source and closed source. Think about it: one year ago, we were releasing Llama 2. I think it was a few days short of one year ago. Before that, you had zero open-source chat models. We have come a long way. Now there are more and more open-source models that are better and better. Hopefully with this one, we'll close the gap even further. Now the question is how to have open source be number one and not just number two. We're working hard on that. The trend is pretty good.

Key Takeaways

1. Focus on developing agentic capabilities in models to enhance their ability to use tools, self-refine, and perform complex tasks, which can significantly increase their utility and intelligence.

2. Incorporate synthetic data generated by the model itself during reinforcement learning from human feedback (RLHF), as this can often be of higher quality than human-written data.

3. Invest in creating smaller, highly efficient models that can serve a large number of users effectively, making sure they are trained on a very large number of tokens to maximize performance without compromising inference time.


Transcript

Thomas Scialom (00:00):

We are reaching a level of intelligence where this kind of behavior will start to increase more and more. It's already the case, and for that you need a model that is able to produce structured output, to reason, to write a plan, to evaluate step by step, and to refine its own output.

Richie (00:21):

Welcome to DataFramed. This is Richie, introducing an episode on behalf of Adel, since he's on vacation. Meta's Llama 3.1 family of large language models has just been released, and it's pretty exciting. It's the biggest open-source LLM so far, and we are eager to see how well it performs in real-world usage against current performance leaders like GPT-4o, Claude and Gemini. The developments in large language models over the last few years have been astounding, and both Adel and I have been keen to hear a developer's perspective: just what do you need to do to create these high-performance foundation AI models? And looking to the future, there are some wildly differing opinions, even amongst experts, about whether the dramatic increases in performance can continue over the next few years or whether we'll hit a plateau. Again, I'm curious to hear the inside scoop from an AI developer.

(01:22):

Today's guest is Thomas Scialom, a Senior AI Research Scientist at Meta. His research is focused on generative AI, as you might imagine, and more specifically on developing the Llama series of LLMs and its surrounding ecosystem of tools. That is, he's helped create much of the technology we're talking about today. Previously, Thomas was a partner at TEL, an AI research lab, and he switched to working in AI from being an investment banker trading exotic commodity derivatives. All in all, he's ideally positioned to tell us about Llama and the developments in generative AI. So I'm as excited as you are to hear his conversation with Adel.

Adel (02:15):

Thomas, it's great to have you on the show.

Thomas Scialom (02:17):

Thanks for having me.

Adel (02:18):

You are a senior staff research scientist on LLMs at Meta's AI organization. You've been integral to the training of the Llama models, arguably some of the most widely used and best open-source LLMs today. Maybe to set the stage and start off our chat, what has it been like being at the center of such an important moment in AI?

Thomas Scialom (02:39):

Right. Look, it's weird. A quick anecdote I can tell you: I'm based in Paris, where AI is very hyped, but less than in San Francisco. When I went to San Francisco with my wife a few months ago, we met people in the street saying, "Oh, you're the guy from Llama," and she realized it's something there. Here, all my friends are like, "Oh, you're working on AI, cool. Do you do data science?"

Adel (03:12):

Yeah. In San Francisco, people were stopping you on the street. I think I recently saw a video of Sam Altman with paparazzi behind him. It's a pretty interesting moment in AI today. So you just unveiled Llama 405B, which is an incredible milestone in open-source AI. Maybe walk us through Llama 405B in a bit more depth, which I think is the biggest open-source LLM out there. What makes it unique compared to other open-source LLMs?

Thomas Scialom (03:40):

Well, the size of course, but that's just one scaling dimension. The other is the scale of data: we trained it on something like 15 trillion tokens. With respect to the preview release we did at the time, we have done much more post-training and RLHF, so the alignment is also much better, and the resulting performance follows. We'll see the results on evals and other benchmarks, but I'm hopeful that it'll be the best-performing open-source model out there. It has long context. It has multimodal and other capabilities, although we may not release them. So it's really closing the gap with the frontier models.
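To put "something like 15 trillion tokens" in perspective, here is a back-of-the-envelope sketch using the widely cited C ≈ 6·N·D approximation for dense transformer training compute (N parameters, D training tokens). The rule of thumb comes from the scaling-law literature, not from this conversation, and it ignores attention FLOPs, restarts and other overheads, so treat the result as an order-of-magnitude estimate only.

```python
def approx_training_flops(n_params: float, n_tokens: float) -> float:
    """Rule-of-thumb training compute for a dense transformer: C ~= 6 * N * D.

    The factor 6 counts roughly 2 FLOPs per parameter per token for the
    forward pass and about 4 for the backward pass.
    """
    return 6.0 * n_params * n_tokens


if __name__ == "__main__":
    # Figures mentioned in the interview: 405B parameters, ~15T tokens.
    flops = approx_training_flops(405e9, 15e12)
    print(f"~{flops:.2e} FLOPs")
```

The estimate lands in the mid-10^25 FLOP range, which is why running many full-scale ablations, as described next, is out of the question.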

Adel (04:25):

And we're definitely going to deep dive into the training of Llama 405B and all of these models. Training these models is quite a technical challenge, especially at the scale of Llama 405B. But before we talk about Llama 405B specifically, for training large language models in general, what are some of the biggest challenges in developing these models, and how do you navigate them?

Thomas Scialom (04:51):

The biggest challenge to me is how to balance exploration and exploitation. When you do experiments, you cannot do them at 405B parameters on 15 trillion tokens and run 20 ablations. That's just not possible. Even at smaller sizes, to be honest, like 70B, you put a lot of resources into each run. So you need to be confident about the parameters, the data, the training mix, everything. To choose them, you need to run smaller-scale experiments and then infer the right parameters at the larger scale. You need to infer the right data mix. A simple example: what is the impact of multiple epochs, that is, repeating the same data multiple times? At the larger scale there are more weights, the model memorizes more, and so you could see new phenomena that you didn't observe at a smaller scale. Because we just can't grid search everything, deciding from the small-scale exploration how to make the trade-offs optimal for the large-scale run is tricky. And even balancing the resources between exploration and exploitation is very challenging.
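One common way to "infer the right parameters at the larger scale" is to fit a power law to results from small runs and extrapolate. The sketch below does a least-squares fit of loss ≈ a·C^b in log-log space. The data points are synthetic, and the pure power-law form (no irreducible-loss term) is a simplifying assumption; this is an illustration of the general technique, not a description of Meta's actual methodology.

```python
import math


def fit_power_law(compute: list[float], loss: list[float]) -> tuple[float, float]:
    """Fit loss ~= a * compute**b by linear least squares on log(loss) vs log(compute)."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(v) for v in loss]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    b = cov / var                        # slope in log-log space = exponent
    a = math.exp(mean_y - b * mean_x)    # intercept mapped back to linear scale
    return a, b


def predict_loss(a: float, b: float, compute: float) -> float:
    return a * compute ** b


if __name__ == "__main__":
    # Hypothetical small-scale runs (training FLOPs, final loss); synthetic data.
    small_compute = [1e19, 3e19, 1e20, 3e20, 1e21]
    small_loss = [20.0 * c ** -0.05 for c in small_compute]
    a, b = fit_power_law(small_compute, small_loss)
    # Extrapolate four orders of magnitude beyond the largest small run.
    print(f"a={a:.3f}, b={b:.4f}, predicted loss at 1e25 FLOPs: {predict_loss(a, b, 1e25):.3f}")
```

The caveat in the answer above applies directly: a fit like this only captures trends visible at small scale, so effects that only appear at large scale (such as extra memorization when data is repeated over multiple epochs) are exactly what it can miss.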

Adel (06:12):

And maybe walk us through some of the best practices you've developed while doing that balancing act between exploitation and exploration. What would be your advice for the small handful of other teams looking to train large language models?

Thomas Scialom (06:24):

I think in general the best advice is to follow first principles, like garbage in, garbage out. For instance, data is probably one of the most important things. So make sure that the data you collect is high quality: review it manually, have processes for quality analysis, use classifiers (we use Llama 2 to do that and to classify the training documents), and validate your ideas and intuitions with runs at a smaller scale where you see the improvement. It's never guaranteed, but it's likely to lead to gains at a larger scale. And I would say, at every stage of the pipeline, be very, very rigorous and make your best effort to have a best-in-class pipeline.

Adel (07:14):

Okay, that's great. You mentioned the data, making sure it's high quality, and doing qualitative analysis. I'd love to learn a bit more about that, because a massive amount of data goes into training these models. How do you make sure this data is set up in a way that you're able to ingest it into these models and have confidence in the output? What are the processes you adopt to make sure the data is high quality? Because that's a big aspect of making sure the model is successful in the long run.

Thomas Scialom (07:43):

For pre-training in particular, you basically just collect web tokens. You have famous datasets like Common Crawl, but the truth is that the web is garbage. If you look at a random paragraph or document from these datasets, it's random noise, and clearly there's something wrong with training your model on that much noise; it's a waste of compute. So the first basic thing to do is: can you train a classifier, using some heuristics or machine learning models, to select what is good? Then, can you do some topic classification? Maybe you can rebalance your data mixture so that some domains aren't at only 0.5% and it's more balanced, so you don't miss knowledge. Can you add additional sources? What is your pipeline to clean the data? For instance, we developed Nougat last year, which performs scientific OCR. Putting a lot of effort into this model so that you end up with high-quality text from scientific PDFs leads to improvements on reasoning, for instance.
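As a toy illustration of the heuristic filtering step, here is a rule-based filter over a few common surface signals: document length, fraction of alphabetic characters, and fraction of duplicated lines (typical of navigation menus and boilerplate). The features and thresholds are hypothetical; production pipelines, including the model-based quality and topic classifiers mentioned above, are considerably more involved.

```python
import re
from collections import Counter


def quality_features(doc: str) -> dict[str, float]:
    """Compute a few cheap surface statistics for a web document."""
    words = re.findall(r"\w+", doc)
    lines = [line.strip() for line in doc.splitlines() if line.strip()]
    alpha_frac = sum(ch.isalpha() for ch in doc) / max(len(doc), 1)
    dup_frac = 0.0
    if lines:
        counts = Counter(lines)
        # Fraction of non-empty lines that appear more than once (menus, footers).
        dup_frac = sum(c for c in counts.values() if c > 1) / len(lines)
    return {"n_words": float(len(words)), "alpha_frac": alpha_frac, "dup_line_frac": dup_frac}


def keep_document(doc: str, min_words: int = 50, min_alpha: float = 0.6,
                  max_dup: float = 0.3) -> bool:
    """Hypothetical thresholds: keep long, mostly alphabetic, non-repetitive text."""
    f = quality_features(doc)
    return (f["n_words"] >= min_words
            and f["alpha_frac"] >= min_alpha
            and f["dup_line_frac"] <= max_dup)


if __name__ == "__main__":
    prose = "Language models learn statistical patterns from text. " * 10
    menu = "Home | Login | Share\n" * 30
    print(keep_document(prose), keep_document(menu))  # True False
```

Cheap rules like these are typically run first because they are fast enough for web-scale data; a learned classifier can then score whatever survives.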

Adel (08:58):

Okay, that's really great. Here we're talking about the pre-training aspect, but you also mentioned there's a lot of work that happens in post-training and reinforcement learning from human feedback. I'd love to learn about that aspect of training a model as well, and what it looks like in practice.

Thomas Scialom (09:14):

Overall, the general recipe is: you do supervised fine-tuning, where you start from your model, which just does next-token prediction, and you align it to follow instructions in the chat-model style. In supervised fine-tuning, the data is generally what a human would have written as the expected answer from a chat model. Then you do reinforcement learning from human feedback. An interesting thing to note here is that we stated in Llama 2 that generations from the model itself after RLHF, that is, synthetic data, were of higher quality than human-written outputs for a lot of tasks. So if you follow this principle, what should we initialize the Llama 3 model with as supervised fine-tuning data? We could ask humans, or take the supervised fine-tuning data we used for Llama 2, but that is actually worse than what Llama 2 generates. So we used Llama 2 to generate the first round of supervised fine-tuning data: for round one, since we had multiple rounds of RLHF, we used Llama 2 to initialize with synthetic data. Once you have that, you have Llama 3 supervised on Llama 2 outputs, and then you start the rounds: you collect new preferences with this model, you train your reward model, you sample with Llama 3, you do reinforcement learning, and so on.
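The reward model in this loop is trained on the collected preference pairs. A standard choice is a pairwise (Bradley-Terry style) loss, -log σ(r_chosen - r_rejected); the Llama 2 paper describes a binary ranking loss of this form with an optional margin term. The sketch below implements only that loss on scalar rewards, in a numerically stable way. The reward model itself is out of scope, and this is an illustration of the general technique rather than Meta's training code.

```python
import math


def pairwise_ranking_loss(r_chosen: float, r_rejected: float, margin: float = 0.0) -> float:
    """Return -log(sigmoid(r_chosen - r_rejected - margin)), computed stably.

    Low when the reward model scores the preferred answer well above the
    rejected one; high when it ranks them the wrong way around.
    """
    z = r_chosen - r_rejected - margin
    if z >= 0:
        return math.log1p(math.exp(-z))
    # For negative z, rewrite log(1 + e^-z) as -z + log(1 + e^z) to avoid overflow.
    return -z + math.log1p(math.exp(z))


def mean_batch_loss(pairs: list[tuple[float, float]], margin: float = 0.0) -> float:
    """Average the loss over (r_chosen, r_rejected) pairs in a batch."""
    return sum(pairwise_ranking_loss(c, r, margin) for c, r in pairs) / len(pairs)
```

When the two rewards are equal the loss is log 2; it shrinks toward zero as the preferred answer's reward pulls ahead and grows roughly linearly when the ranking is reversed.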

Adel (10:47):

That's fascinating. Essentially you reach a point where you have a self-sustaining loop of producing data: you're improving the model, which creates supervised fine-tuning data, which improves the model. At what point do you reach a limit on improvement in that phase?

Thomas Scialom (11:01):

We have so many open research questions in RLHF, and this is one of them. We haven't really seen a plateau yet, but we are also in a process where we try to nudge annotators to always ask harder and harder prompts, at the edge of the model's capabilities. "Write a poem about language models" is something the model now does pretty well, so we ask annotators to ask less of that and more about something more complex, but not too complex. Again, the annotation process here is that you write a prompt, you sample two answers from your model, and you ask an annotator which one they prefer. If the two answers are bad, you basically don't have any signal. So you always need to be at the edge. One difference from Llama 2 I can emphasize is the teacher mode we had, which is basically for when the two answers are bad. Sometimes, not always, we can enter what we call teacher mode, where an annotator can write a critique saying, "Even if I prefer this answer to the other one, it's pretty bad for this and this reason," and then edit the answer to make it better.
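The annotation protocol described here (two sampled answers, a preference, and a "teacher mode" when both are bad) can be captured in a simple record type. The field names, and the way an edited answer is turned into extra training pairs, are hypothetical choices for illustration, not Meta's actual data schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PreferenceAnnotation:
    prompt: str
    response_a: str
    response_b: str
    preferred: str                         # "a" or "b", chosen by the annotator
    teacher_mode: bool = False             # set when both sampled answers are bad
    critique: Optional[str] = None         # why even the preferred answer is bad
    edited_response: Optional[str] = None  # annotator's improved answer

    def __post_init__(self) -> None:
        if self.preferred not in ("a", "b"):
            raise ValueError("preferred must be 'a' or 'b'")
        if self.teacher_mode and not self.edited_response:
            raise ValueError("teacher mode requires an edited response")

    def preference_pairs(self) -> list[tuple[str, str, str]]:
        """Return (prompt, chosen, rejected) triples for reward-model training."""
        if self.preferred == "a":
            chosen, rejected = self.response_a, self.response_b
        else:
            chosen, rejected = self.response_b, self.response_a
        pairs = [(self.prompt, chosen, rejected)]
        if self.teacher_mode:
            # Assumption: the human-edited answer is ranked above both model samples.
            pairs.append((self.prompt, self.edited_response, chosen))
            pairs.append((self.prompt, self.edited_response, rejected))
        return pairs
```

The design choice to emit extra pairs is what gives teacher mode its value in this sketch: a comparison between two bad answers carries little signal, while the edited answer provides a clearly better target.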

Adel (12:13):

And that creates another way to retrain the model with human

Thomas Scialom (12:16):

Exactly, and to have a shortcut when the model is just bad.

Adel (12:21):

Okay, that's very fascinating. So one of the open questions is that this has yet to plateau, and I think this is a great segue to discuss the broader generative AI landscape, because there are quite a few open questions on LLMs today and what the next generation of LLMs will look like. The leap in intelligence seems to be getting smaller with each release. For example, the jumps from GPT-2 to GPT-3, from GPT-3 to GPT-3.5, and from 3.5 to 4 seem to be getting slightly smaller with each iteration. So first, I'd love to hear whether you think that's correct. And second, do you anticipate another significant leap in intelligence soon? If so, what does it look like and what's driving it?

Thomas Scialom (13:05):

If you think in terms of orders of magnitude, yes, I anticipate much bigger jumps than what we had. I think it's easy to measure the gap in intelligence, or not intelligence, let me say knowledge, between a five-year-old kid and an 18-year-old, but harder between an 18-year-old and a 30-year-old.

Adel (13:27):

Yeah, that's true. Yeah,

Thomas Scialom (13:28):

I think we are seeing kind of the same thing with those models. They memorize massive amounts of knowledge, and at some point they know almost everything, so maybe the gap in terms of knowledge starts to get smaller and smaller, maybe beyond the long tail, but that's harder to measure, so people's perception is that the gap is smaller. I'd add that in terms of intelligence and reasoning, that might not be true. There are tasks, for instance on math benchmarks, where we have seen immense progress from GPT-2 to GPT-3, GPT-4, and now Gemini, Claude...

Adel (14:08):

3.5

Thomas Scialom (14:09):

And almost 90%, I think, which is one of the signs of progress. I'll also add that at some point, what matters is not knowledge. It's fine to not know something if you know how to find it by browsing the internet. So I think we are reaching a level of intelligence where this kind of behavior will start to increase more and more. It's already the case, and for that you need a model that is able to produce structured output, to reason, to write a plan, to evaluate step by step, and to refine its own output. Ultimately, I think one of the key features is self-critique. Right now, in the RLHF stage, everything is synthetic, yet we still have a human at some point who says this output is better than that output. We may be seeing early signs that the model is able to judge, though in some contexts it's still limited. And while we will not see these kinds of skills improve on some main benchmarks, which are very knowledge-intensive anyway, the gap in improvement on these new skills and abilities may increase, which makes me think that the gap is just not narrowing with each generation.

(15:20):

The second thing is that when you reach a certain level of intelligence, you can compound it with additional orders of magnitude of improvement. Take tool use: we did Toolformer one year ago, and now with instruct models, in the same chat, you can give a tool and its API documentation in natural language, and the model will learn to leverage it. And, as I was saying before, it can learn to search for information online if it doesn't know something; use a calculator, code execution and all those tools; take multiple steps and self-refine, saying, "Oh, now it's a dead end, can I explore another way to solve this?" All of that will compound into orders of magnitude more intelligent behaviors and capabilities that may not be reflected in pure LLM performance on some benchmarks, but that a new generation of models will enable.
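Here is a minimal sketch of the tool-use pattern described above: tools are listed with natural-language descriptions, the model emits a structured call, and the host program executes it and returns the result. The JSON call format, the tool registry and the toy calculator are hypothetical stand-ins, not Llama 3.1's actual function-calling prompt format, and the model's output is simply hard-coded.

```python
import ast
import json
import operator

# Arithmetic operators the toy calculator accepts; anything else is rejected.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}


def safe_arithmetic(expr: str) -> float:
    """Evaluate +, -, *, /, ** over numeric literals without calling eval()."""
    def ev(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.operand))
        raise ValueError(f"unsupported expression: {expr!r}")
    return ev(ast.parse(expr, mode="eval"))


# Hypothetical registry: name -> (natural-language description, implementation).
TOOLS = {
    "calculator": ("Evaluate an arithmetic expression given as 'expression'.",
                   lambda args: safe_arithmetic(args["expression"])),
}


def tool_documentation() -> str:
    """Text that would be placed in the prompt so the model knows what it can call."""
    return "\n".join(f"- {name}: {desc}" for name, (desc, _) in TOOLS.items())


def dispatch(model_output: str):
    """Parse a JSON tool call emitted by the model and run the matching tool."""
    call = json.loads(model_output)
    _, fn = TOOLS[call["name"]]
    return fn(call["arguments"])


if __name__ == "__main__":
    print(tool_documentation())
    # Stand-in for a model response that decided to use the calculator.
    print(dispatch('{"name": "calculator", "arguments": {"expression": "2 * (3 + 4)"}}'))
```

In an agent loop, the tool's result would be appended to the conversation so the model can continue, retry, or take another step.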

Adel (16:12):

That's a really great insight, and I couldn't agree more. I think the agentic path will definitely drive a lot more usefulness, even if you may not see that come out very directly on LLM benchmarks. But to expand on that, how much of that agentic workflow is enabled by software engineering (giving LLMs a layer of tool access, making sure there's a recursive nature to using the tools, and all that engineering work) versus the training of the model? How do you balance those to create a really strong agent? You need both, but where are we now in terms of intelligence to enable this set of agent use cases?

Thomas Scialom (16:50):

So we created a benchmark last year called GAIA, a benchmark for General AI Assistants. It was interesting because GPT-4 scored something like 5%, and GPT-3.5 even less. It's made of tasks that require a lot of different steps, so complex tasks on which the models fail, and for which the answer is not present on the internet. An example I can give you: NASA published a photo of an astronaut on Twitter at some point, and a few months later they published a similar photo, under the same conditions, with another astronaut. Can you tell me the number of days between the two? This information is not stated on the internet, so there's no memorization possible; the model has to do the steps. It's a kind of question that doesn't require any expert skills and that humans solve easily given time, but the models totally fail. Over the last year, what we have seen is agent frameworks like AutoGen, Copilot and Hugging Face Agents that deal with what you mentioned: connecting tools, self-refinement loops, and things like that. This was critical. It moved performance from something like 10% with GPT-4 to something like 30%. It's yet to be solved, but the progress is tremendous. That being said, if you power those agents with GPT-4, you get that performance gain; if you power them with GPT-3.5, performance is zero. So the gap between the two in terms of capabilities is crazy.

Adel (18:34):

In capabilities. Okay. And you can imagine that GPT-5, or Llama 405B, sorry, will drive a lot of the progress there.

Thomas Scialom (18:42):

Yes, that's true, and we put a lot of effort into that. I think we emphasize tool use in post-training in the paper. I think Llama 3 is very good at zero-shot function calling, which is a key feature that so many people ask about. So I recommend testing it for that.

Adel (18:59):

And definitely the community is going to play a lot with Llama 405B and the Llama 3 family. So maybe one additional thing on the state of generative AI: do you believe that scaling laws will continue to enhance the performance of models as we increase the size of the data, the training runs and the compute? There's this famous Twitter image with a small circle for GPT-4 and a massive circle for GPT-5, which will supposedly have 1000x the performance, and at this point it has become a meme. Do you see scaling laws continuing as we see today, or are we reaching a plateau?

Thomas Scialom (19:33):

Yes and no. I think it's a bit nuanced. I think that, absolutely, scaling will lead to improvement. Yes. That being said, what do you want to scale? There are so many improvements in terms of algorithms, and on the training side, that lead to a better scaling slope. That's why today's 7B models are often better than what we had two years ago, or than GPT-3.5, which was much bigger, things like that. I think there's a 7B model that outperforms Minerva, which was based on, if I'm correct, 300B parameters, I forget, so 50 times bigger. So there are so many improvements that come from algorithmic optimization, from data quality, from the training method, and from additional external things like tool use. That could end in a better loop with synthetic data that augments the next-token prediction method. So yes, scaling is the way, but not scaling just anything. If we had scaled LSTMs, we would be nowhere today. So yes, scaling will keep playing the biggest role, but if we scale by one to three orders of magnitude, we can maybe get 10 or 20 more out of that scaling by improving all the other aspects.

Adel (20:55):

That's great. So quality of data and algorithmic optimization are also big levers beyond just additional data and compute. One thing we saw this year as well, especially with GPT-4o, was multimodal capabilities. Multimodal capabilities definitely improve a model's usefulness, but how crucial is it to train models on these different types of data to boost raw capabilities? Maybe intelligence is not the best word, but their ability to perform tasks. So walk me through the importance of having different types of data for more performant models.

Thomas Scialom (21:32):

First, you want your model to be able to process ideally any input to any output, and that's much smoother and more scalable than a very complex orchestration. That being said, how much multimodal training transfers to, let's say, text is a bit unclear. My intuition is that either we didn't crack the problem yet, or, another hypothesis, text is so dense in information (what it contains is somehow a log of thoughts) that you need so many more images to match the signal you have in text. Maybe in audio there's an argument that end-to-end speech models, with intonations and things like that, lead to improvements. That's something we haven't explored yet on our side.

Adel (22:31):

And then maybe one last question on the state of AI and the technical landscape we're living in today. You mentioned that if we had scaled LSTMs, we wouldn't be where we are today. Gary Marcus famously repeats the notion that deep learning is hitting a wall and that LLMs and generative AI will plateau as time goes by. Granted, he's been saying this for at least the past five years, and we've been seeing improvements all throughout. Do you think deep learning and the transformer paradigm are reaching their limits?

Thomas Scialom (23:00):

No. Anyone who said that in the past has been proven wrong in a very short time.

(23:08):

No. What I can tell you is, look, as I was saying before about exploration versus exploitation: I think if we had zero more breakthroughs in the next decade, we would still have five to 10 years just to explore and understand all the breakthroughs and revolutions we just had. That would lead to significant but only incremental progress over the next 10 years: integrating all the modalities together, agent behaviors done well, inference-time computation, better post-training. We are still at an early age. All those things together will probably lead to much better models. And there are so many resources put in there, not just money but people working on it, more and more every day, that it's likely, given the last decade of deep learning, that a breakthrough will come again. What it'll be, I don't know. We have seen post-training and RLHF methods, we have seen Transformers, we have seen self-supervision, and given all those things, it's likely that something else will happen and will again give us orders of magnitude better deep learning. So I think it would be crazy to bet against that.

Adel (24:35):

I personally agree with you. We should have Gary Marcus on the show to hear his thoughts. Maybe as we shift a bit to discuss the broader LLM landscape and other providers, the open-source community and the private foundation models, with the announcements of GPT-4o by OpenAI and Claude 3.5 by Anthropic: first, I'd love to hear how you view these models from a technical perspective. What is impressive about them, and what are your thoughts on their capabilities? Then I'd love to hear how you see the ecosystems of open-source AI versus closed-source or private foundation models evolving over the next year.

Thomas Scialom (25:14):

I think all those models are pretty good. I guess we're at the stage where we are more or less at the same level of capabilities, besides maybe GPT-4o being end-to-end multimodal, which is a kind of breakthrough compared to the others. That being said, we keep progressing, at least incrementally, and for the next generation of compute and scale, I'm really keen to see what will happen. I would say we have filled the gap between open source and closed source. Think about it: one year ago we were releasing Llama 2, I think it was a few days short of one year ago. Before that, you had zero open-source chat models. We have come a long way. Now there are more and more open-source models that are better and better. Hopefully with this one we'll close the gap even further. Now the question is how to have open source be number one and not just number two, and we are working hard on that.

(26:16):

So the trend is pretty good. I guess the open question to me is this: in a world where some companies are still leading the space with some secret sauce, like Gemini, Anthropic and OpenAI, will it mean that they cannot compete with the open-source community? Because every time they work on a new thing, the open-source community will fill the gap and be able to get better and better models, and closed source, not being open science, will just limit progress. Or will they be able to make breakthroughs internally that will be very hard to reproduce externally, leading to very important advances for some of them and not for others? There's an open question there. We'll do our best so that open source leads.

Adel (27:12):

And this segues really well into my next question, because I think Meta is the exception here in a lot of ways, along with a few other organizations such as Mistral, maybe. Do you believe private models have an inherent advantage over their open-source counterparts because of their compute, the amount of money they have, their ability to hire researchers, the data they're procuring? If so, how can you mitigate this as a community, as an ecosystem, rather than as Meta?

Thomas Scialom (27:38):

Yeah, that's a question I often ask myself. I don't know to what extent. For instance, we don't train our models on user data. We don't have anything special there, besides working hard on high-quality data. So being at Meta doesn't give us that kind of competitive advantage right now. Do they have one at Google or at OpenAI? Maybe, I don't know. I'm not entirely sure. It may be helpful, but to what extent is not totally clear to me yet. So I would say the main thing is that OpenAI started this game, betting on scaling language models, a long time ago. We had our first language models a year and a half ago: I was working on Galactica in December, a year and a half ago. Then my friends in Paris followed with Llama 1, and we were very, very early in the game on our side. So we moved very fast to fill the gap, and the trend is good. I think that's the main reason.

Adel (28:42):

And the open-source community is super vibrant, right? Llama has an incredible open-source community, and there are a lot of players in the open-source community today. Which LLM providers do you think will survive this wave, or rather, what will be the defining feature of an LLM provider that survives? I don't want to name names here. And why do some models gain popularity while others don't? It's such an interesting aspect of the ecosystem.

Thomas Scialom (29:09):

Yeah, that's interesting. I mean, the performance of the models is something that matters. There's of course a bit of hype and marketing around it most of the time. I tend to think it's about the credibility of the organization that releases the models. You can release anything with great numbers by overfitting benchmarks, but if people discover that once, they will have much more doubt about the next release you do. For instance, we have proven in the past to be very good in our evaluations, so people trust us, and others have done the same, and that's good. Then, what I'm seeing right now, though that might change since things are changing very fast, is a lot of small models that are either specialized or generally good, and you want to take this one for code, this one for science, this one for math, this one for general stuff. Then for the bigger sizes, there are fewer and fewer competitors. For 405B, at GPT-4 performance, we are the only one, I guess. There are probably a lot of platforms that will take advantage of that and serve all the models together. I think that's good for the community and good for transparency. It's good to not have a monopoly there. So overall I'm in favor of having more and more of those companies.

Adel (30:28):

Couldn't agree more. And there's something you hinted at that I'd also love to unpack a bit more. There's going to be a generation of really small models for really specific use cases, right? Codestral is a good example of that. But there are also massive models that you can use across a variety of use cases. How do you see that aspect of the LLM ecosystem evolving? When should you use a massive model versus a small specialized model? I'd love to hear your thoughts on how you see the ecosystem evolving there.

Thomas Scialom (30:54):

I think it's always a question of the price you pay, the inference time your product needs, and the capability your setup needs for your client. If you need something very, very good, you cannot accept any hallucination, and the 405B model gives you enough performance that your product is useful for your client while a smaller model doesn't reach that level, well, you don't have a choice. Then it's a balance with how much you are willing to pay. A 405B model will cost more than a smaller model, and it'll also take a bit longer for inference, in exchange for being better. So it's always case by case, a product question rather than anything else. Now, there are smart strategies I'm seeing more and more: leveraging good models at the beginning to create synthetic data that you then distill into smaller models. So there's also all this stack that people are building.
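The "large model creates data, small model learns from it" strategy can be sketched as a loop that queries a teacher model, filters its answers, and writes supervised fine-tuning examples as JSON Lines. The teacher callable, the filter and the record fields are hypothetical placeholders; in practice the teacher would be a served large model and the filter would be much stricter.

```python
import json
from typing import Callable, Iterable


def build_distillation_set(
    prompts: Iterable[str],
    teacher: Callable[[str], str],
    keep: Callable[[str, str], bool],
) -> list[str]:
    """Return JSON Lines records {"prompt", "response"} for accepted teacher outputs."""
    records = []
    for prompt in prompts:
        response = teacher(prompt)
        if keep(prompt, response):  # e.g. drop empty, truncated or failing answers
            records.append(json.dumps({"prompt": prompt, "response": response}))
    return records


if __name__ == "__main__":
    # Toy stand-ins: a "teacher" that upper-cases, and a length-based filter.
    def fake_teacher(prompt: str) -> str:
        return prompt.upper()

    def long_enough(prompt: str, response: str) -> bool:
        return len(response) > 3

    for line in build_distillation_set(["hi", "hello"], fake_teacher, long_enough):
        print(line)
```

The resulting records would then serve as fine-tuning data for the smaller model, trading a one-time generation cost on the large model for cheaper inference afterwards.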

Adel (31:54):

Interesting. So a lot of it depends on inference time, the cost of serving these models, and the use case, which I definitely agree with. And I think that segues well into my next question: you're in a research environment that's actually shipping models used in products today, which I think is not necessarily common for all research environments. A lot of the research you produce is used in production almost immediately after it ships. So what's it like working in a research environment that also delivers real-world value as soon as the research is done, and how do you succeed in this type of environment?

Thomas Scialom (32:30):

Yeah, it's very new, and it's actually reflected even in our organization at Meta. As you may know, we were all working at FAIR, Fundamental AI Research, and we split off, as a small group at the beginning, into a new org called GenAI, which is responsible not just for creating foundational models like Llama 2 and Llama 3, but also for products like Meta AI and all the things that are now shipped in our products. I think it's the right move, and it's very interesting, because these models are at a level of capability where they're directly useful for some products. So we drive the research with product feedback: what is useful or not, what you need, where the model is bad. Evaluating these models, which are general by nature, is extremely hard. Having people actually use your model in direct products, having billions of users tell you your model is bad at this and that, and then tell you, "yeah, I saw the progress, now it's good at that," is extremely useful.

Adel (33:46):

You mentioned serving a model to billions of users. That's a challenge, I think, that only a handful of teams on the planet have to deal with. How do you deal with that level of scale when serving? What are some of the best practices and tricks you've learned along the way to be able to serve so many users with inference?

Thomas Scialom (34:04):

So one thing I can answer technically is having super-efficient small models. That's why, with respect to the scaling laws, we went way beyond what is compute-optimal at training time, beyond the Chinchilla trap. Basically, we train on so many tokens that it makes small models very efficient. That's the first thing. You have basically two dimensions: the size of the model, meaning the weights, and the training tokens, meaning the number of steps. The thing is, at inference time, one of the two dimensions drops and not the other. If your model is big, it'll remain big at inference time, but the number of training tokens, of steps, is not a factor that impacts inference time. You can train on one billion tokens or a hundred billion tokens and you'll have the same inference time. So we train them on a lot of tokens. That's the first way to get efficient models for serving a lot of users. Now, the second thing I want to add is that there are huge teams at Meta, and being able to scale is in our DNA. There are super good teams here that are able to scale that and have started to work on it. And while we haven't yet served it at that scale of users, we are starting to see it on Meta AI. You may have tested it, and it's pretty good. We've already integrated image generation with Emu, and that's pretty good.
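The point about the two dimensions can be checked with back-of-the-envelope arithmetic. This minimal Python sketch uses two widely used approximations that are assumptions here, not figures from the episode: pre-training costs about 6 FLOPs per parameter per token, and generation costs about 2 FLOPs per parameter per token. It also uses the rough "20 tokens per parameter" compute-optimal rule of thumb associated with the Chinchilla paper. The 8B parameter count and the 15T-token budget are illustrative choices.

```python
CHINCHILLA_TOKENS_PER_PARAM = 20  # rough compute-optimal rule of thumb (assumption)


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate pre-training compute: ~6 FLOPs per parameter per token (forward + backward)."""
    return 6.0 * n_params * n_tokens


def inference_flops_per_token(n_params: float) -> float:
    """Approximate generation compute: ~2 FLOPs per parameter per token (forward pass only).

    The training-token count does not appear: it is paid once, at training time.
    """
    return 2.0 * n_params


def compare_budgets(n_params: float, token_budgets: list[float]) -> list[dict]:
    """Tabulate training and per-token inference cost for several training-token budgets."""
    rows = []
    for n_tokens in token_budgets:
        rows.append({
            "tokens": n_tokens,
            "tokens_per_param": n_tokens / n_params,
            "training_flops": training_flops(n_params, n_tokens),
            "inference_flops_per_token": inference_flops_per_token(n_params),
        })
    return rows


if __name__ == "__main__":
    n_params = 8e9  # hypothetical 8B-parameter model
    # Compute-optimal budget vs. a heavily over-trained budget (both illustrative).
    budgets = [CHINCHILLA_TOKENS_PER_PARAM * n_params, 15e12]
    for row in compare_budgets(n_params, budgets):
        print(
            f"{row['tokens']:.2e} tokens ({row['tokens_per_param']:.0f}/param): "
            f"train {row['training_flops']:.2e} FLOPs, "
            f"inference {row['inference_flops_per_token']:.2e} FLOPs/token"
        )
```

Under these assumptions, going from 160B to 15T training tokens raises the one-time training bill by roughly 94x, while the per-token serving cost of the 8B model is unchanged, which is why over-training a small model pays off when it is served to many users.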

Adel (35:33):

That is definitely pretty cool. And maybe as we wrap up our chat, Thomas, what current research in the field excites you the most at the moment?

Thomas Scialom (35:43):

Agentic behavior. I think pre-training still has a lot to give, but it's getting more mature. There are still a lot of different things to gain in post-training; we are only at the beginning of understanding it, and there's a lot of research to do to improve it. But agents are, to me, what will lead to the next many orders of magnitude of capability, and it's the Wild West for now. When I worked on Toolformer, that's when I realized we needed language models aligned to follow instructions to move to the next stage. My view now is that we have GPT-4-level capabilities with Llama 3, and that's the level of capability where you start to see preliminary self-refinement capabilities. There's this nice paper from Jim Fan at NVIDIA, for instance. And from now on, while we will keep improving post-training and pre-training, this is where you can start to do agentic behaviors.
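The self-refinement behavior mentioned here, where a model drafts an output, evaluates it, and revises it, can be sketched as a generic loop. This is a minimal Python sketch in the spirit of published self-refinement methods, not Meta's or NVIDIA's implementation. `generate`, `critique`, and `refine` are hypothetical stand-ins for calls to a language model; the toy functions in the usage section are placeholders so the loop runs without any model.

```python
from typing import Callable, Optional

Generate = Callable[[str], str]                 # task -> first draft
Critique = Callable[[str, str], Optional[str]]  # (task, draft) -> feedback, or None if acceptable
Refine = Callable[[str, str, str], str]         # (task, draft, feedback) -> revised draft


def self_refine(
    task: str,
    generate: Generate,
    critique: Critique,
    refine: Refine,
    max_rounds: int = 3,
) -> tuple[str, int]:
    """Draft, then alternate critique and revision until accepted or out of rounds.

    Returns the final draft and the number of revisions performed.
    """
    draft = generate(task)
    for round_idx in range(max_rounds):
        feedback = critique(task, draft)
        if feedback is None:
            return draft, round_idx  # the critic accepted this draft
        draft = refine(task, draft, feedback)
    return draft, max_rounds  # budget exhausted; return the latest draft


if __name__ == "__main__":
    # Toy stand-ins for model calls (hypothetical).
    def toy_generate(task: str) -> str:
        return "draft"

    def toy_critique(task: str, draft: str) -> Optional[str]:
        return None if draft.endswith(".") else "end the answer with a period"

    def toy_refine(task: str, draft: str, feedback: str) -> str:
        return draft + "."

    print(self_refine("answer the question", toy_generate, toy_critique, toy_refine))
```

The `max_rounds` cap is a design choice: a real critic may never be fully satisfied, so the loop returns its latest draft rather than running forever.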

Adel (36:45):

And maybe as we wrap up, Thomas, what are some major trends that you expect to happen this year in the LLM space, in the generative AI space? And when do you think the next breakthrough will happen? What's your time range for the next breakthrough if you were to put money on it?

Thomas Scialom (37:01):

Right, so about trends. One, I would say end-to-end multimodal integration, which is starting. We had Moshi, the first model release; actually, GPT-4o is not really released, so a startup did the first release there, and I expect to see more and more of that. We will go there as well. Two, agents, of course; we are seeing that more and more. Three, inference compute: a lot of studies to improve compute. And actually, with the fact that we are releasing such a big, good model for the first time, I expect the community to play with it and find a lot of crazy ways to make it more inference-friendly. Four, I would say robotics. It's something very much at the edge, at the beginning, but I expect to see more and more progress there. It's very correlated with agentic behaviors, but I'm also seeing the cost of robots decrease, which will ground these models in the real physical world. Basically, to answer your question about when the next breakthrough will be: I don't come from the future. I don't know. I would say something big will happen in the next five years, or I'll be surprised.

Adel (38:14):

Okay, so that's the only prediction I was asking for. In the next five years, something big will happen. We'll see what it will be; we don't know what it is yet. Thank you so much, I think that's a great way to cap off today's episode. Thank you so much, Thomas, for coming on DataFramed. I really, really appreciated the chat, and thank you for sharing all the great things the Llama team is working on.
