A killer application of large language models (LLMs) is answering questions about specific documents and datasets. This enables use cases such as customer service bots, domain-specific question-answering systems, and LLMs that can navigate data tasks.
In this code-along, Andrea Valenzuela, Computing Engineer at CERN, and Josep Ferrer Sanchez, Data Scientist at the Catalan Tourist Board, will walk you through building an AI system that can query your documents and data using LangChain and the OpenAI API. Throughout the code-along, they will share best practices for loading and storing documents with LangChain, building a retrieval-augmented generation (RAG) pipeline for querying data, and building a question-answering bot.
Andrea Valenzuela currently works on the CMS experiment at CERN, the particle physics laboratory in Geneva, Switzerland. She has six years of experience in data engineering and analysis, and her duties include data analysis and software development. She is now working towards democratizing the learning of data-related technologies through the Medium publication ForCode'Sake.
She holds a BS in Engineering Physics from the Polytechnic University of Catalonia, as well as an MS in Intelligent Interactive Systems from Pompeu Fabra University. Her research experience includes professional work with previous OpenAI algorithms for image generation, such as Normalizing Flows.
Josep is a Data Scientist and Project Manager at the Catalan Tourist Board, using data to improve the experience of tourists in Catalonia. His expertise includes the management of data storage and processing, coupled with advanced analytics and the effective communication of data insights.
He is also a dedicated educator, teaching in the Big Data Master's program at the University of Navarra and regularly contributing articles on data science to Medium and KDnuggets.
He holds a BS in Engineering Physics from the Polytechnic University of Catalonia as well as an MS in Intelligent Interactive Systems from Pompeu Fabra University.
He is currently committed to making data-related technologies more accessible to a wider audience through the Medium publication ForCode'Sake.