Optimizing Online Sports Retail Revenue

Project: Analyzing Unicorn Companies

Data conductor, leading the orchestra of insights with analytical finesse.


Loan Data

Sasso Tech | Jan 2023 - Nov 2023

Junior Data Scientist

Project for an AI lawyer-assistant startup aiming to transform legal work in Brazil by providing an AI assistant integrated with judicial data. The aim of the project was to build a searchable database with an assistant that summarizes jurisprudence. Web scraping of judicial decisions from the most important Brazilian tribunals using Python with BeautifulSoup and Selenium; Data extraction routines and error messaging using Airflow; Search engine using a ClickHouse vector database to find decisions of interest based on their official summaries, with embeddings obtained from the OpenAI ada-002 model; Jurisprudence summarization using OpenAI GPT-3.5 with LangChain; LLM training for summarization in PT-BR using the OpenAI framework.
Sasso Tech | Oct 2022 - Feb 2023

Junior Data Scientist

Data extraction project for a textile company that received more than 500 invoices per day, in different formats, and relied on staff to extract specific information from them by hand for later reports. The aim of the project was to automate this process. Extraction of the access key from invoice PDF files using Python; Connection to the government Electronic Invoices API to retrieve the desired data; Data processing and preparation; Summarization of invoice content using GPT-3.5 with LangChain; Inference endpoint built with FastAPI; Docker-containerized app deployed on Google Cloud Run.

State University of Campinas | Mar 2018 - Dec 2022

PhD Student

Developed an innovative data analysis method for detecting contaminants in genotyping data from biparental populations; Incorporated genetic metrics, principal component analysis, and clustering analysis; Rigorously validated using simulated populations; Implemented the method in a user-friendly R Shiny tool for quality control in plant breeding programs. Conducted an extensive assessment of methods for predicting growth and biomass production in a forage grass species; Applied statistical and machine learning models for regression; Leveraged feature selection algorithms to enhance model accuracy; Identified key genes associated with desired traits; Modeled a gene interaction network; Contributed to the development of strategies for maximizing genetic gains in breeding programs while reducing time and costs. Projects executed in a Linux environment using specific bioinformatics tools and R/Python programming.

PhD in Genetics and Molecular Biology - Bioinformatics, University of Campinas - Unicamp | 2022
BSc in Biological Sciences - Bioinformatics, University of Campinas - Unicamp | 2017

Felipe Bitencourt Martins

I'm a PhD candidate in Genetics and Molecular Biology, working in plant breeding with a focus on bioinformatics and data analysis. github.com/bitafelipe

