Do students describe professors differently based on gender?

    Language plays a crucial role in shaping our perceptions and attitudes towards gender in the workplace, in classrooms, and personal relationships. Studies have shown that gender bias in language can have a significant impact on the way people are perceived and treated.

    For example, research has found that job advertisements that use masculine-coded language tend to attract more male applicants, while those that use feminine-coded language tend to attract more female applicants. Similarly, gendered language can perpetuate differences in the classroom.

    In this project, we'll use scraped student reviews from ratemyprofessors.com to identify differences in the language commonly used for male vs. female professors, and explore subtleties in how language in the classroom can be gendered.

    This excellent tool created by Ben Schmidt allows us to enter the words and phrases that we find in our analysis and explore them in more depth. We'll do this at the end.

    Catalyst also does some incredible work on decoding gendered language.

    1. Scraping the web for reviews of professors

    Text data, especially gendered text data, is hard to come by. Web scraping can be a helpful data collection tool when datasets are unavailable for this kind of work. We can write web scrapers to compile datasets on job descriptions, freelancer reviews, and, as in our use case, professor reviews by students.

    ratemyprofessors.com provides a wonderful combination of qualitative and quantitative metrics that we can analyze.

    Although the data on the website is not labeled by gender, we'll use the pronouns students use in their reviews to label professors "Male" or "Female". Of course, this approach is not perfect, as it relies on the students' use of pronouns. Professors with non-binary pronouns will also be under-represented in the data: very few reviews use them, so it's not trivial to write an algorithm to detect them. These are important questions in the world of gender analysis, though, so we encourage you to pick them up as extensions of this project!
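The pronoun-based labeling described above can be sketched as a simple majority count. This is a minimal illustration, not the project's own implementation: the function name `label_gender`, the pronoun sets, and the "Unknown" fallback for ties are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical pronoun sets; the project may use a different list.
MALE_PRONOUNS = {"he", "him", "his", "himself"}
FEMALE_PRONOUNS = {"she", "her", "hers", "herself"}


def label_gender(reviews):
    """Label a professor from the pronouns used across their reviews."""
    counts = Counter()
    for review in reviews:
        # Split into lowercase alphabetic tokens, so "He's" yields "he" and "s"
        for word in re.findall(r"[a-z]+", review.lower()):
            if word in MALE_PRONOUNS:
                counts["Male"] += 1
            elif word in FEMALE_PRONOUNS:
                counts["Female"] += 1
    if counts["Male"] == counts["Female"]:
        return "Unknown"  # no pronouns found, or an exact tie
    return "Male" if counts["Male"] > counts["Female"] else "Female"


male_label = label_gender(["He explains things well.", "His exams are fair."])
female_label = label_gender(["She's great!", "Loved her class."])
unknown_label = label_gender(["Great class, tough grader."])
print(male_label, female_label, unknown_label)
```

Counting across all of a professor's reviews, rather than labeling from a single review, makes the label less sensitive to one student mentioning someone else (for example, a teaching assistant).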

    Task 1a. What relevant packages do we need for web scraping and reading in data?

    # Used to open urls
    ____
    
    # Used to parse html
    ____
    
    # Used to pause code intermittently so that our scraper is not blocked
    ____
    
    # For data manipulation and analysis
    ____
    
    # To access our data filenames so we can read them
    ____
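Before filling in the blanks, it can help to confirm that the packages you plan to import are installed. The sketch below checks a list of candidate modules; the names in `CANDIDATE_MODULES` are an assumption about what might fit the five comments above (one plausible set), not the notebook's confirmed answer.

```python
from importlib.util import find_spec

# Assumed candidates for the five blanks above; the notebook's own
# choices may differ.
CANDIDATE_MODULES = ["urllib.request", "lxml", "time", "pandas", "os"]


def check_modules(names):
    """Return {module name: True if it can be imported in this environment}."""
    available = {}
    for name in names:
        try:
            available[name] = find_spec(name) is not None
        except ModuleNotFoundError:
            # Raised when the parent package of a dotted name is missing
            available[name] = False
    return available


status = check_modules(CANDIDATE_MODULES)
for name, ok in status.items():
    print(f"{name}: {'available' if ok else 'missing'}")
```

Any module reported as missing would need to be installed in the workspace before the corresponding import can succeed.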

    Task 1b. Which professors will we be looking at?

    The web_scraping.ipynb notebook provided in this workspace contains some selenium code that was used to find the URLs from ratemyprofessors.com that we'll be scraping in this notebook.

    Whilst the specific selenium code used to generate this list of URLs is beyond what we can cover today, we encourage you to explore this code to understand how we generated this list of professors!

    For now, we'll open the file profs_1244.txt, read each professor's URL from its own line, and save the result as profs.

    with open(r'profs_1244.txt', 'r') as f:
        profs = ____
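Reading one URL per line can be sketched as follows. This is one possible approach, not necessarily the intended answer to the blank above; the helper name `read_urls`, the demo file name, and the example.com URLs are placeholders invented for the illustration.

```python
import tempfile
from pathlib import Path


def read_urls(path):
    """Return the non-empty lines of a text file, stripped of whitespace."""
    with open(path, "r", encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


# Demo with a throwaway file and placeholder URLs
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "demo_profs.txt"
    demo_file.write_text(
        "https://example.com/prof/1\nhttps://example.com/prof/2\n\n",
        encoding="utf-8",
    )
    demo_profs = read_urls(demo_file)

print(demo_profs)
```

Stripping each line removes the trailing newline characters, which would otherwise end up inside the URLs passed to the scraper.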

    Task 1c. How can we use urls to scrape relevant data about professors?

    Each professor has an overall rating and a series of reviews.

    The code below can be used to iterate through all or part of the list of URLs in profs and scrape them for qualitative and quantitative data. You won't need to run through this whole list, though, because the data/ folder already contains the reviews of several professors that we have scraped for you! For each professor, the scraper collects:

    • The overall rating for the professor
    • All the individual reviews written by students about the professor
    • The "emotion" corresponding to each individual review: 😎 AWESOME, 😐 AVERAGE, or 😖 AWFUL
    • A numerical "quality" rating corresponding to each individual review

    We won't be using the "difficulty" ratings shown here.

    # USE ONLY ONE OF THE FOLLOWING FOR STATEMENTS
    
    # 1. Sample code to loop through the whole list of professors    
    # for s in range(0, len(profs), 10):
    
    # 2. Sample code to loop through the first 10 professors
    for s in ____:
    
        texts = ____ # Initialize an empty list
        print((s, s+10)) # Print the range of professors in the current block
        
        for url in ____: # Iterate through the URLs in this block
            ____ # To prevent sending too many requests at once
            r = ____ # Open URL
            htmlparser = ____ # Instantiate a parser to parse HTML
            tree = ____ # Parse HTML returned by the url
            
            text = ____('//*[@id="ratingsList"]/li[*]/div/div/div[3]/div[3]/text()') # Extract reviews
            ratings = ____('//*[@id="root"]/div/div/div[3]/div[2]/div[1]/div[1]/div[1]/div/div[1]') # Extract ratings
            emotion = ____('//*[@id="ratingsList"]/li[*]/div/div/div[1]/div[1]/div[2]/text()') # Extract emotion
            quality = ____('//*[@id="ratingsList"]/li[*]/div/div/div[2]/div[1]/div/div[2]') # Extract quality
            texts.append((url,
                          text,
                          [i.text for i in ratings][0],
                          emotion,
                          [i.text for i in quality],
                         )) # Append metrics to empty list
    
        print() # Print new line for readability
        df = ____
        df.to_csv(f'df_{s}_to_{s+10}.csv') # Write results to a CSV file, one block of 10 professors at a time
        ____ # Pause to prevent sending too many requests at once
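The control flow of the loop above (blocks of 10 URLs, with a pause before each request) can be sketched separately from the HTML parsing. This is a simplified illustration under stated assumptions: `scrape_in_batches` and its `fetch` parameter are hypothetical names, and `fetch` stands in for whatever opens and parses a page. No network requests are made here.

```python
import time


def scrape_in_batches(urls, fetch, batch_size=10, pause=0.0):
    """Call fetch(url) for every URL, batch_size URLs at a time.

    Returns a list of (start_index, results) pairs, one per batch, where
    results is a list of (url, fetched_value) tuples.
    """
    batches = []
    for start in range(0, len(urls), batch_size):
        results = []
        for url in urls[start:start + batch_size]:
            time.sleep(pause)  # Space out requests to avoid being blocked
            results.append((url, fetch(url)))
        batches.append((start, results))
    return batches


# Demo with placeholder URLs and a stand-in fetch function
demo_urls = [f"https://example.com/prof/{i}" for i in range(25)]
demo_batches = scrape_in_batches(demo_urls, fetch=len, pause=0.0)
print([(start, len(results)) for start, results in demo_batches])
```

Saving each block separately, as the notebook does with one CSV file per block, means a failure partway through a long scrape loses at most the current block.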

    2. Reading pre-scraped data

    Task 2a. How can we read a directory of scraped professor reviews and concatenate them?

    Since we have already scraped reviews from several professors for you, let's begin by concatenating all the files in the data folder provided.
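The idea of reading every file in a folder and stacking the rows can be sketched with the standard library alone. This is an illustration rather than the notebook's approach: the helper `read_csv_folder`, the demo file names, and the `url` and `rating` columns are made up for the example. With pandas, the same idea is usually expressed with `pd.read_csv` and `pd.concat`.

```python
import csv
import tempfile
from pathlib import Path


def read_csv_folder(folder):
    """Read every .csv file in folder and return all rows as one list of dicts."""
    rows = []
    for path in sorted(Path(folder).glob("*.csv")):
        with open(path, newline="", encoding="utf-8") as f:
            rows.extend(csv.DictReader(f))
    return rows


# Demo: write two small placeholder files, then read them back as one table
with tempfile.TemporaryDirectory() as tmp:
    demo_files = [
        ("df_0_to_10.csv", "https://example.com/prof/1"),
        ("df_10_to_20.csv", "https://example.com/prof/2"),
    ]
    for name, url in demo_files:
        with open(Path(tmp) / name, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(["url", "rating"])
            writer.writerow([url, "4.5"])
    all_rows = read_csv_folder(tmp)

print(len(all_rows))
```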

    Since review, emotion and quality are lists but were recorded in string form, we'll apply eval() to them to turn them back from a string into a list.
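Because eval() executes whatever expression the string contains, a stricter alternative worth knowing is `ast.literal_eval`, which only accepts Python literals such as lists, strings, and numbers. The sketch below swaps it in for eval(); the helper name `parse_list_cell` and the sample row are assumptions made for the illustration.

```python
import ast


def parse_list_cell(cell):
    """Turn a string such as "['a', 'b']" back into a Python list."""
    value = ast.literal_eval(cell)  # Raises on anything that is not a literal
    if not isinstance(value, list):
        raise ValueError(f"expected a list, got {type(value).__name__}")
    return value


# Placeholder row mimicking list columns that were saved as strings
row = {"emotion": "['AWESOME', 'AVERAGE']", "quality": "['5.0', '3.0']"}
parsed = {key: parse_list_cell(value) for key, value in row.items()}
print(parsed)
```

On a pandas DataFrame, the same function could be applied to a column with `Series.apply`, for example `df["emotion"].apply(parse_list_cell)`, assuming the column holds strings in this form.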