NBA API Data Ingestion & ETL Process Example
  • Job Link: https://www.upwork.com/jobs/~01d912683b9b6547bd

    1. What is NBA API?

    The NBA API (Application Programming Interface) provides access to data and functionality related to the National Basketball Association (NBA). It allows developers to access a variety of information such as player statistics, game results, team rosters, schedules, news, and other NBA-related data.

    The NBA API gives developers a convenient way to build applications, services, or websites that use up-to-date information about basketball games, players, and teams. This is useful for sports applications, analytical tools, fantasy basketball services, and other NBA-related projects.

    !pip install nba_api 

    2. NBA API Scoreboard Data Retrieval

    This script is a simple utility that fetches today's NBA game scoreboard using the nba_api library. Install the required library before running it. The script prints the scoreboard date, retrieves the JSON data, optionally converts it into a dictionary for further analysis or processing, and writes the data to a file on disk.

    """
    NBA API Scoreboard Data Retrieval
    
    This script utilizes the `nba_api` library to fetch today's NBA game scoreboard information, including the date of the scoreboard and the corresponding JSON data.
    
    Usage:
    - Install the required library: `pip install nba_api`
    - Run the script to retrieve and display today's NBA game scoreboard information.
    
    Script Structure:
    1. Import the necessary module from `nba_api`.
    2. Initialize the `ScoreBoard` object to fetch today's games.
    3. Display the scoreboard date.
    4. Retrieve the JSON data and store it in the `json_data` variable.
    5. Optionally, convert the JSON data to a dictionary using the `get_dict()` method.
    6. Write the JSON data to a file on disk with the filename as the scoreboard date.
    
    Requirements:
    - Ensure the `nba_api` library is installed (`pip install nba_api`).
    
    Note:
    - This script assumes a valid connection to the NBA API and availability of today's game data.
    
    """
    
    import json
    from nba_api.live.nba.endpoints import scoreboard
    
    # Today's Score Board
    games = scoreboard.ScoreBoard()
    
    # Display the scoreboard date
    score_board_date = games.score_board_date
    print("ScoreBoardDate: " + score_board_date)
    
    # Get the JSON from the object
    json_data = games.get_json()
    
    # Print the JSON data
    print(json_data)
    
    # Optionally, convert the JSON data to a dictionary
    dictionary_data = games.get_dict()
    
    # Write the data to a file on disk with the scoreboard date in the filename
    # (json_data is already a JSON-formatted string, so dump the dictionary to
    #  avoid double-encoding; assumes the datasets/ directory already exists)
    file_name = f"datasets/scoreboard_{score_board_date}.json"
    with open(file_name, "w") as json_file:
        json.dump(dictionary_data, json_file, indent=2)
    
    print(f"JSON data written to {file_name} file.")
    

    JSON Structure

    The JSON contains the data for today's NBA games scoreboard. The structure of the JSON is as follows:

    • gameScore: An array of objects, each representing a game and its details.
      • gameId: The unique identifier for the game.
      • gameUrlCode: The code used in the game URL.
      • gameStatusText: The status of the game (e.g., 'Final', 'In Progress', 'Scheduled').
      • gameClock: The current game clock time.
      • period: The current period of the game.
      • isBuzzerBeater: Indicates if the shot was a buzzer beater.
      • clock: The game clock time.
      • isEndOfPeriod: Indicates if it's the end of a period.
      • visitorTeam: Details of the visiting team.
        • teamId: The unique identifier for the team.
        • triCode: The three-letter code for the team.
        • win: The number of wins for the team.
        • loss: The number of losses for the team.
      • homeTeam: Details of the home team.
        • teamId: The unique identifier for the team.
        • triCode: The three-letter code for the team.
        • win: The number of wins for the team.
        • loss: The number of losses for the team.

    In essence, the structured JSON data provides a comprehensive snapshot of ongoing NBA games, offering a wealth of information for developers and enthusiasts alike.
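
    As an illustration, here is a minimal sketch of traversing that structure, assuming the field names listed above (the exact key names can vary between API versions) and a hypothetical saved file:

    import json

    # Load a previously saved scoreboard file (hypothetical filename)
    with open("datasets/scoreboard_2024-01-15.json") as f:
        data = json.load(f)

    # Walk the games array using the field names described above
    # (keys such as "gameScore" may differ in other API versions)
    for game in data.get("gameScore", []):
        away = game["visitorTeam"]
        home = game["homeTeam"]
        print(f'{away["triCode"]} ({away["win"]}-{away["loss"]}) @ '
              f'{home["triCode"]} ({home["win"]}-{home["loss"]}): '
              f'{game["gameStatusText"]}')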

    3. NBA API Data Ingestion into AWS S3

    Several approaches have been implemented to upload daily JSON data from the NBA API to AWS S3, including the following:

    1) AWS Lambda and CloudWatch Events:

    • Created an AWS Lambda function in Python that calls the NBA API and uploads data to S3.
    • Used Amazon CloudWatch Events to schedule this Lambda function to be called daily.
    • As a result, this function will automatically be called daily and upload data to S3.

    2) AWS Glue:

    • Used AWS Glue to define and perform ETL tasks.
    • Created a PySpark script that calls the NBA API and uploads data to S3.
    • This script is scheduled to run daily using the AWS Glue scheduler (a boto3 sketch of such a trigger follows below).
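
    As a sketch of the scheduling step only, a daily Glue trigger could be created with boto3; the job name and region below are placeholders:

    import boto3

    glue_client = boto3.client('glue', region_name='us-east-1')  # region is an assumption

    # Create a scheduled trigger that starts a (hypothetical) Glue job
    # every day at 12:00 UTC; Glue uses the same cron syntax as CloudWatch Events
    glue_client.create_trigger(
        Name='daily-nba-scoreboard-trigger',
        Type='SCHEDULED',
        Schedule='cron(0 12 * * ? *)',
        Actions=[{'JobName': 'nba-scoreboard-ingest'}],  # placeholder job name
        StartOnCreation=True
    )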

    3) Amazon EC2 and Cron:

    • Created an Amazon EC2 instance and configured a cron job to run a script daily that calls the API and uploads data to S3 (see the sample crontab entry below).
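
    A sample crontab entry for this approach, assuming the ingestion script is saved at a hypothetical path on the instance:

    # Run the ingestion script daily at 12:00 (server time) and append output to a log
    0 12 * * * /usr/bin/python3 /home/ec2-user/nba_ingest.py >> /home/ec2-user/nba_ingest.log 2>&1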

    Each of these approaches has its advantages and can be chosen depending on specific needs and conditions. AWS Lambda with CloudWatch Events is easy to set up and runs fully managed on a schedule, AWS Glue offers more powerful ETL tooling, and Amazon EC2 provides full control over a virtual machine.

    from nba_api.live.nba.endpoints import scoreboard
    import json
    import boto3
    
    # Function to check if an object exists in S3
    def does_object_exist(s3_client, s3_bucket_name, s3_object_key):
        try:
            s3_client.head_object(Bucket=s3_bucket_name, Key=s3_object_key)
            return True  # Object exists
        except Exception:
            return False  # Object does not exist (or is not accessible)
    
    # Function to upload JSON data to AWS S3 with a dynamically generated key if it doesn't exist
    def upload_json_to_s3(json_data, s3_bucket_name, aws_access_key_id, aws_secret_access_key, region_name, score_board_date):
        # Create an S3 client
        s3_client = boto3.client(
            's3',
            aws_access_key_id=aws_access_key_id,
            aws_secret_access_key=aws_secret_access_key,
            region_name=region_name
        )
    
        # Convert the data to a JSON string
        json_string = json.dumps(json_data, indent=2)
    
        # Extract year and month from score_board_date (format: YYYY_MM_DD)
        year = score_board_date[:4]
        month = score_board_date[5:7]
    
        # Build an object key that includes year and month as partition-style path segments
        s3_object_key = f'year={year}/month={month}/scoreboard_{score_board_date}.json'
    
        # Alternative flat key (no partitioning):
        # s3_object_key = f'scoreboard_{score_board_date}.json'
    
        # Check if the object already exists in S3
        if not does_object_exist(s3_client, s3_bucket_name, s3_object_key):
            # Upload JSON data to S3
            try:
                s3_client.put_object(Body=json_string, Bucket=s3_bucket_name, Key=s3_object_key)
                print(f'JSON data successfully uploaded to S3 with key {s3_object_key}')
            except Exception as e:
                print(f'An error occurred while uploading JSON data to S3: {e}')
        else:
            print(f'The object with key {s3_object_key} already exists in S3. Skipping upload.')
    
    # AWS credentials and S3 details
    aws_access_key_id = 'Your_ACCESS_KEY_ID'
    aws_secret_access_key = 'Your_SECRET_ACCESS_KEY'
    region_name = 'Your_Region, e.g., us-east-1'
    
    # Specify the S3 bucket name
    s3_bucket_name = 'nba-api-bucket'
    
    # Today's Score Board
    games = scoreboard.ScoreBoard()
    
    # Get the data from the object as a dictionary
    # (get_json() returns a JSON string; using get_dict() here avoids
    #  double-encoding when json.dumps is applied inside upload_json_to_s3)
    json_data = games.get_dict()
    
    # Extract the score board date from the object
    score_board_date = games.score_board_date.replace("-", "_")  # Replace hyphens with underscores in the date
    
    # Call the function to upload JSON data to S3 with a dynamically generated key if it doesn't exist
    upload_json_to_s3(json_data, s3_bucket_name, aws_access_key_id, aws_secret_access_key, region_name, score_board_date)
    

    4. NBA API Scoreboard Data Ingestion to AWS S3 using Python Script

    This documentation provides an overview and usage guide for the Python script that retrieves NBA Scoreboard data using the nba_api library and uploads it to an AWS S3 bucket. The script includes functions for checking the existence of an object in S3 and uploading JSON data with dynamically generated keys based on the date.

    1. Prerequisites

    Before running the script, ensure you have the following:

    • Valid AWS credentials (Access Key ID, Secret Access Key)
    • Appropriate permissions for S3 operations
    • Installed Python with required libraries (nba_api, boto3)

    2. Script Overview

    The script performs the following steps:

    • Imports necessary libraries (nba_api.live.nba.endpoints.scoreboard, json, boto3).
    • Defines two functions: does_object_exist and upload_json_to_s3.
    • Retrieves NBA Scoreboard data using the nba_api library.
    • Converts the data to JSON format.
    • Extracts the current date and modifies it for use in S3 object keys.
    • Uploads the JSON data to AWS S3 with a dynamically generated key.

    3. Functions

    3.1 does_object_exist

    This function checks if an object exists in the specified S3 bucket.

    does_object_exist(s3_client, s3_bucket_name, s3_object_key)

    Parameters:

    • s3_client: Boto3 S3 client.
    • s3_bucket_name: Name of the S3 bucket.
    • s3_object_key: Key of the S3 object.

    Returns:

    • True: If the object exists.
    • False: If the object does not exist.
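
    A hypothetical call, reusing the function defined in the script above (bucket name and key are placeholders):

    import boto3

    s3_client = boto3.client('s3')

    # Returns True only if the object already exists in the bucket
    exists = does_object_exist(s3_client, 'nba-api-bucket',
                               'year=2024/month=01/scoreboard_2024_01_15.json')
    print(exists)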

    3.2 upload_json_to_s3

    This function uploads JSON data to AWS S3 with a dynamically generated key if the object does not exist.

    upload_json_to_s3(json_data, s3_bucket_name, aws_access_key_id, aws_secret_access_key, region_name, score_board_date)

    Parameters:

    • json_data: JSON data to upload.
    • s3_bucket_name: Name of the S3 bucket.
    • aws_access_key_id: AWS Access Key ID.
    • aws_secret_access_key: AWS Secret Access Key.
    • region_name: AWS region.
    • score_board_date: Date used for dynamic key generation.

    4. Usage

    1. Replace the placeholder AWS credentials (aws_access_key_id, aws_secret_access_key) and S3 details (region_name, s3_bucket_name) with your own.
    2. Install the required libraries using pip install nba_api boto3.
    3. Run the script to fetch NBA Scoreboard data and upload it to the specified S3 bucket.

    5. Notes

    • The script dynamically generates S3 object keys based on the date and uploads data only if the object does not already exist in the specified S3 bucket.
    • Ensure proper error handling and security practices are implemented when deploying this script in production.

    5. Scheduled AWS Lambda Job Application for NBA API Scoreboard Data Ingestion to AWS S3

    This application sets up a scheduled job using AWS Lambda to fetch daily NBA API scoreboard data and ingest it into an AWS S3 bucket. The Lambda function is triggered on a regular schedule, enabling automated ingestion of NBA scoreboard data.

    Architecture

    A CloudWatch Events rule fires on a schedule and invokes the Lambda function, which fetches the day's scoreboard from the NBA API and writes the JSON to an S3 bucket.

    Usage:

    • Ensure you have the necessary AWS credentials and permissions.
    • Install the required libraries (nba_api, boto3) in the Lambda function environment.
    • Customize the Lambda function code and configuration based on your specific requirements.
    • Configure the CloudWatch Events rule to define the schedule.

    Application Components:

    1. Lambda Function:

    • Utilizes the nba_api library to fetch daily NBA scoreboard data.
    • Uploads the data to an AWS S3 bucket.
    • Can be customized based on specific data processing needs.

    2. CloudWatch Events Rule:

    • Defines the schedule for triggering the NBA API Scoreboard Lambda function.
    • Configured with a cron expression or rate expression to specify the frequency of data ingestion.

    Steps:

    1. Lambda Function:

    • Write the Lambda function code to fetch NBA scoreboard data and upload it to S3. Below is an example AWS Lambda function, written in Python, that fetches NBA scoreboard data and uploads it to an S3 bucket, with the data partitioned by year and month attributes.
    • Ensure the necessary environment variables (AWS credentials, S3 details) are configured.

    2. CloudWatch Events:

    • Create a new rule in the CloudWatch Events console.
    • Set up the rule with a cron expression or rate expression, specifying the schedule for NBA data ingestion (a boto3 sketch of creating such a rule follows after these steps).

    3. Lambda Execution:

    • When the scheduled time is reached, the CloudWatch Events rule triggers the NBA API Scoreboard Lambda function.
    • Monitor Lambda execution logs for insights into job performance.
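
    As a sketch of step 2, the rule can also be created programmatically with boto3; the function name, account ID, and ARNs below are placeholders:

    import boto3

    events_client = boto3.client('events', region_name='us-east-1')  # region is an assumption
    lambda_client = boto3.client('lambda', region_name='us-east-1')

    # Create a rule that fires every day at 12:00 UTC
    rule = events_client.put_rule(
        Name='daily-nba-scoreboard-rule',
        ScheduleExpression='cron(0 12 * * ? *)'
    )

    # Allow CloudWatch Events to invoke the (placeholder) Lambda function
    lambda_client.add_permission(
        FunctionName='nba-scoreboard-ingest',  # placeholder function name
        StatementId='allow-cloudwatch-events',
        Action='lambda:InvokeFunction',
        Principal='events.amazonaws.com',
        SourceArn=rule['RuleArn']
    )

    # Point the rule at the Lambda function
    events_client.put_targets(
        Rule='daily-nba-scoreboard-rule',
        Targets=[{
            'Id': 'nba-scoreboard-target',
            'Arn': 'arn:aws:lambda:us-east-1:123456789012:function:nba-scoreboard-ingest'  # placeholder ARN
        }]
    )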

    Requirements:

    • AWS account with Lambda and CloudWatch Events permissions.
    • Install necessary libraries (nba_api, boto3) in the Lambda function environment.
    • Adjust Lambda function code and CloudWatch Events rule as per specific NBA data ingestion requirements.

    Note:

    • Regularly monitor and test the scheduled job to ensure proper execution.
    • Validate CloudWatch Events rule expressions for accuracy.
    """
    NBA API Scoreboard Data Ingestion to AWS S3 using AWS Lambda
    
    This AWS Lambda function leverages the `nba_api` library to fetch today's NBA game scoreboard information and uploads the corresponding JSON data to an AWS S3 bucket. The function is triggered to run periodically.
    
    Lambda Handler:
    - The `lambda_handler` function is the entry point for the AWS Lambda execution.
    - It fetches today's NBA game scoreboard using the `scoreboard.ScoreBoard()` class.
    - The JSON data is extracted from the object and stored in the `json_data` variable.
    - The S3 bucket name, AWS credentials, and region details are specified.
    - A partitioned object key (year=.../month=...) is generated based on the score board date.
    - An S3 client is created using `boto3`.
    - The `does_object_exist` function checks if the object already exists in the S3 bucket.
    - If the object doesn't exist, the JSON data is converted to a string and uploaded to S3.
    - The Lambda function logs success or any error during the upload process.
    
    AWS Credentials and S3 Details:
    - Update the `aws_access_key_id`, `aws_secret_access_key`, `region_name`, and `s3_bucket_name` variables with your AWS credentials and S3 bucket details.
    
    Note:
    - Ensure the `nba_api` library is included in the Lambda deployment package.
    - The Lambda function should have appropriate IAM roles and permissions to interact with S3.
    """
    
    
    
    import json
    import boto3
    from nba_api.live.nba.endpoints import scoreboard
    
    def does_object_exist(s3_client, s3_bucket_name, s3_object_key):
        try:
            s3_client.head_object(Bucket=s3_bucket_name, Key=s3_object_key)
            return True  # Object exists
        except Exception:
            return False  # Object does not exist (or is not accessible)
    
    def lambda_handler(event, context):
        # AWS credentials and S3 details
        aws_access_key_id = 'Your_ACCESS_KEY_ID'
        aws_secret_access_key = 'Your_SECRET_ACCESS_KEY'
        region_name = 'Your_Region, e.g., us-east-1'
    
        # Specify the S3 bucket name
        s3_bucket_name = 'your_bucket'
    
        # Today's Score Board
        games = scoreboard.ScoreBoard()
    
        # Get the data from the object as a dictionary
        # (get_dict() avoids double-encoding when json.dumps is applied below)
        json_data = games.get_dict()
    
        # Extract the score board date from the object
        score_board_date = games.score_board_date.replace("-", "_")  # Replace hyphens with underscores
    
        # Extract year and month from score_board_date (format: YYYY_MM_DD)
        year = score_board_date[:4]
        month = score_board_date[5:7]
    
        # Generate a partitioned object key based on the score board date
        s3_object_key = f'year={year}/month={month}/scoreboard_{score_board_date}.json'
    
        # Create an S3 client
        s3_client = boto3.client(
            's3',
            aws_access_key_id=aws_access_key_id,
            aws_secret_access_key=aws_secret_access_key,
            region_name=region_name
        )
    
        # Check if the object already exists in S3
        if not does_object_exist(s3_client, s3_bucket_name, s3_object_key):
            # Convert JSON data to string
            json_string = json.dumps(json_data, indent=2)
    
            # Upload JSON data to S3
            try:
                s3_client.put_object(Body=json_string, Bucket=s3_bucket_name, Key=s3_object_key)
                print(f'JSON data successfully uploaded to S3 with key {s3_object_key}')
            except Exception as e:
                print(f'An error occurred while uploading JSON data to S3: {e}')
        else:
            print(f'The object with key {s3_object_key} already exists in S3. Skipping upload.')
    
        return {
            'statusCode': 200,
            'body': json.dumps('Lambda function executed successfully!')
        }
    

    Partitioning data by year and month attributes in the S3 bucket

    Partitioning data by year and month attributes in the S3 bucket is a common practice in data engineering and analytics. This approach provides several benefits:

    Efficient Data Retrieval: When your data is organized into partitions based on time, it becomes easier to retrieve specific subsets of data. For example, if you want to analyze data for a particular month or year, you can directly query or access the relevant partition, reducing the amount of data scanned.

    Cost Optimization: Query services such as Amazon Athena and Amazon Redshift Spectrum charge based on the amount of data scanned per query. By partitioning your data, you can limit the amount of data that needs to be scanned, resulting in cost savings, especially when dealing with large datasets.

    Improved Query Performance: Partitioning can significantly improve query performance. Queries that involve a time-based filter can skip irrelevant partitions, making the queries faster and more efficient.

    Organized Data Structure: Partitioning provides a logical and organized structure for your data. It makes it easier for both humans and automated processes to understand and manage the data.

    Scalability: As your dataset grows, partitioning helps maintain a scalable and manageable structure. It becomes particularly important when dealing with historical data and performing analytics over time.

    Compatibility with Query Engines: Many query engines and tools, including AWS Athena and Amazon Redshift Spectrum, are designed to work seamlessly with partitioned data. They can leverage partition information to optimize query execution.

    In the provided example, the S3 object key is structured as year={year}/month={month}/scoreboard_{score_board_date}.json. This organizes the NBA scoreboard data into yearly and monthly partitions, making it easier to manage and analyze the data over time.
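
    For example, all files for a single month can then be retrieved by listing under the partition prefix; the bucket name and date values below are placeholders:

    import boto3

    s3_client = boto3.client('s3')

    # List only the January 2024 scoreboard files via the partition prefix
    response = s3_client.list_objects_v2(
        Bucket='nba-api-bucket',
        Prefix='year=2024/month=01/'
    )

    for obj in response.get('Contents', []):
        print(obj['Key'], obj['Size'])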

    6. JSON Files Processing with Amazon S3 and AWS Lambda

    This application illustrates how Amazon S3 can trigger an AWS Lambda function to process NBA API JSON data in real time, immediately after each upload.

    Usage:

    Amazon S3 Bucket:

    • Set up an Amazon S3 bucket to store the NBA API JSON files that will trigger Lambda processing.
    • Configure the bucket with the necessary permissions for event notifications.

    AWS Lambda Function:

    • Create an AWS Lambda function to process NBA API JSON data based on S3 triggers.
    • Write the code to define the processing logic for NBA data (e.g., data transformation, extraction, analysis).
    • Configure the Lambda function to receive events from the specified S3 bucket.

    S3 Event Trigger:

    • Configure an event trigger on the S3 bucket to notify the Lambda function when a new NBA API JSON file is uploaded (a boto3 sketch follows below).
    • Choose the appropriate S3 event (e.g., ObjectCreated) to activate the Lambda function.
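
    A minimal sketch of configuring this notification with boto3, assuming the Lambda function's resource policy already allows s3.amazonaws.com to invoke it; the bucket name and ARN are placeholders:

    import boto3

    s3_client = boto3.client('s3')

    # Route ObjectCreated events for .json keys to the (placeholder) Lambda function
    s3_client.put_bucket_notification_configuration(
        Bucket='nba-api-bucket',
        NotificationConfiguration={
            'LambdaFunctionConfigurations': [{
                'Id': 'nba-json-upload-trigger',
                'LambdaFunctionArn': 'arn:aws:lambda:us-east-1:123456789012:function:nba-json-processor',  # placeholder
                'Events': ['s3:ObjectCreated:*'],
                'Filter': {
                    'Key': {'FilterRules': [{'Name': 'suffix', 'Value': '.json'}]}
                }
            }]
        }
    )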

    Lambda Execution:

    • When a new NBA API JSON file is uploaded to the S3 bucket, the configured event trigger activates the Lambda function.
    • The Lambda function processes the NBA API JSON data in real time based on the defined logic (a minimal handler sketch follows below).
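
    A minimal handler sketch: the event shape is the standard S3 notification format, while the per-game field names follow the JSON structure described earlier (actual keys may vary between API versions):

    import json
    from urllib.parse import unquote_plus

    import boto3

    s3_client = boto3.client('s3')

    def lambda_handler(event, context):
        # Each record describes one uploaded object (standard S3 event format)
        for record in event['Records']:
            bucket = record['s3']['bucket']['name']
            key = unquote_plus(record['s3']['object']['key'])  # keys arrive URL-encoded

            # Fetch and parse the newly uploaded scoreboard JSON
            obj = s3_client.get_object(Bucket=bucket, Key=key)
            data = json.loads(obj['Body'].read())

            # Example processing: log how many games the file contains
            # ('gameScore' follows the structure described above; the key may vary)
            games = data.get('gameScore', [])
            print(f'{key}: {len(games)} games')

        return {
            'statusCode': 200,
            'body': json.dumps('Processed uploaded files.')
        }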

    Example Use Cases:

    Data Transformation:

    • Process NBA API JSON data to transform it into a different structure or format.

    Data Extraction:

    • Extract specific information from the NBA API JSON payload.

    Real-time Analysis:

    • Perform real-time analysis on NBA API JSON data to derive insights.

    Recommendations:

    • Monitor Lambda function logs for insights into processing performance.
    • Test the S3 event triggers and Lambda function thoroughly.
    • Ensure proper error handling and logging in the Lambda function.

    Note:

    • This setup enables the automatic and immediate processing of NBA API JSON data as soon as it is uploaded to the designated S3 bucket.