Public Procurement Dataset for Croatia (2008-2021)
Get the report here: Report
Intro: Public procurement plays a pivotal role in Croatia's economic landscape, accounting for a significant portion of government expenditure.
This dataset encompasses a vast array of contracts, offering valuable insights into the complexities of Croatia's procurement environment
Objective:
The objective of this exploratory data analysis is to gain insights into the procurement landscape in Croatia, focusing on key metrics such as the distribution of procurement types, trends over time, total tender value, average tender value by type, top buyers, bidders and etc.
Summary:
- Distribution and Trends in Procurement:
- Types of Procurement: The dataset reveals a predominant focus on supplies procurement (57,802 instances), followed by services (51,383) and public works (43,200). This distribution highlights the government's continuous need for a broad spectrum of goods, alongside essential services and infrastructural projects.
- Temporal Trends: There was a decreasing trend in the number of issued tenders from 2008 to 2018, suggesting a possible tightening of public spending or improvements in procurement efficiency. Post-2018, a resurgence in tender activity suggests a reactive increase in public projects, possibly influenced by economic recovery initiatives or infrastructural boosts.
- Financial Insights and Currency Conversion:
- Value Fluctuations: Works tenders had the highest average tender value at approximately €718,055, underscoring significant investments in public infrastructure. The services followed with an average of €363,057, indicative of the critical role of non-material support in public services.
- Currency Standardization: The presence of dual currency listings (HRK and EUR) necessitated a standardized approach for financial analysis. Conversion to euros was executed using historical average exchange rates, ensuring consistency in financial reporting and analysis.
- Market Dynamics and Competition:
- Competition Levels: A significant proportion of tenders, especially those issued under open procedures, attracted only one bidder. This pattern raises concerns about competitive dynamics and possibly indicates niche markets or high entry barriers in certain sectors.
- Data Integrity and Duplication: Over half of the tender IDs were duplicated, reflecting multiple bids per tender. Rigorous data cleaning was necessary to remove duplicates based on tender IDs and titles, ensuring the accuracy of subsequent analyses.
- Key Participants and Geographic Concentration:
- Prominent Buyers: The City of Zagreb dominated procurement activities, suggesting a centralization of governmental purchases in the capital. This could also reflect the higher infrastructure and public service needs of the populous capital.
- Bidder Demographics: The overwhelming majority of bidders were domestic, with a significant concentration in Zagreb (44.53%). This indicates a robust local contracting industry, though it might also point to limited foreign participation in the market.
- Challenges and Data Management Issues:
- Missing and Incomplete Data: A notable amount of data related to final tender prices was missing, particularly in the early years of the dataset. This issue was partially mitigated by substituting estimated prices where final prices were unavailable.
- Analytical Adjustments: Due to the significant missing data from 2008 to 2011, the tender total value might be missleading for those years. A decision was made to drop records from analysis where the missing data proportion was below a 5% threshold to maintain analytical rigor.
Source: The dataset is part of the global public procurement dataset (GPPD) available here: https://www.sciencedirect.com/science/article/pii/S2352340924003810#fig0005
Colums used in the analysis:
| Column Name | Description |
|---|---|
tender_year | Year of the tender. |
tender_id | Internal tender ID generated during the data processing. |
tender_title | Tender title. |
tender_proceduretype | Procedure type mapped to DIGIWHIST standard. It is based on the original procedure type published on the source publication that we recategorized to a standard enumeration. |
tender_supplytype | The type of the purchase. It can have the following values: supplies, services, public works. |
tender_recordedbidscount | Number of recorded bids - based on unique bids recorded in the source publication, i.e. it differs from lot_bidscount. |
tender_estimatedprice | Estimated price of the tender. |
tender_finalprice | Final price of the tender. |
lot_status | Whether the lot was awarded |
buyer_id | Main Identifier of the buyer from the source documents. |
buyer_name | Name of the buyer. |
buyer_city | City of the buyer. |
buyer_mainactivities | Main activity of the buyer. |
buyer_buyertype | Type of the buyer. |
bidder_id | Main Identifier of the bidder company from the source document |
bidder_name | Name of the bidder company. |
bidder_city | City of the bidder company. |
bidder_nuts_0 | Bidder company's First-level NUTS codes |
currency | Currency of prices. |
filter_ok | GTI specific filter which applies to work with non-duplicated awarded tenders/lots. |
# Import libraries
import pandas as pd
import numpy as np
import missingno as msno
import matplotlib.pyplot as plt
import seaborn as sns- Exploring the dataset
# Load the dataset
cro = pd.read_csv("Data/HR_DIB_2023.csv")
# Display unique values in each column and the shape of the DataFrame
display(cro.nunique().sort_values(ascending=False))
display(cro.shape)
# Display information about the DataFrame
cro.info()- The dataset contains many fields that we do not require. We'll select only the columns that are needed to proceed with the analysis.
# Selecting specific columns from the dataframe to focus on relevant data
cro = cro[["tender_year","tender_id", "tender_title", "tender_proceduretype", "tender_supplytype", "tender_recordedbidscount", "tender_estimatedprice", "tender_finalprice", "buyer_id", "buyer_name", "buyer_city", "buyer_mainactivities", "buyer_buyertype", "bidder_id", "bidder_name", "bidder_city", "bidder_nuts_0", "bid_price", "currency", "lot_status"]]
# Displaying the first few rows of the dataframe to verify the selected columns
display(cro)# Displaying information about the dataframe including the data type of each column
cro.info()cro["lot_status"].value_counts(normalize=True)- We are only interested in tenders that where "Awarded" or "Preawarded"
print(cro["lot_status"].unique())
cro = cro[(cro["lot_status"] == "AWARDED") | (cro["lot_status"] =='PREAWARDED')]
cro.shapeNUMBER OF BIDS PER TENDER
- Most tender had only 1 bidder.
# Calculate the normalized value counts of recorded bids and display the top 5
bids_count = cro["tender_recordedbidscount"].value_counts(normalize=True)
print(bids_count.head())
# Plot a histogram of the tender recorded bids count with specified bins and x-ticks
cro["tender_recordedbidscount"].plot(kind="hist", bins=np.arange(1,11,1)) # Setting bins from 1 to 10
plt.xticks(np.arange(1, 11, 1)) # Setting x-ticks to show each bin clearly
plt.show() - Upon inspection, we found that over half of the
tender_idsare duplicated. These duplicates arise because multiple bidders applied for the same tender. To address this, we'll remove all duplicated tenders.