Association Rule Mining Data Preparation


ARM Data Preparation

First, calculate the save rate (rate_saved) from clean_laliga_playerDF and select the variables season, teamName, position, game_minutes, rate_saved, and rating to create a new dataframe. Second, calculate each team's points per season (PTS) from the clean_laliga_teamDF CSV file and extract the columns season, teamName, and PTS into another dataframe. Then, merge these two dataframes using teamName and season as the key. Next, keep only players in the goalkeeper position whose playing time exceeds 95 minutes. One match lasts 90 minutes, so a goalkeeper below this threshold has played barely a single game; their statistics have little reference value, and they are excluded. The result is a new dataframe ready for the next step. (As shown in the following figure, this completes the first stage of data preparation.)

First Stage of Data Preparation
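The first stage can be sketched in Python as below. The project's own code is in R and is not reproduced here, so this is a minimal illustration over lists of dictionaries rather than the author's implementation. The text does not define rate_saved or list the raw columns. The column names saves, goals_conceded, wins, and draws, the position label "Goalkeeper", and the formula rate_saved = saves / (saves + goals_conceded) are therefore all hypothetical. PTS uses the standard league scoring of 3 points per win and 1 per draw.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def save_rate(saves: int, goals_conceded: int) -> float:
    """HYPOTHETICAL rate_saved: share of on-target shots faced that were saved."""
    faced = saves + goals_conceded
    return saves / faced if faced else 0.0


def team_points(wins: int, draws: int) -> int:
    """League points: 3 per win, 1 per draw, 0 per loss."""
    return 3 * wins + draws


def first_stage(players: List[Row], teams: List[Row], min_minutes: int = 95) -> List[Row]:
    # Player dataframe: season, teamName, position, game_minutes, rate_saved, rating
    player_df = [
        {
            "season": p["season"],
            "teamName": p["teamName"],
            "position": p["position"],
            "game_minutes": p["game_minutes"],
            "rate_saved": save_rate(p["saves"], p["goals_conceded"]),
            "rating": p["rating"],
        }
        for p in players
    ]
    # Team dataframe reduced to a lookup: (season, teamName) -> PTS
    pts: Dict[Tuple[Any, Any], int] = {
        (t["season"], t["teamName"]): team_points(t["wins"], t["draws"])
        for t in teams
    }
    stage_one = []
    for p in player_df:
        key = (p["season"], p["teamName"])
        # Inner join on season + teamName: drop players with no matching team row
        if key not in pts:
            continue
        # Keep only goalkeepers with more than min_minutes of playing time
        if p["position"] != "Goalkeeper" or p["game_minutes"] <= min_minutes:
            continue
        stage_one.append({**p, "PTS": pts[key]})
    return stage_one


if __name__ == "__main__":
    # Made-up toy rows for illustration only, not real La Liga data
    players = [
        {"season": "2019", "teamName": "Team A", "position": "Goalkeeper",
         "game_minutes": 3060, "saves": 80, "goals_conceded": 25, "rating": 7.1},
        {"season": "2019", "teamName": "Team A", "position": "Goalkeeper",
         "game_minutes": 90, "saves": 2, "goals_conceded": 1, "rating": 6.5},
        {"season": "2019", "teamName": "Team A", "position": "Defender",
         "game_minutes": 2500, "saves": 0, "goals_conceded": 0, "rating": 7.0},
    ]
    teams = [{"season": "2019", "teamName": "Team A", "wins": 26, "draws": 9}]
    for row in first_stage(players, teams):
        print(row)
```

With these toy rows only the first goalkeeper survives. The second is filtered out at 90 minutes and the defender by position. The surviving row carries PTS = 3 × 26 + 9 = 87.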

Building on the first stage, select the variables game_minutes, rating, rate_saved, and PTS to create a new dataframe. Because these variables are numeric, they must first be discretized into categorical items and then converted into transaction data, which association rule mining requires.

The conversion rules are as follows:

Conversion Rules
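The actual cut points are in the conversion-rules figure and cannot be recovered from the text, so the thresholds and labels below are placeholders. This minimal Python sketch only shows the mechanics of mapping each numeric variable to a categorical item by bin. It assumes, hypothetically, that every variable is split into low/mid/high at two cut points.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence, Tuple

# HYPOTHETICAL cut points and labels; replace them with the actual conversion rules.
RULES: Dict[str, Tuple[List[float], List[str]]] = {
    "game_minutes": ([1000, 2500], ["minutes=low", "minutes=mid", "minutes=high"]),
    "rating": ([6.5, 7.0], ["rating=low", "rating=mid", "rating=high"]),
    "rate_saved": ([0.65, 0.75], ["saved=low", "saved=mid", "saved=high"]),
    "PTS": ([45, 65], ["PTS=low", "PTS=mid", "PTS=high"]),
}


def to_item(value: float, cuts: Sequence[float], labels: Sequence[str]) -> str:
    """Return labels[i], where i is the number of cut points <= value.

    With cuts [c0, c1]: value < c0 -> labels[0], c0 <= value < c1 -> labels[1],
    value >= c1 -> labels[2].
    """
    return labels[bisect_right(cuts, value)]


def discretize(row: Dict[str, float]) -> Dict[str, str]:
    """Convert one numeric row (the four selected variables) into categorical items."""
    return {col: to_item(row[col], cuts, labels) for col, (cuts, labels) in RULES.items()}


if __name__ == "__main__":
    print(discretize({"game_minutes": 3060, "rating": 7.1, "rate_saved": 0.76, "PTS": 87}))
```

Each label is prefixed with its variable name. This keeps items distinct across columns, so a low rating is never confused with a low PTS once the rows become transactions.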

The dataset is then converted to transaction data following the rules above, and the resulting dataframe is saved in CSV format for exploring association rules.

Transaction Data
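Once every row holds categorical items, each goalkeeper-season becomes one transaction. The following minimal Python sketch covers that step and the CSV export. It assumes the rows are already discretized. It uses a basket layout: one transaction per line, items separated by commas, and no header. R's arules package can read this layout with read.transactions(..., format = "basket", sep = ","). The text does not state the exact layout of the saved CSV, so treat this layout as an assumption.

```python
import csv
import io
from typing import Dict, List, TextIO

# Item columns, in the order they were selected for the second stage
COLUMNS = ["game_minutes", "rating", "rate_saved", "PTS"]


def to_transactions(rows: List[Dict[str, str]]) -> List[List[str]]:
    """Turn each discretized row into one transaction: the list of its items."""
    return [[row[col] for col in COLUMNS] for row in rows]


def write_basket_csv(transactions: List[List[str]], handle: TextIO) -> None:
    """Write one transaction per line, items comma-separated, with no header row."""
    csv.writer(handle).writerows(transactions)


if __name__ == "__main__":
    rows = [
        {"game_minutes": "minutes=high", "rating": "rating=high",
         "rate_saved": "saved=high", "PTS": "PTS=high"},
        {"game_minutes": "minutes=low", "rating": "rating=mid",
         "rate_saved": "saved=low", "PTS": "PTS=low"},
    ]
    buffer = io.StringIO()
    write_basket_csv(to_transactions(rows), buffer)
    print(buffer.getvalue())
    # Writing to a real file instead (hypothetical file name):
    # with open("transactions.csv", "w", newline="") as f:
    #     write_basket_csv(to_transactions(rows), f)
```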


Resources:

Data Preparation Code: R Code
Transaction Data: CSV File

