我正在处理一个包含大约10万行每日篮球统计数据的数据集。我的项目目标是根据相同的约束条件,汇总每天的前9名得分手。当我在数据集的精简版本(仅限于一个日期)上运行以下代码时,我能够编译出最佳组合。
单一日期数据集
from pulp import *import numpy as npimport pandas as pddateparse = lambda dates: pd.datetime.strptime(dates, '%m/%d/%Y')players = pd.read_csv("Date.csv", parse_dates=['DATE'], index_col='DATE',date_parser=dateparse)players["PG"] = (players["POSITION"] == 'PG').astype(float)players["SG"] = (players["POSITION"] == 'SG').astype(float)players["SF"] = (players["POSITION"] == 'SF').astype(float)players["PF"] = (players["POSITION"] == 'PF').astype(float)players["C"] = (players["POSITION"] == 'C').astype(float)players["SALARY"] = players["SALARY"].astype(float)model = LpProblem("problem", LpMaximize)total_points = {}cost = {}PG = {}SG = {}SF = {}PF = {}C = {}number_of_players = {}for i, player in players.iterrows(): var_name = 'x' + str(i) # Create variable name decision_var = pulp.LpVariable(var_name, cat='Binary') # Initialize Variables total_points[decision_var] = player["POINTS"] # Create PPG Dictionary cost[decision_var] = player["SALARY"] # Create Cost Dictionary # Create Dictionary for Player Types PG[decision_var] = player["PG"] SG[decision_var] = player["SG"] SF[decision_var] = player["SF"] PF[decision_var] = player["PF"] C[decision_var] = player["C"] number_of_players[decision_var] = 1.0objective_function = pulp.LpAffineExpression(total_points)model += objective_function#Define cost constraint and add it to the modeltotal_cost = pulp.LpAffineExpression(cost)model += (total_cost <= 60000)PG_constraint = pulp.LpAffineExpression(PG)SG_constraint = pulp.LpAffineExpression(SG)SF_constraint = pulp.LpAffineExpression(SF)PF_constraint = pulp.LpAffineExpression(PF)C_constraint = pulp.LpAffineExpression(C)total_players = pulp.LpAffineExpression(number_of_players)model += (PG_constraint <= 2)model += (SG_constraint <= 2)model += (SF_constraint <= 2)model += (PF_constraint <= 2)model += (C_constraint <= 1)model += (total_players <= 9)model.solve()
当我的数据集只包含一个日期时,这段代码能够正常工作并解决问题。我希望能够将所有相关日期添加到范围内,并让优化器遍历每个日期,找出每一天的最佳组合。
我尝试创建一个循环,为每一天生成约束条件,但这导致了错误
from datetime import datetimestart_date = "2021-01-25"stop_date = "2021-01-27"start = datetime.strptime(start_date, "%Y-%m-%d")stop = datetime.strptime(stop_date, "%Y-%m-%d")from datetime import timedeltawhile start < stop: objective_function = pulp.LpAffineExpression(total_points) model += objective_function#Define cost constraint and add it to the model total_cost = pulp.LpAffineExpression(cost) model += (total_cost <= 60000)
是否有可能在一个数据集中为多个日期分配相同的目标和约束?这样做的结果将是为数据集中的每个日期找到9名最佳球员。
回答:
只需创建你的日期列表,然后遍历这些日期,并将按日期过滤后的数据框输入到你的求解器中:
from pulp import *import numpy as npimport pandas as pddateparse = lambda dates: pd.datetime.strptime(dates, '%m/%d/%Y')players = pd.read_csv("Date.csv", parse_dates=['DATE'], index_col='DATE',date_parser=dateparse)players["PG"] = (players["POSITION"] == 'PG').astype(float)players["SG"] = (players["POSITION"] == 'SG').astype(float)players["SF"] = (players["POSITION"] == 'SF').astype(float)players["PF"] = (players["POSITION"] == 'PF').astype(float)players["C"] = (players["POSITION"] == 'C').astype(float)players["SALARY"] = players["SALARY"].astype(float)players = players.reset_index()date_list = list(set(players['DATE'])) #<-- create the list of dates in the dataframedate_list.sort() #<-- sort the datesfor var_date in date_list: rows = [] model = LpProblem("problem", LpMaximize) filter_players = players[players['DATE'] == var_date].reset_index(drop=True) #<-- filter by date total_points = {} cost = {} PG = {} SG = {} SF = {} PF = {} C = {} number_of_players = {} for i, player in filter_players.iterrows(): #<--then run on that filtered dataframe var_name = 'x' + str(i) # Create variable name decision_var = pulp.LpVariable(var_name, cat='Binary') # Initialize Variables total_points[decision_var] = player["POINTS"] # Create PPG Dictionary cost[decision_var] = player["SALARY"] # Create Cost Dictionary # Create Dictionary for Player Types PG[decision_var] = player["PG"] SG[decision_var] = player["SG"] SF[decision_var] = player["SF"] PF[decision_var] = player["PF"] C[decision_var] = player["C"] number_of_players[decision_var] = 1.0 objective_function = pulp.LpAffineExpression(total_points) model += objective_function #Define cost constraint and add it to the model total_cost = pulp.LpAffineExpression(cost) model += (total_cost <= 60000) PG_constraint = pulp.LpAffineExpression(PG) SG_constraint = pulp.LpAffineExpression(SG) SF_constraint = pulp.LpAffineExpression(SF) PF_constraint = pulp.LpAffineExpression(PF) C_constraint = pulp.LpAffineExpression(C) total_players = pulp.LpAffineExpression(number_of_players) model += (PG_constraint <= 2) model += (SG_constraint <= 2) model += (SF_constraint <= 2) model += (PF_constraint <= 2) model += (C_constraint <= 1) model += (total_players <= 9) model.solve() row = {} for v in model.variables(): if v.varValue: idx = int(v.name.replace('x','')) player_name = filter_players.iloc[idx]['PLAYER'] position = filter_players.iloc[idx]['POSITION'] salary = filter_players.iloc[idx]['SALARY'] points = filter_players.iloc[idx]['POINTS'] row = {'DATE':var_date, 'PLAYER':player_name, 'POSITION':position, 'POINTS':points, 'SALARY':salary} rows.append(row) lineup = pd.DataFrame(rows) lineup.loc[9,'PLAYER'] = '' lineup.loc[9,'POSITION'] = 'Total' lineup.loc[9,'POINTS'] = lineup['POINTS'].sum(axis=0) lineup.loc[9,'SALARY'] = lineup['SALARY'].sum(axis=0) print ('Lineup for: %s' %var_date.strftime('%m/%d/%Y')) print (lineup.iloc[:,1:]) print ('\n\n')