python load betfair data to mysql

The script below will import the betfair data from data.betfair.com into a mySQL table.  First create the table,

CREATE TABLE history (
sports_id int(11) DEFAULT NULL,
event_id int(11) DEFAULT NULL,
country varchar(10) DEFAULT NULL,
scheduled_off varchar(30) DEFAULT NULL,
actual_off varchar(30) DEFAULT NULL,
full_description varchar(150) DEFAULT NULL,
odds double DEFAULT NULL,
settled_date varchar(30) DEFAULT NULL,
first_taken varchar(30) DEFAULT NULL,
latest_taken varchar(30) DEFAULT NULL,
course varchar(150) DEFAULT NULL,
volume_matched double DEFAULT NULL,
number_bets double DEFAULT NULL,
event varchar(50) DEFAULT NULL,
in_play varchar(2) DEFAULT NULL,
selection varchar(50) DEFAULT NULL,
selection_id int(11) DEFAULT NULL,
win_flag varchar(1) DEFAULT NULL,
KEY sports_id (sports_id),
KEY event_id (event_id),
KEY odds (odds),
KEY win_flag (win_flag),
KEY descr (full_description)
) ENGINE=InnoDB DEFAULT

Then use the following script,

#!/usr/bin/python
import csv
import MySQLdb
import re
import os

path=”/path/to/files”

db = MySQLdb.connect(host=’localhost’,
user=’user’,
passwd=’pass’,
db=’betfair’)

cursor = db.cursor()
files = os.listdir(path)

for filename in files:
csv_data = csv.reader(file(path+”/”+filename))
for row in csv_data:
cursor.execute(“INSERT into history(sports_id, event_id, settled_date, country, full_description, course, scheduled_off, event, actual_off, selection_id, selection, odds, number_bets, volume_matched, latest_taken, first_taken, win_flag, in_play) values (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)”, row)

db.commit()
cursor.close()

Adjust the path at the to to the location of the raw csv(uncompressed) files that you would like loaded into mySQL.
Edit the database information
Run the script

You can also add filters, for example when processing horse racing data I wanted only the following,

– UK races
– Bets placed before the event (IE not in running)
– Exclude hurdles

This is what I used

for row in csv_data:
if row[1] != “EVENT_ID” and row[3] == “GB” and row[17] == “PE” and “PLACED” not in row[7] and “Hrd” not in row[7]:
if re.search(‘[0-9]f’, row[7]):
cursor.execute(“INSERT into history(sports_id, event_id, settled_date, country, full_description, course, scheduled_off, event, actual_off, selection_id, selection, odds, number_bets, volume_matched, latest_taken, first_taken, win_flag, in_play) values (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)”, row)