Python attheraces.com data scraping

I have been scraping data from attheraces.com for a project that combines it with Betfair data for some statistical analysis.

In case this is useful to anyone, here is the regex I have written:

import re
# info is the page HTML as a string
meetings = re.findall(r'<h5 id="fastfixhead(\d+)".*">(\w+)</a></h5>', info)
for meeting in meetings:
    # [-–] accepts either a hyphen or an en dash before the distance
    times = re.findall(r'<li class=" "><a href="/card\.aspx\?raceid=(\d+)&meetingid=(' + meeting[0] + r')&date=([0-9]{4}-[0-9]{2}-[0-9]{2})&amp.*<strong>([0-9]{2}:[0-9]{2})</strong> [-–] ([0-9]+[fm].*)\((\d+) run', info)

This gathers the course name, race ID, meeting ID, date, race time, distance and number of runners. It only gathers UK data: the meetings match accepts names made up solely of letters, digits and underscores ("\w+"), which drops the entries carrying non-UK codes. Note that "\w+" also skips any course name containing a space. To include other countries, replace "\w+" with ".*" in the meetings match.
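To make the two-step match easier to try out, here is a minimal sketch that runs it against a small hand-written page. The markup in SAMPLE_HTML is an assumption written to fit the patterns (the live page may differ), the runner counts and the "Elsewhere (XX)" meeting are invented for the demo, and parse_meetings and its name_pattern parameter are hypothetical names, not part of the original script.

```python
import re

# Hypothetical markup written to fit the patterns; the live page may differ.
# The runner counts and the "Elsewhere (XX)" meeting are invented for the demo.
SAMPLE_HTML = """
<h5 id="fastfixhead57674"><a href="#m57674">Wolverhampton</a></h5>
<li class=" "><a href="/card.aspx?raceid=741858&meetingid=57674&date=2013-01-14&amp;x=1"><strong>13:50</strong> - 7f 32y (8 run)</a></li>
<li class=" "><a href="/card.aspx?raceid=741859&meetingid=57674&date=2013-01-14&amp;x=1"><strong>14:20</strong> - 5f 216y (10 run)</a></li>
<h5 id="fastfixhead99999"><a href="#m99999">Elsewhere (XX)</a></h5>
<li class=" "><a href="/card.aspx?raceid=800001&meetingid=99999&date=2013-01-14&amp;x=1"><strong>15:00</strong> - 1m (6 run)</a></li>
"""


def parse_meetings(html, name_pattern=r"\w+"):
    """Return {course: [race tuples]} using a meetings match, then a races match."""
    meeting_re = re.compile(
        r'<h5 id="fastfixhead(\d+)".*">(' + name_pattern + r')</a></h5>'
    )
    races_by_course = {}
    for meeting_id, course in meeting_re.findall(html):
        # Build the races pattern for this meeting ID only.
        race_re = re.compile(
            r'<li class=" "><a href="/card\.aspx\?raceid=(\d+)&meetingid=('
            + re.escape(meeting_id)
            + r')&date=([0-9]{4}-[0-9]{2}-[0-9]{2})&amp.*'
            r'<strong>([0-9]{2}:[0-9]{2})</strong> [-–] ([0-9]+[fm].*)\((\d+) run'
        )
        # Each tuple is (raceid, meetingid, date, time, distance, runners).
        races_by_course[course] = race_re.findall(html)
    return races_by_course


if __name__ == "__main__":
    for course, races in parse_meetings(SAMPLE_HTML).items():
        print(course, races)
```

Calling parse_meetings(SAMPLE_HTML, name_pattern=r".*") also picks up the second meeting, which is the "\w+" to ".*" swap described above. The distance group can carry a trailing space because ".*" runs up to the opening bracket, so it is worth calling strip() on it before use.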

Sample output:

Wolverhampton

RaceID: 741858
MeetingID: 57674
Date: 2013-01-14
Race Time: 13:50
Distance: 7f 32y
RaceID: 741859
MeetingID: 57674
Date: 2013-01-14
Race Time: 14:20
Distance: 5f 216y
RaceID: 741860
MeetingID: 57674
Date: 2013-01-14
Race Time: 14:50
Distance: 1m 4f 50y
RaceID: 741861
MeetingID: 57674
Date: 2013-01-14
Race Time: 15:20
Distance: 1m 141y
RaceID: 741862
MeetingID: 57674
Date: 2013-01-14
Race Time: 15:50
Distance: 1m 141y
RaceID: 741863
MeetingID: 57674
Date: 2013-01-14
Race Time: 16:20
Distance: 1m 1f 103y
RaceID: 741864
MeetingID: 57674
Date: 2013-01-14
Race Time: 16:50
Distance: 1m 1f 103y
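
The tuples returned by re.findall carry their values in capture-group order (race ID, meeting ID, date, time, distance, runners), so labels can be attached by position. Here is a minimal sketch of that step; FIELDS and format_race are hypothetical names, and the runner count of 8 in the example tuple is a placeholder, not real data.

```python
# Labels in capture-group order: raceid, meetingid, date, time, distance, runners.
FIELDS = ("RaceID", "MeetingID", "Date", "Race Time", "Distance", "Number of runners")


def format_race(race):
    """Pair each captured value with its label, one 'Label: value' line per field."""
    return "\n".join(
        f"{label}: {value.strip()}" for label, value in zip(FIELDS, race)
    )


# The first five values come from the sample output; the runner count "8" is a placeholder.
example = ("741858", "57674", "2013-01-14", "13:50", "7f 32y ", "8")
print(format_race(example))
```

Keeping the labels in a single tuple next to the pattern makes it harder for the two to drift out of step when a capture group is added or removed.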
