A Tennis dataset and models for event detection & commentary generation. Discussed in:
“TenniSet: A Dataset for Dense Fine-Grained Event Recognition, Localisation and Description”
The dataset consists of 5 matches with manually annotated temporal events and commentary captions.
| Type | Attributes | # Events | # Frames | Avg. Frames per Event |
|---|---|---|---|---|
| match | winner | 5 | 786,455 | 157,291 |
| set | winner, score | 11 | 765,738 | 69,613 |
| game | winner, score, server | 118 | 588,759 | 4,989 |
| point | winner, score | 746 | 159,494 | 214 |
| serve | near/far, in/fault/let | 1,017 | 68,385 | 67 |
| hit | near/far, left/right | 2,551 | 73,564 | 29 |
Due to the limited size of the dataset, there are two varieties of train, validation and test splits. The first (01) uses the entire V010 for validation and testing, while the second (02) splits evenly across all videos.
| Class | # Events S01 (train / val / test) | # Frames S01 (train / val / test) | # Events S02 (train / val / test) | # Frames S02 (train / val / test) |
|---|---|---|---|---|
| OTH | 2,507 / 133 / 198 | 573,394 / 28,538 / 49,648 | 2,079 / 160 / 608 | 470,963 / 36,932 / 143,685 |
| SFI | 342 / 11 / 29 | 20,114 / 772 / 1,925 | 296 / 22 / 64 | 17,716 / 1,402 / 3,693 |
| SFF | 117 / 2 / 5 | 7,962 / 153 / 333 | 95 / 7 / 22 | 6,430 / 577 / 1,441 |
| SFL | 25 / 0 / 1 | 1,596 / 0 / 72 | 21 / 1 / 4 | 1,380 / 38 / 250 |
| SNI | 293 / 24 / 29 | 17,186 / 1,762 / 1,994 | 242 / 18 / 86 | 14,876 / 992 / 5,074 |
| SNF | 111 / 7 / 10 | 7,312 / 578 / 772 | 88 / 8 / 32 | 6,020 / 473 / 2,169 |
| SNL | 10 / 2 / 0 | 656 / 126 / 0 | 9 / 1 / 2 | 543 / 65 / 174 |
| HFL | 533 / 22 / 45 | 16,520 / 648 / 1,419 | 432 / 33 / 135 | 13,530 / 1,037 / 4,020 |
| HFR | 576 / 39 / 41 | 16,858 / 1,096 / 1,150 | 474 / 37 / 145 | 13,878 / 1,037 / 4,189 |
| HNL | 602 / 29 / 39 | 16,196 / 811 / 1,076 | 514 / 37 / 119 | 13,879 / 1,036 / 3,168 |
| HNR | 546 / 31 / 48 | 15,605 / 882 / 1,303 | 448 / 33 / 144 | 12,686 / 920 / 4,184 |
There is one commentary-style caption for each of the 746 points, as well as another 10,817 captions not aligned to any imagery. Some examples are:
| Point ID | Caption |
|---|---|
| P00000001 | high kick serve fp returns a ls return short rally fp cross-court rs lands out-side the court |
| P00000012 | quick serve is an ace |
| P00000036 | np serves down the t fp returns a ls return brief rally np fails to keep a cross-court ls in the play |
| P00000051 | np hits a good serve fp struggles with it returning it long |
| P00000155 | cannon serve down the t is an ace |
| P00000172 | sharp angled slice serve np returns a rs return fp whips a rs cross-court winner |
Both groups of captions are used to generate a word embedding for the 250 unique words in the vocabulary, trained with a SkipGram model. Below, the 100-dimensional word embedding is visualised after t-SNE dimensionality reduction.
The main data can be downloaded from my Google Drive with the links below. The directory structure should be:

```
Tennis/
└── data/
    ├── annotations  (9.5 MB)
    ├── features     (13.2 GB)
    ├── flow         (217 GB)
    ├── frames       (217 GB)
    ├── splits       (36.3 MB)
    └── videos       (11.1 GB)
```
- `annotations` stores `.json` files for each video generated by the annotator, as well as other annotation and commentary `.txt` files.
- `features` stores `.npy` feature files for the frames of each video in subdirectories.
- `flow` stores `.jpg` image files for flow frames for each video in subdirectories.
- `frames` stores `.jpg` image files for RGB frames for each video in subdirectories.
- `splits` stores `.txt` files for each split (train, val, test).
- `videos` stores the original video files as `.mp4` files.

More information can be found on the GitHub.
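Assuming the layout above, a minimal loader might look like the sketch below (the file names inside `splits/` and `features/` are assumptions; check the repo for the actual conventions):

```python
from pathlib import Path

import numpy as np


def load_split(data_dir: str, split: str) -> list[str]:
    """Read the video IDs listed in splits/<split>.txt (one per line)."""
    return (Path(data_dir) / "splits" / f"{split}.txt").read_text().split()


def load_features(data_dir: str, video_id: str) -> list[np.ndarray]:
    """Load every per-frame .npy feature file in features/<video_id>/."""
    feat_dir = Path(data_dir) / "features" / video_id
    return [np.load(p) for p in sorted(feat_dir.glob("*.npy"))]
```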
Models can be downloaded from my Google Drive. More information can be found on the GitHub.
I experimented with a number of models for framewise event classification. The table below shows the per-class F1 scores on the test set for some of them:
| Model | ID | OTH | SFI | SFF | SFL | SNI | SNF | SNL | HFL | HFR | HNL | HNR | AVG |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Framewise CNN | 0006 | 97.0 | 57.9 | 17.7 | 13.0 | 62.9 | 21.6 | 0.0 | 74.8 | 76.3 | 77.5 | 78.0 | 52.4 |
| Two-Stream Nets | 0010 | 97.2 | 67.4 | 14.6 | 13.4 | 67.0 | 19.4 | 0.0 | 81.8 | 83.5 | 79.0 | 86.2 | 55.4 |
| R(2+1)D | 0031 | 90.8 | 24.4 | 6.4 | 1.7 | 37.4 | 3.9 | 0.0 | 39.6 | 44.9 | 43.7 | 41.8 | 30.4 |
| Temporal Pooling | 0028 | 97.5 | 62.0 | 19.6 | 14.1 | 65.6 | 21.6 | 0.0 | 77.1 | 78.9 | 81.0 | 80.3 | 54.3 |
| CNN-RNN | 0042 | 97.6 | 65.0 | 13.4 | 13.5 | 66.2 | 27.9 | 0.0 | 80.6 | 83.0 | 80.3 | 84.8 | 55.7 |
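Per-class F1 scores like these can be computed from framewise predictions with `scikit-learn`; the following is a sketch on toy labels (three hypothetical classes, a handful of frames), not the actual evaluation code:

```python
import numpy as np
from sklearn.metrics import f1_score

# Toy framewise labels over three of the eleven classes; the real
# evaluation covers every frame of the test split.
classes = ["OTH", "SFI", "SFF"]
y_true = np.array([0, 0, 1, 1, 2, 0, 0, 1])
y_pred = np.array([0, 0, 1, 2, 2, 0, 1, 1])

# average=None returns one F1 score per class, as in the table above.
per_class = f1_score(y_true, y_pred, labels=[0, 1, 2], average=None)
for name, score in zip(classes, per_class):
    print(f"{name}: {100 * score:.1f}")
print(f"AVG: {100 * per_class.mean():.1f}")
```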
The captioning model is the one from Google.

| BLEU@1 | BLEU@2 | BLEU@3 | BLEU@4 | METEOR | ROUGE-L | CIDEr |
|---|---|---|---|---|---|---|
| 46.7 | 30.7 | 22.1 | 16.4 | 22.6 | 43.9 | 96.4 |
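As a rough illustration of what BLEU@1 measures, a minimal single-reference unigram BLEU (clipped precision with a brevity penalty) can be written directly; the scores above come from the standard captioning evaluation toolkits, not this sketch:

```python
import math
from collections import Counter


def bleu1(reference: list[str], hypothesis: list[str]) -> float:
    """Unigram BLEU: clipped unigram precision times a brevity penalty.

    A simplified single-reference sketch of the metric reported above,
    not the exact corpus-level BLEU used in the evaluation.
    """
    ref_counts = Counter(reference)
    hyp_counts = Counter(hypothesis)
    # Each hypothesis word counts at most as often as it appears in the reference.
    clipped = sum(min(c, ref_counts[w]) for w, c in hyp_counts.items())
    precision = clipped / max(len(hypothesis), 1)
    # Penalise hypotheses shorter than the reference.
    bp = 1.0 if len(hypothesis) > len(reference) else math.exp(
        1 - len(reference) / max(len(hypothesis), 1))
    return bp * precision


print(bleu1("double fault".split(), "double fault".split()))  # 1.0
```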
The table below shows some example captions generated on the test split; underlining marks errors.
| # | | Caption ground truths (G) and predictions (P) |
|---|---|---|
| 01 | G | "high kick serve fp returns a ls return short rally fp cross-court rs lands out-side the court" |
| | P | "fp serves a good one np delivers a rs return fp sends a ls out of the court" |
| 02 | G | "good serve aimed at t np only reaches to it hitting the return long" |
| | P | "fp arrows a good serve at t np is unable to return it" |
| 03 | G | "good serve in the middle np returns a quick ls return short rally np cross-court fails to clear the net in the middle" |
| | P | "good serve in the middle np crafts a ls return fp cross-court ls fails to land inside the court" |
| 04 | G | "fp serves a high kick serve np returns a quick ls return brief rally np rs catches the net" |
| | P | "fine serve placed out wide np returns a ls return short rally fp strokes a rs cross-court winner" |
| 05 | G | "quick serve np crafts a rs return fp goes for a ls down the line but catches the net" |
| | P | "fine serve np shoots a rs return winner" |
| 06 | G | "double fault" |
| | P | "double fault" |
| 07 | G | "good serve np generates a rs return fp then returns one into the net" |
| | P | "good serve in the middle np returns a ls return fp cross-court rs catches the net" |
| 08 | G | "fp serves a high kick serve np delivers a high ls return fp produces a ls winner coming to net" |
| | P | "fp serves a good one np returns a quick rs return fp struggles to keep a cross-court rs in a rally" |
| 09 | G | "quick serve np returns a quick rs return fp ls is unable to clear the net" |
| | P | "nice serve by fp np faces difficulty in returning it" |
| 10 | G | "fp aims a high kick serve np returns a ls return fp hits a rs cross-court winner" |
| | P | "fp serves a good one np returns a quick rs return fp struggles to keep a cross-court rs in a rally" |
| 11 | G | "good serve in the middle np crafts a ls return short rally np hits a rs cross-court but it fails to clear the net" |
| | P | "good serve in the middle np crafts a ls return fp cross-court ls fails to land inside the court" |
| 12 | G | "fp hits a bodyline serve np has no answer to it" |
| | P | "fp arrows a good serve at t np is unable to return it" |
| 13 | G | "double fault" |
| | P | "good serve aimed at t np only reaches to it" |
| 14 | G | "double fault" |
| | P | "double fault" |
| 15 | G | "fp aims a high kick serve np crafts a ls return good rally fp sends a rs cross-court out of the court" |
| | P | "good serve in the middle np crafts a ls return fp cross-court ls fails to land inside the court" |
| 16 | G | "fp arrows a bodyline serve np struggles with it" |
| | P | "fp arrows a good serve at t np is unable to return it" |