A Tennis dataset and models for event detection & commentary generation. Discussed in:
“TenniSet: A Dataset for Dense Fine-Grained Event Recognition, Localisation and Description”
The dataset consists of 5 matches with manually annotated temporal events and commentary captions.
| Type | Attributes | # Events | # Frames | Avg. Frames per Event |
|---|---|---|---|---|
| match | winner | 5 | 786,455 | 157,291 |
| set | winner, score | 11 | 765,738 | 69,613 |
| game | winner, score, server | 118 | 588,759 | 4,989 |
| point | winner, score | 746 | 159,494 | 214 |
| serve | near/far, in/fault/let | 1,017 | 68,385 | 67 |
| hit | near/far, left/right | 2,551 | 73,564 | 29 |
Due to the limited size of the dataset, there are two varieties of train, validation and test splits. The first (01) uses the entire V010 for validation and testing, while the second (02) splits evenly across all videos.
| Class | # Events S01 (train / val / test) | # Frames S01 (train / val / test) | # Events S02 (train / val / test) | # Frames S02 (train / val / test) |
|---|---|---|---|---|
| OTH | 2,507 / 133 / 198 | 573,394 / 28,538 / 49,648 | 2,079 / 160 / 608 | 470,963 / 36,932 / 143,685 |
| SFI | 342 / 11 / 29 | 20,114 / 772 / 1,925 | 296 / 22 / 64 | 17,716 / 1,402 / 3,693 |
| SFF | 117 / 2 / 5 | 7,962 / 153 / 333 | 95 / 7 / 22 | 6,430 / 577 / 1,441 |
| SFL | 25 / 0 / 1 | 1,596 / 0 / 72 | 21 / 1 / 4 | 1,380 / 38 / 250 |
| SNI | 293 / 24 / 29 | 17,186 / 1,762 / 1,994 | 242 / 18 / 86 | 14,876 / 992 / 5,074 |
| SNF | 111 / 7 / 10 | 7,312 / 578 / 772 | 88 / 8 / 32 | 6,020 / 473 / 2,169 |
| SNL | 10 / 2 / 0 | 656 / 126 / 0 | 9 / 1 / 2 | 543 / 65 / 174 |
| HFL | 533 / 22 / 45 | 16,520 / 648 / 1,419 | 432 / 33 / 135 | 13,530 / 1,037 / 4,020 |
| HFR | 576 / 39 / 41 | 16,858 / 1,096 / 1,150 | 474 / 37 / 145 | 13,878 / 1,037 / 4,189 |
| HNL | 602 / 29 / 39 | 16,196 / 811 / 1,076 | 514 / 37 / 119 | 13,879 / 1,036 / 3,168 |
| HNR | 546 / 31 / 48 | 15,605 / 882 / 1,303 | 448 / 33 / 144 | 12,686 / 920 / 4,184 |
There is one commentary-style caption for each of the 746 points, as well as another 10,817 captions not aligned to any imagery. Some examples are:
| Point ID | Caption |
|---|---|
| P00000001 | high kick serve fp returns a ls return short rally fp cross-court rs lands out-side the court |
| P00000012 | quick serve is an ace |
| P00000036 | np serves down the t fp returns a ls return brief rally np fails to keep a cross-court ls in the play |
| P00000051 | np hits a good serve fp struggles with it returning it long |
| P00000155 | cannon serve down the t is an ace |
| P00000172 | sharp angled slice serve np returns a rs return fp whips a rs cross-court winner |
Both groups of captions are used to generate a word embedding for the 250 unique words in the vocabulary, trained with a SkipGram model. Below, the 100-dimensional word embedding is visualised after t-SNE dimensionality reduction.
The main data can be downloaded from my Google Drive with the links below. The directory structure should be:

```
Tennis/
└── data/
    ├── annotations  (9.5 MB)
    ├── features     (13.2 GB)
    ├── flow         (217 GB)
    ├── frames       (217 GB)
    ├── splits       (36.3 MB)
    └── videos       (11.1 GB)
```
- `annotations` stores `.json` files for each video generated by the annotator, as well as other annotation and commentary `.txt` files.
- `features` stores `.npy` feature files for the frames of each video in subdirectories.
- `flow` stores `.jpg` image files for flow frames for each video in subdirectories.
- `frames` stores `.jpg` image files for RGB frames for each video in subdirectories.
- `splits` stores `.txt` files for each split (train, val, test).
- `videos` stores the original video files as `.mp4` files.

More information can be found on the GitHub.
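Assuming the layout above, a minimal loader might look like the sketch below (the file names inside `splits/` and `features/` are assumptions; check the repo for the actual conventions):

```python
from pathlib import Path

import numpy as np


def load_split(data_dir: str, split: str) -> list[str]:
    """Read the video IDs listed in splits/<split>.txt (one per line)."""
    return (Path(data_dir) / "splits" / f"{split}.txt").read_text().split()


def load_features(data_dir: str, video_id: str) -> list[np.ndarray]:
    """Load every per-frame .npy feature file in features/<video_id>/."""
    feat_dir = Path(data_dir) / "features" / video_id
    return [np.load(p) for p in sorted(feat_dir.glob("*.npy"))]
```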
Models can be downloaded from my Google Drive. More information can be found on the GitHub.
I experimented with a number of models for framewise event classification. The table below shows the per-class F1 scores on the test set for some of them:
| Model | ID | OTH | SFI | SFF | SFL | SNI | SNF | SNL | HFL | HFR | HNL | HNR | AVG |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Framewise CNN | 0006 | 97.0 | 57.9 | 17.7 | 13.0 | 62.9 | 21.6 | 0.0 | 74.8 | 76.3 | 77.5 | 78.0 | 52.4 |
| Two-Stream Nets | 0010 | 97.2 | 67.4 | 14.6 | 13.4 | 67.0 | 19.4 | 0.0 | 81.8 | 83.5 | 79.0 | 86.2 | 55.4 |
| R(2+1)D | 0031 | 90.8 | 24.4 | 6.4 | 1.7 | 37.4 | 3.9 | 0.0 | 39.6 | 44.9 | 43.7 | 41.8 | 30.4 |
| Temporal Pooling | 0028 | 97.5 | 62.0 | 19.6 | 14.1 | 65.6 | 21.6 | 0.0 | 77.1 | 78.9 | 81.0 | 80.3 | 54.3 |
| CNN-RNN | 0042 | 97.6 | 65.0 | 13.4 | 13.5 | 66.2 | 27.9 | 0.0 | 80.6 | 83.0 | 80.3 | 84.8 | 55.7 |
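Per-class F1 scores like these can be computed from framewise predictions with `scikit-learn`; the following is a sketch on toy labels (three hypothetical classes, a handful of frames), not the actual evaluation code:

```python
import numpy as np
from sklearn.metrics import f1_score

# Toy framewise labels over three of the eleven classes; the real
# evaluation covers every frame of the test split.
classes = ["OTH", "SFI", "SFF"]
y_true = np.array([0, 0, 1, 1, 2, 0, 0, 1])
y_pred = np.array([0, 0, 1, 2, 2, 0, 1, 1])

# average=None returns one F1 score per class, as in the table above.
per_class = f1_score(y_true, y_pred, labels=[0, 1, 2], average=None)
for name, score in zip(classes, per_class):
    print(f"{name}: {100 * score:.1f}")
print(f"AVG: {100 * per_class.mean():.1f}")
```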
The captioning model is the one from Google.

| BLEU@1 | BLEU@2 | BLEU@3 | BLEU@4 | METEOR | ROUGE-L | CIDEr |
|---|---|---|---|---|---|---|
| 46.7 | 30.7 | 22.1 | 16.4 | 22.6 | 43.9 | 96.4 |
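As a rough illustration of what BLEU@1 measures, a minimal single-reference unigram BLEU (clipped precision with a brevity penalty) can be written directly; the scores above come from the standard captioning evaluation toolkits, not this sketch:

```python
import math
from collections import Counter


def bleu1(reference: list[str], hypothesis: list[str]) -> float:
    """Unigram BLEU: clipped unigram precision times a brevity penalty.

    A simplified single-reference sketch of the metric reported above,
    not the exact corpus-level BLEU used in the evaluation.
    """
    ref_counts = Counter(reference)
    hyp_counts = Counter(hypothesis)
    # Each hypothesis word counts at most as often as it appears in the reference.
    clipped = sum(min(c, ref_counts[w]) for w, c in hyp_counts.items())
    precision = clipped / max(len(hypothesis), 1)
    # Penalise hypotheses shorter than the reference.
    bp = 1.0 if len(hypothesis) > len(reference) else math.exp(
        1 - len(reference) / max(len(hypothesis), 1))
    return bp * precision


print(bleu1("double fault".split(), "double fault".split()))  # 1.0
```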
The table below shows some example captions generated on the test split; underlining marks errors.
| # | | Caption ground truths (G) and predictions (P) |
|---|---|---|
| 01 | G | "high kick serve fp returns a ls return short rally fp cross-court rs lands out-side the court" |
| | P | "fp serves a good one np delivers a rs return fp sends a ls out of the court" |
| 02 | G | "good serve aimed at t np only reaches to it hitting the return long" |
| | P | "fp arrows a good serve at t np is unable to return it" |
| 03 | G | "good serve in the middle np returns a quick ls return short rally np cross-court fails to clear the net in the middle" |
| | P | "good serve in the middle np crafts a ls return fp cross-court ls fails to land inside the court" |
| 04 | G | "fp serves a high kick serve np returns a quick ls return brief rally np rs catches the net" |
| | P | "fine serve placed out wide np returns a ls return short rally fp strokes a rs cross-court winner" |
| 05 | G | "quick serve np crafts a rs return fp goes for a ls down the line but catches the net" |
| | P | "fine serve np shoots a rs return winner" |
| 06 | G | "double fault" |
| | P | "double fault" |
| 07 | G | "good serve np generates a rs return fp then returns one into the net" |
| | P | "good serve in the middle np returns a ls return fp cross-court rs catches the net" |
| 08 | G | "fp serves a high kick serve np delivers a high ls return fp produces a ls winner coming to net" |
| | P | "fp serves a good one np returns a quick rs return fp struggles to keep a cross-court rs in a rally" |
| 09 | G | "quick serve np returns a quick rs return fp ls is unable to clear the net" |
| | P | "nice serve by fp np faces difficulty in returning it" |
| 10 | G | "fp aims a high kick serve np returns a ls return fp hits a rs cross-court winner" |
| | P | "fp serves a good one np returns a quick rs return fp struggles to keep a cross-court rs in a rally" |
| 11 | G | "good serve in the middle np crafts a ls return short rally np hits a rs cross-court but it fails to clear the net" |
| | P | "good serve in the middle np crafts a ls return fp cross-court ls fails to land inside the court" |
| 12 | G | "fp hits a bodyline serve np has no answer to it" |
| | P | "fp arrows a good serve at t np is unable to return it" |
| 13 | G | "double fault" |
| | P | "good serve aimed at t np only reaches to it" |
| 14 | G | "double fault" |
| | P | "double fault" |
| 15 | G | "fp aims a high kick serve np crafts a ls return good rally fp sends a rs cross-court out of the court" |
| | P | "good serve in the middle np crafts a ls return fp cross-court ls fails to land inside the court" |
| 16 | G | "fp arrows a bodyline serve np struggles with it" |
| | P | "fp arrows a good serve at t np is unable to return it" |