Meta Self-Learning for Multi-Source Domain Adaptation: A Benchmark
This paper introduces a new method for domain adaptation through a meta self-learning approach.
Apr 19, 2022
Introduction
This paper introduces a new method for domain adaptation through a meta self-learning approach. The experiments show a noticeable improvement in final performance compared to similar methods. The authors also introduce a new large-scale (Chinese & English) multi-domain dataset for benchmarking and experiments.
Contributions
- Collected a multi-source domain adaptation dataset for text recognition with over 5 million images from 5 different domains. According to the authors, it is the first multi-domain adaptation dataset for text recognition.
- Proposed a new self-learning framework for multi-source domain adaptation, which is effective and can easily be applied to any multi-domain adaptation and self-learning problem.
- Experiments are conducted on the new dataset, which provides a benchmark and shows the effectiveness of the proposed method.
Related Works
- Text Recognition
- 4-stage STR network similar to Naver's: TPS/STN + CNN + BiLSTM/GRU + CTC/Attention
- Domain adaptation
- Discrepancy based domain adaptation
- Uses a domain confusion loss by calculating the maximum mean discrepancy (MMD) between the source domain data and the target domain data.
- Adversarial training based domain adaptation
- Uses a domain discriminator and a gradient reversal layer between the feature extractor and the discriminator, which forces the feature extractor to learn domain-invariant features (a minimal sketch follows after this list).
- Self-training-based domain adaptation
- The model is trained iteratively by generating pseudo-labels for the target data and adding them to the training data.
- Meta-Learning
- Model-Agnostic Meta-Learning (MAML)
- Reptile: A Scalable Meta-Learning Algorithm
- Online Meta-Learning
- Self-Learning
- Predict labels for the unlabeled data using a model trained on the source domains and treat them as correct labels if the prediction confidence is higher than a threshold. Self-learning can bring a considerable improvement because it uses target-domain data directly.
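For the adversarial-training approach mentioned above, here is a minimal PyTorch sketch of a gradient reversal layer with a domain discriminator (in the spirit of DANN). The class names, feature dimension, and number of domains are illustrative assumptions, not the paper's code.

```python
# Minimal sketch of a gradient reversal layer (GRL) for adversarial
# domain adaptation; illustrative only, not the paper's implementation.
import torch
from torch import nn


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies gradients by -lambda backward."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse (and scale) the gradient flowing back into the feature extractor.
        return -ctx.lambd * grad_output, None


def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)


class DomainDiscriminator(nn.Module):
    """Predicts which domain a feature vector came from (dims are assumptions)."""

    def __init__(self, feat_dim=256, num_domains=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, num_domains)
        )

    def forward(self, features, lambd=1.0):
        # The GRL makes the feature extractor *maximize* the domain loss,
        # pushing it toward domain-invariant features.
        return self.net(grad_reverse(features, lambd))
```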
Multi-Domain Text Recognition Dataset
The proposed multi-domain text dataset consists of 5,209,215 images in total and is divided into five domains, which are:
- Synthetic domain
- 1,110,620 images in total
- Handwritten domain
- The handwritten-domain data is generated using images from the CASIA Online and Offline Chinese Handwriting Databases; 1,897,021 images in total.
- Document domain
- The document-domain data is collected from an open-source project and contains about 3 million images. The authors filtered out images containing characters outside the character set, leaving 1,710,885 images in total.
- Street view domain
- Merged from: SVT, SVT perspective, ICDAR2013, ICDAR2015, RCTW17, ICDAR-2019, and CUTE80 datasets → 199,346 images
- Car plate domain
- Re-balanced CCPD dataset → 207,928 images in total
The character set size is set to 3,816, with 3,754 common Chinese characters and 62 alphanumeric characters.
Meta Self-Learning algorithm
- Warm-Up and Generation of Pseudo-Labels:
- The model is first trained on the source domains D_S as the warm-up phase. Warm-up is a necessary step for the self-learning method: it greatly improves the quality of the generated pseudo-labels and leads to better results. Without warm-up, the generated pseudo-labels either have low confidence or wrong content, which severely hurts prediction accuracy on the target domain. After the warm-up, the target data with pseudo-labels are generated.
- Random Split: How the pseudo-labels are used is one of the most important issues. As the raw pseudo-labels can be noisy, a meta-update is used in the proposed method. During the meta-update, both source-domain data and target-domain data with pseudo-labels are used and are randomly divided into a meta-train set and a meta-test set, which correspond to the support set and query set in vanilla MAML.
- Meta-update loss: train (meta-train) and test (meta-test) loss terms (the paper's equations are not reproduced here; see the sketch below).
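The exact meta-update equations were not captured in these notes. Below is a minimal MAML-style sketch consistent with the description above, assuming L is the recognition loss, β is the inner (meta) SGD learning rate, γ weights the meta-test term, and α is the outer (Adam) learning rate; this is a reconstruction, not the paper's exact formulation.

```latex
% Inner (meta-train) step on the meta-train split D_{tr}
% (source data plus pseudo-labeled target data):
\theta' = \theta - \beta \, \nabla_{\theta} \mathcal{L}\big(\theta;\, D_{tr}\big)

% Meta-objective: meta-train loss plus the meta-test loss evaluated
% with the adapted parameters \theta' on the meta-test split D_{te}:
\mathcal{L}_{\mathrm{meta}}(\theta) = \mathcal{L}\big(\theta;\, D_{tr}\big)
    + \gamma \, \mathcal{L}\big(\theta';\, D_{te}\big)

% Outer update (Adam with learning rate \alpha in the experiments):
\theta \leftarrow \theta - \alpha \, \nabla_{\theta} \mathcal{L}_{\mathrm{meta}}(\theta)
```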
Experiment Results
- Experiment settings:
- PyTorch on Tesla T4
- Adam is used as the outer optimizer, and SGD is used as the meta optimizer
- α is set to 1e-3
- β and γ are changed during the training process
- Input size:
- 100 × 32
- The character set in these experiments is set to 3,818, which includes:
- 3,756 common Chinese characters
- 62 alphanumeric characters.
- Results:
- Baseline:
- The baseline model is trained on the source domains only, without any multi-source domain adaptation method.
- MLDG:
- During training, the source domains are divided into a meta-train set and a meta-test set. The model first updates one step using the meta-train set and then validates on the meta-test set. The final model, converged on the source domains, is deployed on the truly held-out target domain.
- Pseudo-Label:
- As warm-up is a necessary step for the pseudo-label method, the baseline model is used as the pre-trained model and training with pseudo-labels starts directly from it.
- Pseudo-label confidence score thresholds (see the sketch after this list):
- Handwritten: 0.98
- Others: 0.9
- Meta Self-Learning (paper proposal):
- Same settings as the pseudo-label method
- NOTE: The best results are obtained with different pseudo-label usage settings (see the explanations below).
- Pseudo-label usage settings experiments:
- IAOS: Use all 5 domains during the meta-update, and only the source domains during the outer optimization.
- IPOA: Use the pseudo-label domain as the meta-test set only during the meta-update, and use all five domains during the outer optimization.
- IPOP: Same setting as IPOA during the meta-update, but only images with pseudo-labels are used during the outer optimization.
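For the pseudo-label generation and confidence thresholds mentioned above, here is a minimal PyTorch sketch of confidence-thresholded pseudo-labeling. The model output shape, the greedy decoding, and the sequence-confidence definition (product of per-step character probabilities) are assumptions, and CTC blank handling is omitted.

```python
# Sketch of confidence-thresholded pseudo-label generation for self-learning.
# Thresholds follow the post (0.98 handwritten, 0.9 other domains).
import torch


@torch.no_grad()
def generate_pseudo_labels(model, target_loader, device, threshold=0.9):
    """Run the warmed-up model on unlabeled target images and keep only
    predictions whose confidence exceeds the threshold."""
    model.eval()
    pseudo_labeled = []
    for images in target_loader:
        images = images.to(device)
        # Assumed model output: per-step character distributions
        # of shape (batch, seq_len, num_chars).
        probs = model(images).softmax(dim=-1)
        char_probs, char_ids = probs.max(dim=-1)   # greedy decode per step
        confidence = char_probs.prod(dim=-1)       # sequence-level confidence
        keep = confidence > threshold
        for img, ids, conf in zip(images[keep], char_ids[keep], confidence[keep]):
            pseudo_labeled.append((img.cpu(), ids.cpu(), conf.item()))
    return pseudo_labeled
```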
Conclusion
- A fresh approach/direction for solving the domain-shift problem in text recognition
- Large Multi-Domain Chinese Text Recognition dataset
- Application to Lomin Textscope: documents from different domains???
- Self-Learning
- Online-Learning