Mengyue Wang

Master's Thesis

CAD2Image: Image Synthesis using CAD Models to Augment Training Data

Philipp Schlieper (M.Sc.), Prof. Dr. Björn Eskofier

03/2020 – 08/2020

In manufacturing, the large number of mechanical components involved often makes their classification ambiguous [1]. Recent industrial work has investigated the use of deep learning to identify components from images [2]. Deep learning frameworks can learn feature representations and perform classification automatically from images [3]. However, the performance of a deep learning model depends heavily on a large dataset [4]. At present, most publicly available datasets are obtained from real scenes [5], and manually annotating such image datasets is time-consuming. Moreover, understanding a component category depends not only on the shape of its elements but also on its usage within the product [6]. This encourages the use of synthetic data, which is generated in a controlled setting and can be annotated automatically as it is produced.

Synthetic data has been used for benchmarking for decades [7]. With recent progress in deep learning, it has become increasingly popular for training models [8, 9, 10, 11]. Shrivastava et al. [12] showed that a GAN-based network trained on synthetic and real images delivers outstanding performance when simulating real images. Papon and Schoeler [13] consider the problem of semantic pose estimation in indoor scenes; their deep network, trained on randomly generated synthetic indoor scenes, produced excellent results when transferred to real test data. There is also a large body of work that uses synthetic data for object detection [14, 15, 16]. Despite this state of the art, synthetic datasets of mechanical components for training image classification models are still lacking. The widespread use of CAD tools and the large datasets of component models they produce make it possible to accelerate the collection of synthetic component data [6].

Therefore, this thesis considers a practical approach to augmenting training data for mechanical component classification by using synthetic images of 3D printable components during training. To achieve this, large-scale datasets of real and synthetic images of 3D printable components will be produced and evaluated with deep learning classifiers. Based on this evaluation, classifier performance can be optimised with the aim of minimising the number of real images needed for training, either by reweighting the ratio of real to synthetic samples or by improving the realism of the synthetic images with generative approaches.
The work will consist of the following main steps:

  1. Reviewing the literature on synthetic data for machine learning frameworks.
  2. Collecting and printing publicly available 3D printable models of simple mechanical components (e.g., screws, nuts, and washers).
  3. Recording a real dataset of the 3D printable components, and using the Blender 3D software toolset to produce synthetic images of the same component instances and annotate them automatically.
  4. Evaluating the performance of classifiers (e.g., Classifier A offered by Schaeffler AG, Xception [17]) on different distributions of real and synthetic images.
  5. Optimising classifier performance while minimising the number of real images used in training, by reweighting samples or by using SimGAN [12] or other generative models to improve the realism of the synthetic images.
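
As an illustration of steps 4 and 5, the following minimal Python sketch shows one way a training set with a chosen share of real images could be assembled. It is not the method of the thesis: the function name `mix_training_set`, the `(path, label, source)` tuple layout, and sampling with replacement are all assumptions made for this example.

```python
import random


def mix_training_set(real, synthetic, real_fraction, size, seed=0):
    """Draw `size` samples of which a `real_fraction` share is real.

    `real` and `synthetic` are lists of (path, label, source) tuples.
    This layout is hypothetical and chosen only for the illustration.
    """
    if not 0.0 <= real_fraction <= 1.0:
        raise ValueError("real_fraction must lie in [0, 1]")
    rng = random.Random(seed)
    n_real = round(size * real_fraction)
    n_synthetic = size - n_real
    # Sample with replacement so that a small pool of real images
    # can still fill its share of the training set.
    picked = []
    if n_real:
        picked += rng.choices(real, k=n_real)
    if n_synthetic:
        picked += rng.choices(synthetic, k=n_synthetic)
    rng.shuffle(picked)
    return picked


if __name__ == "__main__":
    # Placeholder file names, not actual thesis data.
    real = [(f"real_{i}.png", "screw", "real") for i in range(10)]
    synthetic = [(f"syn_{i}.png", "screw", "synthetic") for i in range(50)]
    for fraction in (0.0, 0.25, 0.5, 1.0):
        batch = mix_training_set(real, synthetic, fraction, size=100)
        n_real = sum(1 for sample in batch if sample[2] == "real")
        print(f"real_fraction={fraction}: {n_real} real, {len(batch) - n_real} synthetic")
```

Sweeping `real_fraction` from 1.0 towards 0.0 and training a classifier on each mix is one simple way to observe how far the number of real images can be reduced before accuracy degrades.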

[1] “Standard Part Classification Systems: An Incomplete Solution”, CADENAS PARTsolutions, 2020. [Online]. Available: [Accessed: 14- Feb- 2020].
[2] S. D. G. Smith, R. Escobedo, M. Anderson and T. P. Caudell, “A deployed engineering design retrieval system using neural networks,” in IEEE Transactions on Neural Networks, vol. 8, no. 4, pp. 847-851, July 1997.
[3] A. Tuama, F. Comby and M. Chaumont, “Camera model identification with the use of deep convolutional neural networks,” 2016 IEEE International Workshop on Information Forensics and Security (WIFS), 2016, pp. 1-6.
[4] J. Prusa, T. M. Khoshgoftaar and N. Seliya, “The Effect of Dataset Size on Training Tweet Sentiment Classifiers,” 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA), 2015, pp. 96-102.
[5] X. Li, K. Wang, Y. Tian, L. Yan, F. Deng and F. Wang, “The ParallelEye Dataset: A Large Collection of Virtual Images for Traffic Vision Research,” in IEEE Transactions on Intelligent Transportation Systems, vol. 20, no. 6, pp. 2072-2084, June 2019.
[6] M. Rucco, F. Giannini, K. Lupinetti and M. Monti, “A methodology for part classification with supervised machine learning”, Artificial Intelligence for Engineering Design, Analysis and Manufacturing, vol. 33, no. 1, pp. 100-113, 2018.
[7] N. Mayer et al., “What Makes Good Synthetic Training Data for Learning Disparity and Optical Flow Estimation?”, International Journal of Computer Vision, vol. 126, no. 9, pp. 942-960, 2018.
[8] L. Lindner, D. Narnhofer, M. Weber, C. Gsaxner, M. Kolodziej and J. Egger, “Using Synthetic Training Data for Deep Learning-Based GBM Segmentation,” 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Berlin, Germany, 2019, pp. 6724-6729.
[9] A. Gaidon, Q. Wang, Y. Cabon and E. Vig, “Virtual Worlds as Proxy for Multi-object Tracking Analysis,” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, 2016, pp. 4340-4349.
[10] A. Handa, V. Patraucean, V. Badrinarayanan, S. Stent and R. Cipolla, “Understanding Real World Indoor Scenes with Synthetic Data,” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, 2016, pp. 4077-4085.
[11] C. de Souza, A. Gaidon, Y. Cabon and A. Peña, “Procedural Generation of Videos to Train Deep Action Recognition Networks”, 2020. [Online]. Available: [Accessed: 14- Feb- 2020].
[12] A. Shrivastava, T. Pfister, O. Tuzel, J. Susskind, W. Wang and R. Webb, “Learning from Simulated and Unsupervised Images through Adversarial Training”, 2020. [Online]. Available: [Accessed: 14- Feb- 2020].
[13] J. Papon and M. Schoeler, “Semantic Pose Using Deep Networks Trained on Synthetic RGB-D,” 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, 2015, pp. 774-782.
[14] X. Peng, B. Sun, K. Ali and K. Saenko, “Learning Deep Object Detectors from 3D Models,” 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, 2015, pp. 1278-1286.
[15] E. Bochinski, V. Eiselein and T. Sikora, “Training a convolutional neural network for multi-class object detection using solely virtual world data,” 2016 13th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Colorado Springs, CO, 2016, pp. 278-285.
[16] P. Rajpura, H. Bojinov and R. Hegde, “Object Detection Using Deep CNNs Trained on Synthetic Images”, 2020. [Online]. Available: [Accessed: 14- Feb- 2020].
[17] F. Chollet, “Xception: Deep Learning with Depthwise Separable Convolutions”, 2020. [Online]. Available: [Accessed: 14- Feb- 2020].