This paper explores the automated recognition of objects and materials and their relation to depictions in images of all kinds: photographs, artwork, doodles by children, and any other visual representation. The ways in which artists of all cultures, ages and skill levels depict objects and materials furnish a gamut of 'depictions' so wide as to present a severe challenge to current algorithms: none of them performs satisfactorily across more than a few types of depiction. Indeed, most algorithms suffer a significant loss of performance when the images used are non-photographic in nature. This loss can be explained by the tacit assumptions that underlie nearly every recognition algorithm. An appeal to the art history literature provides an alternative set of assumptions that are more robust to variations in depiction and that offer new ways forward for automated image analysis. This is important not just as an advance for computer vision, but for the new understanding and applications it opens up.
| | All Time | Past Year | Past 30 Days |
|---|---|---|---|
| Abstract Views | 269 | 32 | 2 |
| Full Text Views | 22 | 0 | 0 |
| PDF Views & Downloads | 33 | 1 | 0 |