In this article we explore the practical use of medialness informed by perception studies as a representation and processing layer for describing a class of works of visual art. Our focus is on the description of 2D objects in visual art, such as found in drawings, paintings, calligraphy, graffiti writing, where approximate boundaries or lines delimit regions associated with recognizable objects or their constitutive parts. We motivate this exploration on the one hand by considering how ideas emerging from the visual arts, cartoon animation and general drawing practice point towards the likely importance of medialness in guiding the interaction of the traditionally trained artist with the artifact. On the other hand, we also consider recent studies and results in cognitive science which point in similar directions in emphasizing the likely importance of medialness, an extension of the abstract mathematical representation known as ‘medial axis’ or ‘Voronoi graphs’, as a core feature used by humans in perceiving shapes in static or dynamic scenarios.
We illustrate the use of medialness in computations performed with finished artworks as well as artworks in the process of being created, modified, or evolved through iterations. Such computations may be used to guide an artificial arm in duplicating the human creative performance or used to study in greater depth the finished artworks. Our implementations represent a prototyping of such applications of computing to art analysis and creation and remain exploratory. Our method also provides a possible framework to compare similar artworks or to study iterations in the process of producing a final preferred depiction, as selected by the artist.
Consider a trace being drawn on a blank canvas and an observer contemplating it. How to characterize the changes introduced by the lines as they are drawn, painted or written on the surface? If we think of the perceptual visual interface (Hoffman, 2009) as a tension field (Zhu, 2016) (Note 1), as proposed for example in the Gestalt school tradition, then the focus of attention of the observer appears to be attracted by certain regions and spots which depend on the interplay between the traces and canvas boundaries (Albertazzi, 2006; Arnheim, 1974). Not all positions on the canvas are seen as equal, and the apparent tension field will keep evolving as an artist creates additional traces or as the observer’s gaze scans over different parts of a large artifact.
One particular notion which proves useful in characterizing the shape (Note 2) of the traced outlines of objects as well as the shape of the ‘negative space’ in between these is that of medialness: the ridges of intensity of interactions between elementary trace fragments, typically considered by pairs. We later provide a more formal definition of medialness, which can then guide us to implement and test its validity in mimicking humans’ visual appreciation and understanding of the shape of the depicted objects.
Medialness proves a powerful tool for shape understanding in different applications (Leymarie and Kimia, 2008) and for different fields of study. In this communication we highlight the relations and background in fields we see as complementary to each other under the umbrella of medialness (Fig. 1): (i) the visual arts, (ii) perception and vision science, and (iii) mathematical models and computing. By the ‘visual arts’ we refer to the tradition of classic training and practice in sketching, drawing, painting, sculpting, calligraphy writing. Artists contribute their technical skills and visual expertise and are good at over-emphasizing certain visual cues, and even manipulate these to explore otherwise known concepts and provide new visualities (e.g., consider caricatures or cubism). Perception and vision science provide insights in how the wetware is processing information and creating an understanding of our environment. The language of mathematics then can be used to study and hopefully improve our understanding of the perceptual and the visual arts while permitting computational implementations, to test and ultimately extend our knowledge and reach (feeding back into the perceptual and artistic).
Having explored medialness under the scrutiny of these three disciplines in Sects 2 (Visual Arts), 3 (Perception and Vision), 4 (Maths and Computing), we will then briefly present a computational framework for medialness we have recently developed (summarized in Sect. 5) and illustrate its potential by studying a series of graphical works (in Sect. 6) by two of the most gifted artists of the 20th century: Picasso and Matisse (Note 3).
2. Medialness and the Visual Arts
Medialness appears in various guises in the visual arts. We focus our attention on its use with 2D artifacts such as when painting or drawing on a canvas, laying out forms on a wall in street art, describing main attitudes and movements in animation or preparatory sketches, or when designing a garden layout and horizontally positioning various elements sharing certain symmetry relationships.
When learning to draw, a basic technique to render animal forms is to approximate their skeleton via stick figures, over which ‘flesh’ or structure can be added (Hogarth, 1984; Williams, 2009). Stick figures are also used by infants and primitive cultures in early representations of human or other animal forms (Arnheim, 1974; Gombrich, 2000). The stick figure helps to rapidly specify a pose, balance, movement or action (Hahn et al., 2015). Also, the simplified form makes it possible to decide on the body structure without being bothered by details. Joints can then be indicated, for example with small disks. Body parts can then be imagined as generic cylinders where only the contours need be drawn loosely connecting the joints (Fig. 2).
Another related technique is to draw approximate simple primitive outlines, such as rough disks, ovals, elongated slabs and cylinders, and position and combine these to create a first fleshed out version of the complete form; by placing and centering these primitives near joints, an animal form can then be rapidly sketched for various poses (Loomis, 1951), and thus create in a series the illusion of an articulated and natural movement (Fig. 3). Rather than using only a few primitives and their approximate linkage to create a rough fleshed out draft of a form, we can also more systematically sweep a primitive (such as a disk of varying radius) along sets of medial curves (connected or not) to synthesize geometrically well segmented forms, such as black patterns on a white background, with sharp boundaries (Harries and Blum, 2010).
Another basic technique used in sketching is to outline a region and fill it with textures rendered by a gesture that follows a principal medial axis of that region (Tresset and Leymarie, 2005). One draws across the axis (which may be explicitly drawn or left to the imagination of the artist) with rapid gestures with a pen or brush. By controlling the pressure, speed and curvature of the rendering, various styles are obtained (Fig. 4).
In animation the Line of Action (LoA) is a single curve running through the ‘middle’ of the character, which represents an overall force and direction of movement for the character. In the early days of the Walt Disney studio, before drawing a full character, an animator would frequently draw in a LoA to help determine the main pose of the character. By simply changing the LoA, e.g., curving or bending it in a different direction, the entire qualitative impression of the drawing can be controlled. A related technique used in sketching when doing a study of the articulated movement of a body is to overlay on the same canvas the multiple positions of a body in movement by simplifying it to its essence in terms of a medial set of skeletal curves perhaps augmented with rough outlines of body primitives, but without any filling or shading. The artist faces the challenge of representing on the same canvas the essential information about the articulated movement under study, while minimizing the masking effects of overlapping body parts. This struggle was also at the birth of the basic technique defining today’s motion capture systems, in the early days of photography, as invented by Étienne-Jules Marey in the late 19th century (Kovács, 2010). Marey eventually discovered he could overlay on the same photographic plate the reflecting medial traces of subjects in movement. A human subject would be wearing black clothes to which one would attach reflecting dots at joints and reflecting lines to connect these; then the subject would perform a movement in a room with black backgrounds which could be captured by one of Marey’s photographic gun designs (Note 4). Marey was essentially drawing in space-time using his innovative photographic systems, as an experienced animator might draw on the blank canvas a succession of skeletal traces to rapidly study and refine a complete articulated movement of a character.
Another use for medialness in drawing practices is by considering its complementary role in representing the voids of space in between structures or drawn primitives, such as when considering the ‘negative spaces’ in architectural designs (Leymarie and Kimia, 2008). An interesting example of this approach in landscaping designs was uncovered by Van Tonder et al. in analyzing famous 15th century Japanese gardens. When contemplating a garden, we often select what we feel are better viewpoints to admire the structure and layout of the landscape, its plants, flowers, rocks, sculptural elements, and so on. Van Tonder et al. have shown that a class of ancient garden layouts can be well modeled by a set of connected medial traces in 2D which represents a perceptual (visual) tension flow field in between the garden’s main elements and lying in a horizontal plane parallel to that of the garden (Van Tonder et al., 2002; Van Tonder, 2006). In particular, the design of the Ryōan-ji garden had been a long-standing mystery; Van Tonder et al. have shown how by using the rock and plant structures of such a garden as the generators of an imaginary wave propagation, an approximate oriented flow field they call the Hybrid Symmetry Transformation (or HST; Van Tonder and Ejima, 2003) indicates the best viewing loci for the human observer as well as for a now bygone statue of the Buddha (Fig. 5). A possible interpretation is that the original master designer conceived a plan of the garden’s main elements positioned according to their medialness relationships (as recovered by Van Tonder et al.). This type of approximate flow field which highlights medialness has also become a subject of study of growing interest in recent years in cognitive science and visual perception, as well as brain physiology studies (focused on the visual cortex) which we highlight next.
3. Medialness in Human Perception and Vision
Medialness in human perception studies appears in one form or another since the infancy of the field. For example, notions of symmetry—of individual objects as well as arrangements of plurality of objects—are proposed as being the source of fundamental principles of perception by the German Gestalt school in the 1930s. In the 1950s, Rudolph Arnheim, himself a graduate of the same school, intuitively arrives at a notion akin to a field of medialness and talks of the ‘structural skeleton’ of a canvas and its object traces (Albertazzi, 2006; Arnheim, 1974). Also in the 1950s, Fred Attneave studies and shows the importance of curvature features when observing 2D objects or 2D views of 3D objects. He proposes to represent a perceptual encoding of the outline of an object as being summarized by information stored at or in relation to curvature extrema, i.e., in relation to corners, convexities or concavities (Attneave, 1954) (Note 5). Through the 1960s and 1970s Harry Blum and his collaborators elaborated a new shape representation in terms of a class of symmetry graphs named ‘medial axis’ (denoted MA) and distance propagation fields (aka grassfire propagations) (Blum, 1967, 1973; Kotelly, 1963) (Note 6). Such early ideas on curvature, intrinsic axial symmetry and field propagations have since stimulated more interest and research in the closely related fields of human visual perception and human brain understanding and functional modeling. We summarize such important lines of work in the following sub-sections focused around (i) Arnheim’s structural skeleton, (ii) geons and codons (primitives and contour segments), (iii) point-based medialness, and (iv) vision (brain) substrates for medialness.
3.1. Arnheim’s Structural Skeleton
As we saw earlier in Sect. 2, one way to use and think of medialness in the context of artistic creation is as a locus of tension between rendered primitives. This idea was perhaps first explored to provide a theoretic framework of art and perception in the works of Denman Ross (Ross, 1907) and Rudolph Arnheim (Arnheim, 1974) in the first part and middle of the 20th century. Arnheim, who was a member of the first generation of students of the German Gestalt school, was trying to extend ideas from physics, such as known at the time from electromagnetism, to the study of perception when applied to the visual arts; he also had carefully read the works of Ross (McManus et al., 2011). In the opening stage of his monograph on Art and Visual Perception, first published in 1954, Arnheim proposed to model the creation and perception of visual pieces via a tension field (Note 7) whose main force lines constitute what he calls the ‘structural skeleton’ of a painting, drawing, or sketch (Albertazzi, 2006; Arnheim, 1974; Zhu, 2016). Some empirical evidence in support of the ability of humans to detect an induced structural skeleton in paintings has since been reported by Locher (Locher, 2003). The associated notion of pre-eminence of the center (of a canvas) also presented by Arnheim in various monographs is however less clear and possibly weak or non-existent (McManus et al., 2011).
3.2. Geons and Codons
Since the 1980s, Irving Biederman and his collaborators have developed a representational theory of shape based on a notion of parts as elementary geometric primitives they refer to as geons. Such object primitives are summarized by their MA and associated sweep functions, and thus can be seen as descendants of the generalized cylinders approach (sometimes referred to as generalized cones) as first explored in mathematics and early computational approaches to pattern recognition. The motivation is different however: Biederman et al. conducted over the years many psychophysical studies to show how humans tend to partition and recognize more complex objects in terms of parts which can be built from geons. The geon family is the basis of the ‘Recognition By Components’ (RBC) theory which postulates that (at least) a large percentage of human-made objects can be understood as being composed by hierarchies of parts modeled by geons and their relationships (Biederman, 1987, 2000). Other evidence for a structurally-based shape theory of perception that is pointed at by Biederman comes from the importance of ‘non-accidental features’ such as corners and main axial symmetries. This follows from earlier intuitions and initial work, for example by Fred Attneave (Attneave, 1954).
Also in the 1980s, a perceptual model in support of a class of non-accidental features of generic smooth contours was proposed by Richards and Hoffman (Richards and Hoffman, 1985) and defined as: sequences of pairs of significant concavities (measured as extrema of negative curvature) bounding a significant convexity (measured as a positive extremum of curvature). Such triples, aka codons (Note 8), for smooth bounding contours, tend to define useful local parts of 2D objects and can even be used to explain some visual illusions including figure–ground reversals, such as Edgar Rubin’s faces–vase ambiguous figure (Gregory, 2009). Later this theory was extended to 3D volumetric objects, where lines of negative curvature (or concave creases) were proposed as good loci to separate 3D objects in parts (Hoffman, 2001): such a partitioning corresponds to the parts and joints favored under the RBC theory. In the recent perception literature, similar notions of curvature linked to medial axes for shape segmentation into parts have been studied extensively, in particular by De Winter and Wagemans (De Winter and Wagemans, 2006).
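The codon idea of a convexity bounded by a pair of concavities can be illustrated with a minimal sketch. It rests on simplifying assumptions that are not part of the original theory: the contour is a closed polygon listed counter-clockwise, every concave vertex (negative signed turning angle) stands in for a significant negative curvature extremum, and the function names `turning_angles` and `codon_like_parts` are hypothetical.

```python
import math

def turning_angles(polygon):
    """Signed turning angle at each vertex of a closed polygon.
    For a counter-clockwise polygon, convex vertices are positive."""
    n = len(polygon)
    angles = []
    for i in range(n):
        (x0, y0), (x1, y1), (x2, y2) = polygon[i - 1], polygon[i], polygon[(i + 1) % n]
        ax, ay = x1 - x0, y1 - y0          # incoming edge
        bx, by = x2 - x1, y2 - y1          # outgoing edge
        angles.append(math.atan2(ax * by - ay * bx, ax * bx + ay * by))
    return angles

def codon_like_parts(polygon):
    """Split the contour at concave vertices: each part is a list of vertex
    indices running from one concavity to the next (both included)."""
    n = len(polygon)
    concave = [i for i, a in enumerate(turning_angles(polygon)) if a < 0]
    parts = []
    for k, start in enumerate(concave):
        end = concave[(k + 1) % len(concave)]
        span = (end - start) % n or n      # a single concavity wraps all the way around
        parts.append([(start + j) % n for j in range(span + 1)])
    return parts

# Four-pointed star: tips at even indices, concave notches at odd indices.
star = [(3, 0), (1, 1), (0, 3), (-1, 1), (-3, 0), (-1, -1), (0, -3), (1, -1)]
print(codon_like_parts(star))
```

In this toy star, each part consists of one tip flanked by two notches, which mirrors the "concavity, convexity, concavity" triple. A smooth contour would instead require an estimate of curvature and a threshold on what counts as a significant extremum.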
Geons themselves are also interesting in relation to drawing techniques often used as teaching devices. The human figure (and other animals) is often first sketched on the basis of a series of connected generalized cylinders, a technique referred to as ‘geometric drawing’ (Simmons and Winer, 1977) or ‘analytic’ or ‘schematic drawing’ (Simmons, 1994). For example, the head modeled as an ovoid or elongated box can be fit on a series of cylinders modeling the neck, torso, limbs, defining a first sketch of a manikin, and each body part can be fleshed out and refined in successive sketches (Loomis, 1951; Simmons and Winer, 1977). Such sketches of characters can then be further modified as wholes, e.g., using the previously described Line of Action (LoA) technique from animation (Bregler et al., 2002).
3.3. Point-Based Medialness
By the 1990s other psychologists and cognitive scientists had started to explore in greater detail the potential of medialness as a substrate for the human perception of objects. At Rutgers University a team led by Béla Julesz and Ilona Kovács was focusing on how humans have their visual perception of the shape of animals in movement modified by change in contrast, thus providing a way to measure the influence of shape (Kovács and Julesz, 1993, 1994). Kovács, Julesz and Feher derived differential contrast sensitivity maps (Note 9) for 2D objects that were consistent with a medialness function (called D ε-function) representing the cumulated amount of boundary loci equidistant from the observation point within a tolerance (width) of ε (Kovács et al., 1998). This led them to hypothesize a medial-based shape processing method for human vision, which is in part inspired by the original proposal made by Harry Blum (Blum, 1973). This D ε-function makes explicit certain medial points located in the vicinity of a corresponding MA, predicting where contrast sensitivity should be maximal, and potentially leading to a more compact representation. Ilona Kovács refers to such loci of high medialness as ‘hot spots’ and hypothesizes that they are likely to play a key role in rapid form analysis by humans, in particular when considering articulated movement (Kovács, 2010). Such hot spots are also possibly related to the way the 19th century chrono-photographer Étienne-Jules Marey was studying the movement of humans and animals by highlighting a series of reflective dots along straight skeletal lines (Kovács, 2010).
Another line of evidence is being provided by studies on human-driven attention when presented with simple static shapes, such as rectangles, ovals, with perturbations and parts added. In such studies, the naive human subjects are presented with unexpected shapes and their task is to indicate which areas of the canvas or screen are of greater interest. In an early study in the late 1970s, Psotka showed that when humans are asked to place dots within the outline of various 2D objects, the resulting cumulative pattern that emerges closely resembles Blum’s MA loci (Psotka, 1978). Recently, this study has been revisited and robustly confirmed using a computerized tablet as an interface and asking random people on the street (of NYC) to perform a similar task (Firestone and Scholl, 2014). We note that, similarly to the results of Kovács et al., the sampling of the MA observed in these studies tends to highlight ‘hot spots’: i.e., the MA trace is not sampled uniformly, but instead the samples tend to be focused in fewer small regions of attraction, often near convexities and corners, but also near object centers and some other few MA loci.
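To make the verbal description of the D ε-function concrete, here is a minimal sketch under stated assumptions: the boundary is given as a finite set of sample points, and "equidistant within a tolerance ε" is read as "at a distance between the minimum distance d_min and d_min + ε from the observation point". The names `d_epsilon` and `square_boundary` are illustrative only and do not come from the cited work.

```python
import math

def d_epsilon(point, boundary, eps):
    """Fraction of boundary samples whose distance to `point` lies within
    [d_min, d_min + eps], where d_min is the distance to the closest sample."""
    dists = [math.dist(point, b) for b in boundary]
    d_min = min(dists)
    return sum(1 for d in dists if d <= d_min + eps) / len(dists)

def square_boundary(side, n_per_side):
    """Evenly spaced samples along the outline of an axis-aligned square."""
    pts = []
    for i in range(n_per_side):
        t = side * i / n_per_side
        pts += [(t, 0.0), (side, t), (side - t, side), (0.0, side - t)]
    return pts

boundary = square_boundary(10.0, 100)
center = d_epsilon((5.0, 5.0), boundary, eps=0.5)    # equidistant from four sides
diagonal = d_epsilon((2.0, 2.0), boundary, eps=0.5)  # equidistant from two sides
off_axis = d_epsilon((5.0, 2.0), boundary, eps=0.5)  # closest to one side only
print(center, diagonal, off_axis)
```

In this toy square the value is largest at the center, lower at a point on a diagonal, and lowest at a point away from the medial axis, which is in the spirit of the ‘hot spots’ discussed in the text.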
3.4. Vision (Brain) Substrates of Medialness
Perhaps one of the earliest hypotheses that the visual neural system may be processing incoming light so as to emphasize a medialness map is to be found in the works of J. Anthony Deutsch who, in the early 1960s, thought of shape being characterized as a function of distance between contour loci. Deutsch proposes that ‘distance in space’ can be computed by wetware as ‘distance in time’ and that such a computation can be performed as a ‘wave of excitation propagated by the contours of the [object] itself’ (Deutsch, 1962) (Note 10). Deutsch goes further and hypothesizes that the anatomical layered arrangements of nerve fibers found in the optic lobes of bees are consistent with a computation of pairwise distances between contour points, and he then proposes two possible mechanisms which could exploit this architecture, one based on direct distance mapping via time simultaneity (or neural length), and one mimicking wavelet propagation of pairs of contour traces (Note 11). Deutsch also gives some basic criteria that a wetware-based system for shape recognition ought to have: invariance to (i) translation and (ii) rotation, (iii) asymmetric invariance to scaling (where it is easier, or ‘more efficient’, to match a smaller version, which may capture fewer details, to a larger template than vice versa), and finally (iv) invariance to mirror symmetries.
In recent years, evidence that the brain performs computations analogous to a medialness recovery has been accumulating. A good survey and summary of the state of the art circa 2003 can be found in the detailed manuscript by Ben Kimia (Kimia, 2003). Kimia, in collaboration with his colleagues and students, has developed an approach to shape analysis directly inspired by the early works of Blum and others. In particular, the computational scheme of Kimia et al. based on shock graphs (in 2D) and scaffolds (in 3D) supports the speculation of Kovács et al. that a ‘sparse skeleton representation of shape is generated early in visual processing’ (Kimia, 2003). Kimia et al. extend the traditional model of the MA to represent images, where each MA segment models a region of the image and is called a visual fragment. They presented a unified theory of 2D perceptual grouping and object recognition where through various sequences of transformations of the MA representation, visual fragments are grouped in various configurations to form object hypotheses, are related to stored models, and are also consistent with the formation of certain illusory contours (Tamrakar and Kimia, 2004); an equivalent effort for 3D percepts and objects remains largely to be explored, although there is recent work on the medial scaffold for some early steps (Leymarie, 2003, 2006a; Leymarie and Kimia, 2007; Leymarie et al., 2004), as well as work on approximating a 3D curve-based skeleton by combining information from 2D views (Livesu et al., 2012; Telea and Jalba, 2012; Yasseen et al., 2015).
Perhaps the earliest work that focused on the likelihood of neurons and neural networks in primates being used to explicitly evaluate medialness is by Lee et al. (Lee et al., 1998), published in the same 1998 issue of Vision Research as Kovács et al. (Kovács et al., 1998), where they showed the potential for neurons in the primary visual cortex (aka V1) of monkeys to be computing medialness. They observed high response for certain medial locations: e.g., peaks at the ‘center of mass’ were found for ‘compact’ objects such as circles, diamonds and squares, while ‘central response peaks were found along the entire axis only for elongated strips and rectangle’. That is, Lee et al. found that medialness is made available in the neural substrate (in their case in a network of V1 monkey neural responses) and is emphasized at particular loci along the theoretical MA, in the spirit of the ‘hot spots’ proposed by Kovács et al. More recent studies include work by Lescroart and Biederman who indicate that parts representations and relative orientation of object parts can be encoded in V3 and higher visual stages and are in support of a coding of shape by the MA (Lescroart and Biederman, 2013). Hung, Carlson and Connor have further demonstrated ‘for the first time explicit coding of MA shape in high-level object cortex’ (aka IT or inferotemporal cortex). Interestingly, they also report that such IT neurons simultaneously encode (2D) MA and contour components, what they refer to as ‘external shape’ or ‘surface features’ (Hung et al., 2012). A similar coding for 3D objects is possible and work in that direction is being pursued, in particular by the team led by Charles Connor at Johns Hopkins University which has identified 3D complex surface shape tuning, also in IT (Yamane et al., 2008).
Hatori and Sakai have also recently presented a scheme for coupling cells found in V1 and V2 that are selective for boundary ownership (necessary for figure–ground identification) which can provide 2D MA computations as a coupled network (Hatori and Sakai, 2014). Their results further suggest that V1 cells receive feedback from V2 as well as higher levels of the visual cortex such as V3 and V4; they also note that intercortical connections are much faster than lateral connections, and that MA responses from V1 can themselves be seen as local primitives made available back to these higher cortical areas for integration, for example by IT neurons and model at once objects represented by multiple axes (or multiple axis segments). Recently, a possible wetware architecture, exploiting in particular glial cells, for the computation of distance-based fields (which can support MA responses) has been proposed by S.W. Zucker (Zucker, 2012, 2015).
Evidence for curvature computations and representations in the visual cortex has also been mounting over the years. Perhaps one of the earliest works is the model of Dobbins, Zucker and Cynader (from the late 1980s) on the use of so-called endstopped neurons (aka hypercomplex) for computing curvature (Dobbins et al., 1987, 1989). They were able to show that in particular ‘endstopped cells in area 17 of the cat visual cortex are selective for curvature, whereas non-endstopped cells are not, and that some are selective for the sign of curvature’ (which implies figure–ground segmentation). A computational framework for end-stopping and curvature measures was recently elaborated on the basis of Differences of Gaussians at varying orientations and scales to model simple neurons of V1, while more elaborate combinations of such Gaussian responses are used to approximate complex and hypercomplex neurons (Rodriguez-Sanchez and Tsotsos, 2012). Potential support for the codon representation of series of curvature peaks (minima and maxima) along 2D object contours has also emerged in reported neural population responses from area V4 of macaque monkeys, where the ‘strongest peaks in the population response were those corresponding to sharper convex and concave boundary features’ (Pasupathy and Connor, 2002).
4. Medialness via Mathematical and Computational Shape Probing
Medialness is characterized by the interaction of two fragments of an object, typically contour segments or outline traces, which share a spatial symmetry relation: e.g., think of the vertical axis of a pot wheel and the surface elements of the pot or of the finger tips of the pot maker which trace in time surfaces of symmetry (of revolution in this case). The axial curves of tubular structures are direct examples of medialness, reduced to its simplest expression, a subject formally studied by the famed French geometer Gaspard Monge (late 18th century) at the onset of modern mathematical studies in descriptive and differential geometries, and since applied in computer vision theories of shape representation by pioneers such as Thomas Binford (generalized cylinders) and David Marr (hierarchies of parts). A true visionary in the history of pattern recognition at the infancy of computing was Harry Blum who introduced and developed in the early 1960s with colleagues at the Air Force Cambridge Research Laboratories a graph-theoretic notion of ‘medial axis’ (MA) aka ‘skeleton’ or ‘symmetry axis’ (Blum, 1962a, b, 1967; Kotelly, 1963). Blum was interested in developing a mathematics for biology that would provide a process-based view on symmetry to study growth, movement, as well as static shape descriptions. Blum often would use simple drawn figures (line drawings) from which a flow would ignite at their outline and propagate over the horizontal space as a grass fire: fire fronts would quench in pairs, defining a medial trace of axial symmetries which could then be associated with implicit shape features of the original object; ‘medial’ here refers to the location of a symmetry axis being ‘in-between’ initial contour outline segments at the origin of the quenching (imaginary) fire. This fire analogy is one of space occupation or partitioning, i.e., every drawn or observed segment is affiliated with a zone of influence or distance field surrounding it (Fig. 6): where two such zones meet indicates a local line of symmetry between a pair of segments (a pair of segments of an object’s outline, whether drawn, observed or hallucinated). We note here that via this propagation process an explicit tension field is provided as a directed distance map, such that every point of empty space is oriented towards its ‘source’ (a boundary segment or line trace), except for those loci at equal distance from multiple sources, which are precisely located on the trace of the MA.
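The zone-of-influence reading of the grassfire analogy can be sketched with a brute-force distance computation on a small grid. This is only an illustration under assumptions of our own: traces are given as labeled lists of sample points, distances are computed exhaustively rather than by an actual front propagation, and a grid cell is flagged as medial when its two closest traces are equidistant within a tolerance `tol`. The function names are hypothetical.

```python
import math

def ranked_sources(p, sources):
    """(distance, label) for each trace, sorted from closest to farthest."""
    return sorted((min(math.dist(p, q) for q in pts), label)
                  for label, pts in sources.items())

def direction_to_source(p, sources):
    """Unit vector from p towards its closest trace sample (the directed
    distance map); ambiguous on the MA, where several sources tie."""
    q = min((q for pts in sources.values() for q in pts),
            key=lambda q: math.dist(p, q))
    d = math.dist(p, q)
    return (0.0, 0.0) if d == 0 else ((q[0] - p[0]) / d, (q[1] - p[1]) / d)

def medial_cells(sources, width, height, tol=0.5):
    """Grid cells whose two closest traces are equidistant within `tol`."""
    cells = []
    for y in range(height):
        for x in range(width):
            (d1, _), (d2, _) = ranked_sources((x, y), sources)[:2]
            if d2 - d1 <= tol:
                cells.append((x, y))
    return cells

# Two parallel horizontal traces: their zones of influence meet halfway.
sources = {"top": [(x, 0) for x in range(11)],
           "bottom": [(x, 10) for x in range(11)]}
cells = medial_cells(sources, 11, 11)
print(cells)                                 # cells on the row y = 5
print(direction_to_source((3, 2), sources))  # points towards the 'top' trace
```

For larger images an exact Euclidean distance transform would be a more practical choice than this exhaustive search. The sketch also requires at least two traces.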
Blum’s proposal is an attempt to unify notions of geometry with topology: e.g., the hole of the handle of a cup manifests itself as a resulting loop in the MA graph for the interior of the object and as an MA sub-structure oriented perpendicularly to the plane of the handle and through the center of the hole for the exterior space (alike a local pot wheel axis). Each MA locus can be linked to curvatures at the boundary of the object, hence unifying local differential structure with global topological properties (Siddiqi and Pizer, 2008).
In 2D an MA branch always terminates in relation to either a curvature extremum or a circular arc on the object’s boundary. Hence, a corner of a 2D object is well demarcated by a terminating MA branch. The dual of this observation is that concavities or inlands of an object are typically characterized by a terminating MA branch for the exterior space (aka negative space). In theory, there is a duality between the MA structure and the original outline of an object, such that by inflating the MA trace with a set of chosen primitives, the original outline can be recovered precisely. For general forms, the primitive is a disk (in 2D, a sphere in 3D) whose radius is specified by the generated distance field (or time of fire quenching) (Fig. 6); for special object classes such as tubular forms or generalized cylinders, the ‘primitive’ is a varying radius function which is swept along the axial graph.
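The duality between an MA and the outline it encodes can be illustrated by inflating a sampled MA with disks. The sketch below assumes a simple case chosen for illustration, an axis-aligned rectangle whose interior MA is known in closed form (one central branch plus four diagonal branches ending at the corners), and tests whether a point is covered by the union of the inscribed disks. The helper names are hypothetical, and because the MA is sampled the recovered shape is only approximate near the corners.

```python
import math

def rectangle_medial_axis(w, h, step=0.05):
    """Sampled interior MA of a w x h rectangle (w >= h), as (x, y, radius)
    triples; the radius is the distance to the closest side."""
    r = h / 2
    samples = []
    m = int(round((w - h) / step))            # central branch, constant radius
    for i in range(m + 1):
        samples.append((r + (w - h) * i / max(m, 1), r, r))
    n = max(int(round(r / step)), 1)          # diagonal branches, radius grows from 0
    for i in range(n + 1):
        t = r * i / n
        for cx, cy in ((t, t), (w - t, t), (t, h - t), (w - t, h - t)):
            samples.append((cx, cy, t))
    return samples

def inside_inflated(p, samples):
    """True if p is covered by at least one disk centered on the sampled MA."""
    return any(math.dist(p, (x, y)) <= rad for x, y, rad in samples)

ma = rectangle_medial_axis(10.0, 4.0)
print(inside_inflated((5.0, 2.0), ma))    # center of the rectangle
print(inside_inflated((9.5, 0.2), ma))    # close to a corner, still inside
print(inside_inflated((5.0, 4.5), ma))    # just above the top side
```

Since every disk is inscribed in the rectangle, no point outside the outline is ever covered; the finer the sampling step, the closer the union of disks comes to the original outline.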
Recent extensions of Blum’s MA—which remains an active topic of research after more than 50 years since Blum’s original ideas first appeared—include (i) the use of Bayesian modeling to integrate 2D properties such as grouping, convexity and curvature, figure–ground dichotomy (Feldman and Singh, 2006; Froyen et al., 2015), (ii) in 3D, work focused on efficient point-based modeling inspired by the computer graphics community (Delame et al., 2016; Tagliasacchi et al., 2016), as well as the work on shock scaffolds as an extension of earlier work on 2D shock graphs, based on results from (mathematical) singularity theory (Chang et al., 2009; Leymarie, 2003; Leymarie and Kimia, 2007), and (iii) at the junction of 2D and 3D, the work on the recovery of approximate 3D MA information from a plurality of 2D MA obtained from a series of view points taken around a given object of interest (Yasseen et al., 2015, 2017).
Here we note that Arnheim’s structural skeleton (Sect. 3.1) can be approximated as Blum’s MA for the exterior of the depicted objects’ traces, within the limits of a canvas, i.e., the symmetry curves indicating the lines of balance between the canvas’ frame—delimiting the outward boundary of the painting or sketch—and the pictorial elements imposed by the creator, the artist—such as the outlines of objects being depicted (Note 12). Furthermore, the generated distance field obtained when computing Blum’s grassfire propagation is directional: every point of space is oriented towards its closest source (here part of a drawn segment). Together with the lines of symmetry this field gives an explicit representation of Arnheim’s structural skeleton and notion of tension. The concept of (medial) lines of forces used either by an artist or designer in conceiving an artifact or by the observer in interpreting the composition of an artwork has been revisited by Michael Leyton since the later part of the 20th century. Leyton has mainly considered the interiors of drawn and painted forms with a notion of an extended 2D MA where branches are oriented to indicate growth patterns (e.g., of arms and fingers) and exterior pressures (Leyton, 2006) (Note 13). A recent review of such ideas that may support a theory of visual tension when applied to the visual arts is given by Ling Zhu (Zhu, 2016). Also recently, a theory of graffiti generation, which shows similitudes with some of the ideas of Leyton, can be found developed in the manuscript by the Italian artist Dado (Ferri, 2017).
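As a rough illustration of the directional distance field described above, the following Python sketch returns, for a point of the canvas, its distance to the closest source and the unit vector pointing towards that source. It assumes that the drawn traces are given as a list of sample points; the function name is hypothetical, and the brute-force search is chosen for clarity rather than efficiency.

```python
import math


def directional_distance(p, sources):
    """Distance from p to its closest source and unit direction towards it.

    p: (x, y) point; sources: list of (x, y) samples of the drawn traces
    (an assumed representation).
    """
    nearest = min(sources, key=lambda s: math.dist(p, s))
    d = math.dist(p, nearest)
    if d == 0:
        # p lies on a source: the direction is undefined, return a null vector.
        return 0.0, (0.0, 0.0)
    return d, ((nearest[0] - p[0]) / d, (nearest[1] - p[1]) / d)


# Two isolated marks on a line; points on either side of the midpoint
# are oriented towards different sources.
marks = [(0, 0), (10, 0)]
left = directional_distance((2, 0), marks)
right = directional_distance((7, 0), marks)
```

Points where the direction switches abruptly from one source to another (here, the midpoint between the two marks) are the loci of symmetry that make up the MA.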
In the following paragraphs we summarize a number of illustrative projects in computing which use variants of Blum’s MA as a key component in designing computerized drawing systems.
4.1. Use of the MA in Computerized Drawing Systems
Some computerized systems that aim at approximating an artistic rendering style use an approximate medial curve as a movement plan: the arm moves along the curve and performs drawing gestures across it. In automatic painterly rendering, Gooch et al. use the 2D MA to define brush strokes, i.e., their direction and size. They first segment an image into homogeneous color patches, and then capture the main axial directions of each such patch via an approximate MA (Gooch et al., 2002). In automatic portrait generation, Tresset and Leymarie use the 2D MA to define main gesture lines to follow when mimicking the artist’s hand (Tresset and Leymarie, 2005) (Figs 7 and 8); this is similar to the shading technique used in rapid sketching techniques (Fig. 4). These ‘gesture lines’ drive the drawing device, e.g., an ink plotter in simpler systems or a robotic arm which mimics a human’s gestures holding a pen or brush in more recent and sophisticated implementations (Tresset and Leymarie, 2013). In automatic surface shading to produce illustrations and engravings, Deussen et al. use the 3D MA to provide the main axial directions from which to derive perpendicular intersection planes to draw hatching strokes on the object’s surface (Deussen et al., 1999). A computational system based on generalized cylinders connected via an MA graph, named ROSE (Representation Of Spatial Experience), was developed in the mid-1990s by Ed Burton to model young children’s drawings (Burton, 1995).
Bregler et al. have investigated capturing a ‘Line of Action’ (LoA), in the form of an approximate main medial line running through an articulated form, as the source of the movement of a drawn character, in order to animate that character and re-target its motion to other characters in other media (Bregler et al., 2002). Although there is not enough information in this single curve to solve for more complex motion, such as how the hands move relative to each other, the investigators discovered that a surprising amount of information comes from this LoA: the essence of the motion is still present in the re-targeted output (Bregler et al., 2002). More recently a team at INRIA in France has been exploring the use and advantages of the LoA in designing graphical interfaces to better control characters performing 3D motions, such as in dance (Guay et al., 2013, 2015). Related work in Zurich at the ETH and at Disney Research, in which a set of loosely defined medial curves is used to create a type of stick figure, has led to the design of a computerized drawing interface to permit an artist to rapidly specify the main poses of a character, which can then be interpolated into dense animation sequences (Hahn et al., 2015). The use of a simplified MA in the form of a stick figure as a drawing interface has been proposed before, for example to access large datasets of human motion data (Choi et al., 2012). Alternatively, from a drawn 2D cartoon character, a simplified MA can be automatically recovered and then used to deform and animate a corresponding 3D extruded version (Igarashi et al., 1999) or even generate a 3D character proxy which can then be further manipulated by the artist (Bessmeltsev et al., 2015).
We emphasize that many of the above computational systems can be used to study how humans perceive and draw artistic pieces, perhaps even how they learn to draw. Alternatively, such systems can be used as extensions to the artistic practice: the artist can augment their creative frontiers by using such systems as a test bed for explorations in a ‘morphospace’ of possible renderings or animations.
The challenge of exploring a morphospace in search of new styles and ideas of forms is perhaps best exemplified in the practice of contemporary visual artists such as William Latham. Latham, who was trained in established art schools (Oxford, Royal College of Art), decided early (in the mid-1980s) to project his artistic practice into ‘algorithms’. He decided to blend evolutionary schemes inspired by the principles of Darwin and Wallace, as well as the then very recent work by Dawkins on biomorphs (Dawkins, 1986), and explore morphospaces of rapidly evolving forms in sculpting and in drawings (Latham, 1989). This led him later to join forces with computer scientists at the IBM Research Centre near Winchester in England to produce some of the first and most sophisticated 3D interactive graphics software in the late 1980s (Todd and Latham, 1992). These programs are implicitly built around notions of medial graphs and generalized cylinders branching out and fleshed with various 3D primitives and sweep functions. Latham developed a characteristic style in producing such patterns, animated or not, which have a clear ‘organic’ stylistic signature (Lambert et al., 2013).
There are a number of limitations in this notion of medialness for the description, analysis and genesis of the traces of objects (such as their outlines) which have been the source of numerous attempts in the computing literature to augment the original contributions of Blum et al. and try to remedy such limits. Because medialness is defined on the basis of a differential structure, leading to an abstract mathematical entity, that of a graph (made of nodes linked by curve segments and hyper-segments in higher dimensions), it suffers from some of the limitations of traditional differential geometry: it favors smooth outlines, can be responsive to small deviations (in curvature) and can lead to ill-defined behaviors under perturbations of boundary segments. On the other hand, Blum’s concept provides a number of original features in comparison to traditional geometry-based viewpoints, as the MA transform can be applied to open segments (such as separated drawn line segments, non-closed boundaries or objects other than solids) as well as to objects with finitely many discontinuous curvature points, such as the corners of a rectangle where curvature is undefined (or assumed infinite). The MA also does not require a priori coordinate frames attached to the trace of the object’s outlines, traditionally a necessary step when computing with calculus as a framework (Leymarie, 2006b).
In the next section we summarize our computational scheme for medialness which removes some of the limitations of Blum’s MA, and which is also better adapted to the approximate use of medialness found in the visual arts, in drawing practices, as well as in emerging results from visual perception and studies from brain science on neural correlates.
5. Beyond the MA, Towards P-Medialness: A Computational Scheme for Medialness Informed by Visual Perception
We have designed in recent years a computational scheme which brings together the two main representations of 2D objects known to play an important role in visual perception, in the understanding of human vision neural correlates, and which are relevant to many procedural techniques found in the making of visual art pieces. In the following we summarize our proposed computational scheme to systematically map visual contour traces to a medialness map which is then further processed to identify feature points of three types: (i) ‘hot spots’ of the interior medial map as dominant points (i.e., dominant in medial value); (ii) significant concavities represented by loci near the exterior side of a contour; and (iii) significant convexities represented by loci near (the interior of) the contour.
According to Kovács et al., the definition of medialness of a point in the image space is given by the accumulation of sets of boundary segments falling into an annulus of thickness parameterized by a tolerance value (ε) and with interior radius taken as the minimum radial distance of the point from the boundary (Kovács et al., 1998) (Fig. 9). This process maps an image (of the interior of an object) to a gray-level 2D map where grayness is a direct measure of the accumulated medialness for each point of the original image under consideration. One noticeable drawback of this definition when seeking to retrieve dominant points is that it does not distinguish neighboring boundary segments that belong to separate object parts and which ought not to be considered in the support annulus zone; e.g., this is the case when the fingers of one’s hand are kept near each other (Fig. 10).
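The definition above can be sketched in a few lines of Python. This is a minimal illustration and not the implementation used for the figures: it assumes the boundary is given as a list of sample points, and it counts the samples falling in the annulus as a stand-in for the accumulated length of boundary segments. The function name is our own.

```python
import math


def medialness(p, boundary, eps):
    """Count boundary samples inside the annulus [R, R + eps] around p.

    R is the minimum distance from p to the boundary samples, so no sample
    can lie closer than R; only the outer limit R + eps needs testing.
    """
    dists = [math.dist(p, b) for b in boundary]
    r_min = min(dists)
    return sum(1 for d in dists if d <= r_min + eps)


# Outline of a 10 x 10 square sampled at integer positions (40 samples).
square = [(x, y) for x in range(11) for y in range(11)
          if x in (0, 10) or y in (0, 10)]

center_value = medialness((5, 5), square, 0.5)      # supported by all four sides
off_center_value = medialness((5, 2), square, 0.5)  # supported by one side only
```

The center of the square collects support from all four sides at once, while a point close to a single side only collects the few samples nearest to it; evaluating the function over all interior pixels yields the gray-level map described in the text.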
We improve on this definition by introducing orientation to the boundary points and then use this readily available information (e.g., from a gradient filter) to modify accordingly the medialness function, resulting in $D^{*}_{\varepsilon}$ (eqn. A1.1, Appendix 1). This not only reduces the impact of neighboring parts on medialness measurements, but also emphasizes the sharpness of medialness at the tips of ridges, hence helping improve the localization of associated convexities, and of concavities when processing the exterior area.
5.1. Hot Spots as Interior Medial Dominant Points
Medialness increases with ‘grayness’ in our transformed images (for the purpose of visualizing medialness, similarly to Kovács et al.). In order to identify hot spots at positions of high level of medialness, let us consider the medialness map as a landscape upon which we can walk and climb up to its ridges. Once on a ridge, we can either (i) walk in one climbing direction, or (ii) move ahead or behind in order to stay at high positions along a ridge and take notice of local peaks: these will be candidate (interior) medial dominant points. In terms of implementation, we first filter the medialness map to isolate ridges and then apply some heuristics to localize peaks with some degree of tolerance, a function of the same ε used in the definition of medialness (Fig. 11). We then walk along ridge regions (typically thick traces around ridge lines of the medialness landscape) to locate their peaks: these are then kept as interior medial dominant points or hot spots. Some of the technical details are provided in Appendix 1; a more complete view on the algorithmic implementation is available in other recent publications (Aparajeya and Leymarie, 2016; Leymarie et al., 2014b).
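The peak localization step can be approximated, for illustration only, by a strict local-maximum search over the medialness map. The sketch below swaps in this simpler technique for the ridge filtering and ridge walking described above; the window radius is a hypothetical parameter standing in for the ε-dependent tolerance, and the function name is our own.

```python
def local_peaks(field, radius=1):
    """Return (x, y, value) for every strict local maximum of a 2D map.

    field: list of rows of medialness values; radius: half-size of the
    square window used for comparison (an assumed tolerance parameter).
    Plateaus of equal values are not reported by this strict test.
    """
    h, w = len(field), len(field[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            v = field[y][x]
            if v <= 0:
                continue  # background: no medialness support
            neighbors = [
                field[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
                if (i, j) != (x, y)
            ]
            if all(v > n for n in neighbors):
                peaks.append((x, y, v))
    return peaks


# A toy medialness map with two bumps along one ridge.
toy_map = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 2, 1, 1, 3, 0],
    [0, 2, 5, 2, 3, 7, 3],
    [0, 1, 2, 1, 1, 3, 0],
]
hot_spots = local_peaks(toy_map)
```

A larger window merges nearby peaks into a single candidate, which mimics the effect of a larger tolerance.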
We introduce now a novel visualization of primitives of the original depicted object’s 2D traces based on hot spots, which proves useful when analyzing art pieces. To each hot spot is associated an annulus with a specific radius and thickness ε. We use the following heuristic to evaluate the association of hot spots to define larger neighborhoods via their respective annuli: when a pair of annuli of thickness ε₁ and ε₂ significantly overlap (as medial disks), combine these into a new ‘sausage-like’ neighborhood obtained by a set sum of both annuli, and make the boundary of that sausage region of average thickness [i.e., ε = 0.5(ε₁ + ε₂)]. This process can be iterated until no new disk region associated to a nearby hot spot overlaps enough with an already paired annulus. ‘Significance’ of overlap is currently a decision left to the human observer and taken as a parameter; we use a minimum of 33% overlap in our examples when building sausages. In the proposed visualization we do not allow for crossings over junctions (i.e., where more than two ridges used to find peaks of medialness identifying hot spots come together). In effect we are applying a reverse transform to identify shape primitives as (interior) sausage regions. An example of such a ‘reconstruction’ of the original approximate shape is given in Fig. 11d; notice the thick traces of the sausages’ boundaries, and the fact that some of these primitives further overlap. We leave refinements of this procedure to future work, such as the specification of the overlap parameter via machine learning (on multiple training examples). We also note here a possible relationship with a recent analysis performed by Oliver Layton et al. in their study of figure–ground segregation (Layton et al., 2014). In their model, the annuli of Kovács et al. map to on-surround receptive fields (RFs); feedback from families of such RFs can emphasize an interior MA to indicate closure (of figure versus ground). Furthermore they propose to combine series of annuli responses for varying radii resulting in a ‘teardrop’ model that can emphasize closure along parts and corners. Our sausage region computation offers a possible simple approximation and implementation of their teardrop model.
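The overlap heuristic can be sketched as follows, under stated assumptions. Each hot spot is represented by a hypothetical tuple (cx, cy, r, eps); overlap is measured as the area of intersection of two medial disks divided by the area of the smaller disk (the normalization is our assumption, as the text only specifies a 33% minimum); groups are formed by transitive merging, and the junction constraint is not modeled. For groups of more than two hot spots the thickness is taken as the mean of the member values, which reduces to 0.5(ε₁ + ε₂) for a pair.

```python
import math


def disk_overlap_area(c1, r1, c2, r2):
    """Area of intersection of two disks (standard circular-lens formula)."""
    d = math.dist(c1, c2)
    if d >= r1 + r2:
        return 0.0
    if d <= abs(r1 - r2):
        return math.pi * min(r1, r2) ** 2  # one disk inside the other
    a1 = r1 * r1 * math.acos((d * d + r1 * r1 - r2 * r2) / (2 * d * r1))
    a2 = r2 * r2 * math.acos((d * d + r2 * r2 - r1 * r1) / (2 * d * r2))
    tri = 0.5 * math.sqrt((-d + r1 + r2) * (d + r1 - r2)
                          * (d - r1 + r2) * (d + r1 + r2))
    return a1 + a2 - tri


def overlap_fraction(a, b):
    """Overlap relative to the smaller disk (assumed normalization)."""
    area = disk_overlap_area((a[0], a[1]), a[2], (b[0], b[1]), b[2])
    return area / (math.pi * min(a[2], b[2]) ** 2)


def sausage_groups(spots, min_frac=0.33):
    """Group hot spots whose disks overlap enough; return (indices, eps)."""
    parent = list(range(len(spots)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(spots)):
        for j in range(i + 1, len(spots)):
            if overlap_fraction(spots[i], spots[j]) >= min_frac:
                parent[find(j)] = find(i)
    groups = {}
    for i in range(len(spots)):
        groups.setdefault(find(i), []).append(i)
    return [(idx, sum(spots[k][3] for k in idx) / len(idx))
            for idx in groups.values()]


# Two overlapping hot spots and one isolated hot spot.
spots = [(0, 0, 2, 1.0), (2, 0, 2, 2.0), (10, 0, 2, 0.5)]
groups = sausage_groups(spots)
```

In this example the first two disks overlap by roughly 39% of a disk area and are merged into one sausage of thickness 1.5, while the third hot spot remains on its own.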
5.2. External Concave and Interior Convex Dominant Points
In theory, medialness is not restricted to the interior of an object (Note 14). Mapping an image of contour traces to medialness for the exterior of an object’s outline provides another field along which we can also follow ridges. Those that end near the contour segments are indicative of negative curvature extrema or concavities (Note 15). We use such ends of exterior medialness ridges as candidate concavities; we rank order such candidates by a significance measure representing their contour support: i.e., we estimate how much of the boundary trace is represented by the associated local end of a medialness ridge point. This ranking lets us discard curvature extrema due to noise or very small features along a contour (Fig. 12, top). If we look at the dual image, i.e., the medialness map for the interior of the object, then ends of ridges are indicative of positive curvature or convexities. We apply a similar method (to that for concavities) to extract the most significant convexities (Fig. 12, bottom).
5.3. P-Medialness: Perception-Based Medial Feature Point Set
Together, the three types of points derived from medialness—interior, concave and convex—form a rich description of the shape of an object’s image (Fig. 13). We have recently built systems on the basis of this tri-partite feature set to address problems of information retrieval with applications to environmental data sets (biological shapes, static or in movement) (Aparajeya and Leymarie, 2014, 2016), movement computing (Leymarie et al., 2014b) and gesture transfer between human artists and potential robotic simulators and collaborators (Leymarie et al., 2014a). In Fig. 13, bottom row, we illustrate the use of sausage region retrieval, based on interior hot spot association, to obtain a description in terms of shape primitives of a cat in movement. This is to be compared with the use of primitives by a draughtsman, illustrated in Fig. 3, to create this particular cat series. The primitives recovered via hot spots are not exactly the same, but (we claim) are of similar type and illustrate a similar use of topology to connect these and of morphology to capture the main body parts and limbs and other features. Note that we do not here use the additional information provided by retrieved significant convex and concave features (Fig. 13, middle row) which should be useful to help further characterize parts (Hoffman, 2001); we leave this potential for further studies.
5.4. P-Medialness versus Blum’s MA
We emphasize here the similarities and differences between the original ideas of Harry Blum and their continuing development and multiple applications since, and the concept of perception-based medialness (or p-medialness). In essence, Blum’s MA is a generic representation for shape that maps to an abstract oriented graph: a connected diagram with flows along its trace with possible associated singularities or shocks. We emphasize the ‘abstract’ nature of the MA as a representation: it is a mathematical entity; it has no materiality, no width. It also provides for uniqueness thanks to its precision: for a given set of (abstract) sources of propagation (e.g., the mathematical curves delimiting a 2D object, or a point sampling of such a boundary), the resulting MA is unique, in terms of the combined trace of the graph and its associated radii function creating a flow along this graph (Leymarie, 2006b). The MA offers a number of interesting additional features which explain in part its success in computational fields since its introduction in the 1960s; we mention a few here (for a more complete treatment of the subject, refer for example to the monograph edited by Siddiqi and Pizer (Siddiqi and Pizer, 2008)): the topology of the object is reflected in the topology of the MA graph (e.g., a hole maps to a loop); significant curvature extrema map to MA branch end points; curvature breaks map to branches ending on the original contour; the original set of sources is recoverable by reversing the wave propagation starting from the MA trace and limiting in time the propagation using the associated radii function; most small deviations, protrusions, indentations, of a smooth boundary are captured by the MA (sometimes seen as a high sensitivity problem or limitation); any gap along a contour is represented as an MA feature (usually a neck or source of the bidirectional flow along an MA branch). We could add to this list, but these features should give some appreciation for the potential of Blum’s MA as a shape representation in the computational disciplines.
P-medialness as we have presented it, and as we think of it, can be seen as a generalization of the abstract nature of Blum’s MA towards a physically (and biologically) plausible representation which could be computed and held in a (real) neural network (by contrast with an artificial neural network which can process and sustain a purely abstract mathematical representation). While the evidence for a medialness operation in the visual neural system is mounting, and its usefulness in cognition is also becoming apparent (Sect. 3), its relation to artistic techniques and practices also appears promising. Noticeably, p-medialness removes the constraint of uniqueness associated to Blum’s MA: slightly different loci of the sources of propagation can result in the same or similar medialness fields. In particular, a series of similar dominant points can then be obtained for various, but sufficiently closely related line drawings. ‘Sufficiently’ close would need a further analysis to more precisely characterize which line drawings can be considered equivalent. We have yet to perform such an analysis. We consider p-medialness as a generalization of Blum’s MA because it includes it, or reduces to it: as we let the annuli width vanish (ε → 0) the resulting medialness field becomes equivalent to a classic distance map used to evaluate the MA (Note 16).
6. Using P-Medialness to Study Works of Art
We now explore in this penultimate section the application of medialness as a representation substrate for a class of works of (visual) art. Our study is only meant as an entry into the subject and thus clearly not exhaustive. We focus on works from two important 20th century artists—Picasso and Matisse. For all our observations, we provide commentaries derived from a (non-artist) reading of medialness maps and features. We propose that such detailed analysis is made more explicit and obvious by using the information present in the medialness maps and recovered feature points.
In Fig. 14 we first segment the color photo of the famous ‘Les Demoiselles d’Avignon’ painting by Picasso (a) (Note 17), isolating the five female figures in the piece (b) which we refer to by numerals 1–5 from left to right (from the observer’s vantage point). In (c) we show the medialness field in between the bounding canvas rectangular limits and the regions exterior to the female figures. Such a medialness field can be used to study the spatial relations between the female figures. For example, an apparent first dichotomy is observed between two groups of figures: Demoiselles 1, 2 and 3 have very little medialness field left in-between them, in the form of elongated inroads; a similar situation (with no intermediate space) is observed for the second group of Demoiselles 4 and 5. However, the medialness region in between the two groups is wider and rich in its shape features with multiple hot spots and ridge ends (Note 18). Also the two groups are entirely separated by medialness (from top to bottom), while within each group figures come into close contact. Yet another interpretation could be of two bordering groups: Demoiselle 1 on the left, and Demoiselles 4 and 5 together on the right, with the 3rd group made by the ‘couple’ of Demoiselles 2 and 3 (the background and skin colorations and textures seem to emphasize this partitioning as well). Indeed, Demoiselle 1 is almost entirely separated from Demoiselle 2 by a thin elongated region of exterior medialness. The central couple also share some strong pose features (folded arms above head), which is made apparent when also considering the interior medialness.
We also exploit the exterior medialness field for each figure individually, such as illustrated in Fig. 14d. This field is used to retrieve significant concavities (of the body; or by duality, significant convexities of the background) located at the tip of medial ridges that end near the body. This process is repeated for each body individually. In (e) we show the medialness field for the interior of all five female figures and in (f) the result of our recovery of the three types of feature points based on medialness (a visualization we refer to as ‘vis. #1’). We propose that such feature maps and their underlying medialness fields can be used to conduct careful studies of artworks composed of distinct objects such as Les Demoiselles. The five figures occupy similar amounts of space (in terms of retrieved medial disk areas). Some of the Demoiselles share similar parts indicated by local groupings of medial features: e.g., the folded ‘arm and elbow’ of Demoiselles 2 and 3 which is repeated in the folded ‘leg and knee’ of Demoiselle 4. The convex and medial hot spots of the breasts of the Demoiselles 1, 2, 3 and 5, in profile or facing the observer, are also highlighted.
In Fig. A1 (Appendix 3) we show our feature point analysis of ‘Les Demoiselles d’Avignon’ for the interior and exterior medialness fields, where we only illustrate as hot spots (with associated medial radius) those medial points with the highest 80–100% values of maximum medialness. We also approximate ridge following on the medialness field (thick paths) to link the various feature points. In this visualization (we refer to as ‘vis. #2’), we show significant convexities with interior arrows (single headed) and significant concavities with exterior arrows (double headed); the orientation of these arrows corresponds to the direction of the associated end of ridges of medialness. We note that our choices of parameters and thresholds remain purely experimental and can only be used with care, as a more in-depth study over series of works will be required to provide perhaps some systematic methods of parameter selection. Nevertheless, such visualization choices are useful, we propose, for at least some interactive modes of analysis of such an artwork. The three main modes of visualization we have introduced (‘sausages’, ‘vis. #1’ and ‘vis. #2’) will be used in the remainder of this experimental section. The specifics of the parametric choices are detailed in Appendix 2. These three visualization modes are also juxtaposed in the next figure to illustrate their differences and similarities.
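The selection of hot spots in the 80–100% band of maximum medialness amounts to a relative threshold, which can be sketched as below. The representation of a hot spot as an (x, y, value) tuple and the function name are assumptions made for this illustration; the actual parametric choices are those of Appendix 2.

```python
def top_hot_spots(spots, low=0.8):
    """Keep hot spots whose medialness is at least low * (maximum value).

    spots: list of (x, y, value) tuples (assumed representation).
    """
    peak = max(value for _, _, value in spots)
    return [s for s in spots if s[2] >= low * peak]


candidates = [(2, 2, 5), (5, 2, 7), (9, 4, 6)]
selected = top_hot_spots(candidates)
```

Lowering the parameter `low` displays more hot spots, which is one way to explore the effect of this experimental threshold interactively.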
In Fig. 15 we discover Picasso’s lithographs of the ‘Rites of Spring’ and a ‘Bullfight’ explicitly available as (segmented) figure–ground images with superposed main medialness information added: feature points and associated medial disks, vis. #1. Below is shown the use of vis. #2, which emphasizes the relationships amongst medial features, while at the bottom we see the sausage regions that can be used to identify the main sub-parts of the drawn figures and surrounding objects. In the Rites of Spring we see two human figures with similar arm structures while the horns of the goat mimic (or respond to) these arms in extension and general orientation pointing upward and curving. The leg and knee of the dancing human figure also appear repeated (mirrored) in the front leg structures of the goat. In the Bullfight, the horns of the bull respond in medialness structure to the spades of the toreador. The overall toreador body is vertically stretched and slightly curved upward and seems to respond to and mimic the bull’s own horizontal stretch, also slightly curving; both poses can be captured via a LoA or simplified MA. We note that in such simplified depictions the meaning of drawn forms may remain ambiguous: the ‘smallest units of meaning’ are ‘the shapes of the regions (round or long) together with secondary properties such as being bent or being pointed’ (Willats, 2006). Such ambiguity creates a tension that may augment the interest of the observer in the overall art piece. In particular, in the Rites of Spring the volumes at the end of the human arms can represent various types of instruments, and might in this instance either represent flat ‘slabs’ or elongated generalized cylinders or ‘lumps’ (Willats, 1992) perhaps interpreted as different categories of percussive instruments.
In Fig. A2 (Appendix 3) we only highlight the eight face-like forms found in the famous depiction by Picasso of the massacre of Guernica: six of humans, one of a bull and one of a horse. The human faces have very similar medialness structures and recovered feature points descriptions, with protuberances highlighting the nose, mouth and chin which are used to emphasize a sense of orientation in space: crying towards the sky, gaping towards the massacre. In Fig. A3 (Appendix 3), in the top part, the exterior medialness for the eight faces makes explicit their relative position on the canvas and their approximate zones of influence. The boundaries between these zones also highlight the relationships between each face/character (Note 19).
In Fig. A4 (Appendix 3) we discover a study of the Bull form performed by Picasso between December 5, 1945 and January 17, 1946, where the artist progressively simplifies the form of a bull seen in profile, to finally converge on a compromise between the two penultimate stages. In our analysis of the same series, Fig. A5 (Appendix 3), we notice that some structures are kept throughout the series: e.g., the head’s main convexities, while the main body (frame) of the bull is gradually simplified [with a diminishing number of hot spots (interior medial points)]. A similar simplification process occurs with the legs (noticeable in terms of medialness as the legs are eventually reduced to simple curves). The tail is eventually made clearly separated from the main body (rather than overlapping as in the initial sketches), and it too is simplified into an elongated curve; this is made more explicit in the analysis of the exterior medialness (Fig. A7, Appendix 3). From the exterior medialness we also can observe how the backbone and overall dorsal region is made smoother (rounder, upwards, with no concavities left). The artist finally converges on an ultimate version (January 17, 1946) which appears to represent a compromise by combining elements of the two penultimate stages (January 5 and 10, 1946), preserving the simplest structures for the legs and tail (January 10) and head and main body (January 5), while deciding to go for the more elaborate reproductive system and inner body lines of January 10. Both the interior and exterior medialness analyses (Figs A6 and A7, Appendix 3) make explicit the exploration of form Picasso underwent and help show how he simplified certain aspects of the body’s overall form and refined certain parts (in particular the legs, tail, head).
Fig. 16 illustrates a series of three Picasso drawings from the 1940s, mainly centered around the female form (originals can be found in the 1950 book by Jean Bouret; Bouret, 1950). These three drawings were used by Koenderink, Van Doorn and Wagemans in their extensive study on cartoon-style line drawings (Koenderink et al., 2012). They produced different analyses of these three drawings, and in particular tried to capture the 3D percepts that human observers report in filling-in the body surfaces in between Picasso’s well-delineated bounding contours. One of their analyses is based on contour fitting (of drawn strokes) and further analyses of curvature and pairings indicating strong MA cues and significant circular primitives (such as associated to the buttocks of the female bodies, Fig. 16, middle row). Our results are in general agreement with theirs. We provide finer detailed analyses as we compute medialness for a larger set of line traces, since we do not need a careful representation of drawn strokes (our medialness can be computed from partial data, points or line segments). In these drawings, the long linear structures of the arms are made explicit by the ridges of medialness, while important body parts (e.g., buttocks, bulging knees, breasts) are well captured as hot spots with associated medial disks, bottom of Fig. 16 (refer to Fig. A8 in Appendix 3 for the use of the other visualization modes).
Starting with Fig. 17, we consider some works by Henri Matisse, another important artist of the 20th century. Matisse was part of the same group as Picasso of visual artists who emerged on the art scene of Paris in the early 1900s and had a major impact throughout their lifetime, often re-inventing their style and practice, all the while influencing each other as well as their contemporaries. In this figure, we have three examples of the famous series of blue cuts in which Matisse explored the female form. In terms of medialness structures, we can observe very similar limbs and body descriptions (like generalized cylinders); breasts are singled out, the head plays a dominant role (with one or very few hot spots) but its pose varies. The foldings of the arms and legs are similar, and represent an exploration of various possible solutions in terms of occlusions, orientations and foldings (of hand and foot), which is made more apparent in the visualization of sausage regions (bottom row of Fig. 17).
In Figs A10 to A15 (Appendix 3), we perform a comparison of a series on female reclining nudes by Matisse who explored various changes in positions and slight deformations of body parts, while the background contexts progressively become more abstract. This series was photographed by Matisse during a period of six months in 1935 when he explored various changes in pose and parts, which led to a finished piece, the Large Reclining Nude (or Pink Nude, Fig. A9). We study 16 of the original 22 photographs in the series here (the original photographs are part of the Baltimore Museum of Art collection). If we consider the interior medialness information (Figs A10 to A13), the final version represents a more symmetric pose (legs versus arms and head); the dorsal line is smoother, the large buttocks lead towards a finer waist line (creating an approximate triangular flow as a generalized cylinder, oriented from the buttocks towards the breasts). The legs are more neatly aligned, one (the left) leading to the end of the other (right) which is otherwise largely occluded. The knee is folding in response to the (right) arm over the head creating a (mirrored) symmetrical pose (left leg vs. right arm). Also, one hand leads to the bottom of the canvas in response to one foot (a possibility explored in a few previous frames). In this last iteration, the head is re-oriented straight up like the breasts (main convexities closely aligned in orientation). When considering the exterior medialness information (Figs A14 and A15), we notice that by the ultimate stage, the negative space under the body has now taken the form of a clean ‘V’ shape (Fig. A15, bottom right), which underlines the smooth way the buttocks link to the back line up to the rest of the body on one side, and to the folded leg on the other side. By comparing the evolution of the main negative space above the reclining female body, we discover that it has become more rectangular in form and that various options were explored by the artist, with slight changes in relative positions of legs and arms and breasts. Finally, in the last frame by observing either the interior or exterior medialness, we see the final solution selected by the artist, where the body occupies space in a more rectangular format, where the arms and top parts of the legs are nearly parallel and oriented vertically.
6.3. Picasso and Matisse
For our final experiment, in Fig. 18, we compare a piece by Picasso, The Acrobat, with one by Matisse, Flowing Hair. Both illustrate the use of long cylindrical parts to convey movement on a static canvas. Both artists give a main overall vertical orientation to the ‘body’, but all limbs and head and neck are oriented at uncomfortable angles, yet giving a strong sense of the movement and pose reached by the subject. From the interior medialness depictions we see a similar use of sausage regions. The head is given more details (convexities and concavities) for the Acrobat, while the breasts are made explicit for the female figure in Flowing Hair. From the exterior medialness depictions, we see that Matisse uses parallelism repeatedly, with the arms, hair strokes, and neatly aligned legs, while Picasso instead forces the extruding body parts of his Acrobat to explore a rectangular frame at various angles and folds, where no two parts are parallel. Matisse’s use of medialness suggests flow lines, while Picasso’s evokes the stillness of the acrobat’s body obtained via contortions.
Medialness is shown to have deep roots in perception and cognition, the arts as well as computational models of shape understanding. More recently, evidence is mounting in the literature that parts of the visual cortex are likely involved in some forms of medialness computations and uses. Much remains to be explored however before we have a clear understanding of the various layers of representations and processing involved, in particular as complex feedback loops are integrated by our nervous system.
Inspired by such evidence emerging in these various disciplines, we have engaged in developing a computational framework on the basis in particular of the models proposed by, on one hand, Kovács et al. (Kovács et al., 1998) (medialness and hot spots) and, on the other hand, Hoffman and Richards (Richards and Hoffman, 1985) (contour codons), themselves inspired by the early work and proposal of Attneave (Attneave, 1954). These two apparently different representations (one of regions, the other of contours) can be unified, a concept we refer to as p-medialness (for perception-based medialness), as shown in our recent work (Aparajeya and Leymarie, 2016; Leymarie et al., 2014b): medialness is related to contour features by identifying the ends of medial ridges with so-called ‘curvature extrema’. As indicated by recent studies in perception and cognition models, such extrema are better thought of as combining significant curvature peaks with regional support (De Winter and Wagemans, 2008), rather than referring to the traditional mathematical definition biased towards a purely local concept and analysis.
Our present study using medialness is mainly applied to solid shapes (i.e., with a defined interior) and their associated negative space (i.e., exteriors) (Note 20). One line of extension is to consider imprecise regions and forms with no well defined interior-exterior relationship, i.e., no clearly delineated boundaries (Koenderink et al., 2016). The original definition of medialness by Kovács et al. may come to our rescue as it is not dependent on having solids (as it does not rely on a strict segmentation of interiors via its lack of exploitation of a consistent orientation along contours). Implementing medialness and its representation in terms of hot spots and significant representatives of extrema of curvature, for this important more general type of forms present in artistic renderings, remains to be explored and we leave it as future work (Note 21).
We think of visual artists as experts in perception who, in order to explore novel visual representations, need to incorporate in their practice a deep, although often intuitive, understanding of how humans perceive and represent the shape of objects. This intuitive knowledge, apparent for example in drawing techniques, once made explicit can be used to guide more elaborate computational models and perception studies (Koenderink et al., 2012; Tresset and Leymarie, 2013). Our exploration of the use of medialness to study some well-known art works from the 20th century, by Picasso and Matisse, represents an initial step. More careful studies are needed to specify useful parametric ranges, useful visualization modes, and have artists, art historians, and other specialists use and comment on the framework when seen as a toolbox for exploring the works of other artists or of one’s own art. Other topics can also be studied using the current framework, including figure–ground completion (Layton et al., 2014), the relationship with attention (Bertamini and Wagemans, 2013), as well as the perception of movements (Leymarie et al., 2014b), which was the initial motivation behind Kovács et al.’s work (Kovács et al., 1998). Such topics are relevant to the disciplines that have inspired our study, from the arts, to perception, via mathematical and computational models of shape understanding.
1. Tension field: At every position in the visual space a tension gauge is assumed to exist which can take different values and have different orientations, possibly many. Mathematically speaking, one can think of the tension gauge as a vector of vectors or as a tensor, which can be associated to every spatial locus of interest. The tension gauge can reduce to a single vector, e.g., where the amplitude represents a force measurement, or even reduce to a single value without any designated orientation (aka a scalar), e.g., the nearest distance to a contour or drawn line.
2. Shape is the structure of the generated field surrounding an object’s trace (Leymarie, 2006b). This field typically refers to geometric entities—such as curvatures, singularities (of some appropriate mappings), other gauge figures—which may exist in association with each sample of the object being scrutinized. In order to identify or measure such entities, we will have to ‘probe’ the field. In this communication the geometric entities we use will be expressed by probing a medialness field.
3. In this work by ‘shape’ we refer to the descriptive features characterizing an object being depicted, for example on a 2D canvas, and by ‘form’ we refer to a class of shapes sharing important similarities. For example, a particular triangular object will see its shape described by three corners (with given angles), a circumcenter and three edge segments. The triangular form will refer to a class of similar objects sharing the same shape description, but possibly with varying parameters (angles) and qualities (curved versus straight edges). This is a computational point of view on the distinction between shape and form (Leymarie, 2006b). For some in the visual arts or in architecture, ‘form’ refers to descriptive features of a 3D depiction of an object, while ‘shape’ refers to the 2D case. Note to the reader: in this article we adopt the computational point of view that proves useful for the context of our study.
4. This invention of Marey was rediscovered in the early 1970s by G. Johansson (Johansson, 1973; Kovács, 2010) in his dot pattern studies on human and animal movement. Marey’s invention is nowadays commonly implemented in commercial systems using multiple camera sensors that follow reflective patterns (usually made of dots) worn by actors in studios built with matte uniform backgrounds. These systems are often referred to as Computerized Movement Capture (aka mo-cap) and are used to transfer movement and animation sequences, e.g., for producing moving virtual characters in films and games (Amor et al., 2009).
5. Such curvature extrema are closely related to medialness as the loci of terminations when following ridges in the medialness field, a notion we exploit in our proposed computational scheme presented in Sect. 5.
6. The medial axis or MA is defined as the loci at equal and minimum distance from two or more sources, usually taken as contours or edge points. In 2D applications these points generate a graph made of branches (or axes) corresponding to loci at equal minimum distance from pairs of sources. Such axes can meet at junctions corresponding to loci at equal minimum distance from three or more sources (Blum, 1973). NB: there is a close relationship between the medial axis of Blum and the (generalized) Voronoi graph (Okabe et al., 2000) made popular in computational geometry (Leymarie, 2003).
7. In Arnheim’s description, the field is typically computed over the empty spaces left between the outward boundaries (of a canvas) and the traces of drawn or painted lines (Arnheim, 1974).
8. Named ‘codons’ by analogy with the term used in genomics to describe triplets of nucleotides as basic coding units.
9. Differential contrast sensitivity maps: given a boundary of an object in an image sampled by Gabor patches together with numerous additional random (in location and orientation) Gabor patches as distractors, decrease gradually the contrast and notice those loci which remain above detection thresholds the longest (Kovács and Julesz, 1994).
10. Historical note—This exploration by Deutsch of a possible neural mechanism to study shape was being elaborated at the same time Harry Blum was independently developing his own related similar ideas from a computational perspective and inspired by his early work in radio engineering (Blum, 1962a, b).
11. This set of two ‘dual’ approaches mimics the wave-particle thinking in classical physics when describing propagation of information (in the context of geometric optics): particles moving along rays (such as in Fermat’s optical pathways) versus moving envelopes of wavelets (as in Huygens’ principle of wave propagation).
12. More precisely, Arnheim is thinking of an approximate full symmetry set (SymSet) for the space between the canvas and the drawn outlines (e.g., see Arnheim, 1974, Fig. 3, p. 13). The SymSet is obtained by tracing every possible pairwise symmetries between outline fragments under consideration; it includes and extends Blum’s MA; e.g., for a rectangle it includes the entire vertical and horizontal main central axes (Siddiqi and Pizer, 2008).
13. This dynamic view of the 2D MA is called by Leyton the ‘Process Inferring Symmetry Axis’ (or PISA). Many of the theoretical ideas of Leyton have yet to be implemented and tested on real images; e.g., how to retrieve robustly a plausible history or sequence of PISA traces from an image of a real painting remains unanswered. It is interesting however to notice that practicing artists often themselves talk about the recovery of the traces (e.g., of a painting brush) as a way to characterize one artist’s production from another.
14. Historical note: This generality in the application of medialness to both the interior and exterior of an object, or even to the traces of segments not necessarily delimiting closed contours, was already known and used by Harry Blum and his collaborators from the 1960s (Blum, 1962a, b, 1967). This is to be found in early papers from that period and in particular in his long manuscript (akin to a manifesto and main thesis, rich in concepts and with an extended bibliography) published in the Journal of Theoretical Biology (Blum, 1973). Unfortunately, the original idea of Blum’s MA as applicable to the interior and exterior of objects, or even to open contour segments or point distributions, has often been forgotten or ignored since the early ground work was done. Too often the MA is presented, even recently, as defined only for the interior of closed contours (i.e., for the interior of solids), which we can only qualify as a mistake and historical oversight.
15. This can be proven (mathematically) in the case of the MA proper, i.e., when we let the tolerance ε go to zero.
16. Historical note—Such a notion of medialness as an alternative to Blum’s MA has been proposed in the past in a few variants (we are aware of), including the works of: (i) Pizer et al. on ‘cores’ and multiscale skeletons derived from gray-scale images (Pizer et al., 2003; Siddiqi and Pizer, 2008), (ii) Levine and Kelly on annuli computations (Kelly and Levine, 1995), (iii) Van Tonder and his HST (Hybrid Symmetry Transformation) model (used in particular for landscaping depiction) (Van Tonder and Ejima, 2003; Van Tonder and Lyons, 2005), and (iv) the concept of fuzzy skeletons (Bloch, 2008) and other probabilistic field computations, including the more recent work on Bayesian modeling (Feldman and Singh, 2006; Froyen et al., 2015). We note that the focus of these other approaches is towards the retrieval of an abstract graph or mathematical equivalent to Blum’s MA.
17. We use a color-based segmentation built from the watershed technique of mathematical morphology (Bloch, 2008; Serra, 1988).
18. The roles of exterior and interior regions can be interchanged, e.g., if the focus is on the ‘negative spaces’ (such as in architecture (Leymarie and Kimia, 2008)).
19. Formally speaking, these zones of influence are alike generalized Voronoi cells, a concept well studied in computational geometry and often applied in computer vision and graphics for space partitioning problems (Okabe et al., 2000; Siddiqi and Pizer, 2008).
20. Although most of the examples provided in this article are for well segmented figure–ground objects, we have some flexibility already built into the computational framework, by allowing for open ended lines and contours to also be considered: e.g., this is visible in our treatment of breasts in some of Picasso’s female bodies (Figs 14 and 16). A more complete approach to be able to process general line drawings, without well defined boundaries or explicit figure–ground segmentation (Koenderink et al., 2016), will require a more ambitious method and implementation, which we leave for future investigations.
21. With no well defined closed boundary, the distinction between convex and concave disappears, and we are left with the sole notion of local regions of high curvatures and bends which can still be characterized by how a ridge of medialness points and ends in their vicinity.
Amor H., Berger E., Vogt D. and Jung B. (2009). Kinesthetic bootstrapping: Teaching motor skills to humanoid robots through physical interaction, in: KI 2009: Advances in Artificial Intelligence, Mertsching B., Hund M. and Aziz Zaheer (Eds), Lecture Notes in Computer Science, Vol. 5803, pp. 492–499, Springer, Berlin, Heidelberg, Germany.
Hoffman D. D. (2009). The interface theory of perception: Natural selection drives true perception to swift extinction, in: Object Categorization: Computer and Human Vision Perspectives, Dickinson S., Leonardis A., Schiele B. and Tarr M. J. (Eds), pp. 148–165, Cambridge University Press, New York, NY, USA.
Kovács I. (2010). ‘Hot spots’ and dynamic coordination in Gestalt perception, in: Dynamic Coordination in the Brain: From Neurons to Mind, von der Malsburg C. V., Phillips W. A. and Singer W. (Eds), Strüngmann Forum Reports, pp. 215–228, MIT Press, Cambridge, MA, USA.
Leymarie F. F. and Kimia B. B. (2008). From the infinitely large to the infinitely small, in: Medial Representations: Mathematics, Algorithms and Applications, Siddiqi K. and Pizer S. M. (Eds), Computational Imaging and Vision series, Vol. 37, pp. 369–406, Springer, Berlin, Heidelberg, Germany.
Van Tonder G. J. (2006). Order and complexity of naturalistic landscapes: On creation, depiction and perception of Japanese dry rock gardens, in: Visual Thought: The Depictive Space of Perception, Albertazzi L. (Ed.), Advances in Consciousness Research, Vol. 67, pp. 257–301, John Benjamins Publishing, Amsterdam, The Netherlands.
The medialness at a point p in the image space is defined by computing the modified $D_\varepsilon$ function as a distance metric $D_\varepsilon^*$ (to boundary segments). This is a sampling function constrained to annular sectors. Given a minimum radius function R(p) with center locus p which defines the interior shell of the annulus, and $R_{\max}(p) = R(p) + \varepsilon$ the exterior shell, a given sector $S_i$ is specified by an angular opening $\theta_i$ (in radians) with bounding segments defined by the intercepts $t_i$ and $t_{i+1}$. The area of the sector is then: $A_i = \theta_i (R + 0.5\varepsilon) \times \varepsilon$. In practice the intercepts bounding a sector $[t_i, t_{i+1}]$ are given by the extremal points of contour or edge segments entering and exiting the annulus, i.e., crossing either of its interior or exterior shells (Fig. 9 right). The medialness measure is then taken as a sampling over the sum of two or more annular sectors containing boundary information (i.e., with contour or edge segments). What is actually measured in each sector is application dependent; in our case one can count the amount of boundary information present in each sector, e.g., the number of edge pixels, or the contour segment length, or a binary counter for fixed-length elementary sub-sectors (or bins for fixed sub-tended elementary angular steps, $\delta\theta$), or simply compute the area of the sector, $A_i$. We denote the measure taken over an annular sector $S_i$ by $|S_i|$. Then, we can express the medialness measure first proposed by Kovács et al. (1998) in a general form as:
$$D_\varepsilon = \sum_{i=1}^{k} |S_i|, \quad k \geq 2.$$
This formulation implies that at least two separate annular sectors are traversed by boundary information. Notice that when the tolerance value ε reduces to zero, $D_\varepsilon$ reduces to a maximal inscribed disk and leads to the traditional medial axis (MA) graph measure. One noticeable drawback of this definition when seeking to retrieve dominant points is that it does not distinguish neighboring boundary segments that belong to separate object parts and which ought not to be considered in the support annular zones; e.g., this may be the case with the fingers of one’s hand when these are kept near each other (Fig. 10). This imprecision in $D_\varepsilon$ can be remedied by introducing a measure of orientation at boundary points when assuming figure–ground segmentation (i.e., when knowing the interior versus the exterior of an object); we use such information (e.g., when obtained from a traditional gradient boundary filter) to modify the medialness function, resulting in $D_\varepsilon^*$ as follows.
for a point $p = [x_p, y_p]$, vector $b(t) = [x(t), y(t)]$ describing the 2D bounding contour (B) of the object, and such that
The medialness of a point p depends on two parameters: R(p) and ε, where R(p) is the minimum radial distance between p and the bounding contour, and ε is the width of the annulus region (capturing object trace or boundary information). We can think of the width of the annulus, as dictated by ε, as equivalent to a scaling parameter: the larger the width, the more averaging of nearby contour information is considered. How to set the tolerance ε in order to have desirable scaling properties thus needs to be addressed. We have argued elsewhere for setting ε as a logarithmic function of R(p), with a logarithmic base of value 1.5 (Aparajeya and Leymarie, 2016).
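As a rough illustration of the annular sampling described in this appendix, the following Python sketch computes a binary-counter variant of the measure: boundary samples falling in the annulus [R(p), R(p) + ε] are binned by angle, and the areas of the occupied bins are summed. The function names, the list-of-samples input format and the `n_bins` parameter are assumptions made for this sketch; it does not include the orientation constraint of the modified measure, and it is not the authors' implementation.

```python
import math

def sector_area(theta, R, eps):
    """Area of an annular sector: A = theta * (R + 0.5 * eps) * eps."""
    return theta * (R + 0.5 * eps) * eps

def medialness(p, boundary, eps, n_bins=36):
    """Binary-counter medialness at point p (illustrative sketch only).

    boundary: list of (x, y) boundary samples (hypothetical input format).
    Returns (value, R): value is the summed area of the occupied angular
    bins, or 0.0 when fewer than two separate sectors hold boundary samples.
    """
    px, py = p
    dists = [math.hypot(x - px, y - py) for x, y in boundary]
    R = min(dists)                       # minimum radial distance R(p)
    dtheta = 2.0 * math.pi / n_bins      # elementary angular step
    occupied = [False] * n_bins
    for (x, y), d in zip(boundary, dists):
        if R <= d <= R + eps:            # sample lies inside the annulus
            ang = math.atan2(y - py, x - px) % (2.0 * math.pi)
            occupied[min(int(ang / dtheta), n_bins - 1)] = True
    n_occ = sum(occupied)
    # A sector starts where an occupied bin follows an empty one (circularly).
    n_sectors = sum(1 for i in range(n_bins)
                    if occupied[i] and not occupied[i - 1])
    # Assumption of this sketch: a fully occupied ring is kept as valid.
    if n_sectors < 2 and n_occ < n_bins:
        return 0.0, R
    return n_occ * sector_area(dtheta, R, eps), R

# Densely sampled outline of a 10 x 10 square (step 0.05 along each edge).
square = []
for i in range(201):
    t = i / 20.0
    square += [(t, 0.0), (t, 10.0), (0.0, t), (10.0, t)]

centre_value, centre_R = medialness((5.0, 5.0), square, eps=0.5)
edge_value, edge_R = medialness((5.0, 1.0), square, eps=0.5)
```

At the centre of the square four separate sectors are hit, so the value is positive; near the bottom edge only one sector is hit and the value is zero. The angular bins must be wider than the angular spacing of the boundary samples, otherwise a single contour segment can be mistaken for several sectors.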
To select points of internal dominance, a ‘black’ top-hat transform (Serra, 1988; Vincent, 1993) is applied, resulting in a series of dark areas which typically correspond to peaks, ridges and passes of the medialness map when considered as a height field. Figure 11b shows the result obtained after applying the black top hat transform on a medialness image.
We still need to process the output of the top-hat transform further to isolate the most dominant points amongst the remaining selected medialness loci. We also consider the cases where the resulting ridges are more like plateaus and thus rather flat at their top. In order to identify isolated representative dominant points for such plateaus we ‘pull up’ such flattened regions and map the central locus of a plateau to the highest local peak value (Aparajeya and Leymarie, 2016). To provide some control on the possible clustering of dominant points, a flat circular structuring element of radius $\varepsilon_p$ (but of at least two pixels in width) is also applied over the output of the top-hat transform to pick maxima. We also impose that no remaining points of locally maximized medialness are too close to each other; this is currently implemented by imposing a minimum distance of $2\varepsilon_p$ between any pair of selected points. An example of applying these steps to identify interior dominant points in medialness is given in Fig. 11c.
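The minimum-distance constraint on dominant points can be sketched as a greedy selection. This is a minimal illustration, assuming the candidates are already available as (x, y, medialness) tuples; the strongest-first ordering and the function name are assumptions of this sketch, not a description of the implementation used for the figures.

```python
import math

def select_dominant_points(candidates, eps_p):
    """Keep the strongest candidates that are at least 2 * eps_p apart.

    candidates: list of (x, y, medialness) tuples, e.g. local maxima that
    survived the top-hat step (hypothetical input format).
    """
    min_dist = 2.0 * eps_p
    kept = []
    # Visit candidates from highest to lowest medialness (an assumption).
    for x, y, m in sorted(candidates, key=lambda c: c[2], reverse=True):
        if all(math.hypot(x - kx, y - ky) >= min_dist for kx, ky, _ in kept):
            kept.append((x, y, m))
    return kept

candidates = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.9), (5.0, 0.0, 0.8), (5.5, 0.0, 0.95)]
selected = select_dominant_points(candidates, eps_p=1.0)
```

With a separation of 2 units, the two weaker candidates that sit next to a stronger neighbour are discarded.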
We illustrate the choices that can be made in visualizing the medialness information with three modes in this article, for which we summarize the main features.
A2.1. First Mode: Vis. #1
The minimum separation between selected hot spots (dominant medialness loci) is set to 2 × ε (twice the annulus operator’s width).
We indicate an associated medial disk (radius + annular width) using a thick circle. The center is differently shaded and the corresponding dot size reflects the medialness value.
Both concavity and convexity measures are obtained by contour analysis. For this analysis we use two operators: (i) length of support, which avoids small bumpy regions in the shape; and (ii) threshold angle, which limits the angle (of opening) of the concavity or convexity.
Detected concavities and convexities are projected on the contour.
NB: This approach works well in detecting sharp concavities and convexities, but it fails in those cases where the region of support of a concavity or convexity is relatively small or very large. For example, small bumps and large circular structures will not be counted as candidate concavities or convexities by this method.
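A contour analysis of this kind, with a length of support and a threshold angle, might be sketched as follows. The closed counter-clockwise polygon input, the support expressed as a number of samples, and the function name are all assumptions made for illustration.

```python
import math

def classify_contour_points(contour, support, max_angle_deg):
    """Label sharp corners of a closed, counter-clockwise contour.

    contour: list of (x, y) samples; support: number of samples stepped on
    each side of a point (length of support); max_angle_deg: threshold on
    the opening angle. Returns a list of (index, 'convex' or 'concave').
    """
    n = len(contour)
    labels = []
    for i in range(n):
        x, y = contour[i]
        ax, ay = contour[(i - support) % n]   # sample before the point
        bx, by = contour[(i + support) % n]   # sample after the point
        ux, uy = ax - x, ay - y
        vx, vy = bx - x, by - y
        nu, nv = math.hypot(ux, uy), math.hypot(vx, vy)
        if nu == 0.0 or nv == 0.0:
            continue
        cos_a = max(-1.0, min(1.0, (ux * vx + uy * vy) / (nu * nv)))
        opening = math.degrees(math.acos(cos_a))
        if opening <= max_angle_deg:
            # Left turn on a counter-clockwise contour means convex.
            cross = (x - ax) * (by - y) - (y - ay) * (bx - x)
            labels.append((i, 'convex' if cross > 0 else 'concave'))
    return labels

# L-shaped polygon, counter-clockwise; vertex 3 is the inner corner.
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
corners = classify_contour_points(l_shape, support=1, max_angle_deg=100.0)
```

Bumps shorter than the support are stepped over, and wide circular arcs have an opening angle close to 180 degrees and so fall outside the threshold, which matches the limitation noted above.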
A2.2. Second Mode: Vis. #2
The minimum separation between selected hot spots (dominant medialness loci) is set to ε (the annulus operator’s width). This tends to generate more dominant points (hot spots) than for vis. #1. It brings us closer to a medial axis graph structure (as ε becomes smaller). The center is shaded and the corresponding dot size reflects the medialness value.
Again, we indicate an associated medial disk (radius + annular width) using a thick circle, but we do this only for those hot spots whose medialness value is equal to or more than 80% of the maximum value (for a given image). Thus only the hot spots with contour support whose medialness lies within 20% of the maximum are illustrated.
Both concavity and convexity measures are obtained by analyzing medialness values (concavity via external medialness, convexity via internal medialness). Only ends of medialness ridges are considered as candidate convexities/concavities. The equivalent of ‘length of support’ in terms of medialness is used to decide on which candidates to keep.
We use arrows to indicate the orientations of concave and convex dominant points.
The ridge trace left from the top-hat transform filters is thinned down and shown as a trace of varying thickness (still reflective of the local medialness values). This visualizes an approximate path of medialness linking the various features.
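The 80% rule used in this mode to decide which hot spots receive a medial disk can be written in a few lines. The tuple format and function name are assumptions of this sketch.

```python
def hot_spots_with_disks(hot_spots, fraction=0.8):
    """Return hot spots whose medialness is >= fraction * maximum value.

    hot_spots: list of (x, y, medialness) tuples (hypothetical format).
    """
    if not hot_spots:
        return []
    peak = max(m for _, _, m in hot_spots)
    return [h for h in hot_spots if h[2] >= fraction * peak]

spots = [(0, 0, 10.0), (1, 1, 8.0), (2, 2, 7.9)]
shown = hot_spots_with_disks(spots)
```

Note that the threshold is relative to the maximum medialness value in the image, so the number of hot spots retained depends on how the values are distributed.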
A2.3. Third Mode: Sausage Regions
Given a set of retrieved hot spots, a first corresponding annulus is selected (typically: one of the highest peaks in medialness).
Nearest neighboring hot spots along a connecting ridge are checked.
If the corresponding annulus of a nearest neighbor sufficiently overlaps, then combine into a larger sausage region**.
Iterate until no more nearest neighboring hot spots are found along a given ridge; then move on to check another significant hot spot not yet considered (typically located on a different medialness ridge).
** A sausage is obtained by combining overlapping selected annuli, such that a minimum amount of the area of the disk—associated to the annulus last selected—sufficiently overlaps with the current sausage region (in our examples, we use a threshold of 33% minimum overlap). The arcs of overlapping annuli that are interior to the combined region are removed, resulting in more or less elongated ovals and other tubular forms. The resulting thickness of the final sausage region boundary is taken as the average of the combined annuli widths (ε values).
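The grouping steps above can be sketched in Python under two simplifying assumptions, both specific to this sketch: the hot spots of a ridge are already given as an ordered list of disks (x, y, radius), and the overlap of a new disk is measured against the last disk added rather than against the whole current sausage region. The 33% threshold is the one quoted above; the function names are hypothetical.

```python
import math

def disk_overlap_area(d1, d2):
    """Intersection area of two disks given as (x, y, radius)."""
    (x1, y1, r1), (x2, y2, r2) = d1, d2
    d = math.hypot(x2 - x1, y2 - y1)
    if d >= r1 + r2:                       # disjoint disks
        return 0.0
    if d <= abs(r1 - r2):                  # one disk inside the other
        return math.pi * min(r1, r2) ** 2
    a1 = math.acos((d * d + r1 * r1 - r2 * r2) / (2.0 * d * r1))
    a2 = math.acos((d * d + r2 * r2 - r1 * r1) / (2.0 * d * r2))
    tri = 0.5 * math.sqrt((-d + r1 + r2) * (d + r1 - r2)
                          * (d - r1 + r2) * (d + r1 + r2))
    return r1 * r1 * a1 + r2 * r2 * a2 - tri

def group_sausages(ridge_disks, min_overlap=0.33):
    """Group consecutive disks along a ridge into sausage regions.

    A disk joins the current group when at least min_overlap of its own
    area overlaps the previously added disk (a simplification).
    """
    sausages, current = [], []
    for disk in ridge_disks:
        if current:
            frac = disk_overlap_area(current[-1], disk) / (math.pi * disk[2] ** 2)
            if frac < min_overlap:         # too little overlap: close group
                sausages.append(current)
                current = []
        current.append(disk)
    if current:
        sausages.append(current)
    return sausages

ridge = [(0, 0, 2), (1, 0, 2), (2, 0, 2), (10, 0, 2), (11, 0, 1)]
groups = group_sausages(ridge)
```

In this example the first three disks overlap strongly and form one sausage, while the distant pair forms a second one. Removing the interior arcs and averaging the annulus widths, as described above, would be separate drawing steps.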
This Appendix lists a number of additional figures referenced in the main part of the article:
Figure A1: Les Demoiselles d’Avignon by Picasso: Visualization of interior and exterior p-medialness features.
Figure A2: Guernica by Picasso: visualization of interior p-medialness features of faces.
Figure A3: Guernica by Picasso: visualization of exterior and interior medialness fields.
Figure A4: Study of the form of a bull by Pablo Picasso.
Figures A5 to A7: Study of the form of a bull by Pablo Picasso: visualization of interior and exterior p-medialness features.
Figure A8: Women series drawn by Picasso: visualization of interior p-medialness features.
Figure A9: Series of 22 reclining nude female forms by Matisse.
Figures A10 to A15: Series of reclining nude female forms by Matisse: visualizations of interior and exterior p-medialness.