Sustainability Performance of Certified and Non-certified Food

Research Data Journal for the Humanities and Social Sciences 6 (2021) 1-22

The dataset Sustainability performance of certified and non-certified food (https://www.doi.org/10.15454/OP51SJ) contains 25 indicators of economic, environmental, and social performance, estimated for 27 certified food value chains and their 27 conventional reference products. The indicators are estimated at different levels of the value chain: farm level, processing level, and retail level. The dataset also contains the raw data from which the indicators are estimated, their sources, and the completed spreadsheet calculators for two indicators: carbon footprint and food miles. This article describes the common method and indicators used to collect data for the 27 certified products and their conventional counterparts. It presents the assumptions and choices, the process of data collection, and the indicator estimation methods designed to assess the three sustainability dimensions within a reasonable time constraint, namely three person-months for each food quality scheme and its non-certified reference product. Several prioritisations were set regarding data collection (indicator, variable, value chain level), together with a level of representativeness specific to each variable and product type (country and sector). Technical details on how relatively common variables (e.g., number of animals per hectare) are combined into indicators (e.g., carbon footprint) are provided in the full documentation of the dataset.

Keywords: sustainability performance; economic performance; environmental performance; social performance; certified food; protected designation of origin; protected geographical indication; organic farming
Online publication date: 13-12-2021
Related data set: "Sustainability performance of certified and non-certified food", doi www.doi.org/10.15454/OP51SJ, in repository "Data INRAE"

Introduction and Research Problem
EU and national food quality policies have recently been reformed. In 2007, the EU agreed on a new Council Regulation (Council Regulation (EC) No. 834/2007) setting out the principles, aims, and overarching rules of organic production and defining how organic products were to be labelled. In 2012, the Quality Package (Regulation (EU) No. 1151/2012) was passed to improve and promote the operation of schemes to protect Geographical Indications (GIs) for agri-food products. The Regulation details the rationale for establishing/promoting GIs as a means to generate a fair return for farmers and producers for the qualities of particular goods and to enable consumers to make better-informed purchasing decisions through effective labelling. The diversity and quality of EU agricultural and fisheries production are one of its main strengths in both domestic and international markets. Supporting Food Quality Schemes (FQSs), here understood as Protected Designation of Origin (PDO), Protected Geographical Indication (PGI) and organic products, is thus regarded as consistent with Europe 2020 policy priorities for 'sustainable and inclusive growth', which seek to achieve competitive and high-employment economies (economically sustainable) delivering social and territorial cohesion (socially sustainable), while paying attention to the burden placed on the environment and natural resources (environmentally sustainable). But are FQSs really more sustainable than other food products?
To answer this question, and as part of the H2020 Strength2Food project, we gathered raw data on 54 food value chains spanning 13 countries. The sampling design is paired: 27 certified products (PDO, PGI, or organic) and 27 reference products (products similar to the certified value chain but not certified). These raw data allow for the estimation of 25 performance indicators covering the three sustainability pillars: 9 economic indicators, 7 environmental indicators and 9 social indicators.

Methods
Disclaimer: being a summarized description of the method used to estimate sustainability indicators, this article largely draws from two existing documents from the same authors: Bellassen et al. (2019) and Bellassen et al. (2016). More technical details on the Methods are available in the data repository.
Overview of Indicators and Minimal Systematic Comparison
The choice of indicators was made on the basis of the SAFA methodology (Sustainability Assessment of Food and Agriculture systems) developed by the Food and Agriculture Organization of the United Nations (2013) to measure the sustainability of food production. SAFA provides guidelines on how to consider each sustainability dimension, including which indicators could be relevant and useful indications on how to implement them. SAFA, however, is primarily focused on processing firms and stops short of formulating a complete method which goes from primary data collection to indicator estimation and interpretation.
The indicators presented in this document operationalise a subset of SAFA indicators, complementing them along the following three lines:
- Most SAFA indicators cannot be directly implemented from the SAFA indicators report. They require the definition of specific data to be collected and calculation or aggregation methods which are not explicated in the report, although the report sometimes refers to existing tools for doing this. Our method defines all necessary data and variables, and provides associated calculators or aggregation methods, together with a data storage and source traceability system.
- Because they were designed to be collected for a single firm, many SAFA indicators require a substantial amount of data. This makes it difficult to cover more than a few indicators for an entire value chain within 3 person-months. Our method simplifies indicators by prioritising data collection on the key drivers of the indicators, by providing default values for many non-key but necessary variables and, where necessary, by restricting the scope of an original SAFA indicator down to the scope for which data is most accessible. As a result, it is possible in most cases to estimate 25 sustainability indicators across the three sustainable development pillars for both a specific product produced by several firms and a generic reference product in 3 person-months.
- Finally, several SAFA indicators rely only on the subjective views of specific stakeholders. Where stakeholder views are a necessary part of the indicator (e.g., bargaining power distribution), our indicators combine stakeholder views with objective data.
To make the collection of information and the subsequent analysis of the 27 case studies efficient, operational choices were made concerning the type of indicators and their management. One of the most important choices is the distinction between "systematic indicators", which should be computed on all case studies, and "complementary indicators", which concern only a subset of case studies, often based on data availability. There were 13 systematic indicators (four economic, four environmental, five social) and 12 complementary indicators (five economic, three environmental, four social). Around 150 variables were collected and refined into the 25 indicators (see Table 1).

Relative Difference and Value Chain Averages
Indicators are estimated at each level of the value chain (farm level, processing level and, where relevant, retail level). To control for country and product specificities, we analyse relative differences between the FQS and its reference product rather than absolute values.
Equation (1) is used, where rel_diff_j is the relative difference for an indicator at level j of the value chain, and indic_FQS,j and indic_ref,j are the indicator values at level j of the value chain for the FQS and the reference product respectively.

$$\mathrm{rel\_diff}_j = \frac{\mathrm{indic}_{FQS,j} - \mathrm{indic}_{ref,j}}{\mathrm{indic}_{ref,j}} \qquad (1)$$

For environmental indicators and for bargaining power distribution, the opposite of the relative difference is used in the analysis, so that a positive difference consistently indicates higher performance of the FQS (e.g., more added value, more employment, lower carbon footprint).
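As a minimal sketch of this step, the snippet below computes a relative difference and reverses its sign where a lower raw value means better performance. It assumes the relative difference is the gap between the FQS and reference values divided by the reference value; the function name and the `lower_is_better` flag are illustrative and do not come from the dataset.

```python
def relative_difference(indic_fqs: float, indic_ref: float,
                        lower_is_better: bool = False) -> float:
    """Relative difference between an FQS and its reference at one value chain level.

    Assumption: the difference is expressed relative to the reference value.
    Set lower_is_better=True for environmental indicators and bargaining power
    distribution, so that a positive result always means the FQS performs better.
    """
    if indic_ref == 0:
        raise ValueError("the reference value must be non-zero")
    rel_diff = (indic_fqs - indic_ref) / indic_ref
    return -rel_diff if lower_is_better else rel_diff
```

With hypothetical values, an added value of 120 against a reference of 100 gives +0.2, and a carbon footprint of 80 against a reference of 100 also gives +0.2 once the sign is reversed.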
In a second step, to assess the difference in performance over the entire value chain, we compute aggregated values, or "value chain averages", as shown in equation (2).
For most indicators, these aggregated values are simply averages across value chain levels for which the indicator could be estimated (farm, processing and, where relevant, retail). There are, however, two exceptions. The first exception concerns indicators expressed on a per ton basis, that is the environmental indicators and the labour to production ratio. Because these indicators follow a life cycle assessment logic, and in particular because they use a functional unit (one ton of product), aggregated values over the value chain must be calculated cumulatively. If one ton of cheese requires 10 tons of milk, the aggregated indicator sums the footprint of 10 tons of milk at farm level and 1 ton of cheese at processing level rather than averaging the footprints of one ton of milk and one ton of cheese. This cumulative process also allocates the footprint to all products (e.g., milk and meat at farm level) based on their relative economic value. For environmental indicators, this is already done in the estimation of the indicator. For labour to production ratio, the formula is provided in equation (2).
The second exception concerns the indicator on value chain stability, for which the aggregated value is the minimum across value chain levels. In equation (2), vc_average is the aggregated performance difference for the entire value chain; rel_diff_j is the relative difference in performance at level j of the value chain (see equation (1)); n is the lowest level of the value chain where the indicator could be estimated (most often the processing level); cum_indic_X is the cumulative indicator over the different value chain levels for product X (either FQS or reference); indic_X,farm and indic_X,proc are the indicator values for product X at the farm and processing levels respectively; final_prod_ratio is the amount of raw product at farm level (e.g., milk) necessary for one ton of final product (e.g., cheese); and coproducts_farm and coproducts_proc are the value of coproducts (e.g., meat) expressed as a percentage of the value of the main product (e.g., milk) at farm and processing levels respectively.
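The three aggregation rules can be sketched as follows. This is an illustration under stated assumptions, not the dataset's calculator: the exact form of equation (2) is in the full documentation, and the way coproduct shares enter `cum_indic` below (the main product carries a share 1 / (1 + coproducts) of the footprint, with coproducts given as a fraction) is an assumed reading of the economic allocation described above. Function names are hypothetical.

```python
from statistics import mean


def vc_average(rel_diffs: list[float], stability: bool = False) -> float:
    """Aggregate level-wise relative differences over the value chain.

    The default is the mean over the levels where the indicator could be
    estimated; the value chain stability indicator uses the minimum instead.
    """
    if not rel_diffs:
        raise ValueError("at least one value chain level is required")
    return min(rel_diffs) if stability else mean(rel_diffs)


def cum_indic(indic_farm: float, indic_proc: float, final_prod_ratio: float,
              coproducts_farm: float = 0.0, coproducts_proc: float = 0.0) -> float:
    """Cumulative per-ton indicator for one product (FQS or reference).

    ASSUMPTION: economic allocation gives the main product a share of
    1 / (1 + coproducts) at each level, coproducts being a fraction (0.25 = 25%).
    """
    farm_share = 1.0 / (1.0 + coproducts_farm)
    proc_share = 1.0 / (1.0 + coproducts_proc)
    # Scale the farm-level value to the raw product needed for one ton of final product.
    farm_part = indic_farm * final_prod_ratio * farm_share
    return (farm_part + indic_proc) * proc_share
```

Taking the cheese example with hypothetical values of 1.0 per ton of milk and 2.0 per ton of cheese, `cum_indic(1.0, 2.0, 10.0)` sums ten tons of milk and one ton of cheese rather than averaging the two per-ton values. The cumulative values for the FQS and the reference can then be compared with the same relative difference as for single levels.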
Selection of a Reference Product/Case: Elements of Guidance
To provide a basis for comparison, each sustainability indicator has been estimated for the same product category (for example cheese) in two different value chains: specific quality (organic or geographical indication) and generic quality (reference product). To define the reference, the following guidance, composed of two objectives and three constraints, was applied. The two objectives are:
- comparability of contexts: the two cases (food quality scheme and its standard reference) should be produced in territorial contexts (in terms of location) that are as similar as possible;
- comparability of the products: the two products/baskets of products (food quality scheme and its standard reference) should be as comparable as possible.
These objectives should be sought until one of the following three constraints is met:
- data resolution limit: data for the reference are only available at a larger scale than for the case studied;
- confusion of the case and its reference: for example, for an apple under geographical indication (GI), the reference would ideally be the production of "standard" apples in the same area. Nevertheless, if almost all the apple production of that area is under GI, a reference should be chosen at a larger scale (regional or even national scale);
- the case studied is the only one of its type: with the example of an apple under GI, the ideal reference would be a standard apple of the same variety. Nevertheless, as mentioned for geographic scale, data may be scarce at this detailed level (variety), or all the apples of this variety may even be sold under GI. In this case, a suitable reference would be one, or a mix of, the main varieties.
In practice, the choice of a relevant reference by case study conductors will strongly depend on data availability, so that a national average can be used if a better-suited reference cannot be documented. Moreover, a mix of specific references and national averages can be used. For example, for Comté cheese, some variables (e.g., price of milk, price of cheese) may be specific to Emmental, a non-certified ripened, hard, cow-milk cheese, while national averages are used for other variables (e.g., quantity of mineral fertilizer per hectare, share of exports over total production) for which Emmental-specific data are not readily available.
Note that the reference is primarily used to interpret the results from the case, so even if the reference presents some peculiarities, this can be accounted for in the discussion of results. An extreme case of such peculiarities is Sjenica cheese in Serbia. Because it is almost the only sheep cheese produced in the country, the reference product is a conventional cow cheese. As a result, many differences between Sjenica cheese and its reference are better explained by the difference between sheep and cow than by the technical specifications or the terroir of Sjenica cheese. For this reason, Sjenica should be excluded from most cross-comparisons. Contrary to many performance assessments, we thus opted for real relative references as opposed to normative references, that is, references which correspond to fictive cases or to targets to be reached (Acosta-Alba & Van der Werf, 2011).
Value Chain Diagram
The first step in data collection was to identify the firms which belong to the value chain (see the box text for the criteria) and to classify them into different levels (e.g., farm level, processing level, retail level). This first step resulted in a value chain diagram which is inserted in the second sheet of each data file and provides the code of each value chain level.

Box. Criteria Used to Identify Which Firms Belong to the Value Chain
When firms make only part of their turnover from the FQS product (e.g., a freezing plant which freezes and packages all kinds of fruit, including the FQS product, organic raspberries), criteria are needed to determine whether they belong to the FQS value chain. The key recommended criterion is that the firm makes at least 50% of its turnover from the FQS product. As such, most firms at retail level will be excluded. However, a few systematic or ad hoc exceptions are made:
- The retail level is included for two economic indicators, namely price premium and export;
- A firm/value chain level can be retained on an ad hoc basis when its impact on an indicator is substantial (e.g., impact of freezing on the carbon footprint of frozen raspberries);
- A firm/value chain level can be retained on an ad hoc basis when stakeholders consider it as part of the value chain despite it making less than 50% of its turnover from the product.

In other words, most of the data collection effort should be spent on key variables which contribute to systematic indicators, while the rest should only be provided if data is readily available, and should not be the object of a dedicated data collection effort.
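The box criteria can be read as a small decision rule, sketched below. The class, field and indicator names are hypothetical, and the two ad hoc exceptions are collapsed into a single flag set by the case study conductor, since they rest on judgement rather than on a computable threshold.

```python
from dataclasses import dataclass

TURNOVER_THRESHOLD = 0.5  # at least 50% of turnover from the FQS product
RETAIL_INDICATORS = {"price_premium", "export"}  # retail level kept for these two


@dataclass
class Firm:
    name: str
    level: str                     # "farm", "processing" or "retail"
    fqs_turnover_share: float      # share of turnover from the FQS product, 0 to 1
    ad_hoc_retained: bool = False  # substantial impact, or stakeholder judgement


def belongs_to_value_chain(firm: Firm, indicator: str) -> bool:
    """Apply the 50% turnover criterion, then the systematic and ad hoc exceptions."""
    if firm.fqs_turnover_share >= TURNOVER_THRESHOLD:
        return True
    if firm.level == "retail" and indicator in RETAIL_INDICATORS:
        return True
    return firm.ad_hoc_retained
```

For instance, a retailer with a 2% turnover share would be kept for `"price_premium"` but not for `"carbon_footprint"`, while a freezing plant flagged `ad_hoc_retained=True` would be kept for every indicator.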

2.2.2.3. Relying on Existing Sources of Information
In general, given the resource and time constraints, most variables were designed to be common enough to be obtained from existing studies, reports and databases. A good strategy for a comprehensive overview of existing sources may be to conduct a few (3-5) interviews with key stakeholders in the chosen case study's value chain.

2.2.2.4. Default Values
In parallel to case-by-case data collection, an effort was made to obtain national average values for as many variables as possible, covering all the sectors studied (dairy, meat products, seafood/fish, cereals, fruits & vegetables). These values do not refer to specific products but to larger product categories which can be identified in systematic surveys. For this purpose, databases with pan-European coverage, such as the Farm Accountancy Data Network (FADN) and the surveys and datasets available via the Eurostat database (e.g., Farm Structure Survey, Structural Business Statistics, Labour Force Survey), have been explored.
These default values were used in three different ways:
- to check that the collected data for the case and/or its reference is of a reasonable order of magnitude;
- to estimate indicators for a "national average" reference product;
- to save time on data collection when there is evidence (e.g., expert judgement) that a given variable is not significantly different from the national average.
This last option was infrequently used and, in all cases, data sources for each variable and product are transparently documented in the data repository.
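The first use, the order-of-magnitude check, could look like the sketch below. The function name and the tolerance of one power of ten are assumptions made for illustration; the text does not state how large a gap was treated as suspicious.

```python
import math


def same_order_of_magnitude(collected: float, default: float,
                            max_orders: float = 1.0) -> bool:
    """Return True when a collected value is within max_orders powers of ten
    of the national default value (both must be strictly positive)."""
    if collected <= 0 or default <= 0:
        raise ValueError("both values must be strictly positive")
    return abs(math.log10(collected / default)) < max_orders
```

A hypothetical stocking density of 1.8 animals per hectare against a default of 1.2 passes the check, whereas 1800 against 1.2 fails and would prompt a return to the source, for example to look for a unit error.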
Principles
Considering the scale and complexity of data collection (measuring the sustainability of 54 products using 25 indicators covering the environmental, economic and social dimensions of sustainability), an organizational model was developed. This thorough quality check procedure was implemented to limit the risk of misreporting data. The three key aspects of this procedure were 1) to record all data, their date and source in a shared spreadsheet, 2) to separate the person who collected data from the person who estimated the indicator, and 3) to come up with a written and consensual interpretation of the results between these people.
The most important principle of the procedure for data collection and indicator estimation is an early and repeated interaction between the case study conductor and the indicator coordinator (see Figure 1). The case study conductor is responsible for collecting the data and ensuring its traceability, which implies creating a repository with all source files and intermediary calculations. The indicator coordinator is responsible for the quality check of the data provided (e.g., verifying, together with the case study conductor, the source when an order of magnitude seems wrong, etc.) and for providing the case study conductor with the estimated indicator(s). Both are responsible for interpreting the results. Results are considered valid only when both the case study conductor and the indicator coordinator agree that the estimated resulting indicators are plausible and, should a large difference occur between the certified product and its conventional reference, that they have plausible ways of explaining this large difference. Those conducting case studies received initial guidance and tips for data collection, and regular online meetings were organized to share data collection practices and problems and thus ensure consistency across case studies.

2.2.4. Metadata Documentation
For each variable value, two metadata items were documented:
- the source/reference for the values (e.g., "Dupond et al., 2010");
- the time period to which the variable's values correspond.
Time periods should be as recent as possible and, to the extent possible, similar between different variables. When relevant and available, time-series and/or multi-year averages can be used. In addition, all original documents from which the data are sourced and the intermediary calculations (e.g., Excel or Word documents) have been stored in an online repository, so that both the case study conductor and the indicator coordinator can go back to them easily to double-check some values or interpret the results.
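One way to make the two metadata items mandatory is to store each value in a record that rejects missing sources or periods, as in the sketch below. The class and field names are hypothetical and do not mirror the layout of the dataset's spreadsheets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariableValue:
    """One collected value together with its two mandatory metadata items."""
    variable: str
    value: float
    unit: str
    source: str  # e.g. "Dupond et al., 2010"
    period: str  # time period the value refers to, e.g. a year or a range of years

    def __post_init__(self) -> None:
        # Refuse records that cannot be traced back to a source and a period.
        if not self.source.strip():
            raise ValueError(f"missing source for variable '{self.variable}'")
        if not self.period.strip():
            raise ValueError(f"missing time period for variable '{self.variable}'")
```

A record such as `VariableValue("animals_per_hectare", 1.4, "animals/ha", "Dupond et al., 2010", "2008-2010")` is accepted (the value and period are invented for the example), while the same record with an empty source raises an error at creation time.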

Description of Indicators, Their Purpose and Their Estimation Method
The exhaustive list of the raw data collected and the technical details of the method used to estimate the 25 performance indicators from this raw data are provided in the full documentation posted in the data repository. Table 2 sets out the sample characteristics, with the sectors highlighted (red, green and blue lines for animal, vegetal and seafood respectively). The indicated turnover is either at processing or farm level, whichever is higher. Arfini & Bellassen (2019) provide a detailed description of each value chain, its structure, its governance and its sustainability performance.
As a result of the quality check procedure described in 2.2.3, the applicant-PGI Sjenica sheep cheese was removed from the sample: its reference product is a cow cheese, and the difference between cow and sheep was identified as the main driver of the differences in performance. The procedure also resulted in the exclusion of employment indicators at processing level for PGI Doi Chaang coffee and PGI TKR Hom Mali rice, for which differences between the certified food and its reference were both high and unexplained.

Note: Red, green and blue shading denotes animal, vegetal and seafood sectors respectively. The indicated turnover is either at processing or farm level, whichever is higher.

Reuse Potential
We anticipate two major avenues for reuse of the dataset Sustainability performance of certified and non-certified food. First, the analysis we have made so far and submitted to academic journals is not exhaustive. Sector-specific or standard-specific analyses are only sketched and have not been synthesized. Other cross-comparisons could also be envisaged, based on geographical or cultural proximity for example, and more systematic sensitivity studies could be performed. Second, other food value chains may be willing to assess their sustainability using the same method. Having access to our detailed dataset will allow them to better understand the method, by seeing it applied to a broad set of examples, and to undertake detailed quality checks (e.g., identifying outliers in both raw data and indicator values). Finally, should such an assessment be conducted, we encourage these fellow researchers to enrich the dataset by sending us their results. Such a virtuous cycle could, over time, lead to further interesting analyses, such as intertemporal comparisons or testing past results on a larger sample.