This week, I attended the Swiss GS1 Systemtagung in Olten, a workshop on GS1 standards and related matters. There were three main themes: digital bills, food labelling legislation and GDSN. I also gave a presentation about the use of GTINs in Google Shopping.
The conference started with a presentation on the importance of standards – interesting even though I don’t need to be convinced about that. I found the sessions on digital bills very interesting: while there are EDI solutions for electronic billing, they don’t scale well for partners who exchange bills infrequently. The proposal for Switzerland follows a German standard: ZUGFeRD. This standard is pretty elegant: attach machine-readable XML to the PDF file of the bill. The file can be sent around by e-mail or any other messaging system. The attachment contains XML-encoded EDIFACT data (ISO TS 20625), and the PDF follows the PDF/A-3 standard (ISO 19005-3). The approach is very similar to what is done on the web, where machine-readable information in the schema.org format is embedded within the HTML, either in JSON-LD format or in microdata annotations. The standard seems to be on track to be adopted in France and Switzerland as well, and could be used for other kinds of documents besides bills.
The second session was about food labelling. I found the rules pretty pragmatic: only industrially packaged goods need to be labelled. Other products (handmade things, as well as artisanal produce) are exempt, but the information has to be available either on some sign or orally. The general expectation is that farmers selling pots of jam don’t have to do the exact nutrient analysis, although I wonder if in five years this won’t be doable with a small mobile phone add-on.
The last session was about the Global Data Synchronization Network (GDSN). This is basically the internet version of the old EDI point-to-point links: producers push product information into the network, and retailers, but also certification agencies, can consume this product data. The goal is that a seller only needs to provide the data once, instead of doing it for each retailer they want to sell their product through. The presentations were mostly various companies explaining their experiences setting up the system, with a big emphasis on the internal work needed to collect and structure the data.
All in all it was a very interesting conference, and I was very honoured to be allowed to present there.
The key assumption of any database key is that it uniquely identifies an item. As observed in previous entries, GTIN fulfils that role, most of the time. We recently bought some hair accessories at Migros and the receipt showed the same item three times even though we bought three different things. Upon checking, I realised that all three products share the same GTIN:
As all three items share the same code, they also share the same price and the same receipt label. What is interesting is that this is not a case of product variants: these are different products which share the same code.
Migros has a pretty neat mobile app which lets you visualise your shopping receipts. What is interesting is that the product name does not match: the packaging says Fashion Girl, and the receipt Hair fashion standard. There are no identifiers on the packaging nor any other information (country of origin, recycling codes), which is pretty unusual for Migros products. One of the products has some instructions in bad English:
It can be used for arranging different hairstyle from the well populan pony tail to the latest fad.
All the information is on a stick-on label with the Migros brand, the barcode and an indication that these are not toys. Each product has a different SKU: HF 231301, HF 231323 and HF 231410.
This code has the Swiss prefix (76) and is assigned to the company Herba-Imodac AG. But I highly doubt this product is produced in Switzerland. Most probably these were made somewhere in Asia, and the labels were added by the importer, who did not bother giving them different GTINs even though they have different SKUs.
Localisation is hard, in particular in domains where there are legal requirements, so you end up with labels that claim that the same bottle has different physical volumes depending on the country.
The packaging suffers from the same schizophrenia about product codes, as it has two GTINs, one UPC,
752183585835 registered with American Wine Distributors Pier 23 The Embarcadero Suite 201 CA-94111 San Francisco USA and one EAN
3278480629302 registered with Godet Frères SAS 34 Quai Louis Durant, BP 70041 17003 La Rochelle CEDEX 1 France.
As this is a bottle of French Cognac, it seems the product was first assigned an EAN, and when the product needed to be sold in the US, a UPC was added to work with old systems in the US and Canada. I wonder if cash registers in Europe have two entries in their databases, one with the EAN, and one with the UPC, as European scanners will recognise UPCs.
Back of package with two GTINs CC-BY-SA, attribution Isabelle Hurbain-Palatin
While the GTIN system was intended for logistics up until the point of sale, the emergence of marketplaces for second-hand goods means that many products, and their codes, appear once again in the system. As often with legacy data, this means the old assumptions still affect the new system.
French comic books used to be quite different from US comic books: a different format (typically A4), hard binding, and they commonly had ISBN codes. Some popular series were sold in supermarkets, and thus had an EAN code, even before the mapping of ISBN into EAN codes.
One character I loved was Gaston Lagaffe, and I still have a few albums from the 80s editions. They are interesting because each album bears an ISBN-10 number (no barcode), but also an EAN code, except all albums of the set have the same code:
5410983209003. The album Gare aux Gaffes du Gars Gonflé has ISBN-10
2-8001-0308-6, while Gala de Gaffes à Gogo has ISBN-10
2-8001-0093-1, both have the same EAN. If I search that code using the Red Laser application, I get another album: Le Cas Lagaffe.
I can only hypothesise on why the publisher chose this scheme. In those days the selection of albums in supermarkets was pretty random, and the price of all of them was the same, so it might well be that supermarkets would just handle those albums as minor variants of each other, i.e. the supermarket would buy a selection of them, but consider them all equal for inventory purposes. This is still the case for smaller toys, where items with different colours share the same code.
Of course, each book has an individual ISBN-10, and if I convert it into an ISBN-13 and then look for it, I get the correct results.
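The conversion mentioned above is mechanical, so here is a minimal Python sketch of it. The function name `isbn10_to_isbn13` is my own, and the sketch assumes the usual rule: prefix the first nine digits with 978 and recompute the check digit with the EAN-13 weighting.

```python
def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert an ISBN-10 (hyphens or spaces allowed) to its 13-digit form."""
    digits = isbn10.replace("-", "").replace(" ", "").upper()
    if len(digits) != 10:
        raise ValueError("an ISBN-10 has exactly 10 characters")

    # Validate the ISBN-10 check character: weights 10 down to 1, modulo 11.
    # 'X' stands for the value 10 and is only allowed in the last position.
    total = 0
    for weight, char in zip(range(10, 0, -1), digits):
        value = 10 if (char == "X" and weight == 1) else int(char)
        total += weight * value
    if total % 11 != 0:
        raise ValueError("invalid ISBN-10 check digit")

    # Prefix with 978, drop the old check digit, then compute the EAN-13
    # check digit: weights 1, 3, 1, 3, ... from the left over 12 digits.
    body = "978" + digits[:9]
    weighted = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(body))
    return body + str((10 - weighted % 10) % 10)


print(isbn10_to_isbn13("2-8001-0308-6"))
print(isbn10_to_isbn13("2-8001-0093-1"))
```

The two calls use the ISBN-10 numbers of the two Gaston albums quoted above; each album gets its own 13-digit code, unlike the shared EAN printed on the covers.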
Selling stuff is a pretty old human activity, and merchants had found ways to distinguish themselves from their competition way before Archimedes shouted Heúrēka. Trade is a complicated business, and online shopping has not made that simpler, quite the contrary. So when programmers build systems to support online shopping, they tend to stumble on their own erroneous assumptions.
This post is similar to the one I made about geographic assumptions, but about online shopping. Again, this list is not exhaustive, and some of the falsehoods are disputable.
- A product has a price
- Products sold on auction sites do not yet have a price. The moment the price is known is actually the moment the item is no longer on sale.
- Except for auctioned items, products have one price
- Products do not have one price, they have many prices: with or without taxes, then there is the sale price, the regular price, the list price, the manufacturer approved price, the mandatory publisher price.
- Products have one final total price
- The total price paid typically depends on a lot of variables: time of the transaction, location of the buyer, shipping methods, memberships, sometimes even the profiling of the buyer.
- A product has a strictly positive price
- Many phones are sold for “free”, there is typically a subscription behind it. Some online shops also add samplers and documentation as free items to their inventory.
- A price is a number
- Without a currency, a price is meaningless on the internet.
- A price is a floating point number and a currency
- Using floating point numbers for prices is incorrect: most currencies are not defined for transactions below two decimal places, so
3.1415 is a valid floating point value, but USD 3.1415 is not a valid price for a transaction. Some currencies like the Japanese yen don’t accept any decimal position at all (the fraction of the yen, the sen, was removed from circulation in 1953). More generally, floating point representation has rounding and approximation behaviour that is bad for monetary values, which need to be exact.
- Currencies need to be rounded to some decimal position
- The Swiss franc needs to be rounded to five centimes.
- Currencies symbols uniquely identify a currency.
- The peso and dollar sign
$ is used by many countries: USA, Canada, Australia, Brunei, Namibia. The ¥ sign is used for both the Japanese yen and the Chinese yuan.
- Currencies have a Unicode symbol
- The Swiss franc does not, and until 2010, neither did the Indian rupee.
- Currencies have zero or one Unicode symbol
- The dollar and peso symbol appears three times in Unicode:
- Currencies have zero or one Unicode symbol after normalisation
- The Japanese yen can be represented by the following symbols: ¥, 円, 圓.
- Currencies can be described by a single three letter code
- The ISO 4217 code for the Russian ruble is
RUB, the three letter code руб is widely used, so is
CA$ for the Canadian dollar.
- Each stock keeping unit translates to a product
- Some bulky items have to be kept in the warehouse as two or more boxes, hence two stock keeping units, but can only be sold together as one product.
- Each product has a picture
- Many generic or bulky items are sold online without pictures: pocket books in Japan, but also packs of screws etc.
- You can put all products in a database
- Increasingly, products can be customised: a shop that sells T-shirts with custom text has an infinite number of products, which won’t fit in a database. Even if you only consider goods whose dimensions can be customised, the combinatorial growth of possibilities will quickly go beyond the capacity of a database.
- There is a common keying system for products
- GTINs are the closest thing, but many smaller manufacturers do not participate in the system, and some items have multiple keys. The system also does not support custom goods.
- There is a common system for annotating web-pages with products
- There are multiple micro-data and micro-format variants.
- In stock means the item is in the warehouse
- Many online sellers do not have any actual warehouse: they ship directly from their suppliers (drop shipping).
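To make the points about prices a bit more concrete, here is a minimal Python sketch of a price as an exact decimal amount plus a currency code, with rounding to the smallest step of that currency. The `Money` class and the `CASH_STEP` table are my own illustration, not a standard API, and the table only holds three example currencies with the steps discussed in the list.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP

# Smallest payable step per currency. Illustrative values only (an assumption
# of this sketch): a real system would take these from a maintained source.
CASH_STEP = {
    "USD": Decimal("0.01"),
    "JPY": Decimal("1"),
    "CHF": Decimal("0.05"),
}


@dataclass(frozen=True)
class Money:
    amount: Decimal  # exact decimal value, never a float
    currency: str    # three-letter ISO 4217 code

    def rounded(self) -> "Money":
        """Round to the nearest multiple of the currency's smallest step."""
        step = CASH_STEP[self.currency]
        steps = (self.amount / step).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
        return Money(steps * step, self.currency)


# Binary floating point cannot represent most decimal fractions exactly.
print(0.1 + 0.2 == 0.3)
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))

print(Money(Decimal("3.1415"), "USD").rounded())
print(Money(Decimal("12.37"), "CHF").rounded())
```

Rounding half up is just one possible policy; which rounding rule applies, and at which point of the computation, is yet another assumption that varies between shops and jurisdictions.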
When I was a kid, it was a common sight to see people in the supermarket with machines applying price labels to stuff. Nowadays this is really rare, and only happens when food products are about to expire and are discounted. Still, sometimes, you see a product that has been re-labelled.
Sometimes the new label contains the same barcode, plus additional information that is not present on the original packaging; this is often the case in pharmacies in Switzerland. Another reason can be that the original labels are incorrect or incomplete, or that the official price or some legal requirement has changed. And sometimes, it is really hard to know what went on.
I needed a new sleeve for my Swiss Army knife, so I bought one at Transa. The sleeve comes in a plastic wrapping, with a perfectly valid barcode,
07611640194054 which is assigned to:
Wenger S.A. Fabrique de Couteaux
Route de Bâle 63
Still, another label was added with a restricted use code:
2000003332137. The label also contains some additional information: size, colour, weight and another number,
078577-006001 which is not encoded in the barcode. Why change the barcode? One possible reason is that there are variations of the product sharing the same manufacturer provided code and they wish to distinguish them in their system. Another possible reason is that the product originally did not bear a barcode, and the store assigned it a restricted use one so that their system could work, and the product got a manufacturer provided number later. Compatibility…
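Whether a code is a restricted use one or a manufacturer provided one can be read from its first digits. Below is a rough Python sketch of such a lookup. The function name `describe_prefix` and the table are my own, the table is deliberately incomplete (it only covers prefix ranges that show up in my examples), and it does not handle 8-digit codes, which use a separate prefix space.

```python
# A few GS1 prefix ranges, keyed on the first three digits of the 13-digit form.
# Deliberately incomplete: only ranges relevant to the examples on this blog.
PREFIX_RANGES = [
    (0, 19, "GS1 US"),
    (20, 29, "restricted circulation"),
    (30, 39, "GS1 US"),
    (60, 139, "GS1 US"),
    (200, 299, "restricted circulation"),
    (300, 379, "GS1 France"),
    (450, 459, "GS1 Japan"),
    (471, 471, "GS1 Taiwan"),
    (490, 499, "GS1 Japan"),
    (540, 549, "GS1 Belgium & Luxembourg"),
    (760, 769, "GS1 Switzerland"),
    (978, 979, "Bookland (ISBN)"),
]


def describe_prefix(gtin: str) -> str:
    """Return who allocates the prefix of a 12, 13 or 14 digit code."""
    if not gtin.isdigit() or len(gtin) not in (12, 13, 14):
        raise ValueError("expected a 12, 13 or 14 digit code")
    padded = gtin.zfill(14)      # normalise to 14 digits
    prefix = int(padded[1:4])    # skip the packaging indicator digit
    for low, high, label in PREFIX_RANGES:
        if low <= prefix <= high:
            return label
    return "unknown (not in this small table)"


print(describe_prefix("2000003332137"))
print(describe_prefix("07611640194054"))
```

Note that the prefix only tells you which organisation allocated the number, not where the product was actually made.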
I already mentioned many ways in which a given product can end up with multiple GTIN codes: a book with an ISBN and a UPC, a book with two national ISBN codes, or products with two product codes on the same box. There is yet another reason for having more than one GTIN: relabelling.
One of the goals of the whole GTIN system is to avoid the need to add labels to the boxed product: the code is on the box and serves as a primary key to be looked up. Of course this only works if there is no price on the box, or if the price does not change.
I have at home a Tower of Hanoi game produced by Éditions Trédaniel which has an ISBN printed on the box,
9782849331231, along with a price, 22€. The code and the price were covered up with a label with a different price, 16.90€, but also a different ISBN:
9782849332566. While it makes sense to cover the price, why change the code?
I’m not sure, but this product is in some ways a book: there is a booklet inside and it bears a book number (ISBN), which means that it might fall under the Lang Law. This law lets the publisher fix the price of a book, which is printed on it; the seller must respect this price and is only allowed to give a 5% discount. This also means that if a publisher wants to lower the price, they need to change the label, but also probably the code. As both ISBNs here fall into the same range, this is probably what happened.
So we found instances of books with an ISBN and a UPC, and books with two ISBNs, but can we find non-book products with two GTIN codes? Yes, we can. Many electronic goods are now sold globally, but often carry multiple national codes, all of which are part of the GTIN system. Case in point: the HTC-manufactured Google Nexus One desktop dock has both an EAN (
4710937336078) registered in Taiwan and a UPC (
0821793004965). I don’t fully understand why this is needed. Clearly you need a UPC to be able to sell the device in the US, where there might be old systems that cannot handle the 13-digit codes. But then why bother with an EAN code? In theory, systems that can handle EAN codes can implicitly handle UPC codes.
Still, are these oddities only caused by the old UPC legacy issue? Not always: there are cases where a box carries two codes in the same national prefix space. Here the example is the Vodafone 802SE that I owned while I lived in Japan; the box bears two JAN codes:
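The reason an EAN system can handle UPC codes is that a 12-digit UPC is simply a 13-digit code with a leading zero, and the check digit stays the same. A small Python sketch, with function names of my own choosing:

```python
def gtin_check_digit(body: str) -> int:
    """Check digit for a code body (the code without its last digit).

    Weights alternate 3, 1, 3, ... starting from the rightmost body digit,
    which is why padding with leading zeroes never changes the result.
    """
    total = sum(int(d) * (3 if i % 2 == 0 else 1)
                for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10


def is_valid_gtin(code: str) -> bool:
    """True if the code has a GTIN length and a correct check digit."""
    return (code.isascii() and code.isdigit()
            and len(code) in (8, 12, 13, 14)
            and gtin_check_digit(code[:-1]) == int(code[-1]))


def to_gtin13(code: str) -> str:
    """Store a UPC (12 digits) or an EAN (13 digits) in one 13-digit column."""
    if len(code) not in (12, 13) or not is_valid_gtin(code):
        raise ValueError("expected a valid 12 or 13 digit code")
    return code.zfill(13)


print(to_gtin13("821793004965"))
print(is_valid_gtin("4710937336078"))
```

With this kind of normalisation a database only needs one key column, although it obviously does not help when the same product carries two genuinely different codes, as the dock does.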
4908993111252, both registered with ソフトバンクモバイル. Maybe one was for the actual phone, the other for the phone subscription.
While the codes on consumer goods around us all use the same UPC/EAN barcodes, there are different formats, different carriers. One you might encounter is the ITF-14 system, which is easily recognisable by its thick black border, called the bearer bar. It is used to label shipping cartons and other boxes that are used to distribute goods to the retail shops, and is not always recognised by cash registers.
While the graphical representation is different from EAN codes, the information that is encoded within them is compatible: you can know what item is within a box by analysing the code. As the name ITF-14 suggests, the barcode carries a 14 digit number. Remove the leading digit, recompute the check-digit and depending on the number of initial zeroes, you get an EAN-13, a UPC code or an EAN-8.
Here the code is
80000049641505. Strip the leading 8 and the remaining zeroes and you get
49641505. Remove the check digit and we get
4964150. Recompute the check digit: (0 × 3) + (5 × 1) + (1 × 3) + (4 × 1) + (6 × 3) + (9 × 1) + (4 × 3) = 51; subtract the last digit of that sum from 10 and you get the check digit: 9. So the code of the items within that box is 49641509.
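The steps above can be sketched in Python as follows. The function names are my own, and choosing the output length from the number of leading zeroes is a heuristic of this sketch: a UPC that happens to start with many zeroes would be mistaken for an EAN-8.

```python
def check_digit(body: str) -> int:
    """Check digit for a code body: weights 3, 1, 3, ... from the right."""
    total = sum(int(d) * (3 if i % 2 == 0 else 1)
                for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10


def contained_item_code(itf14: str) -> str:
    """Derive the code of the items inside a carton from its 14-digit code."""
    if len(itf14) != 14 or not itf14.isdigit():
        raise ValueError("expected exactly 14 digits")
    if check_digit(itf14[:-1]) != int(itf14[-1]):
        raise ValueError("invalid check digit")

    body = itf14[1:-1]                      # drop leading digit and check digit
    full = body + str(check_digit(body))    # 13 digits with the new check digit

    # Heuristic: pick the format from the number of significant digits.
    significant = len(full.lstrip("0"))
    if significant <= 8:
        return full[-8:]     # EAN-8
    if significant <= 12:
        return full[-12:]    # UPC
    return full              # EAN-13


print(contained_item_code("80000049641505"))
```

For the carton above this yields the same 8-digit code as the manual computation.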
The value of the first digit of the ITF-14 is not very strictly defined. If it is zero, then the box is considered a single item, maybe because its content is heterogeneous (in my limited experience this is pretty rare). 9 should never be present, as this would indicate a good in bulk quantity, not typically what you have in a box. The other numbers just indicate some level of packaging; the only recommendation is that higher levels of packaging (more stuff in it) get a higher number.
Incompatible technical standards are one way of getting multiple primary keys; the other is to have organisations that cannot accept keys provided by other organisations. National administrations are particularly good at this, as acknowledging any other national organisation would be akin to admitting that there are multiple countries on this planet.
The book Le musée du Silence has two ISBN codes: one for Canada, and one for the rest of the universe. The one used in Québec is 2-7609-2463-7 (978-2-7609-2463-5), and the one used elsewhere is 978-2-7427-5491-5. Notice that there is no common code between the two systems; we have true incompatibility.
It is interesting to see that this book, published in 2005, two years before ISBN-13 became mandatory, already uses the new system for Europe, but not for America.