Wednesday, September 28, 2011

The Length of International Addresses and Your Database

Tags: international addresses, address formats, international mail, data quality, database, database address fields

The address is the single most important item in getting your mail delivered.  No matter how good your mailing services company may be, what's inside the envelope or the package just will not get to its destination without a deliverable address.  Yet incorrect addresses and undeliverable mail remain a persistent problem.  

What can a company do to limit this problem?  The solution begins with implementing and following good data quality practices, and that starts with database design.  Many international addresses are too lengthy for a database designed for U.S. addresses: they have too many lines, lines that are too long, or both.  (No matter what country you reside in, addresses from some other countries will be too lengthy to fit in a domestic format.)  The requirements will depend on the countries in your database and on whether you translate some fields, such as the honorific, to their equivalents in English (or your language).  If you do translate, be aware that some terms may have no exact equivalent.

While many databases store address information in the fields (family name, building number, town and so forth) that make up the lines of the address, the information may come in as lines on a web form or application and must be returned to that format if something is mailed.  It is therefore useful to discuss the number of lines needed for a quality address, as well as the length of the lines and the length of individual fields within them.
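As a rough illustration of the round trip from stored fields back to printable lines, here is a minimal Python sketch.  The AddressRecord class, its field names and the to_lines function are hypothetical, and the line order is a simplified U.S.-style layout; real layouts differ from country to country.

```python
from dataclasses import dataclass


@dataclass
class AddressRecord:
    # Hypothetical field names; a real schema depends on the countries served.
    honorific: str = ""
    personal_name: str = ""
    family_name: str = ""
    organization: str = ""
    building_number: str = ""
    street: str = ""
    town: str = ""
    postal_code: str = ""
    country: str = ""


def join_nonempty(*parts):
    """Join the non-empty parts with single spaces."""
    return " ".join(p for p in parts if p)


def to_lines(rec):
    """Rebuild address lines from fields (simplified U.S.-style order)."""
    candidates = [
        join_nonempty(rec.honorific, rec.personal_name, rec.family_name),
        rec.organization,
        join_nonempty(rec.building_number, rec.street),
        join_nonempty(rec.town, rec.postal_code),
        rec.country.upper(),
    ]
    # Empty fields produce no line at all rather than a blank line.
    return [line for line in candidates if line]


example = AddressRecord(
    personal_name="Jane",
    family_name="Example",
    building_number="12",
    street="Main Street",
    town="Springfield",
    postal_code="12345",
    country="United States",
)
example_lines = to_lines(example)
```

Skipping empty fields keeps the line count as low as the data allows, which matters once a line limit is enforced.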

The table below, with statistics derived from a number of international databases maintained in the U.S., gives an indication of the space requirements.  Addresses in the U.K. generally have more lines than the table shows, so it is likely that these databases have concatenated multiple shorter lines into fewer, longer lines for the U.K.



Region     Average number     Maximum     Average number of     Maximum
           of address lines   lines       characters/line       characters/line
World      5.9                10          14.8                  54
Germany    5.4                8           15.9                  30
Mexico     6.1                9           18.6                  30
U.K.       6.8                10          11.3                  40
U.S.       4.1                6           18.9                  30
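One way to put these figures to work is to test whether a given address fits within a block of a given size.  The sketch below is only an illustration: the LIMITS dictionary copies the two "maximum" columns from the table, while the fits function and the sample address are hypothetical.

```python
# (maximum lines, maximum characters per line), copied from the table above.
LIMITS = {
    "World": (10, 54),
    "Germany": (8, 30),
    "Mexico": (9, 30),
    "U.K.": (10, 40),
    "U.S.": (6, 30),
}


def fits(address_lines, max_lines, max_chars):
    """Return True if the address fits in max_lines lines of max_chars each."""
    return (len(address_lines) <= max_lines
            and all(len(line) <= max_chars for line in address_lines))


# Hypothetical nine-line address in a U.K.-style layout.
sample = [
    "Ms. Jane Example",
    "Purchasing Manager",
    "Example Widgets Ltd",
    "Accounts Department",
    "Rose Cottage",
    "12 High Street",
    "Example Village",
    "EXAMPLETOWN",
    "AB1 2CD",
]

fits_world = fits(sample, *LIMITS["World"])   # nine lines fit within ten
fits_us = fits(sample, *LIMITS["U.S."])       # nine lines do not fit within six
```

A database sized only for the U.S. row would reject or truncate this address, while one sized for the World row would hold it.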

Some countries use more lines, sometimes many more, in addresses.  Many East Asian countries, the U.K., some of its former colonies, and any countries using descriptive addresses (e.g., the house with the red door across from the church) fall into this group.  Their addresses can need 5 or 6 lines in addition to the addressee's name, organizational title, the organization's name, the department, and the country name, for a total of up to 11 lines.  When an address has many lines, the individual lines tend to be shorter.  In many databases, these shorter lines are combined into a single line, with a comma separating what would be different lines if the address were written by a resident of the destination country.
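The comma-joining approach described above can be sketched in a few lines of Python.  The function names pack_lines and unpack_lines are hypothetical, and the rule of always merging the shortest adjacent pair is just one reasonable assumption, not a description of how any particular database does it.

```python
def pack_lines(lines, max_lines, sep=", "):
    """Merge adjacent lines until at most max_lines remain.

    At each step the adjacent pair with the shortest combined length is
    joined, which keeps the merged lines as short as possible.
    """
    if max_lines < 1:
        raise ValueError("max_lines must be at least 1")
    packed = list(lines)
    while len(packed) > max_lines:
        # Index of the adjacent pair whose merged text would be shortest.
        i = min(range(len(packed) - 1),
                key=lambda k: len(packed[k]) + len(packed[k + 1]))
        packed[i:i + 2] = [packed[i] + sep + packed[i + 1]]
    return packed


def unpack_lines(packed, sep=", "):
    """Split packed lines back into one line per component."""
    return [part for line in packed for part in line.split(sep)]


# Hypothetical five-line delivery address squeezed into three lines.
original = ["Flat 2", "Rose Cottage", "12 High Street", "Exampleton", "AB1 2CD"]
packed = pack_lines(original, max_lines=3)
restored = unpack_lines(packed)
```

The round trip is only reliable when no original line already contains the separator, so a stricter design would store the original lines separately or use a separator that cannot occur in the data.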

Lengthy words create both longer address lines and longer individual fields.  Compound nouns are particularly well known for creating lengthy street names.  The Germanic languages of northern Europe all use compound nouns, and lengthy street names are common in Austria, Germany, the Netherlands and the Scandinavian countries.  Unfortunately, abbreviation does not always help much.  For example, Escherheimerlandstrasse in Frankfurt, Germany is abbreviated Escherheimerlandstr., which saves only three characters.  (Strasse is street in German.)
 
Lengthy words and names also occur in languages outside the Germanic family.  Some regions of India are known for lengthy individual, street and city names, and some Indians will shorten their names for everyday purposes.  However, the city of Thiruvananthapuram has no shortened form.  Thailand is also known for very long names and words.  Suffice it to say that the full transliterated name of Bangkok is Krung Thep Mahanakhon Amon Rattanakosin Mahintharayutthaya Mahadilok Phop Noppharat Ratchathani Burirom Udomratchaniwet Mahasathan Amon Phiman Awatan Sathit Sakkathattiya Witsanukam Prasit, a colossal 188 characters.

These comments also apply to individuals' honorifics and names.  Lengthy or compound family and personal names are common internationally, as are names with more than two segments.  The common American usage of first, middle and last name can be misleading since internationally a "middle" name may be part of the family name or the "last" name in the string may be the personal name.  Single names are also used (e.g., Suharto or Thant).  Common honorifics in many languages require more than 4 characters and some have no common abbreviation.  WorldVu's database allows for 12 characters and we use them all for the German who gave his honorific as "Dr. Dr. Ing."
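To make the point about name structure concrete, here is a small Python sketch of a name record that does not assume a first, middle and last name.  The PersonName class, its field names and the sample names are hypothetical; the 12-character honorific limit is the one mentioned above.

```python
from dataclasses import dataclass, field
from typing import List

HONORIFIC_MAX = 12  # the 12-character allowance mentioned above


@dataclass
class PersonName:
    # Hypothetical structure: any number of personal and family name segments.
    honorific: str = ""
    personal_names: List[str] = field(default_factory=list)
    family_names: List[str] = field(default_factory=list)
    family_name_first: bool = False  # some cultures write the family name first

    def __post_init__(self):
        if len(self.honorific) > HONORIFIC_MAX:
            raise ValueError("honorific exceeds %d characters" % HONORIFIC_MAX)

    def display(self):
        """Return the name as it would appear on an address line."""
        personal = " ".join(self.personal_names)
        family = " ".join(self.family_names)
        ordered = (family, personal) if self.family_name_first else (personal, family)
        return " ".join(p for p in (self.honorific, *ordered) if p)


single = PersonName(personal_names=["Suharto"])
engineer = PersonName(honorific="Dr. Dr. Ing.",
                      personal_names=["Hans"],
                      family_names=["Beispiel"])
compound = PersonName(personal_names=["Maria", "Elena"],
                      family_names=["Ejemplo", "Muestra"])
```

Storing the segments as lists, with the order kept as a separate flag, avoids forcing every name into three fixed columns.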

The cost of maintaining correct addresses is often offset by the savings on printing, processing, and postage that would otherwise be wasted on undeliverable mail, along with the incalculable benefit of an improved corporate image.