Detecting Language Preference from the Browser with the Accept-Language Header

Some time ago I was on a trip to Germany for the Smashing Mag event. Several websites I visited (including the world’s largest search engine) asked me to confirm my language preferences based on my current physical location. This struck me as a rather inefficient approach to setting language preferences, especially given that the language of the browser is readily available to web publishers as part of the HTTP request.

The Accept-Language header (which many server environments expose as HTTP_ACCEPT_LANGUAGE) is defined by the IETF in RFC 2616, which describes it as restricting “the set of natural languages that are preferred as a response to the request”.

The header lists one or more languages that tell the web server which language(s) the browser prefers to receive content in. The user can set this explicitly in the browser. More typically, though, it is left at its default value, which follows the language of the browser or operating system and so, in most cases, the language of the owner of the device.

Now it’s not that I never want to come into contact with a foreign language. But for one reason or another, usually sheer expediency, I want to access content in the default language of my browser and OS, particularly when I am travelling to a country where I don’t actually speak the language. I can always choose to search in the local language if my language chops are up to it.

Furthermore, I work in a very internationalized environment, and a quick straw poll among colleagues from non-English-speaking countries indicates that most set at least some of their interfaces to their native language, even though (most of) our communications are officially in English.

So, if you have an internationalized web presence, or a need to address audiences in a variety of languages, the browser’s language setting seems like a pretty good way to serve users in their native language.

Language and Locale – an alternative to Geo IP?

When the browser makes a request, language and locale are sent as simple two-letter ISO codes (ISO 639 for the language and ISO 3166 for the country), which together indicate the language and the country variant of that language; for example, en-GB for English as used in the UK. There are currently 249 country codes set out in ISO 3166 and 184 languages defined in ISO 639.
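To make the structure of these codes concrete, here is a minimal Python sketch that splits a tag into its language and country parts. The function name `split_language_tag` is made up for this illustration, and the sketch only handles the simple language-plus-country shape described above, not every variation a browser might send.

```python
def split_language_tag(tag):
    """Split a tag such as 'en-GB' into (language, region).

    The language is returned in lower case and the region in upper case.
    The region is None when the tag carries no two-letter country part.
    """
    parts = tag.strip().replace("_", "-").split("-")
    language = parts[0].lower()
    region = None
    for part in parts[1:]:
        # Treat the first two-letter alphabetic subtag as the country code.
        if len(part) == 2 and part.isalpha():
            region = part.upper()
            break
    return language, region


print(split_language_tag("en-GB"))  # ('en', 'GB')
print(split_language_tag("pt-br"))  # ('pt', 'BR')
print(split_language_tag("da"))     # ('da', None)
```

Normalizing the case is a convenience for comparisons later on; browsers are not consistent about whether they send en-GB or en-gb.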

Assuming the user has left the browser on its default settings, this is a pretty good indication of the actual language of the user. The locale may give further clues as to the language preference, maybe even country of origin, if you accept that the user is unlikely to change those settings. Take es-AR for Argentinian Spanish, or pt-BR for Brazilian Portuguese, for example.

The combination of these two ISO codes can provide a steer for which language version of a website or piece of content to serve to a visitor. Because it is not dependent on IP tracking, it may give more reliable results on the underlying native language of the user, rather than the language of their current location.
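As a rough illustration of that steer, the sketch below picks a site language from an ordered list of preferred tags, falling back from a regional variant (es-AR) to the plain language (es) when the exact variant is not available. The function name `pick_language`, the list of available versions and the default of "en" are all assumptions made for the example.

```python
def pick_language(preferred, available, default="en"):
    """Return the best entry of `available` for an ordered list of preferred tags."""
    # Map lower-cased tags back to the spelling used in `available`.
    lookup = {tag.lower(): tag for tag in available}
    for tag in preferred:
        candidate = tag.lower()
        # Try the full tag first, then drop trailing subtags: es-ar -> es.
        while candidate:
            if candidate in lookup:
                return lookup[candidate]
            candidate = candidate.rpartition("-")[0]
    return default


site_versions = ["en", "es", "pt-BR"]  # hypothetical language versions of a site
print(pick_language(["es-AR", "en"], site_versions))  # es
print(pick_language(["pt-BR", "en"], site_versions))  # pt-BR
print(pick_language(["de"], site_versions))           # en
```

Note that this only narrows a tag; a visitor asking for plain pt would not be matched to pt-BR here, which may or may not be what you want.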

When in Roam

IP-based language targeting alone does not deal well with roaming usage. If you are in Germany, but don’t speak German, chances are you are going to prefer to see web content or advertising in your own language (provided it’s available) rather than a local language you don’t understand.

Language and locale open up particularly interesting possibilities in the advertising world. Combined with knowledge of physical location, they offer the potential to reach audiences with locally relevant information in their own language or regional variant, irrespective of where they are at any given time.

Internationalization – a babel of web content

From a pure content perspective, if you are developing a web application with an international focus, knowing language and locale may be interesting not only to handle customers with a range of different languages, but also to cater to the nuances within different variants of the same language. For example, in English, there are significant differences in date and time formatting between the US and UK variants. Is 05/06/14 May 6th or the 5th of June?
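The date question is easy to demonstrate. The following Python sketch reads the same string under a month-first and a day-first pattern; the `DATE_PATTERNS` table and the `parse_short_date` helper are hypothetical names for this example, and a real application would more likely lean on a localization library than a hand-made table.

```python
from datetime import datetime

# Hypothetical mapping from language tag to the short date pattern it implies.
DATE_PATTERNS = {
    "en-US": "%m/%d/%y",  # month first
    "en-GB": "%d/%m/%y",  # day first
}


def parse_short_date(text, language_tag):
    """Interpret a short numeric date according to the visitor's locale."""
    return datetime.strptime(text, DATE_PATTERNS[language_tag]).date()


print(parse_short_date("05/06/14", "en-US"))  # 2014-05-06
print(parse_short_date("05/06/14", "en-GB"))  # 2014-06-05
```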

There may also be international differences in numbering systems. At one point a British billion was 1,000 times greater than its American counterpart, and although UK English has since adopted the ‘short scale’ (increments of x1000 between billion, trillion), there are many nations that continue to use the ‘long scale’.

There are also many differences in spelling and terminology which may be addressed with a knowledge of language and locale. If you are developing an international site, then you may want to support these locale variants explicitly.

Q Values – an interesting aside

The HTTP spec also gives users a way to rank their language preferences. If for example you were bilingual (which, according to some reports, more than half the planet is) you can set your browser to accept content in order of priority, using q values. Q values must be between 0 and 1, with a higher value indicating a higher preference for a particular language. If no number appears, a value of 1 is assumed.

The example below applies a weighting in the following order: first preference Danish; if that is not available then British English is requested, and finally any type of English.

Accept-Language: da, en-gb;q=0.8, en;q=0.7
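A header like the one above can be turned into an ordered list with a few lines of code. This is a minimal Python sketch rather than a complete parser: `parse_accept_language` is a name chosen for the example, and treating a malformed weight as 0 is an assumption on my part, not something the spec dictates.

```python
def parse_accept_language(header):
    """Return (tag, q) pairs sorted from most to least preferred."""
    entries = []
    for position, item in enumerate(header.split(",")):
        piece = item.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0  # no q parameter means a weight of 1
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # assumption: a malformed weight means "not wanted"
        # Keep the weight inside the 0 to 1 range.
        quality = max(0.0, min(1.0, quality))
        entries.append((tag.strip(), quality, position))
    # Highest weight first; equal weights keep the order used in the header.
    entries.sort(key=lambda entry: (-entry[1], entry[2]))
    return [(tag, quality) for tag, quality, _ in entries]


print(parse_accept_language("da, en-gb;q=0.8, en;q=0.7"))
# [('da', 1.0), ('en-gb', 0.8), ('en', 0.7)]
```

Sorting by weight rather than trusting the order of the header matters because nothing forces a browser to list its languages from highest to lowest q value.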

You can see how this would be useful in situations or nations where there is more than one language in regular use. For example, the use of Spanish in the USA is growing strongly. There are many examples of countries with several working languages. In Spain you have Basque, Catalan and Castilian. In Belgium, Flemish and French.

Aside from polyglots and bilinguals who bother to set up language preferences, advertisers are most likely to benefit from a knowledge of the browser’s language setting.

How to do this

Part of the prompt to write this piece was my involvement in the DeviceAtlas project. If you are implementing a device awareness strategy, where you want to return information on the characteristics of the requesting device, then you can add knowledge of browser language settings with DeviceAtlas. You can do this because the DeviceAtlas APIs analyze all headers in the detection process, not just the UA string. See here for specifics of that.

If you are just interested in doing this in your web applications, then how you retrieve information from the HTTP request will depend on your programming environment. Time and space constraints don’t permit a full discussion of all the options here. Environment-specific how-tos are fairly readily available on the web; in the meantime, you can check out the following links providing code for some of the more popular set-ups:
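For a flavour of what this looks like in one environment, here is a minimal sketch for Python’s WSGI interface, where the Accept-Language request header arrives under the HTTP_ACCEPT_LANGUAGE key. The tiny `app` function and the simulated request are assumptions made for illustration; other frameworks and languages expose the header in their own way.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    # WSGI exposes the Accept-Language request header under this CGI-style key.
    raw = environ.get("HTTP_ACCEPT_LANGUAGE", "")
    # Take the first listed tag and ignore any ";q=" weight attached to it.
    first = raw.split(",")[0].split(";")[0].strip() or "unknown"
    body = f"First listed language: {first}\n".encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


# Simulate a request locally instead of starting a server.
environ = {}
setup_testing_defaults(environ)
environ["HTTP_ACCEPT_LANGUAGE"] = "da, en-gb;q=0.8, en;q=0.7"
response = app(environ, lambda status, headers: None)
print(response[0].decode("utf-8"))  # First listed language: da
```

Taking the first listed tag is a shortcut that works for typical browser headers; a more careful version would sort by q value before choosing.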

© 2024 DeviceAtlas Limited. All rights reserved.

This is a website of DeviceAtlas Limited, a private company limited by shares, incorporated and registered in the Republic of Ireland with registered number 398040 and registered office at 6th Floor, 2 Grand Canal Square, Dublin 2, Ireland