Google woes

I’ve discovered an annoying problem with Google’s core search engine: the way it deals with accents. For example, if you search for "Muller", it doesn’t pull up pages where the U contains an umlaut.

This is daft. It would be easy for Google to treat Unicode characters U+0075, U+00F9, U+00FA, U+00FB, U+00FC and U+0171 (the plain lowercase u and its accented forms) as identical, and doing so would significantly improve search functionality. The database already treats U+0075 and U+0055 (lowercase and uppercase Us) as the same, so the move wouldn’t even involve significant coding.

I am biased here: I’ve long believed that the use of accents is a waste of time designed exclusively to piss people off and keep grammar teachers in business – indeed, I think the computer industry made a terrible mistake in moving away from standard ASCII to allow foreigners to pollute the digital world with their stupid characters.

Nonetheless, even a believer in accents should be able to see the utility in allowing combined searches, given that nearly all accented words and names exist in the web in both accented and accent-free forms. You could even do the same for pedantic Anglicisations of accented words, so that "Muller" also returns "Mueller".
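To make the idea concrete, here is a rough Python sketch of accent folding. It is only an illustration of the technique, not a description of how Google works: the names `fold`, `search_key` and `DIGRAPHS` are made up for this example, and the digraph table is an assumption covering just the German-style "ae/oe/ue" spellings.

```python
import unicodedata

def fold(text: str) -> str:
    """Drop combining accents and ignore case, so 'Müller' and 'MULLER' agree."""
    # NFKD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep only the characters that are not combining marks.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

# Hypothetical table of Anglicised spellings; an assumption for this sketch.
DIGRAPHS = {"ae": "a", "oe": "o", "ue": "u"}

def search_key(text: str) -> str:
    """Fold accents, then collapse 'ue'-style spellings to the bare vowel."""
    key = fold(text)
    for digraph, plain in DIGRAPHS.items():
        key = key.replace(digraph, plain)
    return key

for name in ("Muller", "Müller", "MUELLER"):
    print(name, "->", search_key(name))
```

All three spellings end up with the same key, so an index built on `search_key` would return them together. The digraph step is deliberately crude: it also turns "blue" into "blu", so a real engine would more plausibly expand the query into its variants than rewrite every indexed word.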

For purists, Google could easily make "return actual character only" an advanced search option (the same interface could also allow case-matching, which might sometimes be helpful). Alternatively, the foreign-language versions of Google could continue as they are, while the English version moved over to working in proper English.
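As a sketch of what those two advanced options might look like, assuming a plain substring match stands in for the real search: the function names and the `exact_chars` and `match_case` flags below are hypothetical, chosen only for this illustration.

```python
import unicodedata

def normalise(text: str, exact_chars: bool = False, match_case: bool = False) -> str:
    """Prepare text for comparison according to the two advanced options."""
    if exact_chars:
        # Keep accents, but use one canonical form so 'ü' typed two ways still agrees.
        text = unicodedata.normalize("NFC", text)
    else:
        # Default behaviour: strip combining accents.
        decomposed = unicodedata.normalize("NFD", text)
        text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    if not match_case:
        text = text.casefold()
    return text

def matches(query: str, page_text: str,
            exact_chars: bool = False, match_case: bool = False) -> bool:
    """Return True if the query occurs in the page under the chosen options."""
    q = normalise(query, exact_chars, match_case)
    p = normalise(page_text, exact_chars, match_case)
    return q in p

page = "Gerd Müller scored again"
print(matches("Muller", page))                    # accent-insensitive default
print(matches("Muller", page, exact_chars=True))  # the purist option
```

With both flags off, the search behaves the way the post argues it should; switching on `exact_chars` gives the purists their literal matching without affecting anyone else.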

Mild digression: few editorial style things annoy me more than people who use accents when writing English. The word is role, you idiot. We don’t have a hat in our language.

This entry was posted in Uncategorized by John B.

11 thoughts on “Google woes”

  1. But why should the Internet be solely geared towards English-speakers?

    But I agree, it is a pain in the arse to have to do ALT+130 for "é" every time I search for French words in Google.

  2. "But why should the Internet be solely geared towards English-speakers?"

    Cos we invented it ;-)

    Erm, it needn’t be solely geared towards English speakers, and my paragraph about how everyone ought to be forced to write in the standard 26 letters rather than pissing about with accents isn’t entirely serious.

    However, Google.com *should* be solely geared towards English speakers (and google.fr solely geared towards French speakers, etc). Why bother with having multiple different-language portals if you’re not actually going to make significant language adjustments?

    BTW if you’ve got a UK keyboard, you can press AltGr and e together to get an é. Graves are more challenging…

  3. Well, not in the boxed linuxes that I’m familiar with. But it should be possible to edit the keymap to allow for the use of AltGr and a character to form accented characters (why SuSE, which is German, doesn’t deliver this out of the box is a bit of a puzzler for me).

  4. Given that the Finns are no strangers to accents, it’s odd that Linus Torvalds didn’t build it in to start with.

    But I thought people in mainland Europe had keyboards that can deal with accents much more easily? Or do the French just use AZERTY to annoy Anglophones…?

  5. They have the national sites, but they don’t really gear them to other languages – they just make certain advanced search options available on the front page. So Google.de gives you buttons to search sites in German or with German domain names, but you can do this from the advanced menu or by typing in site:de anyway. None of the underlying stuff is customised…

  6. "Not in Linux you can’t."

    The standard X11 way to do this is to set up AltGr as a "compose" key, and then type 3-key sequences. E.g. Compose, o, " gives ö; Compose, ', e gives é.

  7. John: "Cos we invented it ;-)" is largely true for the internet, but we mean the Web here, don’t we? And it’s true that it was invented by Tim Berners-Lee, but he was working for CERN at the time. CERN stands for Conseil Européen pour la Recherche Nucléaire. Very English.
