Detect Language of Text in Python

How can I use Python to detect the language of a given piece of text?

For example, if I have the following inputs:

Input text: “ру́сский язы́к”
Output text: “Russian”

Input text: “中文”
Output text: “Chinese”

Input text: “にほんご”
Output text: “Japanese”

Input text: “العَرَبِيَّة”
Output text: “Arabic”

How can I determine the language of the text in Python?
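All four inputs in the question happen to be written in different scripts (Cyrillic, Han ideographs, Hiragana, Arabic). For that narrow case, Python's standard `unicodedata` module can already produce the names asked for, with no third-party library. The sketch below is illustrative only: `guess_language_by_script` and its lookup table are hypothetical. It cannot tell apart languages that share a script, such as English vs. German or Russian vs. Ukrainian. Statistical detectors like the libraries in the answers are needed for that.

```python
import unicodedata
from collections import Counter

# Hypothetical script-to-language table. A script is not a language:
# Cyrillic is also used for Ukrainian, Bulgarian, Serbian, etc., and
# Arabic script for Persian and Urdu, so these are only best guesses.
SCRIPT_TO_LANGUAGE = {
    "CYRILLIC": "Russian",
    "ARABIC": "Arabic",
    "CJK": "Chinese",  # Han ideographs are named "CJK UNIFIED IDEOGRAPH-..."
    "HIRAGANA": "Japanese",
    "KATAKANA": "Japanese",
    "HANGUL": "Korean",
    "GREEK": "Greek",
    "HEBREW": "Hebrew",
}


def guess_language_by_script(text: str) -> str:
    """Guess a language from the Unicode script names of the letters in `text`."""
    scripts = Counter()
    for ch in text:
        # isalpha() skips spaces, digits, punctuation and combining marks
        # (e.g. Russian stress accents and Arabic vowel signs).
        if ch.isalpha():
            # First word of the character name, e.g. "CYRILLIC SMALL LETTER ER".
            scripts[unicodedata.name(ch, "UNKNOWN").split()[0]] += 1
    if not scripts:
        return "Unknown"
    # Japanese mixes kana with Han ideographs, so any kana means Japanese.
    if scripts["HIRAGANA"] or scripts["KATAKANA"]:
        return "Japanese"
    most_common_script = scripts.most_common(1)[0][0]
    return SCRIPT_TO_LANGUAGE.get(most_common_script, "Unknown")


for sample in ["ру́сский язы́к", "中文", "にほんご", "العَرَبِيَّة"]:
    print(sample, "->", guess_language_by_script(sample))
```

Kana is checked before the majority vote because a Japanese sentence may contain more kanji than kana, and a pure majority would then misreport it as Chinese.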

Sure, I’ve worked quite a bit with language detection in Python, and a reliable library I often use is langdetect, a Python port of Google’s language-detection library that supports 55 languages out of the box. It’s straightforward and gets the job done.

Here’s a quick example:

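from langdetect import DetectorFactory, detect

# langdetect is non-deterministic; fixing the seed makes results repeatable
DetectorFactory.seed = 0

text = "Ein, zwei, drei, vier"
lang = detect(text)
print(lang)  # expected: 'de' (German)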

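Just feed it some text, and it will return the detected language as an ISO 639-1 code ('zh-cn' or 'zh-tw' for Chinese). If you need confidence values, detect_langs() returns the candidate languages with their probabilities instead. One caveat: the algorithm is non-deterministic, so very short or ambiguous inputs can give different answers on different runs unless you set DetectorFactory.seed before the first call. Apart from that, it’s a good fit for simple use cases where you want to detect a language quickly.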
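The question asks for names like “Russian”, but langdetect returns codes such as 'ru' (or 'zh-cn'/'zh-tw' for Chinese). A minimal sketch of the conversion, assuming a hand-written table: `LANGUAGE_NAMES` and `code_to_name` are hypothetical and only cover the languages in this thread, so a complete ISO 639-1 table would be needed in practice.

```python
# Hypothetical, deliberately small table covering only the languages in this thread.
LANGUAGE_NAMES = {
    "ar": "Arabic",
    "de": "German",
    "en": "English",
    "es": "Spanish",
    "fr": "French",
    "ja": "Japanese",
    "ru": "Russian",
    "zh": "Chinese",
}


def code_to_name(code: str) -> str:
    """Map a code such as 'ru' or 'zh-cn' to an English language name."""
    base = code.lower().split("-")[0]  # drop region suffixes like '-cn' / '-tw'
    return LANGUAGE_NAMES.get(base, code)  # unknown codes pass through unchanged


print(code_to_name("ru"))     # Russian
print(code_to_name("zh-cn"))  # Chinese
print(code_to_name("xx"))     # xx
```

With this, code_to_name(detect(text)) turns langdetect’s output into a readable name. Returning unknown codes unchanged, rather than raising, keeps the pipeline working when the detector reports a language missing from the table.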

I’ve also been working with language detection, and langid (langid.py) is another excellent library for this. It’s a lightweight, standalone module that ships with a pre-trained model covering 97 languages.

Here’s how you can use it:

import langid

text = "Ceci est une phrase en français"
lang, score = langid.classify(text)  # classify() returns a (language code, score) tuple
print(lang)  # expected: 'fr' (French)

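You can restrict the candidate languages with langid.set_languages(['de', 'fr', 'it']), and the library even includes tooling for training your own model if you have specialized needs. Unlike langdetect, its output is deterministic. I’ve found it to be a strong alternative to langdetect!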
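langid.py’s authors describe its model as a naive Bayes classifier over byte n-gram features. To give a feel for what “training a language model” means here, the toy sketch below applies the same idea at tiny scale, using character trigrams and two made-up training sentences. It illustrates the technique only; it is not langid.py’s API or training pipeline, and the names `trigrams`, `train` and `classify` are hypothetical.

```python
import math
from collections import Counter


def trigrams(text: str) -> list[str]:
    """Character trigrams of the lowercased text, padded with one space on each side."""
    padded = f" {text.lower()} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def train(samples: dict[str, str]) -> dict[str, Counter]:
    """Build a toy 'model': trigram counts per language code."""
    return {lang: Counter(trigrams(text)) for lang, text in samples.items()}


def classify(model: dict[str, Counter], text: str) -> str:
    """Return the language whose trigram counts best explain `text`.

    Multinomial naive Bayes with a uniform prior (so the prior term is dropped)
    and add-one (Laplace) smoothing so unseen trigrams don't zero out a score.
    """
    vocab_size = len(set().union(*model.values()))
    best_lang, best_score = "", -math.inf
    for lang, counts in model.items():
        total = sum(counts.values())
        score = sum(
            math.log((counts[g] + 1) / (total + vocab_size))
            for g in trigrams(text)
        )
        if score > best_score:
            best_lang, best_score = lang, score
    return best_lang


# Made-up training sentences; a real model needs far more text per language.
model = train({
    "en": "the cat sat on the mat and the dog slept in the house",
    "fr": "le chat est sur le tapis et le chien dort dans la maison",
})
print(classify(model, "the dog"))   # en
print(classify(model, "le chien"))  # fr
```

Log-probabilities are summed instead of multiplying raw probabilities, which avoids floating-point underflow on longer inputs.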

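Another tool I’ve used for language detection is the TextBlob library, which also offers natural language processing features beyond detection. Be aware, though, that its detect_language() method sent the text to the Google Translate API (so it needed network access), was deprecated in TextBlob 0.16, and has since been removed. The example below therefore only works on older releases.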

Here’s an example:

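from textblob import TextBlob

text = "Hola, ¿cómo estás?"
blob = TextBlob(text)
# Only works on old TextBlob releases: detect_language() was deprecated in 0.16,
# later removed, and needed network access because it called Google Translate.
lang = blob.detect_language()
print(lang)  # Output: 'es' (Spanish)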

TextBlob is still handy for other text processing tasks such as sentiment analysis, part-of-speech tagging, and noun phrase extraction. Its translation feature was removed together with detect_language(), though, so on current versions you’d pair it with langdetect or langid for the detection step.