Note: Transliteration will not be perfect, because each language and script is unique. In some cases the nearest equivalent character is substituted.
- About Unicode and Character table for all Scripts
- Unicode Table with equivalents for Indian Scripts
- Tamil Editor
- Kannada Editor
- Hindi (Devanagari Script) Editor
- Kannada from/to Tamil
- Kannada from/to Hindi (Devanagari Script)
- Hindi (Devanagari Script) from/to Tamil
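The note above about imperfect transliteration can be illustrated with a minimal sketch. It assumes that the Unicode blocks for Devanagari, Tamil and Kannada lay out their letters in roughly parallel positions (a layout inherited from the ISCII standard), so a character can be moved to another script by shifting its code point. The function name `transliterate` and the fallback rule are hypothetical and are not the method used by the editors listed here. Where the target script has no character at the shifted position, this sketch keeps the original character instead of choosing an equivalent.

```python
import unicodedata

# First code point of each 128-character Unicode block.
BLOCK_START = {
    "devanagari": 0x0900,
    "tamil": 0x0B80,
    "kannada": 0x0C80,
}
BLOCK_SIZE = 0x80


def transliterate(text: str, source: str, target: str) -> str:
    """Shift characters from the source script block into the target block."""
    src = BLOCK_START[source]
    dst = BLOCK_START[target]
    result = []
    for ch in text:
        offset = ord(ch) - src
        if 0 <= offset < BLOCK_SIZE:
            candidate = chr(dst + offset)
            # unicodedata.name() returns the default ("") for unassigned
            # code points, meaning the target script has no such character.
            if unicodedata.name(candidate, ""):
                result.append(candidate)
                continue
        # Outside the source block, or no counterpart: keep the character.
        result.append(ch)
    return "".join(result)


# The Kannada word "kamala" (lotus) moved into two other scripts.
in_devanagari = transliterate("ಕಮಲ", "kannada", "devanagari")
in_tamil = transliterate("ಕಮಲ", "kannada", "tamil")
```

Aspirated consonants such as Devanagari ख (U+0916) have no counterpart in the Tamil block, so this sketch passes them through unchanged. A real transliterator would substitute the nearest equivalent instead, and would also need exceptions for the positions where the blocks do not line up.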
India has at least 33 principal languages in common use, out of more than 1,600 Indian languages that are spoken.
The language diversity of its one billion people makes the communications problems of most other countries seem trivial by comparison.
The 1961 census recorded 1,652 languages in India. More recent censuses show slightly different numbers, from 1,576 to 1,721 “mother tongues” with separate grammatical structures. The exact figure is not available because of the richness of native Indian languages and their variants, contradictions between language surveys, and the difficulty of distinguishing Indian languages from Indian dialects.
It is hard to determine the exact number of languages and dialects in India. It is like counting the birds in a region, as in the Birbal story. The taste of water changes every mile, and the language or dialect changes every four miles (कोस कोस पर बदले पानी, चार कोस पर वाणी।). There are different theories about how many of these mother tongues qualify as independent languages.
The encyclopaedic People of India series of the Anthropological Survey of India identified 75 "major languages" out of a total of 325 languages used in Indian households. Ethnologue reports India as home to 398 languages, of which 387 are living and 11 are extinct.
As per the 2011 Census, there are about 122 languages, of which 23 (including English) are listed as official languages of the Republic of India. About 800 million Indians speak these 23 languages. The Constitution stresses the primacy of Hindi, written in the Devanagari script, but English remains in widespread official use.
Collections and Essays
- Declaration of Human Rights: Tamil
- Declaration of Human Rights: Malayalam
- Languages: Concepts and Myths
- As per the People's Linguistic Survey of India (PLSI), as many as 780 different languages are spoken and 86 different scripts are used in the country. Nearly 250 languages have been lost in the last 50 years. 22 of the 780 languages are scheduled Indian languages. 122 languages have been declared by the census as spoken by a population exceeding 10,000. [Hindustan Times, 17 Jul 2013]
- For many educated Indians, English is virtually their first language.
- 22 scheduled languages plus English, the associate official language (23 in all): Assamese, Bengali, Bodo, Dogri, English, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Maithili, Malayalam, Marathi, Meitei (Manipuri), Nepali, Odia, Punjabi, Sanskrit, Santali, Sindhi, Tamil, Telugu and Urdu. These languages have over 720 dialects.
- Formation of linguistic provinces. The acceptance of this policy involved the statutory recognition of all the major regional languages.
- The general script of the Aryan languages is different from the general script of the Dravidian languages. Indians also distinguish between the general north Indian accent and the general south Indian accent. Alongside these two main language families, there are languages of the Sino-Tibetan family, spoken in eastern India.
Number of Languages
|Radio programs |100 plus |
|Newspapers |90 plus |
|Schools teach |60 plus |

|Language family |Number of languages |Percentage of population |
|Indo-European or Indo-Aryan |54 |70.0 |
|Dravidian |20 |23.8 |
|Mon-Khmer |20 |2.0 |
|Sino-Tibeto-Burman |98 |2.0 |
|Austro-Asiatic languages |25 |1.2 |
|Others |1400 plus |1.0 |
325 recognized/documented Indian languages
Agaria, Ahirani, Aimol, Aiton, Anal, Andamanese, Angani, Angika, Ao, Apatani, Arabic, Armenian, Ashing, Assamese, Asuri, Awadhi
Badaga, Baghelkhandi, Bagri, Baigani, Bajania, Balti, Bangni, Banjari, Basturia, Bauria, Bawm, Bazigar Boli, Bengali, Bhanja-bhumia, Bantu, Bharmauri, Bhairi, Bhili, Bhojpuri, Bhotia, Bhuiya, Bhumij, Bhunjia, Biate, Bilaspuri, Birhor, Birjia, Bishnupriya, Bodo, Bokar, Bondo, Bori, Braj Bhasha, Brijlal, Bugun, Bundelkhandi, Burmese, Bushari
Chakhesang, Chakma, Chambilai, Chameali, Chang, Changpa, Chattisgarhi, Chikari, Chinali, Chiru, Chote, Churasi
Dalu, Deori, Dhanki, Dhimal, Dhodia, Dhundhari, Didayi, Dimasa, Dingal, Dogri, Dommari, Droskhat/Dokpa, Duhlian-Twang
Gadaba, Gadiali, Gallong, Gameti, Gamit, Gangte, Garasia, Garhwali, Garo, Giarahi, Gondi, Gujarati, Gujjari, Gurung, Gutob
Hajong, Halam, Halbi, Harauti, Haryanavi, Hebrew, Himachali, Hindi, Hinduri, Hindusthani, Hmar, Ho, Hrusso, Hualngo
Jabalpuri, Jangali, Jarawa, Jaunsari, Juang
Kabui, Kachanga, Kachari, Kachchi, Kadar, Kagati, Kakbarak, Kanashi, Kangri, Kannada, Karbi, Karen, Karko, Kashmiri, Kathiawari, Khadiboli, Khaka, Khamba, Khampa, Khampti, Khampti-shan, Kharia, Khasi, Khaskura, Khatri, Kherwari, Khiangan, Khorusti, Khotta, Kinnauri, Kiradi, Kisan, Koch, Kodagu, Koi, Koireng, Kokni, Kolami, Kom, Komkar, Konda, Konicha, Konkani, Konyak, Koracha, Koraga, Korava, Korku, Korwa, Kota, Kotwalia, Kudmali, Kui, Kuki, Kulvi, Kumaoni, Kunbi, Kurukh, Kuvi
Ladakhi, Lahauli, Laihawlh, Lakher (Mara), Lalung, Lambani, Lamgang, Laotian, Laria, Lepcha, Limbu, Lisu, Lodha, Lotha, Lushai
Mag, Magahi, Magarkura, Mahal, Maithili, Majhi, Makrani, Malankudi, Malayalam, Malhar, Malto, Malvi, Manchat, Mandiali, Mangari, Mao, Maram, Marathi, Maria, Maring, Marwari, Mavchi, Meitei, Memba, Mewari, Mewati, Milang, Minyong, Miri, Mishing, Mishmi, Mizo, Monpa, Monsang, Moyon, Muduga, Multani, Mundari
Na, Nagari, Nagpuri, Naikadi, Naiki, Nati, Nepali, Nicobarese, Nimari, Nishi, Nocte
Odki, Onge, Oriya
Padam, Pahari, Paharia, Palilibo, Paite, Panchpargania, Pang, Pangi, Pangwali, Parimu, Parji, Paschima, Pasi, Pashto, Pawri, Pengo, Persian, Phom, Pochury, Punchi, Punjabi
Rai (Raikhura), Rajasthani, Ralte, Ramo, Rathi, Rengma, Riang
Sadri, Sajalong, Sambalpuri, Sangtam, Sansi, Santali, Sadra, Saraji, Sarhodi, Saurashtri, Sema, Sentinelese, Shekhawati, Sherdukpen, Sherpa, Shimong, Shina, Shompen, Sikligar, Sindhi, Singpo, Siraji, Sirmauri, Soliga, Sulung, Surajpuri
Tagin, Tai, Tamang, Tamil, Tangam, Tangkhul, Tangsa, Tataotrong, Telugu, Thado, Thar, Tharu, Tibetan, Toda, Toto, Tulu
Yereva, Yerukula, Yimchungre
Zakring (Meyer), Zeliang, Zemi, Zou
Endangered languages and scripts
An endangered language is a language that is at risk of falling out of use, generally because it has few surviving speakers. If it loses all of its native speakers, it becomes an extinct language. UNESCO defines four levels of language endangerment between "safe" (not endangered) and "extinct": Vulnerable; Definitely endangered; Severely endangered; and Critically endangered. 191 languages of India are classified as vulnerable or endangered.
Some notes on North Indian languages
- Hindi is an Indian language spoken in most states in northern and central India. It is an Indo-European language of the Indo-Iranian subfamily. It evolved from the Middle Indo-Aryan Prakrit languages of the Middle Ages and, indirectly, from Sanskrit. Hindi derives much of its higher vocabulary from Sanskrit. Due to Muslim influence in northern India, it also has a large number of Persian, Arabic and Turkish loanwords.
- Marathi is one of the many languages of India and has a long literary history. It is the language spoken in the state of Maharashtra. Like many other languages in India, Marathi is thought to derive from Sanskrit. It is spoken by about 70 million people and is thought to have separated from the other languages in its group about a thousand years ago. Other names for the language are Maharashtra, Maharathi, Malhatee, Marthi, and Muruthu.
- Kashmiri is an Indo-Aryan language spoken in parts of India and Pakistan. It has 4,391,000 speakers. It is an SVO language written in a Persian script.
- Sindhi is an Indo-Aryan language spoken by approximately 17 million people in the province of Sindh, Pakistan. Sindhi is also a recognised official language of India, where it is spoken by approximately 1.2 million members of an ethnic group that migrated from Sindh during the partition of British India in 1947. The language can be written using Devanagari.
- Nepali is an Indo-Iranian language spoken in Nepal, India and Bhutan.
Some notes on South Indian Languages
South India, surrounded by sea on three sides, is a region of overwhelming grandeur and pristine beauty. Separated from north India by the Vindhya mountain range, the south Indian peninsula is doubly insulated by the Arabian Sea and Western Ghats on the west and the Bay of Bengal and Eastern Ghats on the east.
As a result, this triangular volcanic land, once part of the geologically primeval Gondwanaland, remained culturally undisturbed for millennia, evolving an aura of poised tranquillity.
The dominant features of south India are a tropical climate less harsh than that of the northern states, lush green tropical vegetation in the coastal areas, and an architecture, culture, languages and lifestyle that have remained essentially Dravidian at the core in spite of repeated exposure to alien influences.
The major languages in South India today are Tamil, Telugu, Kannada and Malayalam. There are also several minor languages: Brahui, Gondi, Kui, Malto, Oraon (Kurukh), Toda, Tulu and Konkani (in Kannada script). Today, Tamil is the state language of Tamil Nadu, Telugu of Andhra Pradesh, Kannada of Karnataka, and Malayalam of Kerala. Of the minor languages, Tulu and Konkani are the only active ones; modern works continue to be published in these languages and form part of the literature of Karnataka State.
Tamil is also an official language in other countries, such as Singapore and Sri Lanka, and is widely used in Malaysia.
Around 18 percent of the Indian populace (about 169 million people in 1995) speak Dravidian languages. Most Dravidian speakers reside in South India, where Indo-Aryan influence was less extensive than in the north. Only a few isolated groups of Dravidian speakers, such as the Gonds in Madhya Pradesh and Orissa, and the Kurukhs in Madhya Pradesh and Bihar, remain in the north as representatives of the Dravidian speakers who presumably once dominated much more of South Asia. (The only other significant population of Dravidian speakers is the Brahuis in Pakistan.)
The Dravidian family of languages includes approximately 26 languages that are mainly spoken in southern India and Sri Lanka, as well as certain areas in Pakistan, Nepal, and eastern and central India. Dravidian languages are spoken by more than 200 million people, and they appear to be unrelated to languages of other known families. Some scholars include the Dravidian languages in a larger Elamo-Dravidian language family, which includes the ancient Elamite language of what is now southwestern Iran.
Many common linguistic features are still discernible among these Dravidian languages. Some five thousand words are common to these languages. Many grammatical forms are common. The overwhelming influence of Sanskrit scholars and borrowing of Sanskrit words resulted in the emergence of Kannada and Telugu as distinct languages from Tamil some fifteen hundred years ago.
Tamil was the language of bureaucracy, of the literati and of culture for several centuries in Kerala. In fact, fifteen centuries ago the rulers of Kerala were all Tamils. Up to the tenth century the Pandya kings ruled Kerala with royal titles such as 'Perumaankal' and 'Perumaankanar'. It was a Tamil poet from Trivandrum who presided over the academy of Tamil scholars when they met to evaluate the famous Tamil grammatical work Tolkappiyam. From the third century B.C. to the first century A.D., many poets from Kerala composed poems in Tamil, and their compositions are included in Tamil anthologies such as Akananaru and Purananaru. All the one hundred poems in the anthology PatiRRuppattu extol the greatness of the kings of the Kerala region. The author of the famous Tamil epic Cilappatikaram was a poet from Kerala. The shrine in honor of KaNNaki, the heroine of Cilappatikaram, was built at Tiruvancikkulam in Kerala. Among the Saiva and Vaisnava composers, CEramAn PerumAl Nayanaar and KulacEkara Alvaar, respectively, belong to the Kerala region. AiyanEritanaar, the author of the tenth century grammatical work PuRapporul VeNpaamaalai, hailed from Kerala. Many scholars and pundits from Kerala contributed much to Tamil language and literature, and the historical evidence shows that the region now known as the State of Kerala was at one time an integral part of Tamil Nadu.
The major languages of India, with over 720 dialects, are written in 13 different scripts.
The Brahmi script is the earliest writing system after the Indus script. Most of the Indian scripts and several hundred scripts found in Southeast and East Asia are derived from Brahmi. Brahmi is an abugida that thrived in the Indian subcontinent and uses a system of diacritical marks to associate vowels with consonant symbols. It has numerous descendants, such as the Gupta script. A southern form of Brahmi developed into the Grantha script.
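The abugida principle can be seen in any Brahmi-derived script encoded in Unicode. The sketch below uses Devanagari rather than Brahmi itself (an assumption made for familiarity): a consonant letter carries an inherent vowel, a dependent vowel sign replaces it, and the virama sign suppresses it. The helper name `syllable` is hypothetical.

```python
import unicodedata


def syllable(consonant: str, vowel_sign: str = "") -> str:
    """Build a Devanagari syllable from Unicode character-name fragments."""
    base = unicodedata.lookup(f"DEVANAGARI LETTER {consonant}")
    if vowel_sign == "":
        return base  # a bare consonant keeps its inherent vowel "a"
    if vowel_sign == "VIRAMA":
        mark = unicodedata.lookup("DEVANAGARI SIGN VIRAMA")
    else:
        mark = unicodedata.lookup(f"DEVANAGARI VOWEL SIGN {vowel_sign}")
    return base + mark


ka = syllable("KA")           # "ka": consonant with inherent vowel
ki = syllable("KA", "I")      # "ki": vowel sign replaces the inherent vowel
k = syllable("KA", "VIRAMA")  # "k": virama removes the vowel altogether

# The vowel sign is a combining mark (general category starting with "M"),
# not an independent letter (category "Lo").
categories = [unicodedata.category(ch) for ch in ki]
```

Each syllable is stored as a consonant code point followed by at most one mark, which is why text in these scripts needs a rendering engine that positions the marks around the consonant.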
The Kharosthi script (also known as 'Indo-Bactrian' script) was more or less contemporary to Brahmi script and was employed to represent a form of Prakrit.
The origin of the Brahmi script is debated and so far unknown. Proposed candidates include the Indus script, Hieratic, cuneiform, Phoenician and Aramaic.
Prinsep deciphered Brahmi and Kharosthi from the bilingual Indo-Greek coins. Tamil-Brahmi script was found in Palani in Southern India, scientifically dated to 540 BCE.
The Unicode Consortium is a non-profit organization devoted to developing, maintaining, and promoting software internationalization standards and data, particularly the Unicode Standard, which specifies the representation of text in all modern software products and standards. The Unicode Consortium actively develops standards in the area of internationalization, including defining the behaviour of and relationships between Unicode characters. The Consortium works closely with W3C and ISO, in particular with ISO/IEC JTC 1/SC 2/WG 2, which is responsible for maintaining ISO/IEC 10646, the International Standard synchronized with the Unicode Standard.
The latest electronic version of the Unicode Standard can be found at the Unicode site. The publications of the Unicode Consortium include the Unicode Standard, with its Annexes and Character Database (http://www.unicode.org/ucd/), Unicode Technical Standards and Reports (http://unicode.org/reports/), Unicode Technical Notes and the Unicode Locales project, the Common Locale Data Repository.
The Unicode Character Standard primarily encodes scripts rather than languages. That is, where more than one language shares a set of symbols that have a historically related derivation, the union of the set of symbols of each such language is unified into a single collection identified as a single script. These collections of symbols (i.e., scripts) then serve as inventories of symbols which are drawn upon to write particular languages. In many cases, a single script may serve to write tens or even hundreds of languages (e.g., the Latin script).
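The script-versus-language distinction can be sketched with Python's standard `unicodedata` module. The helper `scripts_in` is hypothetical and uses the first word of each character's Unicode name as a rough stand-in for its script (the standard library does not expose the Unicode Script property directly). Under that assumption, Hindi and Marathi words both resolve to the single Devanagari script, just as English and French both resolve to Latin.

```python
import unicodedata


def scripts_in(text: str) -> set:
    """Return the script names used by the letters and marks in text."""
    scripts = set()
    for ch in text:
        # Only letters (L*) and combining marks (M*) are counted here;
        # spaces, digits and punctuation are skipped.
        if unicodedata.category(ch)[0] in "LM":
            name = unicodedata.name(ch, "")
            if name:
                # Heuristic: the first word of the name, e.g. "DEVANAGARI".
                scripts.add(name.split()[0])
    return scripts


hindi = scripts_in("हिन्दी")    # the word "Hindi"
marathi = scripts_in("मराठी")  # the word "Marathi"
tamil = scripts_in("தமிழ்")    # the word "Tamil"
latin = scripts_in("English, français")
```

Here `hindi` and `marathi` hold the same single script name even though the words belong to different languages, while `tamil` holds a different one.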
A bilingual person is, in the broadest definition, anyone with communicative skills in two languages, whether active or passive. In a narrow definition, the term bilingual is often reserved for speakers with native or native-like proficiency in two languages. Similarly, the terms trilingual and multilingual describe comparable situations in which three or more languages are involved. Many bilingual speakers are able to switch from language to language with ease, sometimes in mid-sentence.
In India, the three-language formula is fairly popular. In state-run schools, students learn their mother tongue, Hindi and English; in Hindi-speaking states, another language is included. English serves as the medium of education at many universities.
It is often claimed that a distinctive feature of bilingualism in India is its stability, i.e., speakers of Indian languages tend to maintain their languages over generations and centuries, even when they live away from the region where it is dominant.
By necessity, a substantial minority are able to speak two Indian languages; even in the so-called linguistic states, there are minorities who do not speak the official language as their native tongue and must therefore learn it as a second language. Many tribal people are bilingual. Rural-urban migrants are frequently bilingual in the regional standard language as well as in their village dialect. In Bombay, for example, many migrants speak Hindi or Marathi in addition to their native tongue.
For example, speakers of Marathi, the language of the state of Maharashtra (north), have lived in a predominantly Tamil-speaking Dravidian language area (south) for over 300 years. The ancestors of the present-day Marathi speakers came to Tanjore between 1638 and 1680. Like most Indians, the members of this community are trilingual. They use Tamil and English outside the home, and Marathi at home and as a community language.
There are 26 major languages (and no majority language) with more than one million speakers each. India has so far accommodated this diversity with great creativity, with linguistic tolerance, and with widespread multilingualism among Indians, especially educated Indians.
Popularity of English Language
What proportion of India's population of a billion speaks English is hotly debated, but most sources agree it is well under 10 percent. Even 5 percent is 50 million people, the largest population of English speakers in the world after the United States and the United Kingdom. English is a second language for virtually everyone in India who speaks it. English is the main vehicle for certain kinds of knowledge, a library language. It is the best source of scientific knowledge today. Despite growth in book publishing in other Indian languages, little of that growth is in science and technology.
English is the lingua franca or, more precisely, the link language of India. It remains the language of the Lok Sabha (the parliament), of the higher courts, of the highest levels of the Indian Civil Service, of the major universities, of multinationals and of most large-scale Indian businesses. It is no exaggeration to say that the ruling elite of India speaks excellent English and uses English for business and professional purposes on a daily basis.
Indian software developers and programmers, among the most skilled in the world, invariably use English in their work. India alone has the third-largest English-speaking population in the world, with about fifty million fluently bilingual speakers.
Digital Divide or Digital Opportunity
But what of the other 95 percent, who cannot use English?
There is in effect no widely available software in any of the Indian languages, including Hindi, with its 950 million primary and secondary speakers.
"But is there a market for local (vernacular) computing in South Asia?" The story of the two shoe salesmen who went to a rural village provides the answer. The first came back to his headquarters and said, "The situation is absolutely hopeless. Of a thousand people, not a single one wears shoes." The second returned to his home company and said, "This is an incredible opportunity: no one owns shoes and we can sell a thousand pairs!"
Points to note:
- For a multilingual nation, India still lags behind in local-language Internet usage and seeks investment.
- The average language net user is a 25-year-old who accesses the Internet from a cyber cafe and reads a regional-language newspaper.
- Language newspapers make up eight of the top-selling newspapers in the country, yet language net users are only 9.6 per cent of the total. So the critical issue is infrastructure.
- The survey, titled 'Net Bhasha: state and future of language Internet services in India', showed that a strong off-line presence results in strong on-line use. A strong regional-language newspaper, which already has a large reader base, therefore finds it easier to convert that base to its on-line offering.
- The Internet is still an urban, English-dominated medium in the country, even though its strongest use is e-mail.
- Cyber cafes have proliferated in towns. Ease of use and technology issues, such as fonts that have to be downloaded and so put off potential users, are the other obstacles behind the low usage. "Language use will benefit if cyber cafes offered language keyboards," it is felt.
Some Interesting Facts
- Traffic is on the left side (and cars have Right Hand Drive).
- English used in India is modelled on British English.
- Date format: dd/mm/yyyy
- Number format: 100 thousand (1,00,000) = 1 lakh; 10 million = 100 lakh (1,00,00,000) = 1 crore.
- Postal Code (PIN): 6 digits.
- Official Measurements: Metric
- Voltage 220V; 50 Hz
- Financial Year starts on April 1.
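Several of the conventions listed above matter when localising software for India. The following is a minimal sketch, with hypothetical helper names, of Indian digit grouping (the last three digits, then groups of two), the dd/mm/yyyy date format, and a six-digit PIN check. The PIN check tests the format only; it does not verify that the code is actually assigned.

```python
import re
from datetime import date

LAKH = 100_000      # written 1,00,000
CRORE = 100 * LAKH  # written 1,00,00,000 (10 million)


def format_indian(n: int) -> str:
    """Group digits as 3 on the right, then 2 at a time (lakh/crore style)."""
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    if len(digits) <= 3:
        return sign + digits
    head, tail = digits[:-3], digits[-3:]
    groups = []
    # Peel two digits at a time off the right end of the remaining head.
    while len(head) > 2:
        groups.insert(0, head[-2:])
        head = head[:-2]
    groups.insert(0, head)
    return sign + ",".join(groups + [tail])


def format_date(d: date) -> str:
    """Format a date as dd/mm/yyyy."""
    return d.strftime("%d/%m/%Y")


def is_pin(text: str) -> bool:
    """Check that text has the shape of a PIN: exactly six ASCII digits."""
    return re.fullmatch(r"[0-9]{6}", text) is not None


print(format_indian(CRORE))             # 1,00,00,000
print(format_indian(LAKH))              # 1,00,000
print(format_date(date(2013, 7, 17)))   # 17/07/2013
```

Grouping is done on the decimal string rather than with locale settings so that the sketch does not depend on an Indian locale being installed on the machine.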
References (including sources) and Language Studies Web sites
- Aksharamala - Indian Language Solutions URL:http://www.aksharamala.com/
- Multilingual Applications URL:http://acharya.iitm.ac.in/applic.html
- AncientScripts.com URL:http://www.ancientscripts.com/
- India: Languages and Scripts URL:http://www.cs.colostate.edu/~malaiya/scripts.html
- Timelines of Asia TOC: India, China, Japan URL:http://web.cocc.edu/cagatucci/classes/hum210/tml/asiantml.htm
- UCLA Language Materials Project URL:http://www.lmp.ucla.edu/default.htm
- UTF-8 and Unicode FAQ URL:http://www.cl.cam.ac.uk/%7Emgk25/
- The Unicode HOWTO: Introduction URL:ftp://ftp.ilog.fr/pub/Users/haible/utf8/Unicode-HOWTO-2.html
- ishida - Unicode character pickers URL:http://people.w3.org/rishida/scripts/pickers/
- Wikipedia URL:http://en.wikipedia.org/wiki/Indian_languages
- Malayalam Online Dictionary URL:http://malayalamdictionary.com/
- Introduction to indic scripts URL:http://people.w3.org/rishida/scripts/indic-overview/
- Viewable with Any Browser: Accessible Site Design Guide URL:http://www.anybrowser.org/campaign/abdesign2.html
- Linguistics 100.03 URL:http://www.library.cornell.edu/olinuris/ref/ling100_3_spring_04.html
- Peoples and languages URL:http://asnic.utexas.edu/asnic/subject/peoplesandlanguages.html
- The origin and development of the Konkani language URL:http://www.kamat.com/kalranga/konkani/konkani.htm
- Sanskrit Documents List: Projects Listing URL:http://sanskrit.gde.to/projects_list1.html
- Implementation Tools for Tamil Standard Code URL:http://www.geocities.com/Athens/5180/tsctools.html
- Ethnologue URL:http://www.sil.org/ethnologue/
- Numbers in many Languages URL:http://www.zompist.com/numbers.shtml
- Prayer in many languages URL:http://www.christusrex.org/www1/pater/index.html
- LangNet URL:http://www.nflc.org/infolangnet/
- Less Commonly Taught Languages URL:http://carla.acad.umn.edu/lctl/lctl.html
- LangNet URL:http://www.langnet.org/
- Consortium for Language Teaching and Learning URL:http://consortium.dartmouth.edu/
Old language map of India