ASCII, UTF-8, UTF-32, ISCII, Unicode

For a computer to store text and numbers that humans can understand, there needs to be a code that transforms characters into numbers. Character encoding matters because it lets every device display the same information. A custom character encoding scheme might work brilliantly on one computer, but problems will occur when you send that same text to someone else: their computer won’t know what the numbers mean unless it understands the encoding scheme too.

Character Encoding

All character encoding does is assign a number to every character that can be used. You could make a character encoding right now.

For example, I could say that the letter A becomes the number 13, a=14, 1=33, #=123, and so on.

This is where industry-wide standards come in. If the whole computer industry uses the same character encoding scheme, every computer can display the same characters.

ASCII

ASCII stands for “American Standard Code for Information Interchange.” It is a character encoding that uses the numeric codes 0 to 127 to represent characters. These include upper- and lowercase English letters, digits, punctuation symbols, and a set of non-printing control characters.

ASCII Table

Dec  = Decimal Value
Char = Character

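'5' has the int value 53.
If we write '5'-'0', it evaluates to 53-48, which is the int 5.
If we write char c = 'B'+32; then c stores 'b', because 66 + 32 = 98 and every lowercase letter’s code is 32 more than its uppercase form.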
Control characters (0–31 and 127)

Char  Dec  Description
NUL   0    null character
SOH   1    start of header
STX   2    start of text
ETX   3    end of text
EOT   4    end of transmission
ENQ   5    enquiry
ACK   6    acknowledge
BEL   7    bell (ring)
BS    8    backspace
HT    9    horizontal tab
LF    10   line feed
VT    11   vertical tab
FF    12   form feed
CR    13   carriage return
SO    14   shift out
SI    15   shift in
DLE   16   data link escape
DC1   17   device control 1
DC2   18   device control 2
DC3   19   device control 3
DC4   20   device control 4
NAK   21   negative acknowledge
SYN   22   synchronize
ETB   23   end of transmission block
CAN   24   cancel
EM    25   end of medium
SUB   26   substitute
ESC   27   escape
FS    28   file separator
GS    29   group separator
RS    30   record separator
US    31   unit separator
DEL   127  delete (rubout)
Printable characters (32–126)

(Codes 0–31 are the control characters listed above.)

Char  Dec  Description
SP    32   space
!     33   exclamation mark
"     34   quotation mark
#     35   number sign
$     36   dollar sign
%     37   percent sign
&     38   ampersand
'     39   apostrophe
(     40   left parenthesis
)     41   right parenthesis
*     42   asterisk
+     43   plus sign
,     44   comma
-     45   hyphen
.     46   period
/     47   slash
0     48   digit 0
1     49   digit 1
2     50   digit 2
3     51   digit 3
4     52   digit 4
5     53   digit 5
6     54   digit 6
7     55   digit 7
8     56   digit 8
9     57   digit 9
:     58   colon
;     59   semicolon
<     60   less-than sign
=     61   equals sign
>     62   greater-than sign
?     63   question mark
@     64   at sign
A     65   uppercase A
B     66   uppercase B
C     67   uppercase C
D     68   uppercase D
E     69   uppercase E
F     70   uppercase F
G     71   uppercase G
H     72   uppercase H
I     73   uppercase I
J     74   uppercase J
K     75   uppercase K
L     76   uppercase L
M     77   uppercase M
N     78   uppercase N
O     79   uppercase O
P     80   uppercase P
Q     81   uppercase Q
R     82   uppercase R
S     83   uppercase S
T     84   uppercase T
U     85   uppercase U
V     86   uppercase V
W     87   uppercase W
X     88   uppercase X
Y     89   uppercase Y
Z     90   uppercase Z
[     91   left square bracket
\     92   backslash
]     93   right square bracket
^     94   caret
_     95   underscore
`     96   grave accent
a     97   lowercase a
b     98   lowercase b
c     99   lowercase c
d     100  lowercase d
e     101  lowercase e
f     102  lowercase f
g     103  lowercase g
h     104  lowercase h
i     105  lowercase i
j     106  lowercase j
k     107  lowercase k
l     108  lowercase l
m     109  lowercase m
n     110  lowercase n
o     111  lowercase o
p     112  lowercase p
q     113  lowercase q
r     114  lowercase r
s     115  lowercase s
t     116  lowercase t
u     117  lowercase u
v     118  lowercase v
w     119  lowercase w
x     120  lowercase x
y     121  lowercase y
z     122  lowercase z
{     123  left curly brace
|     124  vertical bar
}     125  right curly brace
~     126  tilde

Unicode

Unicode is an international character-encoding system designed to support the electronic interchange, processing, and display of the written texts of the diverse languages of the modern and classical world. The Unicode Worldwide Character Standard includes letters, digits, diacritics, punctuation marks, and technical symbols for all the world’s principal written languages, using a uniform encoding scheme. The first version of Unicode was introduced in 1991; the version current when this description was written contained almost 50,000 characters, and later versions have grown well beyond that. Numerous encoding systems (including ASCII) predate Unicode. With Unicode (unlike earlier systems), the unique number provided for each character remains the same on any system that supports Unicode.

https://www.britannica.com/topic/Unicode

ASCII (American Standard Code for Information Interchange) became the first widespread encoding scheme. However, it’s limited to only 128 character definitions. This is fine for the most common English characters, numbers, and punctuation, but it leaves no room for accented letters or the scripts used in the rest of the world.

It became apparent that a new character encoding scheme was needed, which is when the Unicode standard was created. The objective of Unicode is to unify all the different encoding schemes so that the confusion between computers can be limited as much as possible.

These days, the Unicode standard, published by the Unicode Consortium, defines values for over 128,000 characters. It has several character encoding forms:

  • UTF-8: Uses one byte (8 bits) for each ASCII character, such as English letters, digits, and basic punctuation, and a sequence of two to four bytes for every other character. UTF-8 is widely used in email systems and on the internet.
  • UTF-16: Uses two bytes (16 bits) for the most commonly used characters, those in the range U+0000 to U+FFFF. Characters beyond that range are represented by a pair of 16-bit numbers (a surrogate pair).
  • UTF-32: Uses four bytes (32 bits) for every character. As the Unicode standard grew, it became apparent that a 16-bit number is too small to represent all the characters; UTF-32 represents every Unicode character as a single 32-bit number.

ISCII

Indian Script Code for Information Interchange (ISCII) is a coding scheme for representing various writing systems of India. It encodes the main Indic scripts and a Roman transliteration. 

What are the basic differences between Unicode and ISCII code?

ASCII, ISCII and Unicode are character encoding standards, each with characteristics that define where it is used.

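ASCII uses a 7-bit code (128 values). ISCII uses an 8-bit code (256 values) and is an extension of ASCII: its lower 128 positions are the ASCII characters. Unicode is not tied to a single width. It assigns each character a code point from 0 to 10FFFF in hexadecimal, which needs up to 21 bits, and stores it using 8-bit, 16-bit, or 32-bit units (UTF-8, UTF-16, or UTF-32). It was originally designed as a 16-bit code, which is why it is often described as one.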

The main differences between ASCII, ISCII and Unicode are listed below:

Unicode is a single universal standard, while ASCII and ISCII are narrower national standards. ASCII covers English, and ISCII is specific to Indian scripts and less flexible than Unicode.

Unicode represents most written languages in the world while ASCII does not.

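ASCII has its equivalent within Unicode: the first 128 Unicode code points (U+0000 to U+007F) are the ASCII characters with the same numbers, so ASCII text is also valid UTF-8.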

The basic differences between Unicode and ISCII:

Unicode:

  • Unicode was originally designed as a 16-bit code with room for 65,536 characters; its code space now holds 1,114,112 code points (U+0000 to U+10FFFF), stored using UTF-8, UTF-16, or UTF-32.
  • It gives every character a unique numeric value (its code point) as well as a name.
  • It encodes the characters used to write almost all of the world’s languages.

ISCII:

  • ISCII is an 8-bit code.
  • It provides a common alphabet for the ten main Indian scripts.
  • The scripts it covers all originate from the Brahmi script.
