Report:
This report describes the components involved.
Purpose / Goals:
Implement a Shift Cipher decrypter in Python.
The user can decrypt a sufficiently long message (more than 200 words) that was originally encrypted with a shift cipher.
For demonstration purposes, the other half, encryption (the inverse of decryption), was also implemented.
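The encryption code itself is not listed in this report. Below is a minimal sketch, assuming it mirrors `shift_cipher_decrypt` (under Implementation) by adding the key instead of subtracting it. The name `shift_cipher_encrypt` is hypothetical:

```python
ORD_A = ord('A')  # 65
ORD_a = ord('a')  # 97


def shift_cipher_encrypt(plaintext, key):
    """Shift every ASCII letter forward by `key` positions (hypothetical helper).

    Spaces, punctuation and all other characters are copied unchanged.
    """
    ciphertext = []
    for char in plaintext:
        if char.isascii() and char.isalpha():
            offset = ORD_a if char.islower() else ORD_A
            # position 0..25, shifted forward and wrapped around with modulo 26
            ciphertext.append(chr((ord(char) - offset + key) % 26 + offset))
        else:
            ciphertext.append(char)
    return ''.join(ciphertext)


print(shift_cipher_encrypt('HELLO, WORLD', 3))   # KHOOR, ZRUOG
print(shift_cipher_encrypt('KHOOR, ZRUOG', -3))  # HELLO, WORLD
```

Because of the modulo, encrypting with `-key` undoes encrypting with `key`, which is exactly the "vice versa" relationship described above.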
Assumption / Requirement:
Assuming a use case in which:
- the message to be encrypted contains only upper-case letters, space characters and punctuation marks;
- space characters remain unchanged during encryption;
- punctuation marks remain unchanged during encryption;
- the message is longer than 200 words, so that its letter distribution follows the general pattern of English text.
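These requirements could be checked before processing a message. This is only a sketch: the function name `meets_assumptions`, treating `string.whitespace` as the "space characters" and `string.punctuation` as the allowed punctuation marks are all assumptions, not part of the original program:

```python
import string

# Allowed characters under the stated assumptions (hypothetical choice):
# upper-case letters, whitespace and ASCII punctuation marks.
ALLOWED = set(string.ascii_uppercase + string.whitespace + string.punctuation)


def meets_assumptions(message, min_words=200):
    """Return True if `message` uses only allowed characters and has more than `min_words` words."""
    only_allowed = all(char in ALLOWED for char in message)
    long_enough = len(message.split()) > min_words
    return only_allowed and long_enough


print(meets_assumptions('HELLO, WORLD!', min_words=1))  # True
print(meets_assumptions('Hello, world!', min_words=1))  # False (lower-case letters)
```

The decryption code itself also accepts lower-case letters, so this check is stricter than the program strictly needs.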
Procedure / Terminology:
Given that the encrypted message is long enough, a shift cipher decrypter can recover the message without knowing the key k.
The most frequent letter in the original message will be 'E'. Because every letter of the message is shifted by the same k, this peak is also reflected in the encrypted message, only moved to another letter.
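The idea can be checked on a short sample. This sketch uses `collections.Counter` (not used by the original program), a hypothetical helper `shift`, and a made-up sample sentence chosen so that E is clearly the most frequent letter; a real message should be much longer, as the requirements state:

```python
from collections import Counter


def shift(text, key):
    """Shift upper-case letters by `key`; leave everything else unchanged."""
    return ''.join(
        chr((ord(c) - ord('A') + key) % 26 + ord('A')) if 'A' <= c <= 'Z' else c
        for c in text
    )


def most_frequent_letter(text):
    """Return the upper-case letter that occurs most often in `text`."""
    counts = Counter(c for c in text if 'A' <= c <= 'Z')
    return counts.most_common(1)[0][0]


plain = 'EVERY TREE HERE SEEMS GREEN.'
cipher = shift(plain, 8)
print(most_frequent_letter(plain))   # E
print(most_frequent_letter(cipher))  # M, i.e. E shifted by 8
```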
data types / variable declaration and initialization (data type used)
INTEGER
e.g.
ORD_a = ord('a')  # 97
ORD_A = ord('A')  # 65
ORD_a is a variable storing 97 (the code point of 'a') in integer format; ORD_A likewise stores 65, the code point of 'A'.
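These integer constants turn letters into alphabet positions and back. A short sketch; `ORD_A` is the upper-case counterpart that `shift_cipher_decrypt` also uses, and the two helper names are hypothetical:

```python
ORD_a = ord('a')  # 97
ORD_A = ord('A')  # 65


def letter_to_index(char):
    """Map 'a'/'A' -> 0, ..., 'z'/'Z' -> 25."""
    return ord(char.lower()) - ORD_a


def index_to_letter(index, upper=True):
    """Map 0..25 back to a letter; indices outside the range wrap around modulo 26."""
    return chr(index % 26 + (ORD_A if upper else ORD_a))


print(letter_to_index('E'))     # 4
print(index_to_letter(4 + 8))   # M
print(index_to_letter(25 + 1))  # A (wraps around)
```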
STRING AND ARRAYS
...
with open(file_path, 'r') as fi:
    # beginning of the process
    # read the file and join all its lines into one string
    lines = fi.readlines()
    e_temp = ''.join(lines)
...
Here, lines is a list (array) of strings, one per line of the file, and e_temp is a single string holding the whole file (the newline characters are kept).
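Joining the result of `readlines()` gives the same string as reading the whole file with `read()`, because each line keeps its trailing newline. A quick check, using `io.StringIO` as a stand-in for the opened file:

```python
import io

text = 'FIRST LINE\nSECOND LINE\n'

lines = io.StringIO(text).readlines()
print(lines)  # ['FIRST LINE\n', 'SECOND LINE\n']

e_temp = ''.join(lines)
print(e_temp == io.StringIO(text).read())  # True
```

So `fi.read()` would be a shorter equivalent, and `e_temp` can still span several lines.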
data collection, input and validation/ data processing
User input (console):
- press 1 to start an encryption (encrypt a file)
  - then enter the key you want (as an integer)
- press 2 to start a decryption (decrypt a file)
- press q to quit
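A minimal sketch of such a console menu with input validation. It is not the original program's code: the function names are hypothetical, and instead of calling the real encryption/decryption routines it only records the chosen action, so it can be driven by scripted input:

```python
def parse_key(raw):
    """Validate a key typed by the user and return it as an integer in 0..25."""
    try:
        key = int(raw.strip())
    except ValueError:
        raise ValueError(f'key must be an integer, got {raw!r}') from None
    return key % 26  # shifting by 26 is the same as not shifting at all


def run_menu(read=input, write=print):
    """Loop until the user presses q; return the list of requested actions."""
    actions = []
    while True:
        choice = read('press 1 to encrypt, 2 to decrypt, q to quit: ').strip().lower()
        if choice == 'q':
            return actions
        if choice == '1':
            while True:  # ask again until the key is a valid integer
                try:
                    key = parse_key(read('key (integer): '))
                    break
                except ValueError as err:
                    write(err)
            actions.append(('encrypt', key))  # the real program would encrypt the file here
        elif choice == '2':
            actions.append(('decrypt', None))  # the real program would decrypt the file here
        else:
            write(f'unknown option: {choice!r}')


# Usage with scripted answers instead of the keyboard:
scripted = iter(['1', 'abc', '29', '2', 'x', 'q'])
result = run_menu(read=lambda prompt: next(scripted), write=lambda msg: None)
print(result)  # [('encrypt', 3), ('decrypt', None)]
```

Passing `read` and `write` as parameters keeps `input`/`print` as the defaults for real use while allowing the loop to be tested.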
modularity/ reusability/ portability
(TODO: need to capture from text book)
Terminology
- decryption algorithm (k guessing)
Background:
The letter e is the most frequently occurring letter in everyday English. Shift cipher encryption shifts every letter by k positions in the alphabet, where k is an unknown integer. Assuming the same k is used for the whole message (or that the shifts follow a regular pattern that is already known), the letter frequencies are reflected in the encrypted message as well, only moved by k positions (e.g. e -> shifted by k = 8 -> m). That is why k can be guessed by finding the most frequent letter in the encrypted message and assuming it was the letter e in the original message.
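The same reasoning written as formulas, numbering the letters a = 0, b = 1, ..., z = 25 as the code does. Here x is a plaintext letter, y a ciphertext letter, y_max the most frequent letter of the ciphertext, and 4 the index of e:

```latex
\begin{aligned}
y &= E_k(x) = (x + k) \bmod 26 \\
x &= D_k(y) = (y - k) \bmod 26 \\
\hat{k} &= (y_{\max} - 4) \bmod 26
\end{aligned}
```

For the example above, m = 12 gives k̂ = (12 - 4) mod 26 = 8. If the most frequent ciphertext letter were c = 2, the difference 2 - 4 = -2 would wrap around to k̂ = 24.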
Implementation
need to replace this !!!
need to review comments !!!
def find_max_occurrence(char_occurrences):
    # char_occurrences holds 26 letter counts: index 0 = a, ..., index 25 = z.
    # The most frequent letter of the ciphertext (e.g. m) is assumed to be an
    # encrypted e, so k is its distance from e (index 4).
    max_idx = char_occurrences.index(max(char_occurrences))  # first letter wins a tie
    # Wrap with modulo 26 so that k stays in 0..25 even when the letter comes before e.
    return (max_idx - 4) % 26
def count_letter_occurrence(txt_in):
    # Statistically, e is the most frequent letter in English text. Since a shift
    # cipher moves every letter by the same k, the letter that e was shifted to is
    # likely to be the most frequent letter of the encrypted text as well.
    output = [0] * 26  # one bucket per letter, a..z
    for char in txt_in:
        # count ASCII letters only (case-insensitive); other characters are skipped
        if char.isascii() and char.isalpha():
            output[ord(char.lower()) - ORD_a] += 1
    # output holds the letter-by-letter statistics of the text
    return output
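For comparison, the two helpers above can also be written with `collections.Counter` and `string.ascii_lowercase` from the standard library. This is an alternative sketch with hypothetical names, not the original code; it counts ASCII letters only and wraps k into 0..25 with modulo 26:

```python
from collections import Counter
from string import ascii_lowercase


def count_letter_occurrence_counter(txt_in):
    """Return 26 counts (a..z), ignoring case and non-letters."""
    counts = Counter(char for char in txt_in.lower() if char in ascii_lowercase)
    return [counts[letter] for letter in ascii_lowercase]


def guess_key(txt_in):
    """Assume the most frequent ciphertext letter was 'e' and return k in 0..25."""
    distribution = count_letter_occurrence_counter(txt_in)
    max_idx = distribution.index(max(distribution))  # first letter wins a tie
    return (max_idx - ascii_lowercase.index('e')) % 26


print(guess_key('MDMZG BZMM PMZM AMMUA OZMMV.'))  # 8
```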
...
need to replace this !!!
need to review comments !!!
def decrypt_file(file_path):
    # Open an encrypted file and decrypt it with a guessed key.
    with open(file_path, 'r') as fi:
        # beginning of the process:
        # read the file and join all its lines into one string
        lines = fi.readlines()
        e_temp = ''.join(lines)
    characters_distribution = count_letter_occurrence(e_temp)
    print('')
    print('distribution of letters in encrypted text (case insensitive, from a to z)')
    print(characters_distribution)
    print('')
    guess_k = find_max_occurrence(characters_distribution)
    print(f'guessed k: {guess_k}')
    print('')
    print('decrypted text:')
    decrypted_text = shift_cipher_decrypt(e_temp, guess_k)
    print(decrypted_text)
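An end-to-end check of this decryption flow on a temporary file. This is a compact, self-contained sketch: `shift` and `decrypt_file_sketch` are hypothetical stand-ins for the functions in this section, `tempfile` replaces a real input file, and the sketch returns the guessed key and text instead of printing them:

```python
import os
import tempfile
from string import ascii_lowercase


def shift(text, key):
    """Shift ASCII letters by `key` (a negative key decrypts); keep other characters."""
    out = []
    for char in text:
        if char.isascii() and char.isalpha():
            base = ord('a') if char.islower() else ord('A')
            out.append(chr((ord(char) - base + key) % 26 + base))
        else:
            out.append(char)
    return ''.join(out)


def decrypt_file_sketch(file_path):
    """Read an encrypted file, guess k from the most frequent letter, return (k, plaintext)."""
    with open(file_path, 'r') as fi:
        e_temp = fi.read()  # same result as ''.join(fi.readlines())
    counts = [e_temp.lower().count(letter) for letter in ascii_lowercase]
    guess_k = (counts.index(max(counts)) - 4) % 26  # 4 is the index of 'e'
    return guess_k, shift(e_temp, -guess_k)


plain = 'EVERY TREE HERE SEEMS GREEN.\n'
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write(shift(plain, 8))  # encrypt with k = 8
try:
    result = decrypt_file_sketch(tmp.name)
finally:
    os.remove(tmp.name)
print(result)  # (8, 'EVERY TREE HERE SEEMS GREEN.\n')
```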
need to replace this !!!
need to review comments !!!
def shift_cipher_decrypt(ciphertext, key):
    plaintext = ""
    for char in ciphertext:
        if char.isascii() and char.isalpha():
            # Determine the ASCII offset based on a lowercase or uppercase letter
            ascii_offset = ORD_a if char.islower() else ORD_A
            # Calculate the distance of the character from a or A
            distance = ord(char) - ascii_offset
            # Reverse the shift by subtracting the key; modulo 26 wraps around the alphabet
            shifted_distance = (distance - key) % 26
            # Convert back by adding the offset and get the corresponding character
            decrypted_char = chr(shifted_distance + ascii_offset)
            plaintext += decrypted_char
        else:
            # Not an ASCII letter (space, punctuation, ...): keep it as is
            plaintext += char
    return plaintext
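The character-by-character loop can also be expressed with `str.maketrans` and `str.translate`, which build the whole substitution table once. An alternative sketch with a hypothetical function name; for text whose letters are all ASCII it gives the same output as `shift_cipher_decrypt`, and every other character is left unchanged:

```python
from string import ascii_lowercase, ascii_uppercase


def shift_cipher_decrypt_translate(ciphertext, key):
    """Undo a shift of `key` positions using a precomputed translation table."""
    k = key % 26  # normalise negative or large keys to 0..25
    # Cipher letter at index i maps to the letter at index (i - k) % 26,
    # i.e. the alphabet rotated right by k positions.
    lower = ascii_lowercase[26 - k:] + ascii_lowercase[:26 - k]
    upper = ascii_uppercase[26 - k:] + ascii_uppercase[:26 - k]
    table = str.maketrans(ascii_lowercase + ascii_uppercase, lower + upper)
    return ciphertext.translate(table)


print(shift_cipher_decrypt_translate('KHOOR, ZRUOG', 3))  # HELLO, WORLD
```

In CPython, `translate` runs in C and avoids the repeated `plaintext += ...` concatenation, which can matter for long messages.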
progress
section 1
- week 1
- week 2
section 2
- week 3
- week 4
- week 5