# Report:

# describe the components involved

## Purpose / Goals:

Implement a Shift Cipher Decrypter using Python.
The user can decrypt a sufficiently long message (more than 200 words) that was originally encrypted with a shift cipher.

For demonstration purposes, the other half, encryption, was also implemented (it is the inverse of decryption).

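The encryption half is not listed in this report. As a minimal sketch, assuming it mirrors the decryption routine with the key added instead of subtracted, it could look like the following. The name `shift_cipher_encrypt` is hypothetical and is not taken from the project code.

```python
ORD_a = ord('a')  # 97
ORD_A = ord('A')  # 65


def shift_cipher_encrypt(plaintext, key):
    """Hypothetical helper: shift every ASCII letter forward by `key`."""
    out = []
    for char in plaintext:
        if char.isascii() and char.isalpha():
            # Pick the base code point for lowercase or uppercase letters
            offset = ORD_a if char.islower() else ORD_A
            # Add the key and wrap around the 26-letter alphabet
            out.append(chr((ord(char) - offset + key) % 26 + offset))
        else:
            # Spaces and punctuation marks are left unchanged
            out.append(char)
    return ''.join(out)


print(shift_cipher_encrypt('HELLO, WORLD!', 8))  # PMTTW, EWZTL!
```

Decrypting with the same key reverses the shift, which is why the two halves can share most of their logic.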
## Assumption / Requirement:

The use case assumes that:

- The message contains only upper case letters, space characters and punctuation marks.
- Space characters remain unchanged during the encryption.
- Punctuation marks remain unchanged during the encryption.
- The message is longer than 200 words, and its letter distribution follows the general pattern of English text.

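These assumptions could be checked before a message is processed. The following is only a sketch: the function name `meets_assumptions` and the `min_words` parameter are hypothetical, and it assumes that newlines count as space characters because the message is read from a file.

```python
import string

# Characters allowed by the assumptions: upper case letters, punctuation, whitespace
ALLOWED = set(string.ascii_uppercase) | set(string.punctuation) | set(string.whitespace)


def meets_assumptions(message, min_words=200):
    """Hypothetical check: allowed characters only, and more than `min_words` words."""
    only_allowed = all(char in ALLOWED for char in message)
    long_enough = len(message.split()) > min_words
    return only_allowed and long_enough


print(meets_assumptions('HELLO WORLD. ' * 150))  # True (300 words)
print(meets_assumptions('HELLO WORLD.'))         # False (too short)
```

The check does not test whether the letter distribution looks like English; that part of the assumption is left to the user.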
## Procedure / Terminology:

Given that the encrypted message is long enough, a shift cipher decrypter can guess the message without knowing k.
The most frequent letter in the original message is expected to be `E`. Because every letter of the message is shifted by the same amount, this peak is preserved in the encrypted message, only moved by k positions.

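The idea can be illustrated compactly with `collections.Counter`. This is a sketch under the assumption stated above (the most frequent letter stands for `E`); the name `guess_key` is hypothetical, and the function raises an `IndexError` if the text contains no letters.

```python
from collections import Counter


def guess_key(ciphertext):
    """Hypothetical helper: guess k by assuming the most frequent letter is 'E'."""
    letters = [c for c in ciphertext.upper() if 'A' <= c <= 'Z']
    most_common_letter, _count = Counter(letters).most_common(1)[0]
    # Distance from 'E' to the most frequent letter, wrapped into 0..25
    return (ord(most_common_letter) - ord('E')) % 26


# 'MEET ME HERE' shifted by 3 becomes 'PHHW PH KHUH'; 'H' is now the most frequent letter
print(guess_key('PHHW PH KHUH'))  # 3
```

On a text this short the guess only works because the sample was chosen to contain many `E`s; the 200-word requirement exists to make the statistics reliable.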
## data types / variable declaration and initialization (data type used)

### INTEGER

e.g.

```python
ORD_a = ord('a') # 97
```

`ORD_a` is a variable that stores the integer `97`.

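As a small illustration of why this integer is useful, subtracting it from a letter's code point gives the letter's position in the alphabet, and adding it back turns a position into a letter. The variable names other than `ORD_a` are made up for this example.

```python
ORD_a = ord('a')  # 97

index_of_e = ord('e') - ORD_a   # 4, position of 'e' when 'a' is position 0
last_letter = chr(25 + ORD_a)   # 'z', the letter at position 25

print(index_of_e, last_letter)  # 4 z
```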
### STRING AND ARRAYS

```python
...
with open(file_path,'r') as fi:
    # beginning of the process
    # read the file and join all the lines
    lines = fi.readlines()
    e_temp = ''.join(lines)
...
```

Here:
- `lines` is a list of strings, one per line of the file.
- `e_temp` is a single string holding the whole content of the file.

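The difference between the two variables can be seen without a real file. The sketch below uses `io.StringIO` as a stand-in for the opened file; the sample content is made up.

```python
import io

# Stand-in for an opened text file with two lines (hypothetical content)
fake_file = io.StringIO('PMTTW,\nEWZTL!\n')

lines = fake_file.readlines()  # ['PMTTW,\n', 'EWZTL!\n']
e_temp = ''.join(lines)        # 'PMTTW,\nEWZTL!\n'

print(len(lines), repr(e_temp))
```

Note that `readlines()` keeps the newline characters, so the joined string still contains them.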
## data collection, input and validation/ data processing

### User input:

Input is entered through the console.

<place a menu screen capture here>

<place a menu screen capture here>

press `1` to start an encryption (encrypt file)
- select the key you want (in numeric format)

<place a menu screen capture here>

press `2` to start a decryption (decrypt file)

<place a menu screen capture here>

press `q` to quit

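The report does not show how the console input is validated. One way it might be done is sketched below; `parse_menu_choice` and `parse_key` are hypothetical names, and reducing the key modulo 26 is an assumption rather than documented behaviour.

```python
def parse_menu_choice(raw):
    """Hypothetical helper: map console input to an action, or None if unrecognised."""
    choices = {'1': 'encrypt', '2': 'decrypt', 'q': 'quit'}
    return choices.get(raw.strip().lower())


def parse_key(raw):
    """Hypothetical helper: return the key as an int in 0..25, or None if not a whole number."""
    try:
        return int(raw.strip()) % 26
    except ValueError:
        return None


print(parse_menu_choice(' Q '))  # quit
print(parse_key('30'))           # 4
print(parse_key('abc'))          # None
```

Returning `None` for invalid input lets the menu loop ask again instead of crashing.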
## modularity/ reusability/ portability

(TODO: need to capture from text book)

### Terminology

- decryption algorithm (k guessing)

### Background:

The letter `e` is counted as the most frequent letter in everyday English. Shift cipher encryption shifts every letter by k positions, where k is an unknown integer. Assuming the same k is used for the whole message (or that k follows a regular pattern that is already known), the letter frequencies carry over to the encrypted message (e.g. `e` -> shift by k -> `m`, which corresponds to k = 8). That is why k can be guessed by finding the most frequent letter in the encrypted message and assuming that it stands for the letter `e` in the original message.

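The `e` -> `m` example can be worked through as arithmetic. The value k = 8 is the one implied by that example; the variable names are chosen for illustration only.

```python
ORD_a = ord('a')  # 97

k = 8
# Encrypt 'e': position 4, plus 8, gives position 12
encrypted_e = chr((ord('e') - ORD_a + k) % 26 + ORD_a)  # 'm'
# Guess k back from the most frequent encrypted letter
recovered_k = (ord(encrypted_e) - ord('e')) % 26        # 8

print(encrypted_e, recovered_k)  # m 8
```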
### Implementation

```python
# TODO: need to replace this !!!
# TODO: need to review comments !!!

ORD_a = ord('a')  # 97, as defined in the INTEGER section

def find_max_occurrence(char_occurrences):
    # Guess k as the distance between the most frequent letter and the letter e
    # (case-insensitive). E.g. if the most frequent letter is m, k is guessed as 8.

    # find the index (0 = a, ..., 25 = z) of the most frequent letter
    max_idx = char_occurrences.index(max(char_occurrences))

    # subtract the index of e (4); modulo 26 keeps the guessed key within 0..25
    return (max_idx - 4) % 26

def count_letter_occurrence(txt_in):
    # By statistics, e is the most frequent letter in English text. Because a
    # shift cipher moves every letter by the same amount, the letter that e is
    # shifted to is likely to be the most frequent one in the encrypted text.
    output = [0] * 26 # bucket for 26 letters

    for char in txt_in:
        # count ASCII letters only; other alphabetic characters would fall outside the 26 buckets
        if char.isascii() and char.isalpha():
            output[ord(char.lower()) - ORD_a] += 1

    # output holds the occurrence count of each letter, from a to z
    return output

...

```

<diagram showing for loop >

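The most frequent letter is not guaranteed to be `e`, particularly in shorter texts. One possible extension, which is not part of the implementation in this report, is to keep several candidate keys instead of a single guess. The helper name `candidate_keys` and its `top_n` parameter are hypothetical.

```python
def candidate_keys(char_occurrences, top_n=3):
    """Hypothetical helper: keys implied by the `top_n` most frequent letters, best first."""
    # Rank letter indices by descending count; ties keep alphabetical order
    ranked = sorted(range(26), key=lambda idx: -char_occurrences[idx])
    # Each candidate assumes that the ranked letter stands for 'e' (index 4)
    return [(idx - 4) % 26 for idx in ranked[:top_n]]


counts = [0] * 26
counts[12] = 10  # m
counts[1] = 7    # b
counts[4] = 5    # e
print(candidate_keys(counts))  # [8, 23, 0]
```

The caller could then decrypt with each candidate and let the user pick the readable result.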
```python
# TODO: need to replace this !!!
# TODO: need to review comments !!!

def decrypt_file(file_path):
    # open an encrypted file and decrypt it with a guessed key

    with open(file_path,'r') as fi:
        # beginning of the process
        # read the file and join all the lines
        lines = fi.readlines()
        e_temp = ''.join(lines)

    characters_distribution = count_letter_occurrence(e_temp)

    print('')
    print('distribution of letters in encrypted text (case insensitive, from a to z)')
    print(characters_distribution)

    print('')
    guess_k = find_max_occurrence(characters_distribution)
    print(f'guessed k: {guess_k}')

    print('')
    print('decrypted text:')
    decrypted_text = shift_cipher_decrypt(e_temp, guess_k)
    print(decrypted_text)
```

<diagram showing for loop >

```python
# TODO: need to replace this !!!
# TODO: need to review comments !!!

ORD_a = ord('a')  # 97
ORD_A = ord('A')  # 65 (not shown elsewhere in this report; assumed to be defined like ORD_a)

def shift_cipher_decrypt(ciphertext, key):
    plaintext = ""

    for char in ciphertext:
        if char.isascii() and char.isalpha():
            # Determine ASCII offset based on lowercase or uppercase letter
            ascii_offset = ORD_a if char.islower() else ORD_A

            # Calculate the distance of the target character from a or A
            distance = ord(char) - ascii_offset

            # Reverse the shift by subtracting the key and taking modulo 26 to wrap around
            shifted_distance = (distance - key) % 26

            # Convert back to ASCII by adding the offset and get the corresponding character
            decrypted_char = chr(shifted_distance + ascii_offset)

            plaintext += decrypted_char
        else:
            # If it is not an ASCII letter, retain as is.
            plaintext += char

    return plaintext
```

<diagram showing for loop >

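The wrap-around step relies on the fact that Python's `%` operator returns a non-negative result for a positive divisor, even when the left operand is negative. A short worked example, with values chosen only for illustration:

```python
ORD_a = ord('a')  # 97

distance = ord('c') - ORD_a               # 2
key = 8
shifted_distance = (distance - key) % 26  # -6 % 26 == 20
decrypted_char = chr(shifted_distance + ORD_a)

print(shifted_distance, decrypted_char)  # 20 u
```

In languages where the remainder takes the sign of the dividend, an extra `+ 26` would be needed before the modulo.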
## progress

### section 1
- week 1
- week 2

### section 2
- week 3
- week 4
- week 5

## References
- [dwyl/english-words](https://github.com/dwyl/english-words)
