004_comission/banson_hker/phase1-fix/doc/report.md
louiscklaw 72bacdd6b5 update,
2025-01-31 19:28:21 +08:00


Report:

This report describes the components involved.

Purpose / Goals:

Implement a Shift Cipher Decrypter in Python. The user can decrypt a sufficiently long message (more than 200 words) that was encrypted with a shift cipher, without knowing the key.

For demonstration purposes, the other half, encryption (the reverse of decryption), was also implemented.
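The encryption code is not shown in this report. The sketch below is a minimal version that assumes encryption mirrors `shift_cipher_decrypt` and adds the key instead of subtracting it. `shift_cipher_encrypt` is a hypothetical name.

```python
ORD_a = ord('a')  # 97
ORD_A = ord('A')  # 65

def shift_cipher_encrypt(plaintext: str, key: int) -> str:
    # Hypothetical counterpart of shift_cipher_decrypt: shift each ASCII
    # letter forward by `key` positions, wrapping around with modulo 26.
    ciphertext = []
    for char in plaintext:
        if 'a' <= char <= 'z' or 'A' <= char <= 'Z':
            ascii_offset = ORD_a if char.islower() else ORD_A
            distance = ord(char) - ascii_offset
            ciphertext.append(chr((distance + key) % 26 + ascii_offset))
        else:
            # spaces and punctuation stay unchanged, as assumed in this report
            ciphertext.append(char)
    return ''.join(ciphertext)

print(shift_cipher_encrypt('HELLO, WORLD!', 8))  # PMTTW, EWZTL!
```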

Assumption / Requirement:

We assume a use case in which the message to be encrypted satisfies the following:

- It contains only uppercase letters, space characters and punctuation marks.
- Space characters remain unchanged during encryption.
- Punctuation marks remain unchanged during encryption.
- It is longer than 200 words, so its letter distribution follows the general pattern of English text.
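The report does not show how these assumptions are checked. One possible check is sketched below. `check_assumptions` is a hypothetical helper, and the sketch makes three assumptions: "space characters" includes line breaks (Python's `string.whitespace`), "punctuation marks" means `string.punctuation`, and words are whitespace-separated tokens.

```python
import string

# Assumptions: space characters = string.whitespace (so line breaks are allowed),
# punctuation marks = string.punctuation, words = whitespace-separated tokens.
ALLOWED_CHARS = set(string.ascii_uppercase + string.whitespace + string.punctuation)

def check_assumptions(message: str, min_words: int = 200) -> list:
    # Hypothetical helper: returns a list of problems, empty if the message fits.
    problems = []
    unexpected = sorted(set(message) - ALLOWED_CHARS)
    if unexpected:
        problems.append(f'unexpected characters: {unexpected}')
    word_count = len(message.split())
    if word_count <= min_words:
        problems.append(f'only {word_count} words, need more than {min_words}')
    return problems

print(check_assumptions('HELLO, WORLD!'))  # ['only 2 words, need more than 200']
```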

Procedure / Terminology:

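Given that the encrypted message is long enough, a shift cipher decrypter can recover the message without knowing the key k. In English text the most frequent letter is usually E. Since every letter of the message is shifted by the same k, this peak carries over to the encrypted message: its most frequent letter is most likely the encrypted E, and the distance between that letter and E gives k.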
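The key guess fits in a few lines. This version is an assumption, not the report's code. The hypothetical `guess_key` uses `collections.Counter` instead of the 26-bucket list used in the implementation, and it counts ASCII letters only.

```python
from collections import Counter

def guess_key(ciphertext: str) -> int:
    # Count ASCII letters only, ignoring case.
    letters = [c for c in ciphertext.lower() if 'a' <= c <= 'z']
    if not letters:
        raise ValueError('the ciphertext contains no letters')
    # most_common(1) returns [(letter, count)] for the most frequent letter
    top_letter, _count = Counter(letters).most_common(1)[0]
    # assume top_letter is the encryption of 'e'; modulo 26 keeps k in 0..25
    return (ord(top_letter) - ord('e')) % 26

print(guess_key('MMM ABC'))  # 8
```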

Data types / variable declaration and initialization (data types used)

INTEGER

e.g.

ORD_a = ord('a')  # 97
ORD_A = ord('A')  # 65

ORD_a and ORD_A are integer variables storing 97 and 65, the character codes of 'a' and 'A'.
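These integer codes let the program convert a letter into a position from 0 to 25 and back. The small sketch below shows the conversion. The helper names `letter_to_index` and `index_to_letter` are hypothetical.

```python
ORD_a = ord('a')  # 97
ORD_A = ord('A')  # 65

def letter_to_index(char: str) -> int:
    # 'a' or 'A' -> 0, ..., 'z' or 'Z' -> 25
    offset = ORD_a if char.islower() else ORD_A
    return ord(char) - offset

def index_to_letter(index: int, upper: bool = False) -> str:
    # Inverse of letter_to_index; modulo 26 wraps indices outside 0..25.
    offset = ORD_A if upper else ORD_a
    return chr(index % 26 + offset)

print(letter_to_index('e'), letter_to_index('M'))  # 4 12
print(index_to_letter(12 - 8))                      # e
```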

STRINGS AND ARRAYS (LISTS)

...    
    with open(file_path,'r') as fi:
        # beginning of the process
        # read file and join the lines all
        lines = fi.readlines()
        e_temp = ''.join(lines)
...

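Here:

  • lines is a list (array) of strings, one per line of the file, with line endings kept.
  • e_temp is a single string holding the whole file content, newline characters included.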
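The sketch below shows the difference between the two types without a real file. `io.StringIO` stands in for the opened file, and the sample text is made up.

```python
import io

fake_file = io.StringIO('LINE ONE\nLINE TWO\n')  # stands in for open(file_path, 'r')
lines = fake_file.readlines()   # list of strings, each keeping its '\n'
e_temp = ''.join(lines)         # one string with the whole content

print(lines)         # ['LINE ONE\n', 'LINE TWO\n']
print(repr(e_temp))  # 'LINE ONE\nLINE TWO\n'
```

Joining the result of `readlines()` gives the same string as a single `fi.read()` call.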

Data collection, input and validation / data processing

User input:

The program is driven from the console:

- press 1 to start an encryption (encrypt a file), then enter the key you want as a number;
- press 2 to start a decryption (decrypt a file);
- press q to quit.
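The menu loop is not shown in the report's excerpts. The minimal sketch below assumes a hypothetical `encrypt_file(file_path, key)` alongside the report's `decrypt_file(file_path)`. The handlers and the input/output functions are passed in, so the loop can be tested without a console.

```python
from typing import Callable

def run_menu(encrypt_file: Callable[[str, int], object],
             decrypt_file: Callable[[str], object],
             read: Callable[[str], str] = input,
             write: Callable[[str], None] = print) -> None:
    # Keep asking for an option until the user presses q.
    while True:
        choice = read('press 1 to encrypt, 2 to decrypt, q to quit: ').strip().lower()
        if choice == 'q':
            return
        if choice == '1':
            file_path = read('file to encrypt: ').strip()
            key_text = read('key (a number): ')
            # validate the key before using it
            try:
                key = int(key_text)
            except ValueError:
                write('the key must be a whole number')
                continue
            encrypt_file(file_path, key)
        elif choice == '2':
            decrypt_file(read('file to decrypt: ').strip())
        else:
            write(f'unknown option: {choice!r}')
```

At the console this would be started as `run_menu(encrypt_file, decrypt_file)`.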

Modularity / reusability / portability

(TODO: need to capture from text book)

Terminology

  • decryption algorithm (k guessing)

Background:

The letter e is the most frequently occurring letter in everyday English. Shift cipher encryption shifts every letter by k positions, where k is an unknown integer. Assume the same k is used for the whole message, or that the shifts follow a regular pattern that is already known. Then the letter frequencies carry over to the encrypted message: every e becomes the same ciphertext letter (for example, with k = 8, e -> m). That is why k can be guessed by finding the most frequent letter in the encrypted message and assuming it stands for the letter e in the original message.
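The same reasoning can be written as modular arithmetic, numbering the letters a = 0, b = 1, ..., z = 25. E_k and D_k are notation introduced here for encryption and decryption with key k, and m is taken as the most frequent ciphertext letter, as in the example above:

```latex
\begin{aligned}
E_k(x) &= (x + k) \bmod 26, \qquad D_k(y) = (y - k) \bmod 26,\\
k &= (\mathrm{pos}(\text{m}) - \mathrm{pos}(\text{e})) \bmod 26 = (12 - 4) \bmod 26 = 8,\\
D_8(12) &= (12 - 8) \bmod 26 = 4 = \mathrm{pos}(\text{e}).
\end{aligned}
```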

Implementation


def find_max_occurrence(char_occurrences):
    # Guess the key k from the letter distribution (case-insensitive).
    # The most frequent ciphertext letter (e.g. m) is assumed to be the
    # encryption of e, so k is its distance from e.

    # index of the most frequent letter (0 = a, ..., 25 = z)
    max_idx = char_occurrences.index(max(char_occurrences))

    # subtract the index of e (4); modulo 26 keeps k in the range 0..25
    return (max_idx - 4) % 26

def count_letter_occurrence(txt_in):
    # Count how often each letter a-z occurs (case-insensitive).
    # A shift cipher moves every letter by the same amount, so the most
    # frequent plaintext letter (e) becomes the most frequent ciphertext letter.
    output = [0] * 26    # one bucket per letter, a to z

    for char in txt_in.lower():
        # count ASCII letters only; other letters (e.g. 'é') would index out of range
        if 'a' <= char <= 'z':
            output[ord(char) - ORD_a] += 1

    # output holds the letter-by-letter statistics of the text
    return output
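The 200-word assumption matters because a short text need not follow the usual letter distribution. The sketch below uses a compact version of the two functions (ASCII letters only) so that it runs on its own, and applies it to two made-up ciphertexts. `PMTTW EWZTL` is HELLO WORLD shifted by k = 8. `WKH WUHH VHHV WKH VHD` is THE TREE SEES THE SEA shifted by k = 3.

```python
ORD_a = ord('a')  # 97

def count_letter_occurrence(txt_in):
    # one bucket per ASCII letter, case-insensitive
    output = [0] * 26
    for char in txt_in.lower():
        if 'a' <= char <= 'z':
            output[ord(char) - ORD_a] += 1
    return output

def find_max_occurrence(char_occurrences):
    # distance from e (index 4) to the most frequent letter, kept in 0..25
    return (char_occurrences.index(max(char_occurrences)) - 4) % 26

# too short: the most frequent letter is t (from l), not the shifted e, so the guess is wrong
print(find_max_occurrence(count_letter_occurrence('PMTTW EWZTL')))            # 15
# e dominates the plaintext, so the guess matches the real key
print(find_max_occurrence(count_letter_occurrence('WKH WUHH VHHV WKH VHD')))  # 3
```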

...


def decrypt_file(file_path):
    # Open an encrypted file and decrypt it with a guessed key.

    with open(file_path, 'r') as fi:
        # read the file and join all lines into one string
        lines = fi.readlines()
        e_temp = ''.join(lines)

    characters_distribution = count_letter_occurrence(e_temp)

    print('')
    print('distribution of letters in encrypted text (case-insensitive, from a to z)')
    print(characters_distribution)

    print('')
    guess_k = find_max_occurrence(characters_distribution)
    print(f'guessed k: {guess_k}')

    print('')
    print('decrypted text:')
    decrypted_text = shift_cipher_decrypt(e_temp, guess_k)
    print(decrypted_text)

    # return the result so callers can reuse it
    return decrypted_text


def shift_cipher_decrypt(ciphertext, key):
    plaintext = []

    for char in ciphertext:
        # only shift ASCII letters; str.isalpha() would also accept letters like 'é'
        if 'a' <= char <= 'z' or 'A' <= char <= 'Z':
            # choose the ASCII offset for lowercase (a) or uppercase (A) letters
            ascii_offset = ORD_a if char.islower() else ORD_A

            # distance of the character from a or A (0..25)
            distance = ord(char) - ascii_offset

            # reverse the shift by subtracting the key; modulo 26 wraps around
            shifted_distance = (distance - key) % 26

            # convert back to a character by adding the offset again
            plaintext.append(chr(shifted_distance + ascii_offset))
        else:
            # keep spaces, punctuation and other non-letters unchanged
            plaintext.append(char)

    return ''.join(plaintext)
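A more compact alternative is sketched below under the same ASCII-only assumption. It builds a translation table once with `str.maketrans` and applies it with `str.translate` instead of looping over characters. `shift_cipher_decrypt_table` is a hypothetical name.

```python
import string

def shift_cipher_decrypt_table(ciphertext: str, key: int) -> str:
    # Alternative sketch: build one translation table and apply it in a single call.
    k = key % 26
    lower = string.ascii_lowercase
    upper = string.ascii_uppercase
    # each letter maps to the letter k positions earlier, wrapping around;
    # characters not in the table (spaces, punctuation) are left unchanged
    table = str.maketrans(lower + upper,
                          lower[-k:] + lower[:-k] + upper[-k:] + upper[:-k])
    return ciphertext.translate(table)

print(shift_cipher_decrypt_table('PMTTW, EWZTL!', 8))  # HELLO, WORLD!
```

The table version hides the per-letter arithmetic inside the table construction. The loop version shows each step explicitly, which may be easier to follow in a report.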

progress

section 1

- week 1 
- week 2 

section 2

- week 3 
- week 4 
- week 5 
