
# Report:

# describe the components involved


## Purpose / Goals:

Implement a shift cipher decrypter in Python.
The user can decrypt a sufficiently long message (more than 200 words) that was originally encrypted with a shift cipher.

For demonstration purposes, the other half, encryption, is also implemented. It is the inverse of decryption.


## Assumption / Requirement:

The assumed use case is a message to be encrypted that:

- contains only upper case letters, space characters, and punctuation marks;
- keeps its space characters unchanged during encryption;
- keeps its punctuation marks unchanged during encryption;
- is longer than 200 words, so that its letter distribution follows the general pattern.
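
The assumptions above can be illustrated with a minimal encryption sketch. The function name `shift_cipher_encrypt` and the sample message are assumptions made for this illustration; the encryption code used in the project is not shown in this report and may differ.

```python
ORD_A = ord('A')  # 65

def shift_cipher_encrypt(plaintext, key):
    """Hypothetical helper: shift upper case letters forward by `key`."""
    ciphertext = []
    for char in plaintext:
        if 'A' <= char <= 'Z':
            # distance from 'A', shifted by the key, wrapped around at 26
            shifted = (ord(char) - ORD_A + key) % 26
            ciphertext.append(chr(shifted + ORD_A))
        else:
            # space characters and punctuation marks are copied unchanged
            ciphertext.append(char)
    return ''.join(ciphertext)

print(shift_cipher_encrypt("HELLO, WORLD!", 3))  # KHOOR, ZRUOG!
```

The comma, the space and the exclamation mark come out untouched, which matches the second and third assumptions.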


## Procedure / Terminology:

Given that the encrypted message is long enough, a shift cipher decrypter can guess the original message without knowing the key `k`.
The most frequent letter in the original message is expected to be `E`. Because every letter of the message is shifted by the same `k`, the most frequent letter in the encrypted message is expected to be `E` shifted by `k`.
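
As a rough sketch of this procedure, the key can be guessed by counting letters and measuring the distance from the most frequent one to `e`. The function name `guess_key` and the short ciphertext are assumptions for illustration only. The sample is far shorter than 200 words and was chosen so that `E` dominates in its plaintext.

```python
from collections import Counter

def guess_key(ciphertext):
    """Hypothetical helper: guess k, assuming the top letter stands for 'e'."""
    letters = [c.lower() for c in ciphertext if c.isascii() and c.isalpha()]
    if not letters:
        raise ValueError("ciphertext contains no letters")
    most_common_letter, _count = Counter(letters).most_common(1)[0]
    # distance from 'e' to the most frequent letter, wrapped into 0..25
    return (ord(most_common_letter) - ord('e')) % 26

# "MEET ME NEAR THE GREEN TREE" shifted by 3
print(guess_key("PHHW PH QHDU WKH JUHHQ WUHH"))  # 3
```

If the most frequent plaintext letter is not `E`, which can happen in short messages, the guess is wrong. This is why the message is assumed to be long.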


## data types / variable declaration and initialization (data type used)

### INTEGER

i.e.

```python
ORD_a = ord('a') # 97
```

`ORD_a` is a variable that stores the integer `97`.
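
A small sketch of how such an integer is used: subtracting it from a character code gives the letter's position in the alphabet, and adding it back gives the letter again. The variable names `index_of_e` and `letter_at_12` are illustrative only.

```python
ORD_a = ord('a')  # 97

# letter -> position in the alphabet (a is 0, z is 25)
index_of_e = ord('e') - ORD_a    # 101 - 97 = 4

# position in the alphabet -> letter
letter_at_12 = chr(12 + ORD_a)   # chr(109) = 'm'

print(type(ORD_a).__name__, index_of_e, letter_at_12)  # int 4 m
```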

### STRING AND ARRAYS

```python
...
    with open(file_path,'r') as fi:
        # beginning of the process:
        # read the file and join all of its lines into one string
        lines = fi.readlines()
        e_temp = ''.join(lines)
...
```

Here:
- `lines` is a list of strings, one per line of the file.
- `e_temp` is a single string holding the whole file content, line breaks included.
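
The difference between the two types can be checked without a real file. In this sketch, `io.StringIO` stands in for the opened file and the two-line content is made up for illustration.

```python
import io

# stands in for: open(file_path, 'r')
fake_file = io.StringIO("KHOOR\nZRUOG\n")

lines = fake_file.readlines()   # ['KHOOR\n', 'ZRUOG\n']
e_temp = ''.join(lines)         # 'KHOOR\nZRUOG\n'

print(type(lines).__name__, len(lines))    # list 2
print(type(e_temp).__name__, len(e_temp))  # str 12
```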


## data collection, input and validation/ data processing

### User input:

console

<place a menu screen capture here>

press `1` to start an encryption (encrypt file)
    - enter the key you want (as a number)

<place a menu screen capture here>

press `2` to start a decryption (decrypt file)

<place a menu screen capture here>

press `q` to quit
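
A minimal sketch of how the menu input could be validated is given below. The function `run_menu`, its parameters and the prompt texts are assumptions for illustration. The real program would call its own file encryption and decryption functions where this sketch only records the choice.

```python
def run_menu(read_input=input, write=print):
    """Hypothetical menu loop: returns the valid choices made before 'q'."""
    history = []
    while True:
        choice = read_input("1) encrypt  2) decrypt  q) quit: ").strip().lower()
        if choice == 'q':
            break
        if choice == '1':
            raw_key = read_input("key (number): ").strip()
            try:
                key = int(raw_key) % 26   # keys that differ by 26 are equivalent
            except ValueError:
                write("the key must be a number, please try again")
                continue
            history.append(('encrypt', key))
        elif choice == '2':
            history.append(('decrypt', None))
        else:
            write(f"unknown option: {choice!r}")
    return history

# scripted answers instead of keyboard input, so the sketch runs unattended
scripted = iter(['1', '3', 'q'])
print(run_menu(read_input=lambda prompt: next(scripted)))  # [('encrypt', 3)]
```

Passing the input and output functions as parameters keeps the loop testable. Called without arguments, it reads from the console.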


## modularity/ reusability/ portability

(TODO: need to capture from text book)

### Terminology

- decryption algorithm (k guessing)

### Background:

The letter `e` occurs most often in everyday English. Shift cipher encryption shifts each letter by `k` positions, where `k` is an unknown integer. Assuming the same `k` is used for the whole message (or `k` varies in a regular pattern that is already known), the letter frequencies carry over to the encrypted message. For example, with `k = 8`, `e` becomes `m`, so `m` is expected to be the most frequent letter in the encrypted message. That is why `k` can be guessed by finding the most frequent letter in the encrypted message and assuming that it stands for the letter `e` of the original message.
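
The claim that the letter frequencies carry over can be checked with a short sketch. The helper names `letter_counts` and `shift_text` and the sample sentence are assumptions for illustration. The sketch uses `k = 8`, so `e` should turn into `m`.

```python
def letter_counts(text):
    """Count a-z case-insensitively; index 0 is 'a', index 25 is 'z'."""
    counts = [0] * 26
    for char in text.lower():
        if 'a' <= char <= 'z':
            counts[ord(char) - ord('a')] += 1
    return counts

def shift_text(text, k):
    """Shift upper case letters forward by k; keep everything else."""
    return ''.join(
        chr((ord(c) - ord('A') + k) % 26 + ord('A')) if 'A' <= c <= 'Z' else c
        for c in text
    )

k = 8
plain = "THE GREEN TREES NEAR THE STREET"
plain_counts = letter_counts(plain)
cipher_counts = letter_counts(shift_text(plain, k))

# the histogram keeps its shape and is only moved k buckets to the right
print(plain_counts.index(max(plain_counts)))    # 4  -> 'e'
print(cipher_counts.index(max(cipher_counts)))  # 12 -> 'm'
```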

### Implementation

```python
# TODO: replace this snippet and review its comments
ORD_a = ord('a')  # 97

def find_max_occurrence(char_occurrences):
    # Guess k from the letter counts (index 0 is 'a', index 25 is 'z').

    # The most frequent letter of the encrypted text, e.g. 'm',
    # is assumed to stand for 'e', so its distance to 'e' is the guessed k.

    # index of the most frequent letter (the first one if there is a tie)
    max_idx = char_occurrences.index(max(char_occurrences))

    # subtract the index of 'e' (4); modulo 26 keeps k within 0..25
    return (max_idx - 4) % 26

def count_letter_occurrence(txt_in):
    # Count each letter of txt_in, case-insensitively.
    # 'e' is statistically the most frequent letter in English text. A shift
    # cipher moves every letter by the same amount, so the letter that replaces
    # 'e' is likely to be the most frequent one in the encrypted text.
    output = [0] * 26    # one bucket per letter, a to z

    for char in txt_in:
        # only a-z and A-Z; other alphabetic characters have no bucket
        if char.isascii() and char.isalpha():
            output[ord(char.lower()) - ORD_a] += 1

    # output[i] holds the count of the i-th letter of the alphabet
    return output

...

```

<diagram showing for loop >


```python
# TODO: replace this snippet and review its comments
# (relies on count_letter_occurrence, find_max_occurrence and shift_cipher_decrypt)

def decrypt_file(file_path):
    # will open an encrypted file and decrypt it by a guessed key

    with open(file_path,'r') as fi:
        # beginning of the process:
        # read the file and join all of its lines into one string
        lines = fi.readlines()
        e_temp = ''.join(lines)

        characters_distribution = count_letter_occurrence(e_temp)

        print('')
        print('distribution of letters in encrypted text (case insensitive, from a to z)')
        print(characters_distribution)

        print('')
        guess_k = find_max_occurrence(characters_distribution)
        print(f'guessed k: {guess_k}')

        print('')
        print('decrypted text:')
        decrypted_text = shift_cipher_decrypt(e_temp, guess_k)
        print(decrypted_text)
```

<diagram showing for loop >

```python
# TODO: replace this snippet and review its comments
ORD_a = ord('a')  # 97
ORD_A = ord('A')  # 65


def shift_cipher_decrypt(ciphertext, key):
    plaintext = ""

    for char in ciphertext:
        if char.isalpha():
            ascii_offset = ORD_a if char.islower() else ORD_A  # Determine ASCII offset based on lowercase or uppercase letter

            # Calculate the distance of the target character from a or A
            distance = ord(char) - ascii_offset

            # Reverse the shift by subtracting the key and taking modulo 26 to wrap around
            shifted_distance = (distance - key) % 26

            # Convert back to ASCII by adding the offset and get the corresponding character
            decrypted_char = chr(shifted_distance + ascii_offset)

            plaintext += decrypted_char
        else:
            # If it is not an alphabetic character, retain as is.
            plaintext += char

    return plaintext
```
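
The wrap-around step in the function above relies on how Python's `%` operator treats negative numbers: with a positive modulus the result is never negative. A short worked example, with the letter `B` and the key `3` chosen only for illustration:

```python
ORD_A = ord('A')  # 65

distance = ord('B') - ORD_A                # 1
key = 3

# 1 - 3 = -2, and -2 % 26 is 24 in Python, so the shift wraps around to 'Y'
shifted_distance = (distance - key) % 26
print(shifted_distance, chr(shifted_distance + ORD_A))  # 24 Y

# keys that differ by a multiple of 26 decrypt to the same letter
print((distance - (key + 26)) % 26 == shifted_distance)  # True
```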

<diagram showing for loop >


## progress

### section 1
    - week 1
    - week 2

### section 2
    - week 3
    - week 4
    - week 5


## References

- [dwyl/english-words](https://github.com/dwyl/english-words)
