# CS4185 Multimedia Technologies and Applications

## What does this program do?

- Loads an input image and 1000 database images to be compared with it.

- Converts the images to grayscale.

- Compares the input image with each database image using a pixel-by-pixel difference.

- Displays the numerical matching parameters obtained.

- Displays the input image and the best match result.
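
The baseline steps above can be sketched in a few lines. This is a minimal illustration, not the course program itself: it assumes images are equal-size nested lists of `(r, g, b)` tuples, uses the common BT.601 luma weights for the grayscale conversion, and takes the sum of absolute differences as the "pixel-by-pixel difference". The names `to_gray`, `pixel_difference`, and `best_match` are hypothetical.

```python
def to_gray(pixel):
    """Convert one (r, g, b) pixel to a gray level (BT.601 weights, an assumption)."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def pixel_difference(img_a, img_b):
    """Sum of absolute grayscale differences between two equal-size images."""
    same_shape = len(img_a) == len(img_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(img_a, img_b)
    )
    if not same_shape:
        raise ValueError("images must have the same dimensions")
    return sum(
        abs(to_gray(pa) - to_gray(pb))
        for row_a, row_b in zip(img_a, img_b)
        for pa, pb in zip(row_a, row_b)
    )


def best_match(query, database):
    """Return the name of the database image with the smallest difference."""
    return min(database, key=lambda name: pixel_difference(query, database[name]))


query = [[(10, 10, 10), (200, 200, 200)]]
database = {
    "a": [[(12, 12, 12), (198, 198, 198)]],
    "b": [[(100, 100, 100), (100, 100, 100)]],
}
print(best_match(query, database))
```

A smaller difference means a closer match, so the best result is the minimum over the database. Because the comparison is tied to exact pixel positions, small shifts or crops change the score a lot, which is one reason the tasks below ask for richer features.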

## Basic Requirements (80%)

- Automatically changing some settings according to the extracted features is allowed.

Students are required to finish the following four tasks in the basic requirements:

1.  Improve the number of correctly matched images (20%)
1.  Modify the above program to retrieve similar images (20%)

    - Utilize color information.

      - Color Models

        - The RGB Color Models
          - Quantization

      - Color Histograms
        - disadvantages
          1. Color similarity across histogram bins is not considered
          1. Spatial color layout is not considered

    - Use a different layout.

      - Color Layout
        - http://en.wikipedia.org/wiki/Color_layout_descriptor
        - Need for Color Layout -> Global color features give too many false positives.
        - How it works:
          - Divide the whole image into sub-blocks.
          - Extract features from each sub-block.
        - Can we go one step further?
          - Divide the image into regions based on color feature concentration.
          - This process is called segmentation.

    - Utilize edge and shape information.

      - Circle Hough transform:

        - https://en.wikipedia.org/wiki/Circle_Hough_Transform
        - http://docs.opencv.org/doc/tutorials/imgproc/imgtrans/hough_circle/hough_circle.html

      - segmentation

        - Features for local regions in the image
        - Interest points: corners, edges and others
        - Keypoints:
          - Points in an image that are invariant to image translation, scale, and rotation, and are minimally affected by noise and small distortions
        - Scale-invariant feature transform (SIFT) by David Lowe

          - Image content is transformed into local feature coordinates that are invariant to translation, rotation, scale, and other imaging parameters
          - advantages:

            - Locality: features are local, so robust to occlusion and clutter (no prior segmentation)
            - Distinctiveness: individual features can be matched to a large database of objects
            - Quantity: many features can be generated for even small objects
            - Efficiency: close to real-time performance
            - Extensibility: can easily be extended to a wide range of differing feature types, with each adding robustness

          - Detect keypoints using the SIFT detector
            - `tutorial3` -> page 13 ~ 18
          - R-trees, SR-Trees ?

    - Feature fusion
      - Image Features Measures
      - Image Distance Measures
      - db
        - tutorial4 -> page 9

1.  Improve on the Precision (20%)

    - The percentage of retrieved images that are correct matches.

1.  Improve on the Recall (20%)

    - The percentage of matching (relevant) images that are retrieved.
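
The color-histogram idea and the precision and recall measures above can be sketched together as a tiny retrieve-then-evaluate pipeline. This is only an illustration under stated assumptions: images are flat lists of `(r, g, b)` tuples, each channel is quantized into `levels` equal steps (with `levels` dividing 256), similarity is histogram intersection, and an image counts as retrieved when its similarity reaches a hypothetical `threshold`. None of these names or parameter values come from the course program.

```python
from collections import Counter


def color_histogram(pixels, levels=4):
    """Normalized RGB histogram with `levels` quantization steps per channel.

    Assumes `levels` divides 256 so every bin covers the same range.
    """
    step = 256 // levels
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    return {bin_id: n / len(pixels) for bin_id, n in counts.items()}


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means identical normalized histograms."""
    return sum(min(share, h2.get(bin_id, 0.0)) for bin_id, share in h1.items())


def retrieve(query_pixels, database, threshold=0.5):
    """Names of database images whose similarity to the query reaches the threshold."""
    query_hist = color_histogram(query_pixels)
    return [
        name
        for name, pixels in database.items()
        if histogram_intersection(query_hist, color_histogram(pixels)) >= threshold
    ]


def precision_recall(retrieved, relevant):
    """Precision: share of retrieved images that are relevant.
    Recall: share of relevant images that are retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


red = [(250, 10, 10)] * 4
database = {
    "reddish": [(240, 20, 5)] * 3 + [(10, 10, 250)],
    "blue": [(10, 10, 250)] * 4,
}
results = retrieve(red, database)
print(results, precision_recall(results, {"reddish"}))
```

Raising the threshold tends to improve precision at the cost of recall, and lowering it does the opposite, so the two tasks usually have to be tuned together. The sketch also shows the two histogram disadvantages noted earlier: neighbouring bins are treated as unrelated, and pixel positions are ignored entirely.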

## Extension

The extension includes two parts: technical improvement and UI design.

### Technical improvement

- 15% of the marks will be given based on technical difficulty.
- Possible improvements include:
  - new retrieval algorithms (e.g., 80+% precision and 55+% recall),
  - high-dimensional data indexing (efficiently storing and managing the features extracted from the database, so that the program does not need to recompute the features every time),
  - retrieval algorithms for particular types of images (e.g., sunset images, images containing human faces),
  - a crawler to obtain images from the internet,
  - adding semantic information to help improve the retrieval performance.
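
The idea of not recomputing features on every run can be sketched as a simple on-disk cache. This is one possible approach under stated assumptions, not a required design: features are JSON-serializable lists keyed by image name, and `extract` stands for whatever feature extractor the program uses. The function name `load_or_compute_features` and the cache file layout are hypothetical.

```python
import json
import os


def load_or_compute_features(image_names, extract, cache_path):
    """Return {name: features}, reusing cached entries and computing only missing ones."""
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            cache = json.load(f)

    # Only images that are not yet in the cache go through the extractor.
    missing = [name for name in image_names if name not in cache]
    for name in missing:
        cache[name] = extract(name)

    # Rewrite the cache file only when something new was computed.
    if missing:
        with open(cache_path, "w") as f:
            json.dump(cache, f)

    return {name: cache[name] for name in image_names}
```

The sketch never invalidates entries, so the cache file would need to be deleted whenever the feature extractor changes. A flat JSON file is also only a storage convenience; it does not provide the fast nearest-neighbour lookup that a real high-dimensional index is meant to give.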

### UI design

- 10% of the marks will be given based on the UI design.

### Submission

- Program
- Demo
- Report