{ "cells": [ { "cell_type": "markdown", "id": "e7d963fc", "metadata": {}, "source": [ "# Working with DataFrames" ] }, { "cell_type": "markdown", "id": "ffb6ef0a", "metadata": {}, "source": [ "## 1. Pandas Library" ] }, { "cell_type": "markdown", "id": "a080c24c", "metadata": {}, "source": [ "Pandas is one of the most popular Python libraries for data manipulation. We use its Series and DataFrame data structures extensively because they are much more powerful and convenient for working with data than Python's built-in lists and dictionaries.\n", "\n", "Another very popular library is NumPy. Pandas is built on top of NumPy, but we usually work with pandas directly." ] }, { "cell_type": "code", "execution_count": 1, "id": "122da828", "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "markdown", "id": "af8c9270", "metadata": {}, "source": [ "## 2. Pandas Series" ] }, { "cell_type": "markdown", "id": "7c82119d", "metadata": {}, "source": [ "A Series is very similar to a list, and we can easily convert a list into a Series. Unlike a list, however, a Series also has an index that labels each value."
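, "\n", "\n", "A minimal sketch of what the index adds (the tickers and weights below are made-up illustration values, not data from this notebook): by default a Series is indexed by the integer positions `0, 1, 2, ...`, but the `index=` parameter of `pd.Series` lets us attach our own labels and then look values up by label." ] }, { "cell_type": "code", "execution_count": null, "id": "5e2b7c41", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "# Default index: 0, 1, 2, ... (the same as list positions)\n", "tickers = pd.Series(['AAPL', 'MSFT', 'TSLA'])\n", "print(tickers.index.tolist())  # [0, 1, 2]\n", "\n", "# Custom labels through index= (hypothetical portfolio weights)\n", "weights = pd.Series([0.5, 0.3, 0.2], index=['AAPL', 'MSFT', 'TSLA'])\n", "print(weights['MSFT'])   # lookup by label: 0.3\n", "print(weights.iloc[0])   # lookup by position with .iloc: 0.5" ] }, { "cell_type": "markdown", "id": "9a4d0f6b", "metadata": {}, "source": [ "Keeping label lookups (`weights['MSFT']`) and position lookups (`weights.iloc[0]`) separate avoids confusion once the index is no longer made of integers."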
] }, { "cell_type": "code", "execution_count": 2, "id": "a9bfcdbb", "metadata": {}, "outputs": [], "source": [ "stocks = [\"AAPL\", \"BABA\", \"DIDI\", \"MSFT\", \"AMZN\", \"ADBE\", \"TSLA\", \"MS\", \"V\", \"MA\", \"GS\"]" ] }, { "cell_type": "code", "execution_count": 3, "id": "c9026b35", "metadata": {}, "outputs": [], "source": [ "stocks_series = pd.Series(stocks)" ] }, { "cell_type": "code", "execution_count": 4, "id": "66a0bbd3", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 AAPL\n", "1 BABA\n", "2 DIDI\n", "3 MSFT\n", "4 AMZN\n", "5 ADBE\n", "6 TSLA\n", "7 MS\n", "8 V\n", "9 MA\n", "10 GS\n", "dtype: object" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stocks_series" ] }, { "cell_type": "markdown", "id": "22f5a6b3", "metadata": {}, "source": [ "Getting values by index. Slices work like list slices: `stocks_series[1:3]` includes position 1 and stops before position 3." ] }, { "cell_type": "code", "execution_count": 7, "id": "db7b2e69", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1 BABA\n", "2 DIDI\n", "dtype: object" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stocks_series[1:3]" ] }, { "cell_type": "code", "execution_count": 10, "id": "31dd2927", "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "'AAPL'" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stocks_series[0]" ] }, { "cell_type": "code", "execution_count": 13, "id": "935307a6", "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "2 DIDI\n", "3 MSFT\n", "4 AMZN\n", "5 ADBE\n", "dtype: object" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stocks_series[2:6]" ] }, { "cell_type": "markdown", "id": "88c61e98", "metadata": {}, "source": [ "Unlike a list, a Series does not have to use integers as its index: we can label each value with a string instead, which makes it look more like a dictionary. In fact, we can create a Series directly from a dictionary, and the dictionary keys become the index." ] }, { "cell_type": "code", "execution_count": 8, "id": "cbaa8383", "metadata": {}, "outputs": [], "source": [ "sales = {'Central Branch' : 10000,\n", " 'TST Branch' : 2000,\n", " 'Mongkok Branch' : 3000}" ] }, { "cell_type": "code", "execution_count": null, "id": "59d345f2", "metadata": { "scrolled": true }, "outputs": [], "source": [ "sales_series = pd.Series(sales)" ] }, { "cell_type": "code", "execution_count": 10, "id": "64b9ea46", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Central Branch 10000\n", "TST Branch 2000\n", "Mongkok Branch 3000\n", "dtype: int64" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sales_series" ] }, { "cell_type": "markdown", "id": "b4e8c74c", "metadata": {}, "source": [ "Getting a value by its label" ] }, { "cell_type": "code", "execution_count": 11, "id": "7e79920a", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "10000" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sales_series[\"Central Branch\"]" ] }, { "cell_type": "markdown", "id": "388d2ece", "metadata": {}, "source": [ "## 3. Pandas DataFrame" ] }, { "cell_type": "markdown", "id": "c756f438", "metadata": {}, "source": [ "You can think of a Series as one column of data in an Excel spreadsheet. A DataFrame holds multiple Series that share the same index, so you can think of it as the data of a whole spreadsheet." ] }, { "cell_type": "markdown", "id": "12aa8a6f", "metadata": {}, "source": [ "### 3.1 Create a DataFrame from a CSV file\n", "\n", "The cells below assume that `AAPL.csv`, a file of daily prices with a `Date` column, is in the same folder as the notebook. Passing `index_col='Date'` uses the `Date` column as the row index, and `parse_dates=True` converts that index to dates, which lets us select rows by date later on." ] }, { "cell_type": "code", "execution_count": null, "id": "0a88f1c9", "metadata": {}, "outputs": [], "source": [ "aapl = pd.read_csv(\"AAPL.csv\")" ] }, { "cell_type": "code", "execution_count": null, "id": "20064351", "metadata": {}, "outputs": [], "source": [ "aapl" ] }, { "cell_type": "code", "execution_count": 14, "id": "a6ae2efc", "metadata": {}, "outputs": [], "source": [ "aapl_proper_index = pd.read_csv(\"AAPL.csv\", parse_dates=True, index_col='Date')" ] }, { "cell_type": "code", "execution_count": 15, "id": "92625a87", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " Open High Low Close Adj Close \\\n", "Date \n", "2019-10-28 61.855000 62.312500 61.680000 62.262501 61.650810 \n", "2019-10-29 62.242500 62.437500 60.642502 60.822498 60.224953 \n", "2019-10-30 61.189999 61.325001 60.302502 60.814999 60.217525 \n", "2019-10-31 61.810001 62.292500 59.314999 62.189999 61.579021 \n", "2019-11-01 62.384998 63.982498 62.290001 63.955002 63.326683 \n", "... ... ... ... ... ... \n", "2020-10-21 116.669998 118.709999 116.449997 116.870003 116.870003 \n", "2020-10-22 117.449997 118.040001 114.589996 115.750000 115.750000 \n", "2020-10-23 116.389999 116.550003 114.279999 115.040001 115.040001 \n", "2020-10-26 114.010002 116.550003 112.879997 115.050003 115.050003 \n", "2020-10-27 115.489998 117.279999 114.540001 116.599998 116.599998 \n", "\n", " Volume \n", "Date \n", "2019-10-28 96572800 \n", "2019-10-29 142839600 \n", "2019-10-30 124522000 \n", "2019-10-31 139162000 \n", "2019-11-01 151125200 \n", "... ... \n", "2020-10-21 89946000 \n", "2020-10-22 101988000 \n", "2020-10-23 82572600 \n", "2020-10-26 111850700 \n", "2020-10-27 91927700 \n", "\n", "[253 rows x 6 columns]" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "aapl_proper_index" ] }, { "cell_type": "markdown", "id": "7a3a1988", "metadata": {}, "source": [ "### 3.2 From Quandl\n", "\n", "`quandl.get` downloads a dataset and returns it as a DataFrame; for this stock the rows are indexed by date. The API key is read from an environment variable (here named `QUANDL_API_KEY`, a name chosen for this notebook) so that the key is not stored in the notebook itself." ] }, { "cell_type": "code", "execution_count": 18, "id": "da0ee6dc", "metadata": { "scrolled": false }, "outputs": [], "source": [ "import quandl" ] }, { "cell_type": "code", "execution_count": 19, "id": "3d0ac03e", "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "# Assumed setup: set the QUANDL_API_KEY environment variable before starting Jupyter\n", "quandl.ApiConfig.api_key = os.environ['QUANDL_API_KEY']\n", "ck = quandl.get('HKEX/00001', start_date='2020-10-20', end_date='2021-10-20')" ] }, { "cell_type": "code", "execution_count": 20, "id": "ebda1d4c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " Nominal Price Net Change Change (%) Bid Ask P/E(x) High \\\n", "Date \n", "2020-10-20 46.05 None None 46.05 46.10 None 46.35 \n", "2020-10-21 46.15 None None 46.15 46.20 None 46.50 \n", "2020-10-22 46.10 None None 46.10 46.15 None 46.35 \n", "2020-10-23 46.40 None None 46.40 46.45 None 46.50 \n", "2020-10-27 46.75 None None 46.75 46.80 None 47.15 \n", "... ... ... ... ... ... ... ... \n", "2021-10-12 52.60 None None 52.60 52.65 None 53.10 \n", "2021-10-15 52.50 None None 52.50 52.55 None 52.95 \n", "2021-10-18 52.70 None None 52.70 52.75 None 53.00 \n", "2021-10-19 53.30 None None 53.25 53.30 None 53.50 \n", "2021-10-20 53.25 None None 53.20 53.25 None 53.40 \n", "\n", " Low Previous Close Share Volume (000) Turnover (000) Lot Size \n", "Date \n", "2020-10-20 45.85 46.35 4193.0 192921.0 None \n", "2020-10-21 45.95 46.05 4830.0 223077.0 None \n", "2020-10-22 45.90 46.15 4902.0 226000.0 None \n", "2020-10-23 45.80 46.10 3815.0 176451.0 None \n", "2020-10-27 46.50 46.40 12095.0 566845.0 None \n", "... ... ... ... ... ... 
\n", "2021-10-12 52.35 52.95 2712.0 142802.0 None \n", "2021-10-15 52.00 52.60 5067.0 266129.0 None \n", "2021-10-18 52.30 52.50 4036.0 212393.0 None \n", "2021-10-19 52.95 52.70 2484.0 132423.0 None \n", "2021-10-20 52.80 53.30 3649.0 193972.0 None \n", "\n", "[247 rows x 12 columns]" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ck" ] }, { "cell_type": "markdown", "id": "db245a3b", "metadata": {}, "source": [ "### 3.3 From Series\n", "\n", "`pd.DataFrame` also accepts a dictionary whose values are Series or dictionaries. Each outer key becomes a column name, and the inner keys or Series index (here, the branch names) become the row index." ] }, { "cell_type": "code", "execution_count": 21, "id": "c389d843", "metadata": {}, "outputs": [], "source": [ "costs = {'Central Branch' : 300000,\n", " 'TST Branch' : 50000,\n", " 'Mongkok Branch' : 20000}" ] }, { "cell_type": "code", "execution_count": null, "id": "b6936c74", "metadata": {}, "outputs": [], "source": [ "branch_summary = pd.DataFrame({\"sales\": sales, \"costs\": costs})" ] }, { "cell_type": "code", "execution_count": 21, "id": "9107a2e3", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " sales costs\n", "Central Branch 10000 300000\n", "TST Branch 2000 50000\n", "Mongkok Branch 3000 20000" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "branch_summary" ] }, { "cell_type": "markdown", "id": "5bab7a42", "metadata": {}, "source": [ "### 3.4 Getting rows from a DataFrame by date\n", "\n", "Because `aapl_proper_index` is indexed by date, `.loc` can select rows by date. A single date returns that day's row, and a slice such as `\"2019-10-30\":\"2019-11-15\"` returns every row in that range, including both end dates." ] }, { "cell_type": "code", "execution_count": null, "id": "6a84288d", "metadata": {}, "outputs": [], "source": [ "aapl_proper_index.loc[\"2019-10-30\"]" ] }, { "cell_type": "code", "execution_count": 23, "id": "60c748c1", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " Open High Low Close Adj Close Volume\n", "Date \n", "2019-10-30 61.189999 61.325001 60.302502 60.814999 60.217525 124522000\n", "2019-10-31 61.810001 62.292500 59.314999 62.189999 61.579021 139162000\n", "2019-11-01 62.384998 63.982498 62.290001 63.955002 63.326683 151125200\n", "2019-11-04 64.332497 64.462502 63.845001 64.375000 63.742554 103272000\n", "2019-11-05 64.262497 64.547501 64.080002 64.282501 63.650970 79897600\n", "2019-11-06 64.192497 64.372498 63.842499 64.309998 63.678192 75864400\n", "2019-11-07 64.684998 65.087502 64.527496 64.857498 64.413116 94940400\n", "2019-11-08 64.672501 65.110001 64.212502 65.035004 64.589409 69986400\n", "2019-11-11 64.574997 65.617500 64.570000 65.550003 65.100876 81821200\n", "2019-11-12 65.387497 65.697502 65.230003 65.489998 65.041283 87388800\n", "2019-11-13 65.282501 66.195000 65.267502 66.117500 65.664490 102734400\n", "2019-11-14 65.937500 66.220001 65.525002 65.660004 65.210121 89182800\n", "2019-11-15 65.919998 66.445000 65.752502 66.440002 65.984779 100206400" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "aapl_proper_index.loc[\"2019-10-30\":\"2019-11-15\"]" ] }, { "cell_type": "code", "execution_count": 24, "id": "b0c747d8", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Open High Low Close Adj Close Volume\n", "Date \n", "2019-11-01 62.384998 63.982498 62.290001 63.955002 63.326683 151125200\n", "2019-11-04 64.332497 64.462502 63.845001 64.375000 63.742554 103272000\n", "2019-11-05 64.262497 64.547501 64.080002 64.282501 63.650970 79897600\n", "2019-11-06 64.192497 64.372498 63.842499 64.309998 63.678192 75864400\n", "2019-11-07 64.684998 65.087502 64.527496 64.857498 64.413116 94940400\n", "2019-11-08 64.672501 65.110001 64.212502 65.035004 64.589409 69986400\n", "2019-11-11 64.574997 65.617500 64.570000 65.550003 65.100876 81821200\n", "2019-11-12 65.387497 65.697502 65.230003 65.489998 65.041283 87388800\n", "2019-11-13 65.282501 66.195000 65.267502 66.117500 65.664490 102734400\n", "2019-11-14 65.937500 66.220001 65.525002 65.660004 65.210121 89182800\n", "2019-11-15 65.919998 66.445000 65.752502 66.440002 65.984779 100206400\n", "2019-11-18 66.449997 66.857498 66.057503 66.775002 66.317490 86703200\n", "2019-11-19 66.974998 67.000000 66.347504 66.572502 66.116371 76167200\n", "2019-11-20 66.385002 66.519997 65.099998 65.797501 65.346687 106234400\n", "2019-11-21 65.922501 66.002502 65.294998 65.502502 65.053703 121395200\n", "2019-11-22 65.647499 65.794998 65.209999 65.445000 64.996597 65325200\n", "2019-11-25 65.677498 66.610001 65.629997 66.592499 66.136230 84020400\n", "2019-11-26 66.735001 66.790001 65.625000 66.072502 65.619789 105207600\n", "2019-11-27 66.394997 66.995003 66.327499 66.959999 66.501213 65235600\n", "2019-11-29 66.650002 67.000000 66.474998 66.812500 66.354729 46617600" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "aapl_proper_index.loc[\"2019-11\"]" ] }, { "cell_type": "markdown", "id": "de65e36f", "metadata": {}, "source": [ "### 3.5 Getting data from dataframe (get a series)" ] }, { "cell_type": "code", "execution_count": 25, "id": "ff87318b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Date\n", "2019-11-01 
63.955002\n", "2019-11-04 64.375000\n", "2019-11-05 64.282501\n", "2019-11-06 64.309998\n", "2019-11-07 64.857498\n", "2019-11-08 65.035004\n", "2019-11-11 65.550003\n", "2019-11-12 65.489998\n", "2019-11-13 66.117500\n", "2019-11-14 65.660004\n", "2019-11-15 66.440002\n", "2019-11-18 66.775002\n", "2019-11-19 66.572502\n", "2019-11-20 65.797501\n", "2019-11-21 65.502502\n", "2019-11-22 65.445000\n", "2019-11-25 66.592499\n", "2019-11-26 66.072502\n", "2019-11-27 66.959999\n", "2019-11-29 66.812500\n", "Name: Close, dtype: float64" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "aapl_proper_index.loc[\"2019-11\"][\"Close\"]" ] }, { "cell_type": "markdown", "id": "1b48da3d", "metadata": {}, "source": [ "### 3.6 Getting data from dataframe (get multiple columns)" ] }, { "cell_type": "code", "execution_count": 26, "id": "a39c7d23", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Open Close\n", "Date \n", "2019-11-01 62.384998 63.955002\n", "2019-11-04 64.332497 64.375000\n", "2019-11-05 64.262497 64.282501\n", "2019-11-06 64.192497 64.309998\n", "2019-11-07 64.684998 64.857498\n", "2019-11-08 64.672501 65.035004\n", "2019-11-11 64.574997 65.550003\n", "2019-11-12 65.387497 65.489998\n", "2019-11-13 65.282501 66.117500\n", "2019-11-14 65.937500 65.660004\n", "2019-11-15 65.919998 66.440002\n", "2019-11-18 66.449997 66.775002\n", "2019-11-19 66.974998 66.572502\n", "2019-11-20 66.385002 65.797501\n", "2019-11-21 65.922501 65.502502\n", "2019-11-22 65.647499 65.445000\n", "2019-11-25 65.677498 66.592499\n", "2019-11-26 66.735001 66.072502\n", "2019-11-27 66.394997 66.959999\n", "2019-11-29 66.650002 66.812500" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "aapl_proper_index.loc[\"2019-11\"][[\"Open\",\"Close\"]]" ] }, { "cell_type": "markdown", "id": "7fe98d18", "metadata": {}, "source": [ "### 3.7 Getting data from dataframe (with a label index that's not a date/integer)" ] }, { "cell_type": "code", "execution_count": 27, "id": "a3e09fd0", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "sales 10000\n", "costs 300000\n", "Name: Central Branch, dtype: int64" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "branch_summary.loc[\"Central Branch\"]" ] }, { "cell_type": "code", "execution_count": 28, "id": "114ba40f", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Central Branch 10000\n", "TST Branch 2000\n", "Mongkok Branch 3000\n", "Name: sales, dtype: int64" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "branch_summary[\"sales\"]" ] }, { "cell_type": "markdown", "id": "15222ed8", "metadata": {}, "source": [ "### 3.8 Getting data from dataframe (using the implicit positional index)" ] }, { "cell_type": "code", "execution_count": 29, "id": "72c1e9ba", "metadata": {}, "outputs": [ { "data": { 
"text/plain": [ "Index(['Central Branch', 'TST Branch', 'Mongkok Branch'], dtype='object')" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "branch_summary.index" ] }, { "cell_type": "code", "execution_count": 30, "id": "e34b9495", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DatetimeIndex(['2019-10-28', '2019-10-29', '2019-10-30', '2019-10-31',\n", " '2019-11-01', '2019-11-04', '2019-11-05', '2019-11-06',\n", " '2019-11-07', '2019-11-08',\n", " ...\n", " '2020-10-14', '2020-10-15', '2020-10-16', '2020-10-19',\n", " '2020-10-20', '2020-10-21', '2020-10-22', '2020-10-23',\n", " '2020-10-26', '2020-10-27'],\n", " dtype='datetime64[ns]', name='Date', length=253, freq=None)" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "aapl_proper_index.index" ] }, { "cell_type": "code", "execution_count": 31, "id": "a32d711d", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "RangeIndex(start=0, stop=253, step=1)" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "aapl.index" ] }, { "cell_type": "code", "execution_count": 32, "id": "0836e844", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Open High Low Close Adj Close Volume\n", "Date \n", "2019-10-28 61.855000 62.312500 61.680000 62.262501 61.650810 96572800\n", "2019-10-29 62.242500 62.437500 60.642502 60.822498 60.224953 142839600\n", "2019-10-30 61.189999 61.325001 60.302502 60.814999 60.217525 124522000\n", "2019-10-31 61.810001 62.292500 59.314999 62.189999 61.579021 139162000\n", "2019-11-01 62.384998 63.982498 62.290001 63.955002 63.326683 151125200\n", "2019-11-04 64.332497 64.462502 63.845001 64.375000 63.742554 103272000\n", "2019-11-05 64.262497 64.547501 64.080002 64.282501 63.650970 79897600\n", "2019-11-06 64.192497 64.372498 63.842499 64.309998 63.678192 75864400\n", "2019-11-07 64.684998 65.087502 64.527496 64.857498 64.413116 94940400\n", "2019-11-08 64.672501 65.110001 64.212502 65.035004 64.589409 69986400" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "aapl_proper_index.iloc[0:10]" ] }, { "cell_type": "markdown", "id": "d7a78ecf", "metadata": {}, "source": [ "## 4. Filtering" ] }, { "cell_type": "markdown", "id": "7706e9dd", "metadata": {}, "source": [ "### 4.1 Single condition" ] }, { "cell_type": "code", "execution_count": 33, "id": "ba29f518", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Date\n", "2019-10-28 False\n", "2019-10-29 False\n", "2019-10-30 False\n", "2019-10-31 False\n", "2019-11-01 False\n", " ... \n", "2020-10-21 True\n", "2020-10-22 True\n", "2020-10-23 True\n", "2020-10-26 True\n", "2020-10-27 True\n", "Name: Open, Length: 253, dtype: bool" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "aapl_proper_index[\"Open\"] > 100" ] }, { "cell_type": "code", "execution_count": 34, "id": "22ac3ad1", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Open High Low Close Adj Close \\\n", "Date \n", "2020-07-31 102.885002 106.415001 100.824997 106.260002 106.068756 \n", "2020-08-03 108.199997 111.637497 107.892502 108.937500 108.741440 \n", "2020-08-04 109.132500 110.790001 108.387497 109.665001 109.467628 \n", "2020-08-05 109.377502 110.392502 108.897499 110.062500 109.864410 \n", "2020-08-06 110.404999 114.412498 109.797501 113.902496 113.697502 \n", "... ... ... ... ... ... \n", "2020-10-21 116.669998 118.709999 116.449997 116.870003 116.870003 \n", "2020-10-22 117.449997 118.040001 114.589996 115.750000 115.750000 \n", "2020-10-23 116.389999 116.550003 114.279999 115.040001 115.040001 \n", "2020-10-26 114.010002 116.550003 112.879997 115.050003 115.050003 \n", "2020-10-27 115.489998 117.279999 114.540001 116.599998 116.599998 \n", "\n", " Volume \n", "Date \n", "2020-07-31 374336800 \n", "2020-08-03 308151200 \n", "2020-08-04 173071600 \n", "2020-08-05 121992000 \n", "2020-08-06 202428800 \n", "... ... \n", "2020-10-21 89946000 \n", "2020-10-22 101988000 \n", "2020-10-23 82572600 \n", "2020-10-26 111850700 \n", "2020-10-27 91927700 \n", "\n", "[62 rows x 6 columns]" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "aapl_proper_index[aapl_proper_index[\"Open\"] > 100]" ] }, { "cell_type": "markdown", "id": "d0ff9063", "metadata": {}, "source": [ "### 4.2 Multiple conditions" ] }, { "cell_type": "code", "execution_count": 35, "id": "1c5574e2", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Date\n", "2019-10-28 False\n", "2019-10-29 False\n", "2019-10-30 False\n", "2019-10-31 False\n", "2019-11-01 False\n", " ... 
\n", "2020-10-21 False\n", "2020-10-22 True\n", "2020-10-23 False\n", "2020-10-26 True\n", "2020-10-27 False\n", "Length: 253, dtype: bool" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(aapl_proper_index[\"Open\"] > 100) & (aapl_proper_index[\"Volume\"] > 100000000)" ] }, { "cell_type": "code", "execution_count": 36, "id": "6ab72ebf", "metadata": {}, "outputs": [], "source": [ "cond = (aapl_proper_index[\"Open\"] > 100) & (aapl_proper_index[\"Volume\"] > 100000000)" ] }, { "cell_type": "code", "execution_count": 37, "id": "cfe5d612", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Open High Low Close Adj Close \\\n", "Date \n", "2020-07-31 102.885002 106.415001 100.824997 106.260002 106.068756 \n", "2020-08-03 108.199997 111.637497 107.892502 108.937500 108.741440 \n", "2020-08-04 109.132500 110.790001 108.387497 109.665001 109.467628 \n", "2020-08-05 109.377502 110.392502 108.897499 110.062500 109.864410 \n", "2020-08-06 110.404999 114.412498 109.797501 113.902496 113.697502 \n", "2020-08-07 113.205002 113.675003 110.292503 111.112503 111.112503 \n", "2020-08-10 112.599998 113.775002 110.000000 112.727501 112.727501 \n", "2020-08-11 111.970001 112.482498 109.107498 109.375000 109.375000 \n", "2020-08-12 110.497498 113.275002 110.297501 113.010002 113.010002 \n", "2020-08-13 114.430000 116.042503 113.927498 115.010002 115.010002 \n", "2020-08-14 114.830002 115.000000 113.044998 114.907501 114.907501 \n", "2020-08-17 116.062500 116.087502 113.962502 114.607498 114.607498 \n", "2020-08-18 114.352501 116.000000 114.007500 115.562500 115.562500 \n", "2020-08-19 115.982498 117.162498 115.610001 115.707497 115.707497 \n", "2020-08-20 115.750000 118.392502 115.732498 118.275002 118.275002 \n", "2020-08-21 119.262497 124.867500 119.250000 124.370003 124.370003 \n", "2020-08-24 128.697495 128.785004 123.937500 125.857498 125.857498 \n", "2020-08-25 124.697502 125.180000 123.052498 124.824997 124.824997 \n", "2020-08-26 126.180000 126.992500 125.082497 126.522499 126.522499 \n", "2020-08-27 127.142502 127.485001 123.832497 125.010002 125.010002 \n", "2020-08-28 126.012497 126.442497 124.577499 124.807503 124.807503 \n", "2020-08-31 127.580002 131.000000 126.000000 129.039993 129.039993 \n", "2020-09-01 132.759995 134.800003 130.529999 134.179993 134.179993 \n", "2020-09-02 137.589996 137.979996 127.000000 131.399994 131.399994 \n", "2020-09-03 126.910004 128.839996 120.500000 120.879997 120.879997 \n", "2020-09-04 120.070000 123.699997 110.889999 120.959999 120.959999 \n", "2020-09-08 113.949997 118.989998 112.680000 112.820000 
112.820000 \n", "2020-09-09 117.260002 119.139999 115.260002 117.320000 117.320000 \n", "2020-09-10 120.360001 120.500000 112.500000 113.489998 113.489998 \n", "2020-09-11 114.570000 115.230003 110.000000 112.000000 112.000000 \n", "2020-09-14 114.720001 115.930000 112.800003 115.360001 115.360001 \n", "2020-09-15 118.330002 118.830002 113.610001 115.540001 115.540001 \n", "2020-09-16 115.230003 116.000000 112.040001 112.129997 112.129997 \n", "2020-09-17 109.720001 112.199997 108.709999 110.339996 110.339996 \n", "2020-09-18 110.400002 110.879997 106.089996 106.839996 106.839996 \n", "2020-09-21 104.540001 110.190002 103.099998 110.080002 110.080002 \n", "2020-09-22 112.680000 112.860001 109.160004 111.809998 111.809998 \n", "2020-09-23 111.620003 112.110001 106.769997 107.120003 107.120003 \n", "2020-09-24 105.169998 110.250000 105.000000 108.220001 108.220001 \n", "2020-09-25 108.430000 112.440002 107.669998 112.279999 112.279999 \n", "2020-09-28 115.010002 115.320000 112.779999 114.959999 114.959999 \n", "2020-09-30 113.790001 117.260002 113.620003 115.809998 115.809998 \n", "2020-10-01 117.639999 117.720001 115.830002 116.790001 116.790001 \n", "2020-10-02 112.889999 115.370003 112.220001 113.019997 113.019997 \n", "2020-10-05 113.910004 116.650002 113.550003 116.500000 116.500000 \n", "2020-10-06 115.699997 116.120003 112.250000 113.160004 113.160004 \n", "2020-10-09 115.279999 117.000000 114.919998 116.970001 116.970001 \n", "2020-10-12 120.059998 125.180000 119.279999 124.400002 124.400002 \n", "2020-10-13 125.269997 125.389999 119.650002 121.099998 121.099998 \n", "2020-10-14 121.000000 123.029999 119.620003 121.190002 121.190002 \n", "2020-10-15 118.720001 121.199997 118.150002 120.709999 120.709999 \n", "2020-10-16 121.279999 121.550003 118.809998 119.019997 119.019997 \n", "2020-10-19 119.959999 120.419998 115.660004 115.980003 115.980003 \n", "2020-10-20 116.199997 118.980003 115.629997 117.510002 117.510002 \n", "2020-10-22 117.449997 118.040001 
114.589996 115.750000 115.750000 \n", "2020-10-26 114.010002 116.550003 112.879997 115.050003 115.050003 \n", "\n", " Volume \n", "Date \n", "2020-07-31 374336800 \n", "2020-08-03 308151200 \n", "2020-08-04 173071600 \n", "2020-08-05 121992000 \n", "2020-08-06 202428800 \n", "2020-08-07 198045600 \n", "2020-08-10 212403600 \n", "2020-08-11 187902400 \n", "2020-08-12 165944800 \n", "2020-08-13 210082000 \n", "2020-08-14 165565200 \n", "2020-08-17 119561600 \n", "2020-08-18 105633600 \n", "2020-08-19 145538000 \n", "2020-08-20 126907200 \n", "2020-08-21 338054800 \n", "2020-08-24 345937600 \n", "2020-08-25 211495600 \n", "2020-08-26 163022400 \n", "2020-08-27 155552400 \n", "2020-08-28 187630000 \n", "2020-08-31 225702700 \n", "2020-09-01 152470100 \n", "2020-09-02 200119000 \n", "2020-09-03 257599600 \n", "2020-09-04 332607200 \n", "2020-09-08 231366600 \n", "2020-09-09 176940500 \n", "2020-09-10 182274400 \n", "2020-09-11 180860300 \n", "2020-09-14 140150100 \n", "2020-09-15 184642000 \n", "2020-09-16 154679000 \n", "2020-09-17 178011000 \n", "2020-09-18 287104900 \n", "2020-09-21 195713800 \n", "2020-09-22 183055400 \n", "2020-09-23 150718700 \n", "2020-09-24 167743300 \n", "2020-09-25 149981400 \n", "2020-09-28 137672400 \n", "2020-09-30 142675200 \n", "2020-10-01 116120400 \n", "2020-10-02 144712000 \n", "2020-10-05 106243800 \n", "2020-10-06 161498200 \n", "2020-10-09 100506900 \n", "2020-10-12 240226800 \n", "2020-10-13 262330500 \n", "2020-10-14 151062300 \n", "2020-10-15 112559200 \n", "2020-10-16 115393800 \n", "2020-10-19 120639300 \n", "2020-10-20 124423700 \n", "2020-10-22 101988000 \n", "2020-10-26 111850700 " ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "aapl_proper_index[cond]" ] }, { "cell_type": "markdown", "id": "649b6f86", "metadata": {}, "source": [ "#### Side notes" ] }, { "cell_type": "markdown", "id": "599252b7", "metadata": {}, "source": [ "Showing the top rows of a dataframe" ] }, { "cell_type": "code", 
"execution_count": 38, "id": "3b0d0293", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Open High Low Close Adj Close \\\n", "Date \n", "2020-07-31 102.885002 106.415001 100.824997 106.260002 106.068756 \n", "2020-08-03 108.199997 111.637497 107.892502 108.937500 108.741440 \n", "2020-08-04 109.132500 110.790001 108.387497 109.665001 109.467628 \n", "2020-08-05 109.377502 110.392502 108.897499 110.062500 109.864410 \n", "2020-08-06 110.404999 114.412498 109.797501 113.902496 113.697502 \n", "2020-08-07 113.205002 113.675003 110.292503 111.112503 111.112503 \n", "2020-08-10 112.599998 113.775002 110.000000 112.727501 112.727501 \n", "2020-08-11 111.970001 112.482498 109.107498 109.375000 109.375000 \n", "2020-08-12 110.497498 113.275002 110.297501 113.010002 113.010002 \n", "2020-08-13 114.430000 116.042503 113.927498 115.010002 115.010002 \n", "2020-08-14 114.830002 115.000000 113.044998 114.907501 114.907501 \n", "2020-08-17 116.062500 116.087502 113.962502 114.607498 114.607498 \n", "2020-08-18 114.352501 116.000000 114.007500 115.562500 115.562500 \n", "2020-08-19 115.982498 117.162498 115.610001 115.707497 115.707497 \n", "2020-08-20 115.750000 118.392502 115.732498 118.275002 118.275002 \n", "2020-08-21 119.262497 124.867500 119.250000 124.370003 124.370003 \n", "2020-08-24 128.697495 128.785004 123.937500 125.857498 125.857498 \n", "2020-08-25 124.697502 125.180000 123.052498 124.824997 124.824997 \n", "2020-08-26 126.180000 126.992500 125.082497 126.522499 126.522499 \n", "2020-08-27 127.142502 127.485001 123.832497 125.010002 125.010002 \n", "\n", " Volume \n", "Date \n", "2020-07-31 374336800 \n", "2020-08-03 308151200 \n", "2020-08-04 173071600 \n", "2020-08-05 121992000 \n", "2020-08-06 202428800 \n", "2020-08-07 198045600 \n", "2020-08-10 212403600 \n", "2020-08-11 187902400 \n", "2020-08-12 165944800 \n", "2020-08-13 210082000 \n", "2020-08-14 165565200 \n", "2020-08-17 119561600 \n", "2020-08-18 105633600 \n", "2020-08-19 145538000 \n", "2020-08-20 126907200 \n", "2020-08-21 338054800 \n", "2020-08-24 345937600 
\n", "2020-08-25 211495600 \n", "2020-08-26 163022400 \n", "2020-08-27 155552400 " ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "aapl_proper_index[cond].head(20)" ] }, { "cell_type": "code", "execution_count": 39, "id": "d4bd5fc0", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Open High Low Close Adj Close \\\n", "Date \n", "2020-10-16 121.279999 121.550003 118.809998 119.019997 119.019997 \n", "2020-10-19 119.959999 120.419998 115.660004 115.980003 115.980003 \n", "2020-10-20 116.199997 118.980003 115.629997 117.510002 117.510002 \n", "2020-10-22 117.449997 118.040001 114.589996 115.750000 115.750000 \n", "2020-10-26 114.010002 116.550003 112.879997 115.050003 115.050003 \n", "\n", " Volume \n", "Date \n", "2020-10-16 115393800 \n", "2020-10-19 120639300 \n", "2020-10-20 124423700 \n", "2020-10-22 101988000 \n", "2020-10-26 111850700 " ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "aapl_proper_index[cond].tail(5)" ] }, { "cell_type": "markdown", "id": "e7fcccd2", "metadata": {}, "source": [ "## 4.3 query" ] }, { "cell_type": "code", "execution_count": 40, "id": "f866b899", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Open High Low Close Adj Close \\\n", "Date \n", "2020-07-31 102.885002 106.415001 100.824997 106.260002 106.068756 \n", "2020-08-03 108.199997 111.637497 107.892502 108.937500 108.741440 \n", "2020-08-04 109.132500 110.790001 108.387497 109.665001 109.467628 \n", "2020-08-05 109.377502 110.392502 108.897499 110.062500 109.864410 \n", "2020-08-06 110.404999 114.412498 109.797501 113.902496 113.697502 \n", "2020-08-07 113.205002 113.675003 110.292503 111.112503 111.112503 \n", "2020-08-10 112.599998 113.775002 110.000000 112.727501 112.727501 \n", "2020-08-11 111.970001 112.482498 109.107498 109.375000 109.375000 \n", "2020-08-12 110.497498 113.275002 110.297501 113.010002 113.010002 \n", "2020-08-13 114.430000 116.042503 113.927498 115.010002 115.010002 \n", "\n", " Volume \n", "Date \n", "2020-07-31 374336800 \n", "2020-08-03 308151200 \n", "2020-08-04 173071600 \n", "2020-08-05 121992000 \n", "2020-08-06 202428800 \n", "2020-08-07 198045600 \n", "2020-08-10 212403600 \n", "2020-08-11 187902400 \n", "2020-08-12 165944800 \n", "2020-08-13 210082000 " ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "aapl_proper_index[cond].query(\"Open > 100 and Volume > 110000000\").head(10)" ] }, { "cell_type": "markdown", "id": "932bccc7", "metadata": {}, "source": [ "# 5. New columns" ] }, { "cell_type": "markdown", "id": "023d022d", "metadata": {}, "source": [ "## 5.1 Density Example" ] }, { "cell_type": "code", "execution_count": 41, "id": "83ebf03c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " population area\n", "California 38332521 423967\n", "Texas 26448193 695662\n", "New York 19651127 141297\n", "Florida 19552860 170312\n", "Illinois 12882135 149995" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\n", "population = pd.Series({'California': 38332521,\n", " 'Texas': 26448193,\n", " 'New York': 19651127,\n", " 'Florida': 19552860,\n", " 'Illinois': 12882135}\n", ")\n", "\n", "area = pd.Series({'California': 423967, \n", " 'Texas': 695662, \n", " 'New York': 141297,\n", " 'Florida': 170312, \n", " 'Illinois': 149995})\n", "\n", "states = pd.DataFrame( {'population': population,'area': area} )\n", "states" ] }, { "cell_type": "code", "execution_count": 42, "id": "278f4672", "metadata": {}, "outputs": [], "source": [ "states[\"density\"] = states[\"population\"] / states[\"area\"]" ] }, { "cell_type": "code", "execution_count": 43, "id": "005b3ac5", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " population area density\n", "California 38332521 423967 90.413926\n", "Texas 26448193 695662 38.018740\n", "New York 19651127 141297 139.076746\n", "Florida 19552860 170312 114.806121\n", "Illinois 12882135 149995 85.883763" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "states" ] }, { "cell_type": "markdown", "id": "8526db19", "metadata": {}, "source": [ "## 5.2 Stocks example" ] }, { "cell_type": "code", "execution_count": 44, "id": "59008d49", "metadata": {}, "outputs": [], "source": [ "aapl_proper_index[\"Percent Changes\"] = aapl_proper_index[\"Close\"].pct_change()" ] }, { "cell_type": "code", "execution_count": 45, "id": "3115a2c6", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Open High Low Close Adj Close \\\n", "Date \n", "2019-10-28 61.855000 62.312500 61.680000 62.262501 61.650810 \n", "2019-10-29 62.242500 62.437500 60.642502 60.822498 60.224953 \n", "2019-10-30 61.189999 61.325001 60.302502 60.814999 60.217525 \n", "2019-10-31 61.810001 62.292500 59.314999 62.189999 61.579021 \n", "2019-11-01 62.384998 63.982498 62.290001 63.955002 63.326683 \n", "... ... ... ... ... ... \n", "2020-10-21 116.669998 118.709999 116.449997 116.870003 116.870003 \n", "2020-10-22 117.449997 118.040001 114.589996 115.750000 115.750000 \n", "2020-10-23 116.389999 116.550003 114.279999 115.040001 115.040001 \n", "2020-10-26 114.010002 116.550003 112.879997 115.050003 115.050003 \n", "2020-10-27 115.489998 117.279999 114.540001 116.599998 116.599998 \n", "\n", " Volume Percent Changes \n", "Date \n", "2019-10-28 96572800 NaN \n", "2019-10-29 142839600 -0.023128 \n", "2019-10-30 124522000 -0.000123 \n", "2019-10-31 139162000 0.022610 \n", "2019-11-01 151125200 0.028381 \n", "... ... ... \n", "2020-10-21 89946000 -0.005446 \n", "2020-10-22 101988000 -0.009583 \n", "2020-10-23 82572600 -0.006134 \n", "2020-10-26 111850700 0.000087 \n", "2020-10-27 91927700 0.013472 \n", "\n", "[253 rows x 7 columns]" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "aapl_proper_index" ] }, { "cell_type": "markdown", "id": "3279df01", "metadata": {}, "source": [ "# 6. Aggregation" ] },
{ "cell_type": "markdown", "id": "9dec0caa", "metadata": {}, "source": [ "## 6.1 Basic operations" ] }, { "cell_type": "code", "execution_count": 55, "id": "876e6b85", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.0028956964634767705" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "aapl_proper_index[\"Percent Changes\"].mean()" ] }, { "cell_type": "code", "execution_count": 51, "id": "27c153e5", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.11980826040056836" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "aapl_proper_index[\"Percent Changes\"].max()" ] }, { "cell_type": "code", "execution_count": 52, "id": "23e031ff", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-0.12864694751232164" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "aapl_proper_index[\"Percent Changes\"].min()" ] }, { "cell_type": "code", "execution_count": 56, "id": "92849318", "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "0.0024045071214521263" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "aapl_proper_index[\"Percent Changes\"].median()" ] }, { "cell_type": "code", "execution_count": 53, "id": "e6f50326", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.01985402460478214" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "aapl_proper_index[aapl_proper_index[\"Percent Changes\"] > 0][\"Percent Changes\"].mean()" ] }, { "cell_type": "code", "execution_count": 57, "id": "a3d0e96b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-0.01881547236798305" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "aapl_proper_index[aapl_proper_index[\"Percent Changes\"] < 0][\"Percent Changes\"].mean()" ] }, { "cell_type": "markdown", 
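"id": "b4e1f7a2", "metadata": {}, "source": [ "These statistics can also be computed in one call. The sketch below is a minimal, hypothetical example that uses a small made-up price series, not the AAPL data. `Series.agg` accepts a list of aggregation names. The `NaN` that `pct_change()` leaves in the first row is ignored because pandas aggregations default to `skipna=True`.

```python
import pandas as pd

# Hypothetical closing prices (made-up values, not real AAPL data)
close = pd.Series([100.0, 110.0, 99.0, 99.0, 108.9])

# pct_change() compares each value with the previous one,
# so the first entry has nothing to compare against and becomes NaN
changes = close.pct_change()

# Several statistics at once; NaN is skipped by default (skipna=True)
stats = changes.agg(['mean', 'max', 'min', 'median'])
print(stats)

# Average gain on up days and average loss on down days,
# using the same boolean-mask pattern as the cells above
up_mean = changes[changes > 0].mean()
down_mean = changes[changes < 0].mean()
print(up_mean, down_mean)
```

For a quick overview, `changes.describe()` returns the count, mean, standard deviation, minimum, quartiles and maximum in a single Series." ] }, { "cell_type": "markdown", 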
"id": "fd7781d1", "metadata": {}, "source": [ "## 6.2 Grouping" ] }, { "cell_type": "markdown", "id": "64ec7f60", "metadata": {}, "source": [ "We use planets discovery data as an example for grouping" ] }, { "cell_type": "code", "execution_count": 96, "id": "26b8b374", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(1035, 6)" ] }, "execution_count": 96, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import seaborn as sns\n", "planets = sns.load_dataset('planets')\n", "planets.shape" ] }, { "cell_type": "code", "execution_count": 97, "id": "3c8426a5", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " method number orbital_period mass distance year\n", "0 Radial Velocity 1 269.300000 7.10 77.40 2006\n", "1 Radial Velocity 1 874.774000 2.21 56.95 2008\n", "2 Radial Velocity 1 763.000000 2.60 19.84 2011\n", "3 Radial Velocity 1 326.030000 19.40 110.62 2007\n", "4 Radial Velocity 1 516.220000 10.50 119.47 2009\n", "... ... ... ... ... ... ...\n", "1030 Transit 1 3.941507 NaN 172.00 2006\n", "1031 Transit 1 2.615864 NaN 148.00 2007\n", "1032 Transit 1 3.191524 NaN 174.00 2007\n", "1033 Transit 1 4.125083 NaN 293.00 2008\n", "1034 Transit 1 4.187757 NaN 260.00 2008\n", "\n", "[1035 rows x 6 columns]" ] }, "execution_count": 97, "metadata": {}, "output_type": "execute_result" } ], "source": [ "planets" ] }, { "cell_type": "code", "execution_count": 98, "id": "9872086c", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "method\n", "Astrometry 631.180000\n", "Eclipse Timing Variations 4343.500000\n", "Imaging 27500.000000\n", "Microlensing 3300.000000\n", "Orbital Brightness Modulation 0.342887\n", "Pulsar Timing 66.541900\n", "Pulsation Timing Variations 1170.000000\n", "Radial Velocity 360.200000\n", "Transit 5.714932\n", "Transit Timing Variations 57.011000\n", "Name: orbital_period, dtype: float64" ] }, "execution_count": 98, "metadata": {}, "output_type": "execute_result" } ], "source": [ "planets.groupby('method')['orbital_period'].median()" ] }, { "cell_type": "code", "execution_count": 101, "id": "8399a771", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " count mean std \\\n", "method \n", "Astrometry 2.0 631.180000 544.217663 \n", "Eclipse Timing Variations 9.0 4751.644444 2499.130945 \n", "Imaging 12.0 118247.737500 213978.177277 \n", "Microlensing 7.0 3153.571429 1113.166333 \n", "Orbital Brightness Modulation 3.0 0.709307 0.725493 \n", "Pulsar Timing 5.0 7343.021201 16313.265573 \n", "Pulsation Timing Variations 1.0 1170.000000 NaN \n", "Radial Velocity 553.0 823.354680 1454.926210 \n", "Transit 397.0 21.102073 46.185893 \n", "Transit Timing Variations 3.0 79.783500 71.599884 \n", "\n", " min 25% 50% \\\n", "method \n", "Astrometry 246.360000 438.770000 631.180000 \n", "Eclipse Timing Variations 1916.250000 2900.000000 4343.500000 \n", "Imaging 4639.150000 8343.900000 27500.000000 \n", "Microlensing 1825.000000 2375.000000 3300.000000 \n", "Orbital Brightness Modulation 0.240104 0.291496 0.342887 \n", "Pulsar Timing 0.090706 25.262000 66.541900 \n", "Pulsation Timing Variations 1170.000000 1170.000000 1170.000000 \n", "Radial Velocity 0.736540 38.021000 360.200000 \n", "Transit 0.355000 3.160630 5.714932 \n", "Transit Timing Variations 22.339500 39.675250 57.011000 \n", "\n", " 75% max \n", "method \n", "Astrometry 823.590000 1016.000000 \n", "Eclipse Timing Variations 5767.000000 10220.000000 \n", "Imaging 94250.000000 730000.000000 \n", "Microlensing 3550.000000 5100.000000 \n", "Orbital Brightness Modulation 0.943908 1.544929 \n", "Pulsar Timing 98.211400 36525.000000 \n", "Pulsation Timing Variations 1170.000000 1170.000000 \n", "Radial Velocity 982.000000 17337.500000 \n", "Transit 16.145700 331.600590 \n", "Transit Timing Variations 108.505500 160.000000 " ] }, "execution_count": 101, "metadata": {}, "output_type": "execute_result" } ], "source": [ "planets.groupby('method')['orbital_period'].describe()" ] },
{ "cell_type": "code", "execution_count": 100, "id": "4cd98382", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "method\n", "Astrometry 2\n", "Eclipse Timing Variations 9\n", "Imaging 38\n", "Microlensing 23\n", "Orbital Brightness Modulation 3\n", "Pulsar Timing 5\n", "Pulsation Timing Variations 1\n", "Radial Velocity 553\n", "Transit 397\n", "Transit Timing Variations 4\n", "Name: number, dtype: int64" ] }, "execution_count": 100, "metadata": {}, "output_type": "execute_result" } ], "source": [ "planets.groupby('method')[\"number\"].count()" ] }, { "cell_type": "markdown", "id": "1430d995", "metadata": {}, "source": [ "# 7. Joining Data" ] }, { "cell_type": "markdown", "id": "c3abca64", "metadata": {}, "source": [ "## 7.1 Merge (or join)" ] }, { "cell_type": "code", "execution_count": 106, "id": "de4cab14", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " employee group\n", "0 Bob Accounting\n", "1 Jake Engineering\n", "2 Lisa Engineering\n", "3 Sue HR" ] }, "execution_count": 106, "metadata": {}, "output_type": "execute_result" } ], "source": [ "department = pd.DataFrame({'employee': ['Bob', 'Jake', 'Lisa', 'Sue'],\n", " 'group': ['Accounting', 'Engineering', 'Engineering', 'HR']})\n", "\n", "department" ] }, { "cell_type": "code", "execution_count": 107, "id": "6301c1f2", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " employee hire_date\n", "0 Lisa 2004\n", "1 Bob 2008\n", "2 Jake 2012\n", "3 Sue 2014" ] }, "execution_count": 107, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hire_date = pd.DataFrame({'employee': ['Lisa', 'Bob', 'Jake', 'Sue'],\n", " 'hire_date': [2004, 2008, 2012, 2014]})\n", "\n", "hire_date" ] }, { "cell_type": "code", "execution_count": 111, "id": "7a4f923b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " employee group hire_date\n", "0 Bob Accounting 2008\n", "1 Jake Engineering 2012\n", "2 Lisa Engineering 2004\n", "3 Sue HR 2014" ] }, "execution_count": 111, "metadata": {}, "output_type": "execute_result" } ], "source": [ "employee = pd.merge(department, hire_date)\n", "employee" ] }, { "cell_type": "code", "execution_count": 112, "id": "0bab79da", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " employee group hire_date\n", "0 Bob Accounting 2008\n", "1 Jake Engineering 2012\n", "2 Lisa Engineering 2004\n", "3 Sue HR 2014" ] }, "execution_count": 112, "metadata": {}, "output_type": "execute_result" } ], "source": [ "employee = pd.merge(department, hire_date, on=\"employee\")\n", "employee" ] }, { "cell_type": "code", "execution_count": 113, "id": "29d1d5e0", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " name salary\n", "0 Bob 70000\n", "1 Jake 80000\n", "2 Lisa 120000\n", "3 Sue 90000" ] }, "execution_count": 113, "metadata": {}, "output_type": "execute_result" } ], "source": [ "salary = pd.DataFrame({'name': ['Bob', 'Jake', 'Lisa', 'Sue'],\n", " 'salary': [70000, 80000, 120000, 90000]})\n", "salary" ] }, { "cell_type": "code", "execution_count": 118, "id": "200d57d0", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " employee group name salary\n", "0 Bob Accounting Bob 70000\n", "1 Jake Engineering Jake 80000\n", "2 Lisa Engineering Lisa 120000\n", "3 Sue HR Sue 90000" ] }, "execution_count": 118, "metadata": {}, "output_type": "execute_result" } ], "source": [ "employee = pd.merge(department, salary, left_on=\"employee\", right_on=\"name\")\n", "employee" ] }, { "cell_type": "code", "execution_count": 119, "id": "cd98a441", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " employee group salary\n", "0 Bob Accounting 70000\n", "1 Jake Engineering 80000\n", "2 Lisa Engineering 120000\n", "3 Sue HR 90000" ] }, "execution_count": 119, "metadata": {}, "output_type": "execute_result" } ], "source": [ "employee = employee.drop(columns='name')\n", "employee" ] }, { "cell_type": "markdown", "id": "3762163c", "metadata": {}, "source": [ "## 7.2 One-to-many merging" ] }, { "cell_type": "code", "execution_count": 121, "id": "b1aa1315", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " group supervisor\n", "0 Accounting Carly\n", "1 Engineering Guido\n", "2 HR Steve" ] }, "execution_count": 121, "metadata": {}, "output_type": "execute_result" } ], "source": [ "supervisor = pd.DataFrame({'group': ['Accounting', 'Engineering', 'HR'],\n", " 'supervisor': ['Carly', 'Guido', 'Steve']})\n", "supervisor" ] }, { "cell_type": "code", "execution_count": 126, "id": "b01c9052", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " employee group salary supervisor\n", "0 Bob Accounting 70000 Carly\n", "1 Jake Engineering 80000 Guido\n", "2 Lisa Engineering 120000 Guido\n", "3 Sue HR 90000 Steve" ] }, "execution_count": 126, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.merge(employee, supervisor)" ] }, { "cell_type": "markdown", "id": "211fc106", "metadata": {}, "source": [ "## 7.3 Many-to-many merging" ] }, { "cell_type": "code", "execution_count": 128, "id": "87fb2052", "metadata": {}, "outputs": [], "source": [ "skills = pd.DataFrame({'group': ['Accounting', 'Accounting', 'Engineering',\n", " 'Engineering', 'HR', 'HR'],\n", " 'skills': ['math', 'spreadsheets', 'coding',\n", " 'linux', 'spreadsheets', 'organization']})" ] }, { "cell_type": "code", "execution_count": 129, "id": "e1d7ebc4", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " employee group salary skills\n", "0 Bob Accounting 70000 math\n", "1 Bob Accounting 70000 spreadsheets\n", "2 Jake Engineering 80000 coding\n", "3 Jake Engineering 80000 linux\n", "4 Lisa Engineering 120000 coding\n", "5 Lisa Engineering 120000 linux\n", "6 Sue HR 90000 spreadsheets\n", "7 Sue HR 90000 organization" ] }, "execution_count": 129, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.merge(employee, skills)" ] }, { "cell_type": "markdown", "id": "ec89a9e8", "metadata": {}, "source": [ "This dataset is artificial, but it shows how a many-to-many merge behaves: each employee row is repeated once for every skill listed for that employee's group, so the 4 employees become 8 rows." ] }, { "cell_type": "markdown", "id": "1d976a18", "metadata": {}, "source": [ "## 7.4 Inner Join / Outer Join / Left Join / Right Join" ] }, { "cell_type": "code", "execution_count": 132, "id": "1c7edc09", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " name food\n", "0 Peter fish\n", "1 Paul beans\n", "2 Mary bread" ] }, "execution_count": 132, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fav_food = pd.DataFrame({'name': ['Peter', 'Paul', 'Mary'],\n", " 'food': ['fish', 'beans', 'bread']},\n", " columns=['name', 'food'])\n", "fav_food" ] }, { "cell_type": "code", "execution_count": 134, "id": "74aa124b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " name drink\n", "0 Mary wine\n", "1 Joseph beer" ] }, "execution_count": 134, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fav_drink = pd.DataFrame({'name': ['Mary', 'Joseph'],\n", " 'drink': ['wine', 'beer']},\n", " columns=['name', 'drink'])\n", "fav_drink" ] }, { "cell_type": "code", "execution_count": 135, "id": "48886a2f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " name food drink\n", "0 Mary bread wine" ] }, "execution_count": 135, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.merge(fav_food, fav_drink)" ] }, { "cell_type": "code", "execution_count": 136, "id": "d5e4c0e9", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " name food drink\n", "0 Peter fish NaN\n", "1 Paul beans NaN\n", "2 Mary bread wine\n", "3 Joseph NaN beer" ] }, "execution_count": 136, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.merge(fav_food, fav_drink, how=\"outer\")" ] }, { "cell_type": "code", "execution_count": 137, "id": "84ab9843", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " name food drink\n", "0 Peter fish NaN\n", "1 Paul beans NaN\n", "2 Mary bread wine" ] }, "execution_count": 137, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.merge(fav_food, fav_drink, how=\"left\")" ] }, { "cell_type": "code", "execution_count": 138, "id": "3d11be90", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " name food drink\n", "0 Mary bread wine\n", "1 Joseph NaN beer" ] }, "execution_count": 138, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.merge(fav_food, fav_drink, how=\"right\")" ] }, { "cell_type": "markdown", "id": "d8e16b5b", "metadata": {}, "source": [ "# 8. Handling Missing Data" ] }, { "cell_type": "code", "execution_count": 71, "id": "e782f9ea", "metadata": {}, "outputs": [], "source": [ "hibor = pd.read_csv(\"hibor.csv\", parse_dates=True, index_col='date')" ] }, { "cell_type": "code", "execution_count": 72, "id": "04401df8", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " overnight 1 week 2 weeks 1 months 2 months 3 months \\\n", "date \n", "2010-01-01 NaN NaN NaN NaN NaN NaN \n", "2010-01-02 NaN NaN NaN NaN NaN NaN \n", "2010-01-03 NaN NaN NaN NaN NaN NaN \n", "2010-01-04 0.03 0.04971 0.05000 0.07964 0.11893 0.15679 \n", "2010-01-05 0.03 0.04971 0.05000 0.07964 0.11000 0.15000 \n", "2010-01-06 0.03 0.04900 0.04971 0.08000 0.11000 0.14000 \n", "2010-01-07 0.03 0.04971 0.04971 0.06964 0.10000 0.13000 \n", "2010-01-08 0.03 0.04971 0.04971 0.06964 0.10000 0.13000 \n", "2010-01-09 NaN NaN NaN NaN NaN NaN \n", "2010-01-10 NaN NaN NaN NaN NaN NaN \n", "2010-01-11 0.03 0.04971 0.04971 0.06036 0.09000 0.12000 \n", "\n", " 6 months 12 months \n", "date \n", "2010-01-01 NaN NaN \n", "2010-01-02 NaN NaN \n", "2010-01-03 NaN NaN \n", "2010-01-04 0.31571 0.71429 \n", "2010-01-05 0.29929 0.68929 \n", "2010-01-06 0.28000 0.66857 \n", "2010-01-07 0.26000 0.62857 \n", "2010-01-08 0.26000 0.62857 \n", "2010-01-09 NaN NaN \n", "2010-01-10 NaN NaN \n", "2010-01-11 0.24000 0.57000 " ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hibor" ] }, { "cell_type": "markdown", "id": "a880cff4", "metadata": {}, "source": [ "## 8.1 Check missing data" ] }, { "cell_type": "code", "execution_count": 77, "id": "6fc86cdc", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " overnight 1 week 2 weeks 1 months 2 months 3 months \\\n", "date \n", "2010-01-01 True True True True True True \n", "2010-01-02 True True True True True True \n", "2010-01-03 True True True True True True \n", "2010-01-04 False False False False False False \n", "2010-01-05 False False False False False False \n", "2010-01-06 False False False False False False \n", "2010-01-07 False False False False False False \n", "2010-01-08 False False False False False False \n", "2010-01-09 True True True True True True \n", "2010-01-10 True True True True True True \n", "2010-01-11 False False False False False False \n", "\n", " 6 months 12 months \n", "date \n", "2010-01-01 True True \n", "2010-01-02 True True \n", "2010-01-03 True True \n", "2010-01-04 False False \n", "2010-01-05 False False \n", "2010-01-06 False False \n", "2010-01-07 False False \n", "2010-01-08 False False \n", "2010-01-09 True True \n", "2010-01-10 True True \n", "2010-01-11 False False " ] }, "execution_count": 77, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hibor.isnull()" ] }, { "cell_type": "code", "execution_count": 79, "id": "84d0ac6f", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hibor.isnull().values.any()" ] }, { "cell_type": "code", "execution_count": 80, "id": "fc5a7223", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 80, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hibor[\"overnight\"].isnull().values.any()" ] }, { "cell_type": "code", "execution_count": 84, "id": "87d7e12f", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False 6\n", "True 5\n", "Name: overnight, dtype: int64" ] }, "execution_count": 84, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hibor[\"overnight\"].isnull().value_counts()" ] }, { "cell_type": "code", 
"execution_count": 85, "id": "d3acd78f", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "date\n", "2010-01-01 NaN\n", "2010-01-02 NaN\n", "2010-01-03 NaN\n", "2010-01-09 NaN\n", "2010-01-10 NaN\n", "Name: overnight, dtype: float64" ] }, "execution_count": 85, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hibor[\"overnight\"][hibor[\"overnight\"].isnull()]" ] }, { "cell_type": "markdown", "id": "27237840", "metadata": {}, "source": [ "## 8.2 Drop Data" ] }, { "cell_type": "code", "execution_count": 87, "id": "f550b706", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " overnight 1 week 2 weeks 1 months 2 months 3 months \\\n", "date \n", "2010-01-04 0.03 0.04971 0.05000 0.07964 0.11893 0.15679 \n", "2010-01-05 0.03 0.04971 0.05000 0.07964 0.11000 0.15000 \n", "2010-01-06 0.03 0.04900 0.04971 0.08000 0.11000 0.14000 \n", "2010-01-07 0.03 0.04971 0.04971 0.06964 0.10000 0.13000 \n", "2010-01-08 0.03 0.04971 0.04971 0.06964 0.10000 0.13000 \n", "2010-01-11 0.03 0.04971 0.04971 0.06036 0.09000 0.12000 \n", "\n", " 6 months 12 months \n", "date \n", "2010-01-04 0.31571 0.71429 \n", "2010-01-05 0.29929 0.68929 \n", "2010-01-06 0.28000 0.66857 \n", "2010-01-07 0.26000 0.62857 \n", "2010-01-08 0.26000 0.62857 \n", "2010-01-11 0.24000 0.57000 " ] }, "execution_count": 87, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hibor.dropna()" ] }, { "cell_type": "markdown", "id": "3f0d7ba1", "metadata": {}, "source": [ "## 8.3 Fill with specific values" ] }, { "cell_type": "markdown", "id": "4033679b", "metadata": {}, "source": [ "Note: this is shown only as an example. Filling the missing rates with 0 does not make sense here, because the gaps are non-trading days (a public holiday and weekends), not days on which the rate was actually 0." ] }, { "cell_type": "code", "execution_count": 90, "id": "0ac6a858", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " overnight 1 week 2 weeks 1 months 2 months 3 months \\\n", "date \n", "2010-01-01 0.00 0.00000 0.00000 0.00000 0.00000 0.00000 \n", "2010-01-02 0.00 0.00000 0.00000 0.00000 0.00000 0.00000 \n", "2010-01-03 0.00 0.00000 0.00000 0.00000 0.00000 0.00000 \n", "2010-01-04 0.03 0.04971 0.05000 0.07964 0.11893 0.15679 \n", "2010-01-05 0.03 0.04971 0.05000 0.07964 0.11000 0.15000 \n", "2010-01-06 0.03 0.04900 0.04971 0.08000 0.11000 0.14000 \n", "2010-01-07 0.03 0.04971 0.04971 0.06964 0.10000 0.13000 \n", "2010-01-08 0.03 0.04971 0.04971 0.06964 0.10000 0.13000 \n", "2010-01-09 0.00 0.00000 0.00000 0.00000 0.00000 0.00000 \n", "2010-01-10 0.00 0.00000 0.00000 0.00000 0.00000 0.00000 \n", "2010-01-11 0.03 0.04971 0.04971 0.06036 0.09000 0.12000 \n", "\n", " 6 months 12 months \n", "date \n", "2010-01-01 0.00000 0.00000 \n", "2010-01-02 0.00000 0.00000 \n", "2010-01-03 0.00000 0.00000 \n", "2010-01-04 0.31571 0.71429 \n", "2010-01-05 0.29929 0.68929 \n", "2010-01-06 0.28000 0.66857 \n", "2010-01-07 0.26000 0.62857 \n", "2010-01-08 0.26000 0.62857 \n", "2010-01-09 0.00000 0.00000 \n", "2010-01-10 0.00000 0.00000 \n", "2010-01-11 0.24000 0.57000 " ] }, "execution_count": 90, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hibor.fillna(0)" ] }, { "cell_type": "markdown", "id": "d03e428d", "metadata": {}, "source": [ "## 8.4 Fill with previous values (i.e. forward fill)\n" ] }, { "cell_type": "code", "execution_count": 92, "id": "0576a4c1", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " overnight 1 week 2 weeks 1 months 2 months 3 months \\\n", "date \n", "2010-01-01 NaN NaN NaN NaN NaN NaN \n", "2010-01-02 NaN NaN NaN NaN NaN NaN \n", "2010-01-03 NaN NaN NaN NaN NaN NaN \n", "2010-01-04 0.03 0.04971 0.05000 0.07964 0.11893 0.15679 \n", "2010-01-05 0.03 0.04971 0.05000 0.07964 0.11000 0.15000 \n", "2010-01-06 0.03 0.04900 0.04971 0.08000 0.11000 0.14000 \n", "2010-01-07 0.03 0.04971 0.04971 0.06964 0.10000 0.13000 \n", "2010-01-08 0.03 0.04971 0.04971 0.06964 0.10000 0.13000 \n", "2010-01-09 0.03 0.04971 0.04971 0.06964 0.10000 0.13000 \n", "2010-01-10 0.03 0.04971 0.04971 0.06964 0.10000 0.13000 \n", "2010-01-11 0.03 0.04971 0.04971 0.06036 0.09000 0.12000 \n", "\n", " 6 months 12 months \n", "date \n", "2010-01-01 NaN NaN \n", "2010-01-02 NaN NaN \n", "2010-01-03 NaN NaN \n", "2010-01-04 0.31571 0.71429 \n", "2010-01-05 0.29929 0.68929 \n", "2010-01-06 0.28000 0.66857 \n", "2010-01-07 0.26000 0.62857 \n", "2010-01-08 0.26000 0.62857 \n", "2010-01-09 0.26000 0.62857 \n", "2010-01-10 0.26000 0.62857 \n", "2010-01-11 0.24000 0.57000 " ] }, "execution_count": 92, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hibor.ffill()" ] }, { "cell_type": "markdown", "id": "d916df12", "metadata": {}, "source": [ "## 8.5 Fill with next values (i.e. back fill)\n" ] }, { "cell_type": "markdown", "id": "aff4fd55", "metadata": {}, "source": [ "Remark: back fill may not make sense in this example, because it fills each gap with a rate that was only published on a later date." ] }, { "cell_type": "code", "execution_count": 95, "id": "19fdac25", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " overnight 1 week 2 weeks 1 months 2 months 3 months \\\n", "date \n", "2010-01-01 0.03 0.04971 0.05000 0.07964 0.11893 0.15679 \n", "2010-01-02 0.03 0.04971 0.05000 0.07964 0.11893 0.15679 \n", "2010-01-03 0.03 0.04971 0.05000 0.07964 0.11893 0.15679 \n", "2010-01-04 0.03 0.04971 0.05000 0.07964 0.11893 0.15679 \n", "2010-01-05 0.03 0.04971 0.05000 0.07964 0.11000 0.15000 \n", "2010-01-06 0.03 0.04900 0.04971 0.08000 0.11000 0.14000 \n", "2010-01-07 0.03 0.04971 0.04971 0.06964 0.10000 0.13000 \n", "2010-01-08 0.03 0.04971 0.04971 0.06964 0.10000 0.13000 \n", "2010-01-09 0.03 0.04971 0.04971 0.06036 0.09000 0.12000 \n", "2010-01-10 0.03 0.04971 0.04971 0.06036 0.09000 0.12000 \n", "2010-01-11 0.03 0.04971 0.04971 0.06036 0.09000 0.12000 \n", "\n", " 6 months 12 months \n", "date \n", "2010-01-01 0.31571 0.71429 \n", "2010-01-02 0.31571 0.71429 \n", "2010-01-03 0.31571 0.71429 \n", "2010-01-04 0.31571 0.71429 \n", "2010-01-05 0.29929 0.68929 \n", "2010-01-06 0.28000 0.66857 \n", "2010-01-07 0.26000 0.62857 \n", "2010-01-08 0.26000 0.62857 \n", "2010-01-09 0.24000 0.57000 \n", "2010-01-10 0.24000 0.57000 \n", "2010-01-11 0.24000 0.57000 " ] }, "execution_count": 95, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hibor.bfill()" ] }, { "cell_type": "markdown", "id": "785c9ec4", "metadata": {}, "source": [ "# 9. Export CSV" ] }, { "cell_type": "markdown", "id": "fd978fcf", "metadata": {}, "source": [ "Export the DataFrame to a CSV file. Be careful not to overwrite the original file!"
] }, { "cell_type": "code", "execution_count": 46, "id": "3e53c244", "metadata": {}, "outputs": [], "source": [ "aapl_proper_index.to_csv(\"AAPL_new.csv\")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.0" } }, "nbformat": 4, "nbformat_minor": 5 }