Man1130/jupyter/Man1130-python-comission/course_materials/Note/1. Working with Dataframe.ipynb
louiscklaw e44aead3d5 update,
2025-02-01 01:58:19 +08:00

{
"cells": [
{
"cell_type": "markdown",
"id": "e7d963fc",
"metadata": {},
"source": [
"# Working with DataFrames"
]
},
{
"cell_type": "markdown",
"id": "ffb6ef0a",
"metadata": {},
"source": [
"## 1. Pandas Library"
]
},
{
"cell_type": "markdown",
"id": "a080c24c",
"metadata": {},
"source": [
"Pandas is one of the most widely used libraries for data manipulation. We use its Series and DataFrame data structures extensively because they are much more powerful and convenient for working with data than Python's built-in lists and dictionaries.\n",
"\n",
"Another very popular library is NumPy. Pandas is built on top of it, so we usually work with pandas directly."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "122da828",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd"
]
},
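{
"cell_type": "markdown",
"id": "d1f0a001",
"metadata": {},
"source": [
"A minimal sketch of why a Series is often more convenient than a plain list: arithmetic applies to every element at once, and the values are backed by a NumPy array. The `prices` name and its values are made up for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d1f0a002",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical prices, used only for illustration\n",
"prices = pd.Series([10.0, 20.0, 30.0])\n",
"\n",
"# Arithmetic is applied element-wise, with no explicit loop\n",
"doubled = prices * 2\n",
"\n",
"# The underlying data can be exposed as a NumPy array\n",
"values = prices.to_numpy()\n",
"\n",
"print(doubled.tolist())\n",
"print(type(values))"
]
},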
{
"cell_type": "markdown",
"id": "af8c9270",
"metadata": {},
"source": [
"## 2. Pandas Series"
]
},
{
"cell_type": "markdown",
"id": "7c82119d",
"metadata": {},
"source": [
"A Series is very similar to a list, and we can easily convert a list into a Series. Like a list, a Series has an index for each element."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "a9bfcdbb",
"metadata": {},
"outputs": [],
"source": [
"stocks = [\"AAPL\", \"BABA\", \"DIDI\", \"MSFT\", \"AMZN\", \"ADBE\", \"TSLA\", \"MS\", \"V\", \"MA\", \"GS\"]"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "c9026b35",
"metadata": {},
"outputs": [],
"source": [
"stocks_series = pd.Series(stocks)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "66a0bbd3",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 AAPL\n",
"1 BABA\n",
"2 DIDI\n",
"3 MSFT\n",
"4 AMZN\n",
"5 ADBE\n",
"6 TSLA\n",
"7 MS\n",
"8 V\n",
"9 MA\n",
"10 GS\n",
"dtype: object"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"stocks_series"
]
},
{
"cell_type": "markdown",
"id": "22f5a6b3",
"metadata": {},
"source": [
"Getting values by index. A slice such as `[1:3]` includes the start position and excludes the end position."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "db7b2e69",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1 BABA\n",
"2 DIDI\n",
"dtype: object"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"stocks_series[1:3]"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "31dd2927",
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"'AAPL'"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"stocks_series[0]"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "935307a6",
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"2 DIDI\n",
"3 MSFT\n",
"4 AMZN\n",
"5 ADBE\n",
"dtype: object"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"stocks_series[2:6]"
]
},
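{
"cell_type": "markdown",
"id": "d1f0a003",
"metadata": {},
"source": [
"The lookups above rely on the default integer index. As a sketch, the same selections can be written with `.iloc`, which always selects by position regardless of the index labels. The `stocks_demo` name is introduced here only for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d1f0a004",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# A shorter, illustrative copy of the ticker list\n",
"stocks_demo = pd.Series([\"AAPL\", \"BABA\", \"DIDI\", \"MSFT\"])\n",
"\n",
"# Single element by position\n",
"first = stocks_demo.iloc[0]\n",
"\n",
"# Slice by position: the start is included, the end is excluded\n",
"middle = stocks_demo.iloc[1:3]\n",
"\n",
"print(first)\n",
"print(middle.tolist())"
]
},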
{
"cell_type": "markdown",
"id": "88c61e98",
"metadata": {},
"source": [
"Unlike a list, a Series does not have to use integers as its index. It can use labels such as strings, which makes it look more like a dictionary. We can also create a Series directly from a dictionary."
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "59d345f2",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"sales = {'Central Branch': 10000, 'TST Branch': 2000, 'Mongkok Branch': 3000}  # defined before use\n",
"sales_series = pd.Series(sales)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "64b9ea46",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Central Branch 10000\n",
"TST Branch 2000\n",
"Mongkok Branch 3000\n",
"dtype: int64"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sales_series"
]
},
{
"cell_type": "markdown",
"id": "b4e8c74c",
"metadata": {},
"source": [
"Getting a value by its index label"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "7e79920a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"10000"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sales_series[\"Central Branch\"]"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "cbaa8383",
"metadata": {},
"outputs": [],
"source": [
"sales = {'Central Branch' : 10000,\n",
" 'TST Branch' : 2000,\n",
" 'Mongkok Branch' : 3000}"
]
},
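{
"cell_type": "markdown",
"id": "d1f0a005",
"metadata": {},
"source": [
"A minimal sketch of label-based access on a Series built from a dictionary, using the same branch figures as above. `.loc` selects by label, and aggregations such as `sum()` work on the values directly. The `branch_sales` name is an assumption made for this example."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d1f0a006",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"branch_sales = pd.Series({\n",
"    \"Central Branch\": 10000,\n",
"    \"TST Branch\": 2000,\n",
"    \"Mongkok Branch\": 3000,\n",
"})\n",
"\n",
"# Select one value by its label\n",
"central = branch_sales.loc[\"Central Branch\"]\n",
"\n",
"# The dictionary keys become the index labels\n",
"labels = list(branch_sales.index)\n",
"\n",
"# Aggregate over all values\n",
"total = branch_sales.sum()\n",
"\n",
"print(central, total)\n",
"print(labels)"
]
},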
{
"cell_type": "markdown",
"id": "388d2ece",
"metadata": {},
"source": [
"## 3. Pandas DataFrame"
]
},
{
"cell_type": "markdown",
"id": "c756f438",
"metadata": {},
"source": [
"You can think of a Series as one column of data in an Excel spreadsheet. A DataFrame holds multiple Series, so you can think of it as the data of a whole spreadsheet."
]
},
{
"cell_type": "markdown",
"id": "12aa8a6f",
"metadata": {},
"source": [
"### 3.1 Create a DataFrame from a CSV file"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "0a88f1c9",
"metadata": {},
"outputs": [
{
"ename": "FileNotFoundError",
"evalue": "[Errno 2] No such file or directory: 'AAPL.csv'",
"output_type": "error",
"traceback": [
"---------------------------------------------------------------------------",
"FileNotFoundError Traceback (most recent call last)",
"Cell In [15], line 1\n----> 1 aapl = pd.read_csv(\"AAPL.csv\")\n",
"FileNotFoundError: [Errno 2] No such file or directory: 'AAPL.csv'"
]
}
],
"source": [
"aapl = pd.read_csv(\"AAPL.csv\")"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "20064351",
"metadata": {},
"outputs": [
{
"ename": "NameError",
"evalue": "name 'aapl' is not defined",
"output_type": "error",
"traceback": [
"---------------------------------------------------------------------------",
"NameError Traceback (most recent call last)",
"Cell In [16], line 1\n----> 1 aapl\n",
"NameError: name 'aapl' is not defined"
]
}
],
"source": [
"aapl"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "a6ae2efc",
"metadata": {},
"outputs": [],
"source": [
"aapl_proper_index = pd.read_csv(\"AAPL.csv\", parse_dates=True, index_col='Date')"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "92625a87",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Open</th>\n",
" <th>High</th>\n",
" <th>Low</th>\n",
" <th>Close</th>\n",
" <th>Adj Close</th>\n",
" <th>Volume</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Date</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2019-10-28</th>\n",
" <td>61.855000</td>\n",
" <td>62.312500</td>\n",
" <td>61.680000</td>\n",
" <td>62.262501</td>\n",
" <td>61.650810</td>\n",
" <td>96572800</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-10-29</th>\n",
" <td>62.242500</td>\n",
" <td>62.437500</td>\n",
" <td>60.642502</td>\n",
" <td>60.822498</td>\n",
" <td>60.224953</td>\n",
" <td>142839600</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-10-30</th>\n",
" <td>61.189999</td>\n",
" <td>61.325001</td>\n",
" <td>60.302502</td>\n",
" <td>60.814999</td>\n",
" <td>60.217525</td>\n",
" <td>124522000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-10-31</th>\n",
" <td>61.810001</td>\n",
" <td>62.292500</td>\n",
" <td>59.314999</td>\n",
" <td>62.189999</td>\n",
" <td>61.579021</td>\n",
" <td>139162000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-01</th>\n",
" <td>62.384998</td>\n",
" <td>63.982498</td>\n",
" <td>62.290001</td>\n",
" <td>63.955002</td>\n",
" <td>63.326683</td>\n",
" <td>151125200</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-10-21</th>\n",
" <td>116.669998</td>\n",
" <td>118.709999</td>\n",
" <td>116.449997</td>\n",
" <td>116.870003</td>\n",
" <td>116.870003</td>\n",
" <td>89946000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-10-22</th>\n",
" <td>117.449997</td>\n",
" <td>118.040001</td>\n",
" <td>114.589996</td>\n",
" <td>115.750000</td>\n",
" <td>115.750000</td>\n",
" <td>101988000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-10-23</th>\n",
" <td>116.389999</td>\n",
" <td>116.550003</td>\n",
" <td>114.279999</td>\n",
" <td>115.040001</td>\n",
" <td>115.040001</td>\n",
" <td>82572600</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-10-26</th>\n",
" <td>114.010002</td>\n",
" <td>116.550003</td>\n",
" <td>112.879997</td>\n",
" <td>115.050003</td>\n",
" <td>115.050003</td>\n",
" <td>111850700</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-10-27</th>\n",
" <td>115.489998</td>\n",
" <td>117.279999</td>\n",
" <td>114.540001</td>\n",
" <td>116.599998</td>\n",
" <td>116.599998</td>\n",
" <td>91927700</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>253 rows × 6 columns</p>\n",
"</div>"
],
"text/plain": [
" Open High Low Close Adj Close \\\n",
"Date \n",
"2019-10-28 61.855000 62.312500 61.680000 62.262501 61.650810 \n",
"2019-10-29 62.242500 62.437500 60.642502 60.822498 60.224953 \n",
"2019-10-30 61.189999 61.325001 60.302502 60.814999 60.217525 \n",
"2019-10-31 61.810001 62.292500 59.314999 62.189999 61.579021 \n",
"2019-11-01 62.384998 63.982498 62.290001 63.955002 63.326683 \n",
"... ... ... ... ... ... \n",
"2020-10-21 116.669998 118.709999 116.449997 116.870003 116.870003 \n",
"2020-10-22 117.449997 118.040001 114.589996 115.750000 115.750000 \n",
"2020-10-23 116.389999 116.550003 114.279999 115.040001 115.040001 \n",
"2020-10-26 114.010002 116.550003 112.879997 115.050003 115.050003 \n",
"2020-10-27 115.489998 117.279999 114.540001 116.599998 116.599998 \n",
"\n",
" Volume \n",
"Date \n",
"2019-10-28 96572800 \n",
"2019-10-29 142839600 \n",
"2019-10-30 124522000 \n",
"2019-10-31 139162000 \n",
"2019-11-01 151125200 \n",
"... ... \n",
"2020-10-21 89946000 \n",
"2020-10-22 101988000 \n",
"2020-10-23 82572600 \n",
"2020-10-26 111850700 \n",
"2020-10-27 91927700 \n",
"\n",
"[253 rows x 6 columns]"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"aapl_proper_index"
]
},
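{
"cell_type": "markdown",
"id": "d1f0a007",
"metadata": {},
"source": [
"As a sketch of what `parse_dates=True` and `index_col='Date'` do, the example below reads a two-row CSV from an in-memory string instead of a file. The rows are copied from the first two rows of the output above; `csv_text`, `demo` and `close` are names introduced only for this illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d1f0a008",
"metadata": {},
"outputs": [],
"source": [
"import io\n",
"import pandas as pd\n",
"\n",
"# Two rows copied from the AAPL output above, kept in memory instead of a file\n",
"csv_text = (\n",
"    'Date,Open,High,Low,Close,Adj Close,Volume\\n'\n",
"    '2019-10-28,61.855000,62.312500,61.680000,62.262501,61.650810,96572800\\n'\n",
"    '2019-10-29,62.242500,62.437500,60.642502,60.822498,60.224953,142839600\\n'\n",
")\n",
"\n",
"# index_col makes Date the row index; parse_dates=True parses that index as dates\n",
"demo = pd.read_csv(io.StringIO(csv_text), parse_dates=True, index_col='Date')\n",
"\n",
"# Look up one value by date label and column name\n",
"close = demo.loc['2019-10-29', 'Close']\n",
"\n",
"print(type(demo.index).__name__)\n",
"print(close)"
]
},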
{
"cell_type": "markdown",
"id": "7a3a1988",
"metadata": {},
"source": [
"### 3.2 From Quandl"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "da0ee6dc",
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"import quandl"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "3d0ac03e",
"metadata": {},
"outputs": [],
"source": [
"quandl.ApiConfig.api_key = 'x9M_pZutNNPnha1WDdjZ'\n",
"ck = quandl.get('HKEX/00001', start_date='2020-10-20', end_date='2021-10-20')"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "ebda1d4c",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Nominal Price</th>\n",
" <th>Net Change</th>\n",
" <th>Change (%)</th>\n",
" <th>Bid</th>\n",
" <th>Ask</th>\n",
" <th>P/E(x)</th>\n",
" <th>High</th>\n",
" <th>Low</th>\n",
" <th>Previous Close</th>\n",
" <th>Share Volume (000)</th>\n",
" <th>Turnover (000)</th>\n",
" <th>Lot Size</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Date</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2020-10-20</th>\n",
" <td>46.05</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>46.05</td>\n",
" <td>46.10</td>\n",
" <td>None</td>\n",
" <td>46.35</td>\n",
" <td>45.85</td>\n",
" <td>46.35</td>\n",
" <td>4193.0</td>\n",
" <td>192921.0</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-10-21</th>\n",
" <td>46.15</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>46.15</td>\n",
" <td>46.20</td>\n",
" <td>None</td>\n",
" <td>46.50</td>\n",
" <td>45.95</td>\n",
" <td>46.05</td>\n",
" <td>4830.0</td>\n",
" <td>223077.0</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-10-22</th>\n",
" <td>46.10</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>46.10</td>\n",
" <td>46.15</td>\n",
" <td>None</td>\n",
" <td>46.35</td>\n",
" <td>45.90</td>\n",
" <td>46.15</td>\n",
" <td>4902.0</td>\n",
" <td>226000.0</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-10-23</th>\n",
" <td>46.40</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>46.40</td>\n",
" <td>46.45</td>\n",
" <td>None</td>\n",
" <td>46.50</td>\n",
" <td>45.80</td>\n",
" <td>46.10</td>\n",
" <td>3815.0</td>\n",
" <td>176451.0</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-10-27</th>\n",
" <td>46.75</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>46.75</td>\n",
" <td>46.80</td>\n",
" <td>None</td>\n",
" <td>47.15</td>\n",
" <td>46.50</td>\n",
" <td>46.40</td>\n",
" <td>12095.0</td>\n",
" <td>566845.0</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2021-10-12</th>\n",
" <td>52.60</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>52.60</td>\n",
" <td>52.65</td>\n",
" <td>None</td>\n",
" <td>53.10</td>\n",
" <td>52.35</td>\n",
" <td>52.95</td>\n",
" <td>2712.0</td>\n",
" <td>142802.0</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2021-10-15</th>\n",
" <td>52.50</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>52.50</td>\n",
" <td>52.55</td>\n",
" <td>None</td>\n",
" <td>52.95</td>\n",
" <td>52.00</td>\n",
" <td>52.60</td>\n",
" <td>5067.0</td>\n",
" <td>266129.0</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2021-10-18</th>\n",
" <td>52.70</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>52.70</td>\n",
" <td>52.75</td>\n",
" <td>None</td>\n",
" <td>53.00</td>\n",
" <td>52.30</td>\n",
" <td>52.50</td>\n",
" <td>4036.0</td>\n",
" <td>212393.0</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2021-10-19</th>\n",
" <td>53.30</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>53.25</td>\n",
" <td>53.30</td>\n",
" <td>None</td>\n",
" <td>53.50</td>\n",
" <td>52.95</td>\n",
" <td>52.70</td>\n",
" <td>2484.0</td>\n",
" <td>132423.0</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2021-10-20</th>\n",
" <td>53.25</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>53.20</td>\n",
" <td>53.25</td>\n",
" <td>None</td>\n",
" <td>53.40</td>\n",
" <td>52.80</td>\n",
" <td>53.30</td>\n",
" <td>3649.0</td>\n",
" <td>193972.0</td>\n",
" <td>None</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>247 rows × 12 columns</p>\n",
"</div>"
],
"text/plain": [
" Nominal Price Net Change Change (%) Bid Ask P/E(x) High \\\n",
"Date \n",
"2020-10-20 46.05 None None 46.05 46.10 None 46.35 \n",
"2020-10-21 46.15 None None 46.15 46.20 None 46.50 \n",
"2020-10-22 46.10 None None 46.10 46.15 None 46.35 \n",
"2020-10-23 46.40 None None 46.40 46.45 None 46.50 \n",
"2020-10-27 46.75 None None 46.75 46.80 None 47.15 \n",
"... ... ... ... ... ... ... ... \n",
"2021-10-12 52.60 None None 52.60 52.65 None 53.10 \n",
"2021-10-15 52.50 None None 52.50 52.55 None 52.95 \n",
"2021-10-18 52.70 None None 52.70 52.75 None 53.00 \n",
"2021-10-19 53.30 None None 53.25 53.30 None 53.50 \n",
"2021-10-20 53.25 None None 53.20 53.25 None 53.40 \n",
"\n",
" Low Previous Close Share Volume (000) Turnover (000) Lot Size \n",
"Date \n",
"2020-10-20 45.85 46.35 4193.0 192921.0 None \n",
"2020-10-21 45.95 46.05 4830.0 223077.0 None \n",
"2020-10-22 45.90 46.15 4902.0 226000.0 None \n",
"2020-10-23 45.80 46.10 3815.0 176451.0 None \n",
"2020-10-27 46.50 46.40 12095.0 566845.0 None \n",
"... ... ... ... ... ... \n",
"2021-10-12 52.35 52.95 2712.0 142802.0 None \n",
"2021-10-15 52.00 52.60 5067.0 266129.0 None \n",
"2021-10-18 52.30 52.50 4036.0 212393.0 None \n",
"2021-10-19 52.95 52.70 2484.0 132423.0 None \n",
"2021-10-20 52.80 53.30 3649.0 193972.0 None \n",
"\n",
"[247 rows x 12 columns]"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ck"
]
},
{
"cell_type": "markdown",
"id": "db245a3b",
"metadata": {},
"source": [
"### 3.3 From Series"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "c389d843",
"metadata": {},
"outputs": [],
"source": [
"costs = {'Central Branch' : 300000,\n",
" 'TST Branch' : 50000,\n",
" 'Mongkok Branch' : 20000}"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "b6936c74",
"metadata": {},
"outputs": [
{
"ename": "NameError",
"evalue": "name 'sales' is not defined",
"output_type": "error",
"traceback": [
"---------------------------------------------------------------------------",
"NameError Traceback (most recent call last)",
"Cell In [22], line 1\n----> 1 branch_summary = pd.DataFrame({\"sales\": sales, \"costs\": costs})\n",
"NameError: name 'sales' is not defined"
]
}
],
"source": [
"branch_summary = pd.DataFrame({\"sales\": sales, \"costs\": costs})"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "9107a2e3",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sales</th>\n",
" <th>costs</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Central Branch</th>\n",
" <td>10000</td>\n",
" <td>300000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>TST Branch</th>\n",
" <td>2000</td>\n",
" <td>50000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Mongkok Branch</th>\n",
" <td>3000</td>\n",
" <td>20000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" sales costs\n",
"Central Branch 10000 300000\n",
"TST Branch 2000 50000\n",
"Mongkok Branch 3000 20000"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"branch_summary"
]
},
{
"cell_type": "markdown",
"id": "5bab7a42",
"metadata": {},
"source": [
"### 3.4 Getting data from a DataFrame (selecting rows by date)"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "6a84288d",
"metadata": {},
"outputs": [
{
"ename": "NameError",
"evalue": "name 'aapl_proper_index' is not defined",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn [23], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[43maapl_proper_index\u001b[49m\u001b[38;5;241m.\u001b[39mloc[\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m2019-10-30\u001b[39m\u001b[38;5;124m\"\u001b[39m]\n",
"\u001b[0;31mNameError\u001b[0m: name 'aapl_proper_index' is not defined"
]
}
],
"source": [
"aapl_proper_index.loc[\"2019-10-30\"]"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "60c748c1",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Open</th>\n",
" <th>High</th>\n",
" <th>Low</th>\n",
" <th>Close</th>\n",
" <th>Adj Close</th>\n",
" <th>Volume</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Date</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2019-10-30</th>\n",
" <td>61.189999</td>\n",
" <td>61.325001</td>\n",
" <td>60.302502</td>\n",
" <td>60.814999</td>\n",
" <td>60.217525</td>\n",
" <td>124522000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-10-31</th>\n",
" <td>61.810001</td>\n",
" <td>62.292500</td>\n",
" <td>59.314999</td>\n",
" <td>62.189999</td>\n",
" <td>61.579021</td>\n",
" <td>139162000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-01</th>\n",
" <td>62.384998</td>\n",
" <td>63.982498</td>\n",
" <td>62.290001</td>\n",
" <td>63.955002</td>\n",
" <td>63.326683</td>\n",
" <td>151125200</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-04</th>\n",
" <td>64.332497</td>\n",
" <td>64.462502</td>\n",
" <td>63.845001</td>\n",
" <td>64.375000</td>\n",
" <td>63.742554</td>\n",
" <td>103272000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-05</th>\n",
" <td>64.262497</td>\n",
" <td>64.547501</td>\n",
" <td>64.080002</td>\n",
" <td>64.282501</td>\n",
" <td>63.650970</td>\n",
" <td>79897600</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-06</th>\n",
" <td>64.192497</td>\n",
" <td>64.372498</td>\n",
" <td>63.842499</td>\n",
" <td>64.309998</td>\n",
" <td>63.678192</td>\n",
" <td>75864400</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-07</th>\n",
" <td>64.684998</td>\n",
" <td>65.087502</td>\n",
" <td>64.527496</td>\n",
" <td>64.857498</td>\n",
" <td>64.413116</td>\n",
" <td>94940400</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-08</th>\n",
" <td>64.672501</td>\n",
" <td>65.110001</td>\n",
" <td>64.212502</td>\n",
" <td>65.035004</td>\n",
" <td>64.589409</td>\n",
" <td>69986400</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-11</th>\n",
" <td>64.574997</td>\n",
" <td>65.617500</td>\n",
" <td>64.570000</td>\n",
" <td>65.550003</td>\n",
" <td>65.100876</td>\n",
" <td>81821200</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-12</th>\n",
" <td>65.387497</td>\n",
" <td>65.697502</td>\n",
" <td>65.230003</td>\n",
" <td>65.489998</td>\n",
" <td>65.041283</td>\n",
" <td>87388800</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-13</th>\n",
" <td>65.282501</td>\n",
" <td>66.195000</td>\n",
" <td>65.267502</td>\n",
" <td>66.117500</td>\n",
" <td>65.664490</td>\n",
" <td>102734400</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-14</th>\n",
" <td>65.937500</td>\n",
" <td>66.220001</td>\n",
" <td>65.525002</td>\n",
" <td>65.660004</td>\n",
" <td>65.210121</td>\n",
" <td>89182800</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-15</th>\n",
" <td>65.919998</td>\n",
" <td>66.445000</td>\n",
" <td>65.752502</td>\n",
" <td>66.440002</td>\n",
" <td>65.984779</td>\n",
" <td>100206400</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Open High Low Close Adj Close Volume\n",
"Date \n",
"2019-10-30 61.189999 61.325001 60.302502 60.814999 60.217525 124522000\n",
"2019-10-31 61.810001 62.292500 59.314999 62.189999 61.579021 139162000\n",
"2019-11-01 62.384998 63.982498 62.290001 63.955002 63.326683 151125200\n",
"2019-11-04 64.332497 64.462502 63.845001 64.375000 63.742554 103272000\n",
"2019-11-05 64.262497 64.547501 64.080002 64.282501 63.650970 79897600\n",
"2019-11-06 64.192497 64.372498 63.842499 64.309998 63.678192 75864400\n",
"2019-11-07 64.684998 65.087502 64.527496 64.857498 64.413116 94940400\n",
"2019-11-08 64.672501 65.110001 64.212502 65.035004 64.589409 69986400\n",
"2019-11-11 64.574997 65.617500 64.570000 65.550003 65.100876 81821200\n",
"2019-11-12 65.387497 65.697502 65.230003 65.489998 65.041283 87388800\n",
"2019-11-13 65.282501 66.195000 65.267502 66.117500 65.664490 102734400\n",
"2019-11-14 65.937500 66.220001 65.525002 65.660004 65.210121 89182800\n",
"2019-11-15 65.919998 66.445000 65.752502 66.440002 65.984779 100206400"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Label-based slicing with .loc includes both the start and the end date\n",
"aapl_proper_index.loc[\"2019-10-30\":\"2019-11-15\"]"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "b0c747d8",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Open</th>\n",
" <th>High</th>\n",
" <th>Low</th>\n",
" <th>Close</th>\n",
" <th>Adj Close</th>\n",
" <th>Volume</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Date</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2019-11-01</th>\n",
" <td>62.384998</td>\n",
" <td>63.982498</td>\n",
" <td>62.290001</td>\n",
" <td>63.955002</td>\n",
" <td>63.326683</td>\n",
" <td>151125200</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-04</th>\n",
" <td>64.332497</td>\n",
" <td>64.462502</td>\n",
" <td>63.845001</td>\n",
" <td>64.375000</td>\n",
" <td>63.742554</td>\n",
" <td>103272000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-05</th>\n",
" <td>64.262497</td>\n",
" <td>64.547501</td>\n",
" <td>64.080002</td>\n",
" <td>64.282501</td>\n",
" <td>63.650970</td>\n",
" <td>79897600</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-06</th>\n",
" <td>64.192497</td>\n",
" <td>64.372498</td>\n",
" <td>63.842499</td>\n",
" <td>64.309998</td>\n",
" <td>63.678192</td>\n",
" <td>75864400</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-07</th>\n",
" <td>64.684998</td>\n",
" <td>65.087502</td>\n",
" <td>64.527496</td>\n",
" <td>64.857498</td>\n",
" <td>64.413116</td>\n",
" <td>94940400</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-08</th>\n",
" <td>64.672501</td>\n",
" <td>65.110001</td>\n",
" <td>64.212502</td>\n",
" <td>65.035004</td>\n",
" <td>64.589409</td>\n",
" <td>69986400</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-11</th>\n",
" <td>64.574997</td>\n",
" <td>65.617500</td>\n",
" <td>64.570000</td>\n",
" <td>65.550003</td>\n",
" <td>65.100876</td>\n",
" <td>81821200</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-12</th>\n",
" <td>65.387497</td>\n",
" <td>65.697502</td>\n",
" <td>65.230003</td>\n",
" <td>65.489998</td>\n",
" <td>65.041283</td>\n",
" <td>87388800</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-13</th>\n",
" <td>65.282501</td>\n",
" <td>66.195000</td>\n",
" <td>65.267502</td>\n",
" <td>66.117500</td>\n",
" <td>65.664490</td>\n",
" <td>102734400</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-14</th>\n",
" <td>65.937500</td>\n",
" <td>66.220001</td>\n",
" <td>65.525002</td>\n",
" <td>65.660004</td>\n",
" <td>65.210121</td>\n",
" <td>89182800</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-15</th>\n",
" <td>65.919998</td>\n",
" <td>66.445000</td>\n",
" <td>65.752502</td>\n",
" <td>66.440002</td>\n",
" <td>65.984779</td>\n",
" <td>100206400</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-18</th>\n",
" <td>66.449997</td>\n",
" <td>66.857498</td>\n",
" <td>66.057503</td>\n",
" <td>66.775002</td>\n",
" <td>66.317490</td>\n",
" <td>86703200</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-19</th>\n",
" <td>66.974998</td>\n",
" <td>67.000000</td>\n",
" <td>66.347504</td>\n",
" <td>66.572502</td>\n",
" <td>66.116371</td>\n",
" <td>76167200</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-20</th>\n",
" <td>66.385002</td>\n",
" <td>66.519997</td>\n",
" <td>65.099998</td>\n",
" <td>65.797501</td>\n",
" <td>65.346687</td>\n",
" <td>106234400</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-21</th>\n",
" <td>65.922501</td>\n",
" <td>66.002502</td>\n",
" <td>65.294998</td>\n",
" <td>65.502502</td>\n",
" <td>65.053703</td>\n",
" <td>121395200</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-22</th>\n",
" <td>65.647499</td>\n",
" <td>65.794998</td>\n",
" <td>65.209999</td>\n",
" <td>65.445000</td>\n",
" <td>64.996597</td>\n",
" <td>65325200</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-25</th>\n",
" <td>65.677498</td>\n",
" <td>66.610001</td>\n",
" <td>65.629997</td>\n",
" <td>66.592499</td>\n",
" <td>66.136230</td>\n",
" <td>84020400</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-26</th>\n",
" <td>66.735001</td>\n",
" <td>66.790001</td>\n",
" <td>65.625000</td>\n",
" <td>66.072502</td>\n",
" <td>65.619789</td>\n",
" <td>105207600</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-27</th>\n",
" <td>66.394997</td>\n",
" <td>66.995003</td>\n",
" <td>66.327499</td>\n",
" <td>66.959999</td>\n",
" <td>66.501213</td>\n",
" <td>65235600</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-29</th>\n",
" <td>66.650002</td>\n",
" <td>67.000000</td>\n",
" <td>66.474998</td>\n",
" <td>66.812500</td>\n",
" <td>66.354729</td>\n",
" <td>46617600</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Open High Low Close Adj Close Volume\n",
"Date \n",
"2019-11-01 62.384998 63.982498 62.290001 63.955002 63.326683 151125200\n",
"2019-11-04 64.332497 64.462502 63.845001 64.375000 63.742554 103272000\n",
"2019-11-05 64.262497 64.547501 64.080002 64.282501 63.650970 79897600\n",
"2019-11-06 64.192497 64.372498 63.842499 64.309998 63.678192 75864400\n",
"2019-11-07 64.684998 65.087502 64.527496 64.857498 64.413116 94940400\n",
"2019-11-08 64.672501 65.110001 64.212502 65.035004 64.589409 69986400\n",
"2019-11-11 64.574997 65.617500 64.570000 65.550003 65.100876 81821200\n",
"2019-11-12 65.387497 65.697502 65.230003 65.489998 65.041283 87388800\n",
"2019-11-13 65.282501 66.195000 65.267502 66.117500 65.664490 102734400\n",
"2019-11-14 65.937500 66.220001 65.525002 65.660004 65.210121 89182800\n",
"2019-11-15 65.919998 66.445000 65.752502 66.440002 65.984779 100206400\n",
"2019-11-18 66.449997 66.857498 66.057503 66.775002 66.317490 86703200\n",
"2019-11-19 66.974998 67.000000 66.347504 66.572502 66.116371 76167200\n",
"2019-11-20 66.385002 66.519997 65.099998 65.797501 65.346687 106234400\n",
"2019-11-21 65.922501 66.002502 65.294998 65.502502 65.053703 121395200\n",
"2019-11-22 65.647499 65.794998 65.209999 65.445000 64.996597 65325200\n",
"2019-11-25 65.677498 66.610001 65.629997 66.592499 66.136230 84020400\n",
"2019-11-26 66.735001 66.790001 65.625000 66.072502 65.619789 105207600\n",
"2019-11-27 66.394997 66.995003 66.327499 66.959999 66.501213 65235600\n",
"2019-11-29 66.650002 67.000000 66.474998 66.812500 66.354729 46617600"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Partial string indexing: \"2019-11\" selects every row in November 2019\n",
"aapl_proper_index.loc[\"2019-11\"]"
]
},
{
"cell_type": "markdown",
"id": "de65e36f",
"metadata": {},
"source": [
"### 3.5 Getting data from a DataFrame (getting a single column as a Series)"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "ff87318b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Date\n",
"2019-11-01 63.955002\n",
"2019-11-04 64.375000\n",
"2019-11-05 64.282501\n",
"2019-11-06 64.309998\n",
"2019-11-07 64.857498\n",
"2019-11-08 65.035004\n",
"2019-11-11 65.550003\n",
"2019-11-12 65.489998\n",
"2019-11-13 66.117500\n",
"2019-11-14 65.660004\n",
"2019-11-15 66.440002\n",
"2019-11-18 66.775002\n",
"2019-11-19 66.572502\n",
"2019-11-20 65.797501\n",
"2019-11-21 65.502502\n",
"2019-11-22 65.445000\n",
"2019-11-25 66.592499\n",
"2019-11-26 66.072502\n",
"2019-11-27 66.959999\n",
"2019-11-29 66.812500\n",
"Name: Close, dtype: float64"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"aapl_proper_index.loc[\"2019-11\", \"Close\"]"
]
},
{
"cell_type": "markdown",
"id": "1b48da3d",
"metadata": {},
"source": [
"### 3.6 Getting data from a DataFrame (getting multiple columns)"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "a39c7d23",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Open</th>\n",
" <th>Close</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Date</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2019-11-01</th>\n",
" <td>62.384998</td>\n",
" <td>63.955002</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-04</th>\n",
" <td>64.332497</td>\n",
" <td>64.375000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-05</th>\n",
" <td>64.262497</td>\n",
" <td>64.282501</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-06</th>\n",
" <td>64.192497</td>\n",
" <td>64.309998</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-07</th>\n",
" <td>64.684998</td>\n",
" <td>64.857498</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-08</th>\n",
" <td>64.672501</td>\n",
" <td>65.035004</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-11</th>\n",
" <td>64.574997</td>\n",
" <td>65.550003</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-12</th>\n",
" <td>65.387497</td>\n",
" <td>65.489998</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-13</th>\n",
" <td>65.282501</td>\n",
" <td>66.117500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-14</th>\n",
" <td>65.937500</td>\n",
" <td>65.660004</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-15</th>\n",
" <td>65.919998</td>\n",
" <td>66.440002</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-18</th>\n",
" <td>66.449997</td>\n",
" <td>66.775002</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-19</th>\n",
" <td>66.974998</td>\n",
" <td>66.572502</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-20</th>\n",
" <td>66.385002</td>\n",
" <td>65.797501</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-21</th>\n",
" <td>65.922501</td>\n",
" <td>65.502502</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-22</th>\n",
" <td>65.647499</td>\n",
" <td>65.445000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-25</th>\n",
" <td>65.677498</td>\n",
" <td>66.592499</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-26</th>\n",
" <td>66.735001</td>\n",
" <td>66.072502</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-27</th>\n",
" <td>66.394997</td>\n",
" <td>66.959999</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-29</th>\n",
" <td>66.650002</td>\n",
" <td>66.812500</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Open Close\n",
"Date \n",
"2019-11-01 62.384998 63.955002\n",
"2019-11-04 64.332497 64.375000\n",
"2019-11-05 64.262497 64.282501\n",
"2019-11-06 64.192497 64.309998\n",
"2019-11-07 64.684998 64.857498\n",
"2019-11-08 64.672501 65.035004\n",
"2019-11-11 64.574997 65.550003\n",
"2019-11-12 65.387497 65.489998\n",
"2019-11-13 65.282501 66.117500\n",
"2019-11-14 65.937500 65.660004\n",
"2019-11-15 65.919998 66.440002\n",
"2019-11-18 66.449997 66.775002\n",
"2019-11-19 66.974998 66.572502\n",
"2019-11-20 66.385002 65.797501\n",
"2019-11-21 65.922501 65.502502\n",
"2019-11-22 65.647499 65.445000\n",
"2019-11-25 65.677498 66.592499\n",
"2019-11-26 66.735001 66.072502\n",
"2019-11-27 66.394997 66.959999\n",
"2019-11-29 66.650002 66.812500"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"aapl_proper_index.loc[\"2019-11\", [\"Open\", \"Close\"]]"
]
},
{
"cell_type": "markdown",
"id": "7fe98d18",
"metadata": {},
"source": [
"### 3.7 Getting data from a DataFrame (when the index is not a date or an integer)"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "a3e09fd0",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"sales 10000\n",
"costs 300000\n",
"Name: Central Branch, dtype: int64"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# .loc looks up a row by its index label and returns it as a Series\n",
"branch_summary.loc[\"Central Branch\"]"
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "114ba40f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Central Branch 10000\n",
"TST Branch 2000\n",
"Mongkok Branch 3000\n",
"Name: sales, dtype: int64"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Square brackets with a column name return that column as a Series\n",
"branch_summary[\"sales\"]"
]
},
{
"cell_type": "markdown",
"id": "15222ed8",
"metadata": {},
"source": [
"### 3.8 Getting data from a DataFrame (using the implicit integer index with `iloc`)"
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "72c1e9ba",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['Central Branch', 'TST Branch', 'Mongkok Branch'], dtype='object')"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"branch_summary.index"
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "e34b9495",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"DatetimeIndex(['2019-10-28', '2019-10-29', '2019-10-30', '2019-10-31',\n",
" '2019-11-01', '2019-11-04', '2019-11-05', '2019-11-06',\n",
" '2019-11-07', '2019-11-08',\n",
" ...\n",
" '2020-10-14', '2020-10-15', '2020-10-16', '2020-10-19',\n",
" '2020-10-20', '2020-10-21', '2020-10-22', '2020-10-23',\n",
" '2020-10-26', '2020-10-27'],\n",
" dtype='datetime64[ns]', name='Date', length=253, freq=None)"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"aapl_proper_index.index"
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "a32d711d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"RangeIndex(start=0, stop=253, step=1)"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"aapl.index"
]
},
{
"cell_type": "code",
"execution_count": 32,
"id": "0836e844",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Open</th>\n",
" <th>High</th>\n",
" <th>Low</th>\n",
" <th>Close</th>\n",
" <th>Adj Close</th>\n",
" <th>Volume</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Date</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2019-10-28</th>\n",
" <td>61.855000</td>\n",
" <td>62.312500</td>\n",
" <td>61.680000</td>\n",
" <td>62.262501</td>\n",
" <td>61.650810</td>\n",
" <td>96572800</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-10-29</th>\n",
" <td>62.242500</td>\n",
" <td>62.437500</td>\n",
" <td>60.642502</td>\n",
" <td>60.822498</td>\n",
" <td>60.224953</td>\n",
" <td>142839600</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-10-30</th>\n",
" <td>61.189999</td>\n",
" <td>61.325001</td>\n",
" <td>60.302502</td>\n",
" <td>60.814999</td>\n",
" <td>60.217525</td>\n",
" <td>124522000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-10-31</th>\n",
" <td>61.810001</td>\n",
" <td>62.292500</td>\n",
" <td>59.314999</td>\n",
" <td>62.189999</td>\n",
" <td>61.579021</td>\n",
" <td>139162000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-01</th>\n",
" <td>62.384998</td>\n",
" <td>63.982498</td>\n",
" <td>62.290001</td>\n",
" <td>63.955002</td>\n",
" <td>63.326683</td>\n",
" <td>151125200</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-04</th>\n",
" <td>64.332497</td>\n",
" <td>64.462502</td>\n",
" <td>63.845001</td>\n",
" <td>64.375000</td>\n",
" <td>63.742554</td>\n",
" <td>103272000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-05</th>\n",
" <td>64.262497</td>\n",
" <td>64.547501</td>\n",
" <td>64.080002</td>\n",
" <td>64.282501</td>\n",
" <td>63.650970</td>\n",
" <td>79897600</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-06</th>\n",
" <td>64.192497</td>\n",
" <td>64.372498</td>\n",
" <td>63.842499</td>\n",
" <td>64.309998</td>\n",
" <td>63.678192</td>\n",
" <td>75864400</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-07</th>\n",
" <td>64.684998</td>\n",
" <td>65.087502</td>\n",
" <td>64.527496</td>\n",
" <td>64.857498</td>\n",
" <td>64.413116</td>\n",
" <td>94940400</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-08</th>\n",
" <td>64.672501</td>\n",
" <td>65.110001</td>\n",
" <td>64.212502</td>\n",
" <td>65.035004</td>\n",
" <td>64.589409</td>\n",
" <td>69986400</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Open High Low Close Adj Close Volume\n",
"Date \n",
"2019-10-28 61.855000 62.312500 61.680000 62.262501 61.650810 96572800\n",
"2019-10-29 62.242500 62.437500 60.642502 60.822498 60.224953 142839600\n",
"2019-10-30 61.189999 61.325001 60.302502 60.814999 60.217525 124522000\n",
"2019-10-31 61.810001 62.292500 59.314999 62.189999 61.579021 139162000\n",
"2019-11-01 62.384998 63.982498 62.290001 63.955002 63.326683 151125200\n",
"2019-11-04 64.332497 64.462502 63.845001 64.375000 63.742554 103272000\n",
"2019-11-05 64.262497 64.547501 64.080002 64.282501 63.650970 79897600\n",
"2019-11-06 64.192497 64.372498 63.842499 64.309998 63.678192 75864400\n",
"2019-11-07 64.684998 65.087502 64.527496 64.857498 64.413116 94940400\n",
"2019-11-08 64.672501 65.110001 64.212502 65.035004 64.589409 69986400"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
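],
"source": [
"# .iloc selects by integer position, so 0:10 returns the first 10 rows regardless of the index labels\n",
"aapl_proper_index.iloc[0:10]"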
]
},
{
"cell_type": "markdown",
"id": "d7a78ecf",
"metadata": {},
"source": [
"## 4. Filtering"
]
},
{
"cell_type": "markdown",
"id": "7706e9dd",
"metadata": {},
"source": [
"### 4.1 Single condition"
]
},
{
"cell_type": "code",
"execution_count": 33,
"id": "ba29f518",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Date\n",
"2019-10-28 False\n",
"2019-10-29 False\n",
"2019-10-30 False\n",
"2019-10-31 False\n",
"2019-11-01 False\n",
" ... \n",
"2020-10-21 True\n",
"2020-10-22 True\n",
"2020-10-23 True\n",
"2020-10-26 True\n",
"2020-10-27 True\n",
"Name: Open, Length: 253, dtype: bool"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Comparing a column with a number gives a boolean Series, one True/False per row\n",
"aapl_proper_index[\"Open\"] > 100"
]
},
{
"cell_type": "code",
"execution_count": 34,
"id": "22ac3ad1",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Open</th>\n",
" <th>High</th>\n",
" <th>Low</th>\n",
" <th>Close</th>\n",
" <th>Adj Close</th>\n",
" <th>Volume</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Date</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2020-07-31</th>\n",
" <td>102.885002</td>\n",
" <td>106.415001</td>\n",
" <td>100.824997</td>\n",
" <td>106.260002</td>\n",
" <td>106.068756</td>\n",
" <td>374336800</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-03</th>\n",
" <td>108.199997</td>\n",
" <td>111.637497</td>\n",
" <td>107.892502</td>\n",
" <td>108.937500</td>\n",
" <td>108.741440</td>\n",
" <td>308151200</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-04</th>\n",
" <td>109.132500</td>\n",
" <td>110.790001</td>\n",
" <td>108.387497</td>\n",
" <td>109.665001</td>\n",
" <td>109.467628</td>\n",
" <td>173071600</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-05</th>\n",
" <td>109.377502</td>\n",
" <td>110.392502</td>\n",
" <td>108.897499</td>\n",
" <td>110.062500</td>\n",
" <td>109.864410</td>\n",
" <td>121992000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-06</th>\n",
" <td>110.404999</td>\n",
" <td>114.412498</td>\n",
" <td>109.797501</td>\n",
" <td>113.902496</td>\n",
" <td>113.697502</td>\n",
" <td>202428800</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-10-21</th>\n",
" <td>116.669998</td>\n",
" <td>118.709999</td>\n",
" <td>116.449997</td>\n",
" <td>116.870003</td>\n",
" <td>116.870003</td>\n",
" <td>89946000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-10-22</th>\n",
" <td>117.449997</td>\n",
" <td>118.040001</td>\n",
" <td>114.589996</td>\n",
" <td>115.750000</td>\n",
" <td>115.750000</td>\n",
" <td>101988000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-10-23</th>\n",
" <td>116.389999</td>\n",
" <td>116.550003</td>\n",
" <td>114.279999</td>\n",
" <td>115.040001</td>\n",
" <td>115.040001</td>\n",
" <td>82572600</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-10-26</th>\n",
" <td>114.010002</td>\n",
" <td>116.550003</td>\n",
" <td>112.879997</td>\n",
" <td>115.050003</td>\n",
" <td>115.050003</td>\n",
" <td>111850700</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-10-27</th>\n",
" <td>115.489998</td>\n",
" <td>117.279999</td>\n",
" <td>114.540001</td>\n",
" <td>116.599998</td>\n",
" <td>116.599998</td>\n",
" <td>91927700</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>62 rows × 6 columns</p>\n",
"</div>"
],
"text/plain": [
" Open High Low Close Adj Close \\\n",
"Date \n",
"2020-07-31 102.885002 106.415001 100.824997 106.260002 106.068756 \n",
"2020-08-03 108.199997 111.637497 107.892502 108.937500 108.741440 \n",
"2020-08-04 109.132500 110.790001 108.387497 109.665001 109.467628 \n",
"2020-08-05 109.377502 110.392502 108.897499 110.062500 109.864410 \n",
"2020-08-06 110.404999 114.412498 109.797501 113.902496 113.697502 \n",
"... ... ... ... ... ... \n",
"2020-10-21 116.669998 118.709999 116.449997 116.870003 116.870003 \n",
"2020-10-22 117.449997 118.040001 114.589996 115.750000 115.750000 \n",
"2020-10-23 116.389999 116.550003 114.279999 115.040001 115.040001 \n",
"2020-10-26 114.010002 116.550003 112.879997 115.050003 115.050003 \n",
"2020-10-27 115.489998 117.279999 114.540001 116.599998 116.599998 \n",
"\n",
" Volume \n",
"Date \n",
"2020-07-31 374336800 \n",
"2020-08-03 308151200 \n",
"2020-08-04 173071600 \n",
"2020-08-05 121992000 \n",
"2020-08-06 202428800 \n",
"... ... \n",
"2020-10-21 89946000 \n",
"2020-10-22 101988000 \n",
"2020-10-23 82572600 \n",
"2020-10-26 111850700 \n",
"2020-10-27 91927700 \n",
"\n",
"[62 rows x 6 columns]"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Passing the boolean Series back to the DataFrame keeps only the rows where it is True\n",
"aapl_proper_index[aapl_proper_index[\"Open\"] > 100]"
]
},
{
"cell_type": "markdown",
"id": "d0ff9063",
"metadata": {},
"source": [
"### 4.2 Multiple conditions"
]
},
{
"cell_type": "code",
"execution_count": 35,
"id": "1c5574e2",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Date\n",
"2019-10-28 False\n",
"2019-10-29 False\n",
"2019-10-30 False\n",
"2019-10-31 False\n",
"2019-11-01 False\n",
" ... \n",
"2020-10-21 False\n",
"2020-10-22 True\n",
"2020-10-23 False\n",
"2020-10-26 True\n",
"2020-10-27 False\n",
"Length: 253, dtype: bool"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Combine conditions with & (element-wise AND); wrap each condition in parentheses\n",
"(aapl_proper_index[\"Open\"] > 100) & (aapl_proper_index[\"Volume\"] > 100000000)"
]
},
{
"cell_type": "code",
"execution_count": 36,
"id": "6ab72ebf",
"metadata": {},
"outputs": [],
"source": [
"cond = (aapl_proper_index[\"Open\"] > 100) & (aapl_proper_index[\"Volume\"] > 100000000)"
]
},
{
"cell_type": "code",
"execution_count": 37,
"id": "cfe5d612",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Open</th>\n",
" <th>High</th>\n",
" <th>Low</th>\n",
" <th>Close</th>\n",
" <th>Adj Close</th>\n",
" <th>Volume</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Date</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2020-07-31</th>\n",
" <td>102.885002</td>\n",
" <td>106.415001</td>\n",
" <td>100.824997</td>\n",
" <td>106.260002</td>\n",
" <td>106.068756</td>\n",
" <td>374336800</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-03</th>\n",
" <td>108.199997</td>\n",
" <td>111.637497</td>\n",
" <td>107.892502</td>\n",
" <td>108.937500</td>\n",
" <td>108.741440</td>\n",
" <td>308151200</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-04</th>\n",
" <td>109.132500</td>\n",
" <td>110.790001</td>\n",
" <td>108.387497</td>\n",
" <td>109.665001</td>\n",
" <td>109.467628</td>\n",
" <td>173071600</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-05</th>\n",
" <td>109.377502</td>\n",
" <td>110.392502</td>\n",
" <td>108.897499</td>\n",
" <td>110.062500</td>\n",
" <td>109.864410</td>\n",
" <td>121992000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-06</th>\n",
" <td>110.404999</td>\n",
" <td>114.412498</td>\n",
" <td>109.797501</td>\n",
" <td>113.902496</td>\n",
" <td>113.697502</td>\n",
" <td>202428800</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-07</th>\n",
" <td>113.205002</td>\n",
" <td>113.675003</td>\n",
" <td>110.292503</td>\n",
" <td>111.112503</td>\n",
" <td>111.112503</td>\n",
" <td>198045600</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-10</th>\n",
" <td>112.599998</td>\n",
" <td>113.775002</td>\n",
" <td>110.000000</td>\n",
" <td>112.727501</td>\n",
" <td>112.727501</td>\n",
" <td>212403600</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-11</th>\n",
" <td>111.970001</td>\n",
" <td>112.482498</td>\n",
" <td>109.107498</td>\n",
" <td>109.375000</td>\n",
" <td>109.375000</td>\n",
" <td>187902400</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-12</th>\n",
" <td>110.497498</td>\n",
" <td>113.275002</td>\n",
" <td>110.297501</td>\n",
" <td>113.010002</td>\n",
" <td>113.010002</td>\n",
" <td>165944800</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-13</th>\n",
" <td>114.430000</td>\n",
" <td>116.042503</td>\n",
" <td>113.927498</td>\n",
" <td>115.010002</td>\n",
" <td>115.010002</td>\n",
" <td>210082000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-14</th>\n",
" <td>114.830002</td>\n",
" <td>115.000000</td>\n",
" <td>113.044998</td>\n",
" <td>114.907501</td>\n",
" <td>114.907501</td>\n",
" <td>165565200</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-17</th>\n",
" <td>116.062500</td>\n",
" <td>116.087502</td>\n",
" <td>113.962502</td>\n",
" <td>114.607498</td>\n",
" <td>114.607498</td>\n",
" <td>119561600</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-18</th>\n",
" <td>114.352501</td>\n",
" <td>116.000000</td>\n",
" <td>114.007500</td>\n",
" <td>115.562500</td>\n",
" <td>115.562500</td>\n",
" <td>105633600</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-19</th>\n",
" <td>115.982498</td>\n",
" <td>117.162498</td>\n",
" <td>115.610001</td>\n",
" <td>115.707497</td>\n",
" <td>115.707497</td>\n",
" <td>145538000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-20</th>\n",
" <td>115.750000</td>\n",
" <td>118.392502</td>\n",
" <td>115.732498</td>\n",
" <td>118.275002</td>\n",
" <td>118.275002</td>\n",
" <td>126907200</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-21</th>\n",
" <td>119.262497</td>\n",
" <td>124.867500</td>\n",
" <td>119.250000</td>\n",
" <td>124.370003</td>\n",
" <td>124.370003</td>\n",
" <td>338054800</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-24</th>\n",
" <td>128.697495</td>\n",
" <td>128.785004</td>\n",
" <td>123.937500</td>\n",
" <td>125.857498</td>\n",
" <td>125.857498</td>\n",
" <td>345937600</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-25</th>\n",
" <td>124.697502</td>\n",
" <td>125.180000</td>\n",
" <td>123.052498</td>\n",
" <td>124.824997</td>\n",
" <td>124.824997</td>\n",
" <td>211495600</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-26</th>\n",
" <td>126.180000</td>\n",
" <td>126.992500</td>\n",
" <td>125.082497</td>\n",
" <td>126.522499</td>\n",
" <td>126.522499</td>\n",
" <td>163022400</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-27</th>\n",
" <td>127.142502</td>\n",
" <td>127.485001</td>\n",
" <td>123.832497</td>\n",
" <td>125.010002</td>\n",
" <td>125.010002</td>\n",
" <td>155552400</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-28</th>\n",
" <td>126.012497</td>\n",
" <td>126.442497</td>\n",
" <td>124.577499</td>\n",
" <td>124.807503</td>\n",
" <td>124.807503</td>\n",
" <td>187630000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-31</th>\n",
" <td>127.580002</td>\n",
" <td>131.000000</td>\n",
" <td>126.000000</td>\n",
" <td>129.039993</td>\n",
" <td>129.039993</td>\n",
" <td>225702700</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-09-01</th>\n",
" <td>132.759995</td>\n",
" <td>134.800003</td>\n",
" <td>130.529999</td>\n",
" <td>134.179993</td>\n",
" <td>134.179993</td>\n",
" <td>152470100</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-09-02</th>\n",
" <td>137.589996</td>\n",
" <td>137.979996</td>\n",
" <td>127.000000</td>\n",
" <td>131.399994</td>\n",
" <td>131.399994</td>\n",
" <td>200119000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-09-03</th>\n",
" <td>126.910004</td>\n",
" <td>128.839996</td>\n",
" <td>120.500000</td>\n",
" <td>120.879997</td>\n",
" <td>120.879997</td>\n",
" <td>257599600</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-09-04</th>\n",
" <td>120.070000</td>\n",
" <td>123.699997</td>\n",
" <td>110.889999</td>\n",
" <td>120.959999</td>\n",
" <td>120.959999</td>\n",
" <td>332607200</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-09-08</th>\n",
" <td>113.949997</td>\n",
" <td>118.989998</td>\n",
" <td>112.680000</td>\n",
" <td>112.820000</td>\n",
" <td>112.820000</td>\n",
" <td>231366600</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-09-09</th>\n",
" <td>117.260002</td>\n",
" <td>119.139999</td>\n",
" <td>115.260002</td>\n",
" <td>117.320000</td>\n",
" <td>117.320000</td>\n",
" <td>176940500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-09-10</th>\n",
" <td>120.360001</td>\n",
" <td>120.500000</td>\n",
" <td>112.500000</td>\n",
" <td>113.489998</td>\n",
" <td>113.489998</td>\n",
" <td>182274400</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-09-11</th>\n",
" <td>114.570000</td>\n",
" <td>115.230003</td>\n",
" <td>110.000000</td>\n",
" <td>112.000000</td>\n",
" <td>112.000000</td>\n",
" <td>180860300</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-09-14</th>\n",
" <td>114.720001</td>\n",
" <td>115.930000</td>\n",
" <td>112.800003</td>\n",
" <td>115.360001</td>\n",
" <td>115.360001</td>\n",
" <td>140150100</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-09-15</th>\n",
" <td>118.330002</td>\n",
" <td>118.830002</td>\n",
" <td>113.610001</td>\n",
" <td>115.540001</td>\n",
" <td>115.540001</td>\n",
" <td>184642000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-09-16</th>\n",
" <td>115.230003</td>\n",
" <td>116.000000</td>\n",
" <td>112.040001</td>\n",
" <td>112.129997</td>\n",
" <td>112.129997</td>\n",
" <td>154679000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-09-17</th>\n",
" <td>109.720001</td>\n",
" <td>112.199997</td>\n",
" <td>108.709999</td>\n",
" <td>110.339996</td>\n",
" <td>110.339996</td>\n",
" <td>178011000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-09-18</th>\n",
" <td>110.400002</td>\n",
" <td>110.879997</td>\n",
" <td>106.089996</td>\n",
" <td>106.839996</td>\n",
" <td>106.839996</td>\n",
" <td>287104900</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-09-21</th>\n",
" <td>104.540001</td>\n",
" <td>110.190002</td>\n",
" <td>103.099998</td>\n",
" <td>110.080002</td>\n",
" <td>110.080002</td>\n",
" <td>195713800</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-09-22</th>\n",
" <td>112.680000</td>\n",
" <td>112.860001</td>\n",
" <td>109.160004</td>\n",
" <td>111.809998</td>\n",
" <td>111.809998</td>\n",
" <td>183055400</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-09-23</th>\n",
" <td>111.620003</td>\n",
" <td>112.110001</td>\n",
" <td>106.769997</td>\n",
" <td>107.120003</td>\n",
" <td>107.120003</td>\n",
" <td>150718700</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-09-24</th>\n",
" <td>105.169998</td>\n",
" <td>110.250000</td>\n",
" <td>105.000000</td>\n",
" <td>108.220001</td>\n",
" <td>108.220001</td>\n",
" <td>167743300</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-09-25</th>\n",
" <td>108.430000</td>\n",
" <td>112.440002</td>\n",
" <td>107.669998</td>\n",
" <td>112.279999</td>\n",
" <td>112.279999</td>\n",
" <td>149981400</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-09-28</th>\n",
" <td>115.010002</td>\n",
" <td>115.320000</td>\n",
" <td>112.779999</td>\n",
" <td>114.959999</td>\n",
" <td>114.959999</td>\n",
" <td>137672400</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-09-30</th>\n",
" <td>113.790001</td>\n",
" <td>117.260002</td>\n",
" <td>113.620003</td>\n",
" <td>115.809998</td>\n",
" <td>115.809998</td>\n",
" <td>142675200</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-10-01</th>\n",
" <td>117.639999</td>\n",
" <td>117.720001</td>\n",
" <td>115.830002</td>\n",
" <td>116.790001</td>\n",
" <td>116.790001</td>\n",
" <td>116120400</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-10-02</th>\n",
" <td>112.889999</td>\n",
" <td>115.370003</td>\n",
" <td>112.220001</td>\n",
" <td>113.019997</td>\n",
" <td>113.019997</td>\n",
" <td>144712000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-10-05</th>\n",
" <td>113.910004</td>\n",
" <td>116.650002</td>\n",
" <td>113.550003</td>\n",
" <td>116.500000</td>\n",
" <td>116.500000</td>\n",
" <td>106243800</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-10-06</th>\n",
" <td>115.699997</td>\n",
" <td>116.120003</td>\n",
" <td>112.250000</td>\n",
" <td>113.160004</td>\n",
" <td>113.160004</td>\n",
" <td>161498200</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-10-09</th>\n",
" <td>115.279999</td>\n",
" <td>117.000000</td>\n",
" <td>114.919998</td>\n",
" <td>116.970001</td>\n",
" <td>116.970001</td>\n",
" <td>100506900</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-10-12</th>\n",
" <td>120.059998</td>\n",
" <td>125.180000</td>\n",
" <td>119.279999</td>\n",
" <td>124.400002</td>\n",
" <td>124.400002</td>\n",
" <td>240226800</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-10-13</th>\n",
" <td>125.269997</td>\n",
" <td>125.389999</td>\n",
" <td>119.650002</td>\n",
" <td>121.099998</td>\n",
" <td>121.099998</td>\n",
" <td>262330500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-10-14</th>\n",
" <td>121.000000</td>\n",
" <td>123.029999</td>\n",
" <td>119.620003</td>\n",
" <td>121.190002</td>\n",
" <td>121.190002</td>\n",
" <td>151062300</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-10-15</th>\n",
" <td>118.720001</td>\n",
" <td>121.199997</td>\n",
" <td>118.150002</td>\n",
" <td>120.709999</td>\n",
" <td>120.709999</td>\n",
" <td>112559200</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-10-16</th>\n",
" <td>121.279999</td>\n",
" <td>121.550003</td>\n",
" <td>118.809998</td>\n",
" <td>119.019997</td>\n",
" <td>119.019997</td>\n",
" <td>115393800</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-10-19</th>\n",
" <td>119.959999</td>\n",
" <td>120.419998</td>\n",
" <td>115.660004</td>\n",
" <td>115.980003</td>\n",
" <td>115.980003</td>\n",
" <td>120639300</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-10-20</th>\n",
" <td>116.199997</td>\n",
" <td>118.980003</td>\n",
" <td>115.629997</td>\n",
" <td>117.510002</td>\n",
" <td>117.510002</td>\n",
" <td>124423700</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-10-22</th>\n",
" <td>117.449997</td>\n",
" <td>118.040001</td>\n",
" <td>114.589996</td>\n",
" <td>115.750000</td>\n",
" <td>115.750000</td>\n",
" <td>101988000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-10-26</th>\n",
" <td>114.010002</td>\n",
" <td>116.550003</td>\n",
" <td>112.879997</td>\n",
" <td>115.050003</td>\n",
" <td>115.050003</td>\n",
" <td>111850700</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Open High Low Close Adj Close \\\n",
"Date \n",
"2020-07-31 102.885002 106.415001 100.824997 106.260002 106.068756 \n",
"2020-08-03 108.199997 111.637497 107.892502 108.937500 108.741440 \n",
"2020-08-04 109.132500 110.790001 108.387497 109.665001 109.467628 \n",
"2020-08-05 109.377502 110.392502 108.897499 110.062500 109.864410 \n",
"2020-08-06 110.404999 114.412498 109.797501 113.902496 113.697502 \n",
"2020-08-07 113.205002 113.675003 110.292503 111.112503 111.112503 \n",
"2020-08-10 112.599998 113.775002 110.000000 112.727501 112.727501 \n",
"2020-08-11 111.970001 112.482498 109.107498 109.375000 109.375000 \n",
"2020-08-12 110.497498 113.275002 110.297501 113.010002 113.010002 \n",
"2020-08-13 114.430000 116.042503 113.927498 115.010002 115.010002 \n",
"2020-08-14 114.830002 115.000000 113.044998 114.907501 114.907501 \n",
"2020-08-17 116.062500 116.087502 113.962502 114.607498 114.607498 \n",
"2020-08-18 114.352501 116.000000 114.007500 115.562500 115.562500 \n",
"2020-08-19 115.982498 117.162498 115.610001 115.707497 115.707497 \n",
"2020-08-20 115.750000 118.392502 115.732498 118.275002 118.275002 \n",
"2020-08-21 119.262497 124.867500 119.250000 124.370003 124.370003 \n",
"2020-08-24 128.697495 128.785004 123.937500 125.857498 125.857498 \n",
"2020-08-25 124.697502 125.180000 123.052498 124.824997 124.824997 \n",
"2020-08-26 126.180000 126.992500 125.082497 126.522499 126.522499 \n",
"2020-08-27 127.142502 127.485001 123.832497 125.010002 125.010002 \n",
"2020-08-28 126.012497 126.442497 124.577499 124.807503 124.807503 \n",
"2020-08-31 127.580002 131.000000 126.000000 129.039993 129.039993 \n",
"2020-09-01 132.759995 134.800003 130.529999 134.179993 134.179993 \n",
"2020-09-02 137.589996 137.979996 127.000000 131.399994 131.399994 \n",
"2020-09-03 126.910004 128.839996 120.500000 120.879997 120.879997 \n",
"2020-09-04 120.070000 123.699997 110.889999 120.959999 120.959999 \n",
"2020-09-08 113.949997 118.989998 112.680000 112.820000 112.820000 \n",
"2020-09-09 117.260002 119.139999 115.260002 117.320000 117.320000 \n",
"2020-09-10 120.360001 120.500000 112.500000 113.489998 113.489998 \n",
"2020-09-11 114.570000 115.230003 110.000000 112.000000 112.000000 \n",
"2020-09-14 114.720001 115.930000 112.800003 115.360001 115.360001 \n",
"2020-09-15 118.330002 118.830002 113.610001 115.540001 115.540001 \n",
"2020-09-16 115.230003 116.000000 112.040001 112.129997 112.129997 \n",
"2020-09-17 109.720001 112.199997 108.709999 110.339996 110.339996 \n",
"2020-09-18 110.400002 110.879997 106.089996 106.839996 106.839996 \n",
"2020-09-21 104.540001 110.190002 103.099998 110.080002 110.080002 \n",
"2020-09-22 112.680000 112.860001 109.160004 111.809998 111.809998 \n",
"2020-09-23 111.620003 112.110001 106.769997 107.120003 107.120003 \n",
"2020-09-24 105.169998 110.250000 105.000000 108.220001 108.220001 \n",
"2020-09-25 108.430000 112.440002 107.669998 112.279999 112.279999 \n",
"2020-09-28 115.010002 115.320000 112.779999 114.959999 114.959999 \n",
"2020-09-30 113.790001 117.260002 113.620003 115.809998 115.809998 \n",
"2020-10-01 117.639999 117.720001 115.830002 116.790001 116.790001 \n",
"2020-10-02 112.889999 115.370003 112.220001 113.019997 113.019997 \n",
"2020-10-05 113.910004 116.650002 113.550003 116.500000 116.500000 \n",
"2020-10-06 115.699997 116.120003 112.250000 113.160004 113.160004 \n",
"2020-10-09 115.279999 117.000000 114.919998 116.970001 116.970001 \n",
"2020-10-12 120.059998 125.180000 119.279999 124.400002 124.400002 \n",
"2020-10-13 125.269997 125.389999 119.650002 121.099998 121.099998 \n",
"2020-10-14 121.000000 123.029999 119.620003 121.190002 121.190002 \n",
"2020-10-15 118.720001 121.199997 118.150002 120.709999 120.709999 \n",
"2020-10-16 121.279999 121.550003 118.809998 119.019997 119.019997 \n",
"2020-10-19 119.959999 120.419998 115.660004 115.980003 115.980003 \n",
"2020-10-20 116.199997 118.980003 115.629997 117.510002 117.510002 \n",
"2020-10-22 117.449997 118.040001 114.589996 115.750000 115.750000 \n",
"2020-10-26 114.010002 116.550003 112.879997 115.050003 115.050003 \n",
"\n",
" Volume \n",
"Date \n",
"2020-07-31 374336800 \n",
"2020-08-03 308151200 \n",
"2020-08-04 173071600 \n",
"2020-08-05 121992000 \n",
"2020-08-06 202428800 \n",
"2020-08-07 198045600 \n",
"2020-08-10 212403600 \n",
"2020-08-11 187902400 \n",
"2020-08-12 165944800 \n",
"2020-08-13 210082000 \n",
"2020-08-14 165565200 \n",
"2020-08-17 119561600 \n",
"2020-08-18 105633600 \n",
"2020-08-19 145538000 \n",
"2020-08-20 126907200 \n",
"2020-08-21 338054800 \n",
"2020-08-24 345937600 \n",
"2020-08-25 211495600 \n",
"2020-08-26 163022400 \n",
"2020-08-27 155552400 \n",
"2020-08-28 187630000 \n",
"2020-08-31 225702700 \n",
"2020-09-01 152470100 \n",
"2020-09-02 200119000 \n",
"2020-09-03 257599600 \n",
"2020-09-04 332607200 \n",
"2020-09-08 231366600 \n",
"2020-09-09 176940500 \n",
"2020-09-10 182274400 \n",
"2020-09-11 180860300 \n",
"2020-09-14 140150100 \n",
"2020-09-15 184642000 \n",
"2020-09-16 154679000 \n",
"2020-09-17 178011000 \n",
"2020-09-18 287104900 \n",
"2020-09-21 195713800 \n",
"2020-09-22 183055400 \n",
"2020-09-23 150718700 \n",
"2020-09-24 167743300 \n",
"2020-09-25 149981400 \n",
"2020-09-28 137672400 \n",
"2020-09-30 142675200 \n",
"2020-10-01 116120400 \n",
"2020-10-02 144712000 \n",
"2020-10-05 106243800 \n",
"2020-10-06 161498200 \n",
"2020-10-09 100506900 \n",
"2020-10-12 240226800 \n",
"2020-10-13 262330500 \n",
"2020-10-14 151062300 \n",
"2020-10-15 112559200 \n",
"2020-10-16 115393800 \n",
"2020-10-19 120639300 \n",
"2020-10-20 124423700 \n",
"2020-10-22 101988000 \n",
"2020-10-26 111850700 "
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"aapl_proper_index[cond]"
]
},
{
"cell_type": "markdown",
"id": "649b6f86",
"metadata": {},
"source": [
"#### Side notes"
]
},
{
"cell_type": "markdown",
"id": "599252b7",
"metadata": {},
"source": [
"Use `head(n)` to show the first `n` rows of the filtered result and `tail(n)` to show the last `n` rows."
]
},
{
"cell_type": "code",
"execution_count": 38,
"id": "3b0d0293",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Open</th>\n",
" <th>High</th>\n",
" <th>Low</th>\n",
" <th>Close</th>\n",
" <th>Adj Close</th>\n",
" <th>Volume</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Date</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2020-07-31</th>\n",
" <td>102.885002</td>\n",
" <td>106.415001</td>\n",
" <td>100.824997</td>\n",
" <td>106.260002</td>\n",
" <td>106.068756</td>\n",
" <td>374336800</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-03</th>\n",
" <td>108.199997</td>\n",
" <td>111.637497</td>\n",
" <td>107.892502</td>\n",
" <td>108.937500</td>\n",
" <td>108.741440</td>\n",
" <td>308151200</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-04</th>\n",
" <td>109.132500</td>\n",
" <td>110.790001</td>\n",
" <td>108.387497</td>\n",
" <td>109.665001</td>\n",
" <td>109.467628</td>\n",
" <td>173071600</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-05</th>\n",
" <td>109.377502</td>\n",
" <td>110.392502</td>\n",
" <td>108.897499</td>\n",
" <td>110.062500</td>\n",
" <td>109.864410</td>\n",
" <td>121992000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-06</th>\n",
" <td>110.404999</td>\n",
" <td>114.412498</td>\n",
" <td>109.797501</td>\n",
" <td>113.902496</td>\n",
" <td>113.697502</td>\n",
" <td>202428800</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-07</th>\n",
" <td>113.205002</td>\n",
" <td>113.675003</td>\n",
" <td>110.292503</td>\n",
" <td>111.112503</td>\n",
" <td>111.112503</td>\n",
" <td>198045600</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-10</th>\n",
" <td>112.599998</td>\n",
" <td>113.775002</td>\n",
" <td>110.000000</td>\n",
" <td>112.727501</td>\n",
" <td>112.727501</td>\n",
" <td>212403600</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-11</th>\n",
" <td>111.970001</td>\n",
" <td>112.482498</td>\n",
" <td>109.107498</td>\n",
" <td>109.375000</td>\n",
" <td>109.375000</td>\n",
" <td>187902400</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-12</th>\n",
" <td>110.497498</td>\n",
" <td>113.275002</td>\n",
" <td>110.297501</td>\n",
" <td>113.010002</td>\n",
" <td>113.010002</td>\n",
" <td>165944800</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-13</th>\n",
" <td>114.430000</td>\n",
" <td>116.042503</td>\n",
" <td>113.927498</td>\n",
" <td>115.010002</td>\n",
" <td>115.010002</td>\n",
" <td>210082000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-14</th>\n",
" <td>114.830002</td>\n",
" <td>115.000000</td>\n",
" <td>113.044998</td>\n",
" <td>114.907501</td>\n",
" <td>114.907501</td>\n",
" <td>165565200</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-17</th>\n",
" <td>116.062500</td>\n",
" <td>116.087502</td>\n",
" <td>113.962502</td>\n",
" <td>114.607498</td>\n",
" <td>114.607498</td>\n",
" <td>119561600</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-18</th>\n",
" <td>114.352501</td>\n",
" <td>116.000000</td>\n",
" <td>114.007500</td>\n",
" <td>115.562500</td>\n",
" <td>115.562500</td>\n",
" <td>105633600</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-19</th>\n",
" <td>115.982498</td>\n",
" <td>117.162498</td>\n",
" <td>115.610001</td>\n",
" <td>115.707497</td>\n",
" <td>115.707497</td>\n",
" <td>145538000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-20</th>\n",
" <td>115.750000</td>\n",
" <td>118.392502</td>\n",
" <td>115.732498</td>\n",
" <td>118.275002</td>\n",
" <td>118.275002</td>\n",
" <td>126907200</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-21</th>\n",
" <td>119.262497</td>\n",
" <td>124.867500</td>\n",
" <td>119.250000</td>\n",
" <td>124.370003</td>\n",
" <td>124.370003</td>\n",
" <td>338054800</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-24</th>\n",
" <td>128.697495</td>\n",
" <td>128.785004</td>\n",
" <td>123.937500</td>\n",
" <td>125.857498</td>\n",
" <td>125.857498</td>\n",
" <td>345937600</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-25</th>\n",
" <td>124.697502</td>\n",
" <td>125.180000</td>\n",
" <td>123.052498</td>\n",
" <td>124.824997</td>\n",
" <td>124.824997</td>\n",
" <td>211495600</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-26</th>\n",
" <td>126.180000</td>\n",
" <td>126.992500</td>\n",
" <td>125.082497</td>\n",
" <td>126.522499</td>\n",
" <td>126.522499</td>\n",
" <td>163022400</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-27</th>\n",
" <td>127.142502</td>\n",
" <td>127.485001</td>\n",
" <td>123.832497</td>\n",
" <td>125.010002</td>\n",
" <td>125.010002</td>\n",
" <td>155552400</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Open High Low Close Adj Close \\\n",
"Date \n",
"2020-07-31 102.885002 106.415001 100.824997 106.260002 106.068756 \n",
"2020-08-03 108.199997 111.637497 107.892502 108.937500 108.741440 \n",
"2020-08-04 109.132500 110.790001 108.387497 109.665001 109.467628 \n",
"2020-08-05 109.377502 110.392502 108.897499 110.062500 109.864410 \n",
"2020-08-06 110.404999 114.412498 109.797501 113.902496 113.697502 \n",
"2020-08-07 113.205002 113.675003 110.292503 111.112503 111.112503 \n",
"2020-08-10 112.599998 113.775002 110.000000 112.727501 112.727501 \n",
"2020-08-11 111.970001 112.482498 109.107498 109.375000 109.375000 \n",
"2020-08-12 110.497498 113.275002 110.297501 113.010002 113.010002 \n",
"2020-08-13 114.430000 116.042503 113.927498 115.010002 115.010002 \n",
"2020-08-14 114.830002 115.000000 113.044998 114.907501 114.907501 \n",
"2020-08-17 116.062500 116.087502 113.962502 114.607498 114.607498 \n",
"2020-08-18 114.352501 116.000000 114.007500 115.562500 115.562500 \n",
"2020-08-19 115.982498 117.162498 115.610001 115.707497 115.707497 \n",
"2020-08-20 115.750000 118.392502 115.732498 118.275002 118.275002 \n",
"2020-08-21 119.262497 124.867500 119.250000 124.370003 124.370003 \n",
"2020-08-24 128.697495 128.785004 123.937500 125.857498 125.857498 \n",
"2020-08-25 124.697502 125.180000 123.052498 124.824997 124.824997 \n",
"2020-08-26 126.180000 126.992500 125.082497 126.522499 126.522499 \n",
"2020-08-27 127.142502 127.485001 123.832497 125.010002 125.010002 \n",
"\n",
" Volume \n",
"Date \n",
"2020-07-31 374336800 \n",
"2020-08-03 308151200 \n",
"2020-08-04 173071600 \n",
"2020-08-05 121992000 \n",
"2020-08-06 202428800 \n",
"2020-08-07 198045600 \n",
"2020-08-10 212403600 \n",
"2020-08-11 187902400 \n",
"2020-08-12 165944800 \n",
"2020-08-13 210082000 \n",
"2020-08-14 165565200 \n",
"2020-08-17 119561600 \n",
"2020-08-18 105633600 \n",
"2020-08-19 145538000 \n",
"2020-08-20 126907200 \n",
"2020-08-21 338054800 \n",
"2020-08-24 345937600 \n",
"2020-08-25 211495600 \n",
"2020-08-26 163022400 \n",
"2020-08-27 155552400 "
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"aapl_proper_index[cond].head(20)"
]
},
{
"cell_type": "code",
"execution_count": 39,
"id": "d4bd5fc0",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Open</th>\n",
" <th>High</th>\n",
" <th>Low</th>\n",
" <th>Close</th>\n",
" <th>Adj Close</th>\n",
" <th>Volume</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Date</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2020-10-16</th>\n",
" <td>121.279999</td>\n",
" <td>121.550003</td>\n",
" <td>118.809998</td>\n",
" <td>119.019997</td>\n",
" <td>119.019997</td>\n",
" <td>115393800</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-10-19</th>\n",
" <td>119.959999</td>\n",
" <td>120.419998</td>\n",
" <td>115.660004</td>\n",
" <td>115.980003</td>\n",
" <td>115.980003</td>\n",
" <td>120639300</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-10-20</th>\n",
" <td>116.199997</td>\n",
" <td>118.980003</td>\n",
" <td>115.629997</td>\n",
" <td>117.510002</td>\n",
" <td>117.510002</td>\n",
" <td>124423700</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-10-22</th>\n",
" <td>117.449997</td>\n",
" <td>118.040001</td>\n",
" <td>114.589996</td>\n",
" <td>115.750000</td>\n",
" <td>115.750000</td>\n",
" <td>101988000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-10-26</th>\n",
" <td>114.010002</td>\n",
" <td>116.550003</td>\n",
" <td>112.879997</td>\n",
" <td>115.050003</td>\n",
" <td>115.050003</td>\n",
" <td>111850700</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Open High Low Close Adj Close \\\n",
"Date \n",
"2020-10-16 121.279999 121.550003 118.809998 119.019997 119.019997 \n",
"2020-10-19 119.959999 120.419998 115.660004 115.980003 115.980003 \n",
"2020-10-20 116.199997 118.980003 115.629997 117.510002 117.510002 \n",
"2020-10-22 117.449997 118.040001 114.589996 115.750000 115.750000 \n",
"2020-10-26 114.010002 116.550003 112.879997 115.050003 115.050003 \n",
"\n",
" Volume \n",
"Date \n",
"2020-10-16 115393800 \n",
"2020-10-19 120639300 \n",
"2020-10-20 124423700 \n",
"2020-10-22 101988000 \n",
"2020-10-26 111850700 "
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"aapl_proper_index[cond].tail(5)"
]
},
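{
"cell_type": "markdown",
"id": "e7fcccd2",
"metadata": {},
"source": [
"## 4.3 query\n",
"\n",
"`query()` filters rows using a condition written as a string, in which column names such as `Open` and `Volume` can be referenced directly."
]
},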
{
"cell_type": "code",
"execution_count": 40,
"id": "f866b899",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Open</th>\n",
" <th>High</th>\n",
" <th>Low</th>\n",
" <th>Close</th>\n",
" <th>Adj Close</th>\n",
" <th>Volume</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Date</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2020-07-31</th>\n",
" <td>102.885002</td>\n",
" <td>106.415001</td>\n",
" <td>100.824997</td>\n",
" <td>106.260002</td>\n",
" <td>106.068756</td>\n",
" <td>374336800</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-03</th>\n",
" <td>108.199997</td>\n",
" <td>111.637497</td>\n",
" <td>107.892502</td>\n",
" <td>108.937500</td>\n",
" <td>108.741440</td>\n",
" <td>308151200</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-04</th>\n",
" <td>109.132500</td>\n",
" <td>110.790001</td>\n",
" <td>108.387497</td>\n",
" <td>109.665001</td>\n",
" <td>109.467628</td>\n",
" <td>173071600</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-05</th>\n",
" <td>109.377502</td>\n",
" <td>110.392502</td>\n",
" <td>108.897499</td>\n",
" <td>110.062500</td>\n",
" <td>109.864410</td>\n",
" <td>121992000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-06</th>\n",
" <td>110.404999</td>\n",
" <td>114.412498</td>\n",
" <td>109.797501</td>\n",
" <td>113.902496</td>\n",
" <td>113.697502</td>\n",
" <td>202428800</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-07</th>\n",
" <td>113.205002</td>\n",
" <td>113.675003</td>\n",
" <td>110.292503</td>\n",
" <td>111.112503</td>\n",
" <td>111.112503</td>\n",
" <td>198045600</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-10</th>\n",
" <td>112.599998</td>\n",
" <td>113.775002</td>\n",
" <td>110.000000</td>\n",
" <td>112.727501</td>\n",
" <td>112.727501</td>\n",
" <td>212403600</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-11</th>\n",
" <td>111.970001</td>\n",
" <td>112.482498</td>\n",
" <td>109.107498</td>\n",
" <td>109.375000</td>\n",
" <td>109.375000</td>\n",
" <td>187902400</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-12</th>\n",
" <td>110.497498</td>\n",
" <td>113.275002</td>\n",
" <td>110.297501</td>\n",
" <td>113.010002</td>\n",
" <td>113.010002</td>\n",
" <td>165944800</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-08-13</th>\n",
" <td>114.430000</td>\n",
" <td>116.042503</td>\n",
" <td>113.927498</td>\n",
" <td>115.010002</td>\n",
" <td>115.010002</td>\n",
" <td>210082000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Open High Low Close Adj Close \\\n",
"Date \n",
"2020-07-31 102.885002 106.415001 100.824997 106.260002 106.068756 \n",
"2020-08-03 108.199997 111.637497 107.892502 108.937500 108.741440 \n",
"2020-08-04 109.132500 110.790001 108.387497 109.665001 109.467628 \n",
"2020-08-05 109.377502 110.392502 108.897499 110.062500 109.864410 \n",
"2020-08-06 110.404999 114.412498 109.797501 113.902496 113.697502 \n",
"2020-08-07 113.205002 113.675003 110.292503 111.112503 111.112503 \n",
"2020-08-10 112.599998 113.775002 110.000000 112.727501 112.727501 \n",
"2020-08-11 111.970001 112.482498 109.107498 109.375000 109.375000 \n",
"2020-08-12 110.497498 113.275002 110.297501 113.010002 113.010002 \n",
"2020-08-13 114.430000 116.042503 113.927498 115.010002 115.010002 \n",
"\n",
" Volume \n",
"Date \n",
"2020-07-31 374336800 \n",
"2020-08-03 308151200 \n",
"2020-08-04 173071600 \n",
"2020-08-05 121992000 \n",
"2020-08-06 202428800 \n",
"2020-08-07 198045600 \n",
"2020-08-10 212403600 \n",
"2020-08-11 187902400 \n",
"2020-08-12 165944800 \n",
"2020-08-13 210082000 "
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"aapl_proper_index[cond].query(\"Open > 100 and Volume > 110000000\").head(10)"
]
},
{
"cell_type": "markdown",
"id": "932bccc7",
"metadata": {},
"source": [
"# 5. New columns"
]
},
{
"cell_type": "markdown",
"id": "023d022d",
"metadata": {},
"source": [
"## 5.1 Density Example"
]
},
{
"cell_type": "code",
"execution_count": 41,
"id": "83ebf03c",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>population</th>\n",
" <th>area</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>California</th>\n",
" <td>38332521</td>\n",
" <td>423967</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Texas</th>\n",
" <td>26448193</td>\n",
" <td>695662</td>\n",
" </tr>\n",
" <tr>\n",
" <th>New York</th>\n",
" <td>19651127</td>\n",
" <td>141297</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Florida</th>\n",
" <td>19552860</td>\n",
" <td>170312</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Illinois</th>\n",
" <td>12882135</td>\n",
" <td>149995</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" population area\n",
"California 38332521 423967\n",
"Texas 26448193 695662\n",
"New York 19651127 141297\n",
"Florida 19552860 170312\n",
"Illinois 12882135 149995"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Population of each state, indexed by state name\n",
"population = pd.Series({\n",
"    'California': 38332521,\n",
"    'Texas': 26448193,\n",
"    'New York': 19651127,\n",
"    'Florida': 19552860,\n",
"    'Illinois': 12882135,\n",
"})\n",
"\n",
"# Area of each state, using the same index\n",
"area = pd.Series({\n",
"    'California': 423967,\n",
"    'Texas': 695662,\n",
"    'New York': 141297,\n",
"    'Florida': 170312,\n",
"    'Illinois': 149995,\n",
"})\n",
"\n",
"# Combine the two Series into a DataFrame; rows are aligned on the shared index\n",
"states = pd.DataFrame({'population': population, 'area': area})\n",
"states"
]
},
{
"cell_type": "code",
"execution_count": 42,
"id": "278f4672",
"metadata": {},
"outputs": [],
"source": [
"# Divide one column by another element-wise and store the result as a new column\n",
"states[\"density\"] = states[\"population\"] / states[\"area\"]"
]
},
{
"cell_type": "code",
"execution_count": 43,
"id": "005b3ac5",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>population</th>\n",
" <th>area</th>\n",
" <th>density</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>California</th>\n",
" <td>38332521</td>\n",
" <td>423967</td>\n",
" <td>90.413926</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Texas</th>\n",
" <td>26448193</td>\n",
" <td>695662</td>\n",
" <td>38.018740</td>\n",
" </tr>\n",
" <tr>\n",
" <th>New York</th>\n",
" <td>19651127</td>\n",
" <td>141297</td>\n",
" <td>139.076746</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Florida</th>\n",
" <td>19552860</td>\n",
" <td>170312</td>\n",
" <td>114.806121</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Illinois</th>\n",
" <td>12882135</td>\n",
" <td>149995</td>\n",
" <td>85.883763</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" population area density\n",
"California 38332521 423967 90.413926\n",
"Texas 26448193 695662 38.018740\n",
"New York 19651127 141297 139.076746\n",
"Florida 19552860 170312 114.806121\n",
"Illinois 12882135 149995 85.883763"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"states"
]
},
{
"cell_type": "markdown",
"id": "8526db19",
"metadata": {},
"source": [
"## 5.2 Stocks example"
]
},
{
"cell_type": "code",
"execution_count": 44,
"id": "59008d49",
"metadata": {},
"outputs": [],
"source": [
"aapl_proper_index[\"Percent Changes\"] = aapl_proper_index[\"Close\"].pct_change()"
]
},
{
"cell_type": "code",
"execution_count": 45,
"id": "3115a2c6",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Open</th>\n",
" <th>High</th>\n",
" <th>Low</th>\n",
" <th>Close</th>\n",
" <th>Adj Close</th>\n",
" <th>Volume</th>\n",
" <th>Percent Changes</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Date</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2019-10-28</th>\n",
" <td>61.855000</td>\n",
" <td>62.312500</td>\n",
" <td>61.680000</td>\n",
" <td>62.262501</td>\n",
" <td>61.650810</td>\n",
" <td>96572800</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-10-29</th>\n",
" <td>62.242500</td>\n",
" <td>62.437500</td>\n",
" <td>60.642502</td>\n",
" <td>60.822498</td>\n",
" <td>60.224953</td>\n",
" <td>142839600</td>\n",
" <td>-0.023128</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-10-30</th>\n",
" <td>61.189999</td>\n",
" <td>61.325001</td>\n",
" <td>60.302502</td>\n",
" <td>60.814999</td>\n",
" <td>60.217525</td>\n",
" <td>124522000</td>\n",
" <td>-0.000123</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-10-31</th>\n",
" <td>61.810001</td>\n",
" <td>62.292500</td>\n",
" <td>59.314999</td>\n",
" <td>62.189999</td>\n",
" <td>61.579021</td>\n",
" <td>139162000</td>\n",
" <td>0.022610</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-11-01</th>\n",
" <td>62.384998</td>\n",
" <td>63.982498</td>\n",
" <td>62.290001</td>\n",
" <td>63.955002</td>\n",
" <td>63.326683</td>\n",
" <td>151125200</td>\n",
" <td>0.028381</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-10-21</th>\n",
" <td>116.669998</td>\n",
" <td>118.709999</td>\n",
" <td>116.449997</td>\n",
" <td>116.870003</td>\n",
" <td>116.870003</td>\n",
" <td>89946000</td>\n",
" <td>-0.005446</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-10-22</th>\n",
" <td>117.449997</td>\n",
" <td>118.040001</td>\n",
" <td>114.589996</td>\n",
" <td>115.750000</td>\n",
" <td>115.750000</td>\n",
" <td>101988000</td>\n",
" <td>-0.009583</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-10-23</th>\n",
" <td>116.389999</td>\n",
" <td>116.550003</td>\n",
" <td>114.279999</td>\n",
" <td>115.040001</td>\n",
" <td>115.040001</td>\n",
" <td>82572600</td>\n",
" <td>-0.006134</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-10-26</th>\n",
" <td>114.010002</td>\n",
" <td>116.550003</td>\n",
" <td>112.879997</td>\n",
" <td>115.050003</td>\n",
" <td>115.050003</td>\n",
" <td>111850700</td>\n",
" <td>0.000087</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-10-27</th>\n",
" <td>115.489998</td>\n",
" <td>117.279999</td>\n",
" <td>114.540001</td>\n",
" <td>116.599998</td>\n",
" <td>116.599998</td>\n",
" <td>91927700</td>\n",
" <td>0.013472</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>253 rows × 7 columns</p>\n",
"</div>"
],
"text/plain": [
" Open High Low Close Adj Close \\\n",
"Date \n",
"2019-10-28 61.855000 62.312500 61.680000 62.262501 61.650810 \n",
"2019-10-29 62.242500 62.437500 60.642502 60.822498 60.224953 \n",
"2019-10-30 61.189999 61.325001 60.302502 60.814999 60.217525 \n",
"2019-10-31 61.810001 62.292500 59.314999 62.189999 61.579021 \n",
"2019-11-01 62.384998 63.982498 62.290001 63.955002 63.326683 \n",
"... ... ... ... ... ... \n",
"2020-10-21 116.669998 118.709999 116.449997 116.870003 116.870003 \n",
"2020-10-22 117.449997 118.040001 114.589996 115.750000 115.750000 \n",
"2020-10-23 116.389999 116.550003 114.279999 115.040001 115.040001 \n",
"2020-10-26 114.010002 116.550003 112.879997 115.050003 115.050003 \n",
"2020-10-27 115.489998 117.279999 114.540001 116.599998 116.599998 \n",
"\n",
" Volume Percent Changes \n",
"Date \n",
"2019-10-28 96572800 NaN \n",
"2019-10-29 142839600 -0.023128 \n",
"2019-10-30 124522000 -0.000123 \n",
"2019-10-31 139162000 0.022610 \n",
"2019-11-01 151125200 0.028381 \n",
"... ... ... \n",
"2020-10-21 89946000 -0.005446 \n",
"2020-10-22 101988000 -0.009583 \n",
"2020-10-23 82572600 -0.006134 \n",
"2020-10-26 111850700 0.000087 \n",
"2020-10-27 91927700 0.013472 \n",
"\n",
"[253 rows x 7 columns]"
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"aapl_proper_index"
]
},
{
"cell_type": "markdown",
"id": "3279df01",
"metadata": {},
"source": [
"# 6. Aggregation"
]
},
{
"cell_type": "markdown",
"id": "9dec0caa",
"metadata": {},
"source": [
"## 6.1 Basic operations"
]
},
{
"cell_type": "code",
"execution_count": 55,
"id": "876e6b85",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.0028956964634767705"
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"aapl_proper_index[\"Percent Changes\"].mean()"
]
},
{
"cell_type": "code",
"execution_count": 51,
"id": "27c153e5",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.11980826040056836"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"aapl_proper_index[\"Percent Changes\"].max()"
]
},
{
"cell_type": "code",
"execution_count": 52,
"id": "23e031ff",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"-0.12864694751232164"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"aapl_proper_index[\"Percent Changes\"].min()"
]
},
{
"cell_type": "code",
"execution_count": 56,
"id": "92849318",
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"0.0024045071214521263"
]
},
"execution_count": 56,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"aapl_proper_index[\"Percent Changes\"].median()"
]
},
{
"cell_type": "code",
"execution_count": 53,
"id": "e6f50326",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.01985402460478214"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"aapl_proper_index[aapl_proper_index[\"Percent Changes\"] > 0][\"Percent Changes\"].mean()"
]
},
{
"cell_type": "code",
"execution_count": 57,
"id": "a3d0e96b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"-0.01881547236798305"
]
},
"execution_count": 57,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"aapl_proper_index[aapl_proper_index[\"Percent Changes\"] < 0][\"Percent Changes\"].mean()"
]
},
{
"cell_type": "markdown",
"id": "fd7781d1",
"metadata": {},
"source": [
"## 6.2 Grouping"
]
},
{
"cell_type": "markdown",
"id": "64ec7f60",
"metadata": {},
"source": [
"We use planets discovery data as an example for grouping"
]
},
{
"cell_type": "code",
"execution_count": 96,
"id": "26b8b374",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(1035, 6)"
]
},
"execution_count": 96,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import seaborn as sns\n",
"planets = sns.load_dataset('planets')\n",
"planets.shape"
]
},
{
"cell_type": "code",
"execution_count": 97,
"id": "3c8426a5",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>method</th>\n",
" <th>number</th>\n",
" <th>orbital_period</th>\n",
" <th>mass</th>\n",
" <th>distance</th>\n",
" <th>year</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Radial Velocity</td>\n",
" <td>1</td>\n",
" <td>269.300000</td>\n",
" <td>7.10</td>\n",
" <td>77.40</td>\n",
" <td>2006</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Radial Velocity</td>\n",
" <td>1</td>\n",
" <td>874.774000</td>\n",
" <td>2.21</td>\n",
" <td>56.95</td>\n",
" <td>2008</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Radial Velocity</td>\n",
" <td>1</td>\n",
" <td>763.000000</td>\n",
" <td>2.60</td>\n",
" <td>19.84</td>\n",
" <td>2011</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Radial Velocity</td>\n",
" <td>1</td>\n",
" <td>326.030000</td>\n",
" <td>19.40</td>\n",
" <td>110.62</td>\n",
" <td>2007</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Radial Velocity</td>\n",
" <td>1</td>\n",
" <td>516.220000</td>\n",
" <td>10.50</td>\n",
" <td>119.47</td>\n",
" <td>2009</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1030</th>\n",
" <td>Transit</td>\n",
" <td>1</td>\n",
" <td>3.941507</td>\n",
" <td>NaN</td>\n",
" <td>172.00</td>\n",
" <td>2006</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1031</th>\n",
" <td>Transit</td>\n",
" <td>1</td>\n",
" <td>2.615864</td>\n",
" <td>NaN</td>\n",
" <td>148.00</td>\n",
" <td>2007</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1032</th>\n",
" <td>Transit</td>\n",
" <td>1</td>\n",
" <td>3.191524</td>\n",
" <td>NaN</td>\n",
" <td>174.00</td>\n",
" <td>2007</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1033</th>\n",
" <td>Transit</td>\n",
" <td>1</td>\n",
" <td>4.125083</td>\n",
" <td>NaN</td>\n",
" <td>293.00</td>\n",
" <td>2008</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1034</th>\n",
" <td>Transit</td>\n",
" <td>1</td>\n",
" <td>4.187757</td>\n",
" <td>NaN</td>\n",
" <td>260.00</td>\n",
" <td>2008</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1035 rows × 6 columns</p>\n",
"</div>"
],
"text/plain": [
" method number orbital_period mass distance year\n",
"0 Radial Velocity 1 269.300000 7.10 77.40 2006\n",
"1 Radial Velocity 1 874.774000 2.21 56.95 2008\n",
"2 Radial Velocity 1 763.000000 2.60 19.84 2011\n",
"3 Radial Velocity 1 326.030000 19.40 110.62 2007\n",
"4 Radial Velocity 1 516.220000 10.50 119.47 2009\n",
"... ... ... ... ... ... ...\n",
"1030 Transit 1 3.941507 NaN 172.00 2006\n",
"1031 Transit 1 2.615864 NaN 148.00 2007\n",
"1032 Transit 1 3.191524 NaN 174.00 2007\n",
"1033 Transit 1 4.125083 NaN 293.00 2008\n",
"1034 Transit 1 4.187757 NaN 260.00 2008\n",
"\n",
"[1035 rows x 6 columns]"
]
},
"execution_count": 97,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"planets"
]
},
{
"cell_type": "code",
"execution_count": 98,
"id": "9872086c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"method\n",
"Astrometry 631.180000\n",
"Eclipse Timing Variations 4343.500000\n",
"Imaging 27500.000000\n",
"Microlensing 3300.000000\n",
"Orbital Brightness Modulation 0.342887\n",
"Pulsar Timing 66.541900\n",
"Pulsation Timing Variations 1170.000000\n",
"Radial Velocity 360.200000\n",
"Transit 5.714932\n",
"Transit Timing Variations 57.011000\n",
"Name: orbital_period, dtype: float64"
]
},
"execution_count": 98,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"planets.groupby('method')['orbital_period'].median()"
]
},
{
"cell_type": "code",
"execution_count": 101,
"id": "8399a771",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>count</th>\n",
" <th>mean</th>\n",
" <th>std</th>\n",
" <th>min</th>\n",
" <th>25%</th>\n",
" <th>50%</th>\n",
" <th>75%</th>\n",
" <th>max</th>\n",
" </tr>\n",
" <tr>\n",
" <th>method</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Astrometry</th>\n",
" <td>2.0</td>\n",
" <td>631.180000</td>\n",
" <td>544.217663</td>\n",
" <td>246.360000</td>\n",
" <td>438.770000</td>\n",
" <td>631.180000</td>\n",
" <td>823.590000</td>\n",
" <td>1016.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Eclipse Timing Variations</th>\n",
" <td>9.0</td>\n",
" <td>4751.644444</td>\n",
" <td>2499.130945</td>\n",
" <td>1916.250000</td>\n",
" <td>2900.000000</td>\n",
" <td>4343.500000</td>\n",
" <td>5767.000000</td>\n",
" <td>10220.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Imaging</th>\n",
" <td>12.0</td>\n",
" <td>118247.737500</td>\n",
" <td>213978.177277</td>\n",
" <td>4639.150000</td>\n",
" <td>8343.900000</td>\n",
" <td>27500.000000</td>\n",
" <td>94250.000000</td>\n",
" <td>730000.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Microlensing</th>\n",
" <td>7.0</td>\n",
" <td>3153.571429</td>\n",
" <td>1113.166333</td>\n",
" <td>1825.000000</td>\n",
" <td>2375.000000</td>\n",
" <td>3300.000000</td>\n",
" <td>3550.000000</td>\n",
" <td>5100.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Orbital Brightness Modulation</th>\n",
" <td>3.0</td>\n",
" <td>0.709307</td>\n",
" <td>0.725493</td>\n",
" <td>0.240104</td>\n",
" <td>0.291496</td>\n",
" <td>0.342887</td>\n",
" <td>0.943908</td>\n",
" <td>1.544929</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Pulsar Timing</th>\n",
" <td>5.0</td>\n",
" <td>7343.021201</td>\n",
" <td>16313.265573</td>\n",
" <td>0.090706</td>\n",
" <td>25.262000</td>\n",
" <td>66.541900</td>\n",
" <td>98.211400</td>\n",
" <td>36525.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Pulsation Timing Variations</th>\n",
" <td>1.0</td>\n",
" <td>1170.000000</td>\n",
" <td>NaN</td>\n",
" <td>1170.000000</td>\n",
" <td>1170.000000</td>\n",
" <td>1170.000000</td>\n",
" <td>1170.000000</td>\n",
" <td>1170.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Radial Velocity</th>\n",
" <td>553.0</td>\n",
" <td>823.354680</td>\n",
" <td>1454.926210</td>\n",
" <td>0.736540</td>\n",
" <td>38.021000</td>\n",
" <td>360.200000</td>\n",
" <td>982.000000</td>\n",
" <td>17337.500000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Transit</th>\n",
" <td>397.0</td>\n",
" <td>21.102073</td>\n",
" <td>46.185893</td>\n",
" <td>0.355000</td>\n",
" <td>3.160630</td>\n",
" <td>5.714932</td>\n",
" <td>16.145700</td>\n",
" <td>331.600590</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Transit Timing Variations</th>\n",
" <td>3.0</td>\n",
" <td>79.783500</td>\n",
" <td>71.599884</td>\n",
" <td>22.339500</td>\n",
" <td>39.675250</td>\n",
" <td>57.011000</td>\n",
" <td>108.505500</td>\n",
" <td>160.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" count mean std \\\n",
"method \n",
"Astrometry 2.0 631.180000 544.217663 \n",
"Eclipse Timing Variations 9.0 4751.644444 2499.130945 \n",
"Imaging 12.0 118247.737500 213978.177277 \n",
"Microlensing 7.0 3153.571429 1113.166333 \n",
"Orbital Brightness Modulation 3.0 0.709307 0.725493 \n",
"Pulsar Timing 5.0 7343.021201 16313.265573 \n",
"Pulsation Timing Variations 1.0 1170.000000 NaN \n",
"Radial Velocity 553.0 823.354680 1454.926210 \n",
"Transit 397.0 21.102073 46.185893 \n",
"Transit Timing Variations 3.0 79.783500 71.599884 \n",
"\n",
" min 25% 50% \\\n",
"method \n",
"Astrometry 246.360000 438.770000 631.180000 \n",
"Eclipse Timing Variations 1916.250000 2900.000000 4343.500000 \n",
"Imaging 4639.150000 8343.900000 27500.000000 \n",
"Microlensing 1825.000000 2375.000000 3300.000000 \n",
"Orbital Brightness Modulation 0.240104 0.291496 0.342887 \n",
"Pulsar Timing 0.090706 25.262000 66.541900 \n",
"Pulsation Timing Variations 1170.000000 1170.000000 1170.000000 \n",
"Radial Velocity 0.736540 38.021000 360.200000 \n",
"Transit 0.355000 3.160630 5.714932 \n",
"Transit Timing Variations 22.339500 39.675250 57.011000 \n",
"\n",
" 75% max \n",
"method \n",
"Astrometry 823.590000 1016.000000 \n",
"Eclipse Timing Variations 5767.000000 10220.000000 \n",
"Imaging 94250.000000 730000.000000 \n",
"Microlensing 3550.000000 5100.000000 \n",
"Orbital Brightness Modulation 0.943908 1.544929 \n",
"Pulsar Timing 98.211400 36525.000000 \n",
"Pulsation Timing Variations 1170.000000 1170.000000 \n",
"Radial Velocity 982.000000 17337.500000 \n",
"Transit 16.145700 331.600590 \n",
"Transit Timing Variations 108.505500 160.000000 "
]
},
"execution_count": 101,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"planets.groupby('method')['orbital_period'].describe()"
]
},
{
"cell_type": "code",
"execution_count": 100,
"id": "4cd98382",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"method\n",
"Astrometry 2\n",
"Eclipse Timing Variations 9\n",
"Imaging 38\n",
"Microlensing 23\n",
"Orbital Brightness Modulation 3\n",
"Pulsar Timing 5\n",
"Pulsation Timing Variations 1\n",
"Radial Velocity 553\n",
"Transit 397\n",
"Transit Timing Variations 4\n",
"Name: number, dtype: int64"
]
},
"execution_count": 100,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"planets.groupby('method')[\"number\"].count()"
]
},
{
"cell_type": "markdown",
"id": "1430d995",
"metadata": {},
"source": [
"# 7. Joining Data"
]
},
{
"cell_type": "markdown",
"id": "c3abca64",
"metadata": {},
"source": [
"## 7.1 Merge (or join)"
]
},
{
"cell_type": "code",
"execution_count": 106,
"id": "de4cab14",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>employee</th>\n",
" <th>group</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Bob</td>\n",
" <td>Accounting</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jake</td>\n",
" <td>Engineering</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Lisa</td>\n",
" <td>Engineering</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Sue</td>\n",
" <td>HR</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" employee group\n",
"0 Bob Accounting\n",
"1 Jake Engineering\n",
"2 Lisa Engineering\n",
"3 Sue HR"
]
},
"execution_count": 106,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"department = pd.DataFrame({'employee': ['Bob', 'Jake', 'Lisa', 'Sue'],\n",
" 'group': ['Accounting', 'Engineering', 'Engineering', 'HR']})\n",
"\n",
"department"
]
},
{
"cell_type": "code",
"execution_count": 107,
"id": "6301c1f2",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>employee</th>\n",
" <th>hire_date</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Lisa</td>\n",
" <td>2004</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Bob</td>\n",
" <td>2008</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Jake</td>\n",
" <td>2012</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Sue</td>\n",
" <td>2014</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" employee hire_date\n",
"0 Lisa 2004\n",
"1 Bob 2008\n",
"2 Jake 2012\n",
"3 Sue 2014"
]
},
"execution_count": 107,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"hire_date = pd.DataFrame({'employee': ['Lisa', 'Bob', 'Jake', 'Sue'],\n",
" 'hire_date': [2004, 2008, 2012, 2014]})\n",
"\n",
"hire_date"
]
},
{
"cell_type": "code",
"execution_count": 111,
"id": "7a4f923b",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>employee</th>\n",
" <th>group</th>\n",
" <th>hire_date</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Bob</td>\n",
" <td>Accounting</td>\n",
" <td>2008</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jake</td>\n",
" <td>Engineering</td>\n",
" <td>2012</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Lisa</td>\n",
" <td>Engineering</td>\n",
" <td>2004</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Sue</td>\n",
" <td>HR</td>\n",
" <td>2014</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" employee group hire_date\n",
"0 Bob Accounting 2008\n",
"1 Jake Engineering 2012\n",
"2 Lisa Engineering 2004\n",
"3 Sue HR 2014"
]
},
"execution_count": 111,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"employee = pd.merge(department, hire_date)\n",
"employee"
]
},
{
"cell_type": "code",
"execution_count": 112,
"id": "0bab79da",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>employee</th>\n",
" <th>group</th>\n",
" <th>hire_date</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Bob</td>\n",
" <td>Accounting</td>\n",
" <td>2008</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jake</td>\n",
" <td>Engineering</td>\n",
" <td>2012</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Lisa</td>\n",
" <td>Engineering</td>\n",
" <td>2004</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Sue</td>\n",
" <td>HR</td>\n",
" <td>2014</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" employee group hire_date\n",
"0 Bob Accounting 2008\n",
"1 Jake Engineering 2012\n",
"2 Lisa Engineering 2004\n",
"3 Sue HR 2014"
]
},
"execution_count": 112,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"employee = pd.merge(department, hire_date, on=\"employee\")\n",
"employee"
]
},
{
"cell_type": "code",
"execution_count": 113,
"id": "29d1d5e0",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>salary</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Bob</td>\n",
" <td>70000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jake</td>\n",
" <td>80000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Lisa</td>\n",
" <td>120000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Sue</td>\n",
" <td>90000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name salary\n",
"0 Bob 70000\n",
"1 Jake 80000\n",
"2 Lisa 120000\n",
"3 Sue 90000"
]
},
"execution_count": 113,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"salary = pd.DataFrame({'name': ['Bob', 'Jake', 'Lisa', 'Sue'],\n",
" 'salary': [70000, 80000, 120000, 90000]})\n",
"salary"
]
},
{
"cell_type": "code",
"execution_count": 118,
"id": "200d57d0",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>employee</th>\n",
" <th>group</th>\n",
" <th>name</th>\n",
" <th>salary</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Bob</td>\n",
" <td>Accounting</td>\n",
" <td>Bob</td>\n",
" <td>70000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jake</td>\n",
" <td>Engineering</td>\n",
" <td>Jake</td>\n",
" <td>80000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Lisa</td>\n",
" <td>Engineering</td>\n",
" <td>Lisa</td>\n",
" <td>120000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Sue</td>\n",
" <td>HR</td>\n",
" <td>Sue</td>\n",
" <td>90000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" employee group name salary\n",
"0 Bob Accounting Bob 70000\n",
"1 Jake Engineering Jake 80000\n",
"2 Lisa Engineering Lisa 120000\n",
"3 Sue HR Sue 90000"
]
},
"execution_count": 118,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"employee = pd.merge(department, salary, left_on=\"employee\", right_on=\"name\")\n",
"employee"
]
},
{
"cell_type": "code",
"execution_count": 119,
"id": "cd98a441",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>employee</th>\n",
" <th>group</th>\n",
" <th>salary</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Bob</td>\n",
" <td>Accounting</td>\n",
" <td>70000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jake</td>\n",
" <td>Engineering</td>\n",
" <td>80000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Lisa</td>\n",
" <td>Engineering</td>\n",
" <td>120000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Sue</td>\n",
" <td>HR</td>\n",
" <td>90000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" employee group salary\n",
"0 Bob Accounting 70000\n",
"1 Jake Engineering 80000\n",
"2 Lisa Engineering 120000\n",
"3 Sue HR 90000"
]
},
"execution_count": 119,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"employee = employee.drop('name',axis=1)\n",
"employee"
]
},
{
"cell_type": "markdown",
"id": "3762163c",
"metadata": {},
"source": [
"## 7.2 one to many merging"
]
},
{
"cell_type": "code",
"execution_count": 121,
"id": "b1aa1315",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>group</th>\n",
" <th>supervisor</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Accounting</td>\n",
" <td>Carly</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Engineering</td>\n",
" <td>Guido</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>HR</td>\n",
" <td>Steve</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" group supervisor\n",
"0 Accounting Carly\n",
"1 Engineering Guido\n",
"2 HR Steve"
]
},
"execution_count": 121,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"supervisor = pd.DataFrame({'group': ['Accounting', 'Engineering', 'HR'],\n",
" 'supervisor': ['Carly', 'Guido', 'Steve']})\n",
"supervisor"
]
},
{
"cell_type": "code",
"execution_count": 126,
"id": "b01c9052",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>employee</th>\n",
" <th>group</th>\n",
" <th>salary</th>\n",
" <th>supervisor</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Bob</td>\n",
" <td>Accounting</td>\n",
" <td>70000</td>\n",
" <td>Carly</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Jake</td>\n",
" <td>Engineering</td>\n",
" <td>80000</td>\n",
" <td>Guido</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Lisa</td>\n",
" <td>Engineering</td>\n",
" <td>120000</td>\n",
" <td>Guido</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Sue</td>\n",
" <td>HR</td>\n",
" <td>90000</td>\n",
" <td>Steve</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" employee group salary supervisor\n",
"0 Bob Accounting 70000 Carly\n",
"1 Jake Engineering 80000 Guido\n",
"2 Lisa Engineering 120000 Guido\n",
"3 Sue HR 90000 Steve"
]
},
"execution_count": 126,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.merge(employee,supervisor)"
]
},
{
"cell_type": "markdown",
"id": "211fc106",
"metadata": {},
"source": [
"# 7.3 Many to Many merging"
]
},
{
"cell_type": "code",
"execution_count": 128,
"id": "87fb2052",
"metadata": {},
"outputs": [],
"source": [
"skills = pd.DataFrame({'group': ['Accounting', 'Accounting','Engineering', \n",
" 'Engineering', 'HR', 'HR'],\n",
" 'skills': ['math', 'spreadsheets', 'coding', \n",
" 'linux','spreadsheets', 'organization']})"
]
},
{
"cell_type": "code",
"execution_count": 129,
"id": "e1d7ebc4",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>employee</th>\n",
" <th>group</th>\n",
" <th>salary</th>\n",
" <th>skills</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Bob</td>\n",
" <td>Accounting</td>\n",
" <td>70000</td>\n",
" <td>math</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Bob</td>\n",
" <td>Accounting</td>\n",
" <td>70000</td>\n",
" <td>spreadsheets</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Jake</td>\n",
" <td>Engineering</td>\n",
" <td>80000</td>\n",
" <td>coding</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Jake</td>\n",
" <td>Engineering</td>\n",
" <td>80000</td>\n",
" <td>linux</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Lisa</td>\n",
" <td>Engineering</td>\n",
" <td>120000</td>\n",
" <td>coding</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Lisa</td>\n",
" <td>Engineering</td>\n",
" <td>120000</td>\n",
" <td>linux</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Sue</td>\n",
" <td>HR</td>\n",
" <td>90000</td>\n",
" <td>spreadsheets</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Sue</td>\n",
" <td>HR</td>\n",
" <td>90000</td>\n",
" <td>organization</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" employee group salary skills\n",
"0 Bob Accounting 70000 math\n",
"1 Bob Accounting 70000 spreadsheets\n",
"2 Jake Engineering 80000 coding\n",
"3 Jake Engineering 80000 linux\n",
"4 Lisa Engineering 120000 coding\n",
"5 Lisa Engineering 120000 linux\n",
"6 Sue HR 90000 spreadsheets\n",
"7 Sue HR 90000 organization"
]
},
"execution_count": 129,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.merge(employee,skills)"
]
},
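{
"cell_type": "markdown",
"id": "ec89a9e8",
"metadata": {},
"source": [
"The result may look strange at first. The `group` values repeat in both tables, so every employee row is paired with every skill row of the same group, and the merged table has more rows than either input. Make sure you understand this behaviour before using a many-to-many merge."
]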
},
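{
"cell_type": "markdown",
"id": "5c1e9a70",
"metadata": {},
"source": [
"A minimal sketch, assuming the `employee` and `skills` DataFrames from the cells above. Passing `on='group'` names the join key explicitly, and `validate='many_to_many'` records the relationship we expect. Comparing the row counts shows how the merge repeats each employee once per matching skill. The name `merged` is only used for this illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b07d4e12",
"metadata": {},
"outputs": [],
"source": [
"# Join on the shared 'group' column and state the expected relationship\n",
"merged = pd.merge(employee, skills, on='group', validate='many_to_many')\n",
"\n",
"# Row counts of the two inputs and of the merged result\n",
"print(len(employee), len(skills), len(merged))"
]
},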
{
"cell_type": "markdown",
"id": "1d976a18",
"metadata": {},
"source": [
"## 7.4 Inner Join / Outer Join / Left Join / Right Join"
]
},
{
"cell_type": "code",
"execution_count": 132,
"id": "1c7edc09",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>food</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Peter</td>\n",
" <td>fish</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Paul</td>\n",
" <td>beans</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Mary</td>\n",
" <td>bread</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name food\n",
"0 Peter fish\n",
"1 Paul beans\n",
"2 Mary bread"
]
},
"execution_count": 132,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fav_food = pd.DataFrame({'name': ['Peter', 'Paul', 'Mary'],\n",
" 'food': ['fish', 'beans', 'bread']},\n",
" columns=['name', 'food'])\n",
"fav_food"
]
},
{
"cell_type": "code",
"execution_count": 134,
"id": "74aa124b",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>drink</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Mary</td>\n",
" <td>wine</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Joseph</td>\n",
" <td>beer</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name drink\n",
"0 Mary wine\n",
"1 Joseph beer"
]
},
"execution_count": 134,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fav_drink = pd.DataFrame({'name': ['Mary', 'Joseph'],\n",
" 'drink': ['wine', 'beer']},\n",
" columns=['name', 'drink'])\n",
"fav_drink"
]
},
{
"cell_type": "code",
"execution_count": 135,
"id": "48886a2f",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>food</th>\n",
" <th>drink</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Mary</td>\n",
" <td>bread</td>\n",
" <td>wine</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name food drink\n",
"0 Mary bread wine"
]
},
"execution_count": 135,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.merge(fav_food, fav_drink)"
]
},
{
"cell_type": "code",
"execution_count": 136,
"id": "d5e4c0e9",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>food</th>\n",
" <th>drink</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Peter</td>\n",
" <td>fish</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Paul</td>\n",
" <td>beans</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Mary</td>\n",
" <td>bread</td>\n",
" <td>wine</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Joseph</td>\n",
" <td>NaN</td>\n",
" <td>beer</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name food drink\n",
"0 Peter fish NaN\n",
"1 Paul beans NaN\n",
"2 Mary bread wine\n",
"3 Joseph NaN beer"
]
},
"execution_count": 136,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.merge(fav_food, fav_drink, how=\"outer\")"
]
},
{
"cell_type": "code",
"execution_count": 137,
"id": "84ab9843",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>food</th>\n",
" <th>drink</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Peter</td>\n",
" <td>fish</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Paul</td>\n",
" <td>beans</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Mary</td>\n",
" <td>bread</td>\n",
" <td>wine</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name food drink\n",
"0 Peter fish NaN\n",
"1 Paul beans NaN\n",
"2 Mary bread wine"
]
},
"execution_count": 137,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.merge(fav_food, fav_drink, how=\"left\")"
]
},
{
"cell_type": "code",
"execution_count": 138,
"id": "3d11be90",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>food</th>\n",
" <th>drink</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Mary</td>\n",
" <td>bread</td>\n",
" <td>wine</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Joseph</td>\n",
" <td>NaN</td>\n",
" <td>beer</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name food drink\n",
"0 Mary bread wine\n",
"1 Joseph NaN beer"
]
},
"execution_count": 138,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.merge(fav_food, fav_drink, how=\"right\")"
]
},
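{
"cell_type": "markdown",
"id": "9e3a6d04",
"metadata": {},
"source": [
"The four calls above differ only in the `how` argument: the default `inner` keeps the names found in both tables, `outer` keeps all names, and `left` / `right` keep every name from the first / second table. As a small sketch, `indicator=True` adds a `_merge` column that shows where each row came from. The name `check` is only used for this illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "41f8c2ab",
"metadata": {},
"outputs": [],
"source": [
"# Outer join with a '_merge' column: 'left_only', 'right_only' or 'both'\n",
"check = pd.merge(fav_food, fav_drink, on='name', how='outer', indicator=True)\n",
"check"
]
},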
{
"cell_type": "markdown",
"id": "d8e16b5b",
"metadata": {},
"source": [
"## 8. Handling Missing Data"
]
},
{
"cell_type": "code",
"execution_count": 71,
"id": "e782f9ea",
"metadata": {},
"outputs": [],
"source": [
"hibor = pd.read_csv(\"hibor.csv\", parse_dates=True, index_col='date')"
]
},
{
"cell_type": "code",
"execution_count": 72,
"id": "04401df8",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>overnight</th>\n",
" <th>1 week</th>\n",
" <th>2 weeks</th>\n",
" <th>1 months</th>\n",
" <th>2 months</th>\n",
" <th>3 months</th>\n",
" <th>6 months</th>\n",
" <th>12 months</th>\n",
" </tr>\n",
" <tr>\n",
" <th>date</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2010-01-01</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-02</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-03</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-04</th>\n",
" <td>0.03</td>\n",
" <td>0.04971</td>\n",
" <td>0.05000</td>\n",
" <td>0.07964</td>\n",
" <td>0.11893</td>\n",
" <td>0.15679</td>\n",
" <td>0.31571</td>\n",
" <td>0.71429</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-05</th>\n",
" <td>0.03</td>\n",
" <td>0.04971</td>\n",
" <td>0.05000</td>\n",
" <td>0.07964</td>\n",
" <td>0.11000</td>\n",
" <td>0.15000</td>\n",
" <td>0.29929</td>\n",
" <td>0.68929</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-06</th>\n",
" <td>0.03</td>\n",
" <td>0.04900</td>\n",
" <td>0.04971</td>\n",
" <td>0.08000</td>\n",
" <td>0.11000</td>\n",
" <td>0.14000</td>\n",
" <td>0.28000</td>\n",
" <td>0.66857</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-07</th>\n",
" <td>0.03</td>\n",
" <td>0.04971</td>\n",
" <td>0.04971</td>\n",
" <td>0.06964</td>\n",
" <td>0.10000</td>\n",
" <td>0.13000</td>\n",
" <td>0.26000</td>\n",
" <td>0.62857</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-08</th>\n",
" <td>0.03</td>\n",
" <td>0.04971</td>\n",
" <td>0.04971</td>\n",
" <td>0.06964</td>\n",
" <td>0.10000</td>\n",
" <td>0.13000</td>\n",
" <td>0.26000</td>\n",
" <td>0.62857</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-09</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-10</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-11</th>\n",
" <td>0.03</td>\n",
" <td>0.04971</td>\n",
" <td>0.04971</td>\n",
" <td>0.06036</td>\n",
" <td>0.09000</td>\n",
" <td>0.12000</td>\n",
" <td>0.24000</td>\n",
" <td>0.57000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" overnight 1 week 2 weeks 1 months 2 months 3 months \\\n",
"date \n",
"2010-01-01 NaN NaN NaN NaN NaN NaN \n",
"2010-01-02 NaN NaN NaN NaN NaN NaN \n",
"2010-01-03 NaN NaN NaN NaN NaN NaN \n",
"2010-01-04 0.03 0.04971 0.05000 0.07964 0.11893 0.15679 \n",
"2010-01-05 0.03 0.04971 0.05000 0.07964 0.11000 0.15000 \n",
"2010-01-06 0.03 0.04900 0.04971 0.08000 0.11000 0.14000 \n",
"2010-01-07 0.03 0.04971 0.04971 0.06964 0.10000 0.13000 \n",
"2010-01-08 0.03 0.04971 0.04971 0.06964 0.10000 0.13000 \n",
"2010-01-09 NaN NaN NaN NaN NaN NaN \n",
"2010-01-10 NaN NaN NaN NaN NaN NaN \n",
"2010-01-11 0.03 0.04971 0.04971 0.06036 0.09000 0.12000 \n",
"\n",
" 6 months 12 months \n",
"date \n",
"2010-01-01 NaN NaN \n",
"2010-01-02 NaN NaN \n",
"2010-01-03 NaN NaN \n",
"2010-01-04 0.31571 0.71429 \n",
"2010-01-05 0.29929 0.68929 \n",
"2010-01-06 0.28000 0.66857 \n",
"2010-01-07 0.26000 0.62857 \n",
"2010-01-08 0.26000 0.62857 \n",
"2010-01-09 NaN NaN \n",
"2010-01-10 NaN NaN \n",
"2010-01-11 0.24000 0.57000 "
]
},
"execution_count": 72,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"hibor"
]
},
{
"cell_type": "markdown",
"id": "a880cff4",
"metadata": {},
"source": [
"## 8.1 Check missing data"
]
},
{
"cell_type": "code",
"execution_count": 77,
"id": "6fc86cdc",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>overnight</th>\n",
" <th>1 week</th>\n",
" <th>2 weeks</th>\n",
" <th>1 months</th>\n",
" <th>2 months</th>\n",
" <th>3 months</th>\n",
" <th>6 months</th>\n",
" <th>12 months</th>\n",
" </tr>\n",
" <tr>\n",
" <th>date</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2010-01-01</th>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-02</th>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-03</th>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-04</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-05</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-06</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-07</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-08</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-09</th>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-10</th>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-11</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" overnight 1 week 2 weeks 1 months 2 months 3 months \\\n",
"date \n",
"2010-01-01 True True True True True True \n",
"2010-01-02 True True True True True True \n",
"2010-01-03 True True True True True True \n",
"2010-01-04 False False False False False False \n",
"2010-01-05 False False False False False False \n",
"2010-01-06 False False False False False False \n",
"2010-01-07 False False False False False False \n",
"2010-01-08 False False False False False False \n",
"2010-01-09 True True True True True True \n",
"2010-01-10 True True True True True True \n",
"2010-01-11 False False False False False False \n",
"\n",
" 6 months 12 months \n",
"date \n",
"2010-01-01 True True \n",
"2010-01-02 True True \n",
"2010-01-03 True True \n",
"2010-01-04 False False \n",
"2010-01-05 False False \n",
"2010-01-06 False False \n",
"2010-01-07 False False \n",
"2010-01-08 False False \n",
"2010-01-09 True True \n",
"2010-01-10 True True \n",
"2010-01-11 False False "
]
},
"execution_count": 77,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"hibor.isnull()"
]
},
{
"cell_type": "code",
"execution_count": 79,
"id": "84d0ac6f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 79,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"hibor.isnull().values.any()"
]
},
{
"cell_type": "code",
"execution_count": 80,
"id": "fc5a7223",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 80,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"hibor[\"overnight\"].isnull().values.any()"
]
},
{
"cell_type": "code",
"execution_count": 84,
"id": "87d7e12f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"False 6\n",
"True 5\n",
"Name: overnight, dtype: int64"
]
},
"execution_count": 84,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"hibor[\"overnight\"].isnull().value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 85,
"id": "d3acd78f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"date\n",
"2010-01-01 NaN\n",
"2010-01-02 NaN\n",
"2010-01-03 NaN\n",
"2010-01-09 NaN\n",
"2010-01-10 NaN\n",
"Name: overnight, dtype: float64"
]
},
"execution_count": 85,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"hibor[\"overnight\"][hibor[\"overnight\"].isnull()]"
]
},
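{
"cell_type": "markdown",
"id": "7a20d5e9",
"metadata": {},
"source": [
"A short sketch, assuming the `hibor` DataFrame loaded above: summing the boolean result of `isnull()` counts the missing values in each column, because `True` is treated as 1."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c6b13f58",
"metadata": {},
"outputs": [],
"source": [
"# Number of missing values per column\n",
"hibor.isnull().sum()"
]
},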
{
"cell_type": "markdown",
"id": "27237840",
"metadata": {},
"source": [
"## 8.2 Drop Data"
]
},
{
"cell_type": "code",
"execution_count": 87,
"id": "f550b706",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>overnight</th>\n",
" <th>1 week</th>\n",
" <th>2 weeks</th>\n",
" <th>1 months</th>\n",
" <th>2 months</th>\n",
" <th>3 months</th>\n",
" <th>6 months</th>\n",
" <th>12 months</th>\n",
" </tr>\n",
" <tr>\n",
" <th>date</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2010-01-04</th>\n",
" <td>0.03</td>\n",
" <td>0.04971</td>\n",
" <td>0.05000</td>\n",
" <td>0.07964</td>\n",
" <td>0.11893</td>\n",
" <td>0.15679</td>\n",
" <td>0.31571</td>\n",
" <td>0.71429</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-05</th>\n",
" <td>0.03</td>\n",
" <td>0.04971</td>\n",
" <td>0.05000</td>\n",
" <td>0.07964</td>\n",
" <td>0.11000</td>\n",
" <td>0.15000</td>\n",
" <td>0.29929</td>\n",
" <td>0.68929</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-06</th>\n",
" <td>0.03</td>\n",
" <td>0.04900</td>\n",
" <td>0.04971</td>\n",
" <td>0.08000</td>\n",
" <td>0.11000</td>\n",
" <td>0.14000</td>\n",
" <td>0.28000</td>\n",
" <td>0.66857</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-07</th>\n",
" <td>0.03</td>\n",
" <td>0.04971</td>\n",
" <td>0.04971</td>\n",
" <td>0.06964</td>\n",
" <td>0.10000</td>\n",
" <td>0.13000</td>\n",
" <td>0.26000</td>\n",
" <td>0.62857</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-08</th>\n",
" <td>0.03</td>\n",
" <td>0.04971</td>\n",
" <td>0.04971</td>\n",
" <td>0.06964</td>\n",
" <td>0.10000</td>\n",
" <td>0.13000</td>\n",
" <td>0.26000</td>\n",
" <td>0.62857</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-11</th>\n",
" <td>0.03</td>\n",
" <td>0.04971</td>\n",
" <td>0.04971</td>\n",
" <td>0.06036</td>\n",
" <td>0.09000</td>\n",
" <td>0.12000</td>\n",
" <td>0.24000</td>\n",
" <td>0.57000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" overnight 1 week 2 weeks 1 months 2 months 3 months \\\n",
"date \n",
"2010-01-04 0.03 0.04971 0.05000 0.07964 0.11893 0.15679 \n",
"2010-01-05 0.03 0.04971 0.05000 0.07964 0.11000 0.15000 \n",
"2010-01-06 0.03 0.04900 0.04971 0.08000 0.11000 0.14000 \n",
"2010-01-07 0.03 0.04971 0.04971 0.06964 0.10000 0.13000 \n",
"2010-01-08 0.03 0.04971 0.04971 0.06964 0.10000 0.13000 \n",
"2010-01-11 0.03 0.04971 0.04971 0.06036 0.09000 0.12000 \n",
"\n",
" 6 months 12 months \n",
"date \n",
"2010-01-04 0.31571 0.71429 \n",
"2010-01-05 0.29929 0.68929 \n",
"2010-01-06 0.28000 0.66857 \n",
"2010-01-07 0.26000 0.62857 \n",
"2010-01-08 0.26000 0.62857 \n",
"2010-01-11 0.24000 0.57000 "
]
},
"execution_count": 87,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"hibor.dropna()"
]
},
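{
"cell_type": "markdown",
"id": "2d94e7b1",
"metadata": {},
"source": [
"By default `dropna()` removes every row that has at least one missing value. A small sketch of two common variations, again assuming the `hibor` DataFrame from above. The names `all_missing_dropped` and `overnight_known` are only used for this illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e85b0c3f",
"metadata": {},
"outputs": [],
"source": [
"# Drop a row only when all of its columns are missing\n",
"all_missing_dropped = hibor.dropna(how='all')\n",
"\n",
"# Drop rows where the 'overnight' column is missing\n",
"overnight_known = hibor.dropna(subset=['overnight'])\n",
"overnight_known"
]
},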
{
"cell_type": "markdown",
"id": "3f0d7ba1",
"metadata": {},
"source": [
"## 8.3 Fill with specific values"
]
},
{
"cell_type": "markdown",
"id": "4033679b",
"metadata": {},
"source": [
"Note: this is only an example of the syntax. Filling missing interest rates with 0 does not make sense in this scenario."
]
},
{
"cell_type": "code",
"execution_count": 90,
"id": "0ac6a858",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>overnight</th>\n",
" <th>1 week</th>\n",
" <th>2 weeks</th>\n",
" <th>1 months</th>\n",
" <th>2 months</th>\n",
" <th>3 months</th>\n",
" <th>6 months</th>\n",
" <th>12 months</th>\n",
" </tr>\n",
" <tr>\n",
" <th>date</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2010-01-01</th>\n",
" <td>0.00</td>\n",
" <td>0.00000</td>\n",
" <td>0.00000</td>\n",
" <td>0.00000</td>\n",
" <td>0.00000</td>\n",
" <td>0.00000</td>\n",
" <td>0.00000</td>\n",
" <td>0.00000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-02</th>\n",
" <td>0.00</td>\n",
" <td>0.00000</td>\n",
" <td>0.00000</td>\n",
" <td>0.00000</td>\n",
" <td>0.00000</td>\n",
" <td>0.00000</td>\n",
" <td>0.00000</td>\n",
" <td>0.00000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-03</th>\n",
" <td>0.00</td>\n",
" <td>0.00000</td>\n",
" <td>0.00000</td>\n",
" <td>0.00000</td>\n",
" <td>0.00000</td>\n",
" <td>0.00000</td>\n",
" <td>0.00000</td>\n",
" <td>0.00000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-04</th>\n",
" <td>0.03</td>\n",
" <td>0.04971</td>\n",
" <td>0.05000</td>\n",
" <td>0.07964</td>\n",
" <td>0.11893</td>\n",
" <td>0.15679</td>\n",
" <td>0.31571</td>\n",
" <td>0.71429</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-05</th>\n",
" <td>0.03</td>\n",
" <td>0.04971</td>\n",
" <td>0.05000</td>\n",
" <td>0.07964</td>\n",
" <td>0.11000</td>\n",
" <td>0.15000</td>\n",
" <td>0.29929</td>\n",
" <td>0.68929</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-06</th>\n",
" <td>0.03</td>\n",
" <td>0.04900</td>\n",
" <td>0.04971</td>\n",
" <td>0.08000</td>\n",
" <td>0.11000</td>\n",
" <td>0.14000</td>\n",
" <td>0.28000</td>\n",
" <td>0.66857</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-07</th>\n",
" <td>0.03</td>\n",
" <td>0.04971</td>\n",
" <td>0.04971</td>\n",
" <td>0.06964</td>\n",
" <td>0.10000</td>\n",
" <td>0.13000</td>\n",
" <td>0.26000</td>\n",
" <td>0.62857</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-08</th>\n",
" <td>0.03</td>\n",
" <td>0.04971</td>\n",
" <td>0.04971</td>\n",
" <td>0.06964</td>\n",
" <td>0.10000</td>\n",
" <td>0.13000</td>\n",
" <td>0.26000</td>\n",
" <td>0.62857</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-09</th>\n",
" <td>0.00</td>\n",
" <td>0.00000</td>\n",
" <td>0.00000</td>\n",
" <td>0.00000</td>\n",
" <td>0.00000</td>\n",
" <td>0.00000</td>\n",
" <td>0.00000</td>\n",
" <td>0.00000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-10</th>\n",
" <td>0.00</td>\n",
" <td>0.00000</td>\n",
" <td>0.00000</td>\n",
" <td>0.00000</td>\n",
" <td>0.00000</td>\n",
" <td>0.00000</td>\n",
" <td>0.00000</td>\n",
" <td>0.00000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-11</th>\n",
" <td>0.03</td>\n",
" <td>0.04971</td>\n",
" <td>0.04971</td>\n",
" <td>0.06036</td>\n",
" <td>0.09000</td>\n",
" <td>0.12000</td>\n",
" <td>0.24000</td>\n",
" <td>0.57000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" overnight 1 week 2 weeks 1 months 2 months 3 months \\\n",
"date \n",
"2010-01-01 0.00 0.00000 0.00000 0.00000 0.00000 0.00000 \n",
"2010-01-02 0.00 0.00000 0.00000 0.00000 0.00000 0.00000 \n",
"2010-01-03 0.00 0.00000 0.00000 0.00000 0.00000 0.00000 \n",
"2010-01-04 0.03 0.04971 0.05000 0.07964 0.11893 0.15679 \n",
"2010-01-05 0.03 0.04971 0.05000 0.07964 0.11000 0.15000 \n",
"2010-01-06 0.03 0.04900 0.04971 0.08000 0.11000 0.14000 \n",
"2010-01-07 0.03 0.04971 0.04971 0.06964 0.10000 0.13000 \n",
"2010-01-08 0.03 0.04971 0.04971 0.06964 0.10000 0.13000 \n",
"2010-01-09 0.00 0.00000 0.00000 0.00000 0.00000 0.00000 \n",
"2010-01-10 0.00 0.00000 0.00000 0.00000 0.00000 0.00000 \n",
"2010-01-11 0.03 0.04971 0.04971 0.06036 0.09000 0.12000 \n",
"\n",
" 6 months 12 months \n",
"date \n",
"2010-01-01 0.00000 0.00000 \n",
"2010-01-02 0.00000 0.00000 \n",
"2010-01-03 0.00000 0.00000 \n",
"2010-01-04 0.31571 0.71429 \n",
"2010-01-05 0.29929 0.68929 \n",
"2010-01-06 0.28000 0.66857 \n",
"2010-01-07 0.26000 0.62857 \n",
"2010-01-08 0.26000 0.62857 \n",
"2010-01-09 0.00000 0.00000 \n",
"2010-01-10 0.00000 0.00000 \n",
"2010-01-11 0.24000 0.57000 "
]
},
"execution_count": 90,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"hibor.fillna(0)"
]
},
{
"cell_type": "markdown",
"id": "d03e428d",
"metadata": {},
"source": [
"## 8.4 Fill with previous values (i.e. forward fill)\n"
]
},
{
"cell_type": "code",
"execution_count": 92,
"id": "0576a4c1",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>overnight</th>\n",
" <th>1 week</th>\n",
" <th>2 weeks</th>\n",
" <th>1 months</th>\n",
" <th>2 months</th>\n",
" <th>3 months</th>\n",
" <th>6 months</th>\n",
" <th>12 months</th>\n",
" </tr>\n",
" <tr>\n",
" <th>date</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2010-01-01</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-02</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-03</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-04</th>\n",
" <td>0.03</td>\n",
" <td>0.04971</td>\n",
" <td>0.05000</td>\n",
" <td>0.07964</td>\n",
" <td>0.11893</td>\n",
" <td>0.15679</td>\n",
" <td>0.31571</td>\n",
" <td>0.71429</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-05</th>\n",
" <td>0.03</td>\n",
" <td>0.04971</td>\n",
" <td>0.05000</td>\n",
" <td>0.07964</td>\n",
" <td>0.11000</td>\n",
" <td>0.15000</td>\n",
" <td>0.29929</td>\n",
" <td>0.68929</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-06</th>\n",
" <td>0.03</td>\n",
" <td>0.04900</td>\n",
" <td>0.04971</td>\n",
" <td>0.08000</td>\n",
" <td>0.11000</td>\n",
" <td>0.14000</td>\n",
" <td>0.28000</td>\n",
" <td>0.66857</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-07</th>\n",
" <td>0.03</td>\n",
" <td>0.04971</td>\n",
" <td>0.04971</td>\n",
" <td>0.06964</td>\n",
" <td>0.10000</td>\n",
" <td>0.13000</td>\n",
" <td>0.26000</td>\n",
" <td>0.62857</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-08</th>\n",
" <td>0.03</td>\n",
" <td>0.04971</td>\n",
" <td>0.04971</td>\n",
" <td>0.06964</td>\n",
" <td>0.10000</td>\n",
" <td>0.13000</td>\n",
" <td>0.26000</td>\n",
" <td>0.62857</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-09</th>\n",
" <td>0.03</td>\n",
" <td>0.04971</td>\n",
" <td>0.04971</td>\n",
" <td>0.06964</td>\n",
" <td>0.10000</td>\n",
" <td>0.13000</td>\n",
" <td>0.26000</td>\n",
" <td>0.62857</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-10</th>\n",
" <td>0.03</td>\n",
" <td>0.04971</td>\n",
" <td>0.04971</td>\n",
" <td>0.06964</td>\n",
" <td>0.10000</td>\n",
" <td>0.13000</td>\n",
" <td>0.26000</td>\n",
" <td>0.62857</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-11</th>\n",
" <td>0.03</td>\n",
" <td>0.04971</td>\n",
" <td>0.04971</td>\n",
" <td>0.06036</td>\n",
" <td>0.09000</td>\n",
" <td>0.12000</td>\n",
" <td>0.24000</td>\n",
" <td>0.57000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" overnight 1 week 2 weeks 1 months 2 months 3 months \\\n",
"date \n",
"2010-01-01 NaN NaN NaN NaN NaN NaN \n",
"2010-01-02 NaN NaN NaN NaN NaN NaN \n",
"2010-01-03 NaN NaN NaN NaN NaN NaN \n",
"2010-01-04 0.03 0.04971 0.05000 0.07964 0.11893 0.15679 \n",
"2010-01-05 0.03 0.04971 0.05000 0.07964 0.11000 0.15000 \n",
"2010-01-06 0.03 0.04900 0.04971 0.08000 0.11000 0.14000 \n",
"2010-01-07 0.03 0.04971 0.04971 0.06964 0.10000 0.13000 \n",
"2010-01-08 0.03 0.04971 0.04971 0.06964 0.10000 0.13000 \n",
"2010-01-09 0.03 0.04971 0.04971 0.06964 0.10000 0.13000 \n",
"2010-01-10 0.03 0.04971 0.04971 0.06964 0.10000 0.13000 \n",
"2010-01-11 0.03 0.04971 0.04971 0.06036 0.09000 0.12000 \n",
"\n",
" 6 months 12 months \n",
"date \n",
"2010-01-01 NaN NaN \n",
"2010-01-02 NaN NaN \n",
"2010-01-03 NaN NaN \n",
"2010-01-04 0.31571 0.71429 \n",
"2010-01-05 0.29929 0.68929 \n",
"2010-01-06 0.28000 0.66857 \n",
"2010-01-07 0.26000 0.62857 \n",
"2010-01-08 0.26000 0.62857 \n",
"2010-01-09 0.26000 0.62857 \n",
"2010-01-10 0.26000 0.62857 \n",
"2010-01-11 0.24000 0.57000 "
]
},
"execution_count": 92,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"hibor.fillna(method='ffill')"
]
},
{
"cell_type": "markdown",
"id": "d916df12",
"metadata": {},
"source": [
"## 8.5 Fill with next values (i.e. back fill)\n"
]
},
{
"cell_type": "markdown",
"id": "aff4fd55",
"metadata": {},
"source": [
"Remark: may not make sense in this example"
]
},
{
"cell_type": "code",
"execution_count": 95,
"id": "19fdac25",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>overnight</th>\n",
" <th>1 week</th>\n",
" <th>2 weeks</th>\n",
" <th>1 months</th>\n",
" <th>2 months</th>\n",
" <th>3 months</th>\n",
" <th>6 months</th>\n",
" <th>12 months</th>\n",
" </tr>\n",
" <tr>\n",
" <th>date</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2010-01-01</th>\n",
" <td>0.03</td>\n",
" <td>0.04971</td>\n",
" <td>0.05000</td>\n",
" <td>0.07964</td>\n",
" <td>0.11893</td>\n",
" <td>0.15679</td>\n",
" <td>0.31571</td>\n",
" <td>0.71429</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-02</th>\n",
" <td>0.03</td>\n",
" <td>0.04971</td>\n",
" <td>0.05000</td>\n",
" <td>0.07964</td>\n",
" <td>0.11893</td>\n",
" <td>0.15679</td>\n",
" <td>0.31571</td>\n",
" <td>0.71429</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-03</th>\n",
" <td>0.03</td>\n",
" <td>0.04971</td>\n",
" <td>0.05000</td>\n",
" <td>0.07964</td>\n",
" <td>0.11893</td>\n",
" <td>0.15679</td>\n",
" <td>0.31571</td>\n",
" <td>0.71429</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-04</th>\n",
" <td>0.03</td>\n",
" <td>0.04971</td>\n",
" <td>0.05000</td>\n",
" <td>0.07964</td>\n",
" <td>0.11893</td>\n",
" <td>0.15679</td>\n",
" <td>0.31571</td>\n",
" <td>0.71429</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-05</th>\n",
" <td>0.03</td>\n",
" <td>0.04971</td>\n",
" <td>0.05000</td>\n",
" <td>0.07964</td>\n",
" <td>0.11000</td>\n",
" <td>0.15000</td>\n",
" <td>0.29929</td>\n",
" <td>0.68929</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-06</th>\n",
" <td>0.03</td>\n",
" <td>0.04900</td>\n",
" <td>0.04971</td>\n",
" <td>0.08000</td>\n",
" <td>0.11000</td>\n",
" <td>0.14000</td>\n",
" <td>0.28000</td>\n",
" <td>0.66857</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-07</th>\n",
" <td>0.03</td>\n",
" <td>0.04971</td>\n",
" <td>0.04971</td>\n",
" <td>0.06964</td>\n",
" <td>0.10000</td>\n",
" <td>0.13000</td>\n",
" <td>0.26000</td>\n",
" <td>0.62857</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-08</th>\n",
" <td>0.03</td>\n",
" <td>0.04971</td>\n",
" <td>0.04971</td>\n",
" <td>0.06964</td>\n",
" <td>0.10000</td>\n",
" <td>0.13000</td>\n",
" <td>0.26000</td>\n",
" <td>0.62857</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-09</th>\n",
" <td>0.03</td>\n",
" <td>0.04971</td>\n",
" <td>0.04971</td>\n",
" <td>0.06036</td>\n",
" <td>0.09000</td>\n",
" <td>0.12000</td>\n",
" <td>0.24000</td>\n",
" <td>0.57000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-10</th>\n",
" <td>0.03</td>\n",
" <td>0.04971</td>\n",
" <td>0.04971</td>\n",
" <td>0.06036</td>\n",
" <td>0.09000</td>\n",
" <td>0.12000</td>\n",
" <td>0.24000</td>\n",
" <td>0.57000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-01-11</th>\n",
" <td>0.03</td>\n",
" <td>0.04971</td>\n",
" <td>0.04971</td>\n",
" <td>0.06036</td>\n",
" <td>0.09000</td>\n",
" <td>0.12000</td>\n",
" <td>0.24000</td>\n",
" <td>0.57000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" overnight 1 week 2 weeks 1 months 2 months 3 months \\\n",
"date \n",
"2010-01-01 0.03 0.04971 0.05000 0.07964 0.11893 0.15679 \n",
"2010-01-02 0.03 0.04971 0.05000 0.07964 0.11893 0.15679 \n",
"2010-01-03 0.03 0.04971 0.05000 0.07964 0.11893 0.15679 \n",
"2010-01-04 0.03 0.04971 0.05000 0.07964 0.11893 0.15679 \n",
"2010-01-05 0.03 0.04971 0.05000 0.07964 0.11000 0.15000 \n",
"2010-01-06 0.03 0.04900 0.04971 0.08000 0.11000 0.14000 \n",
"2010-01-07 0.03 0.04971 0.04971 0.06964 0.10000 0.13000 \n",
"2010-01-08 0.03 0.04971 0.04971 0.06964 0.10000 0.13000 \n",
"2010-01-09 0.03 0.04971 0.04971 0.06036 0.09000 0.12000 \n",
"2010-01-10 0.03 0.04971 0.04971 0.06036 0.09000 0.12000 \n",
"2010-01-11 0.03 0.04971 0.04971 0.06036 0.09000 0.12000 \n",
"\n",
" 6 months 12 months \n",
"date \n",
"2010-01-01 0.31571 0.71429 \n",
"2010-01-02 0.31571 0.71429 \n",
"2010-01-03 0.31571 0.71429 \n",
"2010-01-04 0.31571 0.71429 \n",
"2010-01-05 0.29929 0.68929 \n",
"2010-01-06 0.28000 0.66857 \n",
"2010-01-07 0.26000 0.62857 \n",
"2010-01-08 0.26000 0.62857 \n",
"2010-01-09 0.24000 0.57000 \n",
"2010-01-10 0.24000 0.57000 \n",
"2010-01-11 0.24000 0.57000 "
]
},
"execution_count": 95,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"hibor.fillna(method='bfill')"
]
},
{
"cell_type": "markdown",
"id": "785c9ec4",
"metadata": {},
"source": [
"# 9. Export CSV"
]
},
{
"cell_type": "markdown",
"id": "fd978fcf",
"metadata": {},
"source": [
"Export dataframe to a csv. Remember don't override the original file!"
]
},
{
"cell_type": "code",
"execution_count": 46,
"id": "3e53c244",
"metadata": {},
"outputs": [],
"source": [
"aapl_proper_index.to_csv(\"AAPL_new.csv\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.0"
}
},
"nbformat": 4,
"nbformat_minor": 5
}