PandaPy

Install: !pip3 install pandapy
Load: import pandapy as pp

Why PandaPy?
- Maintains the full functionality and speed of the structured NumPy datatype (e.g. arithmetic between array[col1] and array[col2], or np.log(array[col1])).
- Provides wrapper functions over NumPy to give you the usability of Pandas (e.g. pp.group(array, [col1, col2, col2], ['mean', 'std'], ['Adj_Close', 'Close'])).
- If you need Pandas for specialty functions, you can easily convert with df = pp.pandas(array) and back with array = pp.structured(df).
- For simple calculations (i.e. plus, mult, log), PandaPy is up to 100x faster than Pandas.
- For table functions (i.e. group, pivot, drop, concat, fillna), PandaPy is 5x to 138x faster than Pandas.
- For most use cases, PandaPy is faster than Dask, Modin Ray, and Pandas.
- The best competing Python package for performance on table functions is datatable, which is 2x to 36x faster than PandaPy. The problem is that datatable is at least 5x slower on simple calculations (plus, mult, returns), is less intuitive, has a smaller range of functions, has very few complementary libraries (e.g. matplotlib), and doesn't leave you in a NumPy datatype. For finance applications, the speed of simple calculations takes precedence over table function speed.
- PandaPy is not designed to scale up to clusters for multi-computer processing like Dask, Modin, and Spark; instead it focuses on speed and usability within a single computer's memory. Machines are getting large: an EC2 X1 has 2TB of RAM and is remarkably affordable. If it can be done on a single machine, then it should be done on a single machine. Quoting Dask: "For data that fits into RAM, Pandas can often be faster and easier to use than Dask DataFrame".
- If your dataset is very small, you can load it using PandaPy's read() function. For medium-sized data, it is best to load it with datatable or pyspark and convert it to structured NumPy. For large data use pyspark, Dask, or Modin, and for very large data use pyspark.
- Lastly, PandaPy can take any multidimensional object as input, and it does not have to conform to the basic NumPy datatypes. It can include nested datatypes, subarrays, and functions, as long as each column conforms to the array length, which allows a great amount of flexibility. For example, you can call pp.add(array, "panda function", [[pd for i in range(len(multiple_stocks))]]) to create a column holding the pandas (pd) module and access it at any index, e.g. array["panda function"][0].read_csv(url).

PandaPy, similar to the original Pandas project, is developed to improve the usability of Python for finance. Structured datatypes are designed to mimic 'structs' in the C language and share a similar memory layout. PandaPy currently houses a sizeable collection of functions. Structured NumPy arrays are meant for interfacing with C code and for low-level manipulation of structured buffers, for example for interpreting binary blobs. For these purposes they support specialized features such as subarrays, nested datatypes, and unions, and allow control over the memory layout of the structure.

Note that this is a fledgling project with much room for improvement; all feedback is appreciated (issues tab).
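The structured-array behaviour described above can be tried with plain NumPy, without PandaPy installed. The sketch below is only an illustration: the three rows, the dates, and the field called "fn" are made up for the example, and it does not show how PandaPy itself stores or adds columns.

```python
import numpy as np

# Hypothetical three-row price table; the field names mirror columns used in this README,
# the values and dates are illustrative only.
prices = np.array(
    [("2019-01-02", 310.12, 11658600),
     ("2019-01-03", 300.36, 6965200),
     ("2019-01-04", 317.69, 7394100)],
    dtype=[("Date", "U10"), ("Adj_Close", "f8"), ("Volume", "i8")],
)

# Selecting a field gives an ordinary ndarray view, so normal NumPy math applies.
dollar_volume = prices["Adj_Close"] * prices["Volume"]
log_close = np.log(prices["Adj_Close"])

# A field can also have dtype object, so a column may hold arbitrary Python
# objects (here a function), as long as there is one entry per row.
flexible = np.empty(len(prices), dtype=[("Adj_Close", "f8"), ("fn", "O")])
flexible["Adj_Close"] = prices["Adj_Close"]
flexible["fn"] = [np.sqrt] * len(prices)

# Call the object stored in row 0 on the price stored in row 0.
root_of_first = flexible["fn"][0](flexible["Adj_Close"][0])
```

Because each field is a view into the same buffer, column arithmetic like this does not copy the whole table, which is one plausible reason simple calculations stay close to raw NumPy speed.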
Play around with speed tests here, and some more here. Test and explore the package with this Google Colab Notebook. Get in touch on Twitter. Use pp.table(array) to get a pandas-looking table printout.

Functions
PandaPy speed over Pandas is given in parentheses as a multiple (x), e.g. dropnarow (54x).

Array Structure
- Read In Arrays (read)
- To Pandas (pandas)
- Pandas to Structured (structured)
- To Unstructured (to_unstruct)
- To Structured (to_struct)
- Print Table (table)
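For readers unfamiliar with the structured/unstructured distinction behind the last few entries, here is a minimal sketch using NumPy's own helpers in numpy.lib.recfunctions. Whether PandaPy's to_unstruct and to_struct wrap these helpers is an assumption; the two-column array is hypothetical.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical structured array with one named field per ticker.
structured = np.array(
    [(310.12, 300.36), (317.69, 334.96)],
    dtype=[("AA", "f8"), ("AAPL", "f8")],
)

# Structured -> plain 2-D float array (rows x columns), field names are dropped.
plain = rfn.structured_to_unstructured(structured)

# Plain 2-D array -> structured array again, supplying the field names.
restored = rfn.unstructured_to_structured(plain, names=["AA", "AAPL"])
```

The unstructured form is convenient for matrix-style operations, while the structured form keeps column names attached to the data.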
Explorative Functions
- Descriptive Statistics (describe) (5x)
- Correlation Array (corr) (2x)

Finance Functions
- Returns (returns) (73x)
- Portfolio Value (portfolio_value)
- Cumulative Value (cummulative_return)
- Column Lags (lags) (7x)

Array Functions
- Drop Null Rows (dropnarow) (54x)
- Drop Column/s (drop) (139x)
- Add Column/s (add) (3x)
- Concatenate (concat) (columns 94x)
- Merge (merge) (2x)
- Group by (group)
- Pivot (pivot) (48x)
- Fill Nulls (fillna) (46x)
- Shift Column (shift)
- Rename (rename)

Other Speed Tests
- Update (array[col] = values) (84x)
- Addition (array[col] + array[col])
- Multiplication (array[col] * array[col])
- Log (np.log(array[col]))

Note: speed tests were done on a financial dataset only.

Documentation by Example
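As a first illustration of what the finance entries in the function list compute, the sketch below builds a lagged column, simple returns, and a cumulative value in plain NumPy. The helper name lag, the NaN padding, and the five rounded prices are assumptions made for the example; PandaPy's lags, returns, and cummulative_return may use different signatures and fill values.

```python
import numpy as np

# Hypothetical adjusted closing prices for five consecutive days.
adj_close = np.array([310.12, 300.36, 317.69, 334.96, 335.35])

def lag(values, periods, fill=np.nan):
    """Shift values down by `periods` rows (periods >= 1), padding the top with `fill`."""
    out = np.full(values.shape, fill, dtype=float)
    out[periods:] = values[:-periods]
    return out

# Yesterday's price aligned with today's row.
lag_1 = lag(adj_close, 1)

# Simple one-period returns: p_t / p_(t-1) - 1.
simple_returns = adj_close[1:] / adj_close[:-1] - 1.0

# Growth of one unit invested at the first price.
cumulative = np.cumprod(1.0 + simple_returns)
```

The last element of cumulative equals the final price divided by the first price, which is a quick sanity check for any returns calculation.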
Read In Arrays

    # First Example
    multiple_stocks = pp.read('https://github.com/firmai/random-assets-two/blob/master/numpy/multiple_stocks.csv?raw=true')
    closing = multiple_stocks[['Ticker', 'Date', 'Adj_Close']]
    piv = pp.pivot(closing, "Date", "Ticker", "Adj_Close"); piv
    closing = pp.to_struct(piv, name_list=[…])

    # Second Example
    tsla = pp.read('https://github.com/firmai/random-assets-two/raw/master/numpy/tsla.csv')
    crm = pp.read('https://github.com/firmai/random-assets-two/raw/master/numpy/crm.csv')
    tsla_sub = tsla[["Date", "Adj_Close", "Volume"]]
    crm_sub = crm[["Date", "Adj_Close", "Volume"]]
    crm_adj = crm[[…]]

    closing
    array([(37.24206924, 100.45429993, 44.57522202, 20.72605705, 130.59109497, 35.80251312,  41.9791832 ,  81.51140594, 66.33999634),
           (35.08446503,  97.62433624, 43.83200836, 20.34561157, 128.53627014, 35.80251312,  41.59314346,  80.89860535, 66.15000153),
           (35.34244537,  97.63354492, 42.79874039, 19.90727234, 125.76422119, 36.07437897,  40.98268127,  80.28580475, 64.58000183),
           ...,
           (21.57999992, 289.79998779, 59.08000183, 11.18000031, 135.27000427, 55.34999847, 158.96000671, 137.53999329, 88.37000275),
           (21.34000015, 291.51998901, 58.65999985, 11.07999992, 132.80999756, 55.27000046, 157.58999634, 136.80999756, 87.95999908),
           (21.51000023, 293.6499939 , 58.47999954, 11.15999985, 134.03999329, 55.34999847, 157.69999695, 136.66999817, 88.08999634)],
          dtype=[…])

Rename

    pp.rename(closing, […], […])[:5]
    array([…], dtype=[…])

    pp.rename(closing, "AA", "GALLY")[:5]
    array([…], dtype=[('GAP', …

Statistics

    described = pp.describe(closing)

The result is a table with one row per ticker (AA, AAPL, ...) and the columns observations, minimum, maximum, mean, variance, skewness, and kurtosis.
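To show what such a summary table contains, here is a rough NumPy sketch that computes the same seven statistics per column of a structured array. The helper describe_column, the four rounded rows, and the conventions chosen (sample variance, population skewness, excess kurtosis) are assumptions for illustration; pp.describe may define these statistics differently.

```python
import numpy as np

# Hypothetical two-ticker structured array with rounded prices.
closing = np.array(
    [(37.24, 100.45), (35.08, 97.62), (35.34, 97.63), (21.51, 293.65)],
    dtype=[("AA", "f8"), ("AAPL", "f8")],
)

def describe_column(values):
    """Return summary statistics for one numeric column as a dict."""
    mean = values.mean()
    centered = values - mean
    std_pop = np.sqrt((centered ** 2).mean())  # population standard deviation
    return {
        "observations": int(values.size),
        "minimum": float(values.min()),
        "maximum": float(values.max()),
        "mean": float(mean),
        "variance": float(values.var(ddof=1)),  # sample variance
        "skewness": float((centered ** 3).mean() / std_pop ** 3),
        "kurtosis": float((centered ** 4).mean() / std_pop ** 4 - 3.0),  # excess
    }

# One row of statistics per field (ticker) in the structured array.
summary = {name: describe_column(closing[name]) for name in closing.dtype.names}
```

Iterating over closing.dtype.names is what makes this work for any number of ticker columns without hard-coding their names.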