A one-stop shop for learning about data intake, processing, and visualization.

The Dataplay Handbook uses techniques covered in the Datalabs Guidebook.

Install

The code is on PyPI, so you can install it and its dependencies by running the following from the terminal:

pip install dataplay geopandas dexplot

How to use

Import the installed module into your code and use it like so:

from dataplay.acsDownload import retrieve_acs_data 
retrieve_acs_data(state, county, tract, tableId, year, saveAcs)

and

from dataplay.merge import mergeDatasets
mergeDatasets(left_ds=False, right_ds=False, crosswalk_ds=False, use_crosswalk=True, left_col=False, right_col=False, crosswalk_left_col=False, crosswalk_right_col=False, merge_how=False, interactive=True)

Here's an example:

Define our download parameters.

More information on these parameters can be found in the tutorials!

tract = '*'
county = '510'
state = '24'
tableId = 'B19001'
year = '17'
saveAcs = False
df = retrieve_acs_data(state, county, tract, tableId, year, saveAcs)
df.head()
Number of Columns 17
B19001_001E_Total B19001_002E_Total_Less_than_$10_000 B19001_003E_Total_$10_000_to_$14_999 ... state county tract
NAME
Census Tract 1901 796 237 76 ... 24 510 190100
Census Tract 1902 695 63 87 ... 24 510 190200
Census Tract 2201 2208 137 229 ... 24 510 220100
Census Tract 2303 632 3 20 ... 24 510 230300
Census Tract 2502.07 836 102 28 ... 24 510 250207

5 rows × 20 columns
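
The state, county, and tract columns in this output are the FIPS components of a Census tract identifier. The crosswalk used later in this example keys on 11-digit GEOID values such as 24510190100, which follow the standard Census layout of a 2-digit state, a 3-digit county, and a 6-digit tract code. As a minimal sketch, the hypothetical helper below (make_geoid is not part of dataplay) shows how the three parts combine:

```python
def make_geoid(state, county, tract):
    """Build an 11-digit tract GEOID from FIPS parts.

    Hypothetical helper for illustration only; it is not a dataplay function.
    Accepts strings or integers and zero-pads each part to its standard width.
    """
    return f"{int(state):02d}{int(county):03d}{int(tract):06d}"


# Values taken from the download parameters and the output above.
geoid = make_geoid('24', '510', '190100')
print(geoid)
```

Zero-padding matters when the tract column has been parsed as an integer, since leading zeros are lost in that case.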

# Primary Table
left_ds = df
left_col = 'tract'

# Crosswalk Table
# Table: Crosswalk Census Communities
# 'TRACT2010', 'GEOID2010', 'CSA2010'
crosswalk_ds = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vREwwa_s8Ix39OYGnnS_wA8flOoEkU7reIV4o3ZhlwYhLXhpNEvnOia_uHUDBvnFptkLLHHlaQNvsQE/pub?output=csv'
use_crosswalk = True
crosswalk_left_col = 'TRACT2010'
crosswalk_right_col = 'GEOID2010'

# Secondary Table
# Table: Baltimore Boundaries
# 'TRACTCE10', 'GEOID10', 'CSA', 'NAME10', 'Tract', 'geometry'
right_ds = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vQ8xXdUaT17jkdK0MWTJpg3GOy6jMWeaXTlguXNjCSb8Vr_FanSZQRaTU-m811fQz4kyMFK5wcahMNY/pub?gid=886223646&single=true&output=csv'
right_col = 'GEOID10'

interactive = True
merge_how = 'outer'

banksPd = mergeDatasets(left_ds=left_ds, left_col=left_col,
                        use_crosswalk=use_crosswalk, crosswalk_ds=crosswalk_ds,
                        crosswalk_left_col=crosswalk_left_col, crosswalk_right_col=crosswalk_right_col,
                        right_ds=right_ds, right_col=right_col,
                        merge_how=merge_how, interactive=interactive)
 Handling Left Dataset
retrieveDatasetFromUrl                       B19001_001E_Total  \
NAME                                      
Census Tract 1901                   796   
Census Tract 1902                   695   
Census Tract 2201                  2208   
Census Tract 2303                   632   
Census Tract 2502.07                836   
...                                 ...   
Census Tract 2720.05               1219   
Census Tract 1202.01                883   
Census Tract 2720.04               1835   
Census Tract 2720.06               1679   
Baltimore City                   239791   

                      B19001_002E_Total_Less_than_$10_000  \
NAME                                                        
Census Tract 1901                     237                   
Census Tract 1902                      63                   
Census Tract 2201                     137                   
Census Tract 2303                       3                   
Census Tract 2502.07                  102                   
...                                   ...                   
Census Tract 2720.05                   84                   
Census Tract 1202.01                   78                   
Census Tract 2720.04                  155                   
Census Tract 2720.06                  347                   
Baltimore City                      29106                   

                      B19001_003E_Total_$10_000_to_$14_999  \
NAME                                                         
Census Tract 1901                      76                    
Census Tract 1902                      87                    
Census Tract 2201                     229                    
Census Tract 2303                      20                    
Census Tract 2502.07                   28                    
...                                   ...                    
Census Tract 2720.05                   41                    
Census Tract 1202.01                   27                    
Census Tract 2720.04                  109                    
Census Tract 2720.06                  165                    
Baltimore City                      15759                    

                      ...  \
NAME                  ...   
Census Tract 1901     ...   
Census Tract 1902     ...   
Census Tract 2201     ...   
Census Tract 2303     ...   
Census Tract 2502.07  ...   
...                   ...   
Census Tract 2720.05  ...   
Census Tract 1202.01  ...   
Census Tract 2720.04  ...   
Census Tract 2720.06  ...   
Baltimore City        ...   

                      state  \
NAME                          
Census Tract 1901        24   
Census Tract 1902        24   
Census Tract 2201        24   
Census Tract 2303        24   
Census Tract 2502.07     24   
...                     ...   
Census Tract 2720.05     24   
Census Tract 1202.01     24   
Census Tract 2720.04     24   
Census Tract 2720.06     24   
Baltimore City           24   

                      county  \
NAME                           
Census Tract 1901        510   
Census Tract 1902        510   
Census Tract 2201        510   
Census Tract 2303        510   
Census Tract 2502.07     510   
...                      ...   
Census Tract 2720.05     510   
Census Tract 1202.01     510   
Census Tract 2720.04     510   
Census Tract 2720.06     510   
Baltimore City           510   

                       tract  
NAME                          
Census Tract 1901     190100  
Census Tract 1902     190200  
Census Tract 2201     220100  
Census Tract 2303     230300  
Census Tract 2502.07  250207  
...                      ...  
Census Tract 2720.05  272005  
Census Tract 1202.01  120201  
Census Tract 2720.04  272004  
Census Tract 2720.06  272006  
Baltimore City         10000  

[201 rows x 20 columns]
checkDataSetExists True
checkDataSetExists True
checkDataSetExists True
Left Dataset and Columns are Valid

 Handling Right Dataset
retrieveDatasetFromUrl https://docs.google.com/spreadsheets/d/e/2PACX-1vQ8xXdUaT17jkdK0MWTJpg3GOy6jMWeaXTlguXNjCSb8Vr_FanSZQRaTU-m811fQz4kyMFK5wcahMNY/pub?gid=886223646&single=true&output=csv
checkDataSetExists False
retrieveDatasetFromUrl https://docs.google.com/spreadsheets/d/e/2PACX-1vQ8xXdUaT17jkdK0MWTJpg3GOy6jMWeaXTlguXNjCSb8Vr_FanSZQRaTU-m811fQz4kyMFK5wcahMNY/pub?gid=886223646&single=true&output=csv
checkDataSetExists True
checkDataSetExists True
checkDataSetExists True
Right Dataset and Columns are Valid

 Checking the merge_how Parameter
merge_how operator is Valid outer
checkDataSetExists False

 Checking the Crosswalk Parameter

 Handling Crosswalk Left Dataset Loading
retrieveDatasetFromUrl https://docs.google.com/spreadsheets/d/e/2PACX-1vREwwa_s8Ix39OYGnnS_wA8flOoEkU7reIV4o3ZhlwYhLXhpNEvnOia_uHUDBvnFptkLLHHlaQNvsQE/pub?output=csv
checkDataSetExists False
retrieveDatasetFromUrl https://docs.google.com/spreadsheets/d/e/2PACX-1vREwwa_s8Ix39OYGnnS_wA8flOoEkU7reIV4o3ZhlwYhLXhpNEvnOia_uHUDBvnFptkLLHHlaQNvsQE/pub?output=csv
checkDataSetExists True
checkDataSetExists True
checkDataSetExists True

 Handling Crosswalk Right Dataset Loading
retrieveDatasetFromUrl https://docs.google.com/spreadsheets/d/e/2PACX-1vREwwa_s8Ix39OYGnnS_wA8flOoEkU7reIV4o3ZhlwYhLXhpNEvnOia_uHUDBvnFptkLLHHlaQNvsQE/pub?output=csv
checkDataSetExists False
retrieveDatasetFromUrl https://docs.google.com/spreadsheets/d/e/2PACX-1vREwwa_s8Ix39OYGnnS_wA8flOoEkU7reIV4o3ZhlwYhLXhpNEvnOia_uHUDBvnFptkLLHHlaQNvsQE/pub?output=csv
checkDataSetExists True
checkDataSetExists True
checkDataSetExists True

 Assessment Completed

 Ensuring Left->Crosswalk compatability

 Ensuring Crosswalk->Right compatability
PERFORMING MERGE LEFT->CROSSWALK
left_on TRACT2010 right_on GEOID2010 how outer
PERFORMING MERGE LEFT->RIGHT
left_col GEOID2010 right_col GEOID10 how outer

 Local Column Values Not Matched 
[0]
1

 Crosswalk Unique Column Values
[24510151000 24510080700 24510080500 24510150500 24510120100 24510090900
 24510280301 24510130803 24510130700 24510130600 24510100100 24510110100
 24510270501 24510270302 24510270401 24510120700 24510271200 24510110200
 24510271002 24510280404 24510270804 24510260203 24510260101 24510260102
 24510090800 24510090300 24510270801 24510120400 24510090200 24510271001
 24510130200 24510140100 24510270600 24510270701 24510130100 24510270803
 24510280200 24510280302 24510130804 24510271101 24510271102 24510150800
 24510270301 24510170100 24510090500 24510170200 24510090600 24510120300
 24510120500 24510130300 24510120600 24510100200 24510150400 24510261000
 24510280403 24510010400 24510250303 24510260303 24510200701 24510272003
 24510070200 24510280102 24510151200 24510260900 24510200400 24510261100
 24510200500 24510250103 24510260301 24510200600 24510130806 24510270702
 24510180200 24510190100 24510270805 24510200200 24510150702 24510270402
 24510250206 24510150701 24510151100 24510040100 24510270101 24510270200
 24510190200 24510271501 24510210100 24510180300 24510180100 24510150100
 24510200300 24510200100 24510090700 24510190300 24510090400 24510200702
 24510250500 24510280401 24510160801 24510160802 24510270703 24510220100
 24510250301 24510270502 24510030100 24510020200 24510250600 24510240200
 24510150900 24510020300 24510270102 24510250207 24510030200 24510250101
 24510280402 24510080102 24510040200 24510200800 24510270903 24510060200
 24510260800 24510160400 24510280101 24510250401 24510240400 24510250102
 24510250205 24510240300 24510271802 24510060100 24510010300 24510010200
 24510270902 24510010100 24510270901 24510270802 24510260605 24510250402
 24510271801 24510260201 24510260401 24510271300 24510230100 24510080101
 24510060300 24510140200 24510160100 24510160200 24510260404 24510150300
 24510150200 24510160700 24510260202 24510271400 24510130805 24510140300
 24510170300 24510080302 24510100300 24510260501 24510160300 24510130400
 24510160600 24510271600 24510271700 24510151300 24510210200 24510271503
 24510060400 24510250204 24510070400 24510230200 24510240100 24510020100
 24510260604 24510120202 24510272007 24510272005 24510230300 24510260302
 24510080200 24510080301 24510010500 24510070100 24510250203 24510070300
 24510080600 24510271900 24510080400 24510120201 24510272004 24510272006
 24510280500 24510260403 24510150600 24510080800 24510160500 24510090100
 24510260402 24510260700]
banksPd.head()
B19001_001E_Total B19001_002E_Total_Less_than_$10_000 B19001_003E_Total_$10_000_to_$14_999 ... CSA Tract geometry
0 796 237 76 ... Southwest Baltimore 1901.0 POLYGON ((-76.63...
1 695 63 87 ... Southwest Baltimore 1902.0 POLYGON ((-76.63...
2 2208 137 229 ... Inner Harbor/Fed... 2201.0 MULTIPOLYGON (((...
3 632 3 20 ... South Baltimore 2303.0 MULTIPOLYGON (((...
4 836 102 28 ... Cherry Hill 2502.0 POLYGON ((-76.62...

5 rows × 27 columns

type(banksPd)
pandas.core.frame.DataFrame
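
For readers who want to see the idea behind a crosswalk merge without the library, the two logged steps (left to crosswalk, then crosswalk to right) can be approximated in plain pandas. This is only a sketch under assumptions: the three toy tables below are hand-built from values shown in this example, and the code is not the implementation of mergeDatasets.

```python
import pandas as pd

# Toy stand-ins for the three tables; values are copied from the output above.
left = pd.DataFrame({
    "tract": [190100, 190200, 230300],
    "B19001_001E_Total": [796, 695, 632],
})
crosswalk = pd.DataFrame({
    "TRACT2010": [190100, 190200, 230300],
    "GEOID2010": [24510190100, 24510190200, 24510230300],
})
right = pd.DataFrame({
    "GEOID10": [24510190100, 24510190200, 24510230300],
    "CSA": ["Southwest Baltimore", "Southwest Baltimore", "South Baltimore"],
})

# Step 1: attach the crosswalk so each left row gains the right table's key.
step1 = left.merge(crosswalk, left_on="tract", right_on="TRACT2010", how="outer")

# Step 2: join the result to the right table on that key.
merged = step1.merge(right, left_on="GEOID2010", right_on="GEOID10", how="outer")

print(merged[["tract", "GEOID2010", "CSA", "B19001_001E_Total"]])
```

An outer join keeps rows that fail to match on either side, which makes unmatched keys easy to spot as rows with missing values.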
from dataplay.geoms import readInGeometryData
csaMap = readInGeometryData(url=banksPd, porg='g', geom='geometry', lat=False, lng=False, revgeocode=False, save=False, in_crs=2248, out_crs=2248)
isGeoDataframe
RECIEVED url:      B19001_001E_Total  \
0                  796   
1                  695   
2                 2208   
3                  632   
4                  836   
..                 ...   
195               1848   
196               1219   
197                883   
198               1835   
199               1679   

     B19001_002E_Total_Less_than_$10_000  \
0                    237                   
1                     63                   
2                    137                   
3                      3                   
4                    102                   
..                   ...                   
195                  153                   
196                   84                   
197                   78                   
198                  155                   
199                  347                   

     B19001_003E_Total_$10_000_to_$14_999  \
0                     76                    
1                     87                    
2                    229                    
3                     20                    
4                     28                    
..                   ...                    
195                   68                    
196                   41                    
197                   27                    
198                  109                    
199                  165                    

     ...  \
0    ...   
1    ...   
2    ...   
3    ...   
4    ...   
..   ...   
195  ...   
196  ...   
197  ...   
198  ...   
199  ...   

                     CSA  \
0    Southwest Baltimore   
1    Southwest Baltimore   
2    Inner Harbor/Fed...   
3        South Baltimore   
4            Cherry Hill   
..                   ...   
195       Glen-Fallstaff   
196  Cross-Country/Ch...   
197  Greater Charles ...   
198  Cross-Country/Ch...   
199       Glen-Fallstaff   

      Tract  \
0    1901.0   
1    1902.0   
2    2201.0   
3    2303.0   
4    2502.0   
..      ...   
195  2720.0   
196  2720.0   
197  1202.0   
198  2720.0   
199  2720.0   

                geometry  
0    POLYGON ((-76.63...  
1    POLYGON ((-76.63...  
2    MULTIPOLYGON (((...  
3    MULTIPOLYGON (((...  
4    POLYGON ((-76.62...  
..                   ...  
195  POLYGON ((-76.69...  
196  POLYGON ((-76.69...  
197  POLYGON ((-76.60...  
198  POLYGON ((-76.69...  
199  POLYGON ((-76.68...  

[200 rows x 27 columns], 
 porg: g, 
 geom: geometry, 
 lat: False, 
 lng: False, 
 revgeocode: False, 
 in_crs: 2248, 
 out_crs: 2248
Index(['B19001_001E_Total',
       'B19001_002E_Total_Less_than_$10_000',
       'B19001_003E_Total_$10_000_to_$14_999',
       'B19001_004E_Total_$15_000_to_$19_999',
       'B19001_005E_Total_$20_000_to_$24_999',
       'B19001_006E_Total_$25_000_to_$29_999',
       'B19001_007E_Total_$30_000_to_$34_999',
       'B19001_008E_Total_$35_000_to_$39_999',
       'B19001_009E_Total_$40_000_to_$44_999',
       'B19001_010E_Total_$45_000_to_$49_999',
       'B19001_011E_Total_$50_000_to_$59_999',
       'B19001_012E_Total_$60_000_to_$74_999',
       'B19001_013E_Total_$75_000_to_$99_999',
       'B19001_014E_Total_$100_000_to_$124_999',
       'B19001_015E_Total_$125_000_to_$149_999',
       'B19001_016E_Total_$150_000_to_$199_999',
       'B19001_017E_Total_$200_000_or_more',
       'state',
       'county',
       'tract',
       'GEOID2010',
       'TRACTCE10',
       'GEOID10',
       'NAME10',
       'CSA',
       'Tract',
       'geometry'],
      dtype='object')
csaMap.columns
Index(['B19001_001E_Total',
       'B19001_002E_Total_Less_than_$10_000',
       'B19001_003E_Total_$10_000_to_$14_999',
       'B19001_004E_Total_$15_000_to_$19_999',
       'B19001_005E_Total_$20_000_to_$24_999',
       'B19001_006E_Total_$25_000_to_$29_999',
       'B19001_007E_Total_$30_000_to_$34_999',
       'B19001_008E_Total_$35_000_to_$39_999',
       'B19001_009E_Total_$40_000_to_$44_999',
       'B19001_010E_Total_$45_000_to_$49_999',
       'B19001_011E_Total_$50_000_to_$59_999',
       'B19001_012E_Total_$60_000_to_$74_999',
       'B19001_013E_Total_$75_000_to_$99_999',
       'B19001_014E_Total_$100_000_to_$124_999',
       'B19001_015E_Total_$125_000_to_$149_999',
       'B19001_016E_Total_$150_000_to_$199_999',
       'B19001_017E_Total_$200_000_or_more',
       'state',
       'county',
       'tract',
       'GEOID2010',
       'TRACTCE10',
       'GEOID10',
       'NAME10',
       'CSA',
       'Tract',
       'geometry'],
      dtype='object')
csaMap.plot(column='B19001_002E_Total_Less_than_$10_000')
<matplotlib.axes._subplots.AxesSubplot at 0x7f277d7b0630>
from dataplay.geoms import workWithGeometryData
foodPantryLocationsUrl = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vT3lG0n542sIGE2O-C8fiXx-qUZG2WDO6ezRGcNsS4z8MM30XocVZ90P1UQOIXO2w/pub?gid=1152681223&single=true&output=csv'
crs = {'init': 'epsg:2248'}
foodPantryLocations = readInGeometryData(url=foodPantryLocationsUrl, porg='p', geom=False, lat='Y', lng='X', revgeocode=False,  save=False, in_crs=crs, out_crs=crs)

panp = workWithGeometryData( 'pandp', foodPantryLocations[ foodPantryLocations.City_1 == 'Baltimore' ], csaMap, pntsClr='red', polyColorCol='B19001_002E_Total_Less_than_$10_000')
RECIEVED url: https://docs.google.com/spreadsheets/d/e/2PACX-1vT3lG0n542sIGE2O-C8fiXx-qUZG2WDO6ezRGcNsS4z8MM30XocVZ90P1UQOIXO2w/pub?gid=1152681223&single=true&output=csv, 
 porg: p, 
 geom: False, 
 lat: Y, 
 lng: X, 
 revgeocode: False, 
 in_crs: {'init': 'epsg:2248'}, 
 out_crs: {'init': 'epsg:2248'}
Index(['X',
       'Y',
       'OBJECTID',
       'Name',
       'Address',
       'City_1',
       'State',
       'Zip',
       '# in Zip',
       'FIPS'],
      dtype='object')
mapPointsandPolygons
/usr/local/lib/python3.6/dist-packages/pyproj/crs/crs.py:53: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
  return _prepare_from_string(" ".join(pjargs))
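
The FutureWarning above says the '+init=<authority>:<code>' syntax is deprecated in favor of '<authority>:<code>'. As a sketch, the hypothetical helper below (normalize_crs is not part of dataplay or pyproj) converts the legacy dictionary form into that preferred string. Whether readInGeometryData accepts a plain string is an assumption that should be checked before switching, and the warning's note about axis order changes still applies.

```python
def normalize_crs(crs):
    """Return a CRS in '<authority>:<code>' string form.

    Hypothetical helper for illustration only. Handles the three forms
    that appear in this example: an integer EPSG code, a legacy
    {'init': 'epsg:NNNN'} dictionary, and an 'epsg:NNNN' string.
    """
    if isinstance(crs, int):
        return f"EPSG:{crs}"
    if isinstance(crs, dict) and "init" in crs:
        return crs["init"].upper()
    if isinstance(crs, str):
        return crs.upper()
    raise TypeError(f"Unsupported CRS value: {crs!r}")


legacy = {'init': 'epsg:2248'}
print(normalize_crs(legacy))
```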
from dataplay.geoms import map_points
map_points(foodPantryLocations, lat_col='Y', lon_col='X', zoom_start=11, plot_points=True, pt_radius=15, draw_heatmap=True, heat_map_weights_col=None, heat_map_weights_normalize=True, heat_map_radius=15)