Flow Views & Automation (Tutorial) fail on compute_merchant_census_tracts Step

tgb417

I'm having problems completing the Academy hands-on exercise at this point in the process:

https://knowledge.dataiku.com/latest/courses/automation/scenarios-hands-on.html#add-steps 

The compute_merchant_census_tracts step is failing with the error messages below, which come from the Census plugin ("Get US census block group from lat lon"). The error appears to be a connection timeout, and it happens after 7 to 22 records have been processed. I've tried increasing the API Call threshold from 1 second to 20 seconds. I've also updated DSS to 10.0.5 and the plugin to 0.3.5, and restarted DSS.

2022-05-15 23:39:29,039 INFO 14 - processing: (41.74,-71.62899999999999)
/Users/testdss/Library/DataScienceStudio/dss_home/code-envs/python/plugin_census-us_managed/lib/python3.6/site-packages/urllib3/connectionpool.py:988: InsecureRequestWarning: Unverified HTTPS request is being made to host 'geocoding.geo.census.gov'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning,
2022-05-15 23:39:32,668 INFO No data to send, waiting more...
2022-05-15 23:39:32,668 INFO Waiting for data to send ...
2022-05-15 23:39:42,668 INFO No data to send, waiting more...
2022-05-15 23:39:42,668 INFO Waiting for data to send ...
2022-05-15 23:39:49,304 INFO 15 - processing: (41.778,-71.339)
/Users/testdss/Library/DataScienceStudio/dss_home/code-envs/python/plugin_census-us_managed/lib/python3.6/site-packages/urllib3/connectionpool.py:988: InsecureRequestWarning: Unverified HTTPS request is being made to host 'geocoding.geo.census.gov'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning,
2022-05-15 23:39:52,672 INFO No data to send, waiting more...
2022-05-15 23:39:52,673 INFO Waiting for data to send ...
2022-05-15 23:40:02,674 INFO No data to send, waiting more...
2022-05-15 23:40:02,674 INFO Waiting for data to send ...
2022-05-15 23:40:12,678 INFO No data to send, waiting more...
2022-05-15 23:40:12,679 INFO Waiting for data to send ...
2022-05-15 23:40:19,627 INFO 16 - processing: (41.67100000000001,-71.51100000000001)
/Users/testdss/Library/DataScienceStudio/dss_home/code-envs/python/plugin_census-us_managed/lib/python3.6/site-packages/urllib3/connectionpool.py:988: InsecureRequestWarning: Unverified HTTPS request is being made to host 'geocoding.geo.census.gov'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning,
2022-05-15 23:40:22,679 INFO No data to send, waiting more...
2022-05-15 23:40:22,679 INFO Waiting for data to send ...
2022-05-15 23:40:32,682 INFO No data to send, waiting more...
2022-05-15 23:40:32,683 INFO Waiting for data to send ...
2022-05-15 23:40:42,683 INFO No data to send, waiting more...
2022-05-15 23:40:42,683 INFO Waiting for data to send ...
2022-05-15 23:40:47,683 INFO 17 - processing: (41.855,-71.729)
2022-05-15 23:40:52,686 INFO No data to send, waiting more...
2022-05-15 23:40:52,686 INFO Waiting for data to send ...
2022-05-15 23:41:02,689 INFO No data to send, waiting more...
2022-05-15 23:41:02,689 INFO Waiting for data to send ...
2022-05-15 23:41:12,694 INFO No data to send, waiting more...
2022-05-15 23:41:12,694 INFO Waiting for data to send ...
2022-05-15 23:41:22,697 INFO No data to send, waiting more...
2022-05-15 23:41:22,697 INFO Waiting for data to send ...
2022-05-15 23:41:32,697 INFO No data to send, waiting more...
2022-05-15 23:41:32,698 INFO Waiting for data to send ...
2022-05-15 23:41:42,701 INFO No data to send, waiting more...
2022-05-15 23:41:42,701 INFO Waiting for data to send ...
2022-05-15 23:41:52,704 INFO No data to send, waiting more...
2022-05-15 23:41:52,704 INFO Waiting for data to send ...
2022-05-15 23:42:02,706 INFO No data to send, waiting more...
2022-05-15 23:42:02,706 INFO Waiting for data to send ...
2022-05-15 23:42:02,939 INFO Sending data (1264)
2022-05-15 23:42:02,939 INFO Waiting for data to send ...
2022-05-15 23:42:02,939 INFO Remote Stream Writer closed
2022-05-15 23:42:02,939 INFO Got end mark, ending send
17 rows successfully written (5OS59NAQG2)
*************** Recipe code failed **************
Begin Python stack
Traceback (most recent call last):
File "/Users/testdss/Library/DataScienceStudio/dss_home/code-envs/python/plugin_census-us_managed/lib/python3.6/site-packages/urllib3/connection.py", line 160, in _new_conn
(self._dns_host, self.port), self.timeout, **extra_kw
File "/Users/testdss/Library/DataScienceStudio/dss_home/code-envs/python/plugin_census-us_managed/lib/python3.6/site-packages/urllib3/util/connection.py", line 84, in create_connection
raise err
File "/Users/testdss/Library/DataScienceStudio/dss_home/code-envs/python/plugin_census-us_managed/lib/python3.6/site-packages/urllib3/util/connection.py", line 74, in create_connection
sock.connect(sa)
TimeoutError: [Errno 60] Operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/testdss/Library/DataScienceStudio/dss_home/code-envs/python/plugin_census-us_managed/lib/python3.6/site-packages/urllib3/connectionpool.py", line 677, in urlopen
chunked=chunked,
File "/Users/testdss/Library/DataScienceStudio/dss_home/code-envs/python/plugin_census-us_managed/lib/python3.6/site-packages/urllib3/connectionpool.py", line 381, in _make_request
self._validate_conn(conn)
File "/Users/testdss/Library/DataScienceStudio/dss_home/code-envs/python/plugin_census-us_managed/lib/python3.6/site-packages/urllib3/connectionpool.py", line 978, in _validate_conn
conn.connect()
File "/Users/testdss/Library/DataScienceStudio/dss_home/code-envs/python/plugin_census-us_managed/lib/python3.6/site-packages/urllib3/connection.py", line 309, in connect
conn = self._new_conn()
File "/Users/testdss/Library/DataScienceStudio/dss_home/code-envs/python/plugin_census-us_managed/lib/python3.6/site-packages/urllib3/connection.py", line 172, in _new_conn
self, "Failed to establish a new connection: %s" % e
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x1578d0b00>: Failed to establish a new connection: [Errno 60] Operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/testdss/Library/DataScienceStudio/dss_home/code-envs/python/plugin_census-us_managed/lib/python3.6/site-packages/requests/adapters.py", line 449, in send
timeout=timeout
File "/Users/testdss/Library/DataScienceStudio/dss_home/code-envs/python/plugin_census-us_managed/lib/python3.6/site-packages/urllib3/connectionpool.py", line 727, in urlopen
method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
File "/Users/testdss/Library/DataScienceStudio/dss_home/code-envs/python/plugin_census-us_managed/lib/python3.6/site-packages/urllib3/util/retry.py", line 446, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='geocoding.geo.census.gov', port=443): Max retries exceeded with url: /geocoder/geographies/coordinates?format=json&y=41.855&x=-71.729&benchmark=4&vintage=4&layers=10 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x1578d0b00>: Failed to establish a new connection: [Errno 60] Operation timed out',))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/testdss/Library/DataScienceStudio/dss_home/jobs/DKU_TUT_AUTOMATION/Build_merchant_census_tracts__NP__2022-05-16T03-34-40.145/compute_merchant_census_tracts_NP/custom-python-recipe/pyoutxAzvagrBCIWY/python-exec-wrapper.py", line 208, in <module>
exec(f.read())
File "<string>", line 107, in <module>
File "/Users/testdss/Library/DataScienceStudio/dss_home/code-envs/python/plugin_census-us_managed/lib/python3.6/site-packages/requests/api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "/Users/testdss/Library/DataScienceStudio/dss_home/code-envs/python/plugin_census-us_managed/lib/python3.6/site-packages/requests/api.py", line 60, in request
return session.request(method=method, url=url, **kwargs)
File "/Users/testdss/Library/DataScienceStudio/dss_home/code-envs/python/plugin_census-us_managed/lib/python3.6/site-packages/requests/sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "/Users/testdss/Library/DataScienceStudio/dss_home/code-envs/python/plugin_census-us_managed/lib/python3.6/site-packages/requests/sessions.py", line 646, in send
r = adapter.send(request, **kwargs)
File "/Users/testdss/Library/DataScienceStudio/dss_home/code-envs/python/plugin_census-us_managed/lib/python3.6/site-packages/requests/adapters.py", line 516, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='geocoding.geo.census.gov', port=443): Max retries exceeded with url: /geocoder/geographies/coordinates?format=json&y=41.855&x=-71.729&benchmark=4&vintage=4&layers=10 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x1578d0b00>: Failed to establish a new connection: [Errno 60] Operation timed out',))
End Python stack
2022-05-15 23:42:02,980 INFO Check if spark is available
2022-05-15 23:42:02,980 INFO Not stopping a spark context: No module named 'pyspark'
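
The traceback shows the recipe's `requests.get` call raising `ConnectionError` with nothing catching it, so a single failed connection ends the whole job. Below is a minimal sketch of retrying with exponential backoff around the same geocoder URL. It is not the plugin's code: it uses the standard library's `urllib` in place of `requests` so it runs without extra packages, and `build_url`, `get_with_retry`, and `lookup_block_group` are hypothetical names. The endpoint and query parameters are copied from the failing URL in the log above.

```python
import time
import urllib.error
import urllib.parse
import urllib.request

# Endpoint and query parameters copied from the failing URL in the traceback.
GEOCODER_URL = "https://geocoding.geo.census.gov/geocoder/geographies/coordinates"


def build_url(lat, lon):
    """Build the coordinates-lookup URL for one point."""
    params = {"format": "json", "y": lat, "x": lon,
              "benchmark": 4, "vintage": 4, "layers": 10}
    return GEOCODER_URL + "?" + urllib.parse.urlencode(params)


def get_with_retry(fetch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds, doubling the wait between tries."""
    for attempt in range(attempts):
        try:
            return fetch()
        except urllib.error.HTTPError:
            # The server answered with an error status, so this is not a
            # connection problem; do not retry.
            raise
        except OSError:
            # Covers URLError and socket timeouts such as
            # "[Errno 60] Operation timed out".
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


def lookup_block_group(lat, lon, timeout=30):
    """Hypothetical helper: fetch the raw JSON response for one point."""
    def fetch():
        with urllib.request.urlopen(build_url(lat, lon), timeout=timeout) as resp:
            return resp.read()
    return get_with_retry(fetch)
```

Retries like this only help with transient failures. They would not get around a service that is down for maintenance.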

After some further investigation, it appears that the US Census Geocoder is undergoing maintenance.


Message on https://geocoding.geo.census.gov/ saying "Coming Soon! The Census Geocoder is undergoing significant upgrades. A new version will be released in the coming weeks that will provide users with increased consistency and faster processing times. It will also provide an enhanced user experience on desktops, tablets, and mobile devices. The Census Geocoder will continue to have the same functionality and no updates to API calls or batch submissions will be required."

--Tom
10 Replies
tgb417
Author

This is broken today as well. This means that at least some of the Academy experiences are going to be impacted. I've not finished the training modules. How much of an impact is this going to have on the rest of the courses and on completing the certifications?

 

--Tom
tgb417
Author

@Alex_Reutter , @NancyK 

Is there a way to complete the Advanced Designer Certification without the US Census plug-in working correctly?

 

--Tom
GermainLT
Level 1

Same here

SeanA
Community Manager

Hi @tgb417 , the certification thankfully doesn't use the Census USA plugin so there shouldn't be any blocker there.

I'll look into whether there's any update on the Census plugin or whether we are still waiting on the Census Bureau.

Dataiku
tgb417
Author

It is great that the certification can be completed. I'm also wondering about the training.

--Tom
SeanA
Community Manager

Hi @tgb417 (and @GermainLT ),

I just went through the steps and was able to reproduce the tutorial without error using Dataiku 10.0.3 and Census USA 0.3.3. I was also successful on Dataiku Online (10.0.5) and using version 0.3.5 of the Census USA plugin. 

Aside from any steps in the tutorial, when you create the starter project and build the Flow, is it successful?

Dataiku
tgb417
Author

@SeanA 

This evening I upgraded to the brand-new 10.0.7 and reloaded the plugin; I'm now using version 0.3.5 of the Census plugin. After those two steps I tried the flow again, and everything seems to be working.

So, good news! 

That said, I'm not sure those changes are what actually fixed the problem.

My impression is that the US Census Bureau has also finished updating its geocoding API. See below.

Message on the US Census Geocoder page that says "We migrated the processing elements for the Census Geocoder to the cloud. This cloud-based Geocoder will provide a scalable platform with faster response times and an enhanced user experience on desktops, tablets, and mobile devices. The Census Geocoder will continue to have the same functionality, and no updates to API calls or batch submissions will be required."

 

I suspect that this latter issue was the core problem.
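
To separate a local DSS or plugin problem from an outage on the Census side, one option is to test the TCP connection directly, since that is the step that timed out in the traceback (`sock.connect`). A minimal sketch, not something from the plugin: `can_connect` is a hypothetical helper, and its `connect` parameter exists only so the function can be exercised without network access.

```python
import socket


def can_connect(host, port=443, timeout=5.0, connect=socket.create_connection):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError:
        # Includes timeouts, refused connections, and DNS failures.
        return False
    conn.close()
    return True
```

Calling `can_connect("geocoding.geo.census.gov")` from the machine that runs DSS would return `False` if the connection cannot be opened within the timeout, which points at the network or the remote service rather than at the DSS or plugin version.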

--Tom
tgb417
Author

I am now having a bit of trouble refreshing my code environments. It seems like there is a pip version problem.

 

--Tom
GermainLT
Level 1

Hi @SeanA ,

I'm able to build the whole flow except for merchant_census_tracts and merchant_census_tracts_joined.

I'm using Dataiku Online.

Best,

 
SeanA
Community Manager

Hi @GermainLT , just checking first: do you have the Census USA plugin added to your workspace?

Dataiku