Flow Views & Automation (Tutorial) fails on compute_merchant_census_tracts step
I'm having problems completing the Academy hands-on exercise at this point in the process:
https://knowledge.dataiku.com/latest/courses/automation/scenarios-hands-on.html#add-steps
The compute_merchant_census_tracts step is failing with the following error messages from the Census plugin (Get US census block group from lat lon). The error seems to be some sort of timeout. It happens after 7 to 22 records are processed. I've tried increasing the API call threshold from 1 second to 20 seconds. I've also updated DSS to 10.0.5 and the plugin to 0.3.5, and restarted DSS.
2022-05-15 23:39:29,039 INFO 14 - processing: (41.74,-71.62899999999999)
/Users/testdss/Library/DataScienceStudio/dss_home/code-envs/python/plugin_census-us_managed/lib/python3.6/site-packages/urllib3/connectionpool.py:988: InsecureRequestWarning: Unverified HTTPS request is being made to host 'geocoding.geo.census.gov'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning,
2022-05-15 23:39:32,668 INFO No data to send, waiting more...
2022-05-15 23:39:32,668 INFO Waiting for data to send ...
2022-05-15 23:39:42,668 INFO No data to send, waiting more...
2022-05-15 23:39:42,668 INFO Waiting for data to send ...
2022-05-15 23:39:49,304 INFO 15 - processing: (41.778,-71.339)
/Users/testdss/Library/DataScienceStudio/dss_home/code-envs/python/plugin_census-us_managed/lib/python3.6/site-packages/urllib3/connectionpool.py:988: InsecureRequestWarning: Unverified HTTPS request is being made to host 'geocoding.geo.census.gov'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning,
2022-05-15 23:39:52,672 INFO No data to send, waiting more...
2022-05-15 23:39:52,673 INFO Waiting for data to send ...
2022-05-15 23:40:02,674 INFO No data to send, waiting more...
2022-05-15 23:40:02,674 INFO Waiting for data to send ...
2022-05-15 23:40:12,678 INFO No data to send, waiting more...
2022-05-15 23:40:12,679 INFO Waiting for data to send ...
2022-05-15 23:40:19,627 INFO 16 - processing: (41.67100000000001,-71.51100000000001)
/Users/testdss/Library/DataScienceStudio/dss_home/code-envs/python/plugin_census-us_managed/lib/python3.6/site-packages/urllib3/connectionpool.py:988: InsecureRequestWarning: Unverified HTTPS request is being made to host 'geocoding.geo.census.gov'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning,
2022-05-15 23:40:22,679 INFO No data to send, waiting more...
2022-05-15 23:40:22,679 INFO Waiting for data to send ...
2022-05-15 23:40:32,682 INFO No data to send, waiting more...
2022-05-15 23:40:32,683 INFO Waiting for data to send ...
2022-05-15 23:40:42,683 INFO No data to send, waiting more...
2022-05-15 23:40:42,683 INFO Waiting for data to send ...
2022-05-15 23:40:47,683 INFO 17 - processing: (41.855,-71.729)
2022-05-15 23:40:52,686 INFO No data to send, waiting more...
2022-05-15 23:40:52,686 INFO Waiting for data to send ...
2022-05-15 23:41:02,689 INFO No data to send, waiting more...
2022-05-15 23:41:02,689 INFO Waiting for data to send ...
2022-05-15 23:41:12,694 INFO No data to send, waiting more...
2022-05-15 23:41:12,694 INFO Waiting for data to send ...
2022-05-15 23:41:22,697 INFO No data to send, waiting more...
2022-05-15 23:41:22,697 INFO Waiting for data to send ...
2022-05-15 23:41:32,697 INFO No data to send, waiting more...
2022-05-15 23:41:32,698 INFO Waiting for data to send ...
2022-05-15 23:41:42,701 INFO No data to send, waiting more...
2022-05-15 23:41:42,701 INFO Waiting for data to send ...
2022-05-15 23:41:52,704 INFO No data to send, waiting more...
2022-05-15 23:41:52,704 INFO Waiting for data to send ...
2022-05-15 23:42:02,706 INFO No data to send, waiting more...
2022-05-15 23:42:02,706 INFO Waiting for data to send ...
2022-05-15 23:42:02,939 INFO Sending data (1264)
2022-05-15 23:42:02,939 INFO Waiting for data to send ...
2022-05-15 23:42:02,939 INFO Remote Stream Writer closed
2022-05-15 23:42:02,939 INFO Got end mark, ending send
17 rows successfully written (5OS59NAQG2)
*************** Recipe code failed **************
Begin Python stack
Traceback (most recent call last):
File "/Users/testdss/Library/DataScienceStudio/dss_home/code-envs/python/plugin_census-us_managed/lib/python3.6/site-packages/urllib3/connection.py", line 160, in _new_conn
(self._dns_host, self.port), self.timeout, **extra_kw
File "/Users/testdss/Library/DataScienceStudio/dss_home/code-envs/python/plugin_census-us_managed/lib/python3.6/site-packages/urllib3/util/connection.py", line 84, in create_connection
raise err
File "/Users/testdss/Library/DataScienceStudio/dss_home/code-envs/python/plugin_census-us_managed/lib/python3.6/site-packages/urllib3/util/connection.py", line 74, in create_connection
sock.connect(sa)
TimeoutError: [Errno 60] Operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/testdss/Library/DataScienceStudio/dss_home/code-envs/python/plugin_census-us_managed/lib/python3.6/site-packages/urllib3/connectionpool.py", line 677, in urlopen
chunked=chunked,
File "/Users/testdss/Library/DataScienceStudio/dss_home/code-envs/python/plugin_census-us_managed/lib/python3.6/site-packages/urllib3/connectionpool.py", line 381, in _make_request
self._validate_conn(conn)
File "/Users/testdss/Library/DataScienceStudio/dss_home/code-envs/python/plugin_census-us_managed/lib/python3.6/site-packages/urllib3/connectionpool.py", line 978, in _validate_conn
conn.connect()
File "/Users/testdss/Library/DataScienceStudio/dss_home/code-envs/python/plugin_census-us_managed/lib/python3.6/site-packages/urllib3/connection.py", line 309, in connect
conn = self._new_conn()
File "/Users/testdss/Library/DataScienceStudio/dss_home/code-envs/python/plugin_census-us_managed/lib/python3.6/site-packages/urllib3/connection.py", line 172, in _new_conn
self, "Failed to establish a new connection: %s" % e
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x1578d0b00>: Failed to establish a new connection: [Errno 60] Operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/testdss/Library/DataScienceStudio/dss_home/code-envs/python/plugin_census-us_managed/lib/python3.6/site-packages/requests/adapters.py", line 449, in send
timeout=timeout
File "/Users/testdss/Library/DataScienceStudio/dss_home/code-envs/python/plugin_census-us_managed/lib/python3.6/site-packages/urllib3/connectionpool.py", line 727, in urlopen
method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
File "/Users/testdss/Library/DataScienceStudio/dss_home/code-envs/python/plugin_census-us_managed/lib/python3.6/site-packages/urllib3/util/retry.py", line 446, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='geocoding.geo.census.gov', port=443): Max retries exceeded with url: /geocoder/geographies/coordinates?format=json&y=41.855&x=-71.729&benchmark=4&vintage=4&layers=10 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x1578d0b00>: Failed to establish a new connection: [Errno 60] Operation timed out',))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/testdss/Library/DataScienceStudio/dss_home/jobs/DKU_TUT_AUTOMATION/Build_merchant_census_tracts__NP__2022-05-16T03-34-40.145/compute_merchant_census_tracts_NP/custom-python-recipe/pyoutxAzvagrBCIWY/python-exec-wrapper.py", line 208, in <module>
exec(f.read())
File "<string>", line 107, in <module>
File "/Users/testdss/Library/DataScienceStudio/dss_home/code-envs/python/plugin_census-us_managed/lib/python3.6/site-packages/requests/api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "/Users/testdss/Library/DataScienceStudio/dss_home/code-envs/python/plugin_census-us_managed/lib/python3.6/site-packages/requests/api.py", line 60, in request
return session.request(method=method, url=url, **kwargs)
File "/Users/testdss/Library/DataScienceStudio/dss_home/code-envs/python/plugin_census-us_managed/lib/python3.6/site-packages/requests/sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "/Users/testdss/Library/DataScienceStudio/dss_home/code-envs/python/plugin_census-us_managed/lib/python3.6/site-packages/requests/sessions.py", line 646, in send
r = adapter.send(request, **kwargs)
File "/Users/testdss/Library/DataScienceStudio/dss_home/code-envs/python/plugin_census-us_managed/lib/python3.6/site-packages/requests/adapters.py", line 516, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='geocoding.geo.census.gov', port=443): Max retries exceeded with url: /geocoder/geographies/coordinates?format=json&y=41.855&x=-71.729&benchmark=4&vintage=4&layers=10 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x1578d0b00>: Failed to establish a new connection: [Errno 60] Operation timed out',))
End Python stack
2022-05-15 23:42:02,980 INFO Check if spark is available
2022-05-15 23:42:02,980 INFO Not stopping a spark context: No module named 'pyspark'
After some further investigation, it appears that the US Census Geocoder is undergoing maintenance.
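One way to tell a geocoder outage apart from a DSS or plugin problem is to call the endpoint directly, outside of the recipe. The following is only a minimal sketch using the Python standard library: the host, path, and query parameters are copied from the failing URL in the traceback above, while the function names (`build_url`, `check_geocoder`) and the 10-second default timeout are made up for illustration and are not part of the plugin.

```python
import urllib.error
import urllib.parse
import urllib.request

# Host and path taken from the failing request in the traceback above.
GEOCODER_URL = "https://geocoding.geo.census.gov/geocoder/geographies/coordinates"


def build_url(lat, lon):
    """Rebuild the request URL with the same query parameters seen in the log."""
    query = urllib.parse.urlencode({
        "format": "json",
        "y": lat,   # latitude
        "x": lon,   # longitude
        "benchmark": 4,
        "vintage": 4,
        "layers": 10,
    })
    return GEOCODER_URL + "?" + query


def check_geocoder(lat, lon, timeout=10, opener=urllib.request.urlopen):
    """Return (reachable, detail) for one lookup.

    `opener` is injectable (hypothetical convenience) so the check can be
    exercised without network access.
    """
    url = build_url(lat, lon)
    try:
        with opener(url, timeout=timeout) as resp:
            return True, "HTTP %d" % resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        return False, "HTTP %d" % exc.code
    except OSError as exc:
        # Covers URLError, socket timeouts, and errors such as
        # "[Errno 60] Operation timed out" from the traceback above.
        return False, "unreachable: %s" % exc
```

Running `check_geocoder(41.855, -71.729)` in a notebook on the DSS host uses the coordinates of the record that failed. If it also reports the service as unreachable, the problem is more likely on the Census Bureau side (or the network path to it) than in the DSS version or the plugin version.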
Answers
-
tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,598 Neuron
This is broken today as well. This means that at least some of the Academy experiences are going to be impacted. I've not finished the training modules. How much of an impact is this going to have on the rest of the courses and on completing the certifications?
-
tgb417
Is there a way to complete the Advanced Designer Certification without the US Census plug-in working correctly?
-
GermainLT Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 2 ✭✭✭
Same here
-
Sean Dataiker, Alpha Tester, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer Posts: 168 Dataiker
Hi @tgb417, the certification thankfully doesn't use the Census USA plugin, so there shouldn't be any blocker there. I'll look into whether there's any update on the Census plugin or whether we are still waiting on the Census Bureau.
-
tgb417
It is great that the certification can be completed. I’m also wondering about the training.
-
Sean
Hi @tgb417 (and @GermainLT), I just went through the steps and was able to reproduce the tutorial without error using Dataiku 10.0.3 and Census USA 0.3.3. I was also successful on Dataiku Online (10.0.5) and using version 0.3.5 of the Census USA plugin.
Aside from any steps in the tutorial, when you create the starter project, and build the Flow-- is it successful?
-
tgb417
This evening, I upgraded to the brand new 10.0.7 and reloaded the plugin. I'm now using Census 0.3.5. After those two steps, I tried the flow again and all seems to be working.
So, good news!
That said, I'm not sure that those changes actually fixed the problem.
It was my impression that the US Census office has also finished updating their Geocoding API. See below.
I suspect that this latter issue was the core problem.
-
tgb417
I am now having a bit of trouble refreshing my code environments. Seems like there is a pip version problem.
-
Sean
Hi @GermainLT, just checking first: do you have the Census USA plugin added to your workspace?