we are trying to pull data from MS Access files using UCanAccess 4.0.4. This works for most of the tables.
However we have a problem when extracting even small tables (12 MB) from bigger MS Access files (bigger than 500 MB). Connection crashes with no specific error. Is there any way how to access big MS Access files? For example by setting up some parameters in the connection? Especially when we are only interested in 1 relatively small table from the MS Access database that contains many others?
Thank you for any suggestion!
As I was thinking about your problem. A work around that you may have considered came to mind. For now, you might go into ms access and just export the table as .csv or ,XLSX then import the table to your dataiku project. This will help make sure your data is useful to your project and will help to confirm that you are not dealing with a corruption in the ms access file.
Thank you for your question. Support for MS access is achieved using the driver you mentioned and it's indeed the latest tested that seems work.
Re parameters, you could check in their docs whether some optimisation might be achieved, but this would be outside the realm of DSS.
The suggestion by @tgb417 is not bad at all if you only need a small table, then perhaps extracting it and querying it separately might be best. If you're unable to do this due to the MS access database being in use and potentially changing, then you may consider changing your design in order to either:
- switch from MS access (to MS Sql for example) or
- splitting into two MS access databases and linking them. with this approach DSS would not be querying one monolithic database, but smaller ones.
Your in good hands regardless and best of luck to you!
Thank you all for your answers, however, I can't modify the MS Access DB before loading it into Dataiku. Neither can I export tables as csv/xlsx directly from MS Access as we run on linux.
Luckily, there is a python code that exports all tables from .accdb/.mdb file into .csv using UCanAccess 4.0.4. This is the only workaround so far that can be incorporated in automated scenario in Dataiku. https://gist.github.com/Guitlle/d64be23f820ac6cf433db97ec79fb2b5#file-access2csv-py