
Pentaho Data Integration (PDI), Python and Deep Learning

Hi, Habr! I present to you a translation of the article "Pentaho Data Integration (PDI), Python and Deep Learning".

Deep Learning (DL) - why is there so much hype around it?


According to Zion Market Research, the deep learning (DL) market will grow from $2.3 billion in 2017 to more than $23.6 billion by 2024. With a compound annual growth rate of almost 40%, DL has become one of the hottest areas in which analytics experts build models. Before turning to the question of how Pentaho can help bring your organization's DL models into production, let's take a step back and consider why DL is such a breakthrough technology. Below is some general background:
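As a quick sanity check on that figure (my own back-of-the-envelope calculation, not part of the original article), the implied compound annual growth rate over those seven years does work out to roughly 40%:

```python
# Implied CAGR for the Zion Market Research forecast:
# $2.3B (2017) growing to $23.6B (2024), i.e. 7 years.
start, end, years = 2.3, 23.6, 7
cagr = (end / start) ** (1 / years) - 1
print(f"Implied CAGR: {cagr:.1%}")  # ~39.5%, consistent with "almost 40%"
```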

[Images: general overview of deep learning]


Why use PDI to develop and implement deep learning models using Python?


Today, data scientists and data engineers collaborate on hundreds of data science projects built in PDI. Pentaho has allowed them to move complex data science models into production at a lower cost than traditional data preparation tools. We are pleased to announce that Pentaho can now bring this same ease of use to DL frameworks, advancing Hitachi Vantara's goal of enabling organizations to innovate with all their data. With PDI and the new Python Executor step, Pentaho can do the following:


Benefits:


How does PDI implement deep learning?


Components Used:


For a list of dependencies, see the Pentaho 8.2 Python Executor step in the Pentaho online help: Python Executor - Pentaho Documentation.

The main process:

1. Select the HCP files via VFS in the PDI step. Copy and prepare the unstructured data files for use with the DL framework using the PDI Python Executor step (the sketch after the link below illustrates this staging).


Additional Information:
https://help.pentaho.com/Documentation/8.2/Products/Data_Integration/Data_Integration_Perspective/Virtual_File_System
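As a rough illustration of what that staging could look like inside a Python Executor script (my own sketch; the directories and file pattern are hypothetical, not from the article):

```python
import glob
import os
import shutil

# Hypothetical locations: PDI has already pulled the files from HCP via VFS
# into a landing directory; we stage them where the DL script expects them.
landing_dir = "/tmp/hcp_landing"       # assumption, not from the article
staging_dir = "/tmp/dl_training_data"  # assumption, not from the article

os.makedirs(staging_dir, exist_ok=True)
for path in glob.glob(os.path.join(landing_dir, "*.csv")):
    shutil.copy(path, staging_dir)  # copy each raw file into the staging area
```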


2. Use a new transformation that implements the DL framework processing workflows, the associated datasets, and so on. Inject hyperparameters (values used to configure and run models) to evaluate which model is most effective. Below is an example that implements four DL framework workflows, three using TensorFlow and one using Keras, with the Python Executor step; a sketch of the underlying sweep idea follows.

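Conceptually, each workflow trains a model under a different configuration and the results are compared to pick a winner. A minimal sketch of that idea in plain Python (the configurations and the scoring function are placeholder stand-ins, not the article's code):

```python
# Hypothetical hyperparameter configurations, one per workflow.
configs = [
    {"hidden_units": [10, 20, 10], "training_steps": 1000},
    {"hidden_units": [32, 32],     "training_steps": 2000},
    {"hidden_units": [64],         "training_steps": 5000},
]

def train_and_score(config):
    # Placeholder: in the real transformation, each Python Executor step
    # trains a TensorFlow/Keras model and returns a validation metric.
    return sum(config["hidden_units"]) / config["training_steps"]  # dummy score

# Evaluate every configuration and keep the most effective model.
best = max(configs, key=train_and_score)
print("Best configuration:", best)
```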

3. Focusing on the TensorFlow DNN Classifier workflow (which consumes the injected hyperparameters), use a PDI Data Grid step, here named Injected Hyperparameters, with values that correspond to the Python Script Executor steps (see the sketch below).

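For illustration, the Injected Hyperparameters grid could carry fields like the following; the field names and values here are my assumptions, standing in for whatever the article's screenshot shows:

```python
import pandas as pd

# Hypothetical equivalent of the "Injected Hyperparameters" Data Grid:
# a single row of named hyperparameter fields that PDI passes downstream.
hyperparams = pd.DataFrame([{
    "hidden_units": "10,20,10",  # layer sizes, serialized as a string field
    "training_steps": 2000,
    "learning_rate": 0.01,
}])
print(hyperparams)
```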

4. In the Python Script Executor step, use a Pandas DataFrame and map the injected hyperparameters and their values as variables on the Input tab (see the sketch below).

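Assuming the Input tab exposes the incoming row as a pandas DataFrame named df (the variable name is configurable on the tab; df and the field names are my assumptions, matching the step 3 sketch), the script can unpack the hyperparameters like this:

```python
import pandas as pd

try:
    df  # provided by the Python Script Executor step's Input tab
except NameError:
    # Standalone fallback for testing outside PDI (hypothetical values):
    df = pd.DataFrame([{"hidden_units": "10,20,10",
                        "training_steps": 2000,
                        "learning_rate": 0.01}])

row = df.iloc[0]  # the Data Grid emits a single row of hyperparameters
hidden_units = [int(n) for n in row["hidden_units"].split(",")]
training_steps = int(row["training_steps"])
learning_rate = float(row["learning_rate"])
```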

5. Execute the DL script (either via "Embed" or "Link from file"), referencing the DL framework and the injected hyperparameters. You can also point the step at a Python virtual environment other than the default one. A rough sketch of such a script follows.

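Continuing with the variables unpacked in step 4, a TensorFlow DNN Classifier script of that era (the TF 1.x estimator API, which matches the article's Pentaho 8.2 timeframe) might look roughly like this; the features and data are invented placeholders, not the article's actual script:

```python
import numpy as np
import tensorflow as tf  # TF 1.x-era estimator API

# Placeholder numeric features and labels standing in for the staged data.
features = {"x": np.random.rand(100, 4).astype(np.float32)}
labels = np.random.randint(0, 3, size=100)

feature_columns = [tf.feature_column.numeric_column("x", shape=[4])]
classifier = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=hidden_units,  # injected hyperparameter (see step 4)
    n_classes=3,
    optimizer=tf.train.AdagradOptimizer(learning_rate=learning_rate),
)

input_fn = tf.estimator.inputs.numpy_input_fn(
    x=features, y=labels, batch_size=32, num_epochs=None, shuffle=True)
classifier.train(input_fn=input_fn, steps=training_steps)
```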

6. Ensure that TensorFlow is installed, configured, and imports correctly in the Python shell, for example with a quick check like the one below.

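A minimal check (my own, not the article's screenshot; tf.test.is_gpu_available is the TF 1.x-era API):

```python
# Confirm the Python environment PDI will use can see TensorFlow.
import tensorflow as tf

print(tf.__version__)              # prints the installed TensorFlow version
print(tf.test.is_gpu_available())  # True if a usable GPU build is present
```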

7. Returning to the Python Executor step, click the Output tab, and then click the "Get Fields" button. PDI will pre-check the script file for errors and detect its output fields and other parameters (the sketch below shows the kind of output frame it can pick up).

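For Get Fields to find output columns, the script leaves its results in a variable named on the Output tab; a minimal sketch (the variable name output and the fields are my assumptions):

```python
import pandas as pd

# Hypothetical results handed back to PDI; "Get Fields" on the Output tab
# inspects this frame to derive the step's output fields.
output = pd.DataFrame([{
    "model": "dnn_classifier",
    "accuracy": 0.92,  # placeholder metric, not a real result
}])
```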

8. With that, the transformation is configured and ready to run.

Hitachi Vantara offers a proprietary GPU solution to accelerate deep learning


DL frameworks can gain significant performance when executed on a graphics processing unit (GPU) rather than a CPU, so most DL frameworks support some types of GPUs. In 2018, Hitachi Vantara developed and delivered the advanced DS225 server with NVIDIA Tesla V100 GPUs, the first Hitachi Vantara GPU server designed specifically for DL workloads. A quick way to confirm that a framework actually sees the GPUs is shown below.

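One way to verify GPU visibility from TensorFlow on such a machine (a generic check, not specific to the DS225; device names and counts depend on the system):

```python
# List the devices TensorFlow can see and filter for GPUs (works in TF 1.x).
from tensorflow.python.client import device_lib

devices = device_lib.list_local_devices()
gpus = [d.name for d in devices if d.device_type == "GPU"]
print("Visible GPUs:", gpus or "none")
```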

More information about this offering can be found on the Hitachi Vantara website.

Why do organizations need to use PDI and Python for Deep Learning?


Source: https://habr.com/ru/post/439418/