Good evening!
I have another question / request for you all: which are the more powerful and flexible ETL and DWH / BI tools and which are, on the same or a parallel way, the most popular ones?
PowerCenter? COGNOS? DataStage? Oracle Warehouse Builder? SAS? And what could be the proper purpose of SAP BO XI?
And, at last but not at least: what could be the best approach? Visual tools and SQL or just basing on visual tools? Maybe also epych only SQL?
Please, let me kindly know. And, also, give me some hints from the height of your experience, I am at teeth troubles!
Cheers!
Alberto
Vector DB implementation using FAISS
-
Searching for relevant information in vast repositories of unstructured
text can be a challenge. This article explains a Python-based approach to
impleme...
1 day ago
Roxana L. Socolovsci, Yudong CHEN 陈豫东 and 2 others like this
40 comments • Jump to most recent comments
Follow Ivan
Ivan Peev • Hi Alberto,
It may not be the most powerful at the moment, but the most popular ETL is Microsoft's SQL Server Integration Services (SSIS). It comes as part of the SQL Server license. Compared to the ETL tools you have listed, it is much more affordable and extensively used.
Regarding your second question what is the right approach to get the job done, as always it is best to analyze the requirements first and then pick the right tools for the job.
Follow Crystal
Crystal Sullivan • Hi Alberto,
You might be interested in some of the analyst reports on data management found here.
Analyst Viewpoints | Data Management
http://www.sas.com/news/analysts/indexByTechnology.html#sid=0812.0000.0000
Are you asking for any reason in particular?
Follow Alberto
Alberto Manzoni • Thanks, Crystal!
No, simply I think a good strategy for getting deeper and reasonably in DWH should be at first asking for hints from somenone with good experience on this subjects!
As on a gnomic of my area "I am not wise but not stupid at all as well!". :-)
Thanks!
Follow Swaraj
Swaraj Kumar Dash • Depends primarily on your requirements based on which the tool can be selected.IBM datastage ( formerly Ascential ) is leveraged by many organizations.
Follow Alberto
Alberto Manzoni • Thanks Swaraj!
Follow Alessandro
Alessandro Drago • Hello Alberto, I can tell you something about ETL/ELT tools. It's a quite difficult question because depends on the underlying architecture (i.e. How many targets do you have, wich technologies are envolved and so on), anyway i can suggest Oracle Data Integrator. This tool has the capability to connect to every techology as source and/or as target, you can leverage the module approach on developing the flows and you can dinamically setup the env without impacting the code generated.
Cheers,
Alessandro.
2Click to see who liked this comment.
Follow Yves
Yves Callaert • As previously mentioned your architecture will determine the tool. But beware that if you choose popular ETL tools (infosphere, informatica, sap) that it will be a costly affair.
If you have a small project also take a look at the open source alternatives. For example, Talend offers the same visual interface as most expensive programs and is easy to install and maintain.
If you have good developers using only SQL is an option, on the condition everyone follows strict rules on how code is documented and maintained (eg: use packages in oracle to keep similar tasks together). SQL-only structures will have better performance than visual ETL tools, as they operate on an extra architectural layer.
So the approach will be based on your architecture, team members and the money you have to spend.
4Follow Alberto
Alberto Manzoni • Well, for a person from Brianza like me, money subject seems very reasonably! thanks, Yves!
Follow Alberto
Alberto Manzoni • OK, Alessandro, I see. Avery "fine tuning" and multiple variables matter, at the very end
Follow Alberto
Alberto Manzoni • Just to summarize your last two hints: balancing costs and involved parts, the very heart of the process is ETL and this lesser depends on the tool (maybe the most familiar) but this process is the very heart of the whole matter. Is this correct?
Follow Gordon
Gordon Wong • Echoing others, you really need to lead with requirements. What sort of data are you working with, what kinds of volumes and complexity, what sort of environment and how are your requirements likely to change over time? One shot deal? Handcrafting code is an option. Fluid requirements and anticipating future demand? Then investing in a tool with a decent development interface pays off. Anticipating explosive data volumes? Make sure your product can handle elastic scalability, both from an actual performance perspective and from a licensing one. Also, consider the people assets you can bring to bear. More mature tools let business analysts get into the game, bringing domain expertise to your ETL code, at the cost of well....cost.
1Follow Alberto
Alberto Manzoni • Wow! yes, anyway, you are right! More mature!
Follow David C.
David C. Brown, PMP, CISSP, CIP • I've found that once the requirements and exchange process are well understood and documented, a little coding will do the trick. As mentioned earlier, keeping the code and documentation updated is the trick.
Another thing that I have found, is trying to anticipate future (as yet unrequested) requirements will add unnecessary cost and complexity and you usually get it wrong anyway.
Follow Peter
Peter Nolan • Alberto,
the most important thing to remember is to prototype your data models and your ETL. The vendor ETL tools do not allow you to easily prototype you ETL.
We have developed a tool called SeETL which is available for free in the development environment. You can prototype your ETL and data models using our tools before deciding what ETL tool to move on to. Many of our clients also use our ETL in production because it is much cheaper and we are more than 2x more productive than any of the other ETL tools out there. In most cases we are more than 4 times more productive than other ETL tools our there because we do not have a "document + code" model. We have a "document + generate model".
You can go to our channel over here. In the GA package you will find an extensive example of how our tools work along with all the documentation.
We are giving this away as a development tool so as to reduce the cost of developing DWs. Typically competent use of our tools in development will reduce development costs of ETL and data models by 50% or more even when vendor ETL and data modeling tools are used to go into production.
http://www.youtube.com/user/InstantBI
Anyone else who wants to check this out is welcome to do so. Feel free to pass this around. This is a genuine effort to enable anyone who wants to to reduce development costs of ETL and data models.
Naturally we charge for any time anyone asks for. We can not give away our time but we most certainly can give away our software for use in development as a community service.
Best regards
Peter
Follow Xiaoye
Xiaoye WANG • Powerful ETL: Datastage PX(Parallelism), ODI(ELT approch).
Popular ETL : Informatica, Datastage Server et PX, SSIS (licensed with SQL Server)
Popular Reporting or Cube : COGNOS(good at OLAP analysis, and couple with datastage both are IBM) , SAP BI 4(old BO XI, connectivity with SAP BW, sepration of 3 levels model,object,connexion), SSRS&SSAS (licensed with SQL Server)
Popular & Powerful Datamining tool : SAS, SPSS(SAP now)
Popular Data visualisation Tool : QlickView, Tableau
Otherwise All open source tools are powerful and popular :
ETL: Talend (Data Integration, ESB, MDM)
Reporting&OLAP : Jasper, Pentaho, BIRT
1Follow Gordon
Gordon Wong • Thinking about ETL in general, it used to be that performance was a major differentiator. In some situations, that's still the case. However, for majority of classic BI use cases, performance is essentially solved. Instead, the differentiators are productivity output, maintainability, flexibility in sources and targets and more so than ever, marrying technical capabilities with non-technical personnel
Follow David C.
David C. Brown, PMP, CISSP, CIP • Peter,
The link that you provided points to an IBM Neezza promo video.
Dave
Follow Gabe
Gabe Green • There's a lot of nice information in this thread... not too much I can add, but to say that everyone is right, you've got to look at your business requirements, any infrastructure/technology requirements, total cost of ownership expectations, and then maybe development preferences.
Just about any ETL will meet your business requirements for populating/maintaining your data mart, but maybe the massive amount of data requires you to consider a tool that can massively scale out for performance (like DataStage).
For example, in the Cognos arena there is an ETL tool called Data Manager. The tool does not break out data streams into parallel paths... nor does it allow you to add execution engines across nodes... but if I have a product that is using Cognos BI (and doesn't have massive data quantities)... this tool integrates well with the Cognos BI platform, but more importantly to me is that as a developer it is one of the quickest/easiest ETL tools to develop/maintain the data mart objects.
So bottom line is that you'll need to look at your must have requirements... and then narrow down your list as you go. Really look/test the tools as well... and don't minimize the benefit of ease/speed of development when choosing.
1Follow Paul
Paul Felix • Nice thread. I'd only add that you should favor tools that are metadata driven if iterative development is at all important to your project. Otherwise, you will end up doing a lot of rework on each iteration. You don't need to spend hundreds of thousand to get the job done. You can model in Visio, Excel, or other free/cheap tools. Focus on the task then select a tool that meets the needs.
1Follow David C.
David C. Brown, PMP, CISSP, CIP • Paul, Can you please explain "metadata driven" tools?
Thanks,
Dave