Open source and proprietary software solutions: the key for an analytic project, Naskraft TechBlog

In the world of data analysis it may be no coincidence that open source tools like the ‘R’ statistical computing language have blossomed as analytics and big data have matured together.

Hadoop, Python… There seems to be a special kind of magic between the curious minds of data analysts (with a small ‘a’ – as they may be ‘line of business’ users who don’t have a degree in statistics or a qualification in coding) and new ways of exploring the world.

Open source software has proven itself a very useful way of rapidly finding quality insights when put to the challenging task of making sense of the enormous volumes of data out there. Big data analytics provides an opportunity for open source data quality tools to deliver new insights.

From a bottom-line perspective, using open source solutions as part of the enterprise mix can be a cost-effective way to get successful analytics projects off the ground.

Certainly, any business still using coding-intensive legacy architectures, or SAS solutions, will find itself easily seduced by the speed and versatility of modern products in the analytical toolkit.

A successful marriage involves learning to work together to solve problems

Bringing these products and tools together can be complicated, but linking them in one platform gives analysts the fun and thrill of using their favourite tools while still maintaining the governance, repeatability and reliability the business needs to create a long-lived culture of analytics.

It’s a plain fact that much of an analyst’s role, be they a specialist quant or a general business user, is more likely than not filled with the tedium of finding, prepping, and cleansing data. By the time that is done, they’ve lost the enjoyment of what made the relationship with data special in the first place.

The trouble is that many legacy solutions can’t adapt to the changing data landscape. Some were not designed to deal with the variety of data – structured, unstructured, and semi-structured – or with the many formats in which it arrives from numerous applications and sources. This is why it’s sensible to provide a flexible environment in which analysts can take advantage of data across any system and in any format.

If this, the foundational element of the data journey, can be made as seamless and easy as possible, then the analytical detectives can do what they trained for and are paid to do. That’s better for them, and it’s better for the business, as that passion and brain power is not atrophying on the mundane elements of data preparation.

Additionally, most data scientists today build predictive and machine learning models in open source programming languages and then need to deploy that code into different technology frameworks.

This is time consuming, error-prone and requires additional development resources – often stalling data science projects altogether. It’s important to address any roadblocks between data scientists and development teams, and so accelerate the model building and model deployment processes.

It can require considerable coding expertise to harness complex sets of open source tools, which adds difficulty, not least because those skills are in high demand and fetch a premium on the market.

As a consequence, code-free environments for analytics that simplify data access, preparation, analysis, and consumption are becoming a must in the modern enterprise.

Hand in hand – open source tools and stable platforms equal a better experience for all

A project manager should be able to quickly prepare, clean and combine data from any number of data sources. It should be a breeze to implement fuzzy matching techniques to improve the accuracy of results, and however the project is designed, as a matter of course it should reduce the reliance on data scientists and IT wherever possible. It’s simply not sustainable to do this in any other way.
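
For readers curious about what fuzzy matching involves under the hood, here is a minimal Python sketch using the standard library's difflib. The function names (normalise, fuzzy_match), the sample customer names and the 0.8 cutoff are all assumptions made for illustration; they are not taken from any particular product, and a code-free platform would hide this step behind a configuration screen.

```python
from difflib import get_close_matches


def normalise(name: str) -> str:
    """Lower-case, drop full stops and commas, and collapse whitespace."""
    cleaned = name.lower().replace(".", " ").replace(",", " ")
    return " ".join(cleaned.split())


def fuzzy_match(name, candidates, cutoff=0.8):
    """Return the closest candidate to `name`, or None if none clears the cutoff.

    The cutoff is a similarity ratio between 0 and 1 (hypothetical default).
    """
    # Map each normalised form back to the original spelling.
    lookup = {normalise(c): c for c in candidates}
    hits = get_close_matches(normalise(name), list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None


# Hypothetical reference list of customer names.
customers = ["Acme Corporation", "Globex Ltd", "Initech Inc."]

print(fuzzy_match("ACME Corporaton", customers))  # misspelt, still matched
print(fuzzy_match("Umbrella Group", customers))   # no plausible match
```

Raising the cutoff makes matching stricter and reduces false matches; lowering it catches more misspellings at the cost of accuracy, so the right value depends on the data.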

Following the data preparation and quality improvement, the next step involves taking that data and incorporating predictive or advanced analytics to make or to further improve business decisions. And in the modern, agile enterprise, users should be able to do this without having to write code if they don’t wish to.

Once those elements are accounted for, it should be a simple matter to build repeatable workflow processes that provide the business with greater data consistency and accuracy – and result in tangible business benefits once the insights are acted upon.

With the entire approved analytic process in a repeatable workflow, organisations spend less time repeating mundane tasks and processes, and more time on the valuable aspects of the analysis. Analysts will enjoy themselves once more, following their curiosity and solving problems rather than administrating.
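
To make the idea of a repeatable workflow concrete, the sketch below models one as an ordered list of small steps applied to the data in sequence. Everything here (the step names, the record fields and the sample rows) is hypothetical and chosen only to illustrate the principle; a workflow platform would typically express the same thing visually.

```python
from typing import Callable

# A step takes a list of records and returns a new list of records.
Step = Callable[[list], list]


def drop_incomplete(rows):
    """Discard rows that lack a customer name or an amount."""
    return [r for r in rows if r.get("customer") and r.get("amount") is not None]


def tidy_names(rows):
    """Trim whitespace and standardise capitalisation of customer names."""
    return [{**r, "customer": r["customer"].strip().title()} for r in rows]


def total_by_customer(rows):
    """Sum amounts per customer, returned in alphabetical order."""
    totals = {}
    for r in rows:
        totals[r["customer"]] = totals.get(r["customer"], 0) + r["amount"]
    return [{"customer": c, "amount": a} for c, a in sorted(totals.items())]


# The approved process, written down once and reused on every run.
WORKFLOW = [drop_incomplete, tidy_names, total_by_customer]


def run_workflow(rows, steps=WORKFLOW):
    """Apply each step in order; the input list is left unmodified."""
    for step in steps:
        rows = step(rows)
    return rows


raw = [
    {"customer": " acme ", "amount": 120},
    {"customer": "ACME", "amount": 30},
    {"customer": "", "amount": 50},
    {"customer": "globex", "amount": None},
    {"customer": "globex", "amount": 75},
]

print(run_workflow(raw))
```

Because the steps are fixed and do not alter their input, running the workflow again on the same data gives the same result, which is where the consistency comes from.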

This is important. Today’s data scientists are spending too much time building advanced models that never reach deployment. Gartner stated that many projects remain stuck at the pilot stage.

Only 15% of businesses reported deploying their big data project to production in the Business Intelligence & Analytics Summit 2016 research. Yhat states that only 10% of predictive models actually get deployed. And according to TDWI, models can take an average of six to nine months to get deployed. That’s not a sustainable way of working.

Modelling tools need to be more accessible to accelerate deployment, and to save time and frustration. In part, it’s worth bringing joy back to data scientists and business users alike. With a wealth of data out there, it’s a good time to encourage and empower the people who love to solve complex business problems.
