Tag Archives: data science

IBM Watson and Cognitive Data as the Future of Information Data Systems

IBM Watson is slowly becoming an important piece of IBM’s vision of the future of computing. Yesterday, Big Blue announced that it is launching another business unit centered on Watson solutions. The investment in this new unit is estimated to be around $1B but, more importantly, it reinforces IBM’s commitment to Watson and cognitive data as the future of enterprise data solutions.

A lot has happened since Watson made news winning the television quiz show Jeopardy by beating legends Ken Jennings and Brad Rutter. At the time, Watson was a sophisticated natural language processing machine but didn’t have a lot to offer in other areas of cognitive computing.

Since the Jeopardy days, Watson has added a significant number of services in areas such as data insights, vision processing, image recognition, natural language processing, text analytics and other important areas of cognitive science. More importantly, IBM is making Watson available through a series of APIs in the Watson Developer Cloud, which allow developers to leverage Watson in third-party applications.

IBM’s efforts around Watson are, undoubtedly, the most important steps toward establishing cognitive data as a mainstream trend in the technology arena. While big data technologies have certainly disrupted the information management space, data processing applications remain mostly ignorant when it comes to understanding and reasoning through the data they store. This is where cognitive data becomes important: it helps expert systems enhance, understand and reason through structured and unstructured data sets in order to make intelligent decisions.

5 Reasons Why Cognitive Data is the Future of Enterprise Data

Data is Becoming Contextual in Nature

Modern data is becoming more contextual every day. While data sets can be considered static in nature, they have different interpretations depending on contextual aspects such as time, location and environment. Cognitive computing is a necessary step to make information systems more context aware by augmenting static data sources with dynamic contextual data, and then reasoning over and learning from the result.

Big Data is Just a Lot of Dumb Data

Today, big data systems are becoming an important element of software systems by storing large amounts of static data. Despite the advances in data storage and processing, data systems remain essentially unintelligent when it comes to understanding, optimizing, augmenting and reasoning through the data they store. As a result, organizations are constantly building new systems to make data “more intelligent”. Cognitive data presents a powerful alternative to traditional data systems by providing a layer of intelligence to modern information systems.

Data Scientists Are Not for All Scenarios

Data scientists are the most common answer when it comes to gathering insights about specific data sets. However, data scientists are fundamentally inefficient in areas such as real-time vision analysis, image recognition, speech analysis and other fundamental aspects of cognitive systems. Cognitive data and platforms like IBM Watson will help to expand the capabilities of traditional data science to provide more sophisticated intelligence over traditional data sources.

Video, Images, Text and Speech are Becoming Increasingly Important

Complementing the previous point, data signals such as video, text, images and speech are fundamentally difficult for traditional data systems to process. Platforms like IBM Watson and other cognitive data solutions excel at understanding and processing these types of data points, which makes them an ideal extension of traditional data systems.

Actions Are as Important as Data Insights

In modern data systems, actions related to the data are typically hardcoded as a set of rules within an application. However, automatically taking actions based on data insights is becoming an increasingly important aspect of modern applications. Cognitive data is a fundamental step towards enabling intelligent decision making based on data insights in software applications.

5 Cognitive Data Scenarios Relevant in Today’s Enterprise

Healthcare

Cognitive science is starting to revolutionize healthcare. The intelligent processing of unstructured healthcare data such as images, videos and speech is leading the charge in modern healthcare applications ranging from treatment recommendations to disease pattern analysis. Not surprisingly, healthcare remains the number one vertical for IBM Watson applications.

Public Safety

Cognitive data can help better reason through real data points in the form of video, images, sounds and text commonly encountered in public safety scenarios. Using cognitive data systems, public safety operators can improve their decision making process by interacting with systems that will help them reason through contextual data in their environments.

Finance

From fraud detection to financial package recommendations, cognitive data is increasingly becoming relevant in financial systems. Reasoning through large amounts of semi-structured and unstructured data, cognitive data systems can help improve financial decisions such as trading, fraud analysis, etc.

Marketing

Cognitive data is a key element of the future of recommendation systems and other user engagement marketing processes. Rapidly reasoning through the text of an email or the tone of a phone call will help organizations recommend better products to their customers while also enhancing the understanding of their marketing data.

Defense

This is an obvious one. Cognitive data will be essential to improve defense operations by better reasoning through the millions of data signals collected by soldiers and equipment in the field. Additionally, cognitive science will help to build more intelligent defense equipment such as drones or robots that are becoming an integral part of modern warfare.

 

Posted by on October 14, 2015 in Uncategorized

 


6 Best Practices of Successful Enterprise Data Science Projects


During the last decade, data science projects in the enterprise have developed a reputation for being complex and expensive. However, the last few years have seen an explosion in new machine learning and big data infrastructure technologies that have helped lower the entry point for implementing data science solutions in the enterprise. Despite the technical evolution, enterprise data science projects remain relatively complex compared to traditional areas of investment in enterprise IT.

Similar to other groundbreaking technologies in enterprise IT, implementing successful data science solutions requires a combination of strong processes, delivery methodologies and technologies. Our experience implementing dozens of successful enterprise data science and machine learning solutions has allowed us to develop a perspective on the patterns we think help to optimize the success of data science projects in the enterprise. The following list provides a short summary of best practices in enterprise data science projects. Some of them might seem trivial but they can be difficult to enforce in real-world implementations.

Build For the Future: Build on Technologies You Can Innovate Upon

Data science platforms are one of the fastest growing areas in the technology ecosystem. As a result, new platforms, machine learning algorithms and data visualization technologies are constantly emerging, bringing new value propositions to enterprise solutions. Additionally, the requirements for enterprise data science solutions are constantly changing based on new market trends.

Building on a technology stack that facilitates innovation, extensibility and scalability is essential to guarantee the success of enterprise data science projects. In that sense, when selecting a data science platform, organizations should not only evaluate its technical capabilities but also complementary factors such as developer community, open source contributions, talent availability etc.

No Model is Right: Implement Various Models for the Same Scenario

One of the most common mistakes in machine learning projects is deciding on a specific prediction or classification algorithm before implementing the solution. Many times, the optimal algorithm is not discovered until several models are tested and evaluated with the real data. For that reason, it is a good practice to implement the first iteration of the solution by running several machine learning algorithms concurrently and comparing the results over time.
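As a rough illustration of this practice, the sketch below evaluates three candidate classifiers on the same data with cross-validation, using scikit-learn. The synthetic data set, the choice of algorithms and the helper name `compare_models` are assumptions made for the example, not a recommendation for any particular scenario.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for real enterprise data (an assumption for illustration).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Candidate algorithms to evaluate side by side; the selection is arbitrary.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}


def compare_models(models, X, y, folds=5):
    """Return the mean cross-validated accuracy for each candidate model."""
    return {
        name: cross_val_score(model, X, y, cv=folds, scoring="accuracy").mean()
        for name, model in models.items()
    }


scores = compare_models(candidates, X, y)
# Print the candidates from best to worst mean accuracy.
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

In a real project the same comparison would be repeated as new data arrives, since the ranking of the candidates can change over time.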

Continuous Data Science: Deliver Results Every Week and the First MVP in a Month

Enterprise data science projects are notorious for taking a long time and being extremely expensive. It is also not uncommon for stakeholders to wait months before seeing the first results of a data science solution which, more often than not, need to be improved. To mitigate some of those challenges, we always recommend structuring projects in a way that delivers weekly results to stakeholders.

In addition to delivering weekly results, we always recommend focusing on delivering a minimum viable product (MVP) within the first month of the project. Sometimes, this model requires cutting a few corners on the infrastructure side in the early days, but it guarantees constant feedback from the ultimate users, which will help to continuously improve the data science solution.

Test Test Test: Make the Models Testable

Complementing the previous point, it is very important to provide mechanisms to continuously test and validate machine learning algorithms even if the solution is running in production. Building testing models is an often overlooked aspect of enterprise data science projects but one that becomes critical to guarantee the evolution of the solution.
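One simple way to make a model testable is to keep a labeled holdout set and fail loudly when accuracy drops below an agreed bar. The sketch below is framework-agnostic and uses only the standard library; the `threshold_model` function, the holdout records and the 0.75 bar are all hypothetical values chosen for illustration.

```python
def accuracy(predict, examples):
    """Fraction of (features, expected_label) pairs the model gets right."""
    correct = sum(1 for features, expected in examples if predict(features) == expected)
    return correct / len(examples)


def check_model(predict, holdout, min_accuracy):
    """Return the holdout accuracy, or raise ValueError if it is below the bar."""
    score = accuracy(predict, holdout)
    if score < min_accuracy:
        raise ValueError(f"accuracy {score:.2f} is below the required {min_accuracy:.2f}")
    return score


# Hypothetical model: flags a transaction as risky above a fixed amount.
def threshold_model(features):
    return "risky" if features["amount"] > 1000 else "ok"


# Hypothetical labeled holdout set, kept separate from any training data.
holdout = [
    ({"amount": 50}, "ok"),
    ({"amount": 1500}, "risky"),
    ({"amount": 900}, "ok"),
    ({"amount": 5000}, "risky"),
    ({"amount": 1200}, "ok"),  # a case the simple rule gets wrong
]

score = check_model(threshold_model, holdout, min_accuracy=0.75)
print(f"holdout accuracy: {score:.2f}")
```

Because `check_model` accepts any callable, the same check can run in a build pipeline and on a schedule against the model that is live in production.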

Monitor Everything: Implement Operational Monitoring in Your Data Science Solutions

Monitoring the execution of machine learning models, data inputs and outputs, and model failures becomes essential for the production readiness of an enterprise data science project. For that reason, IT organizations should consider implementing the correct operational monitoring and instrumentation infrastructure as part of any data science project. While conceptually obvious, incorporating these capabilities in a data science solution is far from trivial, as most operational monitoring platforms are still not integrated with machine learning and data science stacks.
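Where no integrated monitoring platform is available, a thin wrapper around the model can capture the basics. The following is a minimal sketch using Python's standard `logging` and `time` modules; the `MonitoredModel` class and the `score_customer` function are hypothetical names introduced for the example.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("model_monitor")


class MonitoredModel:
    """Wraps any predict callable and records calls, failures and latency."""

    def __init__(self, name, predict):
        self.name = name
        self._predict = predict
        self.calls = 0
        self.failures = 0
        self.total_seconds = 0.0

    def predict(self, features):
        self.calls += 1
        started = time.perf_counter()
        try:
            result = self._predict(features)
        except Exception:
            # Count and log the failure, then let the caller handle it.
            self.failures += 1
            logger.exception("model=%s failed on input=%r", self.name, features)
            raise
        finally:
            self.total_seconds += time.perf_counter() - started
        logger.info("model=%s input=%r output=%r", self.name, features, result)
        return result

    def stats(self):
        avg = self.total_seconds / self.calls if self.calls else 0.0
        return {"calls": self.calls, "failures": self.failures, "avg_seconds": avg}


# Hypothetical model that fails when a required field is missing.
def score_customer(features):
    return features["visits"] * 0.1


monitored = MonitoredModel("customer_score", score_customer)
monitored.predict({"visits": 12})
try:
    monitored.predict({})  # missing field: counted and logged as a failure
except KeyError:
    pass
print(monitored.stats())
```

In practice the counters would be exported to whatever operational monitoring system the organization already runs, rather than printed.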

Start Small, Fail Fast and Iterate

Machine learning and data science solutions are new initiatives for most enterprises, and ones that require new skillsets and practices. For that reason, it is important to approach these projects in a highly iterative manner and to allow room for initial failures. While the limitations of legacy data science technology stacks prevented organizations from applying agile and lean development practices to data science projects, this is no longer the case. Today, most modern data science and machine learning stacks provide enough capabilities to allow organizations to start delivering results extremely fast with a minimum investment.

 

Posted by on August 5, 2015 in Uncategorized

 


5 on 5: Demystifying Machine Learning in the Enterprise


Machine learning is becoming one of the most important trends in the next generation of enterprise data solutions. The evolution of machine learning platforms, as well as complementary technology movements such as big data, has lowered the entry point for organizations embracing machine learning models to drive more effective business intelligence.

Despite the remarkable technological advances in the last few years, enterprise machine learning remains surrounded by strong myths. We regularly encounter those myths during our work with large enterprises around the world implementing data science and machine learning solutions. This brief article is our attempt to demystify some of the most common misconceptions about enterprise machine learning and to take a look at the technology stacks that are helping to democratize machine learning in the enterprise.

5 Myths of Enterprise Machine Learning

Implementing Machine Learning is Expensive

If you were implementing a machine learning solution a few years ago, you were stuck with commercial packages priced from the high six figures to the low seven figures that also required a lot of professional services to implement. Consequently, there is a myth that machine learning implementations need to be unreasonably expensive. The last few years have seen the emergence of a new group of platforms that have helped to commoditize the price of machine learning platforms while also lowering the entry point for developers and architects looking to implement these types of solutions. Today, it is possible to get up and running with a machine learning solution in a few weeks without spending anything on software licenses.

It Is Impossible to Build In-House Expertise in Machine Learning

This is a side effect of the previous myth. Machine learning has traditionally been seen as a professional services intensive endeavor. While it is true that an organization could benefit from starting its machine learning journey accompanied by the right experts, it is also true that today's machine learning platforms provide a low entry point for developers and architects looking to work on the next generation of data analytics solutions. It is therefore entirely possible for an enterprise to start building machine learning knowledge in house while leveraging an expert firm to help it take the initial steps in that journey.

We Need Data Scientists

Machine learning is typically seen as a discipline practiced by introverted data scientists or statisticians who wear thick glasses and are the only people capable of reasoning through machine learning data and algorithms. This myth couldn’t be further from the truth. Most modern machine learning platforms include dozens of well understood algorithms that can be enabled with a minimum level of effort.

Machine Learning is About Predictions

People mistakenly associate machine learning with data predictions. While predictive analytics is certainly a popular discipline in the machine learning space, it is far from covering the entire value of machine learning solutions. Classification, clustering and regression algorithms are incredibly useful in helping enterprises extract value from data assets, and they are typically simpler to implement than predictive models.
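Clustering is a good example of machine learning that involves no forecasting at all. The sketch below groups a handful of customers into two segments with scikit-learn's `KMeans`; the customer records and the choice of two clusters are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customer records: [orders per year, average order value].
customers = np.array([
    [2, 20.0], [3, 25.0], [1, 18.0],        # occasional, low-value buyers
    [40, 210.0], [38, 190.0], [45, 205.0],  # frequent, high-value buyers
])

# Group customers into two segments; no labeled outcomes are needed.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
segments = kmeans.fit_predict(customers)

for row, segment in zip(customers, segments):
    print(f"orders={row[0]:.0f} avg_value={row[1]:.0f} -> segment {segment}")
```

The segment numbers themselves are arbitrary; what matters is which customers end up grouped together.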

We Need a Big Data Infrastructure to Implement Machine Learning

The recent evolution of machine learning platforms was, arguably, catalyzed by the explosion in big data technologies. Consequently, many organizations feel they are not ready to take advantage of machine learning until they can implement a proper big data infrastructure. While leveraging big data infrastructure brings certain advantages, modern machine learning platforms work effectively against traditional enterprise relational data stores and data warehouses.
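To make the point concrete, the sketch below trains a scikit-learn model directly on rows pulled from a relational table with plain SQL. An in-memory SQLite database stands in for an enterprise relational store, and the `orders` table, its columns and its rows are hypothetical.

```python
import sqlite3

from sklearn.tree import DecisionTreeClassifier

# In-memory SQLite stands in for an enterprise relational store (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (amount REAL, items INTEGER, returned INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (20.0, 1, 0), (35.0, 2, 0), (25.0, 1, 0),
        (500.0, 9, 1), (450.0, 8, 1), (480.0, 10, 1),
    ],
)

# Plain SQL pulls the training set; no big data infrastructure is involved.
rows = conn.execute("SELECT amount, items, returned FROM orders").fetchall()
conn.close()

features = [[amount, items] for amount, items, _ in rows]
labels = [returned for _, _, returned in rows]

# Train on the relational data and classify one new, unseen order.
model = DecisionTreeClassifier(random_state=0).fit(features, labels)
prediction = model.predict([[470.0, 9]])[0]
print("predicted returned flag:", prediction)
```

The same pattern applies to a data warehouse: swap the connection for the appropriate database driver and keep the rest unchanged.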

5 Technologies that Simplify Enterprise Machine Learning

Azure Machine Learning (http://azure.microsoft.com/en-us/services/machine-learning/)

An Azure-native, cloud-based predictive analytics service that makes it possible to quickly create and deploy predictive models as analytics solutions. Azure ML provides a visual environment to create ML models as well as an API to access the models programmatically. Azure ML also allows developers to use languages like R or Python in their ML models.

AWS Machine Learning (http://aws.amazon.com/machine-learning/)

Similar to Azure ML, the AWS ML service provides a series of tools and algorithms that allow developers to start building and using machine learning solutions without a heavy investment in infrastructure.

Spark MLlib (https://spark.apache.org/docs/1.2.1/mllib-guide.html)

The incredibly popular Spark platform includes a very simple model to execute machine learning algorithms at MPP scale. Interestingly enough, Spark is now fully supported in both Azure and AWS, which makes it an interesting complement to the native machine learning engines included in those platforms.

Scikit-learn (http://scikit-learn.org/)

One of the most powerful ML frameworks in the world. Scikit-learn provides a series of Python-based libraries that include over 50 ML algorithms, and it has a very vibrant community behind it.

Mahout (http://mahout.apache.org/)

Even though Mahout has seen its popularity eclipsed by the rise of new machine learning platforms, it remains incredibly relevant when it comes to evaluating machine learning solutions in the enterprise. Mahout provides a large gallery of machine learning algorithms optimized to work in Hadoop infrastructures.

Other Platforms

The aforementioned technologies form a core group of platforms that are actively driving machine learning adoption in the enterprise. Like any other fast growing technology space, we are seeing an increasing number of platforms that are bringing new and innovative capabilities to enterprise machine learning solutions. Consequently, the previous list is likely to increase in the next few months but it is a good place to start today.

 

Posted by on July 29, 2015 in Uncategorized

 
