Beyond Big Data and Benioff’s “AI Spring”

This article was authored by Matt McIlwain and was originally published on Medium. For more of Matt’s writing, you can follow him here!

Big Data, AI, Machine Learning, Hadoop, Predictive Analytics: we hear these terms every day from companies such as Cloudera, Trifacta and Dato (formerly GraphLab) that are securing many millions in financing. I believe that 2015 will be the year when the conversation moves from Big Data to the Dataware stack. Over the past twelve months, many companies have emerged across the big data spectrum. While they often use the same language, clear product categories have emerged that describe the market opportunity and where future growth will come from.

This is the Dataware stack. Dataware is the combination of infrastructure, data intelligence systems that apply algorithms and machine learning to the data, and the applications enabled by data intelligence that are changing how we do business and how we live our lives every day. And startups dominate the Dataware landscape.

We are at the very start of the data revolution. The consumer world got there first. Apps that know who we are, where we are and a few other data points about us help us do a myriad of things every single day. On the business side, there has always been a lot of data. What is new is both the incredible growth in that data and an active appreciation, extending beyond business analysts, for using data in near real time to improve products and services.

What we are seeing now is a huge shift that is infusing data into every piece of our lives and making every app and service smarter. Recent announcements from companies such as Salesforce.com and Workday show data starting to blend into what were rather static services.

The Dataware Framework is how I look at this new world of software in the age of big data. Dataware includes an Agile Data Stack of components for modern data applications and services and a Continuous Data Loop that brings usable data in and out of the stack. The Agile Data Stack has three layers: the underlying enabling infrastructure, the data intelligence layer, and the data-infused applications and services that benefit from those underlying components. The Continuous Data Loop represents how data is continually ingested, cleaned, visualized, recycled, refined and put back into the mix for future predictions, so that modern applications can deliver intelligence in increasingly dynamic and personalized ways.

While most traditional IT customers and vendors are making moves to deal with the growth of big data, Dataware is largely the territory of startups and early adopters across every layer of the data stack.

At first, new sources of data and data systems will enhance and extend existing technologies such as databases, data warehouses and business intelligence tools. The new technologies will help unlock value in legacy systems and structured data silos along with new types and structures of data sources. But, as the Dataware infrastructure, intelligence and methodologies mature, data-infused applications and services will be built from scratch to disrupt industries and business processes.

Dataware will introduce net new processes and intelligence into the world’s oil exploration, research for cancer cures, advertising optimization, and yes, choosing movies and friends.

Here are four key principles fundamental to understanding the impact that Dataware will have on the technology industry.

1. Big Data and traditional structured data work together as a “hybrid” of inputs to feed data-infused applications and services, and will complement each other as these applications get built.
2. Enabling infrastructure, including new types of databases (Cassandra, MongoDB, HBase) and data execution “engines” (Hadoop/MapReduce, Spark), primarily serves as an enabler in the Agile Data Stack and is less likely to be where value is captured than it was in the relational database era.
3. The Data Intelligence layer is where data, algorithms, data models and “pipelines” intersect to turn data into insights. More value will be delivered and captured in this layer than data “middleware” and BI tools have captured historically. Companies focused on the enabling infrastructure today are likely to try to move up the stack into the data intelligence layer.
4. Data-driven applications and services are distinguished across two dimensions. First, are the data insights being delivered to a machine or a human? Second, are data insights/predictions being delivered in real time or in a batch/offline mode? Real-time insights delivered directly to a human end-user are the most challenging ones to run at scale. They are also the systems in which it is hardest to create a continuous feedback loop that delivers both instant gratification to the customer and compelling insights to the service provider.
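The two dimensions in the fourth principle can be read as a simple two-by-two grid. The sketch below is only an illustration of that grid: the class names, the `is_hardest_quadrant` helper and the three example applications are hypothetical and do not come from any real product or library.

```python
from dataclasses import dataclass
from enum import Enum


class Consumer(Enum):
    """Who receives the insight (first dimension)."""
    MACHINE = "machine"
    HUMAN = "human"


class Delivery(Enum):
    """When the insight is delivered (second dimension)."""
    BATCH = "batch/offline"
    REAL_TIME = "real-time"


@dataclass(frozen=True)
class DataApp:
    """A hypothetical data-driven application placed on the grid."""
    name: str
    consumer: Consumer
    delivery: Delivery

    def is_hardest_quadrant(self) -> bool:
        # Real-time insights delivered directly to a human are the
        # quadrant the article calls most challenging to run at scale.
        return (self.consumer is Consumer.HUMAN
                and self.delivery is Delivery.REAL_TIME)


# Illustrative examples only; the placements are assumptions.
apps = [
    DataApp("nightly churn report", Consumer.HUMAN, Delivery.BATCH),
    DataApp("ad bid optimizer", Consumer.MACHINE, Delivery.REAL_TIME),
    DataApp("in-app movie recommendations", Consumer.HUMAN, Delivery.REAL_TIME),
]

hardest = [app.name for app in apps if app.is_hardest_quadrant()]
print(hardest)
```

Placing a product on this grid is a judgment call, and a single product may span more than one quadrant.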

It’s clear that not every company in the big data arena will succeed. There will be a lot of failures, and there is already a lot of confusion around language and markets. The companies that succeed will make themselves a core component of the Agile Data Stack or the Continuous Data Loop and will build their footprint from there.

Dataware is one way to frame the major areas of opportunity in what Marc Benioff recently called the “AI Spring” or what Microsoft is promoting with services like AzureML. But many questions remain, including where the most promising markets exist, when those markets will be ready for rapid adoption and who among startups and incumbents will emerge as winners and losers. What we do know is that Dataware will dramatically impact the technology industry over the next decade.

Matt McIlwain is an investor in Seattle with Madrona Venture Group who invests in enterprise, cloud and Dataware companies. Dato is one of his investments.
