It’s that time of year again–time for big data predictions! We have a bumper crop of them to share with you this year, so sit down, grab a beverage of your choice, and buckle up for the predictive onslaught.
We kick off this year’s batch of big data predictions with some harsh words from Alexander Lovell, the head of product at Fivetran.
“2023 will be put up or shut up time for data teams. Companies have maintained investment in IT despite wide variance in the quality of returns. With widespread confusion in the economy, it is time for data teams to shine by providing actionable insight because executive intuition is less reliable when markets are in flux. The best data teams will grow and become more central in importance. Data teams that do not generate actionable insight will see increased budget pressure.”
In 2023, SQL users will finally get revenge on the rest of us, says Mike Waas, the CEO and co-founder of Datometry.
“Negating their original battle cry of doing away with SQL, the NoSQL community has acknowledged that enterprise IT demands standards–and with it the simplicity of a common yet powerful query language. Practically every NoSQL database that’s still alive is currently in the process of adding a SQL or SQL-like interface to their system to appeal to enterprises. 2023 will see the revenge of the SQL users where pretty much any data management system that wants to be successful in the enterprise will try to look like a proper database.”
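The SQL-on-NoSQL trend Waas describes usually boils down to translating SQL clauses into the store’s native query language. The sketch below is purely illustrative, not any vendor’s real translator: it maps a single `field OP value` WHERE clause onto a MongoDB-style filter document (the `$gt`/`$lt`/`$eq` operator names are one assumed target format).

```python
import re

def sql_where_to_filter(sql: str) -> dict:
    """Translate a single 'field OP value' WHERE clause into a
    MongoDB-style filter document (toy illustration only)."""
    ops = {">": "$gt", "<": "$lt", "=": "$eq"}
    m = re.match(r"\s*(\w+)\s*(>|<|=)\s*(\w+)\s*$", sql)
    if not m:
        raise ValueError(f"unsupported clause: {sql!r}")
    field, op, value = m.groups()
    # Coerce numeric literals so comparisons behave as SQL users expect.
    return {field: {ops[op]: int(value) if value.isdigit() else value}}

# 'SELECT * FROM users WHERE age > 30' boils down to this filter:
print(sql_where_to_filter("age > 30"))  # {'age': {'$gt': 30}}
```

Real translators handle joins, aggregations, and type systems, which is why adding a “proper database” face to a NoSQL engine is a multi-year effort rather than a parsing exercise.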
Data historically has cycled between distributed and centralized phases. We’re currently in a distributed phase, and the data is unlikely to come back together, which necessitates new approaches to deal with it, including data fabrics and data meshes, says Angel Viña, CEO and founder of Denodo.
“While there is an inherent difference between the two, data fabric is a composable stack of data management technologies and data mesh is a process orientation for distributed groups of teams to manage enterprise data as they see fit,” he says. “Both data fabric and data mesh can play critical roles in enterprise-wide data access, integration, management and delivery, when constructed properly with the right data infrastructure in place. So in 2023, expect a rapid increase in adoption of both architectural approaches within mid-to-large size enterprises.”
You’ve heard about the modern data stack. But in 2023, you’ll hear more about the postmodern data stack, says Chris Lubasch, the CDO & RVP DACH at Snowplow:
“It was a year of fast-moving discussions around the modern data stack. Lots of new vendors popped up, and major ones like Snowflake and Databricks continue their journey to take over many technical components, despite the challenging economic situation. But at the same time, voices emerged who questioned the modern data stack as such, whose decoupled approach often leads to many tools and high costs, let alone the complexity of getting it all together. The discussions around the ‘postmodern data stack’ (as just one out of many terms) were started, and we’re all eager to see where this will lead us in the coming years.”
As the founder of object storage provider Cleversafe (acquired by IBM in 2015 for $1.3 billion), Chris Gladwin knows a thing or two about scaling for big data. Now with his third startup, the data warehouse vendor Ocient, Gladwin predicts that 2023 is the year that hyperscale data goes mainstream.
“Data-intensive businesses are moving beyond big data into the realm of hyperscale data, which is exponentially greater. And that requires a reevaluation of data infrastructure. In 2023, data warehouse vendors will develop new ways to build and expand systems and services.
“It’s not just the overall volume of data that technologists must plan for, but also the burgeoning data sets and workloads to be processed. Some leading-edge IT organizations are now working with data sets that comprise billions and trillions of records. In 2023, we could even see data sets of a quadrillion rows in data-intensive industries such as adtech, telecommunications, and geospatial. Hyperscale data sets will become more common as organizations leverage increasing data volumes in near real-time from operations, customers, and on-the-move devices and objects.”
Matt Carroll, CEO and co-founder of Immuta, says 2023 will see the rise of data processing agreements (DPAs) and no-copy data exchanges.
“In 2023, we’ll see DPAs become a standard element of SaaS contracts and data sharing negotiations. How organizations handle these contracts will fundamentally change how they architect data infrastructure and will define the business value of the data. As a result, it will be in data leaders’ best interest to fully embrace DPAs in 2023 and beyond. These lengthy documents will be complex, but the digitization of DPAs and the involvement of legal teams will make them far easier to understand and implement.
“In 2023, as data sharing continues to grow, and data and IT teams are strapped to keep up, no-copy data exchanges will become the new standard. As organizations productize their modern data stack, there will be an explosion in the size and number of data sets. Making copies before sharing just won’t be feasible anymore. In 2023, enterprises will flock to established platforms, like Snowflake’s Data Exchange and Databricks’ Delta Sharing protocol, to make it easier to securely share and monetize their data.”
2023 will be the year of the rabbit, according to the Chinese calendar. But as Dhruba Borthakur, the co-founder and CTO of Rockset and the founding engineer of RocksDB sees it, 2023 will be the year of the data app.
“In the past 10 years we’ve seen the rise of the web app and the phone app, but 2023 is the year of the data app. Reliable, high-performing data applications will prove to be a critical tool for success as businesses seek new solutions to improve customer-facing applications and internal business operations. With on-demand data apps like Uber, Lyft and DoorDash available at our fingertips, there’s nothing worse for a customer than to be stuck with the spinning wheel of doom and a request not going through. Powered by a foundation of real-time analytics, we will see increased pressure on data applications to not only be real-time, but to be fail-safe.”
You likely have many things on your Christmas list. But there’s only one thing that Tamr chief product officer Anthony Deighton is hoping for this year: clean data.
“Junky or dirty data is data that is incorrect, incomplete, inconsistent, outdated, duplicative – or all of the above, and may be killing your business. It’s a common problem often heightened during cyclical periods when you need your customer data to work for you most — i.e., for holiday shopping and travel. Avoid confusion and frustration, and ease your customers’ shopping and travel experience by mastering your customer data. Customer mastering creates a unified, accurate and enriched view of customer data across systems and sources, and a unique identifier enabling consistent tracking of the customer. Mastering your customer data at scale gives sales, marketing and customer experience teams a powerful way to accelerate data-driven selling. It also enables customer insights for competitive advantage.”
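The “customer mastering” Deighton describes means collapsing duplicate records into a single golden record carrying a stable identifier. A minimal sketch of the idea follows; this is an illustrative toy, not Tamr’s actual approach, and the matching key (a normalized email address) and field names are assumptions.

```python
import uuid

def normalize_email(email: str) -> str:
    """Lowercase and strip whitespace so trivially different records match."""
    return email.strip().lower()

def master_customers(records: list[dict]) -> list[dict]:
    """Collapse duplicate customer records into one golden record per
    normalized email, merging fields and assigning a stable master ID."""
    golden: dict[str, dict] = {}
    for rec in records:
        key = normalize_email(rec["email"])
        if key not in golden:
            golden[key] = {"master_id": str(uuid.uuid4()), "email": key}
        # Enrich the golden record: keep the first non-empty value per field.
        for field, value in rec.items():
            if field != "email" and value and not golden[key].get(field):
                golden[key][field] = value
    return list(golden.values())

customers = [
    {"email": "Jane@Example.com", "name": "Jane Doe", "phone": ""},
    {"email": "jane@example.com ", "name": "", "phone": "555-0100"},
    {"email": "bob@example.com", "name": "Bob Smith", "phone": ""},
]
mastered = master_customers(customers)
print(len(mastered))  # 2 golden records from 3 raw rows
```

Production mastering systems use fuzzy matching and machine learning rather than a single exact key, but the shape of the output is the same: one enriched record and one identifier per real-world customer.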
Good fences make good neighbors, as the old saying goes. But that doesn’t apply to enterprise workloads, according to Andi Gutmans, vice president and general manager of Google databases at Google Cloud, who says the barriers between transactional and analytics workloads will start to disappear in 2023.
“Traditionally, data architectures have separated these workloads because each needed a fit-for-purpose database. Transactional databases are optimized for fast reads and writes, while analytical databases are optimized for aggregating large data sets,” Gutmans says. “With advances in cloud-based data architectures that leverage highly scalable, disaggregated compute and storage with high-performance networking, we predict there will be new database architectures that allow both transactional and analytical workloads within one system without requiring applications to compromise on workload needs.”
There’s been a lot made of the supposed death of big data. Don’t believe the hype, says Christian Buckner, the senior vice president of data analytics and IoT at Altair.
“Big data isn’t dead (yet),” he says. “Providers will attempt to get ahead of trends, and we will see many start to advertise that ‘Big data is dead.’ Instead, many organizations are leaning into ‘smart data’ for greater insights. But despite the advertisements, big data will continue to play an important role in business operations–for now. The key is to make sure you have easy-to-use, self-service tools in place that enable cleansing, verifying, and prepping of the data that can then be plugged into a data analytics model for valuable results and smart decisions. The companies that turn their big data into smart data will be the ones that will benefit from the new ways of thinking about data.”
Democracy at the national level will need some help in 2023, if 2022 is any indication. When it comes to data democratization, that help will come in the form of Python, according to Torsten Grabs, director of product management at Snowflake.
“In 2023, Python will be the primary medium for democratizing access to, and insights from, data for everyone across an organization. Python will become more enterprise-ready as the runtime infrastructure around Python grows simpler and more straightforward, and includes more security and governance. At the same time, productionizing Python results will become further streamlined, and that code will be wrapped in a meaningful user experience so it can be easily consumed and understood by non-IT users such as a company’s marketing team. We’ll see Python have the same, or more likely an even greater, transformational impact on democratizing data than the emergence of self-service business intelligence tools 15 to 20 years ago.”