Like big data, the term Data Lake is sometimes mocked as simply a marketing label for a product that supports Hadoop. Increasingly, however, the term is being accepted as a way to describe any large data pool in which the schema and data requirements are not defined until the data is queried.
Each data element in a lake is assigned a unique identifier and tagged with a set of extended metadata tags. When a business question arises, the data lake can be queried for relevant data, and that smaller set of data can then be analyzed to help answer the question.
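The idea described above (raw elements with unique identifiers and metadata tags, with the schema applied only at query time) can be illustrated with a minimal Python sketch. The names `TinyDataLake`, `LakeElement`, and `read_orders` are hypothetical and exist only for this illustration; they do not correspond to any Dell EMC or Hadoop API, and a real data lake would sit on distributed storage rather than an in-memory dictionary.

```python
import json
import uuid
from dataclasses import dataclass, field


@dataclass
class LakeElement:
    """One raw item in the lake: an unparsed payload, free-form tags, and an ID."""
    payload: str   # stored as-is; no schema is enforced on write
    tags: dict     # extended metadata tags used to find the element later
    element_id: str = field(default_factory=lambda: str(uuid.uuid4()))


class TinyDataLake:
    """Hypothetical in-memory stand-in for a data lake (illustration only)."""

    def __init__(self):
        self._elements = {}

    def ingest(self, payload, **tags):
        """Store the raw payload with its tags and return its unique identifier."""
        element = LakeElement(payload=payload, tags=tags)
        self._elements[element.element_id] = element
        return element.element_id

    def query(self, **wanted):
        """Return every element whose tags match all of the requested values."""
        return [
            e for e in self._elements.values()
            if all(e.tags.get(key) == value for key, value in wanted.items())
        ]


def read_orders(elements):
    """Schema-on-read: the structure is imposed only now, for this question."""
    rows = []
    for e in elements:
        record = json.loads(e.payload)
        rows.append({
            "customer": str(record["customer"]),
            "amount": float(record["amount"]),
        })
    return rows


lake = TinyDataLake()
lake.ingest('{"customer": "acme", "amount": "120.50"}', source="web", kind="order")
lake.ingest('{"customer": "globex", "amount": 80}', source="store", kind="order")
lake.ingest("2017-03-01 12:00:01 GET /index.html 200", source="web", kind="log")

# Business question: what is the total order value?
orders = read_orders(lake.query(kind="order"))
total = sum(row["amount"] for row in orders)
```

Note that the web log line is stored alongside the orders without any conversion; it is simply ignored by this query and can be parsed later under a different schema if another question requires it.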
In an interview, Mr. Nikhil Madan, General Manager – Data Lake & Scale-out Storage Solutions, Dell EMC, spoke about the trends around Data Lake and its importance in helping organizations stay ahead of the competition by transforming their business.
Tell us more about Data Lake. How is it helping organizations achieve better scalability and analytics?
With data expected to grow to 44 zettabytes (44 trillion gigabytes) by 2020, organizations face the challenge of handling large quantities of unstructured data that they need for better business operations. The real power of data lies within unstructured data, which can prove to be a goldmine for an organization, provided it has the right tools to extract it. This is where the power of Data Lake comes in.
A Data Lake, in effect a modernized data warehouse, is a single repository of information from multiple sources in an organization, built on a highly scalable storage infrastructure (a scale-out storage solution) for data consolidation. It makes big data accessible through traditional and next-generation access methods so that value can be gained through analytics. Data lakes allow customers to centralize all types of information, make better business decisions, start innovating, and become more competitive in the market. Businesses now look to data lakes for more insight into their operations, and to stay ahead of the competition by transforming their business.
To cater to the rising storage needs of organizations, Dell EMC offers a solution called Isilon. Dell EMC Isilon is a powerful yet simple-to-manage scale-out NAS storage platform that can be used in a Hadoop deployment, with built-in data protection and availability to reduce IT risk. Dell EMC Isilon improves the TCO of the big data storage infrastructure by lowering CAPEX, OPEX, and the time to insight while meeting compliance and regulatory requirements.
Dell EMC understands the changing needs of its customers and constantly upgrades its offerings. Recently, Dell EMC launched Isilon scale-out NAS storage solutions that allow businesses to extend their data lake to the edge, optimize the core, and embrace the cloud. This allows enterprises to increase efficiency, simplify management, and gain new levels of business agility. One example is IsilonSD Edge, the first Software-Defined Storage solution with the power and flexibility of Dell EMC Isilon. It allows enterprises to easily extend the edge of their data lake to locations such as remote and branch offices, so that organizations can consolidate and securely distribute the unstructured data used to support a wide range of second and third platform applications.
What does Data Lake do? Why does one need a Data Lake?
Data Lake Foundations are built to consolidate and eliminate inefficient storage silos. This storage infrastructure helps store, manage, protect and analyze unstructured data. Key Value Propositions of a Data Lake Foundation include:
- Efficient Storage: Data Lake Foundations deliver efficient storage by eliminating storage silos, simplifying storage management and improving storage utilization.
- Massive Scalability: Data Lake Foundations are built from scale-out architectures and are massively scalable while being simple and efficient to manage.
- Increased Operational Flexibility: Data Lake Foundations leverage multi-protocol capabilities to support a wide range of traditional and next-generation workloads.
- In-Place Big Data Analytics: Data Lake Foundations leverage the shared storage infrastructure and support for protocols such as HDFS to deliver cost-efficient in-place analytics with faster time to results and better performance.
- Robust Data Protection and Security: Data Lake Foundations protect your data assets with efficient and resilient backup, disaster recovery and security options.
How can one build a Data Lake, and what is the right approach to building it?
To respond to the massive increase in unstructured data, it is very important for enterprises to adopt an IT infrastructure that is simple to build, can hold enormous amounts of data and can provide real-time analysis.
Therefore, this level of enterprise data growth is best addressed by consolidating storage into one single central repository of persistent data—a data lake—that simplifies the IT architecture, is efficient, and scales as business needs change.
The Dell EMC Isilon scale-out network-attached storage (NAS) is a simple and scalable platform on which to build a scale-out data lake and persist enterprise files of all sizes, scaling from terabytes to petabytes in a single cluster. It enables you to consolidate storage silos, improve storage utilization, and reduce costs, while providing a future-proof platform to run today's and tomorrow's workloads.
How are your Business Data Lake solutions helping enterprises leverage big data across platforms?
With new technologies like the Internet of Things (IoT), artificial intelligence, robotics, virtual reality, augmented reality and cloud computing already transforming our lives and businesses, data is exploding from everywhere. Businesses today are also facing multiple challenges: new competition, engaging with customers in a better way, adapting to the digital world and so on. Over time, enterprises have realized that much of the information they need to make better business decisions is already sitting within their organizations, and that they can derive value from it.
The value of harnessing big data is nearly endless. From predicting buying behaviors, to intelligence-driven innovation, to finding entirely new opportunities that could transform a business, Dell EMC's Business Data Lake solutions give enterprises the proven power to leverage big data across operations, offerings and growth strategies.
A business data lake is a complete way of bringing data, applications and analytics together. Data is ingested, catalogued, inventoried and controlled regardless of its source or destination. It delivers analytics capability where needed, down to the point of data inception and ingest. Dell EMC offers a wide range of solutions to help businesses kick-start their digital transformation journey. The Dell EMC Isilon solution gives businesses a single system to capture, store, analyze, protect and manage their data. This helps organizations scale out operations, reduce costs and deliver on their analytics initiatives.
A few of the products the company offers in its Isilon software category include:
- Dell EMC IsilonSD Edge, the first offering of a new Software-Defined Storage product family called Dell EMC IsilonSD that allows enterprises to easily extend their data lake to edge locations.
- Dell EMC Isilon OneFS.NEXT, a new version of the OneFS operating system, which includes powerful new features and capabilities that optimize storage at the core of the data lake.
- Dell EMC Isilon CloudPools, a new software offering that enables enterprises to extend their data lake to the cloud, integrate Isilon on-premises storage with cloud services from a choice of cloud providers, and gain new efficiency by embracing the cloud as an archive storage tier for cold data.
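
To make the idea of a cloud archive tier for cold data concrete, here is a minimal Python sketch of how cold files might be identified by last-access age. It is a generic illustration and not the CloudPools policy engine or API; the function name `select_cold_files`, the 90-day threshold, and the file paths are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def select_cold_files(files, now, max_idle_days=90):
    """Return paths not accessed within max_idle_days, sorted by path.

    `files` maps a path to its last-access time. The 90-day default is an
    assumed threshold for illustration, not a product setting.
    """
    cutoff = now - timedelta(days=max_idle_days)
    return sorted(path for path, last_access in files.items() if last_access < cutoff)


# Hypothetical inventory of files in the lake and when each was last read.
now = datetime(2017, 6, 1)
files = {
    "/lake/projects/q1_report.pdf": datetime(2017, 5, 20),
    "/lake/archive/2015_logs.tar": datetime(2015, 12, 31),
    "/lake/media/launch_video.mp4": datetime(2016, 11, 2),
}

# Candidates for the archive tier; recently used files stay on-premises.
cold = select_cold_files(files, now)
```

In this sketch the recently read report stays in place, while the two files untouched for more than 90 days become candidates for the cheaper archive tier.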
Data Lake is now becoming the center of the digital strategy that organizations of all shapes and sizes are embarking on. Whether it is a few applications being centralized or an entire business, such as a large telco, coming together on a central platform, Data Lake now holds value for every organization.