Consumable Enterprise Data Lake

CASE STUDY

Case Study Overview:

The Fortune 100 firm set out to move its enterprise data lake from a Cloudera platform to a new data platform, primarily to (a) reduce cost, (b) adopt a modern data architecture, and (c) importantly, make the data lake easy to govern and consume.


Business Challenges:

The customer's data lake had been in development for many years. After iterative development with on-the-fly data governance, the lake had become very difficult to consume: it lacked proper metadata management, access provisioning, and cataloging. The lake was built on a Hadoop cluster, and with costs rising, the customer needed to take a fresh look at newer technologies and platforms.


Solutions Delivered:

Quadratic Systems was the primary partner for designing and implementing the data lake solution on a new platform comprising:


(1) On-prem S3 object store (Scality) for data storage, replacing HDFS

(2) Spark/Scala on Kubernetes containers (CaaS platform) for compute

(3) Dremio as the query tool, replacing Hive/Impala


Scality was chosen for data storage and a CaaS platform (Kubernetes) for compute; together they replaced the existing Hadoop environment.
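As an illustration of how Spark on Kubernetes can be pointed at an on-prem S3-compatible store, the sketch below assembles the relevant Spark settings. It is a minimal sketch rather than the configuration used in this engagement: the endpoint, Kubernetes API URL, image name, namespace, and executor count are all hypothetical placeholders, and credentials are deliberately left out. The property names themselves are standard Spark and Hadoop S3A settings.

```python
def spark_conf_for_onprem_s3(endpoint, k8s_api_url, image,
                             namespace="data-lake", executors=4):
    """Return Spark settings for running on Kubernetes against an S3-compatible store.

    All argument values are placeholders supplied by the caller (assumptions).
    """
    return {
        # Run on Kubernetes instead of a Hadoop/YARN cluster.
        "spark.master": f"k8s://{k8s_api_url}",
        "spark.kubernetes.container.image": image,
        "spark.kubernetes.namespace": namespace,
        "spark.executor.instances": str(executors),
        # Point the Hadoop S3A connector at the on-prem endpoint.
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        # S3-compatible stores are often addressed path-style rather than by bucket hostname.
        "spark.hadoop.fs.s3a.path.style.access": "true",
        # Enable TLS only when the endpoint URL is https.
        "spark.hadoop.fs.s3a.connection.ssl.enabled":
            str(endpoint.startswith("https://")).lower(),
    }


def to_submit_args(conf):
    """Render the settings as `--conf key=value` arguments for spark-submit."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args
```

With settings like these, jobs address data by `s3a://` paths where they previously used `hdfs://` paths, which is what lets storage and compute be scaled separately.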


Data Governance: 

  • Data quality addressed with a Balance-and-Controls and Reconciliation framework
  • Data access and security handled through ServiceNow integration for provisioning
  • Metadata management through integration with Informatica EDC
  • Incremental feeds to the lake recorded as metadata, so consumers could easily find new data as it became available
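A balance-and-controls check typically compares control figures published by the source system with the same figures computed from what was actually loaded. The sketch below shows that idea in plain Python; it is an illustration only, not the framework delivered here. The `ControlTotals` shape, the `amount` field name, and the choice of record count plus amount sum as controls are all assumptions.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class ControlTotals:
    """Control figures for one feed delivery (hypothetical shape)."""
    record_count: int
    amount_sum: Decimal


def compute_totals(rows, amount_field="amount"):
    """Compute the control figures from the rows that were actually loaded."""
    count = 0
    total = Decimal("0")
    for row in rows:
        count += 1
        # Go through str() so float inputs do not carry binary rounding noise.
        total += Decimal(str(row[amount_field]))
    return ControlTotals(count, total)


def reconcile(expected, actual):
    """Return a list of breaks; an empty list means the load balances."""
    breaks = []
    if expected.record_count != actual.record_count:
        breaks.append(
            f"record_count: expected {expected.record_count}, "
            f"loaded {actual.record_count}"
        )
    if expected.amount_sum != actual.amount_sum:
        breaks.append(
            f"amount_sum: expected {expected.amount_sum}, "
            f"loaded {actual.amount_sum}"
        )
    return breaks
```

A load would be published to consumers only when `reconcile` returns an empty list; otherwise the breaks are reported back to the source team.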


Data Ingestion Framework:

  • A repeatable, config-driven framework written in Scala enabled systems of record to on-board to the lake quickly
  • Kafka integration and Spark Streaming provided near-real-time ingestion into the lake
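The config-driven idea can be sketched as follows: each system of record supplies a small declarative feed definition, and one generic job interprets it. The actual framework was written in Scala; this Python sketch only illustrates the pattern. The config keys, the `load_date=` partition layout, and the step tuples are hypothetical.

```python
import json

# Keys every feed definition must provide (assumed schema).
REQUIRED_KEYS = {"source", "dataset", "format", "target_path", "load_type"}
LOAD_TYPES = {"full", "incremental"}


def parse_feed_config(text):
    """Parse and validate one feed definition written as JSON."""
    config = json.loads(text)
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if config["load_type"] not in LOAD_TYPES:
        raise ValueError(f"unknown load_type: {config['load_type']}")
    return config


def plan_ingest(config, batch_date):
    """Turn a validated feed definition into the steps a generic job would run."""
    # Each batch lands in its own date partition under the target path.
    target = f"{config['target_path'].rstrip('/')}/load_date={batch_date}"
    # Incremental feeds add to existing data; full feeds replace it.
    mode = "append" if config["load_type"] == "incremental" else "overwrite"
    return [
        ("read", config["source"], config["format"]),
        ("write", target, mode),
    ]
```

On-boarding a new source then means adding a config file rather than writing a new job, which is where the repeatability comes from.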

 

Data Consumption:

  • Dremio was made available so users could easily explore and report on data stored in Scality
  • Advanced users employed PySpark for analytical consumption
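For the PySpark path, analytical users would typically read a dataset directly from the object store and narrow it down before analysis. The helper below is a minimal sketch under several assumptions: data is stored as Parquet, the `bucket/zone/dataset` layout is hypothetical, and `spark` is an existing `SparkSession` already configured for the store. It relies only on the standard `spark.read.parquet`, `DataFrame.where`, and `DataFrame.select` calls.

```python
def dataset_path(bucket, zone, dataset):
    """Build the s3a:// location of a dataset (the layout is an assumption)."""
    return f"s3a://{bucket}/{zone}/{dataset}"


def load_dataset(spark, bucket, zone, dataset, columns=None, condition=None):
    """Read a Parquet dataset through an existing SparkSession and narrow it down.

    `condition` is a SQL expression string such as "load_date = '2022-01-31'".
    """
    df = spark.read.parquet(dataset_path(bucket, zone, dataset))
    if condition is not None:
        # Filter early so Spark can skip data that is not needed.
        df = df.where(condition)
    if columns:
        # Keep only the requested columns.
        df = df.select(*columns)
    return df
```

A call such as `load_dataset(spark, "lake", "curated", "accounts", columns=["id"])` returns a regular DataFrame that can be aggregated or joined as usual.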



Enabling Technology

  • On-Prem S3 (Scality)
  • Spark on Kubernetes
  • Dremio
  • Scala / PySpark

Copyright © 2022 Quadratic Systems, Inc. - All Rights Reserved.