techminis

A naukri.com initiative

Big Data News

Amazon · 3w

Image Credit: Amazon

Introducing AWS Glue Data Catalog automation for table statistics collection for improved query performance on Amazon Redshift and Amazon Athena

  • AWS Glue Data Catalog now automates statistics generation for new tables, integrating with the cost-based optimizers (CBO) of Amazon Redshift Spectrum and Amazon Athena.
  • Table statistics are essential for optimizing queries on large datasets, particularly join operations across multiple datasets.
  • The Data Catalog previously supported collecting table statistics for file formats such as Parquet, ORC, JSON, ION, CSV, and XML, as well as Apache Iceberg tables.
  • The latest update allows administrators to configure weekly statistics collection across all databases and tables, optimizing the platform's cost-efficiency.
  • The feature enables flexible per-table controls, allowing individual data owners to manage table statistics per their requirements.
  • Catalog-level statistics collection can be enabled via the Lake Formation console or the AWS CLI.
  • With this feature, AWS Glue automatically updates column statistics for all columns in each table, using a 20% sample of records to calculate statistics (a boto3 sketch follows this list).
  • Individual data owners can configure scheduled collection configurations at the table level and customize settings for individual tables.
  • This feature will help in the efficient management of up-to-date column-level statistics to optimize query processing and cost-efficiency.
  • Try this feature for your use case, and share your feedback in the comments.
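
A rough illustration of the per-table controls mentioned above: the sketch below starts an on-demand column statistics run with boto3. The database, table, and IAM role names are placeholders, and it assumes the Glue StartColumnStatisticsTaskRun API is available in your SDK version and Region.

```python
# Hypothetical sketch: trigger column statistics collection for one table.
import boto3

glue = boto3.client("glue", region_name="us-east-1")

# Sample 20% of records, mirroring the sampling rate described above.
response = glue.start_column_statistics_task_run(
    DatabaseName="sales_db",          # placeholder database
    TableName="orders",               # placeholder table
    Role="GlueStatsCollectionRole",   # IAM role with access to the table's S3 data
    SampleSize=20.0,                  # percentage of records to sample
)

print("Started statistics task:", response["ColumnStatisticsTaskRunId"])
```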

Read Full Article


Currentanalysis · 3w

Image Credit: Currentanalysis

Telefónica Tech Seeks to Maximize Operational Capabilities with a Transversal Operating Model

  • Telefónica Tech held its analyst event in London, UK, on November 28, 2024.
  • The company has created a new operating model designed to replicate its local capabilities at a global scale.
  • Telefónica Tech has mirrored and mapped its vertical units with different geographies to maximize the market opportunity globally.
  • The company combines predictive AI and GenAI in horizontal and vertical use cases, leveraging industry-specific data and expertise in IoT, cybersecurity, cloud, blockchain, and robotics.

Read Full Article


Cloudera · 3w

Image Credit: Cloudera

Secure Data Sharing and Interoperability Powered by Iceberg REST Catalog

  • Cloudera’s open data lakehouse, powered by Apache Iceberg, solves the big data challenges by providing a unified, curated, shareable, and interoperable data lake.
  • The Apache Iceberg REST Catalog provides metastore-agnostic APIs for Iceberg metadata operations, simplifying Iceberg table data sharing and consumption.
  • This REST Catalog abstraction allows real-time metadata access and simplifies the enterprise data architecture, reducing Time to Value, Time to Market, and overall TCO.
  • Cloudera’s open data lakehouse, powered by Apache Iceberg and the REST Catalog, enables sharing data with non-Cloudera engines in a secure manner.
  • This solution gives enterprises the ability to improve data practitioner productivity and launch new AI and data applications much faster.
  • Key features include multi-engine interoperability, time travel, table rollback, and a rich set of SQL (query, DDL, DML) commands.
  • Iceberg tables can evolve schema and partition layout without requiring data migration or application changes.
  • The post provides detailed steps to create an Amazon Athena notebook configured to use the Iceberg REST Catalog (a Spark configuration sketch follows this list).
  • The solution reduces data duplication and the complexity of ETL pipelines, and improves ROI.
  • For more information, refer to cloudera.com or the getting-started documentation for Amazon Athena.
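
The following is a minimal PySpark sketch of what pointing an external engine at an Iceberg REST Catalog can look like. The catalog name, endpoint URI, warehouse value, and table names are placeholders; a Cloudera or Athena deployment will add its own authentication settings.

```python
# Hypothetical sketch: configure a Spark session against an Iceberg REST Catalog.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("iceberg-rest-demo")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.rest_cat", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.rest_cat.type", "rest")
    .config("spark.sql.catalog.rest_cat.uri", "https://example.com/iceberg/rest")  # placeholder endpoint
    .config("spark.sql.catalog.rest_cat.warehouse", "demo_warehouse")              # placeholder warehouse
    .getOrCreate()
)

# Query an Iceberg table exposed through the REST Catalog with plain Spark SQL.
spark.sql("SELECT * FROM rest_cat.sales.orders LIMIT 10").show()
```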

Read Full Article


Amazon · 3w

Image Credit: Amazon

Introducing the HubSpot connector for AWS Glue

  • AWS Glue can integrate, enhance, and present a wide range of software as a service (SaaS) data to improve internal operations and gain valuable insights.
  • HubSpot connector is introduced for AWS Glue to integrate data from HubSpot into a centralized AWS data lake, enabling efficient data integration and preparation.
  • HubSpot data can be ingested into an Amazon S3 location using the Glue Script editor; Glue provides both visual and code-based interfaces.
  • Athena simplifies data analysis directly in Amazon S3 using standard SQL. You can query the results of the HubSpot data ingested into Amazon S3.
  • AWS services like Amazon Redshift, Amazon QuickSight, and Amazon SageMaker can further process, transform, and analyze data from HubSpot.
  • AWS Glue can run ETL jobs on a schedule to regularly synchronize data between HubSpot and Amazon S3 (see the scheduling sketch after this list).
  • By following the steps outlined in this post, you can ensure that up-to-date data from HubSpot is captured in your data lake.
  • The AWS Glue connector for HubSpot enables you to set up ongoing data integration from HubSpot to AWS, providing a unified view of data from across platforms.
  • By integrating HubSpot data into your AWS environment, you can construct sophisticated, end-to-end data architectures that unlock the full value of your HubSpot data.
  • With AWS Glue and the new HubSpot managed connector, companies can enhance, analyze, and optionally push enriched data back to external SaaS platforms.
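
As a rough sketch of the scheduling step above, the snippet below creates a scheduled Glue trigger with boto3 for a hypothetical job that lands HubSpot data in Amazon S3; the trigger and job names are placeholders.

```python
# Hypothetical sketch: run an existing Glue job (e.g., one using the HubSpot
# connector to land data in S3) every night at 02:00 UTC.
import boto3

glue = boto3.client("glue", region_name="us-east-1")

glue.create_trigger(
    Name="hubspot-nightly-sync",                 # placeholder trigger name
    Type="SCHEDULED",
    Schedule="cron(0 2 * * ? *)",
    Actions=[{"JobName": "hubspot_to_s3_job"}],  # placeholder Glue job name
    StartOnCreation=True,
)
```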

Read Full Article


Cloudera · 3w

Image Credit: Cloudera

Cloudera and AWS Partner to Deliver Cost-Efficient and Sustainable Infrastructure for AI and Analytics

  • Cloudera and Amazon Web Services (AWS) are partnering to deliver energy-efficient, high-performance, and cost-effective data processing, advanced analytics, and artificial intelligence (AI) services by supporting many data services on AWS Graviton.
  • Together, Cloudera and AWS can help optimize performance for the workload while minimizing resource consumption and the resulting carbon footprint.
  • FinOps and GreenOps are vital to cloud computing in the age of AI. FinOps, when scaled, keeps cloud expenses predictable and aligned with business objectives, an outcome also encouraged by GreenOps best practices.
  • Cloudera has several initiatives to help its customers reduce costs, such as native support for Apache Iceberg, a Lakehouse Optimizer, and a unified codebase, while also having committed to reaching net-zero carbon emissions by 2050.
  • AWS Graviton processors are designed to run critical workloads at the lowest cost, with the best performance and the lowest energy consumption, and are ideal for companies looking to improve infrastructure sustainability.
  • Several data services are planned for general availability or technical preview on AWS Graviton as part of this partnership.
  • The Cloudera-AWS collaboration enables data engineers to improve data pipeline development, taking advantage of these technology advancements to deliver cost savings and operational efficiencies alongside environmental stewardship.
  • Cloudera's continuous innovation enables businesses to maintain relevance and prepare for the future of AI, optimizing their infrastructure sustainably to reduce costs and provide business impact with data.
  • Customers across industries can replicate the multinational utility company's journey in sustainable data optimization with Cloudera on AWS, balancing innovation in analytics and AI with continuous optimization of costs and environmental impact.
  • Try Cloudera and AWS today, and start achieving financial and environmental objectives for sustainable operations with advanced analytics while optimizing costs.

Read Full Article


Precisely · 3w

Image Credit: Precisely

A 4-Step Guide to Elevating Your Customer Experience

  • Companies struggling with fragmented communication systems must adopt a fully integrated Customer Communication Management (CCM) system to provide a seamless customer experience.
  • Siloed communication channels result in inconsistent messaging across departments, leading to confusion and frustration in customers.
  • A detailed assessment of current CCM systems helps identify all customer-facing communications and which tools and departments manage them. Framing the digital strategy around customer needs creates a foundation for unifying communications, which improves customer satisfaction and loyalty.
  • Choose CXM solutions that align with long-term growth and customer satisfaction goals, while ensuring omni-channel communication support.
  • Fostering a culture of digital adoption and driving adoption across organizations is fundamental to the success of a modernized CCM system.
  • When transitioning from legacy systems to modern CXM technologies, look for a partner with deep technology knowledge, proven CXM leadership, and change management skills.
  • A fully integrated omni-channel communication system ensures consistency across channels, increased efficiency, and higher customer satisfaction and loyalty.
  • Partnering with digital transformation experts can minimize disruptions to business operations, guide teams during the process and sustain long-term growth.
  • With these fundamental steps – assessing needs, evaluating CXM options, driving digital adoption, and partnering with experts – streamlined modernization and seamless customer experiences can be achieved.
  • Unifying customer communications with modern, fully integrated CCM systems enhances customer experiences, improves satisfaction, and drives operational efficiency.

Read Full Article


TechBullion · 3w

Image Credit: TechBullion

Protecting Business Data and Increasing Data Privacy

  • Protecting business data and increasing data privacy are critical priorities for organizations navigating the complexities of the digital landscape.
  • Notable regulations like the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) underscore the importance of robust data protection measures, requiring companies to implement comprehensive strategies to mitigate risks associated with data breaches and privacy violations.
  • The increasing prevalence of cyber threats poses significant challenges to data privacy, with small businesses being particularly vulnerable to attacks such as ransomware, phishing, and malware.
  • Businesses must navigate a complex regulatory environment to ensure compliance with various data protection laws.
  • Innovative technologies, including data encryption, data loss prevention (DLP), and endpoint protection, are essential for enhancing data privacy.
  • Data Protection Officers (DPOs) play a significant role in ensuring compliance and managing regulatory requirements across different jurisdictions.
  • Understanding and categorizing types of business data allow organizations to tailor their data protection strategies effectively, enhancing both data security and privacy across their operations.
  • In today's digital landscape, protecting business data is of paramount importance to prevent data breaches, safeguard sensitive information, and maintain business continuity.
  • Furthermore, businesses must adopt proactive risk management measures to mitigate potential data incidents.
  • By adhering to best practices in data privacy and leveraging innovative technologies, organizations can mitigate risks, maintain customer trust, and ensure business continuity.

Read Full Article


Amazon · 3w

Image Credit: Amazon

Develop a business chargeback model within your organization using Amazon Redshift multi-warehouse writes

  • Amazon Redshift is a cloud data warehouse that allows customers to scale read workloads without copying data.
  • The multi-data warehouse writes feature supports scaling write workloads onto different warehouses based on workload needs.
  • Benefits of this feature include cost monitoring and control for each data warehouse and enabling data collaboration.
  • The solution architecture involves setting up separate workgroups for ingestion and consumption and creating datashares for the different business units (a datashare sketch follows this list).
  • The chargeback model allows costs to be attributed to different business units and cost-control optimizations to be implemented.
  • Prerequisites include three Redshift warehouses, a superuser in each warehouse, and an IAM role to ingest data from Amazon S3 into Redshift.
  • The steps involve setting up the primary ETL cluster, creating datashares, granting object permissions, and setting up the Sales and Marketing warehouses.
  • The chargeback calculation is based on compute capacity utilization measured in Redshift Processing Units (RPUs).
  • Cleanup involves deleting the Redshift provisioned cluster, serverless workgroups, and namespaces.
  • Benefits of the solution include straightforward cost attribution, use of different clusters and data warehouses, the ability to write data even when the producer warehouse is paused, and support across accounts and Regions.
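
For illustration, a minimal sketch of creating a datashare from the primary ETL warehouse via the Redshift Data API is shown below. The workgroup, database, schema, and consumer namespace values are placeholders; the exact objects and grants used for multi-warehouse writes follow the steps in the full article.

```python
# Hypothetical sketch: create and share a datashare using the Redshift Data API.
import boto3

rsd = boto3.client("redshift-data", region_name="us-east-1")

statements = [
    "CREATE DATASHARE sales_share;",
    "ALTER DATASHARE sales_share ADD SCHEMA sales;",
    "ALTER DATASHARE sales_share ADD ALL TABLES IN SCHEMA sales;",
    # Consumer namespace GUID is a placeholder.
    "GRANT USAGE ON DATASHARE sales_share TO NAMESPACE 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee';",
]

for sql in statements:
    rsd.execute_statement(
        WorkgroupName="etl-workgroup",  # placeholder Redshift Serverless workgroup
        Database="dev",
        Sql=sql,
    )
```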

Read Full Article


Amazon · 3w

Image Credit: Amazon

Unlocking near real-time analytics with petabytes of transaction data using Amazon Aurora Zero-ETL integration with Amazon Redshift and dbt Cloud

  • Customers can now use Amazon Aurora Zero-ETL integration with Amazon Redshift and dbt Cloud to enable near real-time analytics.
  • Amazon Aurora and Amazon Redshift address different analytics requirements, allowing for a separation of concerns.  
  • Data in Amazon Redshift is transactionally consistent and updates are automatically and continuously propagated.
  • dbt helps manage data transformation by enabling teams to deploy analytics code following software engineering best practices
  • dbt Cloud is a hosted service that helps data teams productionize dbt deployments.
  • An Aurora MySQL cluster serves as the source database and Amazon Redshift as the target.
  • Use CloudFormation to provision an Aurora MySQL cluster and Redshift Serverless data warehouse.
  • Populate source data in Amazon Aurora MySQL by creating a sample database on your local system.
  • Create dbt models in dbt Cloud and deploy on Amazon Redshift.
  • Use dbt to verify and test models, and to produce and deploy documentation and production jobs.

Read Full Article


Amazon · 3w

Image Credit: Amazon

Intel Accelerators on Amazon OpenSearch Service improve price-performance on vector search by up to 51%

  • Intel Advanced Vector Extensions 512 (AVX-512) acceleration for vector search workloads is now available on OpenSearch 2.17+ domains running 4th Generation Intel Xeon instances on Amazon OpenSearch Service.
  • C/M/R 7i instances, which use Intel AVX-512, can lead to up to 51% more vector search performance at no additional cost compared to previous R5 Intel instances.
  • Vector search is used to improve the search quality of applications by encoding content into vectors to find similarities between content and perform tasks.
  • Intel Xeon Scalable processors in the 7i instance generation use AVX-512 to increase the efficiency of vector operations.
  • Using R7i instances can lead to higher price-performance on OpenSearch vector workloads.
  • Lucene engine and FAISS engine results show up to a 44% and 51% improvement in price-performance, respectively, when upgrading from R5 to R7i instances (a FAISS index sketch follows this list).
  • AVX-512 acceleration contributes to further price-performance gains by reducing power consumption and compute costs.
  • By upgrading to the latest Intel instances, search experiences can be modernized while potentially lowering costs.
  • Vector search is highly relevant for image search, semantic search, recommendations, and generative AI applications.
  • Vector search offers contextually relevant insights, which can be further enriched by AI for hyper-personalization and integrated with generative models to power chatbots.
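
As a rough sketch of such a vector workload, the snippet below creates a FAISS-backed k-NN index with opensearch-py; the domain endpoint, credentials, index name, and vector dimension are placeholders.

```python
# Hypothetical sketch: create a FAISS-backed k-NN index for vector search.
from opensearchpy import OpenSearch

client = OpenSearch(
    hosts=[{"host": "my-domain.us-east-1.es.amazonaws.com", "port": 443}],  # placeholder endpoint
    http_auth=("user", "password"),  # placeholder auth; use SigV4 signing in practice
    use_ssl=True,
)

client.indices.create(
    index="product-embeddings",
    body={
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "embedding": {
                    "type": "knn_vector",
                    "dimension": 384,  # must match your embedding model
                    "method": {"name": "hnsw", "engine": "faiss", "space_type": "l2"},
                }
            }
        },
    },
)
```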

Read Full Article


Siliconangle · 3w

Image Credit: Siliconangle

Streaming data infrastructure: Scaling AI with cloud-native innovation

  • Efficient streaming data infrastructure is crucial for scaling AI training and operational insights in the cloud-native era.
  • Challenges of scaling technologies like Apache Kafka include escalating costs, operational complexities, and inefficiencies of legacy architectures.
  • Buf Technologies focuses on modernizing streaming data infrastructure by addressing inefficiencies in traditional systems and emphasizing cost efficiency and scalability.
  • They emphasize the importance of schema-driven development and the role of schema formats like Protobuf in ensuring data integrity and compatibility.

Read Full Article


TechBullion · 3w

Image Credit: TechBullion

Common Challenges with Gremlin Graph Database IDE: What to Expect

  • Gremlin Graph Database IDE has a steep learning curve, requiring understanding of graph theory concepts.
  • Performance can be an issue with complex queries or large datasets, but can be improved with query optimization and indexing.
  • Integration of Gremlin into existing systems may cause trouble with legacy systems or frameworks.
  • The Gremlin community is not as large as other database communities, making it harder to find support for specific issues.
  • Debugging graph queries can be complex, but it can be facilitated by using logging tools and query plans (see the profiling sketch after this list).
  • Staying up-to-date with latest features and updates is crucial to maximize the value of the Gremlin IDE.
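
For illustration, a minimal gremlinpython sketch of profiling a traversal, one way to inspect per-step timings when debugging slow queries, is shown below; the server URL, labels, and property values are placeholders.

```python
# Hypothetical sketch: profile a Gremlin traversal to inspect per-step timings.
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

conn = DriverRemoteConnection("ws://localhost:8182/gremlin", "g")  # placeholder server
g = traversal().withRemote(conn)

# profile() returns TraversalMetrics with timings and traverser counts per step.
metrics = g.V().has("person", "name", "alice").out("knows").values("name").profile().next()
print(metrics)

conn.close()
```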

Read Full Article


Amazon · 4w

Image Credit: Amazon

Run Apache XTable in AWS Lambda for background conversion of open table formats

  • Open table formats are critical to transactional data lakes and offer features such as partitioning, schema evolution, time-travel capabilities, and ACID transactions, addressing traditional problems in data lakes.
  • Apache XTable facilitates seamless conversions between OTFs, eliminating many of the challenges associated with table format conversions.
  • This post explores how Apache XTable combined with the AWS Glue Data Catalog enables background conversions between OTFs with minimal or no changes to existing pipelines in a scalable and cost-effective way provided by AWS.
  • XTable works by translating table metadata using the existing APIs of OTFs, enabling interoperability through commonalities among Hudi, Iceberg, and Delta Lake.
  • XTable provides two metadata translation methods - Full Sync, which translates all commits, and Incremental Sync, which only translates new, unsynced commits for greater efficiency with large tables.
  • A Lambda function detects tables that are candidates for conversion by scanning the Data Catalog (a handler sketch follows this list).
  • XTable is focused on achieving feature parity with the OTFs' built-in features, including adding capabilities such as support for Merge-on-Read tables and syncing table formats across multiple catalogs such as AWS Glue, Hive Metastore, and Unity Catalog.
  • In practice, XTable can be used in a broad range of analytical workloads, including business intelligence and machine learning. Amazon S3 hosts the data lake where the OTF tables are stored, allowing you to take advantage of AWS-native services such as Amazon EMR for processing data, Athena for analyzing data, and SageMaker for building machine learning models.
  • In this post, the authors demonstrated how to build a background conversion job for OTFs, using XTable and the Data Catalog, which is independent from data pipelines and transformation jobs.
  • This Lambda based XTable deployment can be reused in other solutions to allow for near real-time conversion of OTFs, which can be invoked by Amazon S3 object events resulting from changes to OTF metadata.
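
As a rough sketch of that detection step, the Lambda handler below scans the Glue Data Catalog for candidate tables with boto3; the database name and the table parameters used for selection are hypothetical, and the actual logic in the post may differ.

```python
# Hypothetical sketch of the detection step: scan the Glue Data Catalog for
# tables that look like conversion candidates.
import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    candidates = []
    paginator = glue.get_paginator("get_tables")
    for page in paginator.paginate(DatabaseName="lakehouse_db"):  # placeholder database
        for table in page["TableList"]:
            params = table.get("Parameters", {})
            # Illustrative convention: a table tagged with a desired target format
            # is a candidate for an XTable background conversion.
            if "xtable_target_formats" in params:
                candidates.append(table["Name"])
    return {"candidates": candidates}
```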

Read Full Article


Currentanalysis · 4w

Image Credit: Currentanalysis

HCLTech Builds Customer Confidence by Offering Outcomes-based Pricing Models for Generative AI (GenAI)

  • HCLTech is building customer confidence in its GenAI services by offering outcomes-based pricing models.
  • The pricing model ensures that HCLTech will only charge for services once projects are in production.
  • HCLTech's portfolio focuses on moving customers from experimentation to production with AI-led deals.
  • The company plans to expand its partner ecosystem and provide industry-specific solutions.

Read Full Article


Precisely · 4w

Image Credit: Precisely

What Nobody Tells You About Deploying GenAI

  • Deploying GenAI applications is more complex in practice than it appears, despite the availability of APIs from providers such as OpenAI and Google.
  • Creating customized evaluations for LLMs is important to accurately assess their performance for specific use cases.
  • Implementing input and output guardrails is crucial to safeguard users and systems from potential risks (a simple output guardrail sketch follows this list).
  • Experimenting with response formats and proactively detecting and repairing failures are essential for effective GenAI deployment.
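
As a rough, framework-agnostic sketch of an output guardrail, the snippet below redacts obvious PII patterns from a model response before it reaches the user; real deployments would layer this with provider- or library-level moderation.

```python
# Illustrative output guardrail: redact simple PII patterns from a model response.
import re

PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def apply_output_guardrail(response_text: str) -> str:
    """Redact matching PII patterns before returning the response to the user."""
    for label, pattern in PII_PATTERNS.items():
        response_text = pattern.sub(f"[REDACTED {label.upper()}]", response_text)
    return response_text

print(apply_output_guardrail("Reach me at jane.doe@example.com or 123-45-6789."))
```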

Read Full Article

