techminis

A naukri.com initiative

Big Data News

TechBullion · 4w

Effective Feedback Examples for Call Center Coaching Success

  • Feedback is one of the most powerful tools in call center coaching.
  • Positive reinforcement and constructive feedback help agents improve their performance and boost morale.
  • Effective feedback is clear, actionable, and delivered with empathy.
  • Well-coached agents create better customer experiences, leading to higher satisfaction scores.

Read Full Article

Amazon · 4w

Introducing generative AI troubleshooting for Apache Spark in AWS Glue (preview)

  • AWS has announced the preview of generative AI troubleshooting for Apache Spark in its Glue service.
  • The tool uses machine learning and AI tech to provide root cause analysis for failed Spark apps, along with remediation advice.
  • It works by analysing job metadata, metrics and logs to create detailed root cause analyses.
  • Users can initiate the process by clicking one button in the AWS Glue console.
  • The tool aims to reduce mean time to resolution from days to minutes, optimise Spark applications for cost and performance, and allow users to focus on deriving value from data.
  • Manually debugging Spark apps is challenging because of the distributed nature of the platform and the multiple configuration issues that often arise.
  • The preview supports common Spark issues, including resource setup and access problems as well as memory and disk exceptions.
  • The preview is currently available in all commercial regions and on AWS Glue version 4.0.
  • Validation runs, used to test proposed solutions, will be charged according to standard AWS Glue pricing.
  • Generative AI Spark troubleshooting aims to simplify the process of debugging Spark applications by automatically identifying the root cause of failures and providing actionable recommendations to resolve the issue.

Read Full Article

Amazon · 4w

Accelerate your data workflows with Amazon Redshift Data API persistent sessions

  • Amazon Redshift Data API simplifies access to Amazon Redshift data warehouses by providing a secure HTTP endpoint for executing SQL queries, removing the need to manage drivers, connections, network configurations, and so on.
  • Amazon Redshift has added session reuse to the Data API, which can significantly streamline multi-step, stateful workloads that involve sequential queries, reducing complexity and optimizing the use of database connections.
  • With Data API session reuse, a single long-lived session can be used for an entire extract, transform, and load (ETL) process. The same temporary tables can be reused throughout the phases of the ETL process, which was not possible earlier, simplifying ETL pipeline execution (see the sketch after this list).
  • The article explains in detail how to create, populate, and query temporary staging tables using session reuse across the full data transformation workflow in a persistent Amazon Redshift database session.
  • The Amazon Redshift Data API is suitable for connecting to Amazon Redshift databases, on serverless workgroups or provisioned clusters, without maintaining persistent connections to a cluster.
  • Relevant use cases for Data API include accessing Amazon Redshift from custom applications using programming languages supported by the AWS SDK, building a serverless data processing workflow, designing asynchronous web dashboards, building and scheduling SQL scripts for ETL pipelines.
  • Best practices while using the Data API include federating IAM credentials to the database, customizing policies to provide fine-grained access, limiting data retrieval from clients to 100 MB, etc.
  • The newly launched session reuse functionality in Amazon Redshift Data API has been demonstrated thoroughly with best practices in this article.
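
A minimal sketch of the session-reuse pattern described above, using the AWS SDK for Python (boto3) rather than the article's own code. The workgroup name, database, table, and SQL are hypothetical, and the sketch assumes the Data API's SessionKeepAliveSeconds and SessionId parameters behave as the feature announcement describes.

```python
import time
import boto3

client = boto3.client("redshift-data")

def wait_for(statement_id):
    """Poll a Data API statement until it finishes; raise if it fails."""
    while True:
        desc = client.describe_statement(Id=statement_id)
        if desc["Status"] in ("FINISHED", "FAILED", "ABORTED"):
            if desc["Status"] != "FINISHED":
                raise RuntimeError(desc.get("Error", desc["Status"]))
            return desc
        time.sleep(2)

# First statement opens a session and keeps it alive between calls.
first = client.execute_statement(
    WorkgroupName="etl-workgroup",      # hypothetical serverless workgroup
    Database="dev",                     # hypothetical database
    Sql="CREATE TEMP TABLE stage_orders AS SELECT * FROM orders WHERE 1 = 0",
    SessionKeepAliveSeconds=300,
)
wait_for(first["Id"])
session_id = first["SessionId"]

# Later statements reuse the same session, so the temporary staging table
# created above is still visible for the transform step.
load = client.execute_statement(
    SessionId=session_id,
    Sql="INSERT INTO stage_orders SELECT * FROM orders WHERE order_date = CURRENT_DATE",
)
wait_for(load["Id"])
```

Without session reuse, each Data API call would run in its own session and the temporary table would no longer exist by the time the second statement runs.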

Read Full Article

Amazon · 4w

Accelerate your migration to Amazon OpenSearch Service with Reindexing-from-Snapshot

  • Reindexing-from-Snapshot (RFS) is a new mechanism that simplifies migrating from self-managed OpenSearch and Elasticsearch clusters in legacy versions to Amazon OpenSearch Service.
  • RFS uses snapshots and shard-level parallelism to migrate data from OpenSearch and Elasticsearch clusters at high throughput without impacting the performance of the source cluster.
  • Because RFS retrieves data from a snapshot of the source cluster rather than querying the cluster directly, it minimizes the performance impact on the source and reduces downtime and data consistency concerns during migration.
  • Reindexing is the core mechanism adopted by RFS for data migration, and it can migrate data across multiple major versions in one hop and make sure the data is fully updated and readable in the target cluster’s version.
  • RFS leverages RFS workers to operate at the shard-level for high throughput. Each worker pulls down an un-migrated shard from the snapshot bucket and reindexes its documents against the target cluster.
  • During the RFS migration, users can monitor the progress of the migration using the console CLI, which reports both the number of shards yet to be migrated and the number that have been completed.
  • RFS is a low-cost solution for data migration, and it can take as little as 35 minutes to perform a migration, depending on the cluster size.
  • To use RFS to migrate to Amazon OpenSearch Service, try the Migration Assistant solution.
  • This post was written by Hang (Arthur) Zuo (Senior Product Manager, AWS), Chris Helma (Senior Engineer, AWS), Andre Kurait (Software Development Engineer II, AWS), and Prashant Agrawal (Sr. Search Specialist Solutions Architect, Amazon OpenSearch Service).

Read Full Article

Amazon · 4w

From data lakes to insights: dbt adapter for Amazon Athena now supported in dbt Cloud

  • AWS has announced that dbt adapter for Amazon Athena is now officially supported in dbt Cloud to enhance the overall data workflow experience.
  • With the dbt adapter for Athena now supported in dbt Cloud, you can seamlessly integrate your AWS data architecture with dbt Cloud, taking advantage of the scalability and performance of Athena to simplify and scale your data workflows efficiently.
  • The support of the dbt adapter for Athena in dbt Cloud offers several advantages over using it with dbt Core, including easier job scheduling, collaboration and version control, monitoring and alerting, and managed infrastructure for running dbt projects.
  • Common use cases for using the dbt adapter with Athena include building a data warehouse, incremental data processing, cost management and optimization, data archiving and tiered storage, and event-driven data transformations.
  • To get started, data teams must create a project and set up a connection with Athena in dbt Cloud and use the dbt Cloud IDE to deploy their project.
  • The combination of dbt and Athena creates a powerful and efficient environment for transforming and analyzing data in a serverless architecture, making it straightforward to manage complex data pipelines, reduce costs, and scale operations.
  • This partnership between AWS and dbt is a testament to their shared vision of making data more accessible, reliable, and valuable for organizations of all sizes.
  • AWS is committed to providing the best possible tools and services to help organizations succeed in the cloud, and with dbt Labs, they bring the power of dbt directly to the AWS Cloud.

Read Full Article

Siliconangle · 4w

Elastic shares surge 16% as company beats quarterly earnings and revenue expectations

  • Shares of Elastic N.V. surged 16% after the company beat earnings and revenue expectations for its fiscal second quarter of 2025.
  • Adjusted earnings per share stood at 50 cents, up from 37 cents in the same quarter last year.
  • Total revenue reached $365 million, up 18% YoY, surpassing the expected $354.3 million.
  • Elastic's net expansion rate came in at 112%, with increased customer spending and a growing customer base.

Read Full Article

Amazon · 4w

AWS Glue Data Catalog supports automatic optimization of Apache Iceberg tables through your Amazon VPC

  • The AWS Glue Data Catalog supports automatic optimization of Apache Iceberg tables, including compaction, snapshot retention, and orphan file deletion.
  • The data compaction optimizer constantly monitors table partitions and kicks off the compaction process when the threshold is exceeded for the number of files and file sizes.
  • The Iceberg table compaction process starts and will continue if the table or any of the partitions within the table has more than the configured number of files (default five files), each smaller than 75% of the target file size.
  • The snapshot retention process runs periodically (default daily) to identify and remove snapshots that are older than the specified retention configuration from the table properties.
  • Similarly, the orphan file deletion process scans the table metadata and the actual data files, identifies the unreferenced files, and deletes them to reclaim storage space.
  • To help meet these requirements, the Data Catalog can now run Iceberg table optimization in your specific VPC.
  • By default, a table optimizer is not associated with any of your VPCs and subnets.
  • With this new capability for data access from your VPC, you can associate a table optimizer with an AWS Glue network connection so that it runs in a specific VPC, subnet, and security group (see the sketch after this list).
  • This feature is available today in all AWS Glue supported AWS Regions.
  • The post includes a sample AWS CloudFormation template that enables a quick setup of the solution resources.
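
As a rough illustration of associating an optimizer with a network connection, the sketch below uses the boto3 Glue client; it is not the CloudFormation template from the post. The account ID, database, table, role, and connection name are placeholders, and the vpcConfiguration shape is an assumption that may differ by SDK version.

```python
import boto3

glue = boto3.client("glue")

# Enable automatic compaction for an Iceberg table and attach it to a Glue
# network connection so the optimizer runs in that connection's VPC, subnet,
# and security group. All names below are hypothetical.
glue.create_table_optimizer(
    CatalogId="123456789012",
    DatabaseName="analytics_db",
    TableName="orders_iceberg",
    Type="compaction",
    TableOptimizerConfiguration={
        "roleArn": "arn:aws:iam::123456789012:role/GlueTableOptimizerRole",
        "enabled": True,
        # Assumed field for the VPC capability described above.
        "vpcConfiguration": {"glueConnectionName": "iceberg-vpc-connection"},
    },
)
```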

Read Full Article

Amazon · 4w

Run high-availability long-running clusters with Amazon EMR instance fleets

  • AWS now supports high availability (HA) Amazon EMR on EC2 clusters with instance fleet configuration. EMR is a cloud big data processing platform that uses open source frameworks such as Apache Spark, Presto, and Flink. HA provides continuous uptime and fault tolerance for Hadoop clusters, removing single points of failure with redundant standby nodes. Instance fleets add resiliency and flexibility: allocation strategies improve EC2 instance selection, improved target capacity management makes fleets more resilient to fluctuations in specific capacity pools, and multiple subnets let Amazon EMR choose the best purchasing options and instances across Availability Zones at cluster launch.
  • To launch an HA instance fleet cluster from the Amazon EMR console, create a new cluster, select the high availability option, choose the desired instance types and target capacities, configure allocation strategies and subnets, and review the cluster configuration before creating the cluster. A high availability cluster can also be launched with AWS CloudFormation by creating a template and a stack. The describe-cluster command verifies that the cluster launched successfully with three primary nodes in the running state and a provisionedOnDemandCapacity of 3 (see the verification sketch after this list).
  • Instances can fail or become unhealthy for multiple reasons, including disk space issues, high CPU utilization, critical cluster daemons shutting down with errors, and more. The multi-master instance group "nodes running" and "nodes running percentage" Amazon CloudWatch metrics let you monitor primary node health and status to ensure smooth operations. Recommended practices include enabling allocation strategies, dedicating subnets to EMR clusters, and configuring at least four core nodes to minimize the risk of HDFS data loss on production clusters.
  • Setting up a high availability instance fleet cluster with Amazon EMR on EC2 increases instance diversity and improves Spot capacity management. High availability lets clusters endure failures, maintain uninterrupted operations, and gain an additional layer of reliability for critical components, providing enhanced resiliency, instance diversity, and better Spot capacity management within a single Availability Zone.
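
A small boto3 sketch of the verification step mentioned above (the equivalent of the describe-cluster check), assuming a hypothetical cluster ID; it is not code from the post.

```python
import boto3

emr = boto3.client("emr")
cluster_id = "j-XXXXXXXXXXXXX"  # hypothetical HA instance fleet cluster

# Confirm the cluster uses instance fleets and report its state.
cluster = emr.describe_cluster(ClusterId=cluster_id)["Cluster"]
print(cluster["InstanceCollectionType"], cluster["Status"]["State"])

# The primary (MASTER) fleet of an HA cluster should show a provisioned
# On-Demand capacity of 3.
fleets = emr.list_instance_fleets(ClusterId=cluster_id)["InstanceFleets"]
primary = next(f for f in fleets if f["InstanceFleetType"] == "MASTER")
print("Provisioned on-demand capacity:", primary["ProvisionedOnDemandCapacity"])

# Count primary nodes that are actually running.
running = emr.list_instances(
    ClusterId=cluster_id,
    InstanceFleetType="MASTER",
    InstanceStates=["RUNNING"],
)["Instances"]
print(f"{len(running)} primary nodes in RUNNING state")
```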

Read Full Article

Cloudera · 4w

Elevating Productivity: Cloudera Data Engineering Brings External IDE Connectivity to Apache Spark

  • Cloudera Data Engineering releases version 1.23 on public cloud with major enhancements in development productivity.
  • It introduces External IDE Connectivity, allowing data engineers to access Apache Spark clusters and data pipelines from their preferred coding environments.
  • The release also includes support for Apache Iceberg 1.5 and Apache Spark 3.5, improving cost-effectiveness and performance.
  • Cloudera Data Engineering offers secure data pipelining, simplified workflows, and data interoperability with lower TCO.

Read Full Article

Precisely · 4w

Automation and Data Integrity: A Duo for Digital Transformation Success

  • Automation and data integrity are at the core of successful digital transformation for businesses today.
  • Data strategy, maintenance, and improvement are the foundation of digital transformation and impact decision-making.
  • Data and processes are deeply interconnected; neither can be neglected in the transformation process.
  • Low-code/no-code automation platforms allow for aligned evolution of processes and data management in SAP® environments.
  • The primary challenges for businesses today include complexities in data, processes, and organizational structures.
  • Automation helps to untangle these complexities, making data more accessible, and processes more seamless.
  • Automation and data integrity ensure high-quality, actionable data every step of the way, leading to increased operational efficiency.
  • Precisely Automate offers three powerful solutions for both data and process automation that simplify complex tasks.
  • Precisely customers have realized numerous benefits and opportunities for their SAP® records using the Automate platform.
  • Combining automation with data integrity powers innovation and growth for businesses by unlocking the full potential of data.

Read Full Article

TechBullion · 4w

Snowflake to Acquire Datavolo Expanding Open Data Integration Capabilities

  • Snowflake has signed a definitive agreement to acquire Datavolo, a company that specializes in speeding up the creation, management, and observability of multimodal data pipelines for enterprise AI.
  • With this acquisition, Snowflake aims to deepen its service in the data lifecycle and provide a simple way for data engineering teams to integrate their enterprise systems with Snowflake's unified platform.
  • The partnership between Datavolo and Snowflake will simplify data engineering workloads and offer unmatched data interoperability and extensibility, supporting effective enterprise AI.
  • Snowflake plans to maintain and nurture the Apache NiFi project, enabling full interoperability for both Snowflake customers and the NiFi community.

Read Full Article

TechBullion · 4w

Snowflake Partners with Anthropic to Bring Claude Models to the AI Data Cloud

  • Snowflake has partnered with Anthropic to bring Claude Models to the AI Data Cloud.
  • Anthropic's Claude 3.5 models will be available within Snowflake Cortex AI, allowing users to develop and scale AI products and workflows (a minimal usage sketch follows this list).
  • By combining Claude's reasoning and problem-solving with Snowflake's platform, businesses can unlock the potential of data for conversational assistants and language processing.
  • Anthropic's models will be available in select AWS regions, enabling enhanced reasoning and natural conversational abilities for chatbots in Snowflake Cortex AI.
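
As a rough usage sketch (not from the article), a Claude model could be invoked through the SNOWFLAKE.CORTEX.COMPLETE SQL function from Snowpark Python once the models are enabled in an account; the connection settings and the model identifier claude-3-5-sonnet are assumptions.

```python
from snowflake.snowpark import Session

# Hypothetical connection settings; fill in for your Snowflake account.
session = Session.builder.configs({
    "account": "<account_identifier>",
    "user": "<user>",
    "password": "<password>",
    "warehouse": "<warehouse>",
}).create()

# Call a Claude model via Cortex AI; the model name is an assumption.
row = session.sql(
    "SELECT SNOWFLAKE.CORTEX.COMPLETE("
    "'claude-3-5-sonnet', "
    "'Summarize the key trends in last quarter''s support tickets.') AS answer"
).collect()[0]
print(row["ANSWER"])
```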

Read Full Article

Siliconangle · 4w

Snowflake’s shares surge higher on blowout earnings, a promising acquisition and new AI partnership

  • Cloud data warehouse company Snowflake Inc. saw its stock rise more than 20% in late trading on Thursday following a massive earnings and revenue beat.
  • The company also announced plans to acquire data integration start-up Datavolo and secured a new artificial intelligence partnership with Anthropic PBC, further adding to investors' optimism.
  • For the fiscal third quarter, Snowflake reported adjusted earnings of 20 cents per share, beating Wall Street's consensus estimate of 15 cents, while posting a net loss of $324.3 million, up from a loss of $214.3 million in the same period the previous year.
  • Snowflake added 369 paying customers during the quarter, taking its total to 10,618.
  • As companies increasingly adopt cloud computing services, Snowflake has positioned itself well in a highly competitive sector, and it remains a rival to key partners such as AWS and Microsoft, which provide the company with its underlying infrastructure.
  • Snowflake expects to achieve total product revenue of $3.43 billion in fiscal 2025, implying YoY growth of 29%.
  • Another boost to Snowflake came from the acquisition of data analytics start-up Night Shift Development Inc., which focuses on the U.S. public sector.
  • The acquisition of Datavolo will enable Snowflake to create more versatile data pipelines for its own customers.
  • Moreover, a multiyear partnership with Anthropic aims to help improve the ability of Snowflake's AI agents to analyze data and run ad-hoc data analytics.
  • Despite after-hours gains, Snowflake's stock is still down 35% in the year to date, compared to a gain of 24% in the broader S&P 500 index.

Read Full Article

TechCrunch · 4w

Snowflake snaps up data management company Datavolo

  • Cloud giant Snowflake is set to acquire data pipeline management company Datavolo.
  • Datavolo uses Apache NiFi to automate data flows and enable flexible data processing pipelines.
  • The acquisition aims to expand Snowflake's data lifecycle capabilities and offer cost savings to customers.
  • Snowflake also reported better-than-expected earnings and announced a partnership with Anthropic.

Read Full Article

Siliconangle · 4w

Dell stays ahead of the curve on data management solutions

  • Dell Technologies aims to support customers in updating their data management policies to match the demands of large language models.
  • The complexities of setting up a modern data storage infrastructure, including security requirements and government regulations, are being addressed by Dell's Data Lakehouse solution.
  • Dell's partnership with Nvidia resulted in the Dell AI Factory, which offers AI implementation services and allows customers to extract metadata and produce accurate AI models.
  • Dell also provides professional service opportunities for customers early in their data management journey.

Read Full Article
