techminis
A naukri.com initiative

Big Data News

Cloudera · 15h
Image Credit: Cloudera

Introducing Cloudera Fine Tuning Studio for Training, Evaluating, and Deploying LLMs with Cloudera AI

  • Cloudera Fine Tuning Studio is a one-stop-shop studio application that covers the entire workflow and lifecycle of fine tuning, evaluating, and deploying fine-tuned LLMs in Cloudera’s AI Workbench.
  • Large Language Models (LLMs) can optimize costs, save time, and increase productivity for both internal and external use cases in enterprises.
  • Several LLMs are publicly available through APIs from OpenAI, Anthropic, AWS, and others, but enterprises often can’t use them because of private data sources, company-specific formatting, and hosting costs.
  • Fine tuning can address these issues by enforcing company-specific coding formats and standards, reducing training time, and cutting costs with smaller models customized to specific tasks or use cases.
  • Cloudera's Fine Tuning Studio enables users to track all resources for fine tuning and evaluating LLMs, build and test prompts, train new adapters for LLMs (see the generic sketch after this list), evaluate trained LLMs, and deploy them to production environments.
  • Fine Tuning Studio comes with MLFlow experiments integration, several ways to test the performance of trained models and compare their performance, and deep integrations with Cloudera’s AI suite of tools to deploy, host, and monitor LLMs.
  • Fine Tuning Studio ships with a convenient Python client that makes calls to the Fine Tuning Studio’s core server, enabling data scientists to build and develop their own training scripts.
  • Cloudera’s Fine Tuning Studio is available to Cloudera AI customers as an Accelerator for Machine Learning Projects (AMP).
  • The team behind Fine Tuning Studio aims to provide customers with a streamlined approach to fine tune any model, on any data, for any enterprise application.
  • Fine Tuning Studio can be utilized to fine tune smaller, cost-effective models to perform specific tasks, outperforming larger, more generalized models.
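
The summary above does not show the Studio's own Python client API, so the following is only a generic sketch of what training a new LLM adapter typically looks like, using the open-source transformers and peft libraries; the model name, target modules, and hyperparameters are illustrative assumptions, not Cloudera defaults.

```python
# Generic LoRA adapter training sketch using the open-source peft/transformers
# libraries. This is NOT the Fine Tuning Studio client API; the model name,
# dataset, and hyperparameters are illustrative placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

base_model = "mistralai/Mistral-7B-v0.1"  # assumed example model
model = AutoModelForCausalLM.from_pretrained(base_model)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# LoRA keeps the base weights frozen and trains small adapter matrices,
# which is what makes task-specific fine tuning of smaller models cheap.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of base weights

# From here, train with transformers.Trainer (or trl's SFTTrainer) on a
# prompt/completion dataset, then save only the adapter:
# model.save_pretrained("my-task-adapter")
```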

Read Full Article

8 Likes

SiliconCanals · 1d
Image Credit: SiliconCanals

Germany’s big xyt secures €10M funding led by Amsterdam’s Finch Capital

  • Frankfurt-based big xyt, a data analytics company for financial markets, has secured €10M funding led by Amsterdam's Finch Capital.
  • The funds will be used to expand into key markets such as Europe, the US, and Asia-Pacific.
  • The investment will support hiring and strengthen big xyt's position in AI-based data analytics for financial markets.
  • big xyt offers a scalable platform for analysing global market data and providing analytics solutions to various clients.

Read Full Article

6 Likes

Siliconangle · 1d
Image Credit: Siliconangle

SAS buys synthetic data software from Hazy to streamline AI development

  • SAS Institute Inc. acquires intellectual property from Hazy Ltd to boost its artificial intelligence portfolio.
  • The acquisition enables SAS to provide customers with tools to create synthetic data for AI workloads.
  • Hazy's platform enables companies to generate synthetic data from sensitive datasets without exposing private or restricted information.
  • SAS plans to integrate Hazy's tools with the SAS Data Maker platform, providing customers with richer synthetic datasets and accelerating AI projects.

Read Full Article

13 Likes

Siliconangle · 1d
Image Credit: Siliconangle

Data mapping provider Lume raises $4.2M in funding

  • Lume AI Inc. raises $4.2 million in seed funding to enhance its data mapping technology.
  • General Catalyst led the funding round, with participation from Khosla Ventures, Floodgate, Y Combinator, Soma Capital, and angel investors.
  • Lume's software platform simplifies the task of creating data mapping pipelines, automatically generating scripts to reformat data between applications.
  • The company aims to save developers time and enable easier data movement within organizations, offering support for JSON, CSV, and XML formats.

Read Full Article

10 Likes

TechBullion · 1d
Image Credit: TechBullion

Innovative Database Column Expansion and Automation for Scalable System

  • Crafting an optimal innovative solution requires careful assessment of the system’s current state, aligning with evolving business needs, and balancing security, performance, and budget constraints.
  • The Database Column Expansion Project was initiated to enhance the brewery system, expanding the Product ID field from 2 bytes to 4 bytes—a change essential for supporting future product introductions.
  • Identifying every impacted Product_Id column across 128 applications developed in various programming languages was complex, given inconsistent naming conventions across tables.
  • The initiative demanded rigorous planning and efficient execution to accommodate the brewery system’s high transaction volume while ensuring that performance wasn’t compromised.
  • To facilitate this process, a comprehensive impact analysis utility was created using Oracle PL/SQL stored procedures and packages to accurately identify and verify the impacted columns (a rough equivalent of the underlying lookup is sketched after this list).
  • The approved design utilized a more streamlined and efficient approach with views and synonyms to ensure the smooth operation of both remediated and non-remediated applications.
  • Hand-developing the source code for each database view object, INSTEAD OF trigger, and other associated components would have required several hours per day, prompting the development of a robust automation framework using Oracle PL/SQL and UNIX Shell scripting.
  • A comprehensive performance tuning initiative, combining proactive and incremental tuning with continuous monitoring, dramatically optimized data processing speeds.
  • This project successfully delivered a groundbreaking solution for database column expansion across multiple systems, enabling seamless scalability for future product launches.
  • Through the automation of database component creation and compilation, significant cost savings and enhanced operational efficiency were achieved.
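
The article’s impact-analysis utility was built in Oracle PL/SQL and is not reproduced in the summary; as a rough illustration of the kind of data-dictionary lookup it describes, here is a minimal Python sketch using the python-oracledb driver. The connection details and the Product_Id naming pattern are assumptions.

```python
# Rough sketch of the impact-analysis lookup described above. The article's
# utility was written in PL/SQL; this only shows the underlying data-dictionary
# query. Connection details and the naming pattern are assumptions.
import oracledb

conn = oracledb.connect(user="scott", password="tiger", dsn="dbhost/orclpdb")
cur = conn.cursor()

cur.execute(
    """
    SELECT owner, table_name, column_name, data_type, data_precision
    FROM   all_tab_columns
    WHERE  REGEXP_LIKE(column_name, :pattern, 'i')
    ORDER  BY owner, table_name, column_name
    """,
    pattern="PROD.*ID|PRODUCT_ID",
)

# Each hit is a candidate column that may need the 2-byte -> 4-byte expansion
# and a corresponding view/synonym in the remediation design.
for owner, table, column, dtype, precision in cur:
    print(f"{owner}.{table}.{column}  {dtype}({precision})")
```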

Read Full Article

21 Likes

Amazon · 1d
Image Credit: Amazon

Streamlining AWS Glue Studio visual jobs: Building an integrated CI/CD pipeline for seamless environment synchronization

  • AWS Glue enables organizations to make data-driven business decisions by providing seamless integration throughout the development lifecycle; many customers have integrated their data across multiple sources using AWS Glue.
  • AWS Glue Studio visual jobs provide a graphical interface called the visual editor that you can use to author extract, transform, and load (ETL) jobs in AWS Glue visually.
  • To address the needs of a streamlined development lifecycle and seamless synchronization between environments, an end-to-end solution is presented, combining the AWS Glue Visual Job API, a custom AWS Glue Resource Sync Utility, and a continuous integration and continuous deployment (CI/CD) pipeline.
  • The AWS Glue Resource Sync Utility is a Python application developed on top of the AWS Glue Visual Job API, designed to synchronize AWS Glue Studio visual jobs across different accounts without losing the visual representation.
  • The solution uses three separate AWS accounts. One account is designated for the development environment, another for the production environment, and a third to host the CI/CD infrastructure and pipeline.
  • The AWS account responsible for hosting the CI/CD pipeline is composed of three key components: Managing AWS Glue Job updates, Cross-Account Access Management, and Version Control Integration.
  • You can create AWS Glue Studio visual jobs using the intuitive visual editor in your development account.
  • By serializing AWS Glue Studio visual jobs to JSON files and committing them to a Git repository, you enable version control for your data integration workflows (a minimal export sketch appears after this list).
  • By following this approach, you can track changes, collaborate with team members, and easily deploy jobs to other accounts or environments.
  • This solution empowers data engineers to focus on building robust data integration pipelines while automating the complexities of managing and deploying AWS Glue Studio visual jobs across multiple environments.
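
As a minimal sketch of the serialization step described above (not the post’s Resource Sync Utility itself), the following exports a Glue Studio visual job definition to JSON with boto3 so it can be committed to Git; the job name, Region, and output path are assumptions.

```python
# Minimal sketch: export an AWS Glue Studio visual job definition to JSON so it
# can be committed to Git. This is not the post's Resource Sync Utility; the
# job name, Region, and output path are placeholders.
import json
import boto3

glue = boto3.client("glue", region_name="us-east-1")  # assumed Region

job_name = "my-visual-etl-job"                        # assumed job name
job = glue.get_job(JobName=job_name)["Job"]

# For visual jobs, the Job structure carries the node graph the visual editor
# renders, so round-tripping it through JSON preserves the visual layout.
with open(f"{job_name}.json", "w") as f:
    json.dump(job, f, indent=2, default=str)  # default=str handles datetimes

# In the target account, the same document can be fed back through
# glue.create_job(...) / glue.update_job(...) after stripping read-only fields
# such as Name, CreatedOn, and LastModifiedOn.
```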

Read Full Article

9 Likes

Siliconangle · 1d
Image Credit: Siliconangle

Alteryx simplifies analytics for hybrid data infrastructures

  • Alteryx is expanding its platform to support hybrid data infrastructures.
  • The Fall 2024 update includes new AI capabilities to automate insight generation and streamline reporting.
  • The update introduces connectors for Google Cloud Storage and SingleStore, making it easier to set up data pipelines.
  • Other enhancements include Magic Reports for advanced data reporting and visualization, as well as improvements to data preparation and blending tools.

Read Full Article

17 Likes

Amazon · 2d
Image Credit: Amazon

Use Amazon Kinesis Data Streams to deliver real-time data to Amazon OpenSearch Service domains with Amazon OpenSearch Ingestion

  • In this post, we show how to use Amazon Kinesis Data Streams to buffer and aggregate real-time streaming data for delivery into Amazon OpenSearch Service domains and collections using Amazon OpenSearch Ingestion.
  • Kinesis Data Streams enhances log aggregation by decoupling producer and consumer applications and providing a resilient, scalable buffer to capture and serve log data (a minimal producer sketch appears after this list).
  • OpenSearch Ingestion is a serverless pipeline that provides powerful tools for extracting, transforming, and loading data into an OpenSearch Service domain.
  • The post also discusses a centralized log aggregation use case for an organization with a compliance need to archive and retain its log data, and how standardizing logging approaches reduces development and operational overhead.
  • The article guides readers through creating an AWS Identity and Access Management (IAM) role that grants read access to the Kinesis data stream and read/write access to the OpenSearch domain, configuring the OpenSearch Ingestion pipeline to process log data, and parsing the log message fields, with a detailed worked example.
  • Readers are also given several key areas to monitor for maintaining the health of the log ingestion pipeline, such as Kinesis Data Streams metrics, CloudWatch subscription filter metrics, OpenSearch Ingestion metrics, and OpenSearch Service metrics.
  • Lastly, the article concludes with some suggestions for other use cases for OpenSearch Ingestion and Kinesis Data Streams, such as using anomaly detection, trace analytics, and hybrid search with OpenSearch Service.
  • The authors of the article are M Mehrtens, Arjun Nambiar, and Muthu Pitchaimani.
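
As a minimal, hedged illustration of the producer side of this pattern (the IAM role and the OpenSearch Ingestion pipeline configuration from the post are not reproduced here), the following buffers application log events into a Kinesis data stream with boto3; the stream name and event shape are assumptions.

```python
# Minimal producer sketch: buffer application log events into a Kinesis data
# stream with boto3. The stream name and event shape are assumptions; the IAM
# role and the OpenSearch Ingestion pipeline configuration are not shown here.
import json
import time
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")  # assumed Region

def emit_log(service: str, level: str, message: str) -> None:
    event = {
        "timestamp": int(time.time() * 1000),
        "service": service,
        "level": level,
        "message": message,
    }
    # The partition key spreads records across shards; keying by service name
    # keeps each service's logs ordered within a shard.
    kinesis.put_record(
        StreamName="central-log-stream",        # assumed stream name
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=service,
    )

emit_log("checkout-api", "ERROR", "payment gateway timeout")
```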

Read Full Article

22 Likes

Precisely · 2d
Image Credit: Precisely

Boosting Customer Loyalty with Personalization and Communication Strategies

  • Personalized customer experiences are key to building customer loyalty.
  • Data-driven personalization analyzes customer behaviour to tailor communications to meet specific needs.
  • High-quality data is necessary for effective personalization.
  • Data-driven personalization improves customer experience, engagement and loyalty.
  • To implement data-driven personalization, businesses should focus on high-quality data, develop a personalization strategy, and measure and optimize efforts.
  • Siloed communication channels create fragmented customer experiences, leading to poor customer satisfaction.
  • Unified communication channels streamline change management, improve collaboration and enhance customer engagement.
  • Choosing the right transformation partner is essential, including deep technical expertise, proven change management skills, and experience in customer experience management.
  • Improving data-driven personalization and unified communications increases customer loyalty, streamlines operations and drives sustained success in the competitive market.
  • Partnering with a digital transformation expert is a crucial step in successfully navigating the journey to stronger CX.

Read Full Article

8 Likes

Precisely · 2d
Image Credit: Precisely

4 Practical Tips for Implementing Data-Driven Personalization

  • Data-driven personalization involves using your customer data to tailor communications and interactions that meet their individual preferences and needs.
  • The data used for personalization must be of high quality, accurate, up-to-date, and free of redundancies to avoid inconsistent messaging, which erodes customer trust.
  • Implementing a unified customer communication solution is crucial to avoid fragmented customer experiences and inconsistent messaging due to siloed communication channels.
  • Unified communications streamline change management, improve collaboration, and eliminate redundant processes, leading to faster decision-making and a more agile approach to customer engagement.
  • To implement data-driven personalization and unify communications, businesses need a digital transformation expert who can integrate advanced technologies and manage organizational change.
  • Selecting the right partner with deep technical expertise, proven change management skills, and customer experience management (CXM) experience is essential for success.
  • Data-driven personalization benefits businesses by enhancing the customer experience, increasing customer engagement, and improving customer loyalty.
  • Specific tips for implementing data-driven personalization include unifying communication channels, focusing on high-quality data, developing a personalization strategy, and measuring and optimizing efforts.
  • High-quality data and seamless communication across all touchpoints are essential for driving engagement and building long-term customer loyalty.
  • Partnership with a digital transformation expert is crucial to navigating the journey of data-driven personalization and unified communications for businesses.

Read Full Article

23 Likes

Amazon · 6d
Image Credit: Amazon

Amazon OpenSearch Service announces Standard and Extended Support dates for Elasticsearch and OpenSearch versions

  • Amazon OpenSearch Service supports 19 versions of open-source Elasticsearch and 11 versions of OpenSearch.
  • End of Support dates have been announced for legacy Elasticsearch versions up to 6.7, Elasticsearch versions 7.1 through 7.8, OpenSearch versions 1.0 through 1.2, and OpenSearch versions 2.3 through 2.9 available on Amazon OpenSearch Service.
  • We recommend that customers running Elasticsearch versions upgrade to the latest OpenSearch versions.
  • All Elasticsearch versions will receive at least 12 months of Extended Support.
  • For OpenSearch versions running on Amazon OpenSearch Service, we will provide at least 12 months of Standard Support after the end of support date for the corresponding upstream open source OpenSearch version.
  • Upgrading your domain to the latest available OpenSearch version will help you derive maximum value out of OpenSearch Service.
  • Domains running versions under Extended Support will be charged an additional fee per normalized instance hour.
  • We add new capabilities across various vectors to the latest OpenSearch versions, which include new features, performance and resiliency improvements, and security improvements.
  • For any questions on Standard and Extended Support options, see the FAQs. For further questions, contact AWS Support.
  • The authors of this announcement are Arvind Mahesh, Kuldeep Yadav, and Jon Handler.

Read Full Article

2 Likes

Precisely · 16h
Image Credit: Precisely

Understanding Master Data Management (MDM) and Its Role in Data Integrity

  • Master data management (MDM) ensures accuracy, consistency, and uniformity of a company's data.
  • MDM is important for breaking down data silos, avoiding discrepancies, and enabling informed decision-making.
  • It acts as a single source of truth and offers a holistic and up-to-date view of business data.
  • MDM synchronizes data changes, such as customer updates, across all systems, thereby avoiding costly errors.
  • MDM ensures accuracy, consistency, and integrity of master data throughout its entire life cycle, including validation and deprecation of records.
  • Challenges in implementing MDM include data literacy, governance, ownership, and data protection.
  • MDM's cross-domain management ability can benefit all industries.
  • A system of record is a centralized repository where critical business data is stored and managed, and is closely tied to MDM.
  • Master data consists of key business entities across various data domains, including party, reference, location, financial, and revenue data.
  • Precisely offers EnterWorks multi-domain MDM and its Data Integrity Suite including Data Observability, Data Governance, and Data Quality to address MDM challenges.

Read Full Article

15 Likes

Amazon · 2d
Image Credit: Amazon

Achieve data resilience using Amazon OpenSearch Service disaster recovery with snapshot and restore

  • Amazon OpenSearch Service is a fully managed service offered by AWS that enables you to deploy, operate, and scale OpenSearch domains effortlessly.
  • Snapshot and restore in OpenSearch Service involves creating point-in-time backups, known as snapshots, of your OpenSearch domain (a minimal restore sketch appears after this list).
  • Snapshot and restore strategy helps organizations meet Recovery Point Objectives (RPOs) and Recovery Time Objectives (RTOs).
  • Compared with other disaster recovery approaches, the snapshot and restore strategy results in longer downtime and greater data loss between the disaster event and recovery.
  • Infrastructure as code (IaC) methods such as AWS CloudFormation or the AWS Cloud Development Kit (AWS CDK) enable you to deploy consistent infrastructure across Regions.
  • In the event of a disaster, if your OpenSearch Service domain in the primary Region goes down, you can use that IaC to fail over to the domain in the secondary Region.
  • To maintain business continuity during a disaster, you can use message queues like Amazon Simple Queue Service (Amazon SQS) and streaming solutions like Apache Kafka or Amazon Kinesis.
  • When the primary Region becomes available again, you can seamlessly revert to the OpenSearch Service domain in the primary Region.
  • In conclusion, by following the best practices provided in the AWS Well-Architected Reliability Pillar, you can achieve data resilience using Amazon OpenSearch Service disaster recovery with snapshot and restore.
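
As a minimal sketch of the restore step in the secondary Region (assuming an S3 snapshot repository is already registered on both domains, which the post covers but is omitted here), the following issues a SigV4-signed restore request against the OpenSearch snapshot REST API; the endpoint, repository, snapshot, and index names are placeholders.

```python
# Minimal restore sketch against the secondary-Region domain, using the
# OpenSearch snapshot REST API with SigV4-signed requests. The endpoint,
# repository, snapshot, and index names are placeholders; registering the S3
# snapshot repository and its IAM setup are not shown here.
import boto3
import requests
from requests_aws4auth import AWS4Auth

region = "us-west-2"                                          # assumed secondary Region
host = "https://search-my-domain.us-west-2.es.amazonaws.com"  # assumed endpoint

creds = boto3.Session().get_credentials()
awsauth = AWS4Auth(creds.access_key, creds.secret_key, region, "es",
                   session_token=creds.token)

# Restore selected indices from a snapshot in the shared S3 repository.
restore_body = {
    "indices": "orders-*",           # assumed index pattern
    "include_global_state": False,
}
resp = requests.post(
    f"{host}/_snapshot/my-s3-repo/snapshot-2024-11-05/_restore",
    auth=awsauth,
    json=restore_body,
)
resp.raise_for_status()
print(resp.json())
```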

Read Full Article

13 Likes

Amazon · 5d
Image Credit: Amazon

Incremental refresh for Amazon Redshift materialized views on data lake tables

  • Amazon Redshift allows precomputed query results in the form of materialized views for faster query response times from your data warehouse.
  • Redshift already supports incremental refresh for materialized views over local tables, which is especially useful for aggregations and multi-table joins.
  • Customers use data lake tables for cost-effective storage and interoperability with other tools.
  • Amazon Redshift now provides the ability to incrementally refresh materialized views defined on data lake tables.
  • Incremental refreshes on standard data lake tables enable building and refreshing materialized views in Amazon Redshift while maintaining data freshness with a cost-effective approach (a minimal SQL sketch appears after this list).
  • Incremental refreshes are also possible for data lake tables using Apache Iceberg.
  • Amazon Redshift's introduction of incremental refresh provides substantial performance gains over full recompute.
  • Materialized views on data lake tables can be valuable for optimizing SQL queries for faster data analysis.
  • For best practices on materialized views on data lake tables in Amazon Redshift Spectrum, check out the AWS documentation.
  • Amazon Redshift makes it cost-effective to analyze structured and semi-structured data using standard SQL and business intelligence tools.
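
As a minimal sketch of what this looks like in practice (assuming a Redshift Spectrum external schema is already mapped to the data lake; schema, table, and view names are placeholders), the following creates and refreshes a materialized view over a data lake table using the redshift_connector driver.

```python
# Minimal sketch: create and incrementally refresh a materialized view over a
# data lake table from Python, using the redshift_connector driver. It assumes
# an external (Spectrum) schema named "datalake" is already mapped to the lake;
# cluster endpoint, schema, table, and view names are placeholders.
import redshift_connector

conn = redshift_connector.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",  # assumed endpoint
    database="dev",
    user="awsuser",
    password="...",
)
cur = conn.cursor()

# Precompute a daily aggregation over the data lake table once...
cur.execute("""
    CREATE MATERIALIZED VIEW daily_sales_mv AS
    SELECT sale_date, SUM(amount) AS total_amount
    FROM   datalake.sales
    GROUP  BY sale_date
""")

# ...then pick up only new data on subsequent refreshes instead of recomputing
# the whole view (Redshift falls back to full recompute when incremental
# refresh isn't possible).
cur.execute("REFRESH MATERIALIZED VIEW daily_sales_mv")
conn.commit()
```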

Read Full Article

22 Likes

Siliconangle · 6d
Image Credit: Siliconangle

Uber revamps its operating model with real-time data and microservices orchestration

  • Uber has built an expansive architecture that powers real-time capabilities such as logistics services and ticket bookings.
  • Central to Uber’s application is its real-time nature, a fundamental aspect that sets it apart from many other applications.
  • The real-time interactions demand that the server rather than the app initiate much of the data push.
  • Uber's infrastructure combines various technologies to handle complex processes.
  • The company uses Google Spanner as the transactional database and custom frameworks for real-time event propagation and orchestration.
  • Behind the scenes, machine learning algorithms drive the dynamic behavior of Uber’s different apps.
  • The data generated by Uber’s users and server-side data need to be meticulously synchronized.
  • Events and metadata are sent into Kafka for processing and storage in Uber’s Hive tables (a generic producer sketch appears after this list).
  • The app’s logic is informed by both current context and historical patterns.
  • Real-time orchestration and machine learning integration are essential to the smooth functioning of Uber’s different apps.
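
As a generic illustration of that event flow (not Uber’s internal framework; the broker address, topic, and event fields are assumptions), the following publishes an app event to Kafka with the confluent-kafka client.

```python
# Generic sketch of publishing a rider/driver event to Kafka for downstream
# processing and storage (e.g., in Hive tables). This is not Uber's internal
# framework; the broker address, topic, and event fields are assumptions.
import json
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})  # assumed broker

def on_delivery(err, msg):
    if err is not None:
        print(f"delivery failed: {err}")

event = {
    "event_type": "trip_requested",
    "rider_id": "r-123",
    "city": "amsterdam",
    "ts_ms": 1730800000000,
}
# Keying by rider keeps all of one rider's events in order on a single partition.
producer.produce(
    "rider-events",
    key=event["rider_id"],
    value=json.dumps(event).encode("utf-8"),
    on_delivery=on_delivery,
)
producer.flush()
```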

Read Full Article

22 Likes
