Combine transactional, streaming, and third-party data on Amazon Redshift for financial services | Amazon Web Services<\/h1>\n Published February 1, 2024<\/p>\n
Financial services customers use data from different sources that arrive at different frequencies, including real-time, batch, and archived datasets. They also need streaming architectures to handle growing trade volumes, market volatility, and regulatory demands. The following are some of the key business use cases that highlight this need:<\/p>\n
The challenge is to build a solution that can handle these disparate sources, varied frequencies, and low-latency consumption requirements. The solution should be scalable, cost-efficient, and straightforward to adopt and operate. Amazon Redshift<\/a> features like streaming ingestion, Amazon Aurora<\/a> zero-ETL integration<\/a>, and data sharing with AWS Data Exchange<\/a> enable near-real-time processing for trade reporting, risk management, and trade optimization.<\/p>\n In this post, we provide a solution architecture that describes how you can process data from three different types of sources\u2014streaming, transactional, and third-party reference data\u2014and aggregate them in Amazon Redshift for business intelligence (BI) reporting.<\/p>\n This solution architecture prioritizes a low-code\/no-code approach with the following guiding principles:<\/p>\n The following diagram illustrates the solution architecture and how these guiding principles were applied to the ingestion, aggregation, and reporting components.<\/p>\n You can use the following AWS CloudFormation<\/a> template to deploy the solution.<\/p>\n This stack creates the following resources and the necessary permissions to integrate the services:<\/p>\n To ingest data, you use Amazon Redshift Streaming Ingestion<\/a> to load streaming data from the Kinesis data stream. For transactional data, you use the Redshift zero-ETL integration<\/a> with Amazon Aurora MySQL. For third-party reference data, you take advantage of AWS Data Exchange data shares<\/a>. These capabilities allow you to quickly build scalable data pipelines because, as your data grows, you can increase the number of Kinesis Data Streams shards, the compute for zero-ETL sources and targets, and the Redshift compute for data shares.
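To make these ingestion steps concrete, the following is a minimal SQL sketch. The stream name, IAM role ARN, integration identifier, and payload field names are illustrative placeholders, not values from this solution's CloudFormation stack.

```sql
-- Map the Kinesis data stream into Redshift (role ARN is a placeholder)
CREATE EXTERNAL SCHEMA kds
FROM KINESIS
IAM_ROLE 'arn:aws:iam::111122223333:role/redshift-streaming-role';

-- Streaming materialized view over the quotes stream (stream name assumed)
CREATE MATERIALIZED VIEW mv_quotes AUTO REFRESH YES AS
SELECT approximate_arrival_timestamp,
       refresh_time,
       JSON_PARSE(kinesis_data) AS quote_payload
FROM kds."quotes-stream";

-- Parse the SUPER payload with PartiQL-style navigation (field names assumed)
SELECT quote_payload.symbol::varchar           AS symbol,
       quote_payload.bid_price::decimal(18,4)  AS bid_price,
       quote_payload.ask_price::decimal(18,4)  AS ask_price
FROM mv_quotes;

-- Surface the Aurora MySQL zero-ETL integration as a local database
-- (integration identifier is a placeholder)
CREATE DATABASE trades_db FROM INTEGRATION 'example-integration-id';
```

Once created, the streaming materialized view refreshes automatically, and the zero-ETL database exposes Aurora tables for querying alongside local Redshift data.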
Redshift streaming ingestion and zero-ETL integration are low-code\/no-code solutions that you can build with simple SQL statements, without investing significant time and money into developing complex custom code.<\/p>\n For the data used to create this solution, we partnered with FactSet<\/a>, a leading financial data, analytics, and open technology provider. FactSet has several datasets<\/a> available in the AWS Data Exchange marketplace, which we used for reference data. We also used FactSet\u2019s market data solutions<\/a> for historical and streaming market quotes and trades.<\/p>\n Data is processed in Amazon Redshift following an extract, load, and transform (ELT) methodology. With virtually unlimited scale and workload isolation, ELT is better suited to cloud data warehouse solutions.<\/p>\n You use Redshift streaming ingestion for real-time ingestion of streaming quotes (bid\/ask) from the Kinesis data stream directly into a streaming materialized view, and you process the data in the next step using PartiQL to parse the data stream inputs. Note that streaming materialized views differ from regular materialized views in how auto refresh works and in the data management SQL commands used. Refer to Streaming ingestion considerations<\/a> for details.<\/p>\n You use the zero-ETL Aurora integration for ingesting transactional data (trades) from OLTP sources. Refer to Working with zero-ETL integrations<\/a> for currently supported sources. You can combine data from all these sources using views, and use stored procedures to implement business transformation rules like calculating weighted averages across sectors and exchanges.<\/p>\n Historical trade and quote data volumes are huge and often not queried frequently. You can use Amazon Redshift Spectrum<\/a> to access this data in place without loading it into Amazon Redshift.
You create external tables pointing to data in Amazon Simple Storage Service<\/a> (Amazon S3) and query them just as you would any other local table in Amazon Redshift. Multiple Redshift data warehouses can concurrently query the same datasets in Amazon S3 without the need to make copies of the data for each data warehouse. This feature simplifies accessing external data without writing complex ETL processes and enhances the ease of use of the overall solution.<\/p>\n Let\u2019s review a few sample queries used for analyzing quotes and trades. We use the following tables in the sample queries:<\/p>\n You can use the following query to find weighted average spreads on quotes:<\/p>\n You can use Amazon QuickSight<\/a> and Amazon Managed Grafana<\/a> for BI and real-time reporting, respectively. These services natively integrate with Amazon Redshift without the need for additional connectors or software in between.<\/p>\n You can run a direct query from QuickSight for BI reporting and dashboards. With QuickSight, you can also locally store data in the SPICE cache with auto refresh for low latency. Refer to Authorizing connections from Amazon QuickSight to Amazon Redshift clusters<\/a> for comprehensive details on how to integrate QuickSight with Amazon Redshift.<\/p>\n You can use Amazon Managed Grafana for near-real-time trade dashboards that are refreshed every few seconds. The real-time dashboards for monitoring trade ingestion latencies are created using Grafana, and the data is sourced from system views in Amazon Redshift. Refer to Using the Amazon Redshift data source<\/a> to learn how to configure Amazon Redshift as a data source for Grafana.<\/p>\n The users who interact with regulatory reporting systems include analysts, risk managers, operators, and other personas that support business and technology operations.
Apart from generating regulatory reports, these teams require visibility into the health of the reporting systems.<\/p>\n In this section, we explore some examples of historical quotes analysis from the Amazon QuickSight<\/a> dashboard.<\/p>\n The following chart shows the daily aggregation by sector of the weighted average bid-ask spreads of all the individual trades on NASDAQ and NYSE for 3 months. To calculate the average daily spread, each spread is weighted by the sum of the bid and the ask dollar volume. The query to generate this chart processes 103 billion data points in total, joins each trade with the sector reference table, and runs in less than 10 seconds.<\/p>\n The following chart shows the daily aggregation of the weighted average bid-ask spreads of all the individual trades on NASDAQ and NYSE for 3 months. The calculation methodology and query performance metrics are similar to those of the preceding chart.<\/p>\n In this section, we explore some examples of historical trades analysis from the Amazon QuickSight<\/a> dashboard.<\/p>\n The following chart shows the daily aggregation by sector of all the individual trades on NASDAQ and NYSE for 3 months. The query to generate this chart processes 3.6 billion trades in total, joins each trade with the sector reference table, and runs in under 5 seconds.<\/p>\n The following chart shows the daily aggregation by exchange group of all the individual trades for 3 months. The query to generate this chart has performance metrics similar to those of the preceding chart.<\/p>\n Monitoring and observability are important requirements for any critical business application such as trade reporting, risk management, and trade management systems. Apart from system-level metrics, it\u2019s also important to monitor key performance indicators in real time so that operators can be alerted and respond as soon as possible to business-impacting events.
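As one example of such a KPI, quote ingestion delay can be approximated directly from the streaming materialized view's metadata columns. This is a sketch that assumes a streaming materialized view named mv_quotes; the view name is illustrative.

```sql
-- Average per-minute delay between stream arrival and availability in Redshift,
-- using the metadata columns Redshift attaches to streaming materialized views
SELECT DATE_TRUNC('minute', refresh_time) AS minute_bucket,
       AVG(DATEDIFF(millisecond, approximate_arrival_timestamp, refresh_time))
           AS avg_ingest_delay_ms
FROM mv_quotes
GROUP BY 1
ORDER BY 1 DESC;
```

A query like this can back a Grafana panel that alerts operators when ingestion delay exceeds a threshold.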
For this demonstration, we built dashboards in Grafana that monitor the delay of quote and trade data from the Kinesis data stream and Aurora, respectively.<\/p>\n The quote ingestion delay dashboard shows the amount of time it takes for each quote record to be ingested from the data stream and become available for querying in Amazon Redshift.<\/p>\n The trade ingestion delay dashboard shows the amount of time it takes for a transaction in Aurora to become available in Amazon Redshift for querying.<\/p>\n To clean up your resources, delete the stack you deployed using AWS CloudFormation. For instructions, refer to Deleting a stack on the AWS CloudFormation console<\/a>.<\/p>\n Increasing volumes of trading activity, more complex risk management, and enhanced regulatory requirements are leading capital markets firms to embrace real-time and near-real-time data processing, even in mid- and back-office platforms where end-of-day and overnight processing were the standard. In this post, we demonstrated how you can use Amazon Redshift capabilities for ease of use, low maintenance, and cost-efficiency. We also discussed cross-service integrations to ingest streaming market data, process updates from OLTP databases, and use third-party reference data without having to perform complex and expensive ETL or ELT processing before making the data available for analysis and reporting.<\/p>\n Please reach out to us if you need any guidance in implementing this solution. Refer to Real-time analytics with Amazon Redshift streaming ingestion<\/a>, Getting started guide for near-real time operational analytics using Amazon Aurora zero-ETL integration with Amazon Redshift<\/a>, and Working with AWS Data Exchange data shares as a producer<\/a> for more information.<\/p>\nSolution overview<\/h2>\n
\n
Deploy the solution<\/h2>\n
Ingestion<\/h2>\n
Processing<\/h2>\n
\n
Sample query for analyzing historical quotes<\/h3>\n
select
    date_dt :: date,
    case
        when exchange_name like 'Cboe%' then 'CBOE'
        when exchange_name like 'NYSE%' then 'NYSE'
        when exchange_name like 'New York Stock Exchange' then 'NYSE'
        when exchange_name like 'Nasdaq%' then 'NASDAQ'
    end as parent_exchange_name,
    sector_name,
    sum(spread * weight) / sum(weight) :: decimal(30,5) as weighted_average_spread
from
(
    select
        date_dt,
        exchange_name,
        factset_sector_desc as sector_name,
        ((bid_price * bid_volume) + (ask_price * ask_volume)) as weight,
        ((ask_price - bid_price) / ask_price) as spread
    from dt_hist_quotes a
    join fds_adx_fundamentals_db.ref_v2.factset_sector_map b
        on (a.sector_code = b.factset_sector_code)
    where ask_price <> 0 and bid_price <> 0
)
group by 1, 2, 3<\/code><\/pre>\n<\/p><\/div>\n
Sample query for analyzing historical trades<\/h3>\n
You can use the following query to find $-volume<\/code> on trades by detailed exchange, by sector, and by major exchange (NYSE and Nasdaq):<\/p>\n
select
    cast(date_dt as date) as date_dt,
    case
        when exchange_name like 'Cboe%' then 'CBOE'
        when exchange_name like 'NYSE%' then 'NYSE'
        when exchange_name like 'New York Stock Exchange' then 'NYSE'
        when exchange_name like 'Nasdaq%' then 'NASDAQ'
    end as parent_exchange_name,
    factset_sector_desc as sector_name,
    sum((price * volume) :: decimal(30,4)) as total_transaction_amt
from dt_hist_trades a
join fds_adx_fundamentals_db.ref_v2.factset_sector_map b
    on (a.sector_code = b.factset_sector_code)
group by 1, 2, 3<\/code><\/pre>\n<\/p><\/div>\n
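The historical tables used in these queries can live in Amazon S3 and be queried in place through Redshift Spectrum. The following is a minimal sketch; the external schema name, Glue database name, IAM role ARN, S3 location, and column list are illustrative assumptions, not the actual definitions from this solution.

```sql
-- External schema backed by the AWS Glue Data Catalog (role ARN is a placeholder)
CREATE EXTERNAL SCHEMA spectrum_hist
FROM DATA CATALOG
DATABASE 'hist_market_data'
IAM_ROLE 'arn:aws:iam::111122223333:role/redshift-spectrum-role'
CREATE EXTERNAL DATABASE IF NOT EXISTS;

-- External table over historical trades in S3 (columns and location assumed)
CREATE EXTERNAL TABLE spectrum_hist.dt_hist_trades (
    date_dt       timestamp,
    exchange_name varchar(64),
    sector_code   int,
    price         decimal(18,4),
    volume        bigint
)
STORED AS PARQUET
LOCATION 's3://example-bucket/hist/trades/';
```

After the external table is defined, the aggregation queries above run against it unchanged, and multiple Redshift warehouses can query the same S3 data concurrently.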
Reporting<\/h2>\n
Historical quotes analysis<\/h3>\n
Weighted average spread by sectors<\/h4>\n
Weighted average spread by exchanges<\/h4>\n
Historical trades analysis<\/h3>\n
Trade volumes by sector<\/h4>\n
Trade volumes for major exchanges<\/h4>\n
Real-time dashboards<\/h3>\n
Clean up<\/h2>\n
Conclusion<\/h2>\n
\nAbout the Authors<\/h3>\n
<\/strong>Satesh Sonti<\/strong> is a Sr. Analytics Specialist Solutions Architect based out of Atlanta, specialized in building enterprise data platforms, data warehousing, and analytics solutions. He has over 18 years of experience in building data assets and leading complex data platform programs for banking and insurance clients across the globe.<\/p>\n
Alket Memushaj<\/strong> works as a Principal Architect in the Financial Services Market Development team at AWS. Alket is responsible for technical strategy for capital markets, working with partners and customers to deploy applications across the trade lifecycle to the AWS Cloud, including market connectivity, trading systems, and pre- and post-trade analytics and research platforms.<\/p>\n
Ruben Falk<\/strong> is a Capital Markets Specialist focused on AI and data & analytics. Ruben consults with capital markets participants on modern data architecture and systematic investment processes. He joined AWS from S&P Global Market Intelligence where he was Global Head of Investment Management Solutions.<\/p>\n
Jeff Wilson<\/strong> is a World-wide Go-to-market Specialist with 15 years of experience working with analytic platforms. His current focus is sharing the benefits of using Amazon Redshift, Amazon\u2019s native cloud data warehouse. Jeff is based in Florida and has been with AWS since 2019.<\/p>\n
\n