In-Memory - Avnet Technology Solutions
Transcript
Unconventional Solutions on POWER Technology
[email protected]

Agenda
• In-Memory: the in-memory concept, its benefits and context
• The SAP HANA concept
• Why IBM Power for in-memory solutions
• Big Data: HW infrastructure for Big Data/Hadoop; a platform for NoSQL databases

© 2012 IBM Corporation

What Is an In-Memory Database
Definition: an in-memory database is a database whose data is stored in main memory to facilitate faster response times.
It typically also brings support for columnar tables, compression, and related techniques.
Examples: Cognos TM1, SAP HANA, IBM DB2 BLU, Informix, Oracle
Advantages: speed; no aggregate tables or indexes; combined OLTP/OLAP workloads
Disadvantages: data lives in RAM, which limits capacity and requires a separate persistence mechanism

SPEED: RAM, SSD/Flash and Spinning Disks
Non-persistent, volatile:
• Processors: very, very, very, very, very fast (< tens of ns)
• Memory: very, very, very fast (~100 ns)
Persistent, non-volatile:
• SSD: fast (~200,000 ns)
• Disk: very, very slow by comparison (1,000,000–8,000,000 ns)
Scaled to human terms (one RAM access = ~1 second), an in-memory DB access takes ~1 second, while a classic database access takes ~33 minutes (SSD) to ~12.5 hours (spinning disk).

In-Memory Technologies
HW technology innovations:
• Multi-core/multi-thread architectures
• Large-scale memory; massive parallel scaling with many compute nodes
• 64-bit address space
• 100 GB/s data throughput
• Dramatic decline in price/performance
SW technology innovations:
• Row and column store
• Compression
• Partitioning
• No aggregate tables
• Insert only on delta

Agenda
• Why IBM POWER for in-memory?
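The software-side innovations just listed (column store, compression, insert only on delta) are what let an in-memory database answer aggregate queries with plain scans instead of aggregate tables or indexes. A minimal Python sketch of the idea follows; it is a toy illustration, not any product's actual design, and `RLEColumn` and its methods are hypothetical names:

```python
# Toy column store illustrating two techniques named above: run-length
# encoding (RLE) of a low-cardinality column, and an insert-only "delta"
# buffer that is periodically merged into the compressed "main" store.
# Illustrative sketch only -- not SAP HANA's or DB2 BLU's real design.

class RLEColumn:
    """A column kept as (value, run_length) pairs plus an uncompressed delta."""

    def __init__(self, values=()):
        self.main = []          # compressed store: list of (value, run_length)
        self.delta = []         # recent inserts, uncompressed
        for v in values:
            self.insert(v)
        self.merge_delta()

    def insert(self, value):
        # Inserts go to the delta only ("insert only on delta");
        # the compressed main store is never updated in place.
        self.delta.append(value)

    def merge_delta(self):
        # Periodic merge: RLE-compress the delta into the main store.
        for v in self.delta:
            if self.main and self.main[-1][0] == v:
                val, run = self.main[-1]
                self.main[-1] = (val, run + 1)
            else:
                self.main.append((v, 1))
        self.delta = []

    def scan_count(self, value):
        # Aggregation without indexes or aggregate tables: scan the
        # compressed runs, then the (small) delta.
        total = sum(run for val, run in self.main if val == value)
        return total + sum(1 for v in self.delta if v == value)

    def __len__(self):
        return sum(run for _, run in self.main) + len(self.delta)


# A low-cardinality column (e.g. country codes) compresses very well:
col = RLEColumn(["CZ"] * 4 + ["DE"] * 3 + ["CZ"] * 2)
print(len(col.main))          # 3 runs instead of 9 stored values
print(col.scan_count("CZ"))   # 6
col.insert("CZ")              # lands in the delta buffer
print(col.scan_count("CZ"))   # 7, delta included in the scan
```

The same layout also explains the "no aggregate tables" claim: because a column scan touches only compressed runs of one attribute, aggregates stay cheap enough to compute on demand.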
Run Fast: IBM Power – Designed for In-Memory
• Processors: 4× the threads per core vs. Intel (up to 1,536 threads per system) – flexible, fast execution of analytics algorithms
• Memory: 4× the memory bandwidth vs. Intel (up to 16 TB of memory) – a large, fast workspace to maximize business insight
• Cache: 5× more cache vs. Intel (up to 224 MB of cache per socket) – ensures a continuous data feed for fast responses

Run Fast: POWER8's Memory Bandwidth Is Ideal for In-Memory Applications
[Chart: maximum memory bandwidth (GB/s, 0–400 scale, min/max) for POWER8, POWER7+, Ivy Bridge EX, Ivy Bridge EP and Sandy Bridge EP. Source: IBM CPO]

Memory Bandwidth: Power versus x86

                          Ivy Bridge EP    Haswell EP       Ivy Bridge EX     Haswell EX*1      POWER7+        POWER8
                          E5-26xx v2       E5-26xx v3       E7-88xx v2        E7-88xx v3
Clock rates               1.7–3.7 GHz      1.7–3.7 GHz      1.9–3.4 GHz       2–3.2 GHz         3.1–4.4 GHz    3.0–4.15 GHz
SMT options               1, 2*2           1, 2*2           1, 2*2            1, 2*2            1, 2, 4        1, 2, 4, 8
Cores per socket          12               18               15                18                8              12
Max threads per socket    24               36               30                36                32             96
Max L1 cache              32 KB*3          32 KB*3          32 KB*3           32 KB*3?          32 KB          64 KB
Max L2 cache              256 KB           256 KB           256 KB            256 KB            256 KB         512 KB
Max L3 cache              30 MB            45 MB            37.5 MB           40–45 MB          80 MB          96 MB
Max L4 cache              0                0                0                 0                 0              128 MB
Memory bandwidth/socket   42.6–59.7 GB/s   51.2–68.3 GB/s   68–85*4 GB/s      ≈81–97? GB/s      100–180 GB/s   230–410 GB/s
SAPS (BS7UC) per core     2727             3317             2259              2575?             3650           5260
Fastest HANA-capable      E5-2667 v2       E5-2667 v3       E7-8890 v2        E7-8890 v3        POWER7+        POWER8
processor                 6-core, 3.3 GHz  3.2 GHz          15-core, 2.8 GHz  18-core, 2.5 GHz  4.42 GHz       4.15 GHz

*1 Haswell EX to be announced in Q2/15; all values marked "?" are assumptions based on existing Haswell EP processors.
*2 Intel calls this Hyper-Threading Technology (no HT / with HT).
*3 32 KB when running in "non-RAS mode"; only 16 KB in RAS mode.
*4 85 GB/s when running in "non-RAS mode" (dual-device error NOT supported).

RAS 1/2: Checkers & Fault Isolation Registers
The mechanism that ensures First Failure Data Capture and correction; plus other RAS features and hot-plug PCI.

RAS 2/2: POWER8 Memory
• DRAM chips: 10 chips per rank for double chipkill
• Intelligence moved into memory: scheduling logic, caching structures, energy management and the RAS decision point, formerly on the processor, now live on the memory buffer (with its 16 MB memory cache, scheduler and management logic)
• Processor interface: 9.6 GB/s high-speed link; more robust RAS; "on-the-fly" lane isolation/repair
• Performance value: end-to-end fastpath and data retry (latency); cache latency/bandwidth and partial updates; cache write scheduling, prefetch, energy

PowerVM Virtualization: Maximum Sustained Utilization
Better PowerVM: LPM (Live Partition Mobility), dynamic reconfiguration, security (EAL4+).
Source: Comparative VM Load Analysis; Query Report 2012A67; Solitaire Interglobal, LTD.

Agenda
• SAP HANA and IBM POWER

SAP HANA Deployment Options / Different Use Cases 1/2
• Reporting: SAP HANA as a data mart, fed by replication from the SAP Business Suite's traditional DB (e.g. SAP ERP operational reporting); BI clients and 3rd-party BI clients connect to the data models via BICS, SQL and MDX.
• Accelerators: the SAP Business Suite reads selected data from the in-memory computing engine in SAP HANA, kept in sync by replication (e.g. SAP ERP CO-PA).
• Technology platform: data modeling on SAP HANA over SAP and non-SAP data sources (e.g. demographics and Google Maps data) loaded via ETL, serving SAP UIs.

SAP HANA Deployment Options / Different Use Cases 2/2
• Products on the in-memory DB – BW: SAP BW (7.3 SP8 onward) on HANA, fed by the traditional extractors.
• Products on the in-memory DB – Business Suite: ERP, SCM and CRM on HANA (e.g. SAP ERP 6.0 EHP7, CRM 7.0 EHP3, SCM 7.0 EHP3, SAP SRM 7.0 EHP3, version for HANA).

SAP's Vision for HANA
From:
• One DB per application
• Point-to-point integration
• Long-running queries, e.g. in batch mode
To:
• One DB per landscape
• No integration necessary
• Real-time execution
The critical component is an enterprise-class RAS infrastructure.

Is My Data Safe?
The SAP HANA database holds the bulk of its data in memory for maximum performance, but it still uses persistent storage to provide a fallback in case of failure:
• The log captures all changes made by database transactions.
• Data and log are automatically saved to disk at regular savepoints; the log is also saved to disk after each COMMIT of a database transaction.
• After a power failure, the database can be restarted like a disk-based database: the system restarts normally ("lazy" reloading of tables keeps the restart time short) and returns to its last consistent state by replaying the log since the last savepoint.

Appliance versus TDI (Tailored Datacenter Integration)
Appliance – fast implementation:
• Solution validation by SAP & partner
• Preconfigured hardware set-up
• Preinstalled software
TDI – more flexibility:
• Installation done by the customer or partner
• Support mode aligned with the hardware partner

Storage: HWCCT – Hardware Configuration Check Tool
Determines whether the system meets the KPI requirements:
• Landscape test: OS configuration validity; consistency of the landscape against the reference architecture
• File system throughput and latency
• Network throughput for multi-node configurations: 9.5 Gbit/s for a single stream, 9.0 Gbit/s for a duplex stream
Source: SAP SE
Further topics: sizing (memory-based); replication, DR and HA; backup

HANA on POWER – Example Power Landscape
• Applications: SAP HANA instances (ECC, BW, CRM, plus their HA, QA and sandbox counterparts) alongside DB/App, ITM and TSM partitions
• High availability: SUSE HA Extension, RH HA Plugin, Symantec HA, Tivoli SA
• File system: GPFS or XFS
• OS: SUSE Linux Enterprise Server with Priority Support for SAP applications
• Virtualization: PowerVM with VIOS, managed by PowerVC; standalone or shared/PowerVM-virtualized
• Hardware: any POWER7+ or POWER8 server; storage hardware is the customer's choice

SAP HANA on POWER Ramp-Up
• November 2014 – February 2015: customer test and evaluation phase
• March 2015: release to customer, ramp-up start
• May 2015: general availability (GA date announced May 8th)
POWER solutions for HANA – TDI: pre-built configurations, easy to modify within TDI, orderable from June.

Why IBM POWER for In-Memory DBs and HANA?
• Memory bandwidth, plus further parameters such as up to 16 TB of RAM, up to ~1,500 threads and the L4 cache
• RAS and downtime-free operation: dual-chipkill RAM and extra redundancy on POWER7+ High End and all POWER8 systems; hot-plug PCI and other components
• Flexibility: thanks to virtualization, ideal for HA/DR scenarios

BIG DATA

What Is Big Data
Gartner defines big data as data whose volume is beyond the ability of commonly used tools to capture, manage and process in a reasonable time.
Data warehouses hold mostly structured data gathered from various sources and information systems via ETL, with analytics run on top, at the scale of terabytes. Big data means petabytes, and it is not simply a matter of scaling up to a larger data volume.
It is also about further characteristics of the data, the "3 Vs" (or 4 Vs):
• Volume: data volumes grow exponentially.
• Velocity: immediate processing of large volumes of continuously arriving data, e.g. the data produced by a camera.
• Variety: besides the usual structured data, processing unstructured text and various types of multimedia data.
• Veracity: the uncertain trustworthiness of data caused by inconsistency, incompleteness, ambiguity and the like, e.g. data from social networks.

IBM Solution for Hadoop – Power Systems Edition
Key requirements and design parameters, focused on customer value:
• Best-in-class hardware: IBM Power Systems (POWER7+, POWER8) with a dense storage subsystem
• Advanced software capabilities: IBM InfoSphere BigInsights or open-source Hadoop, with IBM Platform Symphony
• Better reliability and management: IBM Platform Cluster Manager for automated cluster provisioning
• Best-in-class file system: a distributed file system, IBM Elastic Storage or HDFS
• Linux operating environment: RHEL or SUSE

Balanced Configuration
A POD has 2 data nodes and 1 DCS3700. The DCS3700 uses out-of-band management; OS management is shared with the data network, with a dedicated service network.
• System Management Node: PowerLinux, 6 × 4.2 GHz cores, 32 GB RAM (up to 512 GB), 2 × 300 GB 2.5" HDD (OS), dual-port 10 GbE (data + mgmt)
• Hadoop Management Node: PowerLinux, 16 × 4.2 GHz cores, 128 GB RAM (up to 512 GB), 6 × 600 GB 2.5" HDD (OS), dual-port 10 GbE (data + mgmt)
• Data Node: PowerLinux, 16 × 4.2 GHz cores, 128 GB RAM (up to 512 GB), 2 × 300 GB 2.5" HDD (OS), dual-port 10 GbE (data + mgmt)

Configuration                             First rack                  Additional data rack pair
Available storage                         720 TB                      2400 TB
Raw data                                  180 TB                      600 TB
Data nodes (links)                        6 DN (12x 10Gb / 6x 1Gb)    20 DN (40x 10Gb / 20x 1Gb)
Hadoop management or edge nodes (links)   3 MN (6x 10Gb / 3x 1Gb)     N/A
System management node (links)            1 SMN (2x 10Gb / 2x 1Gb)    N/A
HMC (links)                               HMC (2x 1Gb)                N/A
DCS3700 (links)                           3 DCS (6x 1Gb)              10 DCS (20x 1Gb)
Total 10Gb links / 1Gb links              22x 10Gb, 19x 1Gb           40x 10Gb, 40x 1Gb
Network switches                          1x G8264 + 1x G8052         1x G8264 + 1x G8052

Taking a Building-Block Approach to Big Data Infrastructure
• Hardware management node: hardware management and monitoring tools
• Deployment node: Platform Cluster Manager for automated OS and software provisioning from bare metal
• Management nodes: HDFS and MapReduce services, Symphony services, GPFS quorum services, job trackers
• Edge node: interface to the public network – parallel ingest and output of data, ETL processing, Sqoop, DataStage
• Data nodes: run task trackers and other applications; may be compute-rich or storage-rich

Standard Configurations – POD-Based Design
Power big data clusters are built using a simple building-block approach to tailor the mix of CPU and storage to application requirements: compute-dense, balanced and storage-dense data PODs, built with one, two or four servers per storage subsystem.

Understanding IBM Elastic Storage
• A POSIX file system that supports both Hadoop MapReduce and native OS (non-Hadoop) applications
• Supports both large and small block I/O optimally
• Distributed metadata: no single point of failure, unlike the HDFS namenode/secondary namenode
• Deployable in a distributed or shared-disk topology over a switched fabric; avoids data bottlenecks and costly n-way data replication

Fast, Scalable Network Design
• Top-of-rack 10 GbE for the data network (bonded links)
• Top-of-rack 1 GbE switch for the management/service network: in-band OS administration, out-of-band FSP hardware management
• 40 GbE switches interconnecting racks, with link aggregation

IBM Solution for Hadoop – Power Systems Edition: Shared Resource Management
Diverse HPC and big data application frameworks (serial batch, MPI parallel, distributed workflow, on-line SOA, Hadoop MapReduce, OpenStack cloud, YARN applications) share the Platform Computing resource manager on top of HDFS or IBM Elastic Storage and common cluster management. With sophisticated multitenancy, customers can share a broader set of application types and scheduling patterns on a common resource foundation.

IBM POWER Solution for Hadoop
• Capacity savings (2:1 versus 3:1 replication)
• No NameNode limitation
• Fewer nodes: less networking, fewer licenses, lower facility costs, simpler deployment
• Speed of deployment (Platform)

Questions?
[email protected]

Power Systems RAS vs x86

RAS feature                                      POWER   x86
Application/partition RAS
  Live Partition Mobility                        Yes     Yes, support issues
  Live Application Mobility                      Yes     No
  Partition availability priority                Yes     No
System RAS
  OS-independent First Failure Data Capture      Yes     EX only – MCA Recovery
  Memory keys (including OS exploitation)        Yes     No
Processor RAS
  Processor Instruction Retry                    Yes     No
  Alternate Processor Recovery                   Yes     No
  Dynamic Processor Deallocation                 Yes     Yes
  Dynamic Processor Sparing                      Yes     No
Memory RAS
  Chipkill                                       Yes     Yes, some vendors
  Survives double memory failures                Yes     No
  Selective Memory Mirroring                     Yes     Yes, optional
  Redundant memory                               Yes     Yes
I/O RAS
  Extended Error Handling                        Yes     No
  I/O adapter isolation (PI-Bus and TCEs)        Yes     No

See the following URLs for additional details:
http://www-03.ibm.com/systems/migratetoibm/systems/power/availability.html
http://www-03.ibm.com/systems/migratetoibm/systems/power/virtualization.html

OpenPOWER Foundation
A stack spanning chips/SOCs, boards/systems, I/O/storage/acceleration, and system/software/services, across compute, memory, I/O, network and storage. Examples:
• Differentiating the DB2 database on Power/Linux using GPU acceleration
• Cost-optimized "in-memory-like" NoSQL infrastructure with CAPI + IBM FlashSystem
• A distributed store accelerated by RDMA
• Next-generation big data architecture leveraging GPFS/GSS, Platform, and FPGA-based compression acceleration
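As a coda to the Big Data section: the MapReduce model that the Hadoop infrastructure above is built to run can be sketched in-process in a few lines. Real Hadoop distributes the same three phases (map, shuffle, reduce) across data nodes, with HDFS or Elastic Storage supplying the input splits; the function names below are hypothetical, chosen only for this sketch:

```python
# Minimal in-process sketch of the MapReduce model: map -> shuffle -> reduce.
# Real Hadoop runs the same three phases distributed across data nodes.
from collections import defaultdict

def map_phase(records, mapper):
    for record in records:
        yield from mapper(record)          # a mapper emits (key, value) pairs

def shuffle(pairs):
    groups = defaultdict(list)             # group values by key, as the
    for key, value in pairs:               # framework's shuffle/sort would
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    return {key: reducer(key, values) for key, values in groups.items()}

# Classic word count:
def word_mapper(line):
    return [(word, 1) for word in line.lower().split()]

def sum_reducer(word, counts):
    return sum(counts)

lines = ["big data on power", "power systems for big data"]
counts = reduce_phase(shuffle(map_phase(lines, word_mapper)), sum_reducer)
print(counts["big"], counts["power"])      # 2 2
```

The shuffle is the phase the cluster network design above (bonded 10 GbE data links, 40 GbE inter-rack) exists to keep fast, since every key's values must be brought together on one node before reduction.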