In-Memory - Avnet Technology Solutions

Transcript

Non-traditional solutions on POWER technology
[email protected]
Agenda
• In-Memory
• The in-memory concept, its benefits and context
• The SAP HANA concept
• Why IBM Power for in-memory solutions
• Big Data
• HW infrastructure for Big Data/Hadoop
• Platform for NoSQL databases
© 2012 IBM Corporation
What Is an In-Memory Database
Definition: An in-memory database is a database whose data is stored in main memory to facilitate faster response times.
It typically also adds support for columnar tables, compression, and more.
Examples: Cognos TM1, SAP HANA, IBM DB2 BLU, Informix, Oracle
Advantages: speed; no aggregates or indexes needed; combined OLTP/OLAP
Disadvantages: data held in RAM, size limits, persistence
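A minimal sketch of why the columnar tables mentioned above help analytic queries: an aggregate over one attribute scans a single contiguous array instead of visiting every record. Illustrative Python only, not any vendor's actual storage format:

```python
# Illustrative comparison of row-oriented vs column-oriented storage.
# A column store lets an aggregate read only the column it needs.

rows = [  # row store: each record kept together
    {"id": 1, "region": "EU", "amount": 120.0},
    {"id": 2, "region": "US", "amount": 75.5},
    {"id": 3, "region": "EU", "amount": 30.0},
]

columns = {  # column store: each attribute kept together
    "id": [1, 2, 3],
    "region": ["EU", "US", "EU"],
    "amount": [120.0, 75.5, 30.0],
}

# Row store: every record must be visited to read one field.
total_rows = sum(r["amount"] for r in rows)

# Column store: the aggregate scans a single contiguous list.
total_cols = sum(columns["amount"])

assert total_rows == total_cols == 225.5
```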
SPEED: RAM, SSD/Flash and spinning disks

Non-persistent, volatile:
• Processors — very, very, very, very, very fast (< tens of ns)
• Memory — very, very, very fast (~100 ns)

Persistent, non-volatile:
• SSD — fast (~200,000 ns)
• Disk — very, very slow comparatively (1,000,000-8,000,000 ns)

Access speed in human terms (in-memory DB vs. classical DB), with one RAM access scaled to one second:
• RAM: ~1 second
• SSD: ~33 minutes
• Disk: ~12.5 hours
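The human-scale analogy follows from scaling every latency by the same factor, so that one ~100 ns RAM access becomes one second. A quick check in Python (the disk figure is one representative value from the 1-8 ms range above):

```python
# Scale raw access latencies so that one RAM access (~100 ns)
# corresponds to one second, reproducing the slide's analogy.

RAM_NS = 100            # main memory access
SSD_NS = 200_000        # flash/SSD access
DISK_NS = 4_500_000     # spinning disk access (within the 1-8 ms range)

scale = 1.0 / RAM_NS    # seconds of "human time" per nanosecond

ram_s = RAM_NS * scale           # 1.0 second
ssd_min = SSD_NS * scale / 60    # ~33 minutes
disk_h = DISK_NS * scale / 3600  # ~12.5 hours

print(f"RAM: {ram_s:.0f} s, SSD: {ssd_min:.0f} min, disk: {disk_h:.1f} h")
```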
In-Memory Technology

HW technology innovations:
• Multi-core/multi-thread architecture
• Large-scale memory
• Massively parallel scaling with many compute nodes
• 64-bit address space
• 100 GB/s data throughput
• Dramatic decline in price/performance

SW technology innovations:
• Row and column store
• Compression
• Partitioning
• No aggregate tables
• Insert only on delta
Agenda
• Why IBM POWER for in-memory?
Run Fast: IBM Power — designed for in-memory

• Processors: 4x threads per core vs Intel (up to 1,536 threads per system) — flexible, fast execution of analytics algorithms
• Memory: 4x memory bandwidth vs Intel (up to 16 TB of memory) — large, fast workspace to maximize business insight
• Cache: 5x more cache vs Intel (up to 224 MB cache per socket) — ensures a continuous data feed for fast responses
Run Fast: POWER8's faster memory bandwidth is ideal for in-memory applications

4x memory bandwidth vs Intel (up to 16 TB of memory) — a large, fast workspace to maximize business insight.

[Chart: maximum memory bandwidth (min/max, GB/s) for POWER8, POWER7+, Ivy Bridge EX, Ivy Bridge EP and Sandy Bridge EP; axis 0-400 GB/s. Source: IBM CPO]
Memory Bandwidth: Power versus x86

|  | Ivy Bridge EP (E5-26xx v2) | Haswell EP (E5-26xx v3) | Ivy Bridge EX (E7-88xx v2) | Haswell EX*1 (E7-88xx v3) | POWER7+ | POWER8 |
|---|---|---|---|---|---|---|
| Clock rates | 1.7-3.7 GHz | 1.7-3.7 GHz | 1.9-3.4 GHz | 2-3.2 GHz | 3.1-4.4 GHz | 3.0-4.15 GHz |
| SMT options | 1, 2*2 | 1, 2*2 | 1, 2*2 | 1, 2*2 | 1, 2, 4 | 1, 2, 4, 8 |
| Cores per socket | 12 | 18 | 15 | 18 | 8 | 12 |
| Max threads / socket | 24 | 36 | 30 | 36 | 32 | 96 |
| Max L1 cache | 32 KB*3 | 32 KB*3 | 32 KB*3 | 32 KB*3? | 32 KB | 64 KB |
| Max L2 cache | 256 KB | 256 KB | 256 KB | 256 KB | 256 KB | 512 KB |
| Max L3 cache | 30 MB | 45 MB | 37.5 MB | 40-45 MB | 80 MB | 96 MB |
| Max L4 cache | 0 | 0 | 0 | 0 | 0 | 128 MB |
| Memory bandwidth (per socket) | 42.6-59.7 GB/s | 51.2-68.3 GB/s | 68-85*4 GB/s | ≈81-97? GB/s | 100-180 GB/s | 230-410 GB/s |
| SAPS (BS7UC) / core | 2727 | 3317 | 2259 | 2575? | 3650 | 5260 |
| (Fastest processor available for HANA) | E5-2667v2 hex-core, 3.3 GHz | E5-2667v3, 3.2 GHz | Xeon E7-8890v2, 15-core, 2.8 GHz | Xeon E7-8890v3, 18-core, 2.5 GHz | POWER7+, 4.42 GHz | POWER8, 4.15 GHz |

*1 Haswell EX will be announced in Q2/15; all values marked "?" are assumptions based on existing Haswell EP processors.
*2 Intel calls this Hyper-Threading Technology (no HT, with HT).
*3 32 KB running in "non-RAS mode"; only 16 KB in RAS mode.
*4 85 GB/s running in "non-RAS mode" = dual device error NOT supported.
RAS 1/2: Checkers & Fault Isolation Registers

The mechanism that ensures First Failure Data Capture.
Other RAS: hot-plug PCI
RAS 2/2: POWER8 Memory

[Diagram: DRAM chips attached to a memory buffer — 10 chips per rank for double chipkill — with DDR interfaces, a 16 MB scheduler & memory-management cache, and the POWER8 link.]

Intelligence moved into memory:
• Scheduling logic, caching structures
• Energy management, RAS decision point — formerly on the processor, moved to the memory buffer

Processor interface:
• 9.6 GB/s high-speed interface
• More robust RAS
• "On-the-fly" lane isolation/repair

Performance value:
• End-to-end fast path and data retry (latency)
• Cache: latency/bandwidth, partial updates
• Cache: write scheduling, prefetch, energy
PowerVM Virtualization: Maximum Sustained Utilization

PowerVM: Live Partition Mobility, dynamic reconfiguration, security (EAL4+)
Source: Comparative VM Load Analysis; Query Report 2012A67; Solitaire Interglobal, LTD.
Agenda
• SAP HANA and IBM POWER
SAP HANA Deployment Options / Different Use Cases 1/2

[Diagrams summarized:]
• Technology platform: SAP HANA's in-memory computing engine is loaded via ETL from non-SAP data sources (e.g. demographics & Google Maps data), alongside a traditional DB; users access it through an SAP UI, with data modeling in HANA.
• Operational reporting: data is replicated from the SAP Business Suite (e.g. SAP ERP) into HANA; BI clients connect to the computing engine via BICS, SQL and MDX.
• Accelerator: replication from the SAP Business Suite (e.g. SAP ERP CO-PA) into HANA; the suite reads accelerated queries from the in-memory engine while the traditional DB remains in place, and 3rd-party BI clients connect via MDX/SQL.
SAP HANA Deployment Options / Different Use Cases 2/2

• Products on the in-memory DB: SAP BW running on HANA (e.g. SAP BW 7.3 SP8 on HANA), fed by traditional extractors, with a traditional DB alongside.
• Products on the in-memory DB: the SAP Business Suite (ERP, SCM, CRM) on HANA — e.g. SAP ERP 6.0 EHP7, CRM 7.0 EHP3, SCM 7.0 EHP3, SAP SRM 7.0 EHP3, version for HANA.
SAP Vision on HANA

From:
• One DB per application
• Point-to-point integration
• Long-running queries, e.g. in batch mode

To:
• One DB per landscape
• No integration necessary
• Real-time execution

Critical component: enterprise-class RAS infrastructure
Is my data safe?

The SAP HANA database holds the bulk of its data in memory for maximum performance, but still uses persistent storage to provide a fallback in case of failure.

• The log captures all changes made by database transactions.
• Data and log are automatically saved to disk at regular savepoints; the log is also saved to disk after each COMMIT of a database transaction.
• After a power failure, the database can be restarted like a disk-based database:
• The system is normally restarted ("lazy" reloading of tables keeps the restart time short).
• The system returns to its last consistent state (by replaying the log since the last savepoint).

Timeline: data savepoint to persistent storage → log written to persistent storage (committed transactions) → power failure.
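The savepoint-plus-log recovery scheme described above can be sketched in a few lines of Python. This is an illustrative toy only — real HANA persistence is far more involved — with the `savepoint` and `log` attributes standing in for persistent storage:

```python
# Minimal savepoint / write-ahead-log recovery sketch.
# `data` plays the role of RAM; `savepoint` and `log` play the role
# of disk, so they survive the simulated power failure.

class TinyInMemoryDB:
    def __init__(self):
        self.data = {}        # in-memory state (volatile)
        self.savepoint = {}   # last snapshot "on disk"
        self.log = []         # committed changes since the savepoint

    def commit(self, key, value):
        self.data[key] = value
        self.log.append((key, value))     # log persisted at COMMIT

    def write_savepoint(self):
        self.savepoint = dict(self.data)  # snapshot to "disk"
        self.log.clear()                  # log can now be truncated

    def recover(self):
        # Restart: restore the savepoint, then replay the log.
        self.data = dict(self.savepoint)
        for key, value in self.log:
            self.data[key] = value

db = TinyInMemoryDB()
db.commit("a", 1)
db.write_savepoint()
db.commit("b", 2)   # committed after the savepoint: only in the log

db.data = {}        # power failure: RAM contents are lost
db.recover()
assert db.data == {"a": 1, "b": 2}   # last consistent state restored
```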
Appliance versus TDI (Tailored Datacenter Integration)

Appliance — fast implementation:
• Solution validated by SAP & partner
• Preconfigured hardware set-up
• Preinstalled software

TDI — more flexibility:
• Installation done by the customer's partner
• Customer aligns with the hardware partner on the support mode
Storage: HWCCT — hardware configuration check tool

Determines whether the system meets KPI requirements:
• Landscape test
• OS configuration validity
• Consistency of the landscape against the reference architecture
• File-system throughput/latency
• Network throughput for multi-node configurations
• 9.5 Gbit/s for a single stream
• 9.0 Gbit/s for duplex streams
Source: SAP SE

Further topics:
• Sizing (memory-based)
• Replication, DR, HA
• Backup
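A trivial sketch of the kind of KPI gate such a check applies: measured throughput values compared against minimum thresholds. Illustrative Python only — HWCCT itself is an SAP-supplied tool with its own test definitions, and the measured numbers here are made up:

```python
# Check measured network throughput against the HWCCT-style KPIs
# quoted on the slide (Gbit/s). Thresholds are minimums to pass.

KPIS = {
    "single_stream_gbits": 9.5,
    "duplex_stream_gbits": 9.0,
}

def check_kpis(measured):
    """Return (name, measured, required, passed) for each KPI."""
    results = []
    for name, required in KPIS.items():
        value = measured.get(name, 0.0)
        results.append((name, value, required, value >= required))
    return results

# Hypothetical measurements: single stream passes, duplex fails.
measured = {"single_stream_gbits": 9.7, "duplex_stream_gbits": 8.8}
for name, value, required, ok in check_kpis(measured):
    status = "PASS" if ok else "FAIL"
    print(f"{name}: {value} Gbit/s (required >= {required}) -> {status}")
```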
HANA on POWER — Power landscape example

• OS: SUSE Linux Enterprise Server with priority support for SAP applications
• Server hardware: any POWER7+ or POWER8, standalone or shared/PowerVM-virtualized; managed with PowerVC
• High availability: SUSE HA Extension, RH HA plugin, Symantec HA, Tivoli SA
• File system: GPFS or XFS
• Storage hardware: customer's choice
• Example landscape (two PowerVM servers, each with dual VIOS): HANA DB and application LPARs for ECC, BW and CRM — production with HA pairs, QA, HA-QA and sandbox instances — alongside management LPARs such as ITM and TSM
SAP HANA on POWER — Ramp-Up

• Evaluation phase (customer test period): November 2014 - February 2015
• Release to customer, Ramp-Up start: March 2015
• Announcement of GA date: May 8th; general availability: May 2015
• POWER solutions for HANA — TDI: pre-configured setups that are easy to modify within TDI, orderable from June
Why IBM POWER for in-memory DBs and HANA?

• Memory bandwidth
• and further parameters: up to 16 TB RAM, up to ~1,500 threads, L4 cache
• RAS and outage-free operation
• dual-chipkill RAM; extra redundancy on POWER7+ high-end and all POWER8 systems
• hot-plug PCI and other components
• Flexibility
• thanks to virtualization
• ideal for HA/DR scenarios
BIG DATA
What Is Big Data

Gartner defines big data as data whose volume is beyond the ability to capture, manage and process with commonly used tools in a reasonable time.

Data warehouses: mostly structured data gathered from various sources and information systems via ETL, with analytics run on top. Terabytes of data.

Big Data: petabytes of data. It is not simply a matter of processing a larger volume; further characteristics matter too — the 3 Vs (or 4 Vs):
• Volume — data volume grows exponentially.
• Velocity — immediate processing of large volumes of continuously arriving data; an example is processing data produced by a camera.
• Variety — besides the usual structured data, processing of unstructured text and various types of multimedia data.
• Veracity — uncertain trustworthiness of data due to inconsistency, incompleteness, ambiguity and the like; an example is social-media data.
IBM Solution for Hadoop — Power Systems Edition

Key requirements & design parameters — focused on customer value:
• Best-in-class hardware
• Dense storage subsystem
• Advanced software capabilities
• Better reliability & management
• Best-in-class file system
• Automated cluster provisioning
• IBM Platform Symphony

Stack:
• Applications: IBM InfoSphere BigInsights or open-source Hadoop
• Scheduling: IBM Platform Symphony
• Cluster management: IBM Platform Cluster Manager
• Distributed file system: IBM Elastic Storage, HDFS
• Linux operating environment: RHEL, SUSE
• IBM Power Systems: IBM POWER7+, POWER8
Balanced Configuration — a POD has 2 Data Nodes and 1 DCS3700

DCS3700 uses out-of-band management; OS management is shared with the data network; dedicated service network.

• System Management Node: PowerLinux, 6 x 4.2 GHz cores, 32 GB RAM (up to 512 GB), 2 x 300 GB 2.5" HDD (OS), dual-port 10 GbE (data + mgmt)
• Hadoop Management Node: PowerLinux, 16 x 4.2 GHz cores, 128 GB RAM (up to 512 GB), 6 x 600 GB 2.5" HDD (OS), dual-port 10 GbE (data + mgmt)
• Data Node: PowerLinux, 16 x 4.2 GHz cores, 128 GB RAM (up to 512 GB), 2 x 300 GB 2.5" HDD (OS), dual-port 10 GbE (data + mgmt)

| Configuration | First rack | Additional data-rack pair |
|---|---|---|
| Available storage | 720 TB | 2400 TB |
| Raw data | 180 TB | 600 TB |
| Data nodes (links) | 6 DN (12x 10Gb / 6x 1Gb) | 20 DN (40x 10Gb / 20x 1Gb) |
| Hadoop management or edge nodes (links) | 3 MN (6x 10Gb / 3x 1Gb) | N/A |
| System management node (links) | 1 SMN (2x 10Gb / 2x 1Gb) | N/A |
| HMC (links) | HMC (2x 1Gb) | N/A |
| DCS3700 (links) | 3 DCS (6x 1Gb) | 10 DCS (20x 1Gb) |
| Total 10Gb links, 1Gb links | 22x 10Gb, 19x 1Gb | 40x 10Gb, 40x 1Gb |
| Network switches | 1x G8264 + 1x G8052 | 1x G8264 + 1x G8052 |
Taking a building-block approach to Big Data infrastructure

• Hardware management node — hardware management & monitoring tools
• Deployment node — Platform Cluster Manager for automated OS and software provisioning from bare metal
• Management nodes — HDFS and MapReduce services, Symphony services, GPFS quorum services, job trackers
• Edge node — interface to the public network: parallel ingest and output of data, ETL processing, Sqoop, DataStage
• Data nodes — nodes that run task trackers and other applications; may be compute- or storage-rich
Standard Configurations — POD-based design

Power Big Data clusters are built using a simple building-block approach to tailor the mix of CPU and storage to application requirements:
• Compute-dense data POD — four servers per storage subsystem
• Balanced data POD — two servers per storage subsystem
• Storage-dense data POD — one server per storage subsystem
Understanding IBM Elastic Storage

• POSIX file system supports both Hadoop MapReduce applications and native OS (POSIX) applications
• Supports both large- and small-block I/O optimally
• Distributed metadata — no single point of failure (no HDFS NameNode / secondary NameNode)
• GPFS servers deploy over a switched fabric in a distributed or shared-disk topology (shared Elastic Storage)
• Avoids data bottlenecks
• Avoids costly n-way data replication
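Because the file system is POSIX, the same data serves both MapReduce jobs and ordinary programs. The MapReduce pattern itself is easy to show in miniature — a word count with explicit map, shuffle and reduce phases (an in-process illustrative sketch; real jobs run distributed across data nodes):

```python
from collections import defaultdict

# Minimal in-process MapReduce word count: map emits (word, 1),
# shuffle groups pairs by key, reduce sums the counts per word.

def map_phase(lines):
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data on Power", "big data with Hadoop"]
counts = reduce_phase(shuffle(map_phase(lines)))
assert counts == {"big": 2, "data": 2, "on": 1,
                  "power": 1, "with": 1, "hadoop": 1}
```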
Fast, scalable network design

• Data network: top-of-rack 10 GbE for data (bonded links)
• Management network: top-of-rack 1 GbE switch for the management/service network
• In-band OS administration
• Out-of-band FSP HW management
• 40 GbE switch interconnecting racks, with link aggregation
IBM Solution for Hadoop — Power Systems Edition

Diverse HPC & Big Data application frameworks share a common resource manager:
• Serial batch
• MPI parallel
• Distributed workflow
• Online SOA
• Hadoop MapReduce
• OpenStack cloud
• YARN

Applications run on the Platform Computing resource manager over HDFS / IBM Elastic Storage, alongside cluster management.

With sophisticated multitenancy, customers can share a broader set of application types and scheduling patterns on a common resource foundation.
IBM POWER Solution for Hadoop

• Capacity savings (2:1 versus 3:1)
• No NameNode limitation
• Fewer nodes:
• networking
• licences
• facilities
• simpler deployment
• Speed of deployment (Platform)
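The 2:1 versus 3:1 figure refers to raw-capacity overhead: HDFS triple replication keeps three copies of every block, while a GPFS-style scheme needs roughly twice the raw capacity for the same protected data. A quick illustrative calculation (the 180 TB figure is taken from the rack-configuration slide; the ratios are the slide's own):

```python
# Raw disk capacity needed to hold a given amount of user data
# under 3-way replication (3:1) vs a ~2:1 protection scheme.

def raw_needed(user_tb, overhead_ratio):
    return user_tb * overhead_ratio

user_tb = 180  # raw-data figure from the first-rack configuration

hdfs_raw = raw_needed(user_tb, 3.0)  # 540 TB of disk for 180 TB of data
gpfs_raw = raw_needed(user_tb, 2.0)  # 360 TB of disk for 180 TB of data
savings = hdfs_raw - gpfs_raw        # 180 TB of disk saved per rack

print(f"3:1 scheme: {hdfs_raw:.0f} TB, 2:1 scheme: {gpfs_raw:.0f} TB, "
      f"saved: {savings:.0f} TB")
```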
Questions?
[email protected]
Power Systems RAS vs x86

| RAS Feature | POWER | x86 |
|---|---|---|
| Application/Partition RAS | | |
| Live Partition Mobility | Yes | Yes |
| Live Application Mobility | Yes | Yes, support issues |
| Partition Availability priority | Yes | No |
| System RAS | | |
| OS independent First Failure Data Capture | Yes | No |
| Memory Keys (including OS exploitation) | Yes | No |
| Processor RAS | | |
| Processor Instruction Retry | Yes | EX - MCA Recovery |
| Alternate Processor Recovery | Yes | No |
| Dynamic Processor Deallocation | Yes | Yes |
| Dynamic Processor Sparing | Yes | No |
| Memory RAS | | |
| Chipkill™ | Yes | Yes, some vendors |
| Survives Double Memory Failures | Yes | No |
| Selective Memory Mirroring | Yes | No |
| Redundant Memory | Yes | Yes, optional |
| I/O RAS | | |
| Extended Error Handling | Yes | No |
| I/O Adapter Isolation (PI-Bus and TCEs) | Yes | No |

See the following URLs for additional details:
http://www-03.ibm.com/systems/migratetoibm/systems/power/availability.html
http://www-03.ibm.com/systems/migratetoibm/systems/power/virtualization.html
OpenPOWER Foundation

Stack: chip/SOC → boards/systems → I/O, storage, acceleration → system/software/services — spanning compute, memory, I/O network and storage.

• Differentiating the DB2 database on Power/Linux using GPU acceleration
• Cost-optimized "in-memory-like" NoSQL infrastructure with CAPI + IBM FlashSystem
• Distributed store accelerated by RDMA
• Next-gen big data architecture leveraging GPFS/GSS, Platform, and FPGA-based compression acceleration