Say No to SQL! NoSQL's Database Technology Revolution
News source: IT168
Like the Boston patriots who once rebelled against Britain's heavy taxes, NoSQL supporters came from far and wide to share how they had overthrown the tyranny of slow, expensive relational databases in favor of more efficient, cheaper ways to manage data. They have begun to say no to SQL! The NoSQL group's gathering in San Francisco last month had a whiff of database revolution about it, as if a modern, IT-industry version of the Boston Tea Party were being planned.
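At the inaugural gathering of the NoSQL movement, 150 people from around the world packed a meeting room at CBS Interactive.
Cloud computing may bring new opportunities for database technology: if you want rich, on-demand scalability in the cloud, you need a non-relational database.
"Relational databases force too much on you. They make you twist your object data to fit an RDBMS (relational database management system)," said Jon Travis, principal engineer at Java toolmaker SpringSource and one of the 10 presenters at the gathering. In his view, NoSQL-based alternatives "just give you what you need."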
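The rise of open source
The anti-SQL movement's chief champions are Web and Java developers. Many of them ran short of money in the early days of their startups and parted ways with Oracle, then followed the path of Google and Amazon by building their own data storage solutions, which they later released as open source. Now their open-source data stores manage hundreds of terabytes or even petabytes of data, and with the rise of Web 2.0 and cloud computing, they see no technical or economic reason to go back, or even to consider it.
"Web 2.0 companies can take chances, and they need scalability," said Johan Oskarsson, the London-based organizer of the NoSQL meeting. He works at the well-known music site Last.fm, and most of the other attendees were also Web developers.
Oskarsson said many had even abandoned MySQL, long a Web 2.0 favorite, for a NoSQL alternative because the advantages were too compelling to ignore. 51CTO.com previously reported that MySQL's founder had announced an open-source database alliance; excessive commercialization has cost MySQL its original edge.
Facebook, for example, built its own Cassandra data store to power a new search feature on its site rather than use its existing MySQL database. According to Facebook engineer Avinash Lakshman, Cassandra can write to a data store that takes up 50GB on disk in just 0.12 milliseconds, more than 2,500 times faster than MySQL. Google has also begun public testing of its cloud database, Fusion Tables, which is entirely different from a traditional database. Its main advantages are that it simplifies the management of different types of data, which is cumbersome in relational databases, and that it addresses performance problems in common operations such as sorting and merging.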
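What is NoSQL (technically speaking)?
The names of these NoSQL projects have little in common: Hadoop, Voldemort, Dynomite, and many others.
But they generally share a few traits:
Don't call them databases. Amazon.com's CTO, Werner Vogels, calls the company's influential Dynamo system a "highly available key-value store." Google calls its BigTable a "distributed storage system for managing structured data." As noted in a foreign report previously carried by 51CTO.com, 《雲服務顛覆開發傳統觀念》 ("Cloud Services Overturn Traditional Development Thinking"), Google's BigTable is not a SQL database because some of the features that SQL databases support are hard to partition, which does not fit with the idea of storing data across machines. Both systems are role models for many NoSQL adherents.
They can handle enormous amounts of data. Hypertable, an open-source database that Zvents built on the BigTable model, can write 1 billion cells of data per day in the company's search engine, according to Zvents engineer Doug Judd.
Meanwhile, BigTable, combined with its sister technology MapReduce, can process as much as 20PB of data per day.
"There is no doubt that ever-growing data volumes are pushing people to look for alternatives to databases," said SpringSource's Travis.
They run on clusters of cheap PC servers. PC clusters are easy and cheap to expand, avoiding the complexity and cost of "sharding."
Google has said that one large BigTable cluster can manage as much as 6PB of data across thousands of servers.
"Oracle would tell you that with the right hardware, the right configuration of Oracle RAC, and other magic software, you can achieve the same scalability. But the costs of the two approaches are worlds apart," said Javier Soltero, CTO of SpringSource.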
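They break through performance bottlenecks. NoSQL supporters say that a NoSQL architecture eliminates the time spent translating Web or Java applications and data into a SQL-friendly format, so execution is faster.
"SQL is not a good fit for all code," said database analyst Curt Monash. For data that undergoes heavy, repeated manipulation, SQL is worth the cost. But when the database structure is very simple, SQL may not be of much use.
Raffaele Sena, a senior computer scientist at Adobe, said that when Adobe set out to relaunch its ConnectNow Web collaboration service a year and a half ago, it decided against a relational database for exactly these reasons.
Adobe chose Java clustering software from Terracotta to manage data in Java formats, which Sena said raised ConnectNow's performance to two to three times that of the previous version.
No overkill. NoSQL supporters concede that relational databases offer an unparalleled feature set and a rock-solid record on data integrity, but they also say a company's actual needs may not be that extensive.
Take Adobe's ConnectNow: Sena said that while users are online, it makes three copies of their session data without going through a database and deletes the data after they log off. "So we didn't need a database, because the data we needed was already in memory," he said.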
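Bootstrap support
Because NoSQL projects are open source, they lack formal vendor support. Like most open-source projects, they have to rely on the community for help.
But some admit that without formal support, things going wrong would be frightening, at least in the eyes of many managers.
"We did have to do some selling," admitted Adobe's Sena. "But basically, once they saw our first prototype running well, we were able to convince them that this was the right way to go."
"Most large enterprises are used to relational database management systems. So they ask, why change?" said Monash. MapReduce and similar projects "may be useful for enterprises. But where it is used, it will probably be integrated with an analytic DBMS (database management system)."
Even Oskarsson, the NoSQL organizer, admits that his own company, Last.fm, is not yet ready to switch to a NoSQL alternative and still relies on open-source databases.
He believes this revolution will have to wait for now.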
http://www.computerworld.com/s/article/9135086/No_to_SQL_Anti_database_movement_gains_steam_
No to SQL? Anti-database movement gains steam
But can enterprises take open-source alternatives Hadoop, Voldemort seriously?
By Eric Lai
The meet-up in San Francisco last month had a whiff of revolution about it, like a latter-day techie version of the American Patriots planning the Boston Tea Party.
The inaugural get-together of the burgeoning NoSQL community crammed 150 attendees into a meeting room at CBS Interactive.
Like the Patriots, who rebelled against Britain's heavy taxes, NoSQLers came to share how they had overthrown the tyranny of slow, expensive relational databases in favor of more efficient and cheaper ways of managing data.
"Relational databases give you too much. They force you to twist your object data to fit a RDBMS [relational database management system]," said Jon Travis, principal engineer at Java toolmaker SpringSource, one of the 10
presenters at the NoSQL confab (PDF).
NoSQL-based alternatives "just give you what you need," Travis said.
Open source rises up
The movement's chief champions are Web and Java developers, many of whom learned to get by at their cash-strapped startups without Oracle by building their own data storage solutions, emulating those being built by Google Inc. and Amazon.com Inc., and which they subsequently released as open source.
Now that their open source data stores manage hundreds of terabytes or even petabytes of data for thriving Web 2.0 and cloud computing vendors, switching back is not technically, economically or even ideologically feasible.
"Web 2.0 companies can take chances and they need scalability," said Johan Oskarsson, the London-based organizer of the
NoSQL meeting and, like most of the other attendees, a Web developer (of music streaming site Last.fm). "When you have these two things in combination, it makes [
NoSQL] very compelling."
Many, said Oskarsson, had even dumped the open-source MySQL database, a long-time Web 2.0 favorite, for a NoSQL alternative, because the advantages were too compelling to ignore.
Facebook, for instance, created its Cassandra data store to power a new search feature on its Web site rather than use its existing database, MySQL. According to a presentation by Facebook engineer Avinash Lakshman, Cassandra can write to a data store taking up 50GB on disk in just 0.12 milliseconds, more than 2,500 times faster than MySQL.
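As a quick sanity check on those two figures, the arithmetic below works out the MySQL write latency they imply. The article does not state the MySQL number directly, so the result is only a lower bound derived from "0.12 milliseconds" and "more than 2,500 times"; the variable names are illustrative.

```python
# Figures as reported in the article; the MySQL latency is only implied.
cassandra_write_ms = 0.12   # reported Cassandra write latency, in milliseconds
speedup_factor = 2500       # "more than 2,500 times faster than MySQL"

# Lower bound on the MySQL write latency implied by those two numbers.
implied_mysql_write_ms = cassandra_write_ms * speedup_factor

print(f"Implied MySQL write latency: at least {implied_mysql_write_ms:.0f} ms")
```

In other words, the comparison only holds if a MySQL write against the same 50GB data set took roughly 300 milliseconds or more.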
What is NoSQL (technically speaking)?
The names of these projects are as diverse as they are whimsical: Hadoop, Voldemort, Dynomite, and others.
But they are generally unified by a few things, including:
Don't call them databases. Amazon.com's CTO, Werner Vogels, refers to the company's influential Dynamo system as a "highly available key-value store." Google calls its BigTable, the other role model for many NoSQL adherents, a "distributed storage system for managing structured data."
They can blow through enormous amounts of data. Hypertable, an open-source column-based database modeled upon BigTable, is used by local search engine Zvents Inc. to write 1 billion cells of data per day, according to a presentation by Doug Judd, a Zvents engineer.
Meanwhile, BigTable, in conjunction with its sister technology, MapReduce, processes as much as 20 petabytes of data per day.
"Definitely, the volume of data is getting so huge that people are looking at other technologies," said SpringSource's Travis, whose 'VPork' technology helps
NoSQL users benchmark the performance of their database alternative.
They run on clusters of cheap PC servers. PC clusters can be easily and cheaply expanded without the complexity and cost of "sharding," which involves cutting up databases into multiple tables to run on large clusters or grids.
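For readers unfamiliar with sharding, the sketch below shows one common form of it, hash-based routing done by the application. It is an assumption-laden illustration rather than any vendor's method: `shard_for` is a hypothetical helper, and plain dictionaries stand in for separate database servers.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a stable hash (hypothetical helper)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Each dictionary stands in for a separate database server.
NUM_SHARDS = 4
shards = [dict() for _ in range(NUM_SHARDS)]

for user_id in ["alice", "bob", "carol", "dave"]:
    shards[shard_for(user_id, NUM_SHARDS)][user_id] = {"name": user_id}

# Every read must be routed through the same function as the write.
print(shards[shard_for("alice", NUM_SHARDS)]["alice"])
```

The complexity the article alludes to shows up quickly: the routing lives in application code, queries that span users must visit every shard, and changing `NUM_SHARDS` remaps most keys to different servers.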
Google has said that one of BigTable's bigger clusters manages as much as 6 petabytes of data across thousands of servers.
"Oracle would tell you that with the right degree of hardware and the right configuration of Oracle RAC (Real Application Clusters) and other associated magic software, you can achieve the same scalability. But at what cost?" asks Javier Soltero, CTO of SpringSource.
They beat performance bottlenecks. By sidestepping the time-consuming toil of translating Web or Java apps and data into a SQL-friendly format, NoSQL architectures perform much faster, say proponents.
"SQL is an awkward fit for procedural code, and almost all code is procedural," said Curt Monash, an independent database analyst and blogger. For data upon which users expect to do heavy, repeated manipulations, the cost of mapping data into SQL is "well worth paying ... But when your database structure is very, very simple, SQL may not seem that beneficial."
Raffaele Sena, a senior computer scientist at Adobe Systems Inc., said that when Adobe relaunched its ConnectNow Web collaboration service a year and a half ago, it decided against using a relational database for just the reason raised by Monash.
Adobe uses Java clustering software from Terracotta Inc. to manage data in Java formats, which Sena says is key to boosting ConnectNow's performance two to three times over the prior version.
"The system would have been more complex and harder to develop using a relational database," he said.
Another project, MongoDB, calls itself a "document-oriented" database because of its native storage of object-style data.
No overkill. While conceding that relational databases offer an unparalleled feature set and a rock-solid reputation for data integrity, NoSQL proponents say this can be too much for their needs.
Take Adobe's ConnectNow, which, even without a database, makes three copies of users' session data while they are online -- data that is mostly deleted after logoff, said Sena.
"We didn't need a database since the best representation of the data was already in memory," he said.
Support by bootstrap
Because they are open source, NoSQL alternatives lack vendors offering formal support. That's no deal breaker to most proponents, who are plugged closely into this Silicon Valley-centric community and are thus comfortable with the bootstrap approach.
But some admitted that working without a formal "throat to choke" when things go wrong was scary, at least for their managers.
"We did have to do some selling," admitted Adobe's Sena. "But basically after they saw our first prototype was working, we were able to convince the higher-ups that this was the right way to go."
Despite the huge promise of NoSQL alternatives, most enterprises needn't worry that they are missing out just yet, said Monash.
"Most large enterprises have an established way of doing OLTP [online transaction processing], probably via relational database management systems. Why change?" he said. MapReduce and similar BI-oriented projects "may be useful for enterprises. But where it is, it probably should be integrated into an analytic DBMS [database management system.]"
Even NoSQL's organizer, Oskarsson, admits that his company, Last.fm, has yet to move to a NoSQL alternative for production, instead relying on open-source databases.
He agrees that a revolution, for now, remains on hold.
"It's true that [
NoSQL] aren't relevant right now to mainstream enterprises," Oskarsson said, "but that might change one to two years down the line."