
Interview: As Big Data Rivals Compete, Will Hadoop Stay Open Source?

Published: 2017-01-08 23:35:11 Category: Tutorials Source: Pi Lihua

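  [Commentary] Hortonworks, which grew out of Yahoo, employs many leading Hadoop architects and source-code contributors, who have contributed more than 80% of the source code of the Apache Hadoop project. With so many Hadoop distributions emerging, how does Hortonworks stand out while sticking to its 100% open source approach? For this edition of IT Hall of Fame, we invited Hortonworks CTO Jeff to an exclusive video interview at the 2015 China Hadoop Summit.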




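  Jeff: I looked back at 2014 and covered the major industry events of that year; the whole Hadoop ecosystem has become more mature and more important. On the technical side, I also talked about architecture and SQL on Hadoop solutions. In addition, I forecast how the Hadoop ecosystem will develop in 2015 from the perspective of the open source projects as a whole.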







  PiPi: Well said. When we talk about big data today, we often mention the 3Vs (volume, variety, velocity). How does Hadoop meet these requirements?





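  We at Hortonworks acquired a company called XA Secure, and we contributed a new project, Apache Ranger, to the Apache Software Foundation. This new project combines a set of security features that were brought into the core of the Hadoop project, giving Hadoop distributions a comprehensive security suite. With this suite, whether you store your data in a Hadoop cluster, in a Hive table, or in HDFS, we can use Apache Ranger to keep that data secure.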


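  Jeff: You said that very few people use the Apache distribution directly, and that is true. In fact, when you use the Hortonworks Data Platform, you are using the open source distribution from the Apache Software Foundation. We firmly believe that open source delivers the best value, enables the best innovation, and brings the best technology into the data center. So everything we do revolves around the Apache Software Foundation.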

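  Of course, I also have great respect for the other distributions, with components such as Cloudera Manager and Cloudera Navigator; these projects play a very important role in the open source world. For our part, we have always kept everything open source, preserving the purely open source nature of the whole Hadoop ecosystem. Apart from Hortonworks, no other company has stayed 100% open source.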


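  Jeff: When we take part in development at the Apache Software Foundation and work on the Hadoop core code, we take into account the technology and money that have already been invested in the data center. Whether you use Oracle, SQL Server, or Teradata, what we want is to integrate Hadoop with the existing technology so that the value of that technology is maximized. So my message to CTOs is: use Hadoop in your data center and integrate Hadoop into your products, because it is open source.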


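  Jeff: For individuals, my advice is to go to the official website, Hortonworks.com, and download the Sandbox to try it out. It is a virtual machine that anyone can use; it runs for free on the desktop, supports both Windows and Mac, and can run in either VMware or VirtualBox.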



  PiPi: Hello, Jeff, nice to meet you! You are well known overseas as Hortonworks CTO, but you may not be so familiar to people in China. Could you introduce yourself?

  Jeff Markham: Sure. My name is Jeff Markham. I am the Technical Director for Asia Pacific at Hortonworks, the provider of the only open source Hadoop distribution.

  PiPi: So, what is your presentation at the China Hadoop Summit? Could you share your keynote?

  Jeff Markham: Sure. Today I talked about what happened in 2014 and its importance in the Hadoop ecosystem. We talked a little bit about architecture and a little bit about the SQL on Hadoop solutions, and then we looked at what is coming in 2015 in the Hadoop ecosystem in terms of what is available in the pure open source projects.

  PiPi: When we talk about big data, we think of Hadoop, so people may ask: is big data equal to Hadoop? What do you think of their relationship?

  Jeff Markham: That’s a good question. Some people say Big Data is Hadoop; some people say Big Data is not Hadoop. We of course see the rise in the popularity of Big Data very much in parallel with the rise in the popularity of Hadoop. And the reason for the popularity of Hadoop, the reason for the huge rate of adoption, is that Hadoop is built on a couple of key things. One is that it’s pure open source; two is that it runs on commodity hardware. That means anybody can start downloading and experimenting and finding out new ways to process and analyze their data today, whereas before they were never able to do that. So in my opinion, yes, I think Big Data and Hadoop are so closely related that they can be considered the same thing.

  PiPi: Hadoop has been so popular in recent years that it seems everybody is talking about it. How do you see the future of the Hadoop ecosystem?

  Jeff Markham: Well, I see the rate of adoption only increasing, for the same reasons that I mentioned before. The fact that it’s open source and the fact that it runs on commodity hardware enable any company of any size to start ingesting and analyzing data as they have never been able to do before. So I only see the rate of adoption increasing during this year. What I think is going to be important this year is that we’re going to move away from how fast one distribution versus another can process a certain query, and I think we’re going to start discussing more broad-level use cases.

  How can industries such as the financial sector, manufacturing, and telcos take the data that they have today and use it as a competitive advantage? I don’t think we’re going to have a lot of discussion this year on who does what query 5 seconds faster. I think we’re going to have a bigger discussion on what the overall value of Hadoop is to each individual organization. How can they use it not only to monetize their data, but to gain a better, deeper understanding of their customers, their product, or their service?


