
Ooi Beng Chin 黄铭钧

Databases, Machine Learning and Systems

 
 
 
 
 
 

[Pinned] Biography

2008-12-4 16:05:04

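Ooi Beng Chin (黄铭钧) is a Distinguished Professor of Computer Science at the National University of Singapore and an adjunct Chang Jiang Chair Professor at Zhejiang University. His research interests include machine learning, database performance issues, indexing techniques, big data, multimedia and spatial database processing, in-memory data management, cloud computing and parallel systems, and advanced applications. His research takes real enterprise applications as its starting point and aims to turn the latest research results into practical productivity.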


Apache SINGA Distributed Deep Learning Platform
  

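Deep learning can be seen as a rebranding of neural networks, since it inherits many key algorithmic techniques from neural network research. It has regained attention because of its recent breakthrough successes in image recognition and speech recognition [1, 2]. Two key factors account for this success: a large increase in computing power and a large increase in the amount of training data. Most current open-source deep learning tools and platforms run on a single GPU node, which limits both the size of the model and the size of the training dataset.

Distributed training can support large-scale deep learning and has attracted attention from both academia and industry. Drawing on our experience in building distributed database systems, we developed SINGA (狮子, "lion"), an Apache open-source distributed deep learning platform. SINGA has three characteristics: usability, scalability and extensibility [5, 6, 7, 9]. SINGA's model makes it easy for others to use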

2014-8-20 20:28:40

Cohort Analysis

2016-11-13 15:32:08

Modern Internet applications often produce a large volume of user activity records. Data analysts are interested in cohort analysis [1], or finding unusual user behavioral trends, in these large tables of activity records. Applications include deeper


Raw Data to Analytics Pipeline

2016-9-24 12:39:55

 
Software stack after so many years of systems building


Database Meets Deep Learning: Challenges and Opportunities

2016-2-24 0:05:40


Deep learning is one of the most popular topics in computer science in recent years. It has boosted many complex data-driven applications such as image classification and speech recognition. The database community has worked on data-driven applications for many years. However, databases and deep learning are different in terms of techniques and applications. In [1], we discuss research


Healthcare Analytics

2015-6-6 13:20:19


As in other application domains, doctors and medical specialists are now interested in exploiting healthcare data for better healthcare and disease prevention, and for better utilization of resources. However, although the data is deposited in a central, possibly national-scale, storage system, it is owned by different healthcare providers who guard it zealously because of its commercial value


In-Memory Big Data Management and Processing: A Survey (TKDE Open Access)

2014-11-28 10:02:52


 Figure 1. Landscape of Modern Database Systems

Growing


System Architecture Driven by Hardware

2014-11-28 9:42:50

The advancement of hardware is no doubt phenomenal, and it has either invalidated existing design principles or forced redesigns that exploit the hardware for speed. As is often said, speed is not an option but a must in business, and so the tenet "capacity as data, speed as memory and price as disk" is always being reinforced. For example, with more cores being squeezed into a chip,


Big Data on Small Nodes

2014-11-20 15:57:40

The continuous increase in volume, variety and velocity of Big Data exposes datacenter resource scaling to an energy utilization problem. Traditionally, datacenters employ x86-64 (big) servers with power usage of tens to hundreds of Watts. But lately, low-power (small) systems originally developed for mobile devices have seen significant improvements in performance. These improvements could lead


LADS: Exploiting Single-Threaded Model in Multi-Core Systems

2014-7-24 10:52:36

The widely adopted single-threaded OLTP model assigns a single thread to each static partition of the database for processing transactions in a partition. This simplifies concurrency control while retaining parallelism. However, it suffers performance loss arising from skewed workloads as well as transactions that span multiple partitions. In this paper, we present a dynamic single-threaded in-memory
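The static single-threaded model described at the start of this excerpt can be sketched roughly as follows. This is a minimal illustration and not the LADS design: the class names `Partition` and `PartitionedStore`, the `deposit` helper, and the CRC32-based routing are all assumptions made for the example, and Python threads are used only to show the ownership structure, not to obtain real multi-core speedup.

```python
import queue
import threading
import zlib
from typing import Callable, Dict, List, Optional

# A transaction is modelled as a function that mutates one partition's data.
Txn = Callable[[Dict[str, int]], None]


class Partition:
    """One static partition, owned by exactly one worker thread (hypothetical)."""

    def __init__(self) -> None:
        self.data: Dict[str, int] = {}
        self.inbox: "queue.Queue[Optional[Txn]]" = queue.Queue()
        self.worker = threading.Thread(target=self._run)
        self.worker.start()

    def _run(self) -> None:
        while True:
            txn = self.inbox.get()
            if txn is None:  # shutdown signal
                return
            # Only this thread ever touches self.data, so transactions run
            # serially and need no locks or latches.
            txn(self.data)


class PartitionedStore:
    """Routes each single-key transaction to the partition that owns the key."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: List[Partition] = [Partition() for _ in range(num_partitions)]

    def _route(self, key: str) -> Partition:
        # Static assignment (assumed hash routing): a key always maps to
        # the same partition.
        index = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        return self.partitions[index]

    def submit(self, key: str, txn: Txn) -> None:
        self._route(key).inbox.put(txn)

    def shutdown(self) -> None:
        """Let every worker finish its queued transactions, then stop it."""
        for p in self.partitions:
            p.inbox.put(None)
        for p in self.partitions:
            p.worker.join()

    def read(self, key: str) -> int:
        # Safe only after shutdown(), when no worker is running.
        return self._route(key).data.get(key, 0)


def deposit(key: str, amount: int) -> Txn:
    """Build a single-partition transaction that adds `amount` to `key`."""
    def txn(data: Dict[str, int]) -> None:
        data[key] = data.get(key, 0) + amount
    return txn


if __name__ == "__main__":
    store = PartitionedStore(num_partitions=4)
    for i in range(1000):
        account = "acct-%d" % (i % 10)
        store.submit(account, deposit(account, 1))
    store.shutdown()
    print({("acct-%d" % k): store.read("acct-%d" % k) for k in range(10)})
```

The sketch also makes the two weaknesses named above visible: if most keys route to one partition, a single worker does nearly all the work, and a transaction touching keys in two partitions would need coordination between workers, which this code does not attempt.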


MemepiC -- In-Memory Data Management

2014-7-17 21:24:24

 

In-memory databases have gained a lot of traction in recent years due to a sustained drop in memory cost and increases in memory size and speed. Indeed, as was said many years ago: "memory is the new disk, disk is the new tape". Figure 1 shows the development of database systems in recent years.


 

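With the sharp growth in data volume and the increasingly urgent need for complex analytics, distributed systems must keep adding nodes and scaling out to cope with the huge workload. Clearly, more nodes inevitably mean more frequent node failures, so an effective failure recovery strategy is very important for a distributed system. Existing distributed systems use two common failure recovery strategies: checkpoint-based recovery and confined recovery. Recently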

2013-12-15 10:01:50

In-Memory Data Management Analysis: Memcached vs Redis vs RDD

2013-12-3 9:45:38

 

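epiC was originally designed as a disk-based distributed data management and processing platform. Recently we have been exploring several ways to make epiC more "in-memory", keeping as much data as possible resident in memory so that it need not be written back to disk after every processing step.

We set out to analyze the performance of systems suited to in-memory data management (two popular systems and one emerging system): Memcached, Redis and Resilient Distributed Dataset (RDD). A comprehensive performance analysis of data analytics operations and fine-grained object operations (such as set/get) shows that none of the systems supports both classes of operations efficiently. For Memcached and Redis, the bottleneck lies in the CPU and I/O cost of the TCP protocol, even when in-memory objects are accessed on the same machine. RDD relies on sequential scans and cannot efficiently support random object lookups, so memory reads and writes become the bottleneck. Our analysis also reveals a set of features required for efficient in-memory data management [1].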
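The difference between fine-grained object operations and scan-based access can be illustrated with a small in-process sketch. It is only a toy model under stated assumptions: `HashStore` and `ScanDataset` are hypothetical classes written for this example, they do not use the Memcached, Redis or Spark APIs, and the sketch does not model the TCP overhead mentioned above. It only shows why a point lookup on a scan-oriented dataset has to touch many records.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[str, int]


class HashStore:
    """Toy key-value store (hypothetical): set/get are single hash lookups."""

    def __init__(self) -> None:
        self._data: Dict[str, int] = {}

    def set(self, key: str, value: int) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[int]:
        return self._data.get(key)


class ScanDataset:
    """Toy partitioned dataset (hypothetical): every operation is a scan."""

    def __init__(self, records: List[Record], num_partitions: int) -> None:
        # Round-robin split of the records into partitions.
        self._partitions = [records[i::num_partitions] for i in range(num_partitions)]

    def total(self) -> int:
        # An analytics-style operation: one sequential pass over all records.
        return sum(value for part in self._partitions for _, value in part)

    def lookup(self, key: str) -> Tuple[Optional[int], int]:
        """Find one key by scanning; returns (value, records_scanned)."""
        scanned = 0
        for part in self._partitions:
            for k, value in part:
                scanned += 1
                if k == key:
                    return value, scanned
        return None, scanned


if __name__ == "__main__":
    records = [("k%d" % i, i) for i in range(1000)]

    store = HashStore()
    for k, v in records:
        store.set(k, v)
    print("hash get:", store.get("k999"))

    dataset = ScanDataset(records, num_partitions=4)
    value, scanned = dataset.lookup("k999")
    print("scan lookup:", value, "after scanning", scanned, "records")
    print("scan total:", dataset.total())
```

In this toy setting the aggregate is a natural fit for the scan-based structure, while the single-key lookup degenerates into a pass over the data, which mirrors the trade-off the analysis reports.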


Database Self-Assessment Meeting at Beckman Center (13-14 Oct)

2013-10-15 2:11:45

I attended the meeting at Beckman Center, UCI. I presented my 5-minute pitch on "Contextual Crowd Intelligence" [1]. It is about engaging and exploiting real users who are subject matter experts as HIT workers and reviewers for improving the accuracy of analytics and the usability of database applications. Take the healthcare system, for example: medical specialists help provide the


CIIDAA [1] is a large-scale system project that was initiated late last year. The project has six research fellows, six research assistants, and eight graduate students working with some 15 faculty members (including visiting professors) from various research areas, such as system performance, architecture, security, programming language, software engineering, database

2013-9-18 18:00:36
