
Ooi Beng Chin 黄铭钧

Databases, Machine Learning and Systems

 
 
 
 
 
 

[Pinned] Co-Space (互空间) / CoSpace / Co(existing) Space

2009-2-19 16:58:22 | Reads: 830 | Comments: 0

We wrote [1] in 2009, and Pokémon Go now exemplifies the co-space game. Soon, many VR applications and industrial operations and manufacturing processes will be co-space in nature. (23/07/2016)

 

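Traditionally, the real world (physical space) and the computer-realized virtual world (virtual space) have been independent of each other, and users cannot operate or communicate across the two spaces. However, ubiquitous computing, intelligent interfaces (smart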


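[Pinned] Profile

2008-12-4 16:05:04 | Reads: 2425 | Comments: 0

Ooi Beng Chin (黄铭钧) is a Distinguished Professor of Computer Science at the National University of Singapore and an adjunct Chang Jiang Chair Professor at Zhejiang University. His research interests include database performance issues, indexing techniques, big data, multimedia and spatial database processing, in-memory data management, cloud computing, and the research and advanced applications of parallel systems. His research takes real enterprise applications as its starting point and aims to turn the latest research results into practical productivity.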

    


Apache SINGA: A Distributed Deep Learning Platform
  

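Deep learning can be seen as a rebranding of neural networks, since it inherits many of the key algorithmic techniques from neural network research. It has regained attention because of its recent breakthrough successes in image recognition and speech recognition [1, 2]. Two key factors account for this success: the large increase in computing power and the large growth in training data. Most current open-source deep learning tools and platforms run on a single GPU node, which limits both the size of the model and the size of the training dataset.

Distributed training helps scale deep learning to large models and datasets, and it has attracted attention from both academia and industry. Drawing on our experience in building distributed database systems, we developed SINGA ("lion"), an Apache open-source distributed deep learning platform. SINGA has three key properties: usability, scalability, and extensibility [5, 6, 7]. SINGA's model is easy for others to use,

2014-8-20 20:28:40 | Reads: 2288 | Comments: 0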
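To make the idea of distributed training concrete, here is a minimal sketch of synchronous data-parallel SGD, simulated in a single process. It is not SINGA's API or architecture; the function names, the toy linear model, and the hyperparameters are all assumptions chosen for illustration. Each simulated worker computes a gradient on its own shard of the data, and the averaged gradient updates the shared parameters.

```python
import random

def shard_gradient(w, b, shard):
    """Mean-squared-error gradient of y ~ w*x + b over one worker's shard."""
    n = len(shard)
    gw = sum(2 * (w * x + b - y) * x for x, y in shard) / n
    gb = sum(2 * (w * x + b - y) for x, y in shard) / n
    return gw, gb

def train_data_parallel(data, num_workers=4, lr=0.05, steps=500):
    # Split the training set into equal-sized shards, one per simulated worker.
    shards = [data[i::num_workers] for i in range(num_workers)]
    w, b = 0.0, 0.0
    for _ in range(steps):
        # Each worker computes a local gradient (sequentially here; in a real
        # cluster these would run in parallel on different nodes).
        grads = [shard_gradient(w, b, s) for s in shards]
        # Synchronous step: average the local gradients, then update once.
        gw = sum(g[0] for g in grads) / num_workers
        gb = sum(g[1] for g in grads) / num_workers
        w -= lr * gw
        b -= lr * gb
    return w, b

random.seed(0)
# Noise-free toy data generated from y = 3x + 1 (hypothetical example data).
data = [(x, 3.0 * x + 1.0) for x in (random.uniform(-1, 1) for _ in range(200))]
w, b = train_data_parallel(data)
print(round(w, 3), round(b, 3))
```

Because the shards are the same size, the averaged gradient equals the full-batch gradient, so this variant behaves like single-node training while spreading the per-step work across workers.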

Raw Data to Analytics Pipeline

2016-9-24 12:39:55 | Reads: 45 | Comments: 0

Software stack after so many years of systems building


Database Meets Deep Learning: Challenges and Opportunities

2016-2-24 0:05:40 | Reads: 171 | Comments: 0

Deep learning has been one of the most popular topics in computer science in recent years. It has boosted many complex data-driven applications such as image classification and speech recognition. The database community has worked on data-driven applications for many years. However, databases and deep learning differ in their techniques and applications. In [1], we discuss research


UStore: A "Gitable" Data Store

2016-1-10 23:04:35 | Reads: 156 | Comments: 0

Today's storage systems expose abstractions that are either so low-level (e.g., key-value store, raw-block store) that developers must reinvent the wheel, or so high-level (e.g., relational databases, GitHub) that they are not general enough to support many different applications.

We propose to build a new distributed Universal data storage system, called UStore,


Healthcare Analytics

2015-6-6 13:20:19 | Reads: 185 | Comments: 0

As in other application domains, doctors and medical specialists are now interested in exploiting healthcare data for better healthcare and disease prevention, and for better utilization of resources. However, although the data is deposited in a central, possibly national-scale, storage system, it is owned by different healthcare providers who will guard the data zealously,


In-Memory Big Data Management and Processing: A Survey (TKDE Open Access)

2014-11-28 10:02:52 | Reads: 792 | Comments: 0


 Figure 1. Landscape of Modern Database Systems

Growing


System Architecture Driven by Hardware

2014-11-28 9:42:50 | Reads: 315 | Comments: 0

The advancement of hardware has been phenomenal, and it has either invalidated existing design principles or prompted redesigns that exploit the hardware for speed. As is often said, speed is not an option but a must in business, and so the tenet "capacity as data, speed as memory and price as disk" is continually reinforced. For example, with more cores being squeezed into a chip,


Big Data on Small Nodes

2014-11-20 15:57:40 | Reads: 350 | Comments: 0

The continuous increase in volume, variety and velocity of Big Data exposes datacenter resource scaling to an energy utilization problem. Traditionally, datacenters employ x86-64 (big) servers with power usage of tens to hundreds of Watts. But lately, low-power (small) systems originally developed for mobile devices have seen significant improvements in performance. These improvements could lead


LADS: Exploiting Single-Threaded Model in Multi-Core Systems

2014-7-24 10:52:36 | Reads: 564 | Comments: 0

The widely adopted single-threaded OLTP model assigns a single thread to each static partition of the database for processing transactions in a partition. This simplifies concurrency control while retaining parallelism. However, it suffers performance loss arising from skewed workloads as well as transactions that span multiple partitions. In this paper, we present a dynamic single-threaded in-memory

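The static single-threaded model that the LADS excerpt takes as its starting point can be sketched roughly as follows. This is not LADS itself, only the baseline it improves on, and the class and function names are hypothetical. Each partition is owned by exactly one worker thread, so transactions within a partition run serially and need no locks.

```python
import queue
import threading

class PartitionWorker(threading.Thread):
    """Owns one static partition; the only thread that touches its data."""

    def __init__(self):
        super().__init__(daemon=True)
        self.store = {}
        self.inbox = queue.Queue()

    def run(self):
        while True:
            txn = self.inbox.get()
            if txn is None:      # shutdown signal
                break
            txn(self.store)      # runs serially, so no locks are needed

class PartitionedDB:
    def __init__(self, num_partitions=4):
        self.workers = [PartitionWorker() for _ in range(num_partitions)]
        for wkr in self.workers:
            wkr.start()

    def submit(self, key, txn):
        # Static partitioning: a key always maps to the same worker.
        self.workers[hash(key) % len(self.workers)].inbox.put(txn)

    def shutdown(self):
        for wkr in self.workers:
            wkr.inbox.put(None)
        for wkr in self.workers:
            wkr.join()

def make_increment(key):
    """Build a single-partition transaction that increments one counter."""
    def txn(store):
        store[key] = store.get(key, 0) + 1
    return txn

db = PartitionedDB(num_partitions=4)
for i in range(1000):
    db.submit(i % 10, make_increment(i % 10))
db.shutdown()

# Merge the per-partition stores to inspect the final counters.
totals = {}
for wkr in db.workers:
    totals.update(wkr.store)
print(totals)
```

The sketch only handles transactions that touch a single partition. A skewed workload would pile work onto one inbox while other threads sit idle, and a transaction spanning several partitions would need coordination between workers, which are the two weaknesses the excerpt points out.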

MemepiC -- In-Memory Data Management

2014-7-17 21:24:24 | Reads: 550 | Comments: 0

 

In-memory databases have gained a lot of traction in recent years due to the sustained drop in memory cost and the increase in memory size and speed. Indeed, as was said many years ago: "memory is the new disk, disk is the new tape". Figure 1 shows the development of database systems in recent years.


R-Store: A distributed system for supporting real time analytics (RTOLAP)

2014-5-11 20:30:24 | Reads: 556 | Comments: 0

R-Store [1] is a scalable distributed system for supporting real-time OLAP by extending the MapReduce framework. We extend an open-source distributed key/value system, HBase, as the underlying storage system that stores the data cube and real-time data. When real-time data are updated,


Data Sensitive Hashing (DSH) for High-Dimensional KNN Search

2014-2-9 19:38:55 | Reads: 1014 | Comments: 0

Locating the K nearest data points with respect to a given query point in high-dimensional space is a basic operation in many applications. Tree-structure-based approaches do not scale with dimensionality due to the curse of dimensionality.

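For reference, the KNN operation itself can be written as an exact linear scan. This is not DSH or any hashing scheme, only the brute-force baseline whose per-query cost grows with both the number of points and the dimensionality; the function name and sample points are made up for illustration.

```python
import heapq
import math

def knn_linear_scan(points, query, k):
    """Return the k points closest to `query` by Euclidean distance."""
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, query))

points = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (0.1, 0.0, 0.2), (5.0, 5.0, 5.0)]
nearest = knn_linear_scan(points, (0.0, 0.0, 0.1), k=2)
print(nearest)  # [(0.0, 0.0, 0.0), (0.1, 0.0, 0.2)]
```

Every query here touches every point, which is what index structures and hashing-based approaches try to avoid.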

 

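As data volumes surge and the need for complex analytics on the data grows more pressing, distributed systems must keep adding nodes and scaling out to cope with the huge workload. Clearly, increasing the number of nodes inevitably leads to more frequent node failures, so an effective failure recovery strategy is very important for distributed systems. Existing distributed systems use two common failure recovery strategies: checkpoint-based recovery and confined recovery. Recently

2013-12-15 10:01:50 | Reads: 784 | Comments: 0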
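Checkpoint-based recovery, the first of the two strategies named above, can be sketched as follows. This is a single-process toy under stated assumptions: the checkpoint is kept in memory rather than on stable storage, the input can be replayed from any position, and the class name is hypothetical. It does not cover confined recovery.

```python
import copy

class CheckpointedCounter:
    """Word-count style state that is checkpointed every `interval` records."""

    def __init__(self, interval):
        self.interval = interval
        self.state = {}
        self.processed = 0
        self.checkpoint = (0, {})  # (records processed, saved state)

    def process(self, record):
        self.state[record] = self.state.get(record, 0) + 1
        self.processed += 1
        if self.processed % self.interval == 0:
            # Save a full copy of the state together with the input position.
            self.checkpoint = (self.processed, copy.deepcopy(self.state))

    def recover(self):
        """Roll back to the last checkpoint; return the index to replay from."""
        self.processed, saved = self.checkpoint
        self.state = copy.deepcopy(saved)
        return self.processed

records = ["a", "b", "a", "c", "b", "a", "c"]
node = CheckpointedCounter(interval=3)
for r in records[:5]:
    node.process(r)

node.state.clear()            # simulate a crash that loses in-memory state
resume_at = node.recover()    # restore the checkpoint taken after 3 records
for r in records[resume_at:]:
    node.process(r)           # replay everything after the checkpoint
print(resume_at, node.state)
```

The trade-off is visible even at this scale: a shorter interval means less work to replay after a failure but more time spent copying state during normal operation.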
