
Ooi Beng Chin 黄铭钧

Databases, Machine Learning and Systems


[Pinned] Co-Space / CoSpace / Co(existing) Space (互空间)

2009-2-19 16:58:22 | Views: 815 | Comments: 0

We wrote [1] in 2009, and Pokémon Go now exemplifies the co-space game. Soon, many VR applications and industrial operations/manufacturing will be co-space in nature. 23/07/2016.


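Traditionally, the physical space (the real world) and the virtual space realized by computers have been independent of each other. Users cannot operate or communicate across the two spaces. However, ubiquitous computing, smart interfaces (smart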


[Pinned] Profile

2008-12-4 16:05:04 | Views: 2385 | Comments: 0




Apache SINGA: A Distributed Deep Learning Platform

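Deep learning can be regarded as a rebranding of neural networks, since it inherits many key algorithmic techniques from neural network research. It has regained attention because of its recent breakthroughs in image recognition and speech recognition [1, 2]. Two key factors underlie this success: a large increase in computing power and a large increase in the volume of training data. Most current open-source deep learning tools and platforms run on a single GPU node, which limits both the size of the model and the size of the training dataset.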

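Distributed training helps scale deep learning to large models and datasets, and it has drawn attention from both academia and industry. Drawing on our experience in building distributed database systems, we developed SINGA (狮子, "lion"), an Apache open-source distributed deep learning platform. SINGA has three characteristics: usability, scalability and extensibility [5, 6, 7]. SINGA's model is easy for others to use, just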

2014-8-20 20:28:40 | Views: 2217 | Comments: 0

Database Meets Deep Learning: Challenges and Opportunities

2016-2-24 0:05:40 | Views: 153 | Comments: 0

Deep learning has been one of the most popular topics in computer science in recent years. It has boosted many complex data-driven applications such as image classification and speech recognition. The database community has worked on data-driven applications for many years. However, databases and deep learning are different in terms of techniques and applications. In [1], we discuss research


UStore: A "Gitable" Data Store

2016-1-10 23:04:35 | Views: 134 | Comments: 0

Today's storage systems expose abstractions that are either so low-level (e.g., key-value stores, raw-block stores) that they require developers to reinvent the wheel, or so high-level (e.g., relational databases, GitHub) that they are not general enough to support many different applications.

We propose to build a new distributed Universal data storage system, called UStore,


Healthcare Analytics

2015-6-6 13:20:19 | Views: 182 | Comments: 0

As in other application domains, doctors and medical specialists are now interested in exploiting healthcare data for better healthcare and disease prevention, and for better utilization of resources. However, although the data is deposited in central, possibly national-scale, storage systems, it is owned by different healthcare providers who will guard the data zealously,


In-Memory Big Data Management and Processing: A Survey (TKDE Open Access)

2014-11-28 10:02:52 | Views: 753 | Comments: 0


 Figure 1. Landscape of Modern Database Systems



System Architecture Driven by Hardware

2014-11-28 9:42:50 | Views: 310 | Comments: 0

The advancement of hardware is no doubt phenomenal, and it has either invalidated existing design principles or forced redesigns that exploit the hardware for speed. As is often said, speed is not an option but a must in business, and so the tenet "capacity as data, speed as memory and price as disk" is always being reinforced. For example, with more cores being squeezed into a chip,


Big Data on Small Nodes

2014-11-20 15:57:40 | Views: 337 | Comments: 0

The continuous increase in volume, variety and velocity of Big Data exposes datacenter resource scaling to an energy utilization problem. Traditionally, datacenters employ x86-64 (big) servers with power usage of tens to hundreds of Watts. But lately, low-power (small) systems originally developed for mobile devices have seen significant improvements in performance. These improvements could lead


LADS: Exploiting Single-Threaded Model in Multi-Core Systems

2014-7-24 10:52:36 | Views: 537 | Comments: 0

The widely adopted single-threaded OLTP model assigns a single thread to each static partition of the database for processing transactions in a partition. This simplifies concurrency control while retaining parallelism. However, it suffers performance loss arising from skewed workloads as well as transactions that span multiple partitions. In this paper, we present a dynamic single-threaded in-memory


MemepiC -- In-Memory Data Management

2014-7-17 21:24:24 | Views: 521 | Comments: 0


In-memory databases have gained a lot of traction in recent years due to the sustained drop in memory cost and the increase in memory size and speed. Indeed, as was said many years ago: "memory is the new disk, disk is the new tape". Figure 1 shows the development of database systems in recent years.


R-Store: A distributed system for supporting real time analytics (RTOLAP)

2014-5-11 20:30:24 | Views: 553 | Comments: 0

R-Store [1] is a scalable distributed system that supports real-time OLAP by extending the MapReduce framework. We extend an open-source distributed key/value system, HBase, as the underlying storage system that stores the data cube and real-time data. When real-time data are updated,


Data Sensitive Hashing (DSH) for High-Dimensional KNN Search

2014-2-9 19:38:55 | Views: 989 | Comments: 0

Locating the K nearest data points with respect to a given query point in high-dimensional space is a basic operation in many applications. Tree-structure-based approaches do not scale with dimensionality because of the curse of dimensionality.
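To make the KNN operation concrete, here is a minimal sketch of the exact linear-scan baseline, which computes the distance from the query to every point. It is not DSH and not the authors' method; the function name `knn_linear_scan` and the sample points are assumptions made for illustration only.

```python
import heapq
import math


def knn_linear_scan(points, query, k):
    """Return the k points closest to `query` by Euclidean distance.

    Exact baseline: every point is compared against the query, so the
    cost grows linearly with both the number of points and the dimension.
    """
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, query))


# Hypothetical 3-dimensional sample data.
points = [
    (0.0, 0.0, 0.0),
    (1.0, 1.0, 1.0),
    (5.0, 5.0, 5.0),
    (0.5, 0.0, 0.0),
]
nearest = knn_linear_scan(points, (0.0, 0.0, 0.1), k=2)
print(nearest)
```

Hashing-based approaches aim to avoid this full scan by examining only candidates that fall into the same buckets as the query, trading exactness for speed.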



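As data volumes surge and the need for complex analytics grows more urgent, distributed systems must keep adding nodes and scaling out to handle huge workloads. Clearly, a larger number of nodes inevitably leads to more frequent node failures. An effective failure recovery strategy is therefore very important for a distributed system. Existing distributed systems use two common failure recovery strategies: checkpoint-based recovery and confined recovery. Recently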

2013-12-15 10:01:50 | Views: 782 | Comments: 0

An Anatomy of In-Memory Data Management: Memcached vs Redis vs RDD

2013-12-3 9:45:38 | Views: 590 | Comments: 0



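We set out to analyze the performance of systems suited to in-memory data management (two popular systems and one emerging system): Memcached, Redis and Resilient Distributed Dataset (RDD). Through a comprehensive performance analysis of data analytics operations and fine-grained object operations (e.g., set/get), the results show that none of the systems supports both types of operations efficiently. For Memcached and Redis, the bottleneck lies in the CPU and I/O cost of the TCP protocol, even when in-memory objects are accessed on the same machine; RDD relies on sequential scans and cannot efficiently support random object lookups, so memory reads and writes become the bottleneck. Our analysis also reveals a set of features required for efficient in-memory data management [1].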
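The two access patterns can be contrasted with a toy sketch: a hash-indexed store answers a get directly, while a scan-only collection must touch records one by one to find a single object, although the same scan serves an aggregate well. This is an illustration under stated assumptions, not a model of Memcached, Redis or RDD: it runs in one process, so it does not capture the TCP overhead discussed above, and the names `kv_get`, `scan_get` and `scan_sum` are hypothetical.

```python
# Hypothetical data set: 1000 (key, value) records held in memory.
records = [(f"user:{i}", i * 10) for i in range(1000)]

# Key-value style: a hash index answers get(key) without scanning.
kv_store = dict(records)


def kv_get(key):
    """Fine-grained object lookup through the hash index."""
    return kv_store.get(key)


def scan_get(key):
    """Lookup by sequential scan; also reports how many records were touched."""
    touched = 0
    for k, v in records:
        touched += 1
        if k == key:
            return v, touched
    return None, touched


def scan_sum():
    """Analytics-style aggregate: one sequential pass over all values."""
    return sum(v for _, v in records)


print(kv_get("user:999"))    # direct lookup
print(scan_get("user:999"))  # value plus number of records touched
print(scan_sum())            # aggregate over the whole data set
```

In this sketch the scan has to touch all 1000 records to find the last key, whereas the aggregate needs that full pass anyway, which is why a scan-oriented design suits analytics but not random object lookups.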




Overseas, Singapore
