Ahmed Akakzia

November 20, 2020

How to Motivate Embodied Machines?

By Ahmed Akakzia

Reinforcement learning is a mathematical framework for training embodied machines to fulfill a particular task. This task is usually predefined by the designer and modelled as a Markov Decision Process, whose reward function largely dictates the learned behavior. But can these embodied machines learn on their own, without this external reward signal? In this blog post, we present an accessible introduction to the idea of intrinsically motivated embodied machines.

May 23, 2019

Comparing Multi-task and Meta Reinforcement Learning

By Ahmed Akakzia

When tasks and goals are not known in advance, an agent may use either multi-task learning or meta reinforcement learning to transfer knowledge from what it has learned before. Recently, goal-conditioned policies and hindsight experience replay have become standard tools for addressing transfer between goals in the multi-task learning setting. In this blog post, I show that these tools can also be imported into meta reinforcement learning when one wants to address transfer between tasks and between goals at the same time. More importantly, I compare the gradients computed by MAML, a state-of-the-art meta-learning algorithm, with the gradients in classic multi-task learning setups.
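
To make the gradient comparison concrete, here is a minimal sketch, not the post's own derivation. It assumes hypothetical 1-D quadratic task losses L_i(theta) = 0.5 * a_i * (theta - c_i)^2 and a single inner gradient step of size alpha, the standard one-step form of MAML. The multi-task gradient differentiates the summed losses at the shared parameter theta. The MAML gradient differentiates each loss at the adapted parameter theta_i' = theta - alpha * dL_i/dtheta, so the chain rule adds a factor (1 - alpha * a_i), the 1-D version of MAML's second-order term. The names `QuadraticTask`, `multitask_grad` and `maml_grad` are illustrative only.

```python
# Toy comparison of multi-task vs one-step MAML gradients.
# Assumption (illustration only): each task i has a 1-D quadratic loss
#   L_i(theta) = 0.5 * a_i * (theta - c_i) ** 2
from dataclasses import dataclass


@dataclass
class QuadraticTask:
    a: float  # curvature of the task loss (its second derivative)
    c: float  # location of the task optimum

    def loss(self, theta: float) -> float:
        return 0.5 * self.a * (theta - self.c) ** 2

    def grad(self, theta: float) -> float:
        return self.a * (theta - self.c)


def multitask_grad(tasks, theta):
    """Gradient of sum_i L_i(theta): every task is evaluated at the shared theta."""
    return sum(t.grad(theta) for t in tasks)


def maml_grad(tasks, theta, alpha):
    """Gradient of sum_i L_i(theta_i'), with theta_i' = theta - alpha * L_i'(theta)."""
    total = 0.0
    for t in tasks:
        adapted = theta - alpha * t.grad(theta)  # inner-loop adaptation step
        # Chain rule: d(theta_i')/d(theta) = 1 - alpha * a_i (second-order term)
        total += t.grad(adapted) * (1.0 - alpha * t.a)
    return total


tasks = [QuadraticTask(a=1.0, c=-1.0), QuadraticTask(a=2.0, c=3.0)]
theta, alpha = 0.5, 0.1
print("multi-task gradient:", multitask_grad(tasks, theta))
print("MAML gradient:      ", maml_grad(tasks, theta, alpha))
```

Setting `alpha=0` makes the adaptation step vanish and recovers the multi-task gradient exactly. Any difference between the two therefore comes from evaluating each task after adaptation and from the curvature-dependent factor.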

March 10, 2019

From CDN to P2P: a Brief Dataset Analysis for Video Streaming

By Ahmed Akakzia

Recently, peer-to-peer (P2P) technology has attracted many “shifters” who chose to leave content distribution networks (CDNs). The main reason is the self-scalability of P2P systems, which stems from their principles of communal collaboration and resource sharing. In a P2P CDN, peers collaborate to distribute the content of under-provisioned websites and to serve queries from large audiences on behalf of those websites. When designing a P2P CDN, the main challenge is to maintain an acceptable level of performance, in terms of client-perceived latency and hit ratio, while minimizing the incurred overhead.