QLBS: Q-Learner in the Black-Scholes(-Merton) Worlds

Igor Halperin
The Journal of Derivatives Fall 2020, 28 (1) 99-122; DOI: https://doi.org/10.3905/jod.2020.1.108
Igor Halperin is an AI research associate at Fidelity Investments and a research professor in the Tandon School of Engineering at New York University in Brooklyn, NY.
Abstract

This article presents a discrete-time option pricing model that is rooted in reinforcement learning (RL), and more specifically in the famous Q-Learning method of RL. We construct a risk-adjusted Markov Decision Process for a discrete-time version of the classical Black-Scholes-Merton (BSM) model, where the option price is an optimal Q-function, while the optimal hedge is a second argument of this optimal Q-function, so that both the price and hedge are parts of the same formula. Pricing is done by learning to dynamically optimize risk-adjusted returns for an option replicating portfolio, as in Markowitz portfolio theory. Using Q-Learning and related methods, once created in a parametric setting, the model can go model-free and learn to price and hedge an option directly from data, without an explicit model of the world. This suggests that RL may provide efficient data-driven and model-free methods for the optimal pricing and hedging of options. Conversely, once we depart from the academic continuous-time limit, option pricing methods developed in Mathematical Finance may be viewed as special cases of model-based reinforcement learning. Further, due to the simplicity and tractability of our model, which only needs basic linear algebra (plus Monte Carlo simulation, if we work with synthetic data), and its close relationship to the original BSM model, we suggest that our model could be used in the benchmarking of different RL algorithms for financial trading applications.
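The abstract's core idea, optimizing a Markowitz-style risk-adjusted return for the hedged portfolio, can be illustrated at a single hedging step. The sketch below is not the paper's full algorithm; the parameters, the grid search, and the quadratic reward E[w] − λ·Var[w] are illustrative assumptions chosen to show that maximizing the risk-adjusted reward recovers the variance-minimizing (least-squares) hedge when the stock has zero drift.

```python
# One-step illustration (not the paper's full QLBS algorithm): choose a hedge
# ratio a by maximizing a Markowitz-style risk-adjusted reward E[w] - lam*Var[w],
# where w = a*dS - dC is the one-step P&L of a short call hedged with a shares.
# All numbers below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
S0, K, sigma, dt, lam = 100.0, 100.0, 0.2, 1.0 / 52, 1.0
n_paths = 100_000

# Simulate one hedging step of a zero-drift lognormal stock (a discrete BSM world).
z = rng.standard_normal(n_paths)
S1 = S0 * np.exp(-0.5 * sigma**2 * dt + sigma * np.sqrt(dt) * z)
dS = S1 - S0
dC = np.maximum(S1 - K, 0.0) - np.maximum(S0 - K, 0.0)  # option expires at t1

def reward(a):
    w = a * dS - dC                   # hedged P&L of the short call position
    return w.mean() - lam * w.var()   # quadratic (mean-variance) reward

grid = np.linspace(0.0, 1.0, 201)
a_star = grid[np.argmax([reward(a) for a in grid])]

# With zero drift, the maximizer is the variance-minimizing hedge
# a* = Cov(dC, dS) / Var(dS) -- itself a one-variable least-squares slope.
a_ls = np.cov(dC, dS)[0, 1] / dS.var()
print(a_star, a_ls)
```

For an at-the-money call one week from expiry, both estimates land near the Black-Scholes delta of roughly one half, making the point that the risk-adjusted reward maximization and the least-squares hedge are two views of the same computation.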

TOPICS: Derivatives, options

Key Findings

  • Reinforcement learning (RL) offers the most natural way to price and hedge options that relies directly on data rather than on a specific model of asset pricing.

  • The discrete-time RL approach to option pricing generalizes classical continuous-time methods; it tracks mis-hedging risk, which disappears in the formal continuous-time limit, and provides a consistent framework for using options for both hedging and speculation.

  • A simple quadratic reward function, combined with the Q-learning method of RL, presents a minimal extension of the classical Black-Scholes framework and gives rise to a particularly simple computational scheme in which option pricing and hedging are semianalytical, amounting to multiple uses of a conventional least-squares regression.
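The "multiple uses of least-squares regression" scheme mentioned in the last key finding can be sketched in the zero-risk-aversion, zero-rate limit: price an option by backward induction over Monte Carlo paths, learning a state-dependent hedge a_t(S) at each step from a cross-sectional least-squares fit with simple basis functions. This is an illustrative reconstruction under those stated assumptions, not the paper's exact code; the basis choice and all parameters are hypothetical.

```python
# Hedged-portfolio backward induction via repeated least squares (a sketch of
# the scheme described above, in the lambda -> 0, r = 0 limit; illustrative only).
import numpy as np

rng = np.random.default_rng(1)
S0, K, sigma, T, n_steps, n_paths = 100.0, 100.0, 0.2, 1.0, 10, 100_000
dt = T / n_steps

# Zero-drift, zero-rate lognormal paths: a discrete-time BSM world.
z = rng.standard_normal((n_paths, n_steps))
log_incr = -0.5 * sigma**2 * dt + sigma * np.sqrt(dt) * z
S = S0 * np.exp(np.concatenate([np.zeros((n_paths, 1)),
                                np.cumsum(log_incr, axis=1)], axis=1))

pi = np.maximum(S[:, -1] - K, 0.0)      # terminal portfolio = option payoff
for t in range(n_steps - 1, -1, -1):
    x = S[:, t] / S0                    # normalized state variable
    dS = S[:, t + 1] - S[:, t]
    # Joint least-squares fit pi_{t+1} ~ b(x) + a(x)*dS with quadratic bases;
    # the coefficient function a(x) is the variance-minimizing hedge at time t.
    # (lstsq's minimum-norm solution also handles t = 0, where x is constant.)
    F = np.column_stack([np.ones_like(x), x, x**2, dS, x * dS, x**2 * dS])
    c, *_ = np.linalg.lstsq(F, pi, rcond=None)
    a = c[3] + c[4] * x + c[5] * x**2
    pi = pi - a * dS                    # step the replicating portfolio backward

price = pi.mean()
print(price)  # close to the Black-Scholes value of about 7.97 for these inputs
```

Each time step costs one regression, so the whole scheme is indeed basic linear algebra plus Monte Carlo, and with zero drift the resulting price converges to the Black-Scholes value as the number of paths and steps grows.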

© 2020 Pageant Media Ltd

Explore our content to discover more relevant research

  • By topic
  • Across journals
  • From the experts
  • Monthly highlights
  • Special collections

In this issue

The Journal of Derivatives: 28 (1)
The Journal of Derivatives
Vol. 28, Issue 1
Fall 2020
  • Table of Contents
  • Index by author
  • Complete Issue (PDF)
Print
Download PDF
Article Alerts
Sign In to Email Alerts with your Email Address
Email Article

Thank you for your interest in spreading the word on The Journal of Derivatives.

NOTE: We only request your email address so that the person you are recommending the page to knows that you wanted them to see it, and that it is not junk mail. We do not capture any email address.

Enter multiple addresses on separate lines or separate them with commas.
QLBS: Q-Learner in the Black-Scholes(-Merton) Worlds
(Your Name) has sent you a message from The Journal of Derivatives
(Your Name) thought you would like to see the The Journal of Derivatives web site.
CAPTCHA
This question is for testing whether or not you are a human visitor and to prevent automated spam submissions.
Citation Tools
QLBS: Q-Learner in the Black-Scholes(-Merton) Worlds
Igor Halperin
The Journal of Derivatives Aug 2020, 28 (1) 99-122; DOI: 10.3905/jod.2020.1.108

Citation Manager Formats

  • BibTeX
  • Bookends
  • EasyBib
  • EndNote (tagged)
  • EndNote 8 (xml)
  • Medlars
  • Mendeley
  • Papers
  • RefWorks Tagged
  • Ref Manager
  • RIS
  • Zotero
Save To My Folders
Share
QLBS: Q-Learner in the Black-Scholes(-Merton) Worlds
Igor Halperin
The Journal of Derivatives Aug 2020, 28 (1) 99-122; DOI: 10.3905/jod.2020.1.108
del.icio.us logo Digg logo Reddit logo Twitter logo CiteULike logo Facebook logo Google logo LinkedIn logo Mendeley logo
Tweet Widget Facebook Like LinkedIn logo

Jump to section

  • Article
    • Abstract
    • Links with Physics
    • DISCRETE-TIME BSM MODEL
    • QLBS
    • Q-LEARNING AND FITTED Q ITERATION IN QLBS
    • DISCUSSION
    • POSSIBLE NUMERICAL EXPERIMENTS
    • SUMMARY
    • ADDITIONAL READING
    • ACKNOWLEDGMENTS
    • APPENDIX A
    • APPENDIX B
    • ENDNOTES
    • REFERENCES
© 2021 Pageant Media Ltd | All Rights Reserved | ISSN: 1074-1240 | E-ISSN: 2168-8524

  • Site Map
  • Terms & Conditions
  • Privacy Policy
  • Cookies