<< Chapter < Page Chapter >> Page >

MachineLearning-Lecture17

Instructor (Andrew Ng) :Okay, good morning. Welcome back. So I hope all of you had a good Thanksgiving break. After the problem sets, I suspect many of us needed one. Just one quick announcement so as I announced by email a few days ago, this afternoon we’ll be doing another tape ahead of lecture, so I won’t physically be here on Wednesday, and so we’ll be taping this Wednesday’s lecture ahead of time. If you’re free this afternoon, please come to that; it’ll be at 3:45 p.m. in the Skilling Auditorium in Skilling 193 at 3:45. But of course, you can also just show up in class as usual at the usual time or just watch it online as usual also.

Okay, welcome back. What I want to do today is continue our discussion on Reinforcement Learning in MDPs. Quite a long topic for me to go over today, so most of today’s lecture will be on continuous state MDPs, and in particular, algorithms for solving continuous state MDPs, so I’ll talk just very briefly about discretization. I’ll spend a lot of time talking about models, assimilators of MDPs, and then talk about one algorithm called fitted value iteration and two functions which builds on that, and then hopefully, I’ll have time to get to a second algorithm called, approximate policy iteration

Just to recap, right, in the previous lecture, I defined the Reinforcement Learning problem and I defined MDPs, so let me just recap the notation. I said that an MDP or a Markov Decision Process, was a ? tuple, comprising those things and the running example of those using last time was this one right, adapted from the Russell and Norvig AI textbook. So in this example MDP that I was using, it had 11 states, so that’s where S was. The actions were compass directions: north, south, east and west.

The state transition probability is to capture chance of your transitioning to every state when you take any action in any other given state and so in our example that captured the stochastic dynamics of our robot wondering around [inaudible], and we said if you take the action north and the south, you have a .8 chance of actually going north and .1 chance of veering off, so that .1 chance of veering off to the right so said model of the robot’s noisy dynamic with a [inaudible]and the reward function was that +/-1 at the absorbing states and -0.02 elsewhere. This is an example of an MDP, and that’s what these five things were. Oh, and I used a discount factor G of usually a number slightly less than one, so that’s the 0.99. And so our goal was to find the policy, the control policy and that’s at ?, which is a function mapping from the states of the actions that tells us what action to take in every state, and our goal was to find a policy that maximizes the expected value of our total payoff. So we want to find a policy. Well, let’s see. We define value functions Vp (s) to be equal to this. We said that the value of a policy ? from State S was given by the expected value of the sum of discounted rewards, conditioned on your executing the policy ? and you’re stating off your [inaudible] to say in the State S, and so our strategy for finding the policy was sort of comprised of two steps. So the goal is to find a good policy that maximizes the suspected value of the sum of discounted rewards, and so I said last time that one strategy for finding the [inaudible]of a policy is to first compute the optimal value function which I denoted V*(s) and is defined like that. It’s the maximum value that any policy can obtain, and for example, the optimal value function for that MDP looks like this. So in other words, starting from any of these states, what’s the expected value of the sum of discounted rewards you get, so this is V*. We also said that once you’ve found V*, you can compute the optimal policy using this.

Questions & Answers

prostaglandin and fever
Maha Reply
Discuss the differences between taste and flavor, including how other sensory inputs contribute to our  perception of flavor.
John Reply
taste refers to your understanding of the flavor . while flavor one The other hand is refers to sort of just a blend things.
Faith
While taste primarily relies on our taste buds, flavor involves a complex interplay between taste and aroma
Kamara
which drugs can we use for ulcers
Ummi Reply
omeprazole
Kamara
what
Renee
what is this
Renee
is a drug
Kamara
of anti-ulcer
Kamara
Omeprazole Cimetidine / Tagament For the complicated once ulcer - kit
Patrick
what is the function of lymphatic system
Nency Reply
Not really sure
Eli
to drain extracellular fluid all over the body.
asegid
The lymphatic system plays several crucial roles in the human body, functioning as a key component of the immune system and contributing to the maintenance of fluid balance. Its main functions include: 1. Immune Response: The lymphatic system produces and transports lymphocytes, which are a type of
asegid
to transport fluids fats proteins and lymphocytes to the blood stream as lymph
Adama
what is anatomy
Oyindarmola Reply
Anatomy is the identification and description of the structures of living things
Kamara
what's the difference between anatomy and physiology
Oyerinde Reply
Anatomy is the study of the structure of the body, while physiology is the study of the function of the body. Anatomy looks at the body's organs and systems, while physiology looks at how those organs and systems work together to keep the body functioning.
AI-Robot
what is enzymes all about?
Mohammed Reply
Enzymes are proteins that help speed up chemical reactions in our bodies. Enzymes are essential for digestion, liver function and much more. Too much or too little of a certain enzyme can cause health problems
Kamara
yes
Prince
how does the stomach protect itself from the damaging effects of HCl
Wulku Reply
little girl okay how does the stomach protect itself from the damaging effect of HCL
Wulku
it is because of the enzyme that the stomach produce that help the stomach from the damaging effect of HCL
Kamara
function of digestive system
Ali Reply
function of digestive
Ali
the diagram of the lungs
Adaeze Reply
what is the normal body temperature
Diya Reply
37 degrees selcius
Xolo
37°c
Stephanie
please why 37 degree selcius normal temperature
Mark
36.5
Simon
37°c
Iyogho
the normal temperature is 37°c or 98.6 °Fahrenheit is important for maintaining the homeostasis in the body the body regular this temperature through the process called thermoregulation which involves brain skin muscle and other organ working together to maintain stable internal temperature
Stephanie
37A c
Wulku
what is anaemia
Diya Reply
anaemia is the decrease in RBC count hemoglobin count and PVC count
Eniola
what is the pH of the vagina
Diya Reply
how does Lysin attack pathogens
Diya
acid
Mary
I information on anatomy position and digestive system and there enzyme
Elisha Reply
anatomy of the female external genitalia
Muhammad Reply
Got questions? Join the online conversation and get instant answers!
Jobilize.com Reply

Get Jobilize Job Search Mobile App in your pocket Now!

Get it on Google Play Download on the App Store Now




Source:  OpenStax, Machine learning. OpenStax CNX. Oct 14, 2013 Download for free at http://cnx.org/content/col11500/1.4
Google Play and the Google Play logo are trademarks of Google Inc.

Notification Switch

Would you like to follow the 'Machine learning' conversation and receive update notifications?

Ask