1.16 Machine learning lecture 17

Instructor (Andrew Ng): Okay, good morning. Welcome back. So I hope all of you had a good Thanksgiving break. After the problem sets, I suspect many of us needed one. Just one quick announcement: as I announced by email a few days ago, this afternoon we’ll be doing another tape-ahead lecture. I won’t physically be here on Wednesday, and so we’ll be taping this Wednesday’s lecture ahead of time. If you’re free this afternoon, please come to that; it’ll be at 3:45 p.m. in the Skilling Auditorium, in Skilling 193. But of course, you can also just show up in class as usual at the usual time, or just watch it online as usual.

Okay, welcome back. What I want to do today is continue our discussion on Reinforcement Learning in MDPs. It’s quite a long topic for me to go over today, so most of today’s lecture will be on continuous state MDPs, and in particular, algorithms for solving continuous state MDPs. I’ll talk just very briefly about discretization. I’ll spend a lot of time talking about models, or simulators, of MDPs, and then talk about one algorithm called fitted value iteration, and Q functions, which build on that. Then hopefully, I’ll have time to get to a second algorithm called approximate policy iteration.

Just to recap, right, in the previous lecture, I defined the Reinforcement Learning problem and I defined MDPs, so let me just recap the notation. I said that an MDP, or a Markov Decision Process, was a five-tuple comprising those things, and the running example I was using last time was this one, right, adapted from the Russell and Norvig AI textbook. So this example MDP that I was using had 11 states, so that’s what S was. The actions were compass directions: north, south, east and west.
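The tuple written on the board is not captured in the transcript. Assuming the usual notation for this course (the symbol P_{sa} for the transition distributions is an assumption about what was written), it would read roughly as follows, where S is the set of states, A the set of actions, P_{sa} the state transition probabilities, γ the discount factor and R the reward function:

```latex
\[
  \text{MDP} = \bigl(S,\ A,\ \{P_{sa}\},\ \gamma,\ R\bigr),
  \qquad |S| = 11,
  \quad A = \{\text{north}, \text{south}, \text{east}, \text{west}\}
\]
```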

The state transition probabilities capture the chance of your transitioning to every state when you take any action in any given state, and so in our example that captured the stochastic dynamics of our robot wandering around [inaudible]. We said that if you take the action north, you have a .8 chance of actually going north, a .1 chance of veering off to the left, and a .1 chance of veering off to the right, so that modeled the robot’s noisy dynamics [inaudible]. And the reward function was ±1 at the absorbing states and -0.02 elsewhere. This is an example of an MDP, and that’s what these five things were. Oh, and I used a discount factor γ, usually a number slightly less than one, so that’s the 0.99.

And so our goal was to find the policy, the control policy, and that’s π, which is a function mapping from the states to the actions that tells us what action to take in every state, and our goal was to find a policy that maximizes the expected value of our total payoff. So we want to find a policy. Well, let’s see. We define the value function V^π(s) to be equal to this. We said that the value of a policy π from state S was given by the expected value of the sum of discounted rewards, conditioned on your executing the policy π and your starting off [inaudible] in the state S.

And so our strategy for finding the policy was sort of comprised of two steps. So the goal is to find a good policy that maximizes the expected value of the sum of discounted rewards, and so I said last time that one strategy for finding the [inaudible] of a policy is to first compute the optimal value function, which I denoted V*(s) and is defined like that. It’s the maximum value that any policy can obtain, and for example, the optimal value function for that MDP looks like this. So in other words, starting from any of these states, what’s the expected value of the sum of discounted rewards you get, so this is V*.
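The formulas on the board are not in the transcript. A reconstruction from the spoken description, assuming rewards depend only on the state (written R(s)) and that s_0, s_1, s_2, ... denote the states visited in sequence:

```latex
\[
  V^{\pi}(s) = \mathbb{E}\left[ R(s_0) + \gamma R(s_1) + \gamma^{2} R(s_2) + \cdots
               \,\middle|\, s_0 = s,\ \pi \right]
\]
\[
  V^{*}(s) = \max_{\pi} V^{\pi}(s)
\]
```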
We also said that once you’ve found V*, you can compute the optimal policy using this.
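
The two-step strategy in this recap can be sketched in Python. This is a minimal sketch, not the lecture's own code. It assumes the standard Russell and Norvig 4x3 grid layout (one wall cell, +1 and -1 absorbing states in the right-hand column), which the transcript does not spell out, and it uses value iteration as one common way to compute V*. The policy step picks, in each state, the action maximizing the expected V* of the next state. All function and variable names are illustrative.

```python
# Assumed layout: 4 columns x 3 rows, cells indexed (column, row) from 1.
GAMMA = 0.99                 # discount factor from the lecture
STEP_REWARD = -0.02          # reward in non-absorbing states
WALL = (2, 2)                # assumed position of the blocked cell
TERMINALS = {(4, 3): 1.0, (4, 2): -1.0}   # assumed absorbing states
STATES = [(c, r) for c in range(1, 5) for r in range(1, 4) if (c, r) != WALL]
ACTIONS = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}
# The two directions the robot may veer off to for each intended action.
SIDEWAYS = {"N": ("W", "E"), "S": ("E", "W"), "E": ("N", "S"), "W": ("S", "N")}


def reward(s):
    """R(s): +/-1 at absorbing states, -0.02 elsewhere."""
    return TERMINALS.get(s, STEP_REWARD)


def move(s, direction):
    """Deterministic move; hitting the wall or the edge leaves s unchanged."""
    dc, dr = ACTIONS[direction]
    nxt = (s[0] + dc, s[1] + dr)
    return nxt if nxt in STATES else s


def transitions(s, a):
    """Return {s': P_sa(s')}: 0.8 intended direction, 0.1 each sideways."""
    left, right = SIDEWAYS[a]
    probs = {}
    for direction, p in ((a, 0.8), (left, 0.1), (right, 0.1)):
        nxt = move(s, direction)
        probs[nxt] = probs.get(nxt, 0.0) + p
    return probs


def expected_value(V, s, a):
    """Expected value of the next state when taking action a in state s."""
    return sum(p * V[s2] for s2, p in transitions(s, a).items())


def value_iteration(tol=1e-8):
    """Step 1: repeat Bellman backups until the values stop changing."""
    V = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            if s in TERMINALS:
                new_v = reward(s)   # absorbing: no future reward
            else:
                best = max(expected_value(V, s, a) for a in ACTIONS)
                new_v = reward(s) + GAMMA * best
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            return V


def greedy_policy(V):
    """Step 2: in each state, pick the action with the best expected V."""
    return {s: max(ACTIONS, key=lambda a: expected_value(V, s, a))
            for s in STATES if s not in TERMINALS}


if __name__ == "__main__":
    V_star = value_iteration()
    pi_star = greedy_policy(V_star)
    for s in sorted(pi_star):
        print(s, pi_star[s], round(V_star[s], 2))
```

Storing each transition distribution as a dictionary keeps the sketch readable for 11 states. The same table-based approach does not carry over to continuous state spaces, which is the topic of the rest of this lecture.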

Source: OpenStax, Machine learning. OpenStax CNX. Oct 14, 2013 Download for free at http://cnx.org/content/col11500/1.4