Screen Automation using Pyautogui

Screen Automation” otherwise called “GUI Automation” is an automation process that involves virtually controlling the input devices of a computer with programming.

The industry name for the above process is Robotic Process Automation.

In this post, I will teach you how to perform Screen Automation with the Pyautogui module of Python. At the end of this post, you will take away the ability to automate some repetitive manual tasks at work.

Tools Required

It is a must to have your Jupyter notebook or Python IDLE(Version 3.0 and above) installed into your system. Install Anaconda Navigator to use Jupyter Notebook.

After installing Anaconda or Python IDLE, type ‘cmd‘ in your search to open Command prompt.

To install the Pyautogui library, use the below code.

pip install pyautogui

In case you are using Anaconda Navigator, then you can also try the below code, to install Pyautogui.

conda install pyautogui

Note: The above command should be tried in Anaconda Prompt.

Importing Pyautogui module

If you are familiar with python, you know about this step. Else, no worries!

When we are about to use a module in our code, we need to import it into our coding window. Only then we can make use of the functions and methods available in that particular module.

To import the Pyautogui module, open a new Jupyter Notebook and run the below lines of code.

from pyautogui import *

The above step is a necessary step, irrespective of what we are going to do with the pyautogui module. Let’s move on further.

The Screen

Each and every point inside an x-y chart can be represented with a coordinate of the format (x,y). Right?

Robotic Process Automation basics

Similarly, each and every point inside a monitor/laptop screen can be represented with a similar (x,y) format. But x and y are in the unit of pixels. Because the monitor screen is divided into pixels.

To know the pixel division of your monitor, just use the below code in one of your cells in your Jupyter notebook.

 size() 

And it will return an output like below.

Size(width=1366, height=768)

This means that my monitor width is 1366 pixels and monitor height is 768 pixels. Yours may vary accordingly.

Screen automation with pyautogui

So, the coordinate of the top left corner is (0,0), while the coordinate of the bottom right corner is (1366, 768).

Pixels

Every element or object on our monitor screen, be it a folder or an icon, it has a position. And that position can be represented with (x, y) pixel coordinates.

So, how to find the position of an icon on the screen?

Obviously, we cannot use a ruler scale to measure the distance of a desktop icon from the left and top sides of the screen.

That’s why Pyautogui uses the position() attribute. This position() when executed, will return the coordinates of the mouse pointer’s current location/position – in pixels of the format (x=, y= ).

Let’s test them. Now, I will place the mouse pointer on the windows icon on my laptop screen. As you can guess the windows icon will be at the bottom left corner of our screen.

screen automation

After placing my mouse pointer here, if I execute the below code.

 position() 

The output will be…

Point(x=22, y=741)

Thus, we can say that the position of the windows icon on the screen is (22, 741).

By placing the mouse pointer anywhere and executing the above code, we can easily find the location of the object in terms of pixel coordinates.

Well, that’s all about screen and pixels, now we are over to the “mouse”.

Mouse Automation

Before reading further, I recommend you to take a few seconds to think of the variety of clicks/operations which can be performed with a mouse.

Have you done that? Read on…

  1. Moving the mouse pointer all over the screen
  2. Left-click
  3. Right-click
  4. Middle-click
  5. Double click
  6. Triple-click :p
  7. Scroll Up
  8. Scroll Down
  9. Drag and Drop

Totally, 9 types of operations can be performed with an ordinary mouse!

Now, I will tell you one more thing. All the above 9 mouse operations can be virtually performed by Pyautogui.

Let’s see one by one.

1.Moving the mouse pointer

As a first step, let’s move our mouse pointer with our python code. We will use the moveTo() attribute.

Syntax: moveTo(destination pixel coordinates)

By following the method outlined in the previous section(The Screen), we can find the pixel coordinates of our destination. Let’s say, my destination is the windows icon. I already have the coordinates (x=22, y=741) of the windows icon.

moveTo(x=22, y=741)

Press Shift+Enter to execute the above code on your Jupyter notebook.

You will find that your mouse pointer has moved to the windows icon, no matter where the mouse pointer was initially.

Instead of an absolute position of the target, let’s say I need to move 10 pixels to right and 20 pixels to the top from my current position. This is where the moveRel() attribute is useful.

Syntax: moveRel(pixels to move horizontally, pixels to move vertically)

moveRel(10,-20)

Notice, the negative sign for y coordinate(-20). The reason is we need to move to the top of the current position.

All the above inputs can be sent virtually to our system with the help of Pyautogui.

Next up is to click on the destination.

2.Left-click

The most common activity by anyone with a laptop or PC – the left click or click. So, how to perform this simple click event with Pyautogui?

Just use the below code. It will immediately click at the current position of the mouse pointer.

click()

If you have a coordinate where you want it to click. Then feed the coordinates into the parentheses().

click(x=22, y=741)

or simply

click(22, 741)

It will click on the specified coordinate.

3.Right-click

It works similar to the click(). The code for the right click is rightClick(), beware of the capital ‘C’.

rightClick(22, 741)

4.Middle-click

We won’t use this much. Anyways, the command is middleClick(). Again it is similar to the click().

middleClick(22, 741)

5.Double click

For executing a double click, you can either use the click() command twice, else use the doubleClick().

doubleClick(22, 741)

6.Triple-click

Well, triple-click is used to select a complete sentence in a paragraph.

tripleClick(22, 741)

If you have any other applications of triple-click, please let me know in the comments.

7.Scroll Up

Below is the code to scroll the mouse wheel.

scroll(10)

Here, I have specified the value of 10 inside the () parentheses. This will scroll the mouse wheel 10 times.

8.Scroll Down

To scroll down.

scroll(-10)

The negative value signifies that we need to scroll down.

9.Drag and Drop

Let us think of what we actually do during a drag and drop event.

  • We press the left mouse button down.
  • Hold it like that.
  • Then we move the mouse to the desired location(while keeping the mouse button down).
  • Release it.

To hold the mouse button down:

mouseDown()

To move the mouse: 

moveTo(x,y)

And to release the mouse button:

mouseUp()

Now, you know what to do. Use the above code to drag and drop.

Keyboard Automation

Other than the mouse, what lies in front of us every day is the keyboard. I counted the number of keys on my keyboard.

It turned out to be 85.

I asked Google. It replied with a number 101.

No matter how many keys are there on any keyboard. But what matters here is whether keys can be virtually pressed with Pyautogui?

The answer is “yes‘.

typewrite()

And the good news is, there is only a single line of code for all virtual key presses.

Say, if I need to type “I love my readers”, I will type the below code.

typewrite("I love my readers")

We can make use of the place holder ‘\n’ for a new line to move to the next line. Like below.

 typewrite("I love my readers\n I love everyone") 

The output will be

I love my readers. 
I love everyone.

In manual typing, all our keypresses go to the blinking cursor.

So, when we are about to execute the above typewrite() method, we should make sure that the cursor is at the desired location where we want our letters to be typed.

As discussed in the previous section, use click() to position the cursor at the desired location.

Keywords

And if you want to press enter,

 typewrite(['enter']) 

Here, ‘enter‘ is a keyword for the Enter button.

Since ‘enter’ is a keyword it needs to be enclosed inside square brackets [].

[‘enter’]

Below is the repository of all the keywords of the keys in a standard keyboard.

['\t', '\n', '\r', ' ', '!', '"', '#', ',' '%', '&', "'", '(',
')', '*', '+', ',', '-', '.', '/', '0', '1', '2', '3', '4', '5', '6', '7',
'8', '9', ':', ';', '<', '=', '>', '?', '@', '[', '\\', ']', '^', '_', '`',
'a', 'b', 'c', 'd', 'e','f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o',
'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '{', '|', '}', '~',
'accept', 'add', 'alt', 'altleft', 'altright', 'apps', 'backspace',
'browserback', 'browserfavorites', 'browserforward', 'browserhome',
'browserrefresh', 'browsersearch', 'browserstop', 'capslock', 'clear',
'convert', 'ctrl', 'ctrlleft', 'ctrlright', 'decimal', 'del', 'delete',
'divide', 'down', 'end', 'enter', 'esc', 'escape', 'execute', 'f1', 'f10',
'f11', 'f12', 'f13', 'f14', 'f15', 'f16', 'f17', 'f18', 'f19', 'f2', 'f20',
'f21', 'f22', 'f23', 'f24', 'f3', 'f4', 'f5', 'f6', 'f7', 'f8', 'f9',
'final', 'fn', 'hanguel', 'hangul', 'hanja', 'help', 'home', 'insert', 'junja',
'kana', 'kanji', 'launchapp1', 'launchapp2', 'launchmail',
'launchmediaselect', 'left', 'modechange', 'multiply', 'nexttrack',
'nonconvert', 'num0', 'num1', 'num2', 'num3', 'num4', 'num5', 'num6',
'num7', 'num8', 'num9', 'numlock', 'pagedown', 'pageup', 'pause', 'pgdn',
'pgup', 'playpause', 'prevtrack', 'print', 'printscreen', 'prntscrn',
'prtsc', 'prtscr', 'return', 'right', 'scrolllock', 'select', 'separator',
'shift', 'shiftleft', 'shiftright', 'sleep', 'space', 'stop', 'subtract', 'tab',
'up', 'volumedown', 'volumemute', 'volumeup', 'win', 'winleft', 'winright', 'yen',
'command', 'option', 'optionleft', 'optionright']

Hotkeys

When you try to execute a shortcut like Ctrl+C, you need to keep pressing ‘Ctrl’ key until you press ‘C’. This is a peculiar case.

Pyautogui has a solution in the form of the hotkey attribute.

 hotkey('Ctrl', 'c') 

This will execute Ctrl+C.

For Ctrl+Alt+Shift, use the below code.

 hotkey('Ctrl', 'Alt', 'Shift') 

Just add the combination of keys inside the parentheses(). That’s all.

Now, any key on the keyboard can be pressed virtually with the above code knowledge.

Image Recognition

When you are not aware of the position of the object to be clicked, but you are aware of the object, then Image recognition comes very much handy.

Take a screenshot

Say, for example, if I need to find the position of the download icon inside the Google Chrome browser, then I will take a screenshot of the icon and save it in the name of “download.png”.

Please note that I should save the image in the same folder where we have our Jupyter notebook saved.

download icon <<– The download icon in Chrome.

locateOnScreen("download.png")
Box(left=1218, top=50, width=21, height=20)

Now, you can see that the output consists of four coordinates all specifying the four corners of the download icon.

But, I need the pixel coordinates of the center of the download icon. To do this…

center(locateOnScreen("download.png"))

I am just enclosing locateOnScreen function inside the center() function. By doing this, we will get the pixel coordinates of the center of the download icon image.

Point(x=1228, y=60)

Once, we have the coordinate, it is easy to move the mouse to the location specified by the coordinate. Hence, I will enclose the above code inside the moveTo() function which we discussed earlier.

moveTo(center(locateOnScreen("download.png")))

The moment you execute the above code, you can observe that the mouse pointer has moved to the download icon.

Now, to click on that icon.

click(moveTo(center(locateOnScreen("download.png"))))

You can find the mouse clicking on the download icon and opening a new tab. If you ever want to click on an image, then take a screenshot, save it in the same folder where the code is saved, then use the name of the image in the above code.

Conclusion

I hope this post is helpful to you people. Don’t just read and forget. Do give it a try. Try to automate simple mouse clicks and keyboard strokes.
Trust me! screen automation with Pyautogui is easy and when you succeed in it, you are a step ahead of others.

Thanks for reading, and if you have any doubts please post it in the comments section, I will be happy to help you out. If you liked this post, then give it a thumbs-up.

Also, read my other useful Python articles.

Learn how to visualize the Confusion Matrix as a Heatmap

Image recognition with Pyautogui

Leave a Comment