1. I need to automatically shoot a 10ms shot with manual settings as soon as motion is detected. As far as I understood this is easily done with the standard motion detection script.
This should be possible.
2. I need to do image processing directly on the camera (e.g. detect the size and position of a yellow circle in the image). As far as I understood this can also be done with a simple script.
This is not currently possible with a script. It should be fairly straightforward in C, though: you should be able to put your code in raw_process in core/raw.c, around where the shot histogram is built. But see below...
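To give an idea of the kind of C code this would involve, here is a minimal sketch that finds the centroid and bounding box of "yellow" pixels. It is not CHDK code: the names blob_t, is_yellow and find_yellow_blob are made up for illustration, the threshold values are guesses, and it assumes a packed 8-bit RGB image. The real raw buffer holds packed sensor data rather than RGB, so the pixel access would have to be adapted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Result of the scan: centroid and bounding box of the matching pixels. */
typedef struct {
    int found;          /* 1 if at least one sampled pixel matched */
    int cx, cy;         /* centroid, in pixel coordinates */
    int width, height;  /* bounding-box size, in pixels */
} blob_t;

/* Hypothetical threshold: strong red and green, little blue.
 * These numbers are placeholders and must be tuned for the real scene. */
static int is_yellow(uint8_t r, uint8_t g, uint8_t b)
{
    return r > 160 && g > 160 && b < 100;
}

/* Scan a packed 8-bit RGB image (assumed layout: R,G,B per pixel, rows
 * stored back to back), visiting every `step`-th pixel in x and y. */
blob_t find_yellow_blob(const uint8_t *rgb, int w, int h, int step)
{
    blob_t out = {0, 0, 0, 0, 0};
    long sum_x = 0, sum_y = 0, n = 0;
    int min_x = w, min_y = h, max_x = -1, max_y = -1;

    assert(rgb != NULL && w > 0 && h > 0 && step > 0);

    for (int y = 0; y < h; y += step) {
        const uint8_t *row = rgb + (size_t)y * (size_t)w * 3;
        for (int x = 0; x < w; x += step) {
            const uint8_t *p = row + (size_t)x * 3;
            if (!is_yellow(p[0], p[1], p[2]))
                continue;
            sum_x += x;
            sum_y += y;
            n++;
            if (x < min_x) min_x = x;
            if (x > max_x) max_x = x;
            if (y < min_y) min_y = y;
            if (y > max_y) max_y = y;
        }
    }

    if (n > 0) {
        out.found = 1;
        out.cx = (int)(sum_x / n);
        out.cy = (int)(sum_y / n);
        out.width = max_x - min_x + 1;
        out.height = max_y - min_y + 1;
    }
    return out;
}
```

With step greater than 1 only a fraction of the pixels is read, which trades accuracy of the position and size for speed. The bounding box is then measured on the sampled grid, so it can be up to about one step too small on each side.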
3. I need to send the calculated values (e.g. circle center position) to an external PLC running a third party OS. I have read the section PTP Extension, but it is not clear to me if I can just send data (as simple bytes) over the USB instead of entire files. In case PTP would not work, are there alternatives to it (e.g. UDP over the USB line)?
You cannot just send data over the PTP connection; you need a USB host on the other end with a software stack that understands PTP and the CHDK extension. If you have this, you can send whatever data you like using the script message interface: http://chdk.wikia.com/wiki/Lua/PTP_Scripting#read_usb_msg
You cannot just send arbitrary bytes over USB either: the camera firmware expects a PTP host on the other end, and we have not reverse engineered the lower levels of the camera USB stack.
If you don't mind opening up the camera and soldering, you could probably use the UART: http://chdk.wikia.com/wiki/UART
If your data is very limited, you could use one of the camera LEDs for output. You can also use http://chdk.wikia.com/wiki/USB_Shutter_Remote for limited, low-speed input.
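For the LED route, one possible approach is to blink a value out as a simple serial frame that the PLC samples with a photodiode. The sketch below only builds the sequence of on/off levels. The framing (one start bit, 16 data bits, one stop bit) and the names FRAME_BITS and encode_led_frame are my own assumptions, not anything CHDK provides, and the code that actually switches the LED and waits one bit period per level is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_BITS 18  /* 1 start bit + 16 data bits + 1 stop bit */

/* Fill `levels` with the LED states (1 = on, 0 = off) for one frame.
 * Returns the number of levels written, or 0 if the buffer is too small. */
size_t encode_led_frame(uint16_t value, uint8_t *levels, size_t capacity)
{
    size_t n = 0;

    assert(levels != NULL);
    if (capacity < FRAME_BITS)
        return 0;

    levels[n++] = 1;                      /* start bit: LED on */
    for (int bit = 15; bit >= 0; bit--)   /* data, most significant bit first */
        levels[n++] = (uint8_t)((value >> bit) & 1u);
    levels[n++] = 0;                      /* stop bit: LED off */

    return n;
}
```

Two such frames would be enough for an x and a y coordinate. How short the bit period can be depends on how precisely the camera side can time the LED, which would have to be measured.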
4. Points 2 and 3 need to complete within 100ms, which is why I cannot wait until the camera saves the image to the CF card.
If your image analysis requires examining a large part of the raw image buffer, this will be difficult or impossible. The camera CPU is not particularly fast; just reading every pixel would take quite a bit more than 100ms.
Note that if you do not need high resolution, you don't need to shoot at all: you could just analyze the live view data used for the LCD. This is typically 240 or 480 lines, depending on the camera, and it is the same buffer we use for motion detection.