resources JS vs actors
-
@dusx I knew that I had oversimplified the process as soon as I had posted. Oh well, thanks. Sorry for gatecrashing this thread; the possibility of it working had made me a little overexcited.
-
@n-jones said:
I knew that I had oversimplified the process as soon as I had posted. Oh well, thanks. Sorry for gatecrashing this thread; the possibility of it working had made me a little overexcited
Just know that if I were only coding Isadora for myself, and not taking the user base into account, getting that kind of pose detection into the program would be where I spent every second of my time. It is possible, by the way, just not "out of the box" using JavaScript, for the reasons @DusX mentioned. I reckon it's possible to move the TensorFlow stuff into C++, but that is something I have zero experience with.
Best Wishes,
Mark -
@dusx said:
This is likely due to Text Draw. Drawing text is a CPU heavy process.
Just to expand on the "why" of this: every character you draw is specified by a bunch of Bézier curves.
When you draw text, the CPU (not the faster GPU) has to figure out where to draw all the pixels based on that specification, i.e., inside the curve is dark and outside is light. Now imagine the computer doing that for all the letters in a long sentence. You can see why that would take a lot of CPU power.
It's especially bad when the resolution is high... again, because this happens on the CPU not the GPU. There are thousands upon thousands of pixels to fill in based on the shape of the letter.
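To make the cost argument concrete, here is a minimal sketch in Python (my choice of language for illustration, not anything Isadora uses). It is a toy, not how a real font rasterizer works: the glyph outline is a hypothetical triangle made of straight edges instead of Bézier curves, and every pixel is tested with a simple even-odd inside/outside check. The point is only that the number of inside/outside decisions grows with the pixel count of the output frame.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd rule: toggle 'inside' for every edge crossed by a ray going right."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(polygon, width, height):
    """Test every pixel centre; polygon coordinates are normalized to 0..1."""
    tests = 0
    filled = 0
    for row in range(height):
        for col in range(width):
            x = (col + 0.5) / width
            y = (row + 0.5) / height
            tests += 1
            if point_in_polygon(x, y, polygon):
                filled += 1
    return tests, filled


# Hypothetical stand-in for a letter outline (assumption: a plain triangle).
GLYPH = [(0.1, 0.9), (0.5, 0.1), (0.9, 0.9)]

small_tests, small_filled = rasterize(GLYPH, 64, 36)
large_tests, large_filled = rasterize(GLYPH, 256, 144)
print("pixel tests at 64x36:  ", small_tests)
print("pixel tests at 256x144:", large_tests)
print("work ratio:", large_tests / small_tests)
```

Quadrupling the width and height multiplies the per-pixel work by sixteen, while the shape being drawn is exactly the same. That is the reason for keeping the Text Draw output resolution as low as the required quality allows.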
That's why @DusX's example is very important to pay attention to.
My 'best practice' goes like this: set the 'font size' input so that the text fills the entire frame that is output by the Text Draw actor. Then reduce the 'output width' and 'output height' inputs and scale the resulting image as needed using either the Projector actor or the Matte++ actor. Rule of thumb: if there's a lot of empty space in the frame output by the Text Draw actor, then you're not being as efficient as you could be.
Understanding how to keep the text resolution as low as possible while still giving you the quality you need is key to using the Text Draw actor effectively for the best performance.
Best Wishes,
Mark -
Thank you @mark and @DusX, for the deep dive into it!
I'll play around with the text draw actor to see what's best.
But my question remains: why isn't my system really using its full potential?
I added some Video Delay actors, and again Isadora's LOAD goes up to 120%, while the overall system CPU stays at 90% idle and the GPU is at around 25%.
I'm using a 2012 Mac Pro 5,1 with dual Xeon X5690 (12 cores) and a GeForce GTX 1080 Ti under High Sierra.
Isadora seems to use multithreading, as the 117% CPU load for the process shown in Activity Monitor seems to be spread over several cores (see screenshot in previous post).
Why is Isadora struggling while there seem to be a lot of potential resources unused? -
Another question has come up regarding performance.
My patch contains 40 Text Draw actors. If I put half of them on stand by, it reduces the LOAD to a certain extent, but far less than if I take them out completely (by deleting them).
The same goes for the Video Delay actors (I did not try others).
Is this usual? I thought "stand by" would be comparable to "shut off" or "bypass" and would restore nearly 100% of the resources the actors use? -
@dillthekraut said:
I added some video delay actors, and again Isadora's LOAD is going up to 120%, while the overall CPU system stays at 90% idle and GPU is at around 25%.
The Video Delay actors move the GPU-based image to the CPU. I made this choice when designing the actor because of memory constraints: the memory on a typical video card is simply less than the amount of RAM on a system.
For example, a five second delay at 30fps of 1920x1080 images requires 1.2 Gigabytes. If you add four of those delays, you've now run out of memory on the 4 GB GPU of my relatively powerful MacBook Pro and Isadora crashes.
Given that most systems have so much more CPU memory than GPU memory, it seemed wise to make this choice. (I expect any professional level system to have 16 GB, but really a lot of folks now have 32GB or more.) Unfortunately there is a high cost performance wise when moving images from the GPU to the CPU. That's what you're seeing when you add those video delay actors.
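The memory arithmetic here can be sketched as follows. This is a rough estimate in Python, assuming uncompressed frames at 4 bytes per pixel (8-bit RGBA); that pixel format is an assumption for illustration, and the function names are hypothetical, not part of Isadora. Depending on whether you count a gigabyte as 1000 or 1024 megabytes, the result lands at roughly 1.2 GB per delay either way.

```python
BYTES_PER_PIXEL = 4  # assumption: 8-bit RGBA, uncompressed


def frame_bytes(width, height, bytes_per_pixel=BYTES_PER_PIXEL):
    """Memory needed to hold a single uncompressed frame."""
    return width * height * bytes_per_pixel


def delay_buffer_bytes(width, height, fps, seconds, bytes_per_pixel=BYTES_PER_PIXEL):
    """Memory needed to hold every frame of a delay line."""
    frames = round(fps * seconds)
    return frames * frame_bytes(width, height, bytes_per_pixel)


one_delay = delay_buffer_bytes(1920, 1080, fps=30, seconds=5)
four_delays = 4 * one_delay
gpu_memory = 4 * 1000**3  # a 4 GB card, before anything else is stored on it

print(f"one frame:   {frame_bytes(1920, 1080) / 1e6:.2f} MB")
print(f"one delay:   {one_delay / 1e9:.2f} GB")
print(f"four delays: {four_delays / 1e9:.2f} GB")
print("fits in 4 GB of GPU memory:", four_delays <= gpu_memory)
```

Four such delays need close to 5 GB, which is more than the whole 4 GB card even with nothing else loaded on it.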
If I put the half of it to stand by, it reduces the LOAD to a certain extent, but far from what it does, if I take them out completely
"stand by"? I don't know what you mean. Do you mean Pause Engine???
Best Wishes,
Mark -
@dillthekraut said:
LOAD is going up to 120%
LOAD is a measure of how much time is being used to process each frame, based on the target frame rate. It is NOT a measure of your system resource usage. In Isadora, the most important thing to know is whether the scene can be processed at the selected frame rate, and LOAD tells you that. A LOAD of 100% means that calculating/rendering the frame is taking all the time available between frame deliveries. This will lead to dropped frames.
Isadora is both multi-threaded and single-threaded. Numerous processes, including video playback, are very multi-threaded. Video effects, mapping, compositing etc. are massively multi-threaded due to the use of the GPU. The scene-graph (the calculations, routing etc.) you build within your scene is single-threaded.
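As a way to picture that definition, here is a minimal Python sketch. It is not Isadora's actual implementation; `load_percent` is a hypothetical helper that simply divides the time spent on one frame by the time budget the target frame rate allows.

```python
def load_percent(frame_time_ms, target_fps):
    """Share of the per-frame time budget consumed by processing one frame."""
    budget_ms = 1000.0 / target_fps
    return 100.0 * frame_time_ms / budget_ms


# Assumed example timings at a 30 fps target (budget is about 33.3 ms per frame).
for frame_time_ms in (10.0, 25.0, 40.0):
    print(f"{frame_time_ms:5.1f} ms per frame -> LOAD {load_percent(frame_time_ms, 30):5.1f}%")
```

Under this reading, a LOAD of 120% at 30 fps means each frame takes about 40 ms to process, regardless of how idle the remaining CPU cores or the GPU look in a system monitor.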
-
Thank you for the explanation @mark, I suspected something like that.
@mark said:
If I put the half of it to stand by, it reduces the LOAD to a certain extend, but far from what it does, if I take them out completely
"stand by"? I don't know what you mean. Do you mean Pause Engine???
No, I was just trying to find a word for comparison for what I understood, and thought "bypass" would work. It seems that setting "bypass" to 'on' is not the same as deactivating the actor. What I expected was a full recovery of the resources the actor would consume while NOT bypassed.
E.g. LOAD without the actor at all = 50%,
adding an actor 'bypass' off = actor is working = LOAD 80%,
set actor bypass to "on" = LOAD back to 50%,
But this isn't the case. Instead it is like this:
LOAD without the actor at all = 50%,
adding an actor with 'bypass' off = actor is working = LOAD 80%,
set actor bypass to "on" = LOAD 65% instead of expected 50%,
It is an example only. The numbers might be different. -
@dusx thank you for the clarification. I'm aware of this. But still, shouldn't there be a connection between system resources and the LOAD (or rather the possible frame rate and cycles)?
My question here is: what is the bottleneck if the CPU and GPU are far from being stressed? As there isn't any video file playing and all content is generative or comes from video capture, it shouldn't be the flash drive (1500 Mbit/s).
Is it maybe the bus system that connects the CPU, RAM and GPU? Or maybe just the RAM itself?
What Mark explained about how the video delay works could be explained by this. -
@dillthekraut said:
maybe the BUS system where the Data between CPU, RAM and GPU are connected?
Without looking at your file I can only guess, but one common cause for sure is moving GPU data to the CPU (moving data up to the GPU is fast).
If you would like me to take a deeper look, please feel free to open a support request, where I can then request a copy of your project file.
-
Did it. Thank you!
-
@dillthekraut said:
My question here is, what is the bottleneck if the CPU and GPU are far from being stressed? As there isn't any video file playing and all content is generative only or comes from videocapture, it shouldn't be the Flashdrive (1500Mbit/s).
The speed of the hard drive has nothing to do with this issue.
As I mentioned above about the Video Delay actor, it needs to move the image from the GPU to the CPU. Then I said "Unfortunately there is a high cost performance wise when moving images from the GPU to the CPU. That's what you're seeing when you add those video delay actors."
GPUs are designed to pull data from CPU RAM very very quickly. But they are not designed to go in the other direction. (Why is this? Because GPUs are designed for gaming, not for video processing. A game never needs to get the image back from the GPU, so GPUs are not designed to deal with this use case.)
In any case, when you ask the GPU to give the data back to you, it causes what's called a "stall" -- the GPU needs to finish all operations at the moment you ask for the image. Such a stall destroys the parallel processing (= threading) that makes the GPU so fast. Moreover, the CPU needs to sit and wait for all the pending GPU operations to complete.
It is possible that we could make a Video Delay actor that keeps all the frames on the GPU, which would make it more efficient. The problem is it's not trivial to find out how much memory is available in the GPU and to get the actor to fail gracefully if there isn't enough GPU memory.
Again, every frame of a 1920x1080 image consumes 8.29MB. You want a ten second delay at 30 fps? That's 30 x 10 x 8.29MB = 2.4GB. A lot of GPUs could handle this, but some would run out of memory. It was this fact that led to my decision to keep the delayed frames in CPU RAM.
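Turning that calculation around gives a feel for which cards would cope. The Python sketch below is only an upper bound under stated assumptions: 4 bytes per pixel, and the entire memory budget free for delayed frames, which is never true in practice because textures, render targets and the rest of the system also live in GPU memory. `max_delay_seconds` is a hypothetical helper, not an Isadora function.

```python
def max_delay_seconds(memory_budget_bytes, width, height, fps, bytes_per_pixel=4):
    """Longest delay whose frames fit entirely inside the given memory budget."""
    frame = width * height * bytes_per_pixel
    whole_frames = memory_budget_bytes // frame
    return whole_frames / fps


# Assumed example budgets; real cards have far less than their total free.
for gigabytes in (2, 4, 8):
    budget = gigabytes * 1000**3
    seconds = max_delay_seconds(budget, 1920, 1080, fps=30)
    print(f"{gigabytes} GB budget -> at most {seconds:.1f} s of 1080p delay at 30 fps")
```

Even with the whole budget available, a 2 GB card cannot hold a single ten second delay, which illustrates why the fallback behaviour matters.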
Best Wishes,
Mark -
@mark said:
GPU to the CPU
Ok, I missed that part, as I wasn't aware that the direction of data transfer would be handled differently. This means the bottleneck is not the bus system but the design of graphics cards? In this case VRAM/shared memory comes to mind, but maybe that is a whole other story?
Thank you for always taking the time to explain things in depth.
It shows the complexity of programming live video tools, and therefore the effort you put into Isadora.
This community, and having the creator as such an active part of it, makes the Isadora project even more one of a kind!
Thanks a lot!
-
@mark said:
That's 30 x 10 x 8.29MB = 2.4GB
I guess that's what is copied back and forth per second? Given that a PCIe 2.0 x16 bus has a throughput of 8 GB/s (that's the spec of the old 5,1 Mac), this would only allow a maximum of 3 of those delays, right?
-
@dillthekraut said:
I guess that's what is copied back and forth per s?
No, that's not the case. You need to get 8.29 MB for each frame. For each render cycle, you're grabbing the 'video in' frame from the GPU (slow, bottleneck with stalls) and storing it in CPU RAM, but you are also grabbing one of the delayed frames in the CPU and shipping it to the GPU (fast). Assuming a frame rate of 30fps, 2 x 8.29MB x 30 = 497.4 MB per second transfer.
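The per-second figure can be checked with a short Python sketch. It assumes 4 bytes per pixel and compares the result with the 8 GB/s bus figure quoted in the question above; both numbers are taken at face value and are not measurements. Using the exact frame size instead of the rounded 8.29 MB gives a marginally higher total.

```python
FRAME_BYTES = 1920 * 1080 * 4  # assumption: 8-bit RGBA, about 8.29 MB per frame


def delay_transfer_bytes_per_second(frame_bytes, fps):
    """One GPU-to-CPU readback plus one CPU-to-GPU upload per rendered frame."""
    return 2 * frame_bytes * fps


rate = delay_transfer_bytes_per_second(FRAME_BYTES, fps=30)
bus_bytes_per_second = 8e9  # the bus throughput quoted by the poster

print(f"transfer per delay: {rate / 1e6:.1f} MB/s")
print(f"share of the quoted bus throughput: {100 * rate / bus_bytes_per_second:.1f}%")
```

Each delay accounts for only a few percent of the quoted bus throughput, which fits the explanation that the readback stall, not raw bandwidth, is the limiting factor.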
RE: this:
VRAM/shared Mem
Video RAM on the GPU is not shared with the CPU... at least on most existing computers with discrete graphics cards. Now, I'm actually not sure about an integrated GPU like the Intel UHD Graphics 630 1536 MB on my computer... maybe there is some sharing there? (I just looked it up; kinda complicated.) Interestingly, I know on the new M1 chips the RAM is shared by the CPU and GPU, which might make this bottleneck go away as we transition to those machines. (Again, not 100% certain about this.... inferring this from a few things I've read.)
Best Wishes,
Mark -
@mark any thoughts on the new resizable BAR technologies that have just become available on most GPUs? Would this have an impact on Isadora's capabilities?