After last commit (20099) warping meshes got slower (more quality == less performance). Since we don't need an extra warping for truncated domes, It's better to handle them directly in openGL without the need of warping it.
I'll talk with some Dome owners to see if we need both Upright and Downright modes. I may remove one of them by 2.49 them.
*) also: a proper GLEW_EXT_framebuffer_object check before generating FBO (for warping meshes).
**) next in line (maybe after RC2): tilt option to tilt the camera up to 90º upward.
We are using an image twice as big to render the fisheye before warping.
It'll slow down warping meshes a little, but we get way more resolution.
Therefore I will bring Truncated Dome mode back in order to avoid using warping mesh for that.
not fixed but the problem is now less bad when projection painting, bilinear interpolation was rounding down.
- added gameOb.attrDict to get the internal gameObject dict.
- mesh.getVertex wasnt setting an exception.
This commit extend the technique of dynamic linked list to the mesh
slots so as to eliminate dumb scan or map lookup. It provides massive
performance improvement in the culling and in the rasterizer when
the majority of objects are static.
Other improvements:
- Compute the opengl matrix only for objects that are visible.
- Simplify hash function for GEN_HasedPtr
- Scan light list instead of general object list to render shadows
- Remove redundant opengl calls to set specularity, shinyness and diffuse
between each mesh slots.
- Cache GPU material to avoid frequent call to GPU_material_from_blender
- Only set once the fixed elements of mesh slot
- Use more inline function
The following table shows the performance increase between 2.48, 1st round
and this round of improvement. The test was done with a scene containing
40000 objects, of which 1000 are in the view frustrum approximately. The
object are simple textured cube to make sure the GPU is not the bottleneck.
As some of the rasterizer processing time has moved under culling, I present
the sum of scenegraph(includes culling)+rasterizer time
Scenegraph+rasterizer(ms) 2.48 1st round 3rd round
All objects static, 323.0 86.0 7.2
all visible, 1000 in
the view frustrum
All objects static, 219.0 49.7 N/A(*)
all invisible.
All objects moving, 323.0 105.6 34.7
all visible, 1000 in
the view frustrum
Scene destruction 40min 40min 4s
(*) : this time is not representative because the frame rate was at 60fps.
In that case, the GPU holds down the GE by frame sync. By design, the
overhead of the rasterizer is 0 when the the objects are invisible.
This table shows a global speed up between 9x and 45x compared to 2.48a
for scenegraph, culling and rasterizer overhead. The speed up goes much
higher when objects are invisible.
An additional 2-4x speed up is possible in the scenegraph by upgrading
the Moto library to use Eigen2 BLAS library instead of C++ classes but
the scenegraph is already so fast that it is not a priority right now.
Next speed up in logic: many things to do there...
also deprecated getActuators() and getSensors() for 'sensors' and 'actuators' attributes.
an example of getting every sensor connected to an object.
all_sensors = [s for c in ob.controllers for s in c.sensors]
When enabled, this option converts any positive trigger from the sensor
into a pair of positive+negative trigger, with the negative trigger sent
in the next frame. The negative trigger from the sensor are not passed
to the controller as the option automatically generates the negative triggers.
From the controller point of view, the sensor is positive only for 1 frame,
even if the underlying sensor state remains positive.
The option interacts with the other sensor option in this way:
- Level option: tap option is mutually exclusive with level option. Both
cannot be enabled at the same time.
- Invert option: tap option operates on the negative trigger of the
sensor, which are converted to positive trigger by the invert option.
Hence, the controller will see the sensor positive for 1 frame when
the underlying sensor state turns negative.
- Positive pulse option: tap option adds a negative trigger after each
repeated positive pulse, unless the frequency option is 0, in which case
positive pulse are generated on every frame as before, as long as the
underlying sensor state is positive.
- Negative pulse option: this option is not compatible with tap option
and is ignored when tap option is enabled.
Notes:
- Keyboard "All keys" is handled specially when tap option is set:
There will be one pair of positive/negative trigger for each new
key press, regardless on how many keys are already pressed and there
is no trigger when keys are released, regardless if keys are still
pressed.
In case two keys are pressed in succesive frames, there will
be 2 positive triggers and 1 negative trigger in the following frame.
Use dynamic linked list to handle scenegraph rather than dumb scan
of the whole tree. The performance improvement depends on the fraction
of moving objects. If most objects are static, the speed up is
considerable. The following table compares the time spent on
scenegraph before and after this commit on a scene with 10000 objects
in various configuratons:
Scenegraph time (ms) Before After
(includes culling)
All objects static, 8.8 1.7
all visible but small fraction
in the view frustrum
All objects static, 7,5 0.01
all invisible.
All objects moving, 14.1 8.4
all visible but small fraction
in the view frustrum
This tables shows that static and invisible objects take no CPU at all
for scenegraph and culling. In the general case, this commit will
speed up the scenegraph between 2x and 5x. Compared to 2.48a, it should
be between 4x and 10x faster. Further speed up is possible by making
the scenegraph cache-friendly.
Next round of performance improvement will be on the rasterizer: use
the same dynamic linked list technique for the mesh slots.
patch from Alex Fraser (z0r)
eg.
- vec.xyz = vec.zyx
- vec.xy = vec.zw
- vec.xxy = vec.wzz
- vec.yzyz = vec.yxyx
See http://en.wikipedia.org/wiki/Swizzling_(computer_graphics)
made some minor modifications to this patch.
tested access times and adding 336 attributes to vectors doesn't make a noticeable differences to speed of existing axis attributes (x,y,z,w) - thanks to python dict lookups.
Old bug (2.42): when using node material, transparent shadow did not work.
It was missing to set the proper 'pass flag'.
Do note an important difference with non-node materials for 'transparent shadow'.
If there are no nodes, it uses the color from the unshaded material. When it has
nodes, it uses the color output from the entire node tree, which is typically
from shaded materials. The latter is because node shaders have no support for
shade passes yet (it only outputs rgb + a).
Node editor didn't support editing non-material texture node trees.
Campbell pointed me to fact it's been used already, like for brush
painting. However, this only worked via linking the texture to a
material... hackish stuff.
Now the Node Editor supports all other Textures too, with three extra
icon buttons to define which.
- Active Object: for textures linked to Materials or Lamps
- World: textures from Scene world.
- Brush: textures from active Brush
The latter can only be set and used when in Paint or Sculpt mode:
- Paint mode: in Image window, Paint Tool panel, set active brush
- Sculpt mode: in EditButtons, Texture panel, select empty slot, add texture.
Note that refreshes of previews in Node Editor is not always happening on
switching contextes. Just click a socket to refresh view.
Texture Nodes: the option to use nodes was also visible for textures
on world, lamp, brush. Since it only works for Materials now, I've made
the button disappear then.
- when the attribute check function failed it didnt set an error raising a SystemError instead
- Rasterizer.getMaterialMode would never return KX_BLENDER_MULTITEX_MATERIAL
- PropertySensor value attribute checking function was always returning a fail.
- Vertex Self Shadow python script didnt update for meshes with modifiers.
- Added support for any number of attributes, this means packages are supported automatically.
so as well as "myModule.myFunc" you can do "myPackage.myModule.myFunc", nested packages work too.
- pass the controller to the python function as an argument for functions that take 1 arg, this check is only done at startup so it wont slow things down.
added support for
- Vast performance increase when removing scene containing large number of
objects: the sensor/controller map was updated for each deleted object,
causing massive slow down when the number of objects was large (O(n^2)).
- Use reference when scanning the sensor map => avoid useless copy.
- Remove dynamically the object bounding box from the DBVT when the object
is invisible => faster culling.
This function sets the maximum number of logic frame executed per render frame.
Valid values: 1..5
This function is useful to control the amount of processing consumed by logic.
By default, up to 5 logic frames can be executed per render frame. This is fine
as long as the time spent on logic is negligible compared to the render time.
If it's not the case, the default value will drag the performance of the game
down by executing unnecessary logic frames that take up most of the CPU time.
You can avoid that by lowering the value with this function.
The drawback is less precision in the logic system to physics and I/O activity.
Note that it does not affect the physics system: physics will still run
at full frame rate (actually up to 5 times the ticrate).
You can further control the render frame rate with GameLogic.setLogicTicRate().