
libcom.color_transfer

libcom.color_transfer.color_transfer(composite_image, composite_mask)[source]

Adjust the foreground color of a composite image through color transfer.

Parameters
  • composite_image (str | numpy.ndarray) – The path to the composite image or the composite image in ndarray form.

  • composite_mask (str | numpy.ndarray) – Mask of composite image which indicates the foreground object region in the composite image.

Returns

Transferred image with the same resolution as the input image.

Return type

transferred image (numpy.ndarray)

Examples

>>> from libcom import color_transfer
>>> from libcom.utils.process_image import make_image_grid
>>> import cv2
>>> comp_img1  = '../tests/source/composite/1.jpg'
>>> comp_mask1 = '../tests/source/composite_mask/1.png'
>>> trans_img1 = color_transfer(comp_img1, comp_mask1)
>>> comp_img2  = '../tests/source/composite/8.jpg'
>>> comp_mask2 = '../tests/source/composite_mask/8.png'
>>> trans_img2 = color_transfer(comp_img2, comp_mask2)
>>> # visualization results
>>> grid_img  = make_image_grid([comp_img1, comp_mask1, trans_img1,
>>>                             comp_img2, comp_mask2, trans_img2], cols=3)
>>> cv2.imwrite('../docs/_static/image/colortransfer_result1.jpg', grid_img)

Expected result:

_images/colortransfer_result1.jpg
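As a rough illustration of what color transfer between two regions can mean, the sketch below matches the mean and standard deviation of the masked foreground values to those of the background for a single flattened channel. This statistics-matching approach is an assumption chosen for illustration and is not taken from the libcom implementation; `match_channel_stats` is a hypothetical helper.

```python
from statistics import fmean, pstdev


def match_channel_stats(values, mask):
    """Match foreground mean/std to the background for one flattened channel.

    values: pixel intensities in [0, 255]; mask: truthy where the pixel is foreground.
    Hypothetical helper for illustration only, not part of libcom.
    """
    fg = [v for v, m in zip(values, mask) if m]
    bg = [v for v, m in zip(values, mask) if not m]
    if not fg or not bg:
        return [float(v) for v in values]  # nothing to transfer
    fg_mean, bg_mean = fmean(fg), fmean(bg)
    fg_std, bg_std = pstdev(fg), pstdev(bg)
    # Avoid dividing by zero when the foreground is a flat color.
    scale = bg_std / fg_std if fg_std > 0 else 1.0
    out = []
    for v, m in zip(values, mask):
        if m:
            v = (v - fg_mean) * scale + bg_mean
            v = min(255.0, max(0.0, v))  # keep the result in the 8-bit range
        out.append(float(v))
    return out


channel = [10, 20, 30, 100, 120, 140]  # first three pixels are foreground
mask = [1, 1, 1, 0, 0, 0]
result = match_channel_stats(channel, mask)
```

Background pixels are left untouched; only the masked values are shifted and rescaled.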

libcom.fos_score

class libcom.fos_score.FOSScoreModel(device=0, model_type='FOS_D', **kwargs)[source]

Foreground object search score prediction model.

Parameters
  • device (str | torch.device) – gpu id

  • model_type (str) – predefined model type

  • kwargs (dict) – other parameters for building model

Examples

>>> from libcom.utils.process_image import make_image_grid
>>> from libcom import FOSScoreModel
>>> import cv2
>>> import torch
>>> task_name = 'fos_score_prediction'
>>> MODEL_TYPE = 'FOS_D'
>>> background = '../tests/source/background/f80eda2459853824_m09g1w_b2413ec8_11.png'
>>> fg_bbox    = [175, 82, 309, 310] # x1,y1,x2,y2
>>> foreground = '../tests/source/foreground/f80eda2459853824_m09g1w_b2413ec8_11.png'
>>> foreground_mask = '../tests/source/foreground_mask/f80eda2459853824_m09g1w_b2413ec8_11.png'
>>> composite_image = '../tests/source/composite/f80eda2459853824_m09g1w_b2413ec8_11.png'
>>> net = FOSScoreModel(device=0, model_type=MODEL_TYPE)
>>> score = net(background, foreground, fg_bbox, foreground_mask=foreground_mask)
>>> grid_img  = make_image_grid([background, foreground, composite_image], text_list=[f'fos_score:{score:.2f}'])
>>> cv2.imshow('fos_score_demo', grid_img)

Expected result:

_images/fos_score_result3.jpg _images/fos_score_result2.jpg
__call__(background_image, foreground_image, bounding_box, foreground_mask=None)

Predicting the compatibility score between the given background and the given foreground.

Parameters
  • background_image (str | numpy.ndarray) – The path to background image or the background image in ndarray form.

  • foreground_image (str | numpy.ndarray) – The path to foreground image or the foreground image in ndarray form.

  • bounding_box (list) – The bounding box which indicates the foreground’s location in the background. [x1, y1, x2, y2].

  • foreground_mask (str | numpy.ndarray) – Mask of foreground image which indicates the foreground object region in the foreground image. default: None.

Returns

Predicted compatibility score between the given background image and the given foreground image.

Return type

fos_score (float)
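Since the bounding box is passed as a plain [x1, y1, x2, y2] list, it can be worth checking it before calling the model. The sketch below is a hypothetical helper, not part of libcom. It assumes pixel coordinates with x2 and y2 allowed to equal the image width and height, and the 512x512 size is a made-up example value.

```python
def validate_bbox(bbox, image_width, image_height):
    """Return bbox as ints if it is a non-empty [x1, y1, x2, y2] box inside the image."""
    if len(bbox) != 4:
        raise ValueError("bbox must have four values: [x1, y1, x2, y2]")
    x1, y1, x2, y2 = (int(v) for v in bbox)
    inside = 0 <= x1 < x2 <= image_width and 0 <= y1 < y2 <= image_height
    if not inside:
        raise ValueError(
            f"bbox {list(bbox)} is empty or outside a {image_width}x{image_height} image"
        )
    return [x1, y1, x2, y2]


# Image size is assumed here; in practice read it from the background image.
fg_bbox = validate_bbox([175, 82, 309, 310], image_width=512, image_height=512)
```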

build_pretrained_model(weight_path)[source]

Build pretrained model from path of weight.

libcom.harmony_score

class libcom.harmony_score.HarmonyScoreModel(device=0, model_type='BargainNet', **kwargs)[source]

Harmony score prediction model.

Parameters
  • device (str | torch.device) – gpu id

  • model_type (str) – predefined model type.

  • kwargs (dict) – other parameters for building model

Examples

>>> from libcom import HarmonyScoreModel
>>> from libcom.utils.process_image import make_image_grid
>>> import cv2
>>> net = HarmonyScoreModel(device=0, model_type='BargainNet')
>>> test_dir   = '../tests/harmony_score_prediction/'
>>> img_names  = ['vaulted-cellar-247391_inharm.jpg', 'ameland-5651866_harm.jpg']
>>> vis_list,scores = [], []
>>> for img_name in img_names:
>>>     comp_img  = test_dir + 'composite/' + img_name
>>>     comp_mask = test_dir + 'composite_mask/' + img_name
>>>     score     = net(comp_img, comp_mask)
>>>     vis_list += [comp_img, comp_mask]
>>>     scores.append(score)
>>> grid_img  = make_image_grid(vis_list, text_list=[f'harmony_score:{scores[0]:.2f}', 'composite-mask', f'harmony_score:{scores[1]:.2f}', 'composite-mask'])
>>> cv2.imwrite('../docs/_static/image/harmonyscore_result1.jpg', grid_img)

Expected result:

_images/harmonyscore_result1.jpg
__call__(composite_image, composite_mask)

Predicting the compatibility score between background and foreground in the given composite image.

Parameters
  • composite_image (str | numpy.ndarray) – The path to the composite image or the composite image in ndarray form.

  • composite_mask (str | numpy.ndarray) – Mask of composite image which indicates the foreground object region in the composite image.

Returns

Predicted harmony score within [0,1] between background region and foreground region of the given composite image. Larger harmony score implies more harmonious composite image.

Return type

harmony_score (float)
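Because a larger score means a more harmonious composite, several candidates can be compared by sorting their scores. The sketch below is a hypothetical helper, and the scores in it are made-up placeholders rather than model outputs; real values would come from calling the model on each composite.

```python
def rank_by_score(scores):
    """Sort (name, score) pairs from most to least harmonious."""
    for name, score in scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {name!r} is outside [0, 1]: {score}")
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Placeholder values for illustration only.
example_scores = {"composite_a.jpg": 0.18, "composite_b.jpg": 0.91}
ranking = rank_by_score(example_scores)
best_name, best_score = ranking[0]
```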

libcom.naive_composition

libcom.naive_composition.get_composite_image(foreground_image, foreground_mask, background_image, bbox, option='none')[source]

Generate composite image through copy-and-paste.

Parameters
  • foreground_image (str | numpy.ndarray) – The path to foreground image or the foreground image in ndarray form.

  • foreground_mask (str | numpy.ndarray) – Mask of foreground image which indicates the foreground object region in the foreground image.

  • background_image (str | numpy.ndarray) – The path to background image or the background image in ndarray form.

  • bbox (list) – The bounding box which indicates the foreground’s location in the background. [x1, y1, x2, y2].

  • option (str) – Image blending method: ‘none’, ‘gaussian’, or ‘poisson’. default: ‘none’.

Returns

  • composite_image (numpy.ndarray) – Generated composite image with the same resolution as the input background image.

  • composite_mask (numpy.ndarray) – Generated composite mask with the same resolution as the composite image.

Return type

(numpy.ndarray, numpy.ndarray)

Examples

>>> from libcom import get_composite_image
>>> from libcom.utils.process_image import make_image_grid, draw_bbox_on_image
>>> import cv2
>>> test_dir = 'source/'
>>> img_list = ['1.jpg', '8.jpg']
>>> bbox_list = [[1000, 895, 1480, 1355], [1170, 944, 2331, 3069]]
>>> for i,img_name in enumerate(img_list):
>>>     bg_img  = test_dir + 'background/' + img_name
>>>     bbox    = bbox_list[i] # x1,y1,x2,y2
>>>     fg_img  = test_dir + 'foreground/' + img_name
>>>     fg_mask = test_dir + 'foreground_mask/' + img_name.replace('.jpg', '.png')
>>>     # generate composite images by naive methods
>>>     comp_img1, comp_mask1 = get_composite_image(fg_img, fg_mask, bg_img, bbox, 'none')
>>>     comp_img2, comp_mask2 = get_composite_image(fg_img, fg_mask, bg_img, bbox, 'gaussian')
>>>     comp_img3, comp_mask3 = get_composite_image(fg_img, fg_mask, bg_img, bbox, 'poisson')
>>>     vis_list = [bg_img, fg_img, comp_img1, comp_mask1, comp_img2, comp_mask2, comp_img3, comp_mask3]
>>>     # visualization results
>>>     grid_img  = make_image_grid(vis_list, cols=4)
>>>     cv2.imwrite(f'../docs/_static/image/generatecomposite_result{i+1}.jpg', grid_img)

Expected result:

_images/generatecomposite_result1.jpg _images/generatecomposite_result2.jpg
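Copy-and-paste compositing is conventionally a per-pixel mix of foreground and background weighted by the mask. The sketch below shows that mix for a single pixel. It assumes an 8-bit mask, and that a feathered mask (for instance one that has been blurred) yields soft edges. How libcom implements each blending option is not shown here, and `blend_pixel` is a hypothetical helper.

```python
def blend_pixel(fg, bg, mask_value):
    """Blend one pixel: mask_value 255 keeps the foreground, 0 keeps the background."""
    alpha = mask_value / 255.0
    return tuple(round(alpha * f + (1.0 - alpha) * b) for f, b in zip(fg, bg))


fg_pixel = (200, 100, 0)
bg_pixel = (0, 100, 200)
hard = blend_pixel(fg_pixel, bg_pixel, 255)  # inside the object: pure foreground
soft = blend_pixel(fg_pixel, bg_pixel, 51)   # feathered edge: 20% foreground
```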

libcom.opa_score

class libcom.opa_score.OPAScoreModel(device=0, model_type='SimOPA', **kwargs)[source]

OPA score prediction model.

Parameters
  • device (str | torch.device) – gpu id

  • model_type (str) – predefined model type.

  • kwargs (dict) – other parameters for building model

Examples

>>> from libcom import OPAScoreModel
>>> from libcom import get_composite_image
>>> from libcom.utils.process_image import make_image_grid
>>> import cv2
>>> net = OPAScoreModel(device=0, model_type='SimOPA')
>>> test_dir  = './source'
>>> bg_img    = 'source/background/17.jpg'
>>> fg_img    = 'source/foreground/17.jpg'
>>> fg_mask   = 'source/foreground_mask/17.png'
>>> bbox_list = [[475, 697, 1275, 1401], [475, 300, 1275, 1004]]
>>> comp1, comp_mask1 = get_composite_image(fg_img, fg_mask, bg_img, bbox_list[0])
>>> comp2, comp_mask2 = get_composite_image(fg_img, fg_mask, bg_img, bbox_list[1])
>>> score1 = net(comp1, comp_mask1)
>>> score2 = net(comp2, comp_mask2)
>>> grid_img  = make_image_grid([comp1, comp_mask1, comp2, comp_mask2], text_list=[f'opa_score:{score1:.2f}', 'composite-mask', f'opa_score:{score2:.2f}', 'composite-mask'])
>>> cv2.imwrite('../docs/_static/image/opascore_result1.jpg', grid_img)

Expected result:

_images/opascore_result1.jpg
__call__(composite_image, composite_mask)

Predicting the object placement assessment (opa) score for the given composite image, which evaluates the rationality of foreground object placement.

Parameters
  • composite_image (str | numpy.ndarray) – The path to the composite image or the composite image in ndarray form.

  • composite_mask (str | numpy.ndarray) – Mask of composite image which indicates the foreground object region in the composite image.

Returns

Predicted opa score ranges from 0 to 1, where a larger score indicates more reasonable placement.

Return type

opa_score (float)
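The example above scores the same foreground at two placements. To compare more placements, candidate boxes can be enumerated first and each one composited and scored in turn. The sketch below is a hypothetical helper, not part of libcom, that lists fixed-size [x1, y1, x2, y2] boxes on a regular grid. The tiny sizes are chosen only to keep the output short.

```python
def candidate_bboxes(bg_width, bg_height, box_width, box_height, stride):
    """List [x1, y1, x2, y2] boxes of a fixed size that fit inside the background."""
    if stride <= 0:
        raise ValueError("stride must be positive")
    boxes = []
    for y1 in range(0, bg_height - box_height + 1, stride):
        for x1 in range(0, bg_width - box_width + 1, stride):
            boxes.append([x1, y1, x1 + box_width, y1 + box_height])
    return boxes


boxes = candidate_bboxes(bg_width=8, bg_height=6, box_width=4, box_height=4, stride=2)
```

Each returned box could then be passed to get_composite_image and the resulting composite scored, keeping the placement with the largest score.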

libcom.image_harmonization

class libcom.image_harmonization.ImageHarmonizationModel(device=0, model_type='PCTNet', **kwargs)[source]

Image harmonization model.

Parameters
  • device (str | torch.device) – gpu id

  • model_type (str) – predefined model type, ‘PCTNet’ or ‘LBM’

  • kwargs (dict) – other parameters for building model. For LBM, you can set ‘ckpt_path’ here.

Examples

>>> from libcom import ImageHarmonizationModel
>>> import cv2
>>> import os
>>> import numpy as np
>>> from PIL import Image
>>> #Use PCTNet
>>> PCTNet = ImageHarmonizationModel(device=0, model_type='PCTNet')
>>> comp_img1  = '../tests/source/composite/comp1_PCTNet.jpg'
>>> comp_mask1 = '../tests/source/composite_mask/mask1_PCTNet.png'
>>> PCT_result1 = PCTNet(comp_img1, comp_mask1)
>>> cv2.imwrite('../docs/_static/image/image_harmonization_PCT_result1.jpg', np.concatenate([cv2.imread(comp_img1), cv2.imread(comp_mask1), PCT_result1],axis=1))
>>> #Use LBM
>>> LBM = ImageHarmonizationModel(device=0, model_type='LBM')
>>> comp_img  = '../tests/source/composite/1.jpg'
>>> comp_mask = '../tests/source/composite_mask/1.png'
>>> LBM_result = LBM(comp_img, comp_mask, steps=4)
>>> cv2.imwrite('../docs/_static/image/image_harmonization_LBM_result.jpg', np.concatenate([cv2.imread(comp_img), cv2.imread(comp_mask), LBM_result],axis=1))

Expected result:

_images/image_harmonization_PCT_result1.jpg _images/image_harmonization_LBM_result.jpg
__call__(composite_image, composite_mask, **kwargs)

Given a composite image and a foreground mask, perform harmonization on the foreground.

Parameters
  • composite_image (str | numpy.ndarray) – The path to the composite image or the composite image in ndarray form.

  • composite_mask (str | numpy.ndarray) – Mask of composite image which indicates the foreground object region in the composite image.

  • **kwargs – Extra parameters for inference (e.g., steps=4, resolution=1024 for LBM).

Returns

The harmonized result.

Return type

harmonized_image (numpy.ndarray)

libcom.inharmonious_region_localization

class libcom.inharmonious_region_localization.InharmoniousLocalizationModel(device=0, model_type='IHDRNet', **kwargs)[source]

Inharmonious region localization model.

Parameters
  • device (str | torch.device) – gpu id

  • model_type (str) – predefined model type

  • kwargs (dict) – other parameters for building model

Examples

>>> from libcom import InharmoniousLocalizationModel
>>> import cv2
>>> import numpy as np
>>> net = InharmoniousLocalizationModel(device=0)
>>> comp_img1  = '../tests/source/composite/comp1_MadisNet.png'
>>> inharmonious_localization1 = net(comp_img1)
>>> comp_img2  = '../tests/source/composite/comp2_MadisNet.png'
>>> inharmonious_localization2 = net(comp_img2)
>>> cv2.imwrite('../docs/_static/image/inharmonious_localization_result1.jpg', np.concatenate([cv2.resize(cv2.imread(comp_img1),(256,256)), inharmonious_localization1],axis=1))
>>> cv2.imwrite('../docs/_static/image/inharmonious_localization_result2.jpg', np.concatenate([cv2.resize(cv2.imread(comp_img2),(256,256)), inharmonious_localization2],axis=1))

Expected result:

_images/inharmonious_localization_result3_4.jpg
__call__(composite_image)

Given a composite image, predict the mask of the inharmonious region.

Parameters

composite_image (str | numpy.ndarray) – The path to the composite image or the composite image in ndarray form.

Returns

The inharmonious mask.

Return type

inharmonious_mask (numpy.ndarray)

libcom.painterly_image_harmonization

class libcom.painterly_image_harmonization.PainterlyHarmonizationModel(device=0, model_type='PHDNet', **kwargs)[source]

Painterly image harmonization prediction model.

Parameters
  • device (str | torch.device) – gpu id

  • model_type (str) – predefined model type

  • kwargs (dict) – use_residual (bool): whether to use adapter with residual or not for PHDiffusion

Examples

>>> from libcom.utils.process_image import make_image_grid
>>> from libcom import PainterlyHarmonizationModel
>>> import cv2
>>> import torch
>>> task_name = 'painterly_image_harmonization'
>>> MODEL_TYPE = 'PHDNet' # choose from 'PHDNet', 'PHDiffusion'
>>> comp_img = '../tests/painterly_harmonization_source/composite/3.png'
>>> comp_mask = '../tests/painterly_harmonization_source/composite_mask/3.png'
>>> net = PainterlyHarmonizationModel(device=0, model_type=MODEL_TYPE)
>>> output_img = net(comp_img, comp_mask)
>>> grid_img = make_image_grid([comp_img, comp_mask, output_img])
>>> cv2.imshow('painterly_image_harmonization_demo', grid_img)

Expected result:

_images/painterly_image_harmonization_result2.jpg _images/painterly_image_harmonization_result3.jpg
__call__(composite_image, composite_mask, sample_steps=50, strength=0.7, random_seed=None)

Generating the harmonized image for the given composite image and the corresponding composite mask.

Parameters
  • composite_image (str | numpy.ndarray) – The path to the composite image or the composite image in ndarray form.

  • composite_mask (str | numpy.ndarray) – The path to the composite mask or the composite mask in ndarray form.

  • sample_steps (int) – Default total step in the inference process of PHDiffusion.

  • strength (float) – A hyper-parameter that decides the total step (strength * sample_steps) for PHDiffusion.

Returns

Generated harmonized image for the given composite image and the corresponding composite mask, with BGR channel.

Return type

preds (numpy.ndarray)
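The relation between sample_steps and strength for PHDiffusion can be checked with a line of arithmetic: with the defaults, 0.7 * 50 gives 35 steps. The sketch below is a hypothetical helper; rounding to the nearest integer is an assumption, since the exact conversion used internally is not documented here.

```python
def effective_steps(sample_steps=50, strength=0.7):
    """Number of steps actually run, taken as strength * sample_steps."""
    if sample_steps <= 0:
        raise ValueError("sample_steps must be positive")
    if strength < 0:
        raise ValueError("strength must be non-negative")
    # Rounding is an assumption made for this illustration.
    return round(strength * sample_steps)


steps = effective_steps()            # defaults from the signature above
full = effective_steps(50, 1.0)      # strength 1.0 keeps every step
```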

libcom.fopa_heat_map

class libcom.fopa_heat_map.FOPAHeatMapModel(device=0, model_type='fopa', **kwargs)[source]

Generate a heatmap for a pair of scaled foreground and background.

Parameters
  • device (str | torch.device) – gpu id

  • model_type (str) – predefined model type

Examples

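>>> import os
>>> import shutil
>>> import cv2
>>> from libcom import FOPAHeatMapModel
>>> from libcom.utils.process_image import make_image_grid
>>> # get_test_list_fopa_heatmap() is a helper from the test suite; import or define it separately
>>> task_name = 'fopa_heat_map'  # assumed label, used only to name the output folder
>>> test_set = get_test_list_fopa_heatmap()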
>>> result_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'results', task_name)
>>> if os.path.exists(result_dir):
>>>     shutil.rmtree(result_dir)
>>> os.makedirs(result_dir, exist_ok=True)
>>> os.makedirs(os.path.join(result_dir, 'grid'), exist_ok=True)
>>> print(f'begin testing {task_name}...')
>>> net = FOPAHeatMapModel(device=0)
>>> for pair in test_set[:1]:
>>>     fg_img, fg_mask, bg_img = pair['foreground'], pair['foreground_mask'], pair['background']
>>>     bboxes, heatmaps = net(fg_img, fg_mask, bg_img, cache_dir=os.path.join(result_dir, 'cache'), heatmap_dir=os.path.join(result_dir, 'heatmap'))
>>>     img_name  = os.path.basename(bg_img).replace('.png', '.jpg')
>>>     grid_img  = make_image_grid([bg_img, fg_img, heatmaps[0]])
>>>     res_path  = os.path.join(result_dir, 'grid', img_name)
>>>     cv2.imwrite(res_path, grid_img)
>>>     print('save result to ', res_path)
>>> print(f'end testing {task_name}!')

Expected result:

_images/fopa_heatmap_FOPA_result1.png _images/fopa_heatmap_FOPA_result3.png
__call__(foreground_image, foreground_mask, background_image, cache_dir, heatmap_dir, fg_scale_num=16, composite_num_choose=3, composite_num=50)

Generate a heatmap for a pair of scaled foreground and background.

Parameters
  • foreground_image – foreground image path

  • foreground_mask – foreground mask path

  • background_image – background image path

  • cache_dir – folder path where scaled foreground images, scaled mask images and composite images are stored

  • heatmap_dir – folder path where heatmaps are stored

  • fg_scale_num – number of scales of scaled foreground images and mask images

  • composite_num_choose – the number of chosen composite images

  • composite_num – the number of composite images with the highest score

Returns

  • box_list – the path of concatenated background image, foreground image and corresponding heatmap

  • heatmap_list – the path of heatmaps

Return type

(box_list, heatmap_list)

libcom.os_insert

class libcom.os_insert.OSInsertModel(device: str = 'cuda:0', model_dir: Optional[Union[str, Path]] = None, *, eager_aggressive_init: bool = False, objectstitch_ckpt_path: Optional[Union[str, Path]] = None, objectstitch_config_path: Optional[Union[str, Path]] = None, objectstitch_clip_dir: Optional[Union[str, Path]] = None, sam_checkpoint: Optional[Union[str, Path]] = None, flux_fill_path: Optional[Union[str, Path]] = None, flux_redux_path: Optional[Union[str, Path]] = None, ia_lora_path: Optional[Union[str, Path]] = None)[source]

High-level OSInsert interface.

This model provides a unified interface for object insertion with two modes: conservative and aggressive. It internally combines multiple sub-models such as InsertAnything, ObjectStitch, and SAM.

Modes

  • aggressive:

    ObjectStitch + SAM + InsertAnything pipeline. Suitable for more complex and flexible compositions.

  • conservative:

    Directly uses background + bbox to generate mask, then performs insertion via InsertAnything. Faster and more stable.

Parameters
  • device (str) – Device to run the model on (e.g., “cuda:0”, “cpu”).

  • model_dir (str | Path | None) – Root directory of all model checkpoints.

  • eager_aggressive_init (bool) – If True, preload ObjectStitch and SAM models at initialization. Otherwise, they will be lazily loaded when first used.

  • objectstitch_ckpt_path (str | Path | None) – Path to ObjectStitch checkpoint.

  • objectstitch_config_path (str | Path | None) – Path to ObjectStitch config file.

  • objectstitch_clip_dir (str | Path | None) – Path to CLIP model directory used by ObjectStitch.

  • sam_checkpoint (str | Path | None) – Path to SAM (Segment Anything Model) checkpoint.

  • flux_fill_path (str | Path | None) – Path to Flux Fill model directory.

  • flux_redux_path (str | Path | None) – Path to Flux Redux model directory.

  • ia_lora_path (str | Path | None) – Path to LoRA weights for InsertAnything.

Notes

  • InsertAnything is initialized during class construction.

  • ObjectStitch and SAM are lazily initialized (unless eager_aggressive_init=True), and then cached for reuse.

  • Conservative mode does not require ObjectStitch.
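The lazy-initialization behavior described in these notes can be pictured with a toy class. The sketch below is not the OSInsert code: `LazyPipeline`, its attributes, and the placeholder objects are all hypothetical, and they only show how heavy sub-models might be loaded on first use and then cached.

```python
class LazyPipeline:
    """Toy stand-in showing eager vs. lazy loading with caching."""

    def __init__(self, eager_aggressive_init=False):
        self.load_count = 0            # how many times the heavy models were loaded
        self._aggressive_models = None
        if eager_aggressive_init:
            self._get_aggressive_models()

    def _get_aggressive_models(self):
        if self._aggressive_models is None:
            self.load_count += 1
            # Placeholder for loading the ObjectStitch and SAM checkpoints.
            self._aggressive_models = {"objectstitch": object(), "sam": object()}
        return self._aggressive_models

    def run(self, mode="conservative"):
        if mode == "aggressive":
            self._get_aggressive_models()   # loaded once, then reused
        elif mode != "conservative":
            raise ValueError(f"unknown mode: {mode!r}")
        return mode


lazy = LazyPipeline()
lazy.run("conservative")
loads_after_conservative = lazy.load_count   # heavy models not needed yet
lazy.run("aggressive")
lazy.run("aggressive")
loads_after_aggressive = lazy.load_count     # loaded on first use, cached after
eager = LazyPipeline(eager_aggressive_init=True)
```

The trade-off is the usual one: eager loading moves the cost to construction time, while lazy loading avoids it entirely for callers that only use conservative mode.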

Examples

>>> import cv2
>>> from libcom import OSInsertModel
>>> model = OSInsertModel(
>>>     device="cuda:0"
>>> )
>>> bg = cv2.imread("tests/osinsert/background/Demo_0.png")
>>> fg = cv2.imread("tests/osinsert/foreground/Demo_0.png")
>>> fg_mask = cv2.imread(
>>>     "tests/osinsert/foreground_mask/Demo_0.png",
>>>     cv2.IMREAD_GRAYSCALE
>>> )
>>> bbox = (175, 184, 363, 372)
>>> result = model.infer_images(
>>>     background=bg,
>>>     foreground=fg,
>>>     foreground_mask=fg_mask,
>>>     bbox_xyxy=bbox,
>>>     mode="conservative",   # or "aggressive"
>>>     verbose=False,
>>>     seed=123,
>>>     strength=1.0,
>>>     split_ratio=0.33,
>>>     save_path="result_dir/conservative",
>>> )

Expected result:

The foreground object is inserted into the background image at the specified bounding box, with realistic blending.

_images/os_insert_result.jpg
__call__(background_path: str | pathlib.Path, foreground_path: str | pathlib.Path, foreground_mask_path: str | pathlib.Path, bbox: list[int], result_dir: str | pathlib.Path, mode: Literal['aggressive', 'conservative'] = 'conservative', cleanup_intermediate: bool = True, verbose: bool = False, seed: int = 123, strength: float = 1.0, split_ratio: float = 0.5) numpy.ndarray | None[source]

Run a single OSInsert inference.

Parameters
  • background_path – Path to the background image.

  • foreground_path – Path to the foreground image used as the InsertAnything reference image.

  • foreground_mask_path – Binary mask for the foreground image.

  • bbox – List containing [x1, y1, x2, y2], specifying the insertion region on the background image.

  • result_dir – Directory where the final composed image will be written.

  • mode

    • "conservative": background + bbox -> mask -> InsertAnything.

    • "aggressive": ObjectStitch + SAM -> combined source/mask -> InsertAnything.

  • cleanup_intermediate – Deprecated. Present for backward compatibility.

  • verbose – If True, save intermediate artifacts into result_dir/intermediates. Default False (do not save intermediates).

  • seed – Random seed for InsertAnything.

  • strength – InsertAnything strength parameter.

Returns

The inserted result.

Return type

Generated composite image (numpy.ndarray)

libcom.kontext_blending_harmonization

class libcom.kontext_blending_harmonization.KontextBlendingHarmonizationModel(device=0, model_type='Kontext_blend', **kwargs)[source]

Flux Kontext based image blending and harmonization model.

Parameters
  • device (str | torch.device) – gpu id

  • model_type (str) – predefined model type. “Kontext_blend” refers to the version fine-tuned on the image blending task. “Kontext_harm” refers to the version fine-tuned on the image harmonization task. default: “Kontext_blend”

  • kwargs (dict) – other parameters for building model

Examples

>>> from libcom import KontextBlendingHarmonizationModel
>>> from libcom.utils.process_image import make_image_grid, draw_bbox_on_image
>>> import cv2
>>> net = KontextBlendingHarmonizationModel(device=0, model_type="Kontext_blend")
>>> img_names = ["000000049931.png", "000000460450.png", "6c5601278dcb5e6d_m09728_f5cd2891_17.png"]
>>> bboxes = [[168, 137, 488, 413], [134, 158, 399, 511], [130, 91, 392, 271]]
>>> test_dir  = 'tests/controllable_composition/'
>>> for i in range(len(img_names)):
>>>     bg_img  = test_dir + 'background/' + img_names[i]
>>>     fg_img  = test_dir + 'foreground/' + img_names[i]
>>>     bbox    = bboxes[i]
>>>     mask    = test_dir + 'foreground_mask/' + img_names[i]
>>>     comp    = net(bg_img, fg_img, bbox, mask)
>>>     bg_img  = draw_bbox_on_image(bg_img, bbox)
>>>     grid_img = make_image_grid([bg_img, fg_img, comp[0]])
>>>     cv2.imwrite('../docs/_static/image/kontext_result{}.jpg'.format(i+1), grid_img)

Expected result:

_images/kontext_result1.jpg _images/kontext_result2.jpg
__call__(background_image, foreground_image, bbox, foreground_mask=None, prompt='put it here', num_samples=1, sample_steps=28, guidance_scale=2.5, seed=321)

Kontext based image blending and harmonization.

Parameters
  • background_image (str) – The path to background image.

  • foreground_image (str) – The path to foreground image.

  • bbox (list) – The bounding box which indicates the foreground’s location in the background. [x1, y1, x2, y2].

  • foreground_mask (None | str) – Mask of foreground image which indicates the foreground object region in the foreground image. default: None.

  • prompt (str) – The text prompt to guide the image generation. default: ‘put it here’.

  • num_samples (int) – Number of images to be generated for each task. default: 1.

  • sample_steps (int) – Number of denoising steps. The recommended setting is 28 for FlowMatchEulerDiscreteScheduler. default: 28.

  • guidance_scale (float) – Scale in classifier-free guidance (minimum: 1; maximum: 20). default: 2.5.

  • seed (int) – Random seed for reproducibility; the same seed leads to the same results. default: 321.

Returns

Generated images with a shape of 512x512x3 or Nx512x512x3, where N indicates the number of generated images.

Return type

composite_images (numpy.ndarray)

libcom.reflection_generation

class libcom.reflection_generation.ReflectionGenerationModel(device=0, model_type='ReflectionGeneration', **kwargs)[source]

Foreground reflection generation model based on diffusion model and control net.

Parameters
  • device (str | torch.device) – gpu id

  • model_type (str) – predefined model type

  • kwargs (dict) – other parameters for building model

Examples

>>> from libcom import ReflectionGenerationModel
>>> from libcom.utils.process_image import make_image_grid
>>> import cv2
>>> net = ReflectionGenerationModel(device=2, model_type='ReflectionGeneration')
>>> comp_image1 = "../tests/reflection_generation/composite/1.png"
>>> comp_mask1 = "../tests/reflection_generation/composite_mask/1.png"
>>> preds = net(comp_image1, comp_mask1, number=5)
>>> grid_img  = make_image_grid([comp_image1, comp_mask1] + preds)
>>> cv2.imwrite('../docs/_static/image/reflection_generation_result1.jpg', grid_img)
>>> comp_image2 = "../tests/reflection_generation/composite/2.png"
>>> comp_mask2 = "../tests/reflection_generation/composite_mask/2.png"
>>> preds = net(comp_image2, comp_mask2, number=5)
>>> grid_img  = make_image_grid([comp_image2, comp_mask2] + preds)
>>> cv2.imwrite('../docs/_static/image/reflection_generation_result2.jpg', grid_img)

Expected result:

_images/reflection_generation1.jpg _images/reflection_generation2.jpg
__call__(composite_image, composite_mask, number=5, seed=42)

Generate reflection for foreground object.

Parameters
  • composite_image (str | numpy.ndarray) – The path to composite image or composite image in ndarray form.

  • composite_mask (str | numpy.ndarray) – The path to foreground object mask or foreground object mask in ndarray form.

  • number (int) – Number of images to generate. default: 5.

  • seed (int) – Random seed for reproducibility; the same seed leads to the same results. default: 42.

Returns

A list of images with generated foreground reflections. Each image is in ndarray form with a shape of 512x512x3.

Return type

generated_images (list)

libcom.shadow_generation

class libcom.shadow_generation.ShadowGenerationModel(device=0)[source]

Foreground Shadow generation model based on diffusion model.

Parameters

device (str | torch.device) – gpu id

Examples

>>> from libcom import ShadowGenerationModel
>>> from libcom.utils.process_image import make_image_grid
>>> import cv2
>>> net = ShadowGenerationModel()
>>> comp_image1 = "../tests/shadow_generation/composite/1.png"
>>> comp_mask1 = "../tests/shadow_generation/composite_mask/1.png"
>>> preds = net(comp_image1, comp_mask1, number=5)
>>> grid_img  = make_image_grid([comp_image1, comp_mask1] + preds)
>>> cv2.imwrite('../docs/_static/image/shadow_generation_result1.jpg', grid_img)
>>> comp_image2 = "../tests/shadow_generation/composite/2.png"
>>> comp_mask2 = "../tests/shadow_generation/composite_mask/2.png"
>>> preds = net(comp_image2, comp_mask2, number=5)
>>> grid_img  = make_image_grid([comp_image2, comp_mask2] + preds)
>>> cv2.imwrite('../docs/_static/image/shadow_generation_result2.jpg', grid_img)

Expected result:

_images/shadow_generation_result1.jpg _images/shadow_generation_result2.jpg
__call__(shadowfree_img, object_mask, number=5)[source]

Generate shadow for foreground object.

Parameters
  • shadowfree_img (str | numpy.ndarray) – The path to composite image or composite image in ndarray form.

  • object_mask (str | numpy.ndarray) – The path to foreground object mask or foreground object mask in ndarray form.

  • number (int) – Number of images to generate. default: 5.

Returns

A list of images with generated foreground shadows. Each image is in ndarray form with a shape of 512x512x3.

Return type

generated_images (list)

post_process(decoded, shadowfree_img, object_mask)[source]

Parameters
  • decoded – np.uint8, HWC, [0-255], RGB

  • shadowfree_img – np.uint8, HWC, [0-255], RGB

  • object_mask – np.uint8, HW, [0-255]